@yewentao256 yewentao256 commented Sep 5, 2025


if dbo_enabled():
    if isinstance(prepare_ret, tuple):
        hook, receiver = prepare_ret
    else:

How does this differ from the if not self.prepare_finalize.supports_async(): path?

Author

I think that on the self.prepare_finalize.prepare() path, the receiver is called first and its result is then packed into (a1q, a1q_scale, expert_tokens_meta, _expert_topk_ids, _expert_topk_weights). So we don't need to update it?

I guess I'm curious why this is needed, since I thought HT would go through the:

            if self.shared_experts is not None:
                shared_output = self.shared_experts(a1)

            (a1q, a1q_scale, expert_tokens_meta, _expert_topk_ids,
             _expert_topk_weights) = self.prepare_finalize.prepare(
                 a1,
                 a1_scale,
                 a2_scale,
                 topk_weights,
                 topk_ids,
                 global_num_experts,
                 expert_map,
                 apply_router_weight_on_input,
                 self.fused_experts.quant_config,
             )

path

Author

Do you mean for HT, supports_async should be False instead of True?

    def supports_async(self) -> bool:
        return True

supports_async currently returns True, which sends us down the branch that uses self.prepare_finalize.prepare_async(...).

Then we need to be compatible with low latency, because it returns:

return (hook, lambda hook: self._receiver(hook, expert_x, expert_num_tokens,
                                      a1_scale, a1.dtype, quant_config))
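For reference, the two return shapes discussed above (a bare receiver versus a (hook, receiver) pair) can be normalized in one place. This is only a minimal sketch: normalize_prepare_ret and the type aliases are hypothetical names used for illustration, not part of the PR, and the receiver's call signature is deliberately left open.

```python
from typing import Any, Callable, Optional, Tuple, Union

# Hypothetical aliases for illustration; the real types live in the PR's code.
Hook = Callable[[], None]
Receiver = Callable[..., Any]
PrepareRet = Union[Receiver, Tuple[Hook, Receiver]]


def normalize_prepare_ret(
    prepare_ret: PrepareRet,
) -> Tuple[Optional[Hook], Receiver]:
    """Return (hook, receiver) whichever shape prepare_async produced."""
    if isinstance(prepare_ret, tuple):
        # Pair shape: a hook plus a receiver.
        hook, receiver = prepare_ret
        return hook, receiver
    # Bare receiver: there is no hook to run or register.
    return None, prepare_ret
```

With a helper like this, the caller only has to check whether hook is None, regardless of which prepare/finalize implementation produced the value.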

compute_sms = total_sms - self.num_sms
assert compute_sms > 0, "compute_sms must be greater than 0"
logger.info("Setting DeepGEMM num_sms to %d for dbo", compute_sms)
dg.set_num_sms(compute_sms)

Can we restrict this to just when the batch is actually running DBO? Or does this do that already?

Author

I think we do it already? dbo_enabled() is one of the conditions.
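To make the guard being discussed concrete, here is a rough sketch of the SM split with the DBO check folded in. The helper compute_sm_budget and its parameters are hypothetical, and it raises ValueError where the snippet under review uses an assert; the caller would pass the result to dg.set_num_sms only when DBO is active.

```python
def compute_sm_budget(total_sms: int, comm_sms: int, dbo_active: bool) -> int:
    """Return the number of SMs left for compute kernels.

    Hypothetical helper: comm_sms stands in for the SMs reserved for
    communication (self.num_sms in the snippet under review).
    """
    if not dbo_active:
        # Without DBO nothing is reserved, so compute keeps every SM.
        return total_sms
    compute_sms = total_sms - comm_sms
    if compute_sms <= 0:
        raise ValueError("compute_sms must be greater than 0")
    return compute_sms
```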

@yewentao256 yewentao256 force-pushed the wye-dbo-full-cudagraph-ht branch from b01acdd to ac8fbb7 Compare September 10, 2025 19:52