DBO HT without cudagraph #113
base: sage/dbo-full-cudagraphs
if dbo_enabled():
    if isinstance(prepare_ret, tuple):
        hook, receiver = prepare_ret
    else:
How does this differ from the `if not self.prepare_finalize.supports_async():` path?
I think that for the `self.prepare_finalize.prepare()` path, `receiver` is called first and its result is then packed into `(a1q, a1q_scale, expert_tokens_meta, _expert_topk_ids, _expert_topk_weights)`. So we don't need to update it?
I guess I'm curious why this is needed, since I thought HT would go through this path:

    if self.shared_experts is not None:
        shared_output = self.shared_experts(a1)
    (a1q, a1q_scale, expert_tokens_meta, _expert_topk_ids,
     _expert_topk_weights) = self.prepare_finalize.prepare(
        a1,
        a1_scale,
        a2_scale,
        topk_weights,
        topk_ids,
        global_num_experts,
        expert_map,
        apply_router_weight_on_input,
        self.fused_experts.quant_config,
    )
Do you mean that for HT, `supports_async` should be False instead of True?

    def supports_async(self) -> bool:
        return True

Since `supports_async` currently returns True, we take the branch that calls `self.prepare_finalize.prepare_async(`. That branch also has to stay compatible with the low-latency path, because it returns:

    return (hook, lambda hook: self._receiver(hook, expert_x, expert_num_tokens,
                                              a1_scale, a1.dtype, quant_config))
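For illustration, the tuple-versus-callable handling discussed in this thread could be sketched as below. `split_prepare_ret` is a hypothetical helper name, and the receiver is assumed to take no arguments for simplicity (the snippet quoted above passes the hook in), so treat this as a sketch rather than the code in the PR:

```python
from typing import Callable, Optional


def split_prepare_ret(prepare_ret) -> tuple:
    """Normalize a prepare_async() result to (hook, receiver).

    Assumption: prepare_async() returns either a bare receiver callable or
    a (hook, receiver) tuple. hook is None in the first case.
    """
    if isinstance(prepare_ret, tuple):
        hook, receiver = prepare_ret
    else:
        hook, receiver = None, prepare_ret
    return hook, receiver


# Usage with toy callables that record the order in which they run.
events = []


def toy_hook() -> None:
    events.append("hook")


def toy_receiver() -> tuple:
    events.append("recv")
    return ("a1q",)


hook, receiver = split_prepare_ret((toy_hook, toy_receiver))
if hook is not None:
    hook()  # finish the communication before unpacking
out = receiver()

# A bare receiver normalizes to (None, receiver).
hook2, receiver2 = split_prepare_ret(toy_receiver)
```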
compute_sms = total_sms - self.num_sms
assert compute_sms > 0, "compute_sms must be greater than 0"
logger.info("Setting DeepGEMM num_sms to %d for dbo", compute_sms)
dg.set_num_sms(compute_sms)
Can we restrict this to just when the batch is actually running DBO? Or does this do that already?
I think we do it already? `dbo_enabled()` is one of the conditions.
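As a rough sketch of the gating being discussed, the SM budget could be computed only when DBO is active. `compute_sms_for_dbo`, `comm_sms`, and `dbo_active` are hypothetical names for illustration. The function only returns the number and does not call DeepGEMM:

```python
def compute_sms_for_dbo(total_sms: int, comm_sms: int, dbo_active: bool) -> int:
    """Return how many SMs the compute kernels may use.

    Assumption: when DBO is not active, no SMs are reserved for
    communication, so the full device is available for compute.
    """
    if not dbo_active:
        return total_sms
    compute_sms = total_sms - comm_sms
    assert compute_sms > 0, "compute_sms must be greater than 0"
    return compute_sms


# Example with made-up SM counts.
with_dbo = compute_sms_for_dbo(total_sms=132, comm_sms=20, dbo_active=True)
without_dbo = compute_sms_for_dbo(total_sms=132, comm_sms=20, dbo_active=False)
```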
b01acdd to ac8fbb7