Fully overlap model execution #134
This is only half overlapping: one sync point remains at https://github.com/vllm-project/vllm-gaudi/pull/134/files#diff-5ffdc7547fbc10ff45e9791caaef30c306a59a0e3f7c9515569f342baed8c0e2R116, and I can't find a safe way to remove it.
Incorporate commit by Marcin
Pre-commit fix
Remove unneeded change
WIP
pre-commit fix
Signed-off-by: Tianmu Li <[email protected]>
…racy fix WIP Signed-off-by: Tianmu Li <[email protected]>
/run-gaudi-tests
Had an offline discussion with Tianmu; the code looks good.
Perf and profiling results will be added to the ticket.
Failed on spec decode when async_scheduler is not on; cancelling the run.
…oken_ids behavior with non-spec Signed-off-by: Tianmu Li <[email protected]>
…etween spec decode and async Signed-off-by: Tianmu Li <[email protected]>
Fixed the issue by preventing concatenation of decode_sampled_token_ids and prefill_sampled_token_ids when not using async_scheduling.
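For readers following along, here is a minimal sketch of the kind of guard described in the fix above. It is not the code from this PR: the function name `collect_sampled_token_ids`, the use of plain Python lists instead of device tensors, the return shape, and the concatenation order are all assumptions made for illustration. Only the names `decode_sampled_token_ids`, `prefill_sampled_token_ids`, and `async_scheduling` come from the comment.

```python
from typing import List, Sequence


def collect_sampled_token_ids(
    prefill_sampled_token_ids: Sequence[int],
    decode_sampled_token_ids: Sequence[int],
    async_scheduling: bool,
) -> List[List[int]]:
    """Hypothetical helper: group sampled token ids for later processing.

    With async scheduling on, the two groups are concatenated into one
    (decode first is an assumed order). With it off, they stay separate,
    so any path that expects them separately is left untouched.
    """
    if async_scheduling:
        # Single merged group, only on the async scheduling path.
        return [list(decode_sampled_token_ids) + list(prefill_sampled_token_ids)]
    # No concatenation when async scheduling is disabled.
    return [list(decode_sampled_token_ids), list(prefill_sampled_token_ids)]


merged = collect_sampled_token_ids([7, 8], [1, 2, 3], async_scheduling=True)
separate = collect_sampled_token_ids([7, 8], [1, 2, 3], async_scheduling=False)
```

In this sketch `merged` holds one group and `separate` holds two, which is the distinction the fix relies on; the actual change in the diff may differ in structure.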
/run-gaudi-tests
/run-gaudi-tests
/run-gaudi-tests
Dependent on vllm-project/vllm#23569 --------- Signed-off-by: Tianmu Li <[email protected]> Co-authored-by: Chendi.Xue <[email protected]>
Porting changes from main branch: #134 and #184 Author: Tianmu Li <[email protected]> --------- Signed-off-by: Tianmu Li <[email protected]> Co-authored-by: Tianmu Li <[email protected]> Co-authored-by: Chendi.Xue <[email protected]>