Add glm4.5v tp2,4 fp8 config on H100_80GB #23443
Conversation
This pull request was exported from Phabricator. Differential Revision: D80713433
Force-pushed from 54eb4a9 to 03e06d5
Summary:
Pull Request resolved: vllm-project#23443

As title, generated with D80713197

Test Plan: Run fused_moe on H100_80GB

Rollback Plan:

Reviewed By: zzh142857

Differential Revision: D80713433
Force-pushed from 03e06d5 to ab94208
Code Review
This pull request adds new FP8 configurations for glm4.5v on H100_80GB GPUs with tensor parallelism of 2 and 4. It introduces a new environment variable, VLLM_USE_FUSED_MOE_KERNEL_IN_COMPRESSED_QUANTIZATION, to control fused MoE kernel usage. The tensor parallelism logic in glm4_1v.py is refactored to align with vLLM's standard implementation. My review includes a suggestion to refactor duplicated code in compressed_tensors_moe.py to improve maintainability.
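For context, an environment-variable gate like the one described above is typically read once and used to choose between the fused and unfused MoE paths. The snippet below is a minimal sketch of that pattern, assuming only the variable name from this PR; the parsing rules and function name are illustrative, not the vLLM implementation.

```python
import os


def use_fused_moe_in_compressed_quantization() -> bool:
    """Sketch: read the flag introduced by this PR (name from the review
    summary); the default and truthy-value handling here are assumptions."""
    value = os.environ.get(
        "VLLM_USE_FUSED_MOE_KERNEL_IN_COMPRESSED_QUANTIZATION", "0"
    )
    return value.strip().lower() in ("1", "true")


if __name__ == "__main__":
    # Example: export VLLM_USE_FUSED_MOE_KERNEL_IN_COMPRESSED_QUANTIZATION=1
    # before launching vLLM to opt into the fused MoE kernel path.
    print("fused MoE kernel enabled:", use_fused_moe_in_compressed_quantization())
```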
Co-authored-by: Chenxi Yang <[email protected]>
Signed-off-by: Xiao Yu <[email protected]>
Signed-off-by: Ekagra Ranjan <[email protected]>
Summary: as title, generated with D80713197
Test Plan:
Run fused_moe on H100_80GB
Rollback Plan:
Reviewed By: zzh142857
Differential Revision: D80713433
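For readers unfamiliar with the artifacts this kind of PR adds, vLLM stores tuned fused-MoE kernel parameters as per-device JSON files keyed by batch size, with each entry holding Triton tile settings. The sketch below only illustrates that general shape; the filename fields (E, N) and every numeric value are made-up placeholders, not the glm4.5v tp2/tp4 values added in this PR.

```python
import json

# Illustrative only: the key names follow vLLM's fused-MoE tuned-config
# convention (per-batch-size Triton tile parameters); all numbers are
# placeholders, not the values contributed by this PR.
example_config = {
    "1":  {"BLOCK_SIZE_M": 16, "BLOCK_SIZE_N": 64, "BLOCK_SIZE_K": 128,
           "GROUP_SIZE_M": 1, "num_warps": 4, "num_stages": 3},
    "64": {"BLOCK_SIZE_M": 64, "BLOCK_SIZE_N": 128, "BLOCK_SIZE_K": 128,
           "GROUP_SIZE_M": 8, "num_warps": 8, "num_stages": 4},
}

# Configs are looked up by expert count (E), per-rank intermediate size (N),
# device name, and dtype; this filename is a hypothetical example.
filename = "E=128,N=1408,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json"

with open(filename, "w") as f:
    json.dump(example_config, f, indent=2)
```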