Add H20-3e fused MoE kernel tuning configs for Qwen3-Coder-480B-A35B-Instruct #21598
Conversation
…Instruct Signed-off-by: 许文卿 <[email protected]>
Code Review
This pull request adds a new configuration file with tuned parameters for the fused MoE kernel on NVIDIA H20-3e GPUs for the Qwen3-Coder-480B-A35B-Instruct model. The change is straightforward and the provided benchmark results demonstrate a clear performance improvement. The new configuration file is well-structured and follows the existing format. I don't see any issues with this change. Great work!
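For readers unfamiliar with these files: vLLM's fused MoE tuning configs are JSON files that map a batch size (M) to a set of Triton kernel launch parameters. The sketch below shows the general shape only; the batch-size keys and parameter values here are illustrative placeholders, not the tuned values added in this PR:

{
  "1": {
    "BLOCK_SIZE_M": 16,
    "BLOCK_SIZE_N": 128,
    "BLOCK_SIZE_K": 128,
    "GROUP_SIZE_M": 1,
    "num_warps": 4,
    "num_stages": 3
  },
  "64": {
    "BLOCK_SIZE_M": 64,
    "BLOCK_SIZE_N": 128,
    "BLOCK_SIZE_K": 64,
    "GROUP_SIZE_M": 16,
    "num_warps": 8,
    "num_stages": 4
  }
}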
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge. 🚀
…Instruct (vllm-project#21598) Signed-off-by: 许文卿 <[email protected]>
…Instruct (vllm-project#21598) Signed-off-by: 许文卿 <[email protected]> Signed-off-by: x22x22 <[email protected]>
…Instruct (vllm-project#21598) Signed-off-by: 许文卿 <[email protected]> Signed-off-by: Jinzhen Lin <[email protected]>
…Instruct (vllm-project#21598) Signed-off-by: 许文卿 <[email protected]> Signed-off-by: Paul Pak <[email protected]>
…Instruct (vllm-project#21598) Signed-off-by: 许文卿 <[email protected]> Signed-off-by: Diego-Castan <[email protected]>
Essential Elements of an Effective PR Description Checklist
Update supported_models.md and examples for a new model.
Purpose
Add H20-3e fused MoE kernel tuning configs for Qwen3-Coder-480B-A35B-Instruct
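For context, vLLM selects a tuned config at runtime by filename, which encodes the expert count, the per-shard intermediate size, and the device name (e.g. NVIDIA_H20-3e), falling back to built-in heuristics when no matching file exists. The sketch below is a simplified, hypothetical illustration of that lookup; the helper names and directory layout are assumptions, not vLLM's actual internals:

import json
import os

import torch

def moe_config_filename(E: int, N: int) -> str:
    # Device name with spaces replaced by underscores, e.g. "NVIDIA_H20-3e".
    device_name = torch.cuda.get_device_name().replace(" ", "_")
    return f"E={E},N={N},device_name={device_name}.json"

def load_moe_config(configs_dir: str, E: int, N: int):
    path = os.path.join(configs_dir, moe_config_filename(E, N))
    if not os.path.exists(path):
        return None  # caller falls back to default heuristics
    with open(path) as f:
        # Keys are batch sizes (M); values are Triton launch parameters.
        return {int(m): params for m, params in json.load(f).items()}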
Test Plan
python3.12 -m sglang.bench_serving --tokenizer /Models/qwen/Qwen3-Coder-480B-A35B-Instruct --base-url $ENDPOINT --backend vllm --dataset-name random --random-input 4096 --random-output 1024 --max-concurrency 10 --num-prompt 100
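For anyone reproducing the tuning itself (as opposed to the serving benchmark above), configs of this kind are typically produced with vLLM's MoE tuning script. A sketch follows: the model path mirrors the one above, the tensor-parallel size is an assumption, and the exact flags may differ across vLLM versions:

python3 benchmarks/kernels/benchmark_moe.py --model /Models/qwen/Qwen3-Coder-480B-A35B-Instruct --tp-size 8 --tune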
Test Result
Result (without MoE config):
Result (with MoE config):
(Optional) Documentation Update