[Bugfix][CI][V1] Work around V1 + CUDA Graph + torch._scaled_mm fallback issue #13425
This PR works around an issue where, on vLLM V1 with Ada Lovelace GPUs and vLLM built against CUDA < 12.4, FP8 models with per-channel and/or per-tensor quantization produce garbage output.
Closes #13212
AFAICT, there is some issue with the way we are setting up `TORCH_DEVICE_IDENTITY`.
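For context, the fallback path keeps a module-level identity scale tensor that is lazily created and moved to the weight's device on first use. Below is a simplified sketch of that pattern and the suspected hazard; this is an illustration under my assumptions, not the exact vLLM code, and it uses the `torch._scaled_mm` keyword signature from PyTorch >= 2.5:

```python
import torch

# Module-level identity scale used by the torch._scaled_mm fallback.
TORCH_DEVICE_IDENTITY = None


def maybe_create_device_identity() -> None:
    # The workaround in this PR: callers run this before the forward
    # pass, so the tensor already exists before CUDA graph capture.
    global TORCH_DEVICE_IDENTITY
    if TORCH_DEVICE_IDENTITY is None:
        TORCH_DEVICE_IDENTITY = torch.ones(1, dtype=torch.float32)


def fallback_scaled_mm(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # a: fp8 activations (M, K); b: fp8 weights (K, N), column-major.
    global TORCH_DEVICE_IDENTITY
    assert TORCH_DEVICE_IDENTITY is not None, \
        "call maybe_create_device_identity() before the forward pass"
    # Suspected hazard: if the identity tensor is first moved to the
    # GPU here, inside CUDA graph capture, the allocation and H2D copy
    # get baked into the graph and replays can misbehave.
    if TORCH_DEVICE_IDENTITY.device != a.device:
        TORCH_DEVICE_IDENTITY = TORCH_DEVICE_IDENTITY.to(a.device)
    # Multiply with identity scales; the real per-tensor/per-channel
    # scales are applied to the output afterwards.
    return torch._scaled_mm(a, b,
                            scale_a=TORCH_DEVICE_IDENTITY,
                            scale_b=TORCH_DEVICE_IDENTITY,
                            out_dtype=torch.float16)
```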
Alternatively, now that `torch._scaled_mm` supports rowwise scaling, we could use that instead of the fallback (see the sketch below). Unfortunately, this only works if the model's dtype is bf16; otherwise we get an error.

I'm definitely not happy about the approach here as is, since it puts the onus on the caller of `apply_fp8_linear` to also call `maybe_create_device_identity` before the forward pass.
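For reference, a minimal sketch of the rowwise alternative (shapes and values are hypothetical; again assuming the PyTorch >= 2.5 `torch._scaled_mm` keyword signature). The commented-out call illustrates the dtype restriction mentioned above:

```python
import torch

M, K, N = 16, 64, 32
a = torch.randn(M, K, device="cuda").to(torch.float8_e4m3fn)
# The second operand must be fp8 and column-major.
b = torch.randn(N, K, device="cuda").to(torch.float8_e4m3fn).t()

# Rowwise scaling: one scale per row of `a`, one per column of `b`.
scale_a = torch.ones(M, 1, device="cuda", dtype=torch.float32)
scale_b = torch.ones(1, N, device="cuda", dtype=torch.float32)

# Works: rowwise scales with a bf16 output dtype.
out = torch._scaled_mm(a, b, scale_a=scale_a, scale_b=scale_b,
                       out_dtype=torch.bfloat16)

# Raises a RuntimeError: rowwise scaling with a non-bf16 output dtype
# (e.g. when the model's dtype is fp16).
# out = torch._scaled_mm(a, b, scale_a=scale_a, scale_b=scale_b,
#                        out_dtype=torch.float16)
```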