Fix loading of quantized BigCode models #22463
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. 🚀
Code Review
This pull request aims to fix the loading of quantized BigCode models by adding the necessary prefixes to layer names and correcting the handling of QKV layer scales. While the prefix additions seem correct, the change in how c_attn.weight_scale is loaded appears to be problematic and may lead to incorrect behavior or errors: the default weight loading logic for fused QKV layers seems to only update the scale for the 'query' component, leaving the 'key' and 'value' scales at their default values, which is likely incorrect.
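To make the concern concrete, here is a minimal, hypothetical sketch (not vLLM's actual weight loader) of how a single per-tensor c_attn.weight_scale from the checkpoint would need to be applied to every shard of a fused QKV parameter rather than only the query shard; all names and shapes below are illustrative.

```python
# Hypothetical illustration of loading a per-tensor scale for a fused QKV
# parameter; this is a sketch, not vLLM's weight loader.
import torch


def load_fused_qkv_scale(param: torch.Tensor, loaded_scale: torch.Tensor) -> None:
    """Apply one checkpoint scale (c_attn.weight_scale) to all QKV shards.

    `param` is assumed to hold one scale slot per shard (q, k, v). Writing
    only shard 0 would leave the k and v scales at their defaults, which is
    the behavior the review comment flags as incorrect.
    """
    param.data.fill_(loaded_scale.item())


# Usage with illustrative shapes: one scale slot per q/k/v shard.
qkv_scale = torch.ones(3)
load_fused_qkv_scale(qkv_scale, torch.tensor(0.042))
print(qkv_scale)  # tensor([0.0420, 0.0420, 0.0420])
```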
Signed-off-by: Eldar Kurtic <[email protected]>
force-pushed from 3bbe5de to 2d11f55
LGTM, thank you!
Signed-off-by: Eldar Kurtic <[email protected]> Signed-off-by: Paul Pak <[email protected]>
Signed-off-by: Eldar Kurtic <[email protected]> Signed-off-by: Diego-Castan <[email protected]>
Signed-off-by: Eldar Kurtic <[email protected]>
Signed-off-by: Eldar Kurtic <[email protected]>
Signed-off-by: Eldar Kurtic <[email protected]> Signed-off-by: Xiao Yu <[email protected]>
Signed-off-by: Eldar Kurtic <[email protected]>
Loading of GPTBigCodeForCausalLM is broken due to missing prefixes and incorrect handling of scales for QKV layers (i.e. c_attn.weight_scale, which are merged into a single matrix). Before this PR, quantized models would produce garbage output. After this PR, quantized models match the BF16 baseline output.
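As a rough illustration of why the missing prefixes break loading, here is a minimal sketch (not vLLM's actual code) in which scales are stored in the checkpoint under fully qualified module names, so a layer constructed without its prefix never finds its scale entry; all class and function names are hypothetical, and only the "transformer.h.0.attn.c_attn" naming mirrors the GPTBigCode checkpoint layout.

```python
# Minimal sketch of prefix-based scale lookup; hypothetical names throughout.
from typing import Dict

import torch


class QuantLinearStub:
    """Stand-in for a quantized linear layer that owns a weight scale."""

    def __init__(self, prefix: str):
        self.prefix = prefix
        self.weight_scale = torch.tensor(1.0)  # default scale until loaded

    def load_weight_scale(self, checkpoint: Dict[str, torch.Tensor]) -> None:
        key = f"{self.prefix}.weight_scale"
        if key not in checkpoint:
            # The failure mode before the fix: with a missing or wrong prefix
            # the lookup misses, the default scale silently remains, and the
            # dequantized weights produce garbage output.
            raise KeyError(f"no scale found for {key}")
        self.weight_scale = checkpoint[key]


# Hypothetical checkpoint entry for the fused QKV projection of layer 0.
ckpt = {"transformer.h.0.attn.c_attn.weight_scale": torch.tensor(0.042)}

layer = QuantLinearStub(prefix="transformer.h.0.attn.c_attn")
layer.load_weight_scale(ckpt)
print(layer.weight_scale)  # tensor(0.0420)
```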