[Models][Quantization] Add quantization configuration update in Voxtral model #24122
Conversation
Signed-off-by: Alexandre Marques <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small, essential subset of CI tests runs to quickly catch errors. You can ask your reviewers to trigger select CI tests on top of that. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add the appropriate label to the PR or enable auto-merge. If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
Code Review
This pull request adds support for updating quantization configurations in the Voxtral model. The changes introduce a new method to remap module names in the quantization config to match vLLM's internal naming scheme. My review found a couple of critical issues in the implementation of this remapping logic: a duplicated regex pattern that would lead to incorrect mappings, and a faulty condition combined with a missing `break` in a loop, which would prevent quantization target lists from being updated and could cause multiple transformations on a single name. I've provided suggestions to fix these issues.
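A minimal sketch of the loop pattern the review is asking for: apply at most one substitution rule per module name and stop after the first match, so a name is never transformed twice. The rule patterns and function name below are placeholders for illustration, not the actual Voxtral mappings or the PR's code.

```python
import re

# Placeholder remapping rules: (checkpoint-style pattern, vLLM-style replacement).
_REMAP_RULES = [
    (re.compile(r"^audio_encoder\.(.*)$"), r"whisper_encoder.\1"),
    (re.compile(r"^audio_projection\.(.*)$"), r"audio_language_adapter.\1"),
]


def remap_module_name(name: str) -> str:
    """Translate a checkpoint-style module name into a vLLM-style name."""
    for pattern, replacement in _REMAP_RULES:
        new_name, n_subs = pattern.subn(replacement, name)
        if n_subs:
            return new_name  # first matching rule wins; equivalent to a `break`
    return name
```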
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Alexandre Marques <[email protected]>
Signed-off-by: Alexandre Marques <[email protected]>
Signed-off-by: Alexandre Marques <[email protected]>
Looks good to me, just a few nits
Co-authored-by: Michael Goin <[email protected]> Signed-off-by: Alexandre Marques <[email protected]>
Signed-off-by: Alexandre Marques <[email protected]>
[Model] This PR updates the quant_config for a Voxtral model (if present), remapping mistralai module names to match the vLLM model definition.
This fixes support for models quantized in the compressed-tensors format and loaded with load_format mistralai.
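For context, a compressed-tensors quantization config lists module names in its target and ignore fields, so those lists also need to be rewritten when module names change. The sketch below shows, under assumed key names, how such a config could be updated with a remapping function like the one above; it is illustrative only and not the PR's implementation.

```python
def update_quant_config_names(quant_config: dict, remap) -> dict:
    """Rewrite module names in a compressed-tensors-style config dict.

    `remap` is a callable mapping a checkpoint module name to the vLLM name
    (e.g. `remap_module_name` above). The keys used here ("config_groups",
    "targets", "ignore") follow the compressed-tensors layout but are
    assumptions for this sketch.
    """
    for group in quant_config.get("config_groups", {}).values():
        group["targets"] = [remap(t) for t in group.get("targets", [])]
    quant_config["ignore"] = [remap(n) for n in quant_config.get("ignore", [])]
    return quant_config
```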