[Ultravox] Fix Gemma instantiation, support quantization via --hf-overrides #24131
Conversation
Code Review

This pull request refactors the configuration handling for Ultravox models to correctly support wrapping multi-modal models like Gemma. The changes introduce a `wrapped_model_config` to store the full configuration of the wrapped model, while `text_config` now correctly points to the inner text model's configuration. This is a good clarification that should improve robustness. I've found one potential issue with a hardcoded `trust_remote_code` that could prevent loading of certain custom models.
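To make the config split concrete, here is a minimal sketch (not vLLM's actual implementation; class and attribute names beyond `wrapped_model_config` and `text_config` are illustrative) of how the two attributes relate when the wrapped model is itself multi-modal:

```python
from types import SimpleNamespace

# Hypothetical sketch of the config split described in the review: keep the
# wrapped model's full config, and point text_config at the inner text
# config when the wrapped model is multi-modal.
class UltravoxConfigSketch:
    def __init__(self, wrapped_config):
        # Full config of the wrapped model, used when instantiating the
        # wrapped model in its entirety.
        self.wrapped_model_config = wrapped_config
        # For a multi-modal wrapped model (e.g. Gemma 3), descend into its
        # nested text_config; a text-only config is already the text config.
        self.text_config = getattr(wrapped_config, "text_config", wrapped_config)

# A multi-modal wrapped config exposes a nested text_config.
mm_config = SimpleNamespace(
    model_type="gemma3",
    text_config=SimpleNamespace(model_type="gemma3_text"),
)
cfg = UltravoxConfigSketch(mm_config)

# A text-only wrapped config has no nested text_config.
text_only = UltravoxConfigSketch(SimpleNamespace(model_type="llama"))
```

With this split, code that needs text-model properties (hidden size, vocab) reads `text_config`, while model instantiation still uses `wrapped_model_config`.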
Thanks, can you fix the pre-commit errors?
@DarkLight1337 actually, since I'm going to merge main anyway I'm going to fold in an additional fix for supporting quantized models/overriding `text_model_id` via `--hf-overrides`.

…rride.text_model_id Signed-off-by: Peter Salas <[email protected]>
Force-pushed from 15caeaf to 7331c69.
Signed-off-by: Peter Salas <[email protected]>
Purpose
When Ultravox wraps a multi-modal model (e.g. Gemma), vLLM fails to load because `UltravoxModel.text_config` is the multi-modal model's config. This changes `UltravoxConfig.text_config` to point to the wrapped text config instead. (However, we still instantiate the wrapped multi-modal model in its entirety when using `init_vllm_registered_model`.)

Additionally, support replacing the text model with a quantized variant by overriding `text_model_id`.

Test Plan
Confirm that Llama/Gemma/Qwen Ultravox models can be loaded in vLLM, and that quantized variants can be loaded as well.
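As a sketch of how the quantized-variant override described above might be exercised from the CLI (the model IDs below are placeholders, not taken from this PR), `--hf-overrides` accepts a JSON object that is applied on top of the HF config:

```shell
# Serve an Ultravox model, swapping the wrapped text model for a quantized
# variant by overriding text_model_id. Model IDs are illustrative only.
vllm serve fixie-ai/ultravox-v0_5-llama-3_1-8b \
  --hf-overrides '{"text_model_id": "some-org/llama-3.1-8b-awq"}'
```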
Test Result
The models load. Confirmed.
Essential Elements of an Effective PR Description Checklist
- Update `supported_models.md` and `examples` for a new model.