
Conversation

@petersalas (Contributor) commented Sep 2, 2025

Purpose

When Ultravox wraps a multi-modal model (e.g. Gemma), vLLM fails to load because UltravoxModel.text_config is the wrapped multi-modal model's config. This change points UltravoxConfig.text_config at the wrapped text config instead. (However, we still instantiate the wrapped multi-modal model in its entirety when using init_vllm_registered_model.)

Additionally, this adds support for replacing the text model with a quantized variant by overriding text_model_id.
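
To make the intended resolution concrete, here is a minimal sketch (a hypothetical helper, not the actual vLLM implementation), assuming the wrapped model's Hugging Face config nests its language-model config under text_config, as Gemma 3's does:

```python
# Hypothetical sketch of the text_config resolution described above;
# not the actual vLLM code.
from transformers import AutoConfig, PretrainedConfig

def resolve_text_config(wrapped_model_config: PretrainedConfig) -> PretrainedConfig:
    # Multi-modal configs (e.g. Gemma3Config) nest their language-model
    # config under `text_config`; plain text configs do not.
    inner = getattr(wrapped_model_config, "text_config", None)
    return inner if inner is not None else wrapped_model_config

# The wrapped Gemma 3 config resolves to its inner text config.
wrapped = AutoConfig.from_pretrained("google/gemma-3-27b-it")
text_config = resolve_text_config(wrapped)
```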

Test Plan

Confirm that Llama/Gemma/Qwen Ultravox models load in vLLM, and that quantized variants load as well.

vllm serve fixie-ai/ultravox-v0_6-gemma-3-27b --trust-remote-code
vllm serve fixie-ai/ultravox-v0_5-llama-3_1-8b --trust-remote-code --hf-overrides.text_model_id=nvidia/Llama-3.1-8B-Instruct-FP8
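
For reference, a sketch of the equivalent offline usage; this assumes vLLM's LLM entrypoint accepts the hf_overrides keyword and is illustrative rather than part of the test plan:

```python
# Offline equivalent of the second serve command above (illustrative).
from vllm import LLM

llm = LLM(
    model="fixie-ai/ultravox-v0_5-llama-3_1-8b",
    trust_remote_code=True,
    # Swap the wrapped text model for a quantized variant.
    hf_overrides={"text_model_id": "nvidia/Llama-3.1-8B-Instruct-FP8"},
)
```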

Test Result

The models load; confirmed with the commands above.



@gemini-code-assist (bot) left a comment

Code Review

This pull request refactors the configuration handling for Ultravox models to correctly support wrapping multi-modal models like Gemma. The changes introduce a wrapped_model_config to store the full configuration of the wrapped model, while text_config now correctly points to the inner text model's configuration. This is a good clarification that should improve robustness. I've found one potential issue with hardcoded trust_remote_code that could prevent loading of certain custom models.
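
To illustrate that concern with a hypothetical sketch (not the code under review): a config loader that hardcodes the flag ignores the user's choice, whereas propagating the caller's setting avoids the problem.

```python
# Hypothetical illustration of the trust_remote_code concern;
# not the code in this PR.
from transformers import AutoConfig, PretrainedConfig

def load_wrapped_text_config(
    text_model_id: str, trust_remote_code: bool = False
) -> PretrainedConfig:
    # Hardcoding trust_remote_code here would override the user's
    # setting and can prevent some custom models from loading.
    return AutoConfig.from_pretrained(
        text_model_id, trust_remote_code=trust_remote_code
    )
```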

@DarkLight1337 (Member) left a comment

Thanks, can you fix the pre-commit errors?

@DarkLight1337 DarkLight1337 added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 9, 2025
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) September 9, 2025 03:32
@petersalas
Copy link
Contributor Author

@DarkLight1337 actually, since I'm going to merge main anyway, I'm going to fold in an additional fix for supporting quantized models/overriding text_model_id via --hf-overrides -- so please hold off merging for now :)

@petersalas petersalas force-pushed the psalas/inner-text-config branch from 15caeaf to 7331c69 Compare September 10, 2025 19:12
@petersalas petersalas changed the title [Ultravox] Fix gemma instantiation [Ultravox] Fix Gemma instantiation, support quantization via --hf-overrides Sep 10, 2025
Signed-off-by: Peter Salas <[email protected]>