[Ultravox] Fix Gemma instantiation, support quantization via --hf-overrides #24131
Conversation
Code Review

This pull request refactors the configuration handling for Ultravox models to correctly support wrapping multi-modal models like Gemma. The changes introduce a `wrapped_model_config` to store the full configuration of the wrapped model, while `text_config` now correctly points to the inner text model's configuration. This is a good clarification that should improve robustness. I've found one potential issue with a hardcoded `trust_remote_code` that could prevent loading of certain custom models.
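To make the config split concrete, here is a minimal sketch (not vLLM's actual implementation; class and attribute names beyond `wrapped_model_config` and `text_config` are illustrative) of how the two attributes relate when the wrapped model is itself multi-modal:

```python
from types import SimpleNamespace

# Hypothetical sketch of the config split described in the review: keep the
# wrapped model's full config, and point text_config at the inner text
# config when the wrapped model is multi-modal.
class UltravoxConfigSketch:
    def __init__(self, wrapped_config):
        # Full config of the wrapped model, used when instantiating the
        # wrapped model in its entirety.
        self.wrapped_model_config = wrapped_config
        # For a multi-modal wrapped model (e.g. Gemma 3), descend into its
        # nested text_config; a text-only config is already the text config.
        self.text_config = getattr(wrapped_config, "text_config", wrapped_config)

# A multi-modal wrapped config exposes a nested text_config.
mm_config = SimpleNamespace(
    model_type="gemma3",
    text_config=SimpleNamespace(model_type="gemma3_text"),
)
cfg = UltravoxConfigSketch(mm_config)

# A text-only wrapped config has no nested text_config.
text_only = UltravoxConfigSketch(SimpleNamespace(model_type="llama"))
```

With this split, code that needs text-model properties (hidden size, vocab) reads `text_config`, while model instantiation still uses `wrapped_model_config`.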
Thanks, can you fix the pre-commit errors?
@DarkLight1337 actually, since I'm going to merge main anyway I'm going to fold in an additional fix for supporting quantized models/overriding `text_model_id` via `--hf-overrides`.

…rride.text_model_id Signed-off-by: Peter Salas <[email protected]>
Force-pushed from 15caeaf to 7331c69.
Signed-off-by: Peter Salas <[email protected]>
Purpose
When Ultravox wraps a multi-modal model (e.g. Gemma), vLLM fails to load because `UltravoxModel.text_config` is the multi-modal model's config. This changes `UltravoxConfig.text_config` to point to the wrapped text config instead. (However, we still instantiate the wrapped multi-modal model in its entirety when using `init_vllm_registered_model`.)

Additionally, support replacing the text model with a quantized variant by overriding `text_model_id`.

Test Plan
Confirm that Llama/Gemma/Qwen Ultravox models can be loaded in vLLM, and that quantized variants can be loaded as well.
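As a sketch of how the quantized-variant override described above might be exercised from the CLI (the model IDs below are placeholders, not taken from this PR), `--hf-overrides` accepts a JSON object that is applied on top of the HF config:

```shell
# Serve an Ultravox model, swapping the wrapped text model for a quantized
# variant by overriding text_model_id. Model IDs are illustrative only.
vllm serve fixie-ai/ultravox-v0_5-llama-3_1-8b \
  --hf-overrides '{"text_model_id": "some-org/llama-3.1-8b-awq"}'
```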
Test Result
The models load. Confirmed.
Essential Elements of an Effective PR Description Checklist
- Update `supported_models.md` and `examples` for a new model.