-
-
Notifications
You must be signed in to change notification settings - Fork 10.4k
Migrate Qwen2 inputs to TensorSchema #23475
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request successfully migrates Qwen2 inputs from TypedDict
to TensorSchema
, which enhances runtime shape validation and improves input contract enforcement. The changes are well-aligned with the project's goals for multi-modal models. However, I've identified several instances where the type hints for tensor fields are incorrect. Specifically, they are declared as Union[torch.Tensor, list[torch.Tensor]]
when the values passed at runtime are always torch.Tensor
. Correcting these type hints will improve code clarity, maintainability, and alignment with static analysis tools.
May I ask the purpose of refactoring existing models? Based on your description, I cannot understand. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please fix pre-commit
Sorry for any confusion. We're migrating from TypedDict to TensorSchema to enable shape validations. More details can be found in this RFC. Feel free to lmk if any questions. |
This pull request has merge conflicts that must be resolved before it can be |
Observing failing MM test similar to #23471, #22021:
|
Head branch was pushed to by a user without write access
224f5a1
to
edc9642
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should just allow unpadded features for this model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bump
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack, thanks for the reminder. Will investigate further and work on a fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Observed some differences in dims between the input_features for Qwen2-Audio (ndim=3) and Qwen2.5-Omni (ndim=2). Qwen2-Audio inputs match the shape definition, but Qwen2.5-Omni currently does not. I’ve yet to find a way of resolving the conflict by squeezing/flattening or updating schema annotation. Need more time to root cause the differences. Feel free to let me know if any thoughts or pointers for investigating further.
Original Schema:
class Qwen2AudioFeatureInputs(TypedDict):
type: Literal["audio_features"]
input_features: torch.Tensor
"""Shape: `(num_audios, num_mel_bins, 3000)`"""
feature_attention_mask: torch.Tensor
"""Shape: `(num_audios, 3000)`"""
# Qwen2-Audio
python -m pytest tests/models/multimodal/processing/test_tensor_schema.py::test_model_tensor_schema[Qwen2AudioForConditionalGeneration-Qwen/Qwen2-Audio-7B-Instruct]
> /home/bbeckca/vllm/vllm/model_executor/models/qwen2_audio.py(369)_parse_and_validate_audio_input()
-> input_features = self._validate_and_reshape_mm_tensor(
(Pdb) [x.shape for x in input_features]
[torch.Size([1, 128, 3000]), torch.Size([1, 128, 3000]), torch.Size([1, 128, 3000])]
(Pdb) [x.shape for x in feature_attention_mask]
[torch.Size([1, 3000]), torch.Size([1, 3000]), torch.Size([1, 3000])]
(Pdb) self._validate_and_reshape_mm_tensor(input_features, 'input_features').shape
torch.Size([3, 128, 3000])
(Pdb) self._validate_and_reshape_mm_tensor(feature_attention_mask, 'feature_attention_mask').shape
torch.Size([3, 3000])
# Qwen2.5-Omni
python -m pytest tests/models/multimodal/processing/test_common.py::test_processing_correctness[1.0-32-0.5-Qwen/Qwen2.5-Omni-3B]
> /home/bbeckca/vllm/vllm/model_executor/models/qwen2_5_omni_thinker.py(547)_parse_and_validate_audio_input()
(Pdb) [x.shape for x in input_audio_features]
[torch.Size([128, 3000]), torch.Size([128, 1500]), torch.Size([128, 750])]
(Pdb) [x.shape for x in feature_attention_mask]
[torch.Size([1, 30000]), torch.Size([1, 30000]), torch.Size([1, 30000])]
(Pdb) self._validate_and_reshape_mm_tensor(input_audio_features, 'input_audio_features', dim=1).shape
torch.Size([128, 5250])
(Pdb) self._validate_and_reshape_mm_tensor(feature_attention_mask, 'feature_attention_mask').shape
torch.Size([3, 30000])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should define a separate shape definition for Qwen2.5-Omni audio inputs then.
Sorry, pushed incorrectly. Feel free to ignore. Will undo those changes. |
Signed-off-by: Benji Beck <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Benji Beck <[email protected]>
Signed-off-by: Benji Beck <[email protected]>
Signed-off-by: Benji Beck <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DarkLight1337 Added new input for Omni with best effort naming. Feel free to review and share any thoughts.
Should be resolved. Apologies to any reviewers that were tagged unnecessarily. Feel free to let me know if there's anything I should do to fix the mergify labels. |
Signed-off-by: Benji Beck <[email protected]>
Signed-off-by: Benji Beck <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Benji Beck <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Benji Beck <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Purpose
This PR migrates Qwen2 inputs from a TypedDict-based definition to a structured TensorSchema model with runtime shape validation. This brings it in line with recent changes to Phi3VImagePixelInputs, and is part of a broader effort to improve input contract enforcement and debug-ability across multi-modal models.
More details: #14764 (comment)
Classes Migrated:
qwen2_vl.py
:Qwen2VLImagePixelInputs
Qwen2VLImageEmbeddingInputs
Qwen2VLVideoPixelInputs
Qwen2VLVideoEmbeddingInputs
qwen2_5_vl.py
:Qwen2_5_VLImagePixelInputs
Qwen2_5_VLImageEmbeddingInputs
Qwen2_5_VLVideoPixelInputs
Qwen2_5_VLVideoEmbeddingInputs
qwen2_audio.py
:Qwen2AudioInputs
Test Plan
Confirm validation works via standalone tests in tests/standalone_test/test_tensor_schema.py and rely on CI to check integration.
Test Result