@bbeckca bbeckca commented Aug 24, 2025

Purpose

This PR migrates Qwen2 inputs from a TypedDict-based definition to a structured TensorSchema model with runtime shape validation. This brings them in line with the recent changes to Phi3VImagePixelInputs and is part of a broader effort to improve input contract enforcement and debuggability across multi-modal models.

More details: #14764 (comment)
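
For readers unfamiliar with the idea, the following is a minimal, dependency-free sketch of what runtime shape validation with symbolic dimensions can look like. It is not vLLM's TensorSchema implementation: `EXPECTED`, `validate_shapes`, and the field names are hypothetical, and plain tuples stand in for tensor shapes.

```python
from typing import Dict, Tuple, Union

# int = fixed size, str = symbolic name bound on first use
Dim = Union[int, str]

# Hypothetical schema: both fields must agree on the symbolic dim "n".
EXPECTED: Dict[str, Tuple[Dim, ...]] = {
    "pixel_values": ("n", 3, "h", "w"),
    "image_sizes": ("n", 2),
}


def validate_shapes(shapes: Dict[str, Tuple[int, ...]]) -> Dict[str, int]:
    """Check every field against EXPECTED and return the resolved symbolic dims."""
    bound: Dict[str, int] = {}
    for field, expected in EXPECTED.items():
        if field not in shapes:
            raise ValueError(f"missing required field {field}")
        actual = shapes[field]
        if len(actual) != len(expected):
            raise ValueError(
                f"{field} has rank {len(actual)} but expected {len(expected)}")
        for axis, (want, got) in enumerate(zip(expected, actual)):
            if isinstance(want, int):
                # Constant dims must match exactly.
                if want != got:
                    raise ValueError(
                        f"{field} dim[{axis}] is {got} but expected {want}")
            elif bound.setdefault(want, got) != got:
                # Symbolic dims must be consistent across all fields.
                raise ValueError(
                    f"{field} dim[{axis}] is {got} but '{want}' was already "
                    f"bound to {bound[want]}")
    return bound
```

A TypedDict only documents the expected shapes in docstrings. A check like this one rejects a mismatched input at construction time, with an error that names the offending field.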

Classes Migrated:

qwen2_vl.py:

  • Qwen2VLImagePixelInputs
  • Qwen2VLImageEmbeddingInputs
  • Qwen2VLVideoPixelInputs
  • Qwen2VLVideoEmbeddingInputs

qwen2_5_vl.py:

  • Qwen2_5_VLImagePixelInputs
  • Qwen2_5_VLImageEmbeddingInputs
  • Qwen2_5_VLVideoPixelInputs
  • Qwen2_5_VLVideoEmbeddingInputs

qwen2_audio.py:

  • Qwen2AudioInputs

Test Plan

Confirm that validation works via the standalone tests in tests/utils_/test_tensor_schema.py, and rely on CI to check integration.

Test Result

(venv) benjibeck@Benjis-MacBook-Pro vllm % python3 -m pytest tests/utils_/test_tensor_schema.py -v --log-cli-level=DEBUG
========================================================================================================= test session starts =========================================================================================================
platform darwin -- Python 3.9.6, pytest-8.4.1, pluggy-1.6.0 -- /Users/benjibeck/Projects/vllm/venv/bin/python3
cachedir: .pytest_cache
rootdir: /Users/benjibeck/Projects/vllm
configfile: pyproject.toml
plugins: anyio-4.9.0
collected 19 items                                                                                                                                                                                                                    

tests/utils_/test_tensor_schema.py::test_tensor_schema_valid_tensor PASSED                                                                                                                                                      [  5%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_optional_fields PASSED                                                                                                                                                   [ 10%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_constant_dim_failure PASSED                                                                                                                                              [ 15%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_invalid_types_in_list PASSED                                                                                                                                             [ 21%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_rank_mismatch PASSED                                                                                                                                                     [ 26%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_missing_required_field PASSED                                                                                                                                            [ 31%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_symbolic_dim_mismatch PASSED                                                                                                                                             [ 36%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_list_tensor_valid PASSED                                                                                                                                                 [ 42%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_variable_patch_counts_valid PASSED                                                                                                                                       [ 47%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_tuple_tensor_valid PASSED                                                                                                                                                [ 52%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_inconsistent_shapes_in_list PASSED                                                                                                                                       [ 57%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_empty_list PASSED                                                                                                                                                        [ 63%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_validation_disabled_skips_shape_check PASSED                                                                                                                             [ 68%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_with_valid_resolve_binding_dims PASSED                                                                                                                                   [ 73%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_with_invalid_resolve_binding_dims PASSED                                                                                                                                 [ 78%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_with_list_of_symbolic_dim PASSED                                                                                                                                         [ 84%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_with_list_of_symbolic_dim_mismatch_in_length PASSED                                                                                                                      [ 89%]
tests/utils_/test_tensor_schema.py::test_valid_tensor_schema_with_static_last_dim PASSED                                                                                                                                        [ 94%]
tests/utils_/test_tensor_schema.py::test_invalid_tensor_schema_with_static_last_dim PASSED                                                                                                                                      [100%]

@bbeckca bbeckca requested a review from sighingnow as a code owner August 24, 2025 01:06
@mergify mergify bot added the qwen Related to Qwen models label Aug 24, 2025

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request successfully migrates Qwen2 inputs from TypedDict to TensorSchema, which enhances runtime shape validation and improves input contract enforcement. The changes are well-aligned with the project's goals for multi-modal models. However, I've identified several instances where the type hints for tensor fields are incorrect. Specifically, they are declared as Union[torch.Tensor, list[torch.Tensor]] when the values passed at runtime are always torch.Tensor. Correcting these type hints will improve code clarity, maintainability, and alignment with static analysis tools.

effortprogrammer commented Aug 24, 2025

May I ask the purpose of refactoring existing models? Based on your description, I cannot understand.

Member

Please fix pre-commit

bbeckca commented Aug 24, 2025

May I ask the purpose of refactoring existing models? Based on your description, I cannot understand.

Sorry for any confusion. We're migrating from TypedDict to TensorSchema to enable shape validation. More details can be found in this RFC. Feel free to let me know if you have any questions.

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) August 25, 2025 02:27
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Aug 25, 2025

mergify bot commented Aug 25, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @bbeckca.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Aug 25, 2025

bbeckca commented Aug 25, 2025

Observing a failing multi-modal (MM) test similar to #23471 and #22021:

[2025-08-25T03:37:08Z] (EngineCore_0 pid=10079) ERROR 08-24 20:37:08 [core.py:779]     raise ValueError(f"{field_name} has rank {len(actual_shape)} "
[2025-08-25T03:37:08Z] (EngineCore_0 pid=10079) ERROR 08-24 20:37:08 [core.py:779] ValueError: second_per_grid_ts has rank 2 but expected 1
[2025-08-25T03:37:08Z] (EngineCore_0 pid=10079) DEBUG 08-24 20:37:08 [core.py:736] EngineCore waiting for work.
[2025-08-25T03:37:09Z] FAILED
[2025-08-25T03:37:09Z] models/multimodal/processing/test_tensor_schema.py::test_model_tensor_schema[Qwen2_5OmniModel-Qwen/Qwen2.5-Omni-3B] 
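
The error above is a rank mismatch: the value has two dimensions where the schema expects one. A minimal sketch of such a check follows. `check_rank` is a hypothetical helper, and both the second half of the message and the example shapes are assumptions chosen to reproduce the logged text, not code taken from vLLM.

```python
from typing import Sequence


def check_rank(field_name: str, actual_shape: Sequence[int],
               expected_shape: Sequence[object]) -> None:
    """Raise if the number of dimensions differs from what the schema expects."""
    if len(actual_shape) != len(expected_shape):
        raise ValueError(f"{field_name} has rank {len(actual_shape)} "
                         f"but expected {len(expected_shape)}")


# Hypothetical shapes: a 2-D value checked against a 1-D schema entry.
try:
    check_rank("second_per_grid_ts", (1, 2), ("nv",))
except ValueError as err:
    message = str(err)
```

An extra leading batch dimension that is never flattened before validation would be one way to hit this error, although the thread does not confirm that this is the cause here.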

auto-merge was automatically disabled August 31, 2025 17:53

Head branch was pushed to by a user without write access

@mergify mergify bot removed the needs-rebase label Aug 31, 2025
@bbeckca bbeckca force-pushed the qwen2 branch 2 times, most recently from 224f5a1 to edc9642 Compare September 1, 2025 19:09
Member

I think we should just allow unpadded features for this model.

Member

Bump

Contributor Author

Ack, thanks for the reminder. Will investigate further and work on a fix.

Contributor Author

I observed a difference in rank between the input_features for Qwen2-Audio (ndim=3) and Qwen2.5-Omni (ndim=2). Qwen2-Audio inputs match the shape definition, but Qwen2.5-Omni inputs currently do not. I have not yet found a way to resolve the conflict by squeezing/flattening or by updating the schema annotation, and I need more time to root-cause the difference. Feel free to share any thoughts or pointers for investigating further.

Original Schema:

from typing import Literal, TypedDict

import torch


class Qwen2AudioFeatureInputs(TypedDict):
    type: Literal["audio_features"]
    input_features: torch.Tensor
    """Shape: `(num_audios, num_mel_bins, 3000)`"""

    feature_attention_mask: torch.Tensor
    """Shape: `(num_audios, 3000)`"""

# Qwen2-Audio
python -m pytest tests/models/multimodal/processing/test_tensor_schema.py::test_model_tensor_schema[Qwen2AudioForConditionalGeneration-Qwen/Qwen2-Audio-7B-Instruct]
> /home/bbeckca/vllm/vllm/model_executor/models/qwen2_audio.py(369)_parse_and_validate_audio_input()
-> input_features = self._validate_and_reshape_mm_tensor(
(Pdb) [x.shape for x in input_features]
[torch.Size([1, 128, 3000]), torch.Size([1, 128, 3000]), torch.Size([1, 128, 3000])]
(Pdb) [x.shape for x in feature_attention_mask]
[torch.Size([1, 3000]), torch.Size([1, 3000]), torch.Size([1, 3000])]
(Pdb) self._validate_and_reshape_mm_tensor(input_features, 'input_features').shape
torch.Size([3, 128, 3000])
(Pdb) self._validate_and_reshape_mm_tensor(feature_attention_mask, 'feature_attention_mask').shape
torch.Size([3, 3000])
# Qwen2.5-Omni
python -m pytest tests/models/multimodal/processing/test_common.py::test_processing_correctness[1.0-32-0.5-Qwen/Qwen2.5-Omni-3B]
> /home/bbeckca/vllm/vllm/model_executor/models/qwen2_5_omni_thinker.py(547)_parse_and_validate_audio_input()
(Pdb) [x.shape for x in input_audio_features]
[torch.Size([128, 3000]), torch.Size([128, 1500]), torch.Size([128, 750])]
(Pdb) [x.shape for x in feature_attention_mask]
[torch.Size([1, 30000]), torch.Size([1, 30000]), torch.Size([1, 30000])]
(Pdb) self._validate_and_reshape_mm_tensor(input_audio_features, 'input_audio_features', dim=1).shape
torch.Size([128, 5250])
(Pdb) self._validate_and_reshape_mm_tensor(feature_attention_mask, 'feature_attention_mask').shape
torch.Size([3, 30000])
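
The shapes printed above are consistent with concatenating the per-item tensors along the given dimension. The sketch below reproduces that arithmetic with plain tuples. `concat_shape` is a hypothetical helper, and the assumption that `_validate_and_reshape_mm_tensor` concatenates list inputs along `dim` is inferred from the debugger output rather than confirmed in this thread.

```python
from typing import List, Sequence, Tuple


def concat_shape(shapes: Sequence[Tuple[int, ...]], dim: int = 0) -> Tuple[int, ...]:
    """Shape obtained by concatenating tensors of the given shapes along `dim`."""
    first = shapes[0]
    total = 0
    for shape in shapes:
        if len(shape) != len(first):
            raise ValueError("all inputs must have the same rank")
        for axis, (a, b) in enumerate(zip(first, shape)):
            # Every axis except the concatenation axis must agree.
            if axis != dim and a != b:
                raise ValueError(f"size mismatch on axis {axis}: {a} vs {b}")
        total += shape[dim]
    result: List[int] = list(first)
    result[dim] = total
    return tuple(result)


# Qwen2-Audio: three padded (1, 128, 3000) items joined on the leading axis.
qwen2_audio = concat_shape([(1, 128, 3000)] * 3)
# Qwen2.5-Omni: unpadded (128, T) items joined on the time axis (dim=1).
omni = concat_shape([(128, 3000), (128, 1500), (128, 750)], dim=1)
```

Because the Qwen2.5-Omni items have different lengths (3000, 1500, 750), they cannot be joined along a new leading axis without padding, which is why the result is 2-D with 3000 + 1500 + 750 = 5250 frames rather than 3-D.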

Member

Perhaps we should define a separate shape definition for Qwen2.5-Omni audio inputs then.
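
As a rough illustration of this suggestion, the sketch below keeps one shape definition per model instead of forcing both through a single rank-3 definition. The table, the `matches` helper, and the symbolic name `total_frames` are hypothetical. Only `num_audios`, `num_mel_bins`, and the constant 3000 come from the original docstring.

```python
from typing import Dict, Tuple, Union

# int = fixed size, str = symbolic (any size accepted)
Dim = Union[int, str]

# Hypothetical per-model shape definitions for the audio feature tensor.
AUDIO_FEATURE_SHAPES: Dict[str, Tuple[Dim, ...]] = {
    "qwen2_audio": ("num_audios", "num_mel_bins", 3000),
    "qwen2_5_omni": ("num_mel_bins", "total_frames"),
}


def matches(model: str, shape: Tuple[int, ...]) -> bool:
    """Return True if `shape` has the rank and constant dims expected for `model`."""
    expected = AUDIO_FEATURE_SHAPES[model]
    if len(shape) != len(expected):
        return False
    return all(isinstance(want, str) or want == got
               for want, got in zip(expected, shape))
```

With these definitions, the observed Qwen2-Audio shape (3, 128, 3000) and the observed Qwen2.5-Omni shape (128, 5250) each pass their own definition, while the Omni shape still fails the Qwen2-Audio one.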

@mergify mergify bot added v1 tpu Related to Google TPUs labels Sep 6, 2025

bbeckca commented Sep 6, 2025

Sorry, pushed incorrectly. Feel free to ignore. Will undo those changes.

bbeckca and others added 4 commits September 6, 2025 17:15
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Benji Beck <[email protected]>
Signed-off-by: Benji Beck <[email protected]>
@mergify mergify bot removed the tpu Related to Google TPUs label Sep 6, 2025
Comment on lines 83 to 100
Contributor Author

@DarkLight1337 Added a new input for Omni with best-effort naming. Feel free to review and share any thoughts.

bbeckca commented Sep 6, 2025

Sorry, pushed incorrectly. Feel free to ignore. Will undo those changes.

Should be resolved. Apologies to any reviewers that were tagged unnecessarily. Feel free to let me know if there's anything I should do to fix the mergify labels.

@vllm-bot vllm-bot merged commit 37a6fa9 into vllm-project:main Sep 7, 2025
39 of 41 checks passed
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
Signed-off-by: Benji Beck <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
skyloevil pushed a commit to skyloevil/vllm that referenced this pull request Sep 13, 2025
Signed-off-by: Benji Beck <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: Benji Beck <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Labels
ci/build, deepseek, documentation, frontend, llama, multi-modality, new-model, performance, qwen, ready, rocm, speculative-decoding, structured-output, v1
Projects
Status: Done
4 participants