Migrate Llama4ImagePatchInputs to TensorSchema #22021
Conversation
Code Review

This pull request migrates `Llama4ImagePatchInputs` to use `TensorSchema` for improved input validation. I've identified a critical issue with the `aspect_ratios` validation that will cause a runtime error, and a high-severity issue where `patches_per_image` is missing its shape validation. The detailed comments provide suggestions for fixing these issues.
The `aspect_ratios` data is passed as a `list[tuple[int, int]]` from `_parse_and_validate_image_input`, but `TensorSchema`'s validation for a 2D shape expects a `torch.Tensor` or `list[torch.Tensor]`.

When a `list` that is not a `list[torch.Tensor]` is passed, `TensorSchema` treats it as a 1D array, causing a rank mismatch error against the 2D `TensorShape("batch_size", "ratio")`.

To fix this, `aspect_ratios` should be converted to a tensor before being passed to the `Llama4ImagePatchInputs` constructor in `_parse_and_validate_image_input`.
```python
patches_per_image = flatten_bn(kwargs.pop("patches_per_image"))
aspect_ratios = kwargs.pop("aspect_ratios", None)
if aspect_ratios is not None and not isinstance(aspect_ratios, torch.Tensor):
    aspect_ratios = torch.tensor(aspect_ratios,
                                 device=flat_pixel_values.device)
```
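For illustration, here is what that conversion does to a typical list of aspect-ratio pairs (the values below are made up, not real model inputs):

```python
import torch

# Hypothetical parsed input: one (ratio_h, ratio_w) pair per image.
aspect_ratios = [(1, 1), (1, 2), (2, 1)]

# A plain list of tuples is not a torch.Tensor, so schema validation would
# treat it as 1-D; converting it first yields the expected 2-D shape.
aspect_ratios = torch.tensor(aspect_ratios)
print(aspect_ratios.shape)  # torch.Size([3, 2])
```

After the conversion, the first dimension matches `batch_size` and the second holds the `(ratio_h, ratio_w)` pair, so the rank check against the 2D `TensorShape` succeeds.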
To improve input validation, `patches_per_image` should have a `TensorShape` annotation. Its shape should be `('batch_size',)`. This will ensure its shape is validated at runtime.
```diff
- patches_per_image: Annotated[torch.Tensor]
+ patches_per_image: Annotated[torch.Tensor, TensorShape("batch_size")]
```
Updated the type annotation to match this suggestion. This will imply a cross-field symbol between `patches_per_image` and `aspect_ratios` for their first dimension. Please feel free to share any concerns.
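For readers unfamiliar with symbolic shape annotations: a toy sketch of the kind of check such an annotation enables is below. This is not vLLM's actual `TensorSchema` implementation; the `TensorShape` class and `validate` function here are simplified stand-ins that only show how a shared symbol like `batch_size` ties the first dimension of two fields together.

```python
from dataclasses import dataclass
from typing import Annotated, get_type_hints

import torch


class TensorShape:
    """Toy shape annotation: string dims are symbols shared across fields."""

    def __init__(self, *dims):
        self.dims = dims


@dataclass
class Inputs:
    patches_per_image: Annotated[torch.Tensor, TensorShape("batch_size")]
    aspect_ratios: Annotated[torch.Tensor, TensorShape("batch_size", "ratio")]


def validate(obj) -> None:
    symbols = {}  # symbol name -> concrete size, resolved on first use
    hints = get_type_hints(type(obj), include_extras=True)
    for name, hint in hints.items():
        metadata = getattr(hint, "__metadata__", ())
        shape = next((m for m in metadata if isinstance(m, TensorShape)), None)
        if shape is None:
            continue
        tensor = getattr(obj, name)
        if tensor.ndim != len(shape.dims):
            raise ValueError(
                f"{name}: expected rank {len(shape.dims)}, got {tensor.ndim}")
        for dim, actual in zip(shape.dims, tensor.shape):
            expected = symbols.setdefault(dim, actual) if isinstance(dim, str) else dim
            if actual != expected:
                raise ValueError(
                    f"{name}: dim {dim} expected {expected}, got {actual}")


# Passes: batch_size resolves to 3 for both fields.
validate(Inputs(torch.zeros(3), torch.zeros(3, 2)))
```

Passing `torch.zeros(4, 2)` for `aspect_ratios` instead would raise, because `batch_size` was already bound to 3 by `patches_per_image`. That mismatch detection is the cross-field guarantee the annotation implies.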
Force-pushed from `8eb97d9` to `15db729`
Should still mention what the content of `ratio` is.
Fair. Added a description under the Dimensions, but lmk if you prefer the original format.
"""
Dimensions:
- batch_size: Batch size
- total_num_chunks: Batch size * number of chunks
- num_channels: Number of channels
- image_size: Size of each image
- ratio: Aspect ratio pair (where each pair is (ratio_h, ratio_w))
"""
The last dimension should have a fixed size of 2 then, right?
Correct. It wasn't enforced previously, but seems reasonable. Will update.
Force-pushed from `e166084` to `ce5dd9f`
@DarkLight1337 @Isotr0py @mgoin For the remaining 15+ models migrating from TypedDict to TensorSchema, how would you like the work to be split up?
Which models are remaining? For similar models like the Qwen2-VL family, I think we can consolidate them into one PR, because their modifications can be quite similar.
That sounds good to me. Maybe I can batch the Qwen model family and make individual PRs for the rest? The remaining models seem to be:
You can do Pixtral and Voxtral together as well. NemotronVL and Skywork are InternVL-based, so you can combine them too.
@Isotr0py Observing a failing MM test. Based on the existing schema, this seems to be an issue with the inputs. I'll find time to investigate further, but surfacing in case anything jumps out to you.
@DarkLight1337 @Isotr0py Added a small fix. Please let me know if this should be handled earlier, such as when the data is created.
Signed-off-by: Benji Beck <[email protected]>
@DarkLight1337 @Isotr0py I noticed a few models with similar issues: #22024, #23471, #23475. Before updating them, I wanted to confirm the approach.
It's fine to remove the extra dimension as long as the model can still produce the correct output.
Sounds good, fixed the MM test for Llama4 using this approach. Will do the same for the others.
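A minimal sketch of what removing such an extra leading dimension looks like (the shapes below are illustrative only, not the actual Llama4 input shapes):

```python
import torch

# Hypothetical: inputs arrive as (batch, images_per_prompt, channels, size),
# while the schema expects the first two dims merged into one flat batch.
x = torch.zeros(2, 3, 4, 5)
flat = x.flatten(0, 1)  # merge dims 0 and 1 without copying data
print(flat.shape)  # torch.Size([6, 4, 5])
```

The merged dimension then lines up with symbolic dims like `total_num_chunks` in the schema, so rank validation passes without changing the underlying data.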
Purpose
This PR migrates `Llama4ImagePatchInputs` from a TypedDict-based definition to a structured `TensorSchema` model with runtime shape validation. This brings it in line with recent changes to `Phi3VImagePixelInputs`, and is part of a broader effort to improve input contract enforcement and debuggability across multi-modal models.
Test Plan
Confirm validation works via standalone tests in `tests/standalone_test/test_tensor_schema.py` and rely on CI to check integration.
Test Result