
Conversation

bbeckca
Contributor

@bbeckca bbeckca commented Jul 31, 2025

Purpose

This PR migrates Llama4ImagePatchInputs from a TypedDict-based definition to a structured TensorSchema model with runtime shape validation. This brings it in line with recent changes to Phi3VImagePixelInputs, and is part of a broader effort to improve input contract enforcement and debuggability across multi-modal models.

Test Plan

Confirm validation works via standalone tests in tests/utils_/test_tensor_schema.py and rely on CI to check integration.

Test Result

(venv) benjibeck@Benjis-MacBook-Pro vllm % python3 -m pytest tests/utils_/test_tensor_schema.py -v --log-cli-level=DEBUG
========================================================================== test session starts ==========================================================================
platform darwin -- Python 3.9.6, pytest-8.4.1, pluggy-1.6.0 -- /Users/benjibeck/Projects/vllm/venv/bin/python3
cachedir: .pytest_cache
rootdir: /Users/benjibeck/Projects/vllm
configfile: pyproject.toml
plugins: anyio-4.9.0
collected 19 items                                                                                                                                                      

tests/utils_/test_tensor_schema.py::test_tensor_schema_valid_tensor PASSED                                                                                        [  5%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_optional_fields PASSED                                                                                     [ 10%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_constant_dim_failure PASSED                                                                                [ 15%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_invalid_types_in_list PASSED                                                                               [ 21%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_rank_mismatch PASSED                                                                                       [ 26%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_missing_required_field PASSED                                                                              [ 31%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_symbolic_dim_mismatch PASSED                                                                               [ 36%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_list_tensor_valid PASSED                                                                                   [ 42%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_variable_patch_counts_valid PASSED                                                                         [ 47%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_tuple_tensor_valid PASSED                                                                                  [ 52%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_inconsistent_shapes_in_list PASSED                                                                         [ 57%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_empty_list PASSED                                                                                          [ 63%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_validation_disabled_skips_shape_check PASSED                                                               [ 68%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_with_valid_resolve_binding_dims PASSED                                                                     [ 73%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_with_invalid_resolve_binding_dims PASSED                                                                   [ 78%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_with_list_of_symbolic_dim PASSED                                                                           [ 84%]
tests/utils_/test_tensor_schema.py::test_tensor_schema_with_list_of_symbolic_dim_mismatch_in_length PASSED                                                        [ 89%]
tests/utils_/test_tensor_schema.py::test_valid_tensor_schema_with_static_last_dim PASSED                                                                          [ 94%]
tests/utils_/test_tensor_schema.py::test_invalid_tensor_schema_with_static_last_dim PASSED                                                                        [100%]

@mergify mergify bot added the llama Related to Llama models label Jul 31, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request migrates Llama4ImagePatchInputs to use TensorSchema for improved input validation. I've identified a critical issue with the aspect_ratios validation that will cause a runtime error, and a high-severity issue where patches_per_image is missing its shape validation. The detailed comments provide suggestions for fixing these issues.

Contributor

critical

The aspect_ratios data is passed as a list[tuple[int, int]] from _parse_and_validate_image_input, but TensorSchema's validation for a 2D shape expects a torch.Tensor or list[torch.Tensor].

When a list that is not a list[torch.Tensor] is passed, TensorSchema treats it as a 1D array, causing a rank mismatch error against the 2D TensorShape("batch_size", "ratio").

To fix this, aspect_ratios should be converted to a tensor before being passed to the Llama4ImagePatchInputs constructor in _parse_and_validate_image_input.

Suggested change

    patches_per_image = flatten_bn(kwargs.pop("patches_per_image"))
    aspect_ratios = kwargs.pop("aspect_ratios", None)
    if aspect_ratios is not None and not isinstance(aspect_ratios, torch.Tensor):
        aspect_ratios = torch.tensor(aspect_ratios,
                                     device=flat_pixel_values.device)

Contributor

high

To improve input validation, patches_per_image should have a TensorShape annotation. Its shape should be ('batch_size',). This will ensure its shape is validated at runtime.

Suggested change

    - patches_per_image: Annotated[torch.Tensor]
    + patches_per_image: Annotated[torch.Tensor, TensorShape("batch_size")]
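With both fields annotated, "batch_size" becomes a symbolic dimension shared across fields. As a minimal, library-free sketch of that cross-field binding (illustrative only, not vLLM's actual TensorSchema implementation; names and error messages are made up):

```python
def validate_shapes(fields, schema):
    # Sketch of the symbolic-dimension check TensorSchema performs at runtime.
    # Symbolic names (strings) must bind to one consistent size across all
    # fields; ints are fixed sizes.
    bound = {}
    for name, shape in fields.items():
        spec = schema[name]
        if len(shape) != len(spec):
            raise ValueError(f"{name} has rank {len(shape)} but expected {len(spec)}")
        for actual, expected in zip(shape, spec):
            if isinstance(expected, int):
                if actual != expected:
                    raise ValueError(f"{name}: dim {actual} != fixed size {expected}")
            elif bound.setdefault(expected, actual) != actual:
                raise ValueError(f"{name}: {expected}={actual} conflicts with "
                                 f"{expected}={bound[expected]}")
    return bound

schema = {
    "patches_per_image": ("batch_size",),
    "aspect_ratios": ("batch_size", 2),
}
# Consistent batch_size of 3 across both fields: passes.
validate_shapes({"patches_per_image": (3,), "aspect_ratios": (3, 2)}, schema)
```

A mismatched first dimension (e.g. patches_per_image of length 3 with 4 aspect-ratio rows) would raise, which is exactly the cross-field contract the annotation implies.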

Contributor Author

Updated the type annotation to match this suggestion. This implies a cross-field symbolic dimension shared between patches_per_image and aspect_ratios for their first dimension. Please feel free to share any concerns.


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to be added to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

Comment on lines -82 to -84
Member

Should still mention what is the content of ratio

Contributor Author

Fair. Added a description under Dimensions, but let me know if you prefer the original format.

    """
    Dimensions:
        - batch_size: Batch size
        - total_num_chunks: Batch size * number of chunks
        - num_channels: Number of channels
        - image_size: Size of each image
        - ratio: Aspect ratio pair (where each pair is (ratio_h, ratio_w))
    """
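For readers unfamiliar with the pattern, here is a self-contained sketch of how such a dimension contract can be expressed with typing.Annotated. The TensorShape class and field names below are illustrative stand-ins, not vLLM's actual classes (the real schema subclasses TensorSchema and validates torch.Tensors at construction time):

```python
from typing import Annotated, get_type_hints

class TensorShape:
    # Stand-in for vLLM's TensorShape: records symbolic and fixed dims.
    def __init__(self, *dims):
        self.dims = dims

class Llama4ImagePatchInputsSketch:
    # Field names follow the Dimensions docstring above; "ratio" is the
    # (ratio_h, ratio_w) pair, so its size is fixed at 2.
    flat_data: Annotated[list, TensorShape("total_num_chunks", "num_channels",
                                           "image_size", "image_size")]
    patches_per_image: Annotated[list, TensorShape("batch_size")]
    aspect_ratios: Annotated[list, TensorShape("batch_size", 2)]

# The shape metadata is recoverable at runtime, which is what enables
# validation at construction time.
hints = get_type_hints(Llama4ImagePatchInputsSketch, include_extras=True)
ratio_shape = hints["aspect_ratios"].__metadata__[0]
# ratio_shape.dims == ("batch_size", 2)
```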

Member

The last dimension should have a fixed size of 2 then, right?

Contributor Author

Correct. It wasn't enforced previously, but seems reasonable. Will update.

@bbeckca bbeckca force-pushed the mllama4 branch 2 times, most recently from e166084 to ce5dd9f Compare August 22, 2025 15:02
@bbeckca
Contributor Author

bbeckca commented Aug 22, 2025

@DarkLight1337 @Isotr0py @mgoin For the remaining 15+ models migrating from TypedDict to TensorSchema, do you have any preference between creating one PR per model vs. batches of 3 to 5?

@Isotr0py
Member

remaining 15+ models

Which models are remaining? For similar models like the Qwen2-VL family, I think we can consolidate them into one PR, because their modifications can be quite similar.

@bbeckca
Contributor Author

bbeckca commented Aug 22, 2025

remaining 15+ models

Which models are remaining? For similar models like the Qwen2-VL family, I think we can consolidate them into one PR, because their modifications can be quite similar.

That sounds good to me. Maybe I can batch the Qwen model family and make individual PRs for the rest?

The remaining models seem to be:

nemotron_vl
paligemma  
phi4mm  
pixtral  
prithvi_geospatial_mae  
qwen2_5_omni_thinker  
qwen2_5_vl  
qwen2_audio  
qwen2_vl  
qwen_vl  
skyworkr1v  
tarsier  
ultravox  
voxtral  
whisper 

@DarkLight1337
Member

You can do Pixtral and Voxtral together as well.

NemotronVL and Skywork are InternVL-based so you can combine them too.

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) August 25, 2025 02:21
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Aug 25, 2025
@bbeckca
Contributor Author

bbeckca commented Aug 25, 2025

@Isotr0py Observing failing MM test for:
models/multimodal/processing/test_tensor_schema.py::test_model_tensor_schema[Llama4ForConditionalGeneration-meta-llama/Llama-4-Scout-17B-16E-Instruct] due to [2025-08-25T03:18:07Z] (EngineCore_0 pid=5933) ERROR 08-24 20:18:07 [core.py:779] ValueError: aspect_ratios has rank 3 but expected 2.

Based on the existing schema, this seems to be an issue with the inputs (see the schema and tensor_schema definitions).

Will find time to investigate further, but surfacing in case anything jumps out to you.

auto-merge was automatically disabled August 27, 2025 15:43

Head branch was pushed to by a user without write access

@bbeckca
Contributor Author

bbeckca commented Aug 27, 2025

@DarkLight1337 @Isotr0py Added a small fix in _parse_and_validate_image_input so aspect_ratios has the shape the schema expects. Right now it comes in as [B, 1, 2], but we need [B, 2], so the extra dimension is removed with squeeze(1).

Please let me know if this should be handled earlier such as when data is created.

### Before ###
(Pdb) aspect_ratios
tensor([[[16,  1]],

        [[ 8,  1]],

        [[ 4,  1]]])

### After ###
(Pdb) aspect_ratios
tensor([[16,  1],
        [ 8,  1],
        [ 4,  1]])
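The normalization above just removes a length-1 middle axis, which is what tensor.squeeze(1) does on a torch tensor. A plain-Python sketch of the same operation on nested lists (illustrative only; the function name is made up):

```python
def drop_singleton_dim1(batch):
    # Mimics tensor.squeeze(1): [B, 1, 2] -> [B, 2].
    # Refuse to drop a middle axis that is not length 1, so a genuinely
    # 3-D input is not silently mangled.
    for row in batch:
        if len(row) != 1:
            raise ValueError("middle dimension is not 1; refusing to squeeze")
    return [row[0] for row in batch]

before = [[[16, 1]], [[8, 1]], [[4, 1]]]
after = drop_singleton_dim1(before)
# after == [[16, 1], [8, 1], [4, 1]]
```

Note that torch's squeeze(1) is a no-op when dim 1 is not size 1, whereas this sketch raises; either way the [B, 2] contract is then enforced by the schema check.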

@bbeckca
Contributor Author

bbeckca commented Aug 28, 2025

@DarkLight1337 @Isotr0py I noticed a few models with similar issues: #22024, #23471, #23475. Before updating _parse_and_validate_, do you have any preference for how these inputs are handled? For example, should we update the input producer in test_model_tensor_schema instead?

@DarkLight1337
Member

It's fine to remove the extra dimension as long as the model can still produce the correct output

@bbeckca
Contributor Author

bbeckca commented Aug 28, 2025

It's fine to remove the extra dimension as long as the model can still produce the correct output

Sounds good, fixed the MM test for Llama4 using this approach. Will do the same for the others.

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) August 28, 2025 16:22
@DarkLight1337 DarkLight1337 merged commit f32a5bc into vllm-project:main Aug 28, 2025
43 checks passed
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Aug 28, 2025
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Sep 3, 2025
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025