Migrate Llama4ImagePatchInputs to TensorSchema #22021
Conversation
Code Review

This pull request migrates `Llama4ImagePatchInputs` to use `TensorSchema` for improved input validation. I've identified a critical issue with the `aspect_ratios` validation that will cause a runtime error, and a high-severity issue where `patches_per_image` is missing its shape validation. The detailed comments provide suggestions for fixing these issues.
The `aspect_ratios` data is passed as a `list[tuple[int, int]]` from `_parse_and_validate_image_input`, but `TensorSchema`'s validation for a 2D shape expects a `torch.Tensor` or `list[torch.Tensor]`.

When a `list` that is not a `list[torch.Tensor]` is passed, `TensorSchema` treats it as a 1D array, causing a rank mismatch error against the 2D `TensorShape("batch_size", "ratio")`.

To fix this, `aspect_ratios` should be converted to a tensor before being passed to the `Llama4ImagePatchInputs` constructor in `_parse_and_validate_image_input`.
```python
patches_per_image = flatten_bn(kwargs.pop("patches_per_image"))
aspect_ratios = kwargs.pop("aspect_ratios", None)
if aspect_ratios is not None and not isinstance(aspect_ratios, torch.Tensor):
    aspect_ratios = torch.tensor(aspect_ratios,
                                 device=flat_pixel_values.device)
```
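For illustration, here is what that conversion does to a typical list of aspect-ratio pairs (the values below are made up, not real model inputs):

```python
import torch

# Hypothetical parsed input: one (ratio_h, ratio_w) pair per image.
aspect_ratios = [(1, 1), (1, 2), (2, 1)]

# A plain list of tuples is not a torch.Tensor, so schema validation would
# treat it as 1-D; converting it first yields the expected 2-D shape.
aspect_ratios = torch.tensor(aspect_ratios)
print(aspect_ratios.shape)  # torch.Size([3, 2])
```

After the conversion, the first dimension matches `batch_size` and the second holds the `(ratio_h, ratio_w)` pair, so the rank check against the 2D `TensorShape` succeeds.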
To improve input validation, `patches_per_image` should have a `TensorShape` annotation. Its shape should be `('batch_size',)`. This will ensure its shape is validated at runtime.
```diff
- patches_per_image: Annotated[torch.Tensor]
+ patches_per_image: Annotated[torch.Tensor, TensorShape("batch_size")]
```
Updated the type annotation to match this suggestion. This will imply a cross-field symbol between `patches_per_image` and `aspect_ratios` for their first dimension. Please feel free to share any concerns.
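For readers unfamiliar with symbolic shape annotations: a toy sketch of the kind of check such an annotation enables is below. This is not vLLM's actual `TensorSchema` implementation; the `TensorShape` class and `validate` function here are simplified stand-ins that only show how a shared symbol like `batch_size` ties the first dimension of two fields together.

```python
from dataclasses import dataclass
from typing import Annotated, get_type_hints

import torch


class TensorShape:
    """Toy shape annotation: string dims are symbols shared across fields."""

    def __init__(self, *dims):
        self.dims = dims


@dataclass
class Inputs:
    patches_per_image: Annotated[torch.Tensor, TensorShape("batch_size")]
    aspect_ratios: Annotated[torch.Tensor, TensorShape("batch_size", "ratio")]


def validate(obj) -> None:
    symbols = {}  # symbol name -> concrete size, resolved on first use
    hints = get_type_hints(type(obj), include_extras=True)
    for name, hint in hints.items():
        metadata = getattr(hint, "__metadata__", ())
        shape = next((m for m in metadata if isinstance(m, TensorShape)), None)
        if shape is None:
            continue
        tensor = getattr(obj, name)
        if tensor.ndim != len(shape.dims):
            raise ValueError(
                f"{name}: expected rank {len(shape.dims)}, got {tensor.ndim}")
        for dim, actual in zip(shape.dims, tensor.shape):
            expected = symbols.setdefault(dim, actual) if isinstance(dim, str) else dim
            if actual != expected:
                raise ValueError(
                    f"{name}: dim {dim} expected {expected}, got {actual}")


# Passes: batch_size resolves to 3 for both fields.
validate(Inputs(torch.zeros(3), torch.zeros(3, 2)))
```

Passing `torch.zeros(4, 2)` for `aspect_ratios` instead would raise, because `batch_size` was already bound to 3 by `patches_per_image`. That mismatch detection is the cross-field guarantee the annotation implies.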
Force-pushed from `8eb97d9` to `15db729`
Should still mention what the content of `ratio` is.
Fair. Added a description under the Dimensions, but lmk if you prefer the original format.
"""
Dimensions:
- batch_size: Batch size
- total_num_chunks: Batch size * number of chunks
- num_channels: Number of channels
- image_size: Size of each image
- ratio: Aspect ratio pair (where each pair is (ratio_h, ratio_w))
"""
The last dimension should have a fixed size of 2 then, right?
Correct. It wasn't enforced previously, but seems reasonable. Will update.
Force-pushed from `e166084` to `ce5dd9f`
@DarkLight1337 @Isotr0py @mgoin For the remaining 15+ models migrating from TypedDict to TensorSchema, how would you like the work to be split up?
Which models are remaining? For similar models like the Qwen2-VL family, I think we can consolidate them into one PR, because their modifications can be quite similar.
That sounds good to me. Maybe I can batch the Qwen model family and make individual PRs for the rest? The remaining models seem to be:
You can do Pixtral and Voxtral together as well. NemotronVL and Skywork are InternVL-based, so you can combine them too.
@Isotr0py Observing a failing MM test. Based on the existing schema, this seems to be an issue with the inputs. I'll find time to investigate further, but surfacing in case anything jumps out to you.
@DarkLight1337 @Isotr0py Added a small fix. Please let me know if this should be handled earlier, such as when the data is created.
Signed-off-by: Benji Beck <[email protected]>
@DarkLight1337 @Isotr0py I noticed a few models with similar issues: #22024, #23471, #23475. Before updating them, I wanted to confirm the approach.
It's fine to remove the extra dimension as long as the model can still produce the correct output.
Sounds good, fixed the MM test for Llama4 using this approach. Will do the same for the others.
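A minimal sketch of what removing such an extra leading dimension looks like (the shapes below are illustrative only, not the actual Llama4 input shapes):

```python
import torch

# Hypothetical: inputs arrive as (batch, images_per_prompt, channels, size),
# while the schema expects the first two dims merged into one flat batch.
x = torch.zeros(2, 3, 4, 5)
flat = x.flatten(0, 1)  # merge dims 0 and 1 without copying data
print(flat.shape)  # torch.Size([6, 4, 5])
```

The merged dimension then lines up with symbolic dims like `total_num_chunks` in the schema, so rank validation passes without changing the underlying data.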
Purpose
This PR migrates `Llama4ImagePatchInputs` from a TypedDict-based definition to a structured `TensorSchema` model with runtime shape validation. This brings it in line with recent changes to `Phi3VImagePixelInputs`, and is part of a broader effort to improve input contract enforcement and debuggability across multi-modal models.
Test Plan
Confirm validation works via standalone tests in `tests/standalone_test/test_tensor_schema.py` and rely on CI to check integration.
Test Result