Migrate Gemma3ImagePixelInputs to TensorSchema #21676
Code Review
This pull request successfully migrates `Gemma3ImagePixelInputs` to use `TensorSchema` for improved input validation, which is a great enhancement for robustness and clarity. The changes align well with similar patterns in the codebase. I've identified one potential issue where a missing `None` check for `num_crops` could lead to a runtime error, and I've provided a suggestion to address it.
The previous implementation had an `isinstance` check that would implicitly fail if `num_crops` was `None`. With the removal of this check, if `pixel_values` is provided but `num_crops` is not, `num_crops` will be `None`, leading to a crash inside `flatten_bn` with a potentially unclear error message. It's safer to add an explicit assertion before this line to ensure `num_crops` is provided whenever `pixel_values` is present.
assert num_crops is not None, \
"'num_crops' must be provided when 'pixel_values' is present."
image_size = self.config.vision_config.image_size
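The guard suggested above can be sketched in isolation. This is a minimal illustration only, not the code in this PR: `parse_image_input_sketch` is a hypothetical stand-in for the model's input-parsing step, and plain Python sequences stand in for tensors.

```python
from typing import Optional, Sequence


def parse_image_input_sketch(
    pixel_values: Optional[Sequence[float]] = None,
    num_crops: Optional[Sequence[int]] = None,
) -> Optional[dict]:
    """Hypothetical stand-in for the input-parsing step discussed above."""
    if pixel_values is None:
        # No image input at all, so there is nothing to validate.
        return None
    if num_crops is None:
        # Fail early with a clear message instead of crashing later
        # inside a downstream helper.
        raise ValueError(
            "'num_crops' must be provided when 'pixel_values' is present.")
    return {"pixel_values": pixel_values, "num_crops": num_crops}
```

The sketch raises `ValueError` rather than using `assert` because assertions are skipped when Python runs with `-O`. Either form gives a clearer failure than an error raised further downstream.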
Signed-off-by: Benji Beck <[email protected]>
Signed-off-by: Benji Beck <[email protected]> Signed-off-by: x22x22 <[email protected]>
Signed-off-by: Benji Beck <[email protected]> Signed-off-by: Jinzhen Lin <[email protected]>
Signed-off-by: Benji Beck <[email protected]> Signed-off-by: Paul Pak <[email protected]>
Signed-off-by: Benji Beck <[email protected]> Signed-off-by: Diego-Castan <[email protected]>
Purpose
This PR migrates `Gemma3ImagePixelInputs` from a TypedDict-based definition to a structured `TensorSchema` model with runtime shape validation. This brings it in line with the recent changes to `Phi3VImagePixelInputs` and is part of a broader effort to improve input contract enforcement and debuggability across multi-modal models.
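To illustrate what runtime shape validation adds over a plain TypedDict, here is a minimal, dependency-free sketch. It is not vLLM's `TensorSchema` implementation: `check_shape` and `ImagePixelInputsSketch` are hypothetical names, shapes are passed as plain tuples instead of tensors, and the `("p", 3, "h", "w")` layout is an assumed example rather than the schema defined in this PR.

```python
from typing import Dict, Sequence, Union

Dim = Union[int, str]  # int = fixed size, str = symbolic dimension name


def check_shape(name: str, actual: Sequence[int], expected: Sequence[Dim],
                bindings: Dict[str, int]) -> None:
    """Check a concrete shape against a spec of fixed and symbolic dims."""
    if len(actual) != len(expected):
        raise ValueError(
            f"{name}: expected rank {len(expected)}, got shape {tuple(actual)}")
    for axis, (size, dim) in enumerate(zip(actual, expected)):
        if isinstance(dim, int):
            # Fixed dimension: the size must match exactly.
            if size != dim:
                raise ValueError(
                    f"{name}: axis {axis} must be {dim}, got {size}")
        elif bindings.setdefault(dim, size) != size:
            # Symbolic dimension: the first use binds it, later uses must agree.
            raise ValueError(
                f"{name}: axis {axis} ('{dim}') must be {bindings[dim]}, "
                f"got {size}")


class ImagePixelInputsSketch:
    """Hypothetical input container that validates shapes on construction."""

    def __init__(self, pixel_values_shape: Sequence[int]) -> None:
        self.bindings: Dict[str, int] = {}
        # Assumed layout: (patches, 3 channels, height, width).
        check_shape("pixel_values", pixel_values_shape,
                    ("p", 3, "h", "w"), self.bindings)
```

With a TypedDict, a tensor with the wrong channel count or rank would pass construction and fail later. Here `ImagePixelInputsSketch((4, 1, 896, 896))` raises a `ValueError` naming the field and axis as soon as the input is built.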
Test Plan
Confirm validation works via standalone tests in tests/standalone_test/test_tensor_schema.py and rely on CI to check integration.
Test Result