[Bugfix] Fix auto dtype casting for BatchFeature #19316
Conversation
Signed-off-by: Isotr0py <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Hello @Isotr0py, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
Hello team, gemini-code-assist here with a summary of this pull request. This PR addresses a bug related to automatic dtype casting when processing inputs using Hugging Face processors, specifically when the processor returns a `BatchFeature` object. Previously, the casting logic might not have correctly applied the desired dtype to the tensors contained within the `BatchFeature`. This change modifies the casting process to ensure that when a `BatchFeature` is returned, the dtype casting is applied directly to the internal data (`.data`) of the `BatchFeature`, fixing the issue where tensors inside the `BatchFeature` were not being cast correctly.
Highlights
- Bugfix: BatchFeature dtype casting: Fixes an issue where the automatic dtype casting for inputs processed by Hugging Face processors did not correctly handle cases where the output was a `BatchFeature`, leading to tensors within the `BatchFeature` retaining their original dtype instead of being cast to the model's dtype.
- Refined input processing logic: Modifies the `maybe_cast_dtype` function in `vllm/inputs/registry.py` to specifically target the `.data` attribute of `BatchFeature` objects for dtype casting, ensuring that the tensors inside are correctly handled.
Changelog
- vllm/inputs/registry.py
- Modified the
maybe_cast_dtype
function to check if the processor output is aBatchFeature
(line 171). - If the output is a
BatchFeature
, thejson_map_leaves
casting function is now applied tooutput.data
instead ofoutput
(line 172). - The original casting logic for non-
BatchFeature
outputs is now handled after theif
block (line 175).
- Modified the
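For illustration, here is a minimal sketch of the behavior the changelog describes; the helper signatures and the casting predicate are simplified assumptions, not the exact code in `vllm/inputs/registry.py`.

```python
# Minimal sketch with assumed, simplified helpers -- not the exact vLLM code.
import torch
from transformers import BatchFeature

def maybe_cast_dtype(x, dtype=torch.bfloat16):
    # Cast only floating-point tensors; leave ints, strings, etc. untouched.
    if isinstance(x, torch.Tensor) and x.is_floating_point():
        return x.to(dtype)
    return x

def json_map_leaves(fn, value):
    # Recurse into plain dicts and lists. Note: BatchFeature subclasses
    # UserDict, not dict, so it fails this check and is treated as a leaf.
    if isinstance(value, dict):
        return {k: json_map_leaves(fn, v) for k, v in value.items()}
    if isinstance(value, list):
        return [json_map_leaves(fn, v) for v in value]
    return fn(value)

def cast_output(output):
    if isinstance(output, BatchFeature):
        # The fix: apply the casting to the underlying dict (.data)
        # so the tensors inside the BatchFeature are actually reached.
        output.data = json_map_leaves(maybe_cast_dtype, output.data)
        return output
    return json_map_leaves(maybe_cast_dtype, output)
```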
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.
Feature | Command | Description
---|---|---
Code Review | `/gemini review` | Performs a code review for the current pull request in its current state.
Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state.
Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments.
Help | `/gemini help` | Displays a list of available commands.
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
Code Review
This pull request effectively addresses the bug regarding auto dtype casting for `BatchFeature` objects. The core issue, as I understand it, was that `json_map_leaves` did not correctly recurse into `BatchFeature` instances because `isinstance(BatchFeature_instance, dict)` evaluates to `False` (as `BatchFeature` inherits from `UserDict` but not `dict` directly). This meant `maybe_cast_dtype` was applied to the `BatchFeature` object itself, resulting in no casting of its internal tensors.

The fix correctly targets this by explicitly passing `output.data` (which is a `dict`) to `json_map_leaves` when `output` is a `BatchFeature`. This ensures that the tensors within the `BatchFeature` are properly cast. For other types of `output`, the existing logic for `json_map_leaves` is maintained.

The change is clear, well-targeted, and improves the robustness of the dtype casting mechanism. Well done!
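The root cause described above can be demonstrated in isolation (the key name and tensor shape here are illustrative):

```python
from collections import UserDict
import torch
from transformers import BatchFeature

bf = BatchFeature({"pixel_values": torch.zeros(1, 3, 224, 224)})
print(isinstance(bf, dict))       # False -> json_map_leaves treats bf as a leaf
print(isinstance(bf, UserDict))   # True
print(isinstance(bf.data, dict))  # True -> casting must target bf.data instead
```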
Merge Readiness
The changes in this pull request appear to be correct and effectively address the described bug. The code is clear and the fix is well-targeted. Based on my review, the PR seems ready for merging. However, as an AI, I am not authorized to approve pull requests; please ensure it undergoes further review and approval by the maintainers.
Thanks for fixing!
Hmm, it looks like we are now running into a similar issue as #18862 (comment).
Hmmm, the stuck test passes locally on my side with this PR:
Let's see if merging from the main branch can fix this...
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
I reproduced the hanging issue on 34a5713 locally, and it seems the deadlock is caused by the tensor dtype conversion itself:
Lines 162 to 166 in 34a5713
I assume something interacts badly with the forking, but I'm not really sure how to fix it either.
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Warning: You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!
with set_default_torch_num_threads(1):
    engine = AsyncLLM.from_engine_args(engine_args)
It seems that disabling OpenMP by setting `torch_num_threads=1` during engine forking fixes the deadlock issue locally. Let's see how the CI goes then.
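For reference, a context manager like the one used above could be implemented roughly as follows; this is a hypothetical re-implementation for illustration, not necessarily vLLM's actual helper.

```python
import contextlib
import torch

@contextlib.contextmanager
def set_default_torch_num_threads(num_threads: int):
    # Temporarily limit torch's intra-op parallelism (e.g. OpenMP threads),
    # restoring the previous setting afterwards.
    old_num_threads = torch.get_num_threads()
    torch.set_num_threads(num_threads)
    try:
        yield
    finally:
        torch.set_num_threads(old_num_threads)
```

Limiting the pool to a single thread while the engine forks presumably avoids forking a process while the OpenMP thread pool holds internal locks, which is a common source of fork-related deadlocks.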
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Essential Elements of an Effective PR Description Checklist
Purpose
Fix #19219: `RuntimeError: Input type (float) and bias type (c10::BFloat16) should be the same`. `BatchFeature` is a `UserDict`, so `if isinstance(value, dict)` returns `False`, causing its data not to be cast correctly in `json_map_leaves`.
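The error class above can be reproduced in isolation by feeding an uncast float32 input to a bfloat16 layer (a hypothetical minimal example; layer and shapes are illustrative):

```python
import torch

conv = torch.nn.Conv2d(3, 8, kernel_size=3).to(torch.bfloat16)
x = torch.randn(1, 3, 32, 32)  # float32 input that was never cast

try:
    conv(x)
except RuntimeError as e:
    # e.g. "Input type (float) and bias type (c10::BFloat16) should be the same"
    print(e)
```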
Test Plan
Test Result