[Bugfix] Fix auto dtype casting for BatchFeature #19316
Conversation
Signed-off-by: Isotr0py <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Hello @Isotr0py, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
Hello team, gemini-code-assist here with a summary of this pull request. This PR addresses a bug related to automatic dtype casting when processing inputs using Hugging Face processors, specifically when the processor returns a `BatchFeature` object. Previously, the casting logic might not have correctly applied the desired dtype to the tensors contained within the `BatchFeature`. This change modifies the casting process to ensure that when a `BatchFeature` is returned, the dtype casting is applied directly to the internal data (`.data`) of the `BatchFeature`, fixing the issue where tensors inside the `BatchFeature` were not being cast correctly.
Highlights
- Bugfix: BatchFeature dtype casting: Fixes an issue where the automatic dtype casting for inputs processed by Hugging Face processors did not correctly handle cases where the output was a `BatchFeature`, leading to tensors within the `BatchFeature` retaining their original dtype instead of being cast to the model's dtype.
- Refined input processing logic: Modifies the `maybe_cast_dtype` function in `vllm/inputs/registry.py` to specifically target the `.data` attribute of `BatchFeature` objects for dtype casting, ensuring that the tensors inside are correctly handled.
Changelog
- vllm/inputs/registry.py
- Modified the
maybe_cast_dtype
function to check if the processor output is aBatchFeature
(line 171). - If the output is a
BatchFeature
, thejson_map_leaves
casting function is now applied tooutput.data
instead ofoutput
(line 172). - The original casting logic for non-
BatchFeature
outputs is now handled after theif
block (line 175).
- Modified the
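For illustration, here is a minimal sketch of the behavior the changelog describes; the helper signatures and the casting predicate are simplified assumptions, not the exact code in `vllm/inputs/registry.py`.

```python
# Minimal sketch with assumed, simplified helpers -- not the exact vLLM code.
import torch
from transformers import BatchFeature

def maybe_cast_dtype(x, dtype=torch.bfloat16):
    # Cast only floating-point tensors; leave ints, strings, etc. untouched.
    if isinstance(x, torch.Tensor) and x.is_floating_point():
        return x.to(dtype)
    return x

def json_map_leaves(fn, value):
    # Recurse into plain dicts and lists. Note: BatchFeature subclasses
    # UserDict, not dict, so it fails this check and is treated as a leaf.
    if isinstance(value, dict):
        return {k: json_map_leaves(fn, v) for k, v in value.items()}
    if isinstance(value, list):
        return [json_map_leaves(fn, v) for v in value]
    return fn(value)

def cast_output(output):
    if isinstance(output, BatchFeature):
        # The fix: apply the casting to the underlying dict (.data)
        # so the tensors inside the BatchFeature are actually reached.
        output.data = json_map_leaves(maybe_cast_dtype, output.data)
        return output
    return json_map_leaves(maybe_cast_dtype, output)
```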
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.
Feature | Command | Description
---|---|---
Code Review | `/gemini review` | Performs a code review for the current pull request in its current state.
Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state.
Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments.
Help | `/gemini help` | Displays a list of available commands.
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
Code Review
This pull request effectively addresses the bug regarding auto dtype casting for `BatchFeature` objects. The core issue, as I understand it, was that `json_map_leaves` did not correctly recurse into `BatchFeature` instances because `isinstance(BatchFeature_instance, dict)` evaluates to `False` (as `BatchFeature` inherits from `UserDict` but not `dict` directly). This meant `maybe_cast_dtype` was applied to the `BatchFeature` object itself, resulting in no casting of its internal tensors.

The fix correctly targets this by explicitly passing `output.data` (which is a `dict`) to `json_map_leaves` when `output` is a `BatchFeature`. This ensures that the tensors within the `BatchFeature` are properly cast. For other types of `output`, the existing logic for `json_map_leaves` is maintained.

The change is clear, well-targeted, and improves the robustness of the dtype casting mechanism. Well done!
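The root cause described above can be demonstrated in isolation (the key name and tensor shape here are illustrative):

```python
from collections import UserDict
import torch
from transformers import BatchFeature

bf = BatchFeature({"pixel_values": torch.zeros(1, 3, 224, 224)})
print(isinstance(bf, dict))       # False -> json_map_leaves treats bf as a leaf
print(isinstance(bf, UserDict))   # True
print(isinstance(bf.data, dict))  # True -> casting must target bf.data instead
```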
Merge Readiness
The changes in this pull request appear to be correct and effectively address the described bug. The code is clear and the fix is well-targeted. Based on my review, the PR seems ready for merging. However, as an AI, I am not authorized to approve pull requests; please ensure it undergoes further review and approval by the maintainers.
Thanks for fixing!
Hmm, it looks like we are now running into a similar issue as #18862 (comment).
Hmmm, the stuck test passes locally on my side with this PR:
Let's see if merging from the main branch can fix this...
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
I reproduced the hanging issue on 34a5713 locally, and it seems the deadlock is caused by the tensor dtype conversion itself:
Lines 162 to 166 in 34a5713
I assume something interacts badly with the forking, but I'm not really sure how to fix it either.
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Warning: You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!
with set_default_torch_num_threads(1):
    engine = AsyncLLM.from_engine_args(engine_args)
It seems that disabling OpenMP by setting `torch_num_threads=1` during engine forking fixes the deadlock issue locally. Let's see how the CI goes then.
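For reference, a context manager like the one used above could be implemented roughly as follows; this is a hypothetical re-implementation for illustration, not necessarily vLLM's actual helper.

```python
import contextlib
import torch

@contextlib.contextmanager
def set_default_torch_num_threads(num_threads: int):
    # Temporarily limit torch's intra-op parallelism (e.g. OpenMP threads),
    # restoring the previous setting afterwards.
    old_num_threads = torch.get_num_threads()
    torch.set_num_threads(num_threads)
    try:
        yield
    finally:
        torch.set_num_threads(old_num_threads)
```

Limiting the pool to a single thread while the engine forks presumably avoids forking a process while the OpenMP thread pool holds internal locks, which is a common source of fork-related deadlocks.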
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Essential Elements of an Effective PR Description Checklist
Purpose
Fix #19219: `RuntimeError: Input type (float) and bias type (c10::BFloat16) should be the same`. `BatchFeature` is a `UserDict`, so `if isinstance(value, dict)` returns `False`, causing its data not to be cast correctly in `json_map_leaves`.
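The error class above can be reproduced in isolation by feeding an uncast float32 input to a bfloat16 layer (a hypothetical minimal example; layer and shapes are illustrative):

```python
import torch

conv = torch.nn.Conv2d(3, 8, kernel_size=3).to(torch.bfloat16)
x = torch.randn(1, 3, 32, 32)  # float32 input that was never cast

try:
    conv(x)
except RuntimeError as e:
    # e.g. "Input type (float) and bias type (c10::BFloat16) should be the same"
    print(e)
```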
Test Plan
Test Result