[Gemma3n] Fix audio batching #24052
Conversation
```python
            input_features_mask=MultiModalFieldConfig.batched("audio"))
        return dict(
            pixel_values=MultiModalFieldConfig.batched("image"),
            input_features=MultiModalFieldConfig.batched("audio"),
```
Do we still need `input_features` in that case?
I definitely want to review that once I enable the processor test that requires an HF Transformers bump. For now there's no real overhead at runtime because the unpadded tensor is just a view.
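For illustration, a small hedged PyTorch sketch of what "just a view" means here (the shapes are made up; it only demonstrates that slicing a padded tensor shares storage rather than copying):

```python
import torch

# Assumed shapes for illustration: (batch_size, max_seq_len, num_features).
padded = torch.zeros(2, 160, 128)

# Slicing out the unpadded portion of the first item returns a view that
# shares storage with the padded tensor, so no extra copy is made.
unpadded_item = padded[0, :97]
assert unpadded_item.data_ptr() == padded.data_ptr()  # same underlying storage

# Writing through the view is visible in the padded tensor (no copy was made).
unpadded_item[0, 0] = 1.0
assert padded[0, 0, 0].item() == 1.0
```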
Code Review

This pull request fixes an issue with batched audio inference for Gemma3n models by padding audio sequences. The core logic introduces a padded version of `input_features` for batched processing by the audio tower, while keeping an unpadded version for caching. The changes are generally good, but I've identified a critical issue with a `.squeeze(1)` call that will likely cause a crash, and a high-severity issue with an incorrect type hint.
```diff
 assert self.audio_tower is not None
-input_features = audio_input["input_features"].squeeze(1)
+# Run on padded features to enable batching
+input_features = audio_input["input_features_padded"].squeeze(1)
```
The use of `.squeeze(1)` here is likely incorrect and will cause a runtime error. `input_features_padded` is expected to have a shape of `(batch_size, seq_length, num_features)`. Calling `.squeeze(1)` will only succeed if `seq_length` is 1, which is not generally the case for audio features. This seems to be a pre-existing issue, but since this line is modified, it's important to address it. The `.squeeze(1)` should probably be removed.
Suggested change:

```diff
-input_features = audio_input["input_features_padded"].squeeze(1)
+input_features = audio_input["input_features_padded"]
```
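For context, a minimal PyTorch sketch of the tensor layout this comment describes and the effect of the suggested change (the shapes and the toy "audio tower" below are illustrative assumptions, not the actual Gemma3n modules):

```python
import torch

# Assumed illustrative shape: (batch_size, seq_length, num_features).
batch_size, seq_length, num_features = 2, 160, 128
input_features_padded = torch.randn(batch_size, seq_length, num_features)

# Stand-in for the audio tower: any module that consumes a batched
# (batch_size, seq_length, num_features) tensor works for this sketch.
audio_tower = torch.nn.Linear(num_features, 64)

# Per the suggestion, the padded batch is passed through as-is, keeping the
# (batch_size, seq_length, num_features) layout intact.
input_features = input_features_padded
audio_outputs = audio_tower(input_features)
print(audio_outputs.shape)  # torch.Size([2, 160, 64])
```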
Thanks for the fix @NickLucche!
LGTM as long as tests still pass
@DarkLight1337 looks green
Fix #24006 by enabling proper batched `audio_tower` inference. This is done by padding the audio sequences to the max sequence length in the batch.

Thanks to @pratapyash for reporting the bug!
Test with
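Below is a hedged, self-contained PyTorch sketch of the padding approach described above (the sequence lengths, feature size, and use of `pad_sequence` are illustrative assumptions, not the actual vLLM code path):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Illustrative per-request feature tensors of shape (seq_len, num_features);
# the lengths and feature size are made up for this example.
features = [torch.randn(97, 128), torch.randn(160, 128), torch.randn(42, 128)]

# Pad every sequence to the max seq len so they stack into a single batch of
# shape (batch_size, max_seq_len, num_features) for the audio tower.
input_features_padded = pad_sequence(features, batch_first=True)

# A mask marking real (non-padded) frames, analogous in spirit to the
# input_features_mask field seen in the diff above.
lengths = torch.tensor([f.shape[0] for f in features])
input_features_mask = (
    torch.arange(input_features_padded.shape[1])[None, :] < lengths[:, None]
)

print(input_features_padded.shape)   # torch.Size([3, 160, 128])
print(input_features_mask.sum(dim=1))  # tensor([ 97, 160,  42])
```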