[V1][TPU] TPU multimodal model support #13496
Closed
Now that #13049 has landed, this is an updated version of #12133
This is currently focused only on usability and correctness for Llava-style multimodal models, not performance.
When using a multimodal model, we will pre-compile the prefills using the `inputs_embeds` input rather than `input_ids`. We will still use `input_ids` for decode in this iteration, but this will change with the addition of proper chunked prefill.

This does not deal with pre-compiling the encoder forward pass, so if the model is passed an image/video/audio input with a new shape, it will force compilation at runtime.
Tested Examples
- Image: Llava ✅
- Audio: Qwen2 Audio ✅
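For context, exercising one of these models through vLLM's offline API looks roughly like the following sketch; the model name, prompt format, and image path are illustrative, not taken from this PR.

```python
# Usage sketch: run a Llava-style multimodal model via vLLM's offline API.
# Model name, prompt format, and image path are illustrative.
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(model="llava-hf/llava-1.5-7b-hf", max_model_len=4096)
image = Image.open("example.jpg")

outputs = llm.generate(
    {
        "prompt": "USER: <image>\nWhat is in this image? ASSISTANT:",
        "multi_modal_data": {"image": image},
    },
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```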