@mgoin mgoin commented Feb 18, 2025

Now that #13049 has landed, this is an updated version of #12133.

This PR currently focuses only on usability and correctness for LLaVA-style multimodal models, not on performance.

When serving a multimodal model, we precompile the prefills using the inputs_embeds input rather than input_ids. Decode still uses input_ids in this iteration; this will change once proper chunked prefill is added.
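To make the prefill/decode split concrete, here is a minimal pure-Python sketch of the input-selection logic. It is not vLLM code: `IMAGE_TOKEN_ID`, `embed_tokens`, `merge_placeholder_embeddings`, and `prepare_model_inputs` are hypothetical names, embeddings are plain lists of floats, and the "encoder outputs" are supplied directly. It assumes that for a multimodal model every prefill goes through inputs_embeds (text embeddings with placeholder positions overwritten by encoder outputs), while decode steps pass input_ids.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Vector = List[float]

# Hypothetical placeholder token id marking where image/audio features go.
IMAGE_TOKEN_ID = 32000


def embed_tokens(token_ids: Sequence[int], table: Dict[int, Vector]) -> List[Vector]:
    """Look up a text embedding for every token id (toy lookup table)."""
    return [list(table[t]) for t in token_ids]


def merge_placeholder_embeddings(
    token_ids: Sequence[int], text_embeds: List[Vector], mm_embeds: List[Vector]
) -> List[Vector]:
    """Overwrite placeholder positions, in order, with multimodal encoder outputs."""
    merged = [list(e) for e in text_embeds]
    positions = [i for i, t in enumerate(token_ids) if t == IMAGE_TOKEN_ID]
    if len(positions) != len(mm_embeds):
        raise ValueError("placeholder count must match number of multimodal embeddings")
    for pos, emb in zip(positions, mm_embeds):
        merged[pos] = list(emb)
    return merged


def prepare_model_inputs(
    token_ids: Sequence[int],
    is_prefill: bool,
    is_multimodal_model: bool,
    table: Dict[int, Vector],
    mm_embeds: Sequence[Vector] = (),
) -> Tuple[Optional[List[int]], Optional[List[Vector]]]:
    """Return (input_ids, inputs_embeds); exactly one of the two is not None."""
    if is_multimodal_model and is_prefill:
        # Prefill path: build embeddings on the host side and feed inputs_embeds.
        text = embed_tokens(token_ids, table)
        return None, merge_placeholder_embeddings(token_ids, text, list(mm_embeds))
    # Decode path (and text-only models): keep feeding raw token ids.
    return list(token_ids), None
```

With `table = {1: [0.1, 0.1], 2: [0.2, 0.2], IMAGE_TOKEN_ID: [0.0, 0.0]}`, a prefill over `[1, IMAGE_TOKEN_ID, IMAGE_TOKEN_ID, 2]` with encoder outputs `[[9.0, 9.0], [8.0, 8.0]]` yields only inputs_embeds, while a single-token decode step yields only input_ids. Keeping the two paths distinct means each one maps to its own set of precompiled graphs.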

This PR does not precompile the encoder forward pass, so if the model receives an image, video, or audio input with a new shape, it will trigger compilation at runtime.
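The runtime-compilation caveat follows from compiled graphs being specialized to input shapes. The toy model below is a sketch under stated assumptions, not vLLM or torch_xla code: `ShapeCompileCache`, `pad_to_bucket`, `warmup`, and `run_request` are hypothetical, and the bucket sizes are illustrative. It assumes the language-model prefill is warmed up over padded token-length buckets while the vision encoder is not, so only a previously unseen encoder input shape causes a compile while serving.

```python
from typing import Callable, Dict, Hashable, List, Tuple

Key = Tuple[str, Hashable]


class ShapeCompileCache:
    """Toy stand-in for a compiler that builds one executable per (graph, shape)."""

    def __init__(self) -> None:
        self._compiled: Dict[Key, Callable[..., Key]] = {}
        self.compile_log: List[Key] = []

    def get(self, name: str, shape: Hashable) -> Callable[..., Key]:
        key = (name, shape)
        if key not in self._compiled:
            # A cache miss stands in for an expensive compilation.
            self.compile_log.append(key)
            self._compiled[key] = lambda *args: key  # placeholder executable
        return self._compiled[key]


def pad_to_bucket(num_tokens: int, buckets: List[int]) -> int:
    """Round a prefill length up to the smallest bucket that fits it."""
    for b in sorted(buckets):
        if num_tokens <= b:
            return b
    raise ValueError(f"{num_tokens} tokens exceeds largest bucket {max(buckets)}")


def warmup(cache: ShapeCompileCache, buckets: List[int]) -> None:
    """Precompile only the language-model prefill, one graph per bucket."""
    for b in buckets:
        cache.get("language_model_prefill", b)


def run_request(cache: ShapeCompileCache, image_shape: Hashable,
                num_tokens: int, buckets: List[int]) -> int:
    """Serve one request and return how many compilations it triggered."""
    before = len(cache.compile_log)
    cache.get("vision_encoder", image_shape)  # not warmed up: new shapes compile here
    cache.get("language_model_prefill", pad_to_bucket(num_tokens, buckets))
    return len(cache.compile_log) - before
```

After `warmup(cache, [128, 256, 512, 1024])`, a first request with a `(3, 336, 336)` image compiles only the encoder, a repeat of that image shape compiles nothing, and a `(3, 448, 448)` image compiles the encoder again, regardless of which padded prefill bucket the prompt falls into.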

Tested Examples

Image

Llava ✅

VLLM_USE_V1=1 python examples/offline_inference/vision_language.py -m llava
Processed prompts: 100%|██████████████████████████████████| 4/4 [00:40<00:00, 10.05s/it, est. speed input: 59.29 toks/s, output: 6.37 toks/s]
The image features a tall tower with a spire, surrounded by a beautiful cherry blossom tree. The tree is filled with pink flowers, creating a picturesque scene. The tower stands tall in the background, with the blossoming tree in the foreground. The combination of the tower and the tree
The image features a tall tower with a spire, surrounded by a beautiful cherry blossom tree. The tree is filled with pink flowers, creating a picturesque scene. The tower stands tall in the background, with the blossoming tree in the foreground. The combination of the tower and the tree
The image features a tall tower with a spire, surrounded by a beautiful cherry blossom tree. The tree is filled with pink flowers, creating a picturesque scene. The tower stands tall in the background, with the blossoming tree in the foreground. The combination of the tower and the tree
The image features a tall tower with a spire, surrounded by a beautiful cherry blossom tree. The tree is filled with pink flowers, creating a picturesque scene. The tower stands tall in the background, with the blossoming tree in the foreground. The combination of the tower and the tree

Audio

Qwen2 Audio ✅

VLLM_USE_V1=1 python examples/offline_inference/audio_language.py -m qwen2_audio
Processed prompts: 100%|█████████████████████████████████| 1/1 [00:10<00:00, 10.11s/it, est. speed input: 43.11 toks/s, output: 4.75 toks/s]
The recited text in the audio is: 'First words I spoke in the original coronavirus a little feat of practical poetry Mary had a little lamb its fleece was white as snow and everywhere that Mary went the lamb was sure to go.'


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they run only the fastcheck CI, a small and essential subset of CI tests meant to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the v1 label Feb 18, 2025
Signed-off-by: Michael Goin <[email protected]>
@mgoin mgoin added the tpu Related to Google TPUs label Feb 18, 2025
@mgoin mgoin changed the title from "Multimodal model support for V1 TPU" to "[V1][TPU] TPU multimodal model support" Feb 20, 2025
@mgoin mgoin marked this pull request as ready for review February 25, 2025 19:23
mergify bot commented Feb 25, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @mgoin.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Feb 25, 2025