[WIP] Multimodal model support for V1 TPU #12133
Signed-off-by: mgoin <[email protected]>
cc @bvrockwell - FYI
cc @yaochengji could you please take a look?
Based on and requires #11936
Currently only focused on usability and correctness, not performance.
This does not pre-compile the encoder forward pass, so if the model receives an image, video, or audio input with a shape it has not seen before, it will trigger compilation at runtime.
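To illustrate the runtime compilation caveat, here is a minimal sketch of a compile cache keyed by input shape. It is a toy stand-in written in plain Python, not code from this PR, vLLM, or XLA; `ShapeKeyedCompileCache`, `run`, and the example shapes are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List, Tuple

Shape = Tuple[int, ...]


class ShapeKeyedCompileCache:
    """Toy model of a shape-specialized compiler (hypothetical, not vLLM or XLA code)."""

    def __init__(self) -> None:
        self._compiled: Dict[Shape, Callable[[List[float]], List[float]]] = {}
        self.compile_count = 0

    def _compile(self, shape: Shape) -> Callable[[List[float]], List[float]]:
        # A real compiler would build a shape-specialized graph here, which is slow.
        self.compile_count += 1
        return lambda values: [2 * v for v in values]

    def run(self, shape: Shape, values: List[float]) -> List[float]:
        # Compile only the first time a given shape is seen; reuse afterwards.
        if shape not in self._compiled:
            self._compiled[shape] = self._compile(shape)
        return self._compiled[shape](values)


cache = ShapeKeyedCompileCache()
cache.run((1, 3, 224, 224), [1, 2])  # first shape: compiles
cache.run((1, 3, 224, 224), [3, 4])  # same shape: reuses the compiled function
cache.run((1, 3, 336, 336), [5, 6])  # new shape: compiles again at runtime
print(cache.compile_count)
```

Under this model, every previously unseen encoder input shape pays the compile cost on first use. Pre-compiling a fixed set of shapes ahead of time would avoid that cost, which is the part this PR leaves out.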
Tested Examples
Image:
Audio: