Enable VLM lookup. #2707
base: master
Pull Request Overview
This PR enables prompt lookup for VLMs (vision language models) by extending the continuous batching pipeline to support prompt lookup for embedding-based models. The changes introduce an interface for providing prompt token IDs alongside embedding inputs, so that prompt lookup decoding can be used with visual language models.
- Move candidate generation from PromptLookupImpl::step() to ContinuousBatchingImpl::step()
- Add new interface to pass prompt token IDs for embedding input models through get_inputs_embeds_with_token_type_ids method
- Update all VLM model implementations to support the new token_type_ids interface
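The first bullet describes a common refactoring: the base pipeline's step calls a virtual hook that does nothing by default, and the prompt lookup pipeline overrides it. A minimal sketch of that pattern follows. The class names `BasePipeline` and `PromptLookupPipeline` and the placeholder candidate values are hypothetical stand-ins, not the classes or logic from this PR.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins for the pipeline classes named in the PR.
class BasePipeline {
public:
    virtual ~BasePipeline() = default;

    // One step: candidates are proposed first, then the model runs.
    void step() {
        generate_candidates();  // no-op unless a subclass overrides it
        run_model();
    }

    int model_runs() const { return m_model_runs; }

protected:
    // Empty default: a plain pipeline proposes no candidate tokens.
    virtual void generate_candidates() {}

    // Stand-in for inference; it only counts invocations.
    void run_model() { ++m_model_runs; }

private:
    int m_model_runs = 0;
};

class PromptLookupPipeline : public BasePipeline {
public:
    const std::vector<int64_t>& candidates() const { return m_candidates; }

protected:
    void generate_candidates() override {
        // Placeholder values; a real implementation would match n-grams
        // against the prompt and previously generated tokens.
        m_candidates = {1, 2, 3};
    }

private:
    std::vector<int64_t> m_candidates;
};
```

With this shape, callers only ever invoke `step()`; whether candidates are produced depends on the concrete pipeline type.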
Reviewed Changes
Copilot reviewed 25 out of 25 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
src/cpp/src/visual_language/qwen2vl/classes.hpp | Add get_inputs_embeds_with_token_type_ids method declaration |
src/cpp/src/visual_language/qwen2vl/classes.cpp | Implement new method and refactor existing get_inputs_embeds to use it |
src/cpp/src/visual_language/phi4mm/classes.hpp | Add get_inputs_embeds_with_token_type_ids method declaration |
src/cpp/src/visual_language/phi4mm/classes.cpp | Implement new method and refactor existing get_inputs_embeds to use it |
src/cpp/src/visual_language/phi3_vision/classes.hpp | Add get_inputs_embeds_with_token_type_ids method declaration |
src/cpp/src/visual_language/phi3_vision/classes.cpp | Implement new method and refactor existing get_inputs_embeds to use it |
src/cpp/src/visual_language/minicpm/classes.hpp | Add get_inputs_embeds_with_token_type_ids method declaration |
src/cpp/src/visual_language/minicpm/classes.cpp | Implement new method and refactor existing get_inputs_embeds to use it |
src/cpp/src/visual_language/llava_next/classes.hpp | Add get_inputs_embeds_with_token_type_ids method declaration |
src/cpp/src/visual_language/llava_next/classes.cpp | Implement new method and refactor existing get_inputs_embeds to use it |
src/cpp/src/visual_language/llava/classes.hpp | Add get_inputs_embeds_with_token_type_ids method declaration |
src/cpp/src/visual_language/llava/classes.cpp | Implement new method and refactor existing get_inputs_embeds to use it |
src/cpp/src/visual_language/internvl_chat/classes.hpp | Add get_inputs_embeds_with_token_type_ids method declaration |
src/cpp/src/visual_language/internvl_chat/classes.cpp | Implement new method and refactor existing get_inputs_embeds to use it |
src/cpp/src/visual_language/inputs_embedder.hpp | Update constructors to accept prompt_lookup parameter and add prompt_lookup support |
src/cpp/src/visual_language/inputs_embedder.cpp | Update has_token_type_ids method and constructors for prompt lookup support |
src/cpp/src/sequence_group.hpp | Add handling for embeddings in remove_last_tokens method |
src/cpp/src/prompt_lookup/prompt_lookup_impl.hpp | Add constructor for embedding-based models |
src/cpp/src/prompt_lookup/prompt_lookup_impl.cpp | Support token_type_ids and remove candidate generation from step method |
src/cpp/src/prompt_lookup/continuous_batching_for_prompt_lookup.hpp | Add constructor for embedding models and make generate_candidates virtual |
src/cpp/src/prompt_lookup/continuous_batching_for_prompt_lookup.cpp | Fix loop variable type and add candidate padding logic |
src/cpp/src/continuous_batching/pipeline_impl.hpp | Add virtual generate_candidates method declaration |
src/cpp/src/continuous_batching/pipeline_impl.cpp | Move candidate generation to main pipeline step and add empty default implementation |
src/cpp/src/continuous_batching/pipeline.cpp | Pass prompt_lookup flag to InputsEmbedder constructor |
src/cpp/src/continuous_batching/model_runner.hpp | Add proper token_type_ids tensor existence check |
@yangsu2022, could you please review this first?
1: Use a global variable to pass prompts_ids. Signed-off-by: xipingya <[email protected]>
No need to add a new interface; just reuse "token_type_ids". Signed-off-by: xipingya <[email protected]>
4c2fe5b to bbb9de3
2: Fix match bug. For example: input_ids={2, 3, 1, 1, 2, 3, 4, 5, 6, 9, 2, 3, 1, 2, 3}, num_pred_tokens=3, max_ngram_size=3, return candidate: 2,3,1. Signed-off-by: xipingya <[email protected]>
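For readers unfamiliar with the matching step this commit touches, here is a minimal sketch of n-gram prompt lookup, assuming the usual scheme: take the trailing n-gram of the token history, find an earlier occurrence, and propose the tokens that followed it. The function `lookup_candidates` is a hypothetical helper written for illustration. It is a generic sketch rather than the PR's implementation, so its result on the input above need not equal the candidate quoted in the commit message.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not the function from the PR.
// Returns up to num_pred_tokens tokens that followed the earliest earlier
// occurrence of the trailing n-gram, trying the longest n-gram first.
std::vector<int64_t> lookup_candidates(const std::vector<int64_t>& input_ids,
                                       std::size_t num_pred_tokens,
                                       std::size_t max_ngram_size) {
    const std::size_t n = input_ids.size();
    if (num_pred_tokens == 0 || n < 2) {
        return {};
    }
    for (std::size_t ngram_size = std::min(max_ngram_size, n - 1); ngram_size > 0; --ngram_size) {
        const auto ngram_begin = input_ids.cend() - static_cast<std::ptrdiff_t>(ngram_size);
        // start + ngram_size < n excludes the trailing n-gram itself and
        // guarantees that at least one token follows the match.
        for (std::size_t start = 0; start + ngram_size < n; ++start) {
            const auto window = input_ids.cbegin() + static_cast<std::ptrdiff_t>(start);
            if (std::equal(ngram_begin, input_ids.cend(), window)) {
                const std::size_t first = start + ngram_size;
                const std::size_t count = std::min(num_pred_tokens, n - first);
                const auto out_begin = input_ids.cbegin() + static_cast<std::ptrdiff_t>(first);
                return std::vector<int64_t>(out_begin, out_begin + static_cast<std::ptrdiff_t>(count));
            }
        }
    }
    return {};
}
```

Every window is compared in full from each start position, so a failed partial match never causes a valid later start to be skipped. That is one typical source of bugs in hand-rolled matchers that track a running match index.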
Copilot reviewed 25 out of 25 changed files in this pull request and generated 3 comments.
Signed-off-by: xipingya <[email protected]>
Copilot reviewed 25 out of 25 changed files in this pull request and generated 4 comments.
Signed-off-by: xipingya <[email protected]>
Copilot reviewed 25 out of 25 changed files in this pull request and generated 1 comment.
Signed-off-by: xipingya <[email protected]>
Copilot reviewed 25 out of 25 changed files in this pull request and generated no new comments.
Hi Xiping, it seems that you are using token_type_ids to pass input_ids for PLD. LGTM.
Could you also add PLD support for Gemma3 VLM?
Could you kindly explain why you modified this file?
1: Move m_pipeline->generate_candidates(); from ContinuousBatchingPipeline::PromptLookupImpl::step() to ContinuousBatchingPipeline::ContinuousBatchingImpl::step()
2: Reuse interface std::optional<std::vector<ov::Tensor>> token_type_ids from https://github.com/openvinotoolkit/openvino.genai/pull/2340/files#diff-bb6bf907e40c83f4d6c912e886ccb8cd65a2129a3cd4a7a784612efcc5041cc8R112-R115
Tickets: CVS-172889
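The second point concerns getting prompt token IDs to the lookup step when the model input is embeddings, presumably because n-gram matching needs token IDs to search. The idea of an optional side channel can be sketched as below. The types and names (`EmbeddingRequest`, `lookup_history`) are hypothetical, and plain `std::vector` stands in for `ov::Tensor`, so this is not the PR's interface.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical request type: embeddings plus optional prompt token IDs.
struct EmbeddingRequest {
    std::vector<std::vector<float>> inputs_embeds;         // one vector per prompt position
    std::optional<std::vector<int64_t>> prompt_token_ids;  // absent when the caller has no IDs
};

// Build the token history that n-gram matching would search.
// Returns an empty vector when no prompt IDs were supplied, so lookup can be skipped.
std::vector<int64_t> lookup_history(const EmbeddingRequest& request,
                                    const std::vector<int64_t>& generated_ids) {
    if (!request.prompt_token_ids.has_value()) {
        return {};
    }
    std::vector<int64_t> history = *request.prompt_token_ids;
    history.insert(history.end(), generated_ids.begin(), generated_ids.end());
    return history;
}
```

Keeping the IDs optional means callers that only have embeddings keep working; they simply get no lookup candidates.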