
Conversation

@xipingyan (Contributor) commented on Sep 5, 2025

1: Move m_pipeline->generate_candidates() from ContinuousBatchingPipeline::PromptLookupImpl::step() to ContinuousBatchingPipeline::ContinuousBatchingImpl::step() (see the sketch below).
2: Reuse the existing interface std::optional<std::vector<ov::Tensor>> token_type_ids from https://github.com/openvinotoolkit/openvino.genai/pull/2340/files#diff-bb6bf907e40c83f4d6c912e886ccb8cd65a2129a3cd4a7a784612efcc5041cc8R112-R115 to carry the prompt token IDs for embedding-based models.

Tickets: CVS-172889
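
A minimal sketch of change 1, assuming simplified stand-in classes (the real pipeline classes in src/cpp/src/continuous_batching/ and src/cpp/src/prompt_lookup/ carry far more state): the base continuous-batching step() now drives candidate generation through a virtual generate_candidates() that has an empty default implementation and is overridden by the prompt-lookup specialization, so PromptLookupImpl::step() no longer needs to call it explicitly.

```cpp
// Sketch only: simplified stand-ins for the continuous-batching and
// prompt-lookup implementations; this illustrates the hook pattern, not the
// actual pipeline code.
#include <cstdio>

class ContinuousBatchingImpl {
public:
    virtual ~ContinuousBatchingImpl() = default;

    // Empty default implementation: a plain continuous-batching pipeline has
    // no draft candidates to propose.
    virtual void generate_candidates() {}

    void step() {
        // Candidate generation now happens inside the generic step(), so the
        // prompt-lookup step() does not have to call it separately.
        generate_candidates();
        // ... schedule requests, run the model, sample tokens ...
        std::puts("step: forward pass");
    }
};

class ContinuousBatchingForPromptLookupImpl : public ContinuousBatchingImpl {
public:
    // Override fills per-sequence candidates by n-gram lookup in the prompt.
    void generate_candidates() override {
        std::puts("prompt lookup: proposing candidate tokens");
    }
};

int main() {
    ContinuousBatchingForPromptLookupImpl pipeline;
    pipeline.step();  // prints the lookup message, then the forward pass
}
```

The benefit of the hook is that the prompt-lookup pipeline reuses the generic step() loop unchanged and only customizes the candidate-generation stage.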

@github-actions bot added the labels category: visual language (Visual language pipeline), category: continuous batching (Continuous batching), no-match-files, and category: prompt lookup (Prompt look-up decoding) on Sep 5, 2025
@xipingyan marked this pull request as ready for review on September 5, 2025 13:29
Copilot AI review requested due to automatic review settings on September 5, 2025 13:29
Copilot AI left a comment

Pull Request Overview

This PR enables prompt-lookup decoding for VLMs (Vision Language Models) by extending the continuous batching pipeline to support prompt lookup with embedding-based inputs. The changes introduce a way to provide the prompt token IDs when a model consumes input embeddings, enabling better integration between visual language models and prompt-lookup decoding.

  • Move candidate generation from PromptLookupImpl::step() to ContinuousBatchingImpl::step()
  • Add a new interface, get_inputs_embeds_with_token_type_ids, to pass prompt token IDs for embedding-input models (see the sketch after this list)
  • Update all VLM model implementations to support the new token_type_ids interface
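
A minimal sketch of the reused token_type_ids slot, assuming a hypothetical helper named pack_prompt_ids (not part of this PR): the prompt token IDs are simply carried as an int64 ov::Tensor inside the existing std::optional<std::vector<ov::Tensor>> token_type_ids type, so no new pipeline-level argument is required.

```cpp
// Sketch only: pack_prompt_ids is a hypothetical helper used for illustration.
// It shows the data shape being reused: the prompt token IDs travel as an
// int64 ov::Tensor inside std::optional<std::vector<ov::Tensor>>.
#include <openvino/runtime/tensor.hpp>

#include <algorithm>
#include <cstdint>
#include <optional>
#include <vector>

std::optional<std::vector<ov::Tensor>> pack_prompt_ids(const std::vector<int64_t>& prompt_ids) {
    ov::Tensor ids(ov::element::i64, ov::Shape{1, prompt_ids.size()});
    std::copy(prompt_ids.begin(), prompt_ids.end(), ids.data<int64_t>());
    return std::vector<ov::Tensor>{ids};
}
```

On the consuming side, a has_value() check is enough to tell whether prompt IDs were supplied, which lines up with the token_type_ids tensor existence check added to model_runner.hpp in the file summary below.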

Reviewed Changes

Copilot reviewed 25 out of 25 changed files in this pull request and generated 2 comments.

File | Description
src/cpp/src/visual_language/qwen2vl/classes.hpp | Add get_inputs_embeds_with_token_type_ids method declaration
src/cpp/src/visual_language/qwen2vl/classes.cpp | Implement new method and refactor existing get_inputs_embeds to use it
src/cpp/src/visual_language/phi4mm/classes.hpp | Add get_inputs_embeds_with_token_type_ids method declaration
src/cpp/src/visual_language/phi4mm/classes.cpp | Implement new method and refactor existing get_inputs_embeds to use it
src/cpp/src/visual_language/phi3_vision/classes.hpp | Add get_inputs_embeds_with_token_type_ids method declaration
src/cpp/src/visual_language/phi3_vision/classes.cpp | Implement new method and refactor existing get_inputs_embeds to use it
src/cpp/src/visual_language/minicpm/classes.hpp | Add get_inputs_embeds_with_token_type_ids method declaration
src/cpp/src/visual_language/minicpm/classes.cpp | Implement new method and refactor existing get_inputs_embeds to use it
src/cpp/src/visual_language/llava_next/classes.hpp | Add get_inputs_embeds_with_token_type_ids method declaration
src/cpp/src/visual_language/llava_next/classes.cpp | Implement new method and refactor existing get_inputs_embeds to use it
src/cpp/src/visual_language/llava/classes.hpp | Add get_inputs_embeds_with_token_type_ids method declaration
src/cpp/src/visual_language/llava/classes.cpp | Implement new method and refactor existing get_inputs_embeds to use it
src/cpp/src/visual_language/internvl_chat/classes.hpp | Add get_inputs_embeds_with_token_type_ids method declaration
src/cpp/src/visual_language/internvl_chat/classes.cpp | Implement new method and refactor existing get_inputs_embeds to use it
src/cpp/src/visual_language/inputs_embedder.hpp | Update constructors to accept prompt_lookup parameter and add prompt_lookup support
src/cpp/src/visual_language/inputs_embedder.cpp | Update has_token_type_ids method and constructors for prompt lookup support
src/cpp/src/sequence_group.hpp | Add handling for embeddings in remove_last_tokens method
src/cpp/src/prompt_lookup/prompt_lookup_impl.hpp | Add constructor for embedding-based models
src/cpp/src/prompt_lookup/prompt_lookup_impl.cpp | Support token_type_ids and remove candidate generation from step method
src/cpp/src/prompt_lookup/continuous_batching_for_prompt_lookup.hpp | Add constructor for embedding models and make generate_candidates virtual
src/cpp/src/prompt_lookup/continuous_batching_for_prompt_lookup.cpp | Fix loop variable type and add candidate padding logic
src/cpp/src/continuous_batching/pipeline_impl.hpp | Add virtual generate_candidates method declaration
src/cpp/src/continuous_batching/pipeline_impl.cpp | Move candidate generation to main pipeline step and add empty default implementation
src/cpp/src/continuous_batching/pipeline.cpp | Pass prompt_lookup flag to InputsEmbedder constructor
src/cpp/src/continuous_batching/model_runner.hpp | Add proper token_type_ids tensor existence check
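
Most of the per-model changes above follow the same delegation pattern. Below is a minimal sketch with hypothetical, simplified signatures (the real get_inputs_embeds overloads in src/cpp/src/visual_language/ take different parameters): the new get_inputs_embeds_with_token_type_ids produces the embeddings together with the optional token_type_ids, and the existing get_inputs_embeds becomes a thin wrapper that discards the extra output.

```cpp
// Sketch only: the class and signatures are simplified stand-ins, not the
// actual InputsEmbedder / VLM classes API.
#include <openvino/runtime/tensor.hpp>

#include <optional>
#include <string>
#include <utility>
#include <vector>

class InputsEmbedderSketch {
public:
    // New entry point: returns the merged text+image embeddings plus optional
    // token_type_ids tensors (reused here to carry prompt token IDs for
    // prompt-lookup decoding).
    std::pair<ov::Tensor, std::optional<std::vector<ov::Tensor>>>
    get_inputs_embeds_with_token_type_ids(const std::string& prompt,
                                          const std::vector<ov::Tensor>& images) {
        (void)prompt;
        (void)images;
        ov::Tensor embeds;                                      // ... encode images, embed text, merge ...
        std::optional<std::vector<ov::Tensor>> token_type_ids;  // filled when prompt lookup is enabled
        return {std::move(embeds), std::move(token_type_ids)};
    }

    // Existing entry point refactored into a thin wrapper that drops the
    // extra output when the caller does not need it.
    ov::Tensor get_inputs_embeds(const std::string& prompt,
                                 const std::vector<ov::Tensor>& images) {
        return get_inputs_embeds_with_token_type_ids(prompt, images).first;
    }
};
```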


@xipingyan (Contributor, Author) commented

@yangsu2022, could you please review this first?

1: Pass prompts_ids via a global variable.

Signed-off-by: xipingya <[email protected]>
No need to add a new interface; just reuse "token_type_ids".

Signed-off-by: xipingya <[email protected]>
Signed-off-by: xipingya <[email protected]>
2: Fix the candidate match bug, for example:
input_ids={2, 3, 1, 1, 2, 3, 4, 5, 6, 9, 2, 3, 1, 2, 3}
num_pred_tokens=3
max_ngram_size=3
return candidate: 2,3,1

Signed-off-by: xipingya <[email protected]>
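
For reference, here is a minimal sketch of the generic prompt-lookup candidate search that generate_candidates performs conceptually: take the trailing n-gram of the token sequence, look for an earlier occurrence of it, and propose the tokens that follow that occurrence as draft candidates. find_candidates is a hypothetical free function; the exact match-selection and candidate-padding rules in continuous_batching_for_prompt_lookup.cpp are what this commit adjusts, so its output for a given input may differ from this sketch.

```cpp
// Sketch only: the generic prompt-lookup n-gram search, not the exact code
// fixed in this PR.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

std::vector<int64_t> find_candidates(const std::vector<int64_t>& input_ids,
                                     size_t max_ngram_size,
                                     size_t num_pred_tokens) {
    const size_t n = input_ids.size();
    if (n < 2 || max_ngram_size == 0 || num_pred_tokens == 0) {
        return {};
    }
    // Try the longest trailing n-gram first, then progressively shorter ones.
    for (size_t ngram = std::min(max_ngram_size, n - 1); ngram > 0; --ngram) {
        const int64_t* tail = input_ids.data() + (n - ngram);
        // Scan earlier windows of the same length for a match
        // (the trailing window itself is excluded by the loop bound).
        for (size_t start = 0; start + ngram < n; ++start) {
            if (std::equal(tail, tail + ngram, input_ids.data() + start)) {
                // Propose the tokens that followed the matched window.
                const size_t cand_begin = start + ngram;
                const size_t cand_len = std::min(num_pred_tokens, n - cand_begin);
                return std::vector<int64_t>(input_ids.data() + cand_begin,
                                            input_ids.data() + cand_begin + cand_len);
            }
        }
    }
    return {};  // no earlier occurrence: nothing to speculate on this step
}
```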
@xipingyan requested a review from Copilot on September 11, 2025 05:38
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 3 comments.



Signed-off-by: xipingya <[email protected]>
@xipingyan requested a review from Copilot on September 11, 2025 06:43
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 4 comments.



@xipingyan requested a review from Copilot on September 11, 2025 06:56
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 1 comment.



@xipingyan requested a review from Copilot on September 11, 2025 07:03
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated no new comments.



@yangsu2022 (Collaborator) commented

Hi Xiping, it seems you are using token_type_ids to pass input_ids for PLD. LGTM.
Could you add or extend the pytest, referring to https://github.com/openvinotoolkit/openvino.genai/blob/master/tests/python_tests/samples/test_prompt_lookup_decoding_lm.py?

@yangsu2022 left a comment

Could you also add PLD support for Gemma3 VLM?

A Collaborator left a comment

Could you kindly explain why you modified this file?

Labels
category: continuous batching (Continuous batching), category: prompt lookup (Prompt look-up decoding), category: visual language (Visual language pipeline), no-match-files
3 participants