Enable VLM lookup. #2707

xipingyan · 2025-09-05T00:55:28Z

1: Move m_pipeline->generate_candidates(); from ContinuousBatchingPipeline::PromptLookupImpl::step() to ContinuousBatchingPipeline::ContinuousBatchingImpl::step()
2: Reuse interface std::optional<std::vector<ov::Tensor>> token_type_ids from https://github.com/openvinotoolkit/openvino.genai/pull/2340/files#diff-bb6bf907e40c83f4d6c912e886ccb8cd65a2129a3cd4a7a784612efcc5041cc8R112-R115

Tickets: CVS-172889

Copilot

Pull Request Overview

This PR enables VLM (Vision Language Model) lookup functionality by enhancing the continuous batching pipeline to support prompt lookup for embedding-based models. The changes introduce a new interface for providing prompt token IDs when using embedding inputs, enabling better integration between visual language models and prompt lookup decoding.

Move candidate generation from PromptLookupImpl::step() to ContinuousBatchingImpl::step()
Add new interface to pass prompt token IDs for embedding input models through get_inputs_embeds_with_token_type_ids method
Update all VLM model implementations to support the new token_type_ids interface
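
To illustrate the first change above, here is a minimal sketch of a base-class step that calls a virtual candidate hook with an empty default. The class names, the `trace` member, and the position of the hook inside `step()` are assumptions made for illustration; they are not taken from the PR.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a base pipeline; not the class from the PR.
class BasePipelineSketch {
public:
    virtual ~BasePipelineSketch() = default;

    // One iteration: run the candidate hook, then the (simulated) forward pass.
    // Calling the hook first is an assumption of this sketch.
    void step() {
        generate_candidates();
        trace.push_back("forward");
    }

    std::vector<std::string> trace;  // records call order, for illustration only

protected:
    // Empty default: a plain pipeline proposes no candidates.
    virtual void generate_candidates() {}
};

// Hypothetical stand-in for a prompt-lookup pipeline.
class PromptLookupPipelineSketch : public BasePipelineSketch {
protected:
    // Override point where candidate tokens would be proposed.
    void generate_candidates() override {
        trace.push_back("candidates");
    }
};
```

With this shape, callers only ever invoke `step()`; a pipeline that does not override the hook behaves exactly as before.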

Reviewed Changes

Copilot reviewed 25 out of 25 changed files in this pull request and generated 2 comments.

File	Description
src/cpp/src/visual_language/qwen2vl/classes.hpp	Add get_inputs_embeds_with_token_type_ids method declaration
src/cpp/src/visual_language/qwen2vl/classes.cpp	Implement new method and refactor existing get_inputs_embeds to use it
src/cpp/src/visual_language/phi4mm/classes.hpp	Add get_inputs_embeds_with_token_type_ids method declaration
src/cpp/src/visual_language/phi4mm/classes.cpp	Implement new method and refactor existing get_inputs_embeds to use it
src/cpp/src/visual_language/phi3_vision/classes.hpp	Add get_inputs_embeds_with_token_type_ids method declaration
src/cpp/src/visual_language/phi3_vision/classes.cpp	Implement new method and refactor existing get_inputs_embeds to use it
src/cpp/src/visual_language/minicpm/classes.hpp	Add get_inputs_embeds_with_token_type_ids method declaration
src/cpp/src/visual_language/minicpm/classes.cpp	Implement new method and refactor existing get_inputs_embeds to use it
src/cpp/src/visual_language/llava_next/classes.hpp	Add get_inputs_embeds_with_token_type_ids method declaration
src/cpp/src/visual_language/llava_next/classes.cpp	Implement new method and refactor existing get_inputs_embeds to use it
src/cpp/src/visual_language/llava/classes.hpp	Add get_inputs_embeds_with_token_type_ids method declaration
src/cpp/src/visual_language/llava/classes.cpp	Implement new method and refactor existing get_inputs_embeds to use it
src/cpp/src/visual_language/internvl_chat/classes.hpp	Add get_inputs_embeds_with_token_type_ids method declaration
src/cpp/src/visual_language/internvl_chat/classes.cpp	Implement new method and refactor existing get_inputs_embeds to use it
src/cpp/src/visual_language/inputs_embedder.hpp	Update constructors to accept prompt_lookup parameter and add prompt_lookup support
src/cpp/src/visual_language/inputs_embedder.cpp	Update has_token_type_ids method and constructors for prompt lookup support
src/cpp/src/sequence_group.hpp	Add handling for embeddings in remove_last_tokens method
src/cpp/src/prompt_lookup/prompt_lookup_impl.hpp	Add constructor for embedding-based models
src/cpp/src/prompt_lookup/prompt_lookup_impl.cpp	Support token_type_ids and remove candidate generation from step method
src/cpp/src/prompt_lookup/continuous_batching_for_prompt_lookup.hpp	Add constructor for embedding models and make generate_candidates virtual
src/cpp/src/prompt_lookup/continuous_batching_for_prompt_lookup.cpp	Fix loop variable type and add candidate padding logic
src/cpp/src/continuous_batching/pipeline_impl.hpp	Add virtual generate_candidates method declaration
src/cpp/src/continuous_batching/pipeline_impl.cpp	Move candidate generation to main pipeline step and add empty default implementation
src/cpp/src/continuous_batching/pipeline.cpp	Pass prompt_lookup flag to InputsEmbedder constructor
src/cpp/src/continuous_batching/model_runner.hpp	Add proper token_type_ids tensor existence check

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

src/cpp/src/prompt_lookup/continuous_batching_for_prompt_lookup.cpp

xipingyan · 2025-09-05T13:36:29Z

@yangsu2022, could you please review this first?

Copilot

Pull Request Overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 3 comments.

src/cpp/src/prompt_lookup/continuous_batching_for_prompt_lookup.cpp

src/cpp/src/continuous_batching/model_runner.hpp

Copilot

Pull Request Overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 4 comments.

src/cpp/src/prompt_lookup/continuous_batching_for_prompt_lookup.cpp

src/cpp/src/continuous_batching/model_runner.hpp

Copilot

Pull Request Overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 1 comment.

src/cpp/src/prompt_lookup/continuous_batching_for_prompt_lookup.cpp

Copilot

Pull Request Overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated no new comments.

yangsu2022 · 2025-09-12T02:29:37Z

Hi Xiping, it seems that you are using token_type_ids to pass input_ids for PLD. LGTM.
Could you add a pytest, or extend the existing one? For reference, see https://github.com/openvinotoolkit/openvino.genai/blob/master/tests/python_tests/samples/test_prompt_lookup_decoding_lm.py
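
The idea described in this comment, an optional side channel that carries the prompt token ids next to the embeddings, could be sketched roughly as follows. The struct, the function, and the fake embedding are all hypothetical; the real interface uses `std::optional<std::vector<ov::Tensor>>`, which is replaced here by a plain vector of ids so the sketch has no OpenVINO dependency.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical result type; not the PR's data structure.
struct EmbeddedPromptSketch {
    std::vector<float> embeds;                     // stand-in for an embeddings tensor
    std::optional<std::vector<int64_t>> side_ids;  // stand-in for the reused optional slot
};

// Hypothetical embedder: fills the optional slot only when prompt lookup is enabled.
EmbeddedPromptSketch embed_prompt_sketch(const std::vector<int64_t>& prompt_ids,
                                         bool prompt_lookup) {
    EmbeddedPromptSketch out;
    out.embeds.reserve(prompt_ids.size());
    for (int64_t id : prompt_ids) {
        // Fake one-float "embedding" per token, for illustration only.
        out.embeds.push_back(static_cast<float>(id) * 0.5f);
    }
    if (prompt_lookup) {
        out.side_ids = prompt_ids;  // candidate matching needs the raw ids
    }
    return out;
}
```

A consumer would check `side_ids.has_value()` before attempting any n-gram matching, and fall back to ordinary decoding otherwise.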

yangsu2022

Could you also add PLD support for Gemma3 VLM?

yangsu2022 · 2025-09-12T02:32:23Z

src/cpp/src/continuous_batching/model_runner.hpp

Could you kindly explain why you modified this file?

github-actions bot added the labels category: visual language, category: continuous batching, no-match-files, and category: prompt lookup on Sep 5, 2025

xipingyan added and then removed the do_not_merge and do_not_review labels on Sep 5, 2025

xipingyan marked this pull request as ready for review September 5, 2025 13:29

Copilot AI review requested due to automatic review settings September 5, 2025 13:29

Copilot AI reviewed Sep 5, 2025

src/cpp/src/prompt_lookup/continuous_batching_for_prompt_lookup.cpp (resolved)

src/cpp/src/prompt_lookup/continuous_batching_for_prompt_lookup.cpp (outdated, resolved)

xipingyan requested review from peterchen-intel, wangleis and yangsu2022 September 5, 2025 13:35

xipingyan added 3 commits September 11, 2025 10:12

Draft enable VLM lookup.

db9da07

1: global variable pass prompts_ids. Signed-off-by: xipingya <[email protected]>

Remove global variable pass prompt ids.

4a8901c

Don't need to add new interface, just reuse "token_type_ids". Signed-off-by: xipingya <[email protected]>

Update some comments.

bbb9de3

Signed-off-by: xipingya <[email protected]>

xipingyan force-pushed the xp/enable_vlm_lookup branch from 4c2fe5b to bbb9de3 on September 11, 2025 02:12

xipingyan added 2 commits September 11, 2025 12:39

1: fix potential issue: max_ngram_size < input_length;

eec1fa7

2: fix match bug, for example: input_ids={2, 3, 1, 1, 2, 3, 4, 5, 6, 9, 2, 3, 1, 2, 3} num_pred_tokens=3 max_ngram_size=3 return candidate: 2,3,1 Signed-off-by: xipingya <[email protected]>

avoiding potential signed/unsigned comparison issues

e96d490

Signed-off-by: xipingya <[email protected]>
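
The commits above touch n-gram matching for prompt lookup and the unsigned size arithmetic around it. The following is a minimal sketch of that kind of lookup, assuming the longest n-gram is tried first and the earliest earlier occurrence wins; the function name and these tie-breaking rules are assumptions, not the PR's implementation, and the sketch does not try to reproduce the candidate quoted in the commit message.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not the function from the PR.
std::vector<int64_t> lookup_candidates_sketch(const std::vector<int64_t>& input_ids,
                                              std::size_t num_pred_tokens,
                                              std::size_t max_ngram_size) {
    using diff_t = std::ptrdiff_t;
    const std::size_t n = input_ids.size();
    // Try the longest trailing n-gram first, then shorter ones.
    for (std::size_t ngram_size = std::min(max_ngram_size, n); ngram_size > 0; --ngram_size) {
        // An n-gram as long as the input has no earlier occurrence; skipping it
        // also keeps the unsigned index arithmetic below from wrapping around.
        if (ngram_size >= n) continue;
        const auto ngram_begin = input_ids.end() - static_cast<diff_t>(ngram_size);
        // Check every start position so a failed partial match never hides a
        // real match that begins one token later.
        for (std::size_t start = 0; start + ngram_size < n; ++start) {
            const auto window = input_ids.begin() + static_cast<diff_t>(start);
            if (!std::equal(ngram_begin, input_ids.end(), window)) continue;
            const std::size_t cont = start + ngram_size;  // first token after the match
            const std::size_t count = std::min(num_pred_tokens, n - cont);
            const auto first = input_ids.begin() + static_cast<diff_t>(cont);
            return std::vector<int64_t>(first, first + static_cast<diff_t>(count));
        }
    }
    return {};  // no earlier occurrence of any trailing n-gram
}
```

Under these assumptions, the input from the commit message (with num_pred_tokens=3 and max_ngram_size=3) matches the trailing 1, 2, 3 at positions 3 to 5 and yields the continuation 4, 5, 6. A different scan order or tie-breaking rule can produce a different candidate.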

xipingyan requested a review from Copilot September 11, 2025 05:38

Copilot AI reviewed Sep 11, 2025

src/cpp/src/prompt_lookup/continuous_batching_for_prompt_lookup.cpp (outdated, resolved)

src/cpp/src/prompt_lookup/continuous_batching_for_prompt_lookup.cpp (resolved)

src/cpp/src/continuous_batching/model_runner.hpp (resolved)

move to loop before.

3a7b3a5

Signed-off-by: xipingya <[email protected]>

xipingyan requested a review from Copilot September 11, 2025 06:43

Copilot AI reviewed Sep 11, 2025

don't update param variable.

f2fc501

Signed-off-by: xipingya <[email protected]>

xipingyan requested a review from Copilot September 11, 2025 06:56

Copilot AI reviewed Sep 11, 2025

src/cpp/src/prompt_lookup/continuous_batching_for_prompt_lookup.cpp (outdated, resolved)

static_cast for type convert.

5ee3df6

Signed-off-by: xipingya <[email protected]>

xipingyan requested a review from Copilot September 11, 2025 07:03

Copilot AI reviewed Sep 11, 2025

peterchen-intel requested a review from songbell September 12, 2025 01:37

Merge branch 'master' into xp/enable_vlm_lookup

432de9b

yangsu2022 requested changes Sep 12, 2025

xipingyan commented Sep 5, 2025 (edited by peterchen-intel)
