Skip to content

Conversation

pavel-esir
Copy link
Contributor

No description provided.

@pavel-esir pavel-esir added this to the 2025.3 milestone Aug 13, 2025
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR updates the tokenizer implementation to preserve selected properties from a whitelist instead of clearing all properties. The change allows specific OpenVINO hints and configuration options to be passed through to the tokenizer/detokenizer models on CPU.

  • Introduces a filter_properties method that maintains only whitelisted properties
  • Replaces the previous behavior of clearing all properties with selective filtering
  • Updates the comment to reflect the new filtering approach

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@github-actions github-actions bot added the category: tokenizers Tokenizer class or submodule update label Aug 13, 2025
properties = {};
// Filter properties by leaving only params from the allowlist
filter_properties(properties);
Copy link
Collaborator

@Wovchena Wovchena Aug 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This overload doesn't have a path so enable_save_ov_model should result in failure. I think you shouldn't filter properties

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't filter*

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't filter, and allow all properties to pass to 'CPU' then it fails because cpu plugin

2025-08-12T13:35:28.4948146Z E       Exception from src/inference/src/dev/plugin.cpp:53:
2025-08-12T13:35:28.4948893Z E       Exception from src/plugins/intel_cpu/src/config.cpp:457:
2025-08-12T13:35:28.4949480Z E       NotFound: Unsupported property NPUW_DEVICES by CPU plugin.
...
2025-08-12T13:35:28.4924214Z E           Exception from src/plugins/intel_cpu/src/config.cpp:457:
2025-08-12T13:35:28.4924878Z E           NotFound: Unsupported property prompt_lookup by CPU plugin.
...

and so on. We should leave only properties which core.compile_model(ov_[de]tokenizer, properties) can handle.

I also didn't quite understand bout saved model. At this point regardless of whether i filter or we don't have 'enable_save_ov_model' in properties. Could you please elaborate what did you mean?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's no enable_save_ov_model, what property leads to the exception?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least 3 arguments above NPUW_DEVICES, prompt_lookup, draft_model, probably some more

Copy link
Collaborator

@Wovchena Wovchena Aug 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that should do for now. But this is not a full list of valid properties. It will need to be converted to a black list instead of white list in the next PR because it's easier to resolve a problem reported as an error rather then just observe perf degradation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with you

@pavel-esir pavel-esir changed the title preserve properties from white list in Tokenizer preserve properties from allowlist in Tokenizer Aug 13, 2025
@Wovchena Wovchena enabled auto-merge August 13, 2025 10:32
@github-actions github-actions bot added the category: GHA CI based on Github actions label Aug 13, 2025
@Wovchena Wovchena added this pull request to the merge queue Aug 13, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 13, 2025
@Wovchena Wovchena added this pull request to the merge queue Aug 13, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 13, 2025
@Wovchena Wovchena added this pull request to the merge queue Aug 13, 2025
github-merge-queue bot pushed a commit that referenced this pull request Aug 13, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 13, 2025
@Wovchena Wovchena merged commit cf847ad into openvinotoolkit:master Aug 13, 2025
129 of 133 checks passed
@Wovchena
Copy link
Collaborator

Ticket 172236

@pavel-esir pavel-esir deleted the preserve_tok_properties branch August 13, 2025 21:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: GHA CI based on Github actions category: tokenizers Tokenizer class or submodule update Code Freeze
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants