What's Changed
- Bump product version to 2025.3 by @akladiev in #2255
- Implement SnapKV by @vshampor in #2067
- [WWB] Additional processing of native phi4mm by @nikita-savelyevv in #2276
- Update ov genai version in samples by @as-suvorov in #2275
- use chat templates in vlm by @eaidova in #2279
- Fix 'Unsupported property' failure when prompt_lookup is set to False by @sbalandi in #2240
- Force the PA implementation in the llm-bench by default by @sbalandi in #2271
- Update Whisper README.md as "--disable-stateful" is no longer required to export models for NPU by @luke-lin-vmc in #2249
- Removed 'slices' from EncodedImage by @popovaan in #2258
- support text embeddings in llm_bench by @eaidova in #2269
- [wwb]: load transformers model first, and only then use trust_remote_code by @eaidova in #2270
- [GHA] Coverity pipeline fixes by @mryzhov in #2283
- [GHA][DEV] Fixed coverity path creation by @mryzhov in #2285
- [GHA][DEV] Save Coverity tool to cache by @mryzhov in #2286
- [GHA][DEV] Set cache key for coverity tool by @mryzhov in #2288
- Image generation multiconcurrency (#2190) by @dkalinowski in #2284
- [GGUF] Support GGUF format for tokenizers and detokenizers by @rkazants in #2263
- Unskip whisper tests & update optimum-intel by @as-suvorov in #2247
- Update README.md with text-to-speech by @rkazants in #2294
- add new chat template for qwen3 by @eaidova in #2297
- [DOCS] Correct cmd-line for TTS conversion by @rkazants in #2303
- [GHA] Enabled product manifest.yml by @mryzhov in #2281
- Bump the npm_and_yarn group across 3 directories with 2 updates by @dependabot[bot] in #2309
- [GHA] Save artifacts to cloud share by @akladiev in #1943
- [GHA][COVERITY] Added manual trigger by @mryzhov in #2289
- [GHA] Fix missing condition for Extract Artifacts step by @akladiev in #2313
- [llm bench] Turn off PA backend for VLM by @sbalandi in #2312
- [llm_bench] Add setting of max_num_batched_tokens for SchedulerConfig by @sbalandi in #2316
- [GHA] Fix missing condition for LLM & VLM test by @sammysun0711 in #2326
- [Test] Skip gguf test on MacOS due to sporadic failure by @sammysun0711 in #2328
- [GGUF] support Qwen3 architecture by @TianmengChen in #2273
- [llm_bench] Increase max_num_batched_tokens to the largest positive integer by @sbalandi in #2327
- Bump aquasecurity/trivy-action from 0.30.0 to 0.31.0 by @dependabot[bot] in #2310
- Bump actions/download-artifact from 4.1.9 to 4.3.0 by @dependabot[bot] in #2315
- Fix system_message forwarding by @Wovchena in #2325
- Disabled cropping of the prompt for minicpmv by @andreyanufr in #2320
- [llm bench] Fix setting ATTENTION_BACKEND to plugin config in case of fallback to Optimum by @sbalandi in #2332
- Bump brace-expansion from 2.0.1 to 2.0.2 in /.github/actions/install_wheel in the npm_and_yarn group across 1 directory by @dependabot[bot] in #2338
- Fix multinomial sampling for PromptLookupDecoding by @sbalandi in #2331
- [llm bench] Avoid using parameters not supported by beam_search in the beam_search case by @sbalandi in #2336
- Update Export Requirements by @apaniukov in #2342
- [GGUF] Serialize Generated OV Model for Faster LLMPipeline Init by @sammysun0711 in #2218
- Fixed system message in chat mode. by @popovaan in #2343
- Bump librosa from 0.10.2.post1 to 0.11.0 in /samples by @dependabot[bot] in #2346
- [Test][GGUF] Add DeepSeek-R1-Distill-Qwen GGUF in CI by @sammysun0711 in #2329
- [llm_bench] Remove default scheduler config by @sbalandi in #2341
- master: add Phi-4-multimodal-instruct by @Wovchena in #2264
- Fix paths with unicode for tokenizers by @yatarkan in #2337
- [WWB] Add try-except block for processor loading by @nikita-savelyevv in #2352
- [WWB] Bring back eager attention implementation by default by @nikita-savelyevv in #2353
- fix supported models link in TTS samples by @eaidova in #2300
- StaticLLMPipeline: Add tests on caching by @smirnov-alexey in #1905
- [WWB] Fix loading the tokenizer for VLMs by @l-bat in #2351
- Pass Scheduler Config for VLM Pipeline in WhoWhatBenchmark. by @popovaan in #2318
- Remove misspelled CMAKE_CURRENT_SOUCE_DIR by @Wovchena in #2362
- Increase timeout for LLM & VLM by @Wovchena in #2359
- [llm bench] Fix setting ATTENTION_BACKEND to plugin config in case of fallback to Optimum for VLM by @sbalandi in #2361
- Support multiple images for VLM benchmarking in samples and llm_bench by @wgzintel in #2197
- CB: Hetero pipeline parallel support by @WeldonWangwang in #2227
- Update conversion instructions by @adrianboguszewski in #2287
- Merge stderr from failed samples by @Wovchena in #2156
- Revert cache folder by @Wovchena in #2372
- Update README in Node.js API by @almilosz in #2374
- [Docs] Rework home page by @yatarkan in #2368
- Align PromptLookupDecoding with greedy when dynamic_split_fuse works by @sbalandi in #2360
- Support collecting latency for transformers v4.52.0 by @wgzintel in #2373
- Bump diffusers from 0.33.1 to 0.34.0 in /samples by @dependabot[bot] in #2381
- Bump diffusers from 0.33.1 to 0.34.0 in /tests/python_tests by @dependabot[bot] in #2380
- Structured Output generation with XGrammar by @pavel-esir in #2295 (see the usage sketch after this list)
- Disable XGrammar on Android by @apaniukov in #2389
- [wwb] Take prompts from different categories for the default dataset for VLM by @sbalandi in #2349
- Fix for cloning NPU Image Generation pipelines (#2376) by @dkalinowski in #2393
- Set add_special_tokens=false for image tags in MiniCPM. by @popovaan in #2404
- Fix missing use cases for inpainting models and defining use case with relative path by @sbalandi in #2387
- Temporarily skip failing whisper tests by @pavel-esir in #2396
- Fix test_vlm_npu_no_exception by @AlexanderKalistratov in #2388
- Bump timm from 1.0.15 to 1.0.16 by @dependabot[bot] in #2390
- Optimize VisionEncoderQwen2VL::encode by @usstq in #2205
- Update calculation of TPOT for cases with unequal batch sizes by @sbalandi in #2377
- Bump pillow from 11.2.1 to 11.3.0 in /samples in the pip group by @dependabot[bot] in #2399
- Bump aquasecurity/trivy-action from 0.31.0 to 0.32.0 by @dependabot[bot] in #2405
- Fixed VLM failure on NPU by @intelgaoxiong in #2407
- Fix debug build for XGrammar on Windows by @pavel-esir in #2413
- Add extended perf metrics for speculative decoding by @sbalandi in #2321
- [JS] Implement TextEmbeddingPipeline in JS API by @Retribution98 in #2385
- Remove nightly test label by @Wovchena in #2408
- Remove NPU from lora by @Wovchena in #2418
- Add performance metrics for StructuredOutput generation by @pavel-esir in #2398
- Bump timm from 1.0.16 to 1.0.17 by @dependabot[bot] in #2420
- Install torchcodec by @Wovchena in #2422
- [CI] [GHA] Add other Python versions to build wheels and Python API by @akashchi in #2384
- [CI][GHA] Increase Python (Cacheopt E2E) timeout for manylinux_2_28 by @sammysun0711 in #2437
- Add attention mask for decoder whisper model by @eshiryae in #2018
- [Docs] Align supported models, remove `SUPPORTED_MODELS.md` by @yatarkan in #2435
- Implement sparse attention prefill by @vshampor in #2299
- Add C++ sample for Structured output generation by @pavel-esir in #2423
- Increase timeout by @Wovchena in #2443
- [llm_bench] Fix typos for 1st token info and names of memory monitoring phase by @sbalandi in #2444
- Cache get_pil_image_by_link() by @Wovchena in #2445
- LoRA lm_head and embed_tokens constants support by @likholat in #2395
- Fixed OV version compatibility by @mryzhov in #2442
- Increase whisper timeout by @Wovchena in #2448
- Temporarily disable failed ImageGeneration and LoRA tests by @likholat in #2450
- Fix SDL tests by @sammysun0711 in #2451
- benchmark_vlm: Fixed VLM failure on NPU by @JeevakaPrabu in #2446
- Add Structural Tags with XGrammar Backend by @apaniukov in #2411
- [Docs] Resolve dependabot issues by @yatarkan in #2447
- Bump the npm_and_yarn group across 1 directory with 2 updates by @dependabot[bot] in #2452
- [JS] Fix error when LLMPipeline.stream() is broken. by @Retribution98 in #2454
- [TTS] Correct perf metrics for the second and subsequent generations by @rkazants in #2463
- Revert "Optimize VisionEncoderQwen2VL::encode (#2205)" by @wgzintel in #2468
- [DOCS] Fix README.md for text generation by @rkazants in #2469
- Update pybind11 to 3.0.0 by @Wovchena in #2476
- LoRA for text2image by @andreyanufr in #2440
- Fix pybind warning by @vshampor in #2456
- Add lock mutex to `StructuredOutputController::get_times()` by @pavel-esir in #2459
- [llm_bench] Remove unused processor for genAI VLM pipeline by @sbalandi in #2460
- Skip NPU failing tests by @Wovchena in #2473
- [llm_bench] Add apply_chat_template for LLM by @sbalandi in #2475
- Fix Coverity by @Wovchena in #2458
- [GGUF] Revert GGUF workaround for GPU by @sammysun0711 in #2392
- Bump timm from 1.0.17 to 1.0.19 by @dependabot[bot] in #2482
- Suppress C4146 warning for xgrammar by @olpipi in #2478
- [VLM] Fix disabling `apply_chat_template` in generation config by @yatarkan in #2484
- [CVS-170914] Disable GHA pipelines in merge queue by @akladiev in #2485
- image_generation: fix argument order by @Wovchena in #2462
- [WIN] Set binaries details by @mryzhov in #2433
- Run finish_chat() before start_chat() in VLM pipeline. by @popovaan in #2470
- Increase VLM timeout by @Wovchena in #2472
- Support non string chat templates by @yatarkan in #2426
- Don't run SDL in merge queue by @Wovchena in #2487
- Add C API for WhisperPipeline by @BrandonWeng in #2414
- Test optimized generation on LongBench with trust_remote_code=true by @as-suvorov in #2492
- xfail image generation-samples by @Wovchena in #2493
- [GHA][CI] Separate GGUF Reader Test by @sammysun0711 in #2488
- wwb: no chat template for VLMs by @Wovchena in #2494
- Add text rerank pipeline by @as-suvorov in #2436 (see the embedding/rerank sketch after this list)
- Add TextEmbeddingPipeline documentation by @as-suvorov in #2474
- Add TTS to llm_bench by @sbalandi in #2438
- Fix generated tokens count in speculative decoding debug metrics by @sbalandi in #2498
- [llm_bench] Add possibility to find stablelm model by model_type by @sbalandi in #2504
- [CI][GHA] Regenerate ov cache in windows by @sammysun0711 in #2509
- [llm_bench] Update logging info for stable-diffusion pipeline by @sbalandi in #2517
- Added clarification regarding the use of memory consumption mode to the help message by @sbalandi in #2499
- Allow building GenAI with free-threaded Python by @p-wysocki in #2515
- Bump langchain-core from 0.3.69 to 0.3.72 in /tests/python_tests by @dependabot[bot] in #2510
- Log why optimum-cli fails by @Wovchena in #2511
- [CI][GHA] Remove nightly tag in test by @sammysun0711 in #2506
- [CI][GHA] Fix RAG sample pytest import error by @sammysun0711 in #2505
- [GPU][QWen2-VL][QWen2.5-VL] improve SDPA performance with cu_seqlens and cu_window_seqlens by @ceciliapeng2011 in #2330
- Enable back NPU import tests by @smirnov-alexey in #2503
- Override load time in wrapper by @Wovchena in #2527
- Add stderr check for retry_request by @as-suvorov in #2528
- LoRA bug fixes by @likholat in #2525
- Enable image generation tests by @likholat in #2531
- WWB Readme fix by @likholat in #2536
- Update tokenizers by @Wovchena in #2524
- Fix creating an image generation pipeline of one type from another type by @dkalinowski in #2530
- [NPU] Fix for whisper fp8 models by @eshiryae in #2539
- Switch to SDPA Qwen2VL and Qwen2.5VL by @popovaan in #2534
- Freeze dependencies by @Wovchena in #2535
- Log wwb errors by @Wovchena in #2546
- update xgrammar and openvino_tokenizers by @pavel-esir in #2465
- Bump actions/download-artifact from 4 to 5 by @dependabot[bot] in #2543
- Add long prompts to wwb for LLMPipeline by @sbalandi in #2547
- Update optimum-intel by @Wovchena in #2556
- Add link to BUILD.md by @Wovchena in #2559
- retry pip install by @Wovchena in #2560
- xfail image-inpainting by @Wovchena in #2562
- Utilize XAttention during prefill by @vshampor in #2489
- Add TextReranker documentation by @as-suvorov in #2500
- benchmark_genai: fix LLM failure on NPU by @Wovchena in #2542
- Set encoding by @Wovchena in #2564
- Bump actions/cache from 4.2.3 to 4.2.4 by @dependabot[bot] in #2574
- Fix AWQ by @Wovchena in #2577
- [OV JS] Implement PerfMetrics for the LLMPipeline by @almilosz in #2545
- Bump langchain-core from 0.3.72 to 0.3.74 in /tests/python_tests by @dependabot[bot] in #2567
- Add minicpmo by @Wovchena in #2579
- Include type into error by @Wovchena in #2580
- Fix GGUF labeling by @Wovchena in #2583
- Test C++ first by @Wovchena in #2584
- [NPU] Support dynamic LoRA by @intelgaoxiong in #2477
- optimum-intel 1.25.1 was just released by @Wovchena in #2582
- Enable test_sample_lora_text2image by @Wovchena in #2578
- LLMPipeline(NPU): Configuration section to fine-tune LM head model by @AsyaPronina in #2317
- Log only if there is error by @Wovchena in #2585
- Enable minja as chat template engine by @yatarkan in #2439
- Extend padding by @pavel-esir in #2512
- [NPUW] Disable chunking on NPU for VLM pipeline by @AlexanderKalistratov in #2595
- Enable gemma3-4b-it in VLM Pipeline by @yangsu2022 in #2340
- [JS] Implement getTokenizer in LLMPipeline by @Retribution98 in #2586
- [JS] Upgrade the js package versions to the upcoming releases by @Retribution98 in #2596
- Add GHA component for tests and samples dependencies by @as-suvorov in #2594
- Add jinja2 to requirements for chat template by @isanghao in #2597
- [CMAKE] Added an option to disable tests, samples and tools build by @mryzhov in #2561
- Bump safetensors from 0.5.3 to 0.6.2 in /samples by @as-suvorov in #2598
- Bump pybind11-stubgen from 2.5.4 to 2.5.5 by @as-suvorov in #2599
- Add 2nd input to Tokenizers transformation by @pavel-esir in #2457
- Bump torchvision from 0.17.2 to 0.23.0+cpu by @dependabot[bot] in #2570
- Fix SnapKV scoring by @vshampor in #2591
- Fix get_max_new_tokens() by @michalkulakowski in #2416
- Bump torchvision from 0.17.2 to 0.23.0+cpu in /tests/python_tests by @dependabot[bot] in #2566
- Bump actions/checkout from 4 to 5 by @dependabot[bot] in #2602
- preserve properties from allowlist in Tokenizer by @pavel-esir in #2604
- KVCrush method for cache eviction [Updated] by @gopikrishnajha in #2523
- Bump json5 from 0.12.0 to 0.12.1 by @dependabot[bot] in #2606
- Structured Output with Compound Grammar by @apaniukov in #2587
- Expose StructuredOutputConfig validation method by @mzegla in #2576
- [CMAKE][MERGE] Fix samples installation by @mryzhov in #2650
- [release] Upgrade transformers to 4.53.3 by @Wovchena in #2651
- [release] downgrade xgrammar version by @pavel-esir in #2664
- [release] Fix coverity for release by @as-suvorov in #2661
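
For illustration, a minimal sketch of the structured output feature listed above (#2295): it assumes the Python API exposes `StructuredOutputConfig` as in the release's samples, and the model directory is hypothetical.

```python
# Minimal sketch: constrain LLM output to a JSON schema via the XGrammar backend.
# The model path is hypothetical; export a chat model with optimum-cli first.
import json
import openvino_genai

pipe = openvino_genai.LLMPipeline("path/to/exported_model", "CPU")

config = openvino_genai.GenerationConfig()
config.max_new_tokens = 128
# The grammar backend enforces the schema token by token during decoding.
config.structured_output_config = openvino_genai.StructuredOutputConfig(
    json_schema=json.dumps({
        "type": "object",
        "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
        "required": ["name", "age"],
    })
)

print(pipe.generate("Describe a person as JSON.", config))
```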
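
Likewise, a hedged sketch of the new text embedding and text rerank pipelines (#2385, #2436, #2474); the model directories are hypothetical and the method signatures are assumptions based on the documentation added in this release.

```python
# Sketch of TextEmbeddingPipeline / TextRerankPipeline usage; paths are
# hypothetical and the exact signatures may differ from this release.
import openvino_genai

embedder = openvino_genai.TextEmbeddingPipeline("path/to/embedding_model", "CPU")
doc_vectors = embedder.embed_documents(["OpenVINO GenAI runs GenAI models locally."])
query_vector = embedder.embed_query("What runs GenAI models locally?")

reranker = openvino_genai.TextRerankPipeline("path/to/rerank_model", "CPU")
# rerank() is assumed to return (index, score) pairs ordered by relevance.
results = reranker.rerank(
    "What runs GenAI models locally?",
    ["OpenVINO GenAI runs GenAI models locally.", "Unrelated text."],
)
```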
New Contributors
- @adrianboguszewski made their first contribution in #2287
- @JeevakaPrabu made their first contribution in #2446
- @BrandonWeng made their first contribution in #2414
- @ceciliapeng2011 made their first contribution in #2330
Full Changelog: 2025.2.0.0...2025.3.0.0