Conversation

gopikrishnajha
Contributor

I am opening a new, updated PR for KVCrush because resolving the merge conflicts on the existing PR (#2211) proved difficult. Please treat this as the official PR and ignore the old one.

  • I have addressed all the comments except a few, for which I have added explanations in the old PR.
  • Documentation and an accuracy evaluation on LongBench are added here.
  • The KV cache budget is now expressed in blocks, not tokens.
  • For the comments in the older PR that needed clarification, I have replied there; I have marked the others as resolved after making the changes here.
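
To illustrate the switch from a token budget to a block budget, here is a minimal sketch of the conversion. The helper name `tokens_to_blocks` and the block size of 32 are assumptions made for illustration only; they are not taken from this PR or from the library's API.

```python
def tokens_to_blocks(token_budget: int, block_size: int) -> int:
    """Return the number of whole KV cache blocks needed to hold `token_budget` tokens.

    Hypothetical helper for illustration; not part of the PR.
    """
    if token_budget < 0:
        raise ValueError("token_budget must be non-negative")
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # Integer ceiling division: a partially filled block still occupies a block.
    return -(-token_budget // block_size)


# Assumed block size of 32 tokens, chosen only for the example.
budget_in_blocks = tokens_to_blocks(1000, 32)
```

Rounding up reflects that a partially filled block still counts against a block-based budget.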

@github-actions github-actions bot added the category: continuous batching (Continuous batching), category: Python API (Python API for GenAI), category: CPP API (Changes in GenAI C++ public headers), no-match-files, and category: GH Pages Docs (Github Pages documentation) labels Jul 31, 2025
@vshampor vshampor self-requested a review August 6, 2025 09:41
Contributor

@vshampor vshampor left a comment

Please make the CI checks green and we'll merge this. You will probably have to update your local version of pybind11-stubgen and regenerate the .pyi file (part of the build process anyway) to resolve some of the CI issues.

@vshampor vshampor added this pull request to the merge queue Aug 8, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to a conflict with the base branch Aug 8, 2025
Contributor

@l-bat l-bat left a comment

I don't want to slow down the merge, but the test should be fixed in a follow-up PR.

assert avg_optimization_ratio >= test_struct.avg_cache_usage_optimization_ratio


@pytest.mark.nightly
Contributor

Suggested change
- @pytest.mark.nightly
+ @pytest.mark.precommit

@Copilot Copilot AI review requested due to automatic review settings August 12, 2025 09:20
@github-actions github-actions bot added the category: GGUF (GGUF file reader) label Aug 12, 2025
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR implements the KVCrush method for cache eviction, an enhancement to the existing H2O/SnapKV cache eviction algorithms. KVCrush selects representative blocks from the evictable cache area using clustering analysis rather than simply evicting low-score blocks.

Key changes include:

  • Implementation of the KVCrush algorithm with configurable anchor point modes (RANDOM, ZEROS, ONES, MEAN, ALTERNATE)
  • Integration of KVCrush configuration into the existing CacheEvictionConfig system
  • Comprehensive test coverage including unit tests and performance evaluation on LongBench datasets
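
The idea of choosing representatives relative to an anchor point, rather than by score alone, can be sketched as follows. This is an illustrative approximation, not the code from this PR: the per-block binary indicator rows, the Hamming-distance grouping, the interpretation of each anchor mode, and every function name below are assumptions made for the example.

```python
import random
from enum import Enum


class AnchorMode(Enum):
    # Mode names come from the PR overview; their semantics here are assumed.
    RANDOM = "random"
    ZEROS = "zeros"
    ONES = "ones"
    MEAN = "mean"
    ALTERNATE = "alternate"


def make_anchor(mode, indicators, rng):
    """Build a binary anchor vector as wide as each indicator row."""
    width = len(indicators[0])
    if mode is AnchorMode.ZEROS:
        return [0] * width
    if mode is AnchorMode.ONES:
        return [1] * width
    if mode is AnchorMode.ALTERNATE:
        return [i % 2 for i in range(width)]
    if mode is AnchorMode.MEAN:
        # Majority vote per position across all rows (ties round up to 1).
        n = len(indicators)
        return [1 if 2 * sum(row[i] for row in indicators) >= n else 0
                for i in range(width)]
    return [rng.randint(0, 1) for _ in range(width)]


def hamming(a, b):
    """Count positions where two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def select_representatives(indicators, budget, mode=AnchorMode.ALTERNATE, seed=0):
    """Pick up to `budget` block indices spread across the distance range.

    Blocks are ordered by Hamming distance to the anchor, then sampled at
    evenly spaced positions so that each distance region is represented.
    """
    if budget <= 0 or not indicators:
        return []
    rng = random.Random(seed)
    anchor = make_anchor(mode, indicators, rng)
    order = sorted(range(len(indicators)),
                   key=lambda i: (hamming(indicators[i], anchor), i))
    if budget >= len(order):
        return sorted(order)
    step = len(order) / budget
    return sorted(order[int(k * step)] for k in range(budget))


# Four hypothetical blocks, each summarized by a 4-bit indicator row.
blocks = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 1], [1, 1, 1, 0]]
kept = select_representatives(blocks, budget=2, mode=AnchorMode.ZEROS)
```

Sampling evenly along the sorted distances is one simple way to keep dissimilar blocks; the actual grouping strategy in the PR may differ.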

Reviewed Changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| tools/continuous_batching/benchmark/continuous_batching_benchmark.cpp | Updates benchmark configuration to include new KVCrush parameters |
| tests/python_tests/test_kv_cache_eviction.py | Adds KVCrush vs SnapKV baseline comparison tests and new test configurations |
| tests/cpp/kvcrush.cpp | Comprehensive unit tests for KVCrush algorithm components |
| tests/cpp/cache_eviction.cpp | Updates existing cache eviction tests to support KVCrush configuration |
| src/python/py_continuous_batching_pipeline.cpp | Python bindings for KVCrush configuration classes |
| src/python/openvino_genai/py_openvino_genai.pyi | Type hints for new KVCrush Python API |
| src/python/openvino_genai/__init__.pyi | Export declarations for KVCrush classes |
| src/python/openvino_genai/__init__.py | Import statements for KVCrush classes |
| src/cpp/src/continuous_batching/kvcrush.hpp | Header file defining KVCrush algorithm interface |
| src/cpp/src/continuous_batching/kvcrush.cpp | Core KVCrush algorithm implementation |
| src/cpp/src/continuous_batching/cache_eviction.hpp | Integration of KVCrush into cache eviction system |
| src/cpp/src/continuous_batching/cache_eviction.cpp | Implementation of KVCrush integration logic |
| src/cpp/include/openvino/genai/cache_eviction.hpp | Public API definitions for KVCrush configuration |
| site/docs/concepts/optimization-techniques/kvcache-eviction-algorithm.md | Documentation and performance evaluation results |

@MaximProshin MaximProshin added this to the 2025.3 milestone Aug 12, 2025
@gopikrishnajha gopikrishnajha force-pushed the kvcrush_updated branch 7 times, most recently from c7d21f2 to 16dceec Compare August 13, 2025 07:52
@gopikrishnajha gopikrishnajha force-pushed the kvcrush_updated branch 2 times, most recently from 4add1b6 to 41ff328 Compare August 13, 2025 09:48
@Wovchena Wovchena enabled auto-merge August 13, 2025 20:45
@Wovchena Wovchena added this pull request to the merge queue Aug 14, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 14, 2025
@gopikrishnajha gopikrishnajha added this pull request to the merge queue Aug 14, 2025
Merged via the queue into master with commit 8aa1243 Aug 14, 2025
127 of 132 checks passed
@Wovchena Wovchena deleted the kvcrush_updated branch August 15, 2025 16:37
Labels
category: continuous batching (Continuous batching), category: CPP API (Changes in GenAI C++ public headers), category: GGUF (GGUF file reader), category: GH Pages Docs (Github Pages documentation), category: Python API (Python API for GenAI), Code Freeze, no-match-files
5 participants