KVCrush method for cache eviction [Updated] #2523
32737fe to e19e4ad
Please make the CI checks green and we'll merge this. You will probably have to update your local version of `pybind11-stubgen` and regenerate the `.pyi` file (part of the build process anyway) to resolve some of the CI issues.
e19e4ad to 6c7a5ad
I don't want to slow down the merge, but the test should be fixed in the following PR.
assert avg_optimization_ratio >= test_struct.avg_cache_usage_optimization_ratio

@pytest.mark.nightly
Suggested change: replace `@pytest.mark.nightly` with `@pytest.mark.precommit`.
6c7a5ad to e17211a
Pull Request Overview
This PR implements the KVCrush method for cache eviction, an enhancement to the existing H2O/SnapKV cache eviction algorithms. KVCrush selects representative blocks from the evictable cache area using clustering analysis rather than simply evicting low-score blocks.
Key changes include:
- Implementation of the KVCrush algorithm with configurable anchor point modes (RANDOM, ZEROS, ONES, MEAN, ALTERNATE)
- Integration of KVCrush configuration into the existing CacheEvictionConfig system
- Comprehensive test coverage including unit tests and performance evaluation on LongBench datasets
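To make the idea of picking representatives around an anchor point more concrete, here is a minimal Python sketch. It is not the code from this PR. It assumes each evictable block has already been summarized as a row of 0/1 indicators, builds an anchor row according to one of the modes named above, ranks blocks by Hamming distance to that anchor, and keeps blocks spread evenly across the ranking. The names `AnchorMode`, `make_anchor`, `hamming`, and `select_representatives`, the 0.5 threshold for the mean mode, and the even-spacing rule are all hypothetical choices made for illustration.

```python
import random
from enum import Enum


class AnchorMode(Enum):
    # Mode names follow the PR overview; the behavior below is an assumption.
    RANDOM = "random"
    ZEROS = "zeros"
    ONES = "ones"
    MEAN = "mean"
    ALTERNATE = "alternate"


def make_anchor(mode, indicators, rng=None):
    """Build a 0/1 anchor row with the same width as the indicator rows."""
    width = len(indicators[0])
    if mode is AnchorMode.ZEROS:
        return [0] * width
    if mode is AnchorMode.ONES:
        return [1] * width
    if mode is AnchorMode.ALTERNATE:
        return [i % 2 for i in range(width)]
    if mode is AnchorMode.MEAN:
        # Column-wise majority vote (assumed threshold: mean >= 0.5).
        return [1 if 2 * sum(col) >= len(indicators) else 0
                for col in zip(*indicators)]
    rng = rng or random.Random(0)
    return [rng.randint(0, 1) for _ in range(width)]


def hamming(a, b):
    """Number of positions where two equal-length rows differ."""
    return sum(x != y for x, y in zip(a, b))


def select_representatives(indicators, budget, mode, rng=None):
    """Return indices of `budget` blocks spread across anchor distances."""
    if budget <= 0 or not indicators:
        return []
    if budget >= len(indicators):
        return list(range(len(indicators)))
    anchor = make_anchor(mode, indicators, rng)
    # Rank blocks by distance to the anchor; ties keep original order.
    order = sorted(range(len(indicators)),
                   key=lambda i: (hamming(indicators[i], anchor), i))
    step = len(order) / budget
    return sorted(order[int(k * step)] for k in range(budget))


blocks = [[0, 0, 0, 0], [1, 1, 1, 1], [1, 0, 0, 0], [1, 1, 0, 0]]
kept = select_representatives(blocks, budget=2, mode=AnchorMode.ZEROS)
```

Spreading the picks across the distance ranking, rather than taking only the nearest blocks, is what keeps dissimilar blocks represented in this sketch; the actual selection rule in the PR may differ.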
Reviewed Changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tools/continuous_batching/benchmark/continuous_batching_benchmark.cpp | Updates benchmark configuration to include new KVCrush parameters |
| tests/python_tests/test_kv_cache_eviction.py | Adds KVCrush vs SnapKV baseline comparison tests and new test configurations |
| tests/cpp/kvcrush.cpp | Comprehensive unit tests for KVCrush algorithm components |
| tests/cpp/cache_eviction.cpp | Updates existing cache eviction tests to support KVCrush configuration |
| src/python/py_continuous_batching_pipeline.cpp | Python bindings for KVCrush configuration classes |
| src/python/openvino_genai/py_openvino_genai.pyi | Type hints for new KVCrush Python API |
| src/python/openvino_genai/__init__.pyi | Export declarations for KVCrush classes |
| src/python/openvino_genai/__init__.py | Import statements for KVCrush classes |
| src/cpp/src/continuous_batching/kvcrush.hpp | Header file defining KVCrush algorithm interface |
| src/cpp/src/continuous_batching/kvcrush.cpp | Core KVCrush algorithm implementation |
| src/cpp/src/continuous_batching/cache_eviction.hpp | Integration of KVCrush into cache eviction system |
| src/cpp/src/continuous_batching/cache_eviction.cpp | Implementation of KVCrush integration logic |
| src/cpp/include/openvino/genai/cache_eviction.hpp | Public API definitions for KVCrush configuration |
| site/docs/concepts/optimization-techniques/kvcache-eviction-algorithm.md | Documentation and performance evaluation results |
c7d21f2 to 16dceec
4add1b6 to 41ff328
Creating a new, updated PR for KVCrush because I was having a tough time resolving merge conflicts on the existing PR (#2211). Please treat this as the official PR and ignore the old one.