KVCrush method for cache eviction [Updated] (#2523)
Creating a new, updated PR for KVCrush because I was having a tough time
resolving merge conflicts on the existing PR
(#2211). Please
consider this the official PR and ignore the old one.
- I have addressed ALL the comments, apart from a few for which I have
added an explanation in the old PR.
- Documentation and accuracy evaluation on LongBench are added
[here](https://github.com/openvinotoolkit/openvino.genai/blob/kvcrush_updated/site/docs/concepts/optimization-techniques/kvcache-eviction-algorithm.md).
- KV cache budget is now expressed in blocks, not tokens.
- For all the comments in the older PR where I have clarifications to
make, I have added them as my comment, and have marked the others as
resolved (after making changes here).
Co-authored-by: Vladimir Zlobin <[email protected]>
site/docs/concepts/optimization-techniques/kvcache-eviction-algorithm.md (37 additions & 0 deletions)
@@ -60,3 +60,40 @@ It can be enabled by setting the `CacheEvictionConfig.apply_rotation` field to `
* Cache rotation is only targeted for the regular, linear LLaMa-like RoPE application and may degrade accuracy on models that use other RoPE schemes.
* Cache rotation is currently only supported for the models with uniform V embedding sizes across the layers.
## (Optional) KVCrush
KVCrush enhances the standard H2O/SnapKV eviction by using clustering analysis to select the most representative blocks from the evictable area, rather than simply evicting the lowest-scoring blocks.
### Algorithm Overview
1. **Indicator Creation**: Generate binary indicators for tokens based on importance scores
2. **Anchor Point Generation**: Create reference patterns using configurable modes
3. **Distance Calculation**: Measure Hamming distance between block patterns and the anchor point
4. **Representative Selection**: Select blocks to best represent context diversity
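
The four steps above can be illustrated with a minimal, self-contained Python sketch. It is not the OpenVINO GenAI implementation: every function name here is hypothetical, and three details are assumptions made only for illustration (tokens in the top half by score get indicator 1, the MEAN-style anchor is a per-position majority vote across blocks, and representatives are picked at evenly spaced positions in the distance-sorted order).

```python
from typing import List


def make_indicators(scores: List[float]) -> List[int]:
    """Step 1 (assumed rule): indicator is 1 for tokens in the top half by score."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = set(ranked[: len(scores) // 2])
    return [1 if i in top else 0 for i in range(len(scores))]


def split_blocks(indicators: List[int], block_size: int) -> List[List[int]]:
    """Group per-token indicators into one bit pattern per block."""
    assert len(indicators) % block_size == 0, "sketch expects whole blocks only"
    return [indicators[i:i + block_size]
            for i in range(0, len(indicators), block_size)]


def mean_anchor(blocks: List[List[int]]) -> List[int]:
    """Step 2 (assumed MEAN-style mode): strict majority vote per position."""
    n = len(blocks)
    return [1 if 2 * sum(b[p] for b in blocks) > n else 0
            for p in range(len(blocks[0]))]


def hamming(a: List[int], b: List[int]) -> int:
    """Step 3: number of positions where two bit patterns differ."""
    return sum(x != y for x, y in zip(a, b))


def select_representatives(scores: List[float], block_size: int,
                           budget: int) -> List[int]:
    """Step 4 (assumed rule): sort blocks by distance to the anchor and keep
    `budget` blocks at evenly spaced positions in that order."""
    blocks = split_blocks(make_indicators(scores), block_size)
    anchor = mean_anchor(blocks)
    distances = [hamming(b, anchor) for b in blocks]
    order = sorted(range(len(blocks)), key=lambda i: (distances[i], i))
    budget = min(budget, len(order))
    picked = [order[k * len(order) // budget] for k in range(budget)]
    return sorted(picked)


if __name__ == "__main__":
    # Eight evictable tokens, block size 2, budget of 2 blocks.
    scores = [0.9, 0.8, 0.1, 0.2, 0.7, 0.05, 0.3, 0.6]
    print(select_representatives(scores, block_size=2, budget=2))
```

Spreading the picks across the distance-sorted order, instead of taking only the blocks nearest to the anchor, is what lets a small budget cover dissimilar block patterns in this sketch.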
### Configuration
Set up the KVCrush config parameters and pass them to `CacheEvictionConfig`. The following sample code allocates KVCrush a budget of 2 blocks and uses the MEAN anchor mode.