@KuntaiDu KuntaiDu commented Aug 26, 2025

[Core][Hybrid allocator + connector] Support hybrid allocator + kv cache connector

Checklist at the bottom is considered.

Purpose

This PR adds support for the code path that combines the hybrid allocator with a KV cache connector.

Design doc: link

Related to #23079
Solves #22292

Test Plan

The local correctness test passed. I will follow up with instructions so that others can reproduce it.

Core test logic:

        # `llm` and `print_output` are set up earlier in the test script (not shown).
        first_prompt = "Hello, how are you?" * 5000 + "Hello, my name is"
        # Shares a long prefix with first_prompt.
        second_prompt = [
            "Hello, how are you?" * 1000 + "Tell me a very long story",
        ]
        sampling_params = SamplingParams(temperature=0, top_p=0.95, max_tokens=10)
        print_output(llm, [first_prompt], sampling_params, "first")
        # Two more long requests with different prefixes, so that the first
        # request's KV cache gets evicted.
        print_output(llm, ["1" + first_prompt], sampling_params, "second")
        print_output(llm, ["2" + first_prompt], sampling_params, "second")

        # Now the first request is evicted. Run this request.
        # It will trigger KV cache loading from LMCache.
        print_output(llm, second_prompt, sampling_params, "third")
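The snippet above calls a `print_output` helper that the PR description does not show. A minimal sketch of what such a helper might look like, reconstructed only from the log format under Test Result, is given here. The signature and body are assumptions, not the author's code. It relies on `llm.generate(prompts, sampling_params)` returning objects whose first completion is available as `.outputs[0].text`, as vLLM's `LLM.generate` does.

```python
import time
from typing import Any, List


def print_output(llm: Any, prompts: List[str], sampling_params: Any,
                 req_str: str) -> List[str]:
    """Hypothetical helper: run one batch, print the text and wall-clock time."""
    start = time.time()
    # Assumed interface: generate() returns one result object per prompt.
    outputs = llm.generate(prompts, sampling_params)
    elapsed = time.time() - start

    texts = [out.outputs[0].text for out in outputs]
    print("-" * 50)
    for text in texts:
        print(f"Generated text: {text!r}")
    print(f"Generation took {elapsed:.2f} seconds, {req_str} request done.")
    print("-" * 50)
    return texts
```

Returning the generated texts (an addition for convenience) makes it easy to compare the output of the third request against a run without the connector when checking correctness.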

Test Result

For the last request:

[2025-08-26 05:36:37,718] LMCache INFO: Reqid: 3, Total tokens 6007, LMCache hit tokens: 5888, need to load: 5888 (vllm_v1_adapter.py:1091:lmcache.integration.vllm.vllm_v1_adapter)
[2025-08-26 05:36:37,820] LMCache INFO: Retrieved 5888 tokens (vllm_v1_adapter.py:822:lmcache.integration.vllm.vllm_v1_adapter)
Processed prompts: 100%|██████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.26it/s, est. speed input: 7594.87 toks/s, output: 12.64 toks/s]
--------------------------------------------------
Generated text: '.\n\nOkay, here we go...\n\nOnce'
Generation took 0.80 seconds, third request done.
--------------------------------------------------
[2025-08-26 05:36:44,886] LMCache INFO: Storage manager closed. (storage_manager.py:472:lmcache.v1.storage_backend.storage_manager)
[2025-08-26 05:36:48,332] LMCache INFO: LMCacheEngine closed. (cache_engine.py:965:lmcache.v1.cache_engine)
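One way to read the numbers in the log above: the hit count (5888) is lower than the total token count (6007), which is what you would expect if the cache is matched in fixed-size chunks. The sketch below assumes a chunk size of 256 tokens. That value is an assumption, since the PR does not state the configured chunk size, and `chunk_aligned_hit` is a hypothetical name used only for illustration.

```python
def chunk_aligned_hit(matched_tokens: int, chunk_size: int = 256) -> int:
    """Round a matched prefix length down to a whole number of chunks."""
    return (matched_tokens // chunk_size) * chunk_size


total_tokens = 6007  # "Total tokens" from the log
hit_tokens = chunk_aligned_hit(total_tokens)  # assumed chunk size: 256
remaining = total_tokens - hit_tokens  # tokens still to be computed

print(hit_tokens, remaining)  # 5888 119
```

Under this assumption, any matched prefix between 5888 and 6143 tokens rounds down to 5888 (23 chunks of 256), which leaves 119 tokens of the third request to be computed rather than loaded.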

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: KuntaiDu <[email protected]>
…o GPU memory, the inference results are wrong. Fix this first.


mergify bot commented Aug 26, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @KuntaiDu.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Aug 26, 2025
@mergify mergify bot removed the needs-rebase label Aug 26, 2025

Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.

@heheda12345

Will take a deeper look later.


mergify bot commented Sep 6, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @KuntaiDu.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Sep 6, 2025
KuntaiDu and others added 3 commits September 8, 2025 14:10
@KuntaiDu KuntaiDu requested a review from zhuohan123 as a code owner September 8, 2025 21:53
@mergify mergify bot added the tpu Related to Google TPUs label Sep 8, 2025
@mergify mergify bot removed the needs-rebase label Sep 8, 2025

@heheda12345 heheda12345 left a comment


Some random ideas to discuss.
