[V1] [P/D] Refactor KV Connector Path #21980
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add 🚀 …
Code Review
This pull request introduces a valuable refactoring by encapsulating the KV connector lifecycle within a context manager in GPUModelRunner. This significantly improves code clarity and maintainability. The consolidation of KV-related fields into a single kv_connector_output in ModelRunnerOutput and IntermediateTensors is also a welcome change that enhances readability.

However, a critical issue has been introduced in the TPUModelRunner: the refactoring was not applied to it, and it now calls methods that have been removed from KVConnectorModelRunnerMixin, which will cause runtime failures. This needs to be addressed before merging.
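For context, the pattern the review describes can be sketched roughly as follows. This is an illustrative mock, not vLLM's actual code: the class and method names (kv_connector_step, start_load_kv, wait_for_save, get_finished) are assumptions standing in for the real connector API, and the point is only how a context manager scopes the connector lifecycle to one model-execution step.

```python
from contextlib import contextmanager

class KVConnectorOutput:
    """Consolidated per-step output (field names are illustrative)."""
    def __init__(self, finished_sending=None, finished_recving=None):
        self.finished_sending = finished_sending
        self.finished_recving = finished_recving

class FakeConnector:
    """Stand-in for a real KV connector implementation."""
    def start_load_kv(self): ...
    def wait_for_save(self): ...
    def get_finished(self):
        return {"req-1"}, set()

@contextmanager
def kv_connector_step(connector):
    # On entry: kick off KV-cache loading before the forward pass.
    connector.start_load_kv()
    output = KVConnectorOutput()
    try:
        yield output
    finally:
        # On exit: wait for saves and collect finished transfer ids,
        # so callers see one consolidated output per scheduling step.
        connector.wait_for_save()
        sending, recving = connector.get_finished()
        output.finished_sending = sending
        output.finished_recving = recving

with kv_connector_step(FakeConnector()) as kv_out:
    pass  # the model forward pass would run here
print(kv_out.finished_sending)  # → {'req-1'}
```

Scoping the lifecycle this way means the runner body never calls connector methods directly; entry and exit bracket every execution path, including exceptions raised during the forward pass.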
Hey thanks for your work!
I am just wondering: why can't we have KVConnector.get_finished return a KVConnectorOutput? That would make for easier extensibility as we need to move more stuff from workers to the executor.
Thanks @NickLucche! That's a great question. Shaping the output returned from the connector into a general structure is still a work in progress. To avoid locking in a premature design, I believe it's best to construct …
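To make the trade-off being discussed concrete, here is a hedged sketch of the two interface shapes: a connector whose get_finished returns a bare tuple that the worker must repackage, versus one that returns a structured KVConnectorOutput directly. All names and fields here are illustrative assumptions, not vLLM's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class KVConnectorOutput:
    finished_sending: set = field(default_factory=set)
    finished_recving: set = field(default_factory=set)

class TupleStyleConnector:
    # Current style: the caller repackages the tuple into a structure.
    def get_finished(self):
        return {"req-1"}, {"req-2"}

class StructStyleConnector:
    # Proposed style: new fields can be added to KVConnectorOutput
    # without changing every worker call site.
    def get_finished(self) -> KVConnectorOutput:
        return KVConnectorOutput({"req-1"}, {"req-2"})

sending, recving = TupleStyleConnector().get_finished()
tuple_style = KVConnectorOutput(sending, recving)    # caller repackages
struct_style = StructStyleConnector().get_finished() # already structured
print(tuple_style == struct_style)  # → True
```

The struct-returning variant is easier to extend but freezes the output type into the connector interface, which is exactly the premature-design concern raised in the reply above.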
Thanks @sdavidbd, this looks great to me.
I also feel it would make sense to return KVConnectorOutput from get_finished(), but since there's not yet agreement on that, we could get this merged ASAP and handle it as a follow-on.
Thanks @sdavidbd! We can continue discussion in follow-on PRs
I think calling an "additional API" brings the complexity of keeping atomicity? If you get … Regarding "third-party impl.": if this is referring to the implementations inside the vllm/ repo, I guess we just change all of them at once? Nevertheless, I'm OK with merging this PR. At least in a follow-up PR, we won't need to touch …
We have to fix tests
There are more failing tests in the V1 tests; please fix them. The rest should be fixed if you merge from main.
Force-pushed 5d3ceee to 373df31
@DarkLight1337 Failed checks appear to be caused by known issues unrelated to this PR:
… connector path Signed-off-by: David Ben-David <[email protected]>
Force-pushed 373df31 to 25f2873
Thanks, @lk-chen! A significant part of this PR is the introduction of the KV connector context manager, which manages the connector lifecycle over a single model execution (i.e., a scheduling step). This provides a natural boundary for atomicity, between … Regarding third-party implementations, I was referring to out-of-tree connectors that are dynamically loaded via …
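The "dynamically loaded out-of-tree connectors" mentioned here presumably resolve a connector class from a module path at runtime, so vLLM never imports third-party code directly. A minimal sketch of that mechanism, with illustrative names (load_connector_class is hypothetical, and a stdlib class stands in for a real connector):

```python
import importlib

def load_connector_class(module_path: str, class_name: str):
    # Resolve the class lazily so out-of-tree code is only imported
    # when a deployment actually configures it.
    module = importlib.import_module(module_path)
    return getattr(module, class_name)

# A deployment might configure something like:
#   module_path="my_company.kv_connectors", class_name="MyKVConnector"
# Here we demonstrate the mechanism with a stdlib class instead.
cls = load_connector_class("collections", "OrderedDict")
print(cls.__name__)  # → OrderedDict
```

Because such connectors live outside the repo, their call sites cannot be updated "all at once" the way in-tree implementations can, which is why interface changes to the connector API need more care.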
Signed-off-by: David Ben-David <[email protected]> Co-authored-by: David Ben-David <[email protected]>
Essential Elements of an Effective PR Description Checklist
- Update supported_models.md and examples for a new model.

Purpose

This PR refactors the KV connector integration in GPUModelRunner.execute_model by introducing a context manager that encapsulates the lifecycle of the KV connector. This clarifies the execution flow and improves modularity.

Additionally, this PR simplifies IntermediateTensors and ModelRunnerOutput by consolidating multiple ad-hoc KV-related fields into a single kv_connector_output field of type KVConnectorOutput, which improves readability and maintainability.

Test Plan
Run all existing tests.
Test Result
All tests pass.
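The field consolidation described in the Purpose section can be sketched as a before/after comparison. Field names here are illustrative guesses, not vLLM's exact dataclass definitions:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ModelRunnerOutputBefore:
    # Ad-hoc KV fields scattered alongside unrelated output fields.
    sampled_token_ids: list = field(default_factory=list)
    finished_sending: Optional[set] = None
    finished_recving: Optional[set] = None

@dataclass
class KVConnectorOutput:
    finished_sending: Optional[set] = None
    finished_recving: Optional[set] = None

@dataclass
class ModelRunnerOutputAfter:
    sampled_token_ids: list = field(default_factory=list)
    # One field groups everything the KV connector produced this step.
    kv_connector_output: Optional[KVConnectorOutput] = None

out = ModelRunnerOutputAfter(
    sampled_token_ids=[[7]],
    kv_connector_output=KVConnectorOutput(finished_sending={"req-1"}),
)
print(out.kv_connector_output.finished_sending)  # → {'req-1'}
```

With the single field, adding a new piece of connector state touches only KVConnectorOutput rather than every structure and call site that carries model-runner output.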
(Optional) Documentation Update