[V1] Support DP with Ray #18779
Overall, looks good. There are some comments that need to be addressed and some follow ups that are not blockers.
Thanks @ruisearch42 this looks pretty clean
Signed-off-by: Rui Qiao <[email protected]>
Co-authored-by: Nick Hill <[email protected]> Signed-off-by: Rui Qiao <[email protected]>
Thanks @ruisearch42
Curious why all of the new test dependencies are needed; I thought various other tests already used Ray.
This PR adds support for data parallelism (DP) with Ray, including multi-node deployments and API server scale-out.
We reuse the ZMQ communication mechanism between the frontend and the engine cores, as in #15977, and the same API server scale-out mechanism, as in #17546.
Main differences from those PRs:
Examples
This will run DP=4 on the head node.
This will run DP=4 with DP ranks 0 and 1 on the head node and ranks 2 and 3 on other nodes.
This will run DP=4 with only the API server on the head node and all engines on other nodes:
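The three layouts above differ only in how many of the four DP ranks stay on the head node (4, 2, or 0). As a rough illustration, the split can be sketched in plain Python. The names `DPPlacement` and `split_dp_ranks` are hypothetical and are not part of vLLM or Ray. The sketch also assumes the head node takes the lowest-numbered ranks, as in the second example.

```python
from typing import NamedTuple


class DPPlacement(NamedTuple):
    """Hypothetical container: which DP ranks run on the head node vs. elsewhere."""
    head_ranks: list[int]
    remote_ranks: list[int]


def split_dp_ranks(dp_size: int, dp_size_local: int) -> DPPlacement:
    """Assign the lowest `dp_size_local` ranks to the head node, the rest to other nodes.

    This is an illustrative assumption, not the PR's placement logic.
    """
    if dp_size < 1:
        raise ValueError("dp_size must be at least 1")
    if not 0 <= dp_size_local <= dp_size:
        raise ValueError("dp_size_local must be between 0 and dp_size")
    ranks = list(range(dp_size))
    return DPPlacement(ranks[:dp_size_local], ranks[dp_size_local:])


if __name__ == "__main__":
    # The three example layouts: all ranks local, half local, none local.
    for local in (4, 2, 0):
        print(local, split_dp_ranks(4, local))
```

With `dp_size_local=0` the head node hosts no engines, which corresponds to the case where only the API server runs there.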
Design
See the following illustration. The DP Coordinator is omitted from it, but it works the same way as in #17546.