[v1] torchrun compatibility #13642
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small and essential subset of CI tests runs automatically to quickly catch errors. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge. 🚀
@@ -47,6 +47,9 @@ def test_consistent_across_ranks(obj):
        llm.llm_engine.vllm_config.cache_config.num_cpu_blocks)
    test_consistent_across_ranks(
        llm.llm_engine.vllm_config.cache_config.num_gpu_blocks)
    params = list(llm.llm_engine.model_executor.driver_worker.worker.model_runner.
                  model.parameters())
This is to test whether we can directly access the model with `llm.llm_engine.model_executor.driver_worker.worker.model_runner.model`. It is used in https://github.com/volcengine/verl/blob/0a1b16f800c25ac80504038fd8b8be4282d6c606/verl/workers/sharding_manager/fsdp_vllm.py#L84
Maybe worth a comment?
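For context, a minimal sketch of what this test exercises (the model name and the print at the end are illustrative assumptions, not the PR's exact test code):

```python
# Hedged sketch: direct access to the in-process model, which RLHF
# frameworks like verl rely on to sync trainer weights into the
# inference engine without re-initializing it.
from vllm import LLM

llm = LLM(model="facebook/opt-125m", enforce_eager=True)

# The attribute chain this test pins down:
model = llm.llm_engine.model_executor.driver_worker.worker.model_runner.model

# With a handle on the nn.Module, a trainer can inspect or overwrite
# parameters in place, e.g. via model.named_parameters().
params = list(model.parameters())
print(f"got {len(params)} parameter tensors")
```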
Yes, this should cause deterministic scheduling. Separately, do you think we can switch from an ENV variable to an EngineArg?
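For reference, a minimal sketch of the current ENV-variable approach (the EngineArg alternative discussed above would replace the `os.environ` line; the model name is an illustrative choice):

```python
# Hedged sketch: keeping the v1 engine in the calling process by disabling
# multiprocessing. The variable must be set before the engine is created.
import os

os.environ["VLLM_ENABLE_V1_MULTIPROCESSING"] = "0"

from vllm import LLM

# The engine core now runs in this process, so external frameworks can
# reach its internals, and scheduling is easier to make deterministic.
llm = LLM(model="facebook/opt-125m")
```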
@@ -567,6 +567,10 @@ def init_worker(self, all_kwargs: List[Dict[str, Any]]) -> None:
        self.worker = worker_class(**kwargs)
        assert self.worker is not None

    def initialize_from_config(self, kv_cache_configs: List[Any]) -> None:
        kv_cache_config = kv_cache_configs[self.rpc_rank]
@ruisearch42 @comaniac FYI: if a method needs to send different arguments to different ranks, the indexing should use `self.rpc_rank`, and it should happen in this `WorkerWrapperBase`.
I don't have a strong opinion here.
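A minimal sketch of the pattern described above (the class and the downstream call are illustrative, not vLLM's exact code):

```python
# Hedged sketch: when a collective RPC carries one argument per rank,
# the wrapper picks its own slice by self.rpc_rank, so callers can
# broadcast one identical RPC to every rank.
from typing import Any, List


class WorkerWrapperSketch:
    def __init__(self, rpc_rank: int):
        # rpc_rank: this wrapper's index among all RPC targets.
        self.rpc_rank = rpc_rank
        self.worker = None  # set later by init_worker

    def initialize_from_config(self, kv_cache_configs: List[Any]) -> None:
        # Each rank receives the full per-rank list and indexes into it.
        kv_cache_config = kv_cache_configs[self.rpc_rank]
        self.worker.initialize_from_config(kv_cache_config)
```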
@@ -151,7 +152,7 @@ def execute_model(
        scheduler_output: "SchedulerOutput",
    ) -> Optional[ModelRunnerOutput]:
        output = self.model_runner.execute_model(scheduler_output)
        return output if self.rank == 0 else None
@WoosukKwon we need a base class for the workers so that we can deduplicate this code lol
Right now I just changed both of them, but we should do the unification in the future.
Got it. Filed #13711 to track the issue.
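A minimal sketch of what such a unification could look like (names are illustrative assumptions, not vLLM's actual class hierarchy):

```python
# Hedged sketch: a common worker base class holding the rank-0 output
# logic that is currently duplicated across worker implementations.
from abc import ABC, abstractmethod
from typing import Any, Optional


class BaseWorkerSketch(ABC):
    def __init__(self, rank: int):
        self.rank = rank

    @abstractmethod
    def run_model(self, scheduler_output: Any) -> Any:
        """Backend-specific forward pass."""

    def execute_model(self, scheduler_output: Any) -> Optional[Any]:
        output = self.run_model(scheduler_output)
        # Shared policy: only the driver (rank 0) returns output, so the
        # executor does not gather duplicate results from other ranks.
        return output if self.rank == 0 else None
```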
LGTM! Thanks for the PR! Only left minor comments.
This is a continuation of #12071 in v1.

Some changes to notice:
- `VLLM_ENABLE_V1_MULTIPROCESSING` is disabled, so that the engine lives in the same process as the `LLM` class, which is required by the RLHF framework https://github.com/volcengine/verl. This also reduces scheduling non-determinism. (cc @robertgshaw2-redhat to confirm: in this case, can we guarantee that all calls of `llm.generate` will produce the same scheduling decision?)
- Execution under an external launcher such as torchrun goes through `ExecutorWithExternalLauncher`.
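To illustrate the intended usage, here is a minimal sketch in the spirit of vLLM's torchrun example (the model choice, prompts, and sampling settings are illustrative); every rank runs the same script, launched with `torchrun --nproc-per-node=2 script.py`:

```python
# Hedged sketch: SPMD-style offline inference under torchrun. All ranks
# construct the same LLM; the external_launcher backend reuses the
# torchrun-initialized distributed environment instead of spawning workers.
from vllm import LLM, SamplingParams

llm = LLM(
    model="facebook/opt-125m",
    tensor_parallel_size=2,
    distributed_executor_backend="external_launcher",
)

outputs = llm.generate(
    ["Hello, my name is", "The capital of France is"],
    SamplingParams(temperature=0.0, max_tokens=32),
)

# Every rank receives the same outputs, so downstream (e.g. RLHF) code
# can run identically on all ranks.
for out in outputs:
    print(out.outputs[0].text)
```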