Conversation

@minosfuture (Contributor) commented on Aug 26, 2025

Purpose

This PR fixes the assertion error below: when selecting the DeepEP config, the rank count should be the dispatcher count (the EP count), not the DP count.

(VllmWorker TP0 pid=55411) ERROR 08-25 21:36:19 [multiproc_executor.py:602]   File "/data/users/yming/gitrepos/vllm/vllm/model_executor/layers/quantization/fp8.py", line 1122, in apply
(VllmWorker TP0 pid=55411) ERROR 08-25 21:36:19 [multiproc_executor.py:602]     return self.fused_experts(**common_kwargs)
(VllmWorker TP0 pid=55411) ERROR 08-25 21:36:19 [multiproc_executor.py:602]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker TP0 pid=55411) ERROR 08-25 21:36:19 [multiproc_executor.py:602]   File "/home/yming/uv_env/vllm/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
(VllmWorker TP0 pid=55411) ERROR 08-25 21:36:19 [multiproc_executor.py:602]     return self._call_impl(*args, **kwargs)
(VllmWorker TP0 pid=55411) ERROR 08-25 21:36:19 [multiproc_executor.py:602]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker TP0 pid=55411) ERROR 08-25 21:36:19 [multiproc_executor.py:602]   File "/home/yming/uv_env/vllm/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
(VllmWorker TP0 pid=55411) ERROR 08-25 21:36:19 [multiproc_executor.py:602]     return forward_call(*args, **kwargs)
(VllmWorker TP0 pid=55411) ERROR 08-25 21:36:19 [multiproc_executor.py:602]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker TP0 pid=55411) ERROR 08-25 21:36:19 [multiproc_executor.py:602]   File "/data/users/yming/gitrepos/vllm/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 789, in forward
(VllmWorker TP0 pid=55411) ERROR 08-25 21:36:19 [multiproc_executor.py:602]     self.prepare_finalize.finalize(
(VllmWorker TP0 pid=55411) ERROR 08-25 21:36:19 [multiproc_executor.py:602]   File "/data/users/yming/gitrepos/vllm/vllm/model_executor/layers/fused_moe/deepep_ht_prepare_finalize.py", line 219, in finalize
(VllmWorker TP0 pid=55411) ERROR 08-25 21:36:19 [multiproc_executor.py:602]     combined_x, _, event = self.buffer.combine(
(VllmWorker TP0 pid=55411) ERROR 08-25 21:36:19 [multiproc_executor.py:602]                            ^^^^^^^^^^^^^^^^^^^^
(VllmWorker TP0 pid=55411) ERROR 08-25 21:36:19 [multiproc_executor.py:602]   File "/data/users/yming/gitrepos/vllm/ep_kernels_workspace/DeepEP/deep_ep/buffer.py", line 418, in combine
(VllmWorker TP0 pid=55411) ERROR 08-25 21:36:19 [multiproc_executor.py:602]     return self.internode_combine(x, handle, topk_weights, bias, config, previous_event, async_finish, allocate_on_comm_stream)
(VllmWorker TP0 pid=55411) ERROR 08-25 21:36:19 [multiproc_executor.py:602]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker TP0 pid=55411) ERROR 08-25 21:36:19 [multiproc_executor.py:602]   File "/data/users/yming/gitrepos/vllm/ep_kernels_workspace/DeepEP/deep_ep/buffer.py", line 504, in internode_combine
(VllmWorker TP0 pid=55411) ERROR 08-25 21:36:19 [multiproc_executor.py:602]     combined_x, combined_topk_weights, event = self.runtime.internode_combine(
(VllmWorker TP0 pid=55411) ERROR 08-25 21:36:19 [multiproc_executor.py:602]                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker TP0 pid=55411) ERROR 08-25 21:36:19 [multiproc_executor.py:602] RuntimeError: Failed: Assertion error /data/users/yming/gitrepos/vllm/ep_kernels_workspace/DeepEP/csrc/kernels/internode.cu:1854 'num_max_rdma_chunked_send_tokens >= num_warps_per_forwarder'
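The assertion fires inside DeepEP's internode combine kernel when the config has been selected for too few ranks. A minimal sketch of the counting fix (the helper name is illustrative, not the actual vLLM code; EP = DP × TP holds for the deployment tested here):

```python
# Hypothetical illustration of the fix: DeepEP configs must be sized by the
# number of dispatcher (EP) ranks, not by the data-parallel size alone.

def num_dispatcher_ranks(dp_size: int, tp_size: int) -> int:
    # With expert parallelism spanning all workers in this setup,
    # the EP rank count is dp_size * tp_size.
    return dp_size * tp_size

# The DP4TP4EP16 configuration from the test plan below:
# passing the DP count (4) under-sizes the config; the EP count is 16.
assert num_dispatcher_ranks(4, 4) == 16
```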

Test Plan

Test with DP4TP4EP16 (DP=4, TP=4, EP=16).

Test Result

The server now runs successfully; serving benchmark results:

============ Serving Benchmark Result ============
Successful requests:                     4096
Maximum request concurrency:             2048
Benchmark duration (s):                  881.86
Total input tokens:                      8373709
Total generated tokens:                  4194304
Request throughput (req/s):              4.64
Output token throughput (tok/s):         4756.23
Total Token throughput (tok/s):          14251.78
---------------Time to First Token----------------
Mean TTFT (ms):                          85167.84
Median TTFT (ms):                        59799.83
P99 TTFT (ms):                           259874.64
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          298.17
Median TPOT (ms):                        282.56
P99 TPOT (ms):                           512.27
---------------Inter-token Latency----------------
Mean ITL (ms):                           298.17
Median ITL (ms):                         153.81
P99 ITL (ms):                            1207.53
==================================================

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@gemini-code-assist (bot) left a comment


Code Review

This pull request fixes an assertion error in DeepEP by using the dispatcher count instead of the data-parallel size for the combine configuration. The change appears correct based on the issue description. However, I've identified a critical issue where the check for supported rank configurations remains inconsistent with the new value, which could lead to a runtime crash. I've provided a suggestion to correct this.

@robertgshaw2-redhat (Collaborator) commented

cc @tlrmchlsmth

@tlrmchlsmth (Collaborator) left a comment

Do we need to change _get_dispatch_config too?

@minosfuture (Contributor, Author) commented

@tlrmchlsmth I didn't hit the assertion failure for dispatch. But yea, let me update that and test.

@minosfuture force-pushed the fix_deepep_combine_config branch from b59b7e0 to f91bc9c on September 8, 2025 02:01
@minosfuture changed the title from "[Bugfix] Fix DeepEP combine config for DP4TP4" to "[Bugfix] Fix DeepEP config for DP4TP4" on Sep 8, 2025
@tlrmchlsmth added the "ready" label (ONLY add when PR is ready to merge / full CI is needed) on September 9, 2025
@minosfuture force-pushed the fix_deepep_combine_config branch from f91bc9c to 805017e on September 9, 2025 20:32
@tlrmchlsmth enabled auto-merge (squash) on September 10, 2025 17:11
@tlrmchlsmth merged commit 4032949 into vllm-project:main on Sep 10, 2025
42 checks passed