Conversation

@minosfuture (Contributor) commented on Aug 26, 2025

Purpose

This PR fixes the assertion error below: when selecting the DeepEP config, the rank count should be the dispatcher count (the EP count), not the DP count.

(VllmWorker TP0 pid=55411) ERROR 08-25 21:36:19 [multiproc_executor.py:602]   File "/data/users/yming/gitrepos/vllm/vllm/model_executor/layers/quantization/fp8.py", line 1122, in apply
(VllmWorker TP0 pid=55411) ERROR 08-25 21:36:19 [multiproc_executor.py:602]     return self.fused_experts(**common_kwargs)
(VllmWorker TP0 pid=55411) ERROR 08-25 21:36:19 [multiproc_executor.py:602]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker TP0 pid=55411) ERROR 08-25 21:36:19 [multiproc_executor.py:602]   File "/home/yming/uv_env/vllm/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
(VllmWorker TP0 pid=55411) ERROR 08-25 21:36:19 [multiproc_executor.py:602]     return self._call_impl(*args, **kwargs)
(VllmWorker TP0 pid=55411) ERROR 08-25 21:36:19 [multiproc_executor.py:602]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker TP0 pid=55411) ERROR 08-25 21:36:19 [multiproc_executor.py:602]   File "/home/yming/uv_env/vllm/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
(VllmWorker TP0 pid=55411) ERROR 08-25 21:36:19 [multiproc_executor.py:602]     return forward_call(*args, **kwargs)
(VllmWorker TP0 pid=55411) ERROR 08-25 21:36:19 [multiproc_executor.py:602]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker TP0 pid=55411) ERROR 08-25 21:36:19 [multiproc_executor.py:602]   File "/data/users/yming/gitrepos/vllm/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 789, in forward
(VllmWorker TP0 pid=55411) ERROR 08-25 21:36:19 [multiproc_executor.py:602]     self.prepare_finalize.finalize(
(VllmWorker TP0 pid=55411) ERROR 08-25 21:36:19 [multiproc_executor.py:602]   File "/data/users/yming/gitrepos/vllm/vllm/model_executor/layers/fused_moe/deepep_ht_prepare_finalize.py", line 219, in finalize
(VllmWorker TP0 pid=55411) ERROR 08-25 21:36:19 [multiproc_executor.py:602]     combined_x, _, event = self.buffer.combine(
(VllmWorker TP0 pid=55411) ERROR 08-25 21:36:19 [multiproc_executor.py:602]                            ^^^^^^^^^^^^^^^^^^^^
(VllmWorker TP0 pid=55411) ERROR 08-25 21:36:19 [multiproc_executor.py:602]   File "/data/users/yming/gitrepos/vllm/ep_kernels_workspace/DeepEP/deep_ep/buffer.py", line 418, in combine
(VllmWorker TP0 pid=55411) ERROR 08-25 21:36:19 [multiproc_executor.py:602]     return self.internode_combine(x, handle, topk_weights, bias, config, previous_event, async_finish, allocate_on_comm_stream)
(VllmWorker TP0 pid=55411) ERROR 08-25 21:36:19 [multiproc_executor.py:602]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker TP0 pid=55411) ERROR 08-25 21:36:19 [multiproc_executor.py:602]   File "/data/users/yming/gitrepos/vllm/ep_kernels_workspace/DeepEP/deep_ep/buffer.py", line 504, in internode_combine
(VllmWorker TP0 pid=55411) ERROR 08-25 21:36:19 [multiproc_executor.py:602]     combined_x, combined_topk_weights, event = self.runtime.internode_combine(
(VllmWorker TP0 pid=55411) ERROR 08-25 21:36:19 [multiproc_executor.py:602]                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker TP0 pid=55411) ERROR 08-25 21:36:19 [multiproc_executor.py:602] RuntimeError: Failed: Assertion error /data/users/yming/gitrepos/vllm/ep_kernels_workspace/DeepEP/csrc/kernels/internode.cu:1854 'num_max_rdma_chunked_send_tokens >= num_warps_per_forwarder'
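The assertion fires inside DeepEP's internode combine kernel when the config has been selected for too few ranks. A minimal sketch of the counting fix (the helper name is illustrative, not the actual vLLM code; EP = DP × TP holds for the deployment tested here):

```python
# Hypothetical illustration of the fix: DeepEP configs must be sized by the
# number of dispatcher (EP) ranks, not by the data-parallel size alone.

def num_dispatcher_ranks(dp_size: int, tp_size: int) -> int:
    # With expert parallelism spanning all workers in this setup,
    # the EP rank count is dp_size * tp_size.
    return dp_size * tp_size

# The DP4TP4EP16 configuration from the test plan below:
# passing the DP count (4) under-sizes the config; the EP count is 16.
assert num_dispatcher_ranks(4, 4) == 16
```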

Test Plan

Test with DP4TP4EP16 (DP=4, TP=4, EP=16).

Test Result

The server now runs successfully; serving benchmark results:

============ Serving Benchmark Result ============
Successful requests:                     4096
Maximum request concurrency:             2048
Benchmark duration (s):                  881.86
Total input tokens:                      8373709
Total generated tokens:                  4194304
Request throughput (req/s):              4.64
Output token throughput (tok/s):         4756.23
Total Token throughput (tok/s):          14251.78
---------------Time to First Token----------------
Mean TTFT (ms):                          85167.84
Median TTFT (ms):                        59799.83
P99 TTFT (ms):                           259874.64
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          298.17
Median TPOT (ms):                        282.56
P99 TPOT (ms):                           512.27
---------------Inter-token Latency----------------
Mean ITL (ms):                           298.17
Median ITL (ms):                         153.81
P99 ITL (ms):                            1207.53
==================================================

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@gemini-code-assist (bot) left a comment


Code Review

This pull request fixes an assertion error in DeepEP by using the dispatcher count instead of the data-parallel size for the combine configuration. The change appears correct based on the issue description. However, I've identified a critical issue where the check for supported rank configurations remains inconsistent with the new value, which could lead to a runtime crash. I've provided a suggestion to correct this.

@robertgshaw2-redhat (Collaborator) commented

cc @tlrmchlsmth

@tlrmchlsmth (Collaborator) left a comment

Do we need to change _get_dispatch_config too?

@minosfuture (Contributor, Author) commented

@tlrmchlsmth I didn't hit the assertion failure for dispatch. But yea, let me update that and test.

@minosfuture force-pushed the fix_deepep_combine_config branch from b59b7e0 to f91bc9c on September 8, 2025 02:01
@minosfuture changed the title from "[Bugfix] Fix DeepEP combine config for DP4TP4" to "[Bugfix] Fix DeepEP config for DP4TP4" on Sep 8, 2025
@tlrmchlsmth added the "ready" label (ONLY add when PR is ready to merge / full CI is needed) on September 9, 2025
@minosfuture force-pushed the fix_deepep_combine_config branch from f91bc9c to 805017e on September 9, 2025 20:32
@tlrmchlsmth enabled auto-merge (squash) on September 10, 2025 17:11
@tlrmchlsmth merged commit 4032949 into vllm-project:main on Sep 10, 2025
42 checks passed