[bugfix] fix shared expert dp with hybrid kvcache #2964
Conversation
Signed-off-by: linfeng-yuan <[email protected]>
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request addresses a bug in handling attention metadata for shared expert data parallelism with hybrid KV cache. The proposed change correctly retrieves attention metadata from a dictionary. However, the implementation uses a hardcoded key for layer 0, which is incorrect for other layers and can lead to critical errors in multi-layer models. I've suggested a fix to dynamically construct the key using the current layer's index, ensuring the correct metadata is always used.
attn_metadata = get_forward_context().attn_metadata
if attn_metadata is not None and isinstance(attn_metadata, dict):
    attn_metadata = attn_metadata['model.layers.0.self_attn.attn']
Using a hardcoded key 'model.layers.0.self_attn.attn' to access attention metadata is incorrect. This will fetch metadata for layer 0 regardless of the current layer being processed, which can lead to erroneous behavior, especially in multi-layer models. The key should be constructed dynamically using the current layer's index (self.layer_idx) to ensure the correct metadata is used.
Suggested change:
- attn_metadata = attn_metadata['model.layers.0.self_attn.attn']
+ attn_metadata = attn_metadata[f"model.layers.{self.layer_idx}.self_attn.attn"]
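A minimal, self-contained sketch of the per-layer lookup this suggestion describes (not vLLM-Ascend code; the helper name, the toy metadata dict, and the layer count are illustrative assumptions):

# Sketch: the forward context stores one attn_metadata entry per decoder layer,
# keyed by that layer's attention module name, so the lookup key must include
# the current layer index instead of a hardcoded layer 0.
def select_layer_metadata(attn_metadata, layer_idx):
    # Mirrors the suggested fix: index the dict with the current layer's key.
    if attn_metadata is not None and isinstance(attn_metadata, dict):
        return attn_metadata[f"model.layers.{layer_idx}.self_attn.attn"]
    return attn_metadata

# Toy forward-context dict with one (fake) metadata entry per decoder layer.
toy_metadata = {f"model.layers.{i}.self_attn.attn": {"layer": i} for i in range(3)}

assert select_layer_metadata(toy_metadata, 2) == {"layer": 2}
# A hardcoded 'model.layers.0.self_attn.attn' lookup would return {"layer": 0} here.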
Please update the commit message to explain why the e2e test passed. Should we update the e2e test as well?
I've updated the commit message and plan to add this ST before this weekend.
What this PR does / why we need it?
#2849 moves the implementation of shared_expert_dp to torchair deepseek_modeling. However, the call to set_forward_context with enforce_eager and shared_expert_dp falls back to the implementation in model_runner_v1.py, which sets the global attn_metadata as a dictionary. This leads to a RuntimeError when attn_metadata is retrieved from the forward context and used in torchair_deepseek_v2.py; a minimal sketch of this mismatch follows below. This PR fixes the problem by transforming attn_metadata in this file.
Note that the current E2E tests lack a deepseek case with shared_expert_dp. We need to add an ST with shared_expert_dp to the testing workflow.
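A minimal sketch of the mismatch described above (illustrative only, not the actual vLLM-Ascend code; the class, field, and error type here are assumptions, and the real failure surfaces as a RuntimeError in torchair_deepseek_v2.py):

from dataclasses import dataclass

# Stand-in for a single-layer attention metadata object with a field the
# model code reads directly.
@dataclass
class FakeAttnMetadata:
    num_actual_tokens: int

def read_metadata(attn_metadata):
    # Mirrors model code written against a single metadata object.
    return attn_metadata.num_actual_tokens

single = FakeAttnMetadata(num_actual_tokens=8)
per_layer_dict = {"model.layers.0.self_attn.attn": single}

print(read_metadata(single))       # works: 8
try:
    read_metadata(per_layer_dict)  # fails: the dict has no such attribute
except AttributeError as err:
    print(f"dict-shaped attn_metadata breaks attribute access: {err}")

Selecting the current layer's entry from the dict before use (as in the review suggestion above) restores the single-object shape the model code expects.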
Does this PR introduce any user-facing change?
No.
How was this patch tested?
E2E vLLM serving with enable_shared_expert_dp: true passed.