
Conversation

zhangsicheng5
Collaborator

Merge main to 79a910e

1Fire4 and others added 10 commits September 17, 2025 12:00
### What this PR does / why we need it?
Add an option to enable frozen parameters.

### How was this patch tested?

- vLLM version: v0.10.2
- vLLM main:
vllm-project/vllm@68dbde5

Signed-off-by: 1Fire4 <[email protected]>
### What this PR does / why we need it?
vllm-project#2849 moves the
implementation of `shared_expert_dp` to torchair deepseek_modeling.
However, calling `set_forward_context` with `enforce_eager` and
`shared_expert_dp` falls back to the implementation in
model_runner_v1.py, which sets the global attn_metadata as a dictionary.
This leads to a RuntimeError when attn_metadata is retrieved from the
forward context and used in torchair_deepseek_v2.py. This PR fixes the
problem by transforming attn_metadata in this file, as sketched below.
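A minimal sketch of the idea, assuming vLLM's public `get_forward_context` API; the helper name and the layer-name plumbing are illustrative, not the PR's exact code:

```python
from vllm.forward_context import get_forward_context

def _resolve_attn_metadata(layer_name: str):
    """Normalize attn_metadata before it is used in torchair_deepseek_v2.py."""
    attn_metadata = get_forward_context().attn_metadata
    # Under shared_expert_dp with enforce_eager, model_runner_v1.py stores a
    # dict keyed by attention layer name; pick out this layer's entry so the
    # torchair code sees a single metadata object instead of the dict.
    if isinstance(attn_metadata, dict):
        attn_metadata = attn_metadata.get(layer_name)
    return attn_metadata
```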

Note that the current E2E tests lack a case for DeepSeek with
`shared_expert_dp`. We need to add an ST with `shared_expert_dp` to the
testing workflow.

### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
e2e vllm serving with `enable_shared_expert_dp: true` passed.

- vLLM version: v0.10.2
- vLLM main:
vllm-project/vllm@de3e53a

Signed-off-by: linfeng-yuan <[email protected]>
When `enable_kv_nz` is true, the output of DeepSeek R1 is invalid.

- vLLM version: v0.10.2
- vLLM main:
vllm-project/vllm@2b85697

Signed-off-by: realliujiaxu <[email protected]>
### What this PR does / why we need it?
Added a new connector for Mooncake store integration to enable KV-cache
reuse in scenarios with system prompts or multi-turn dialogues.
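A conceptual sketch of the reuse idea, not the connector's actual code (block size and hashing scheme are assumptions): KV blocks are keyed by a rolling hash of the token prefix, so a request that shares a system prompt or earlier dialogue turns can look up cached blocks in the store instead of recomputing them.

```python
import hashlib

def prefix_block_keys(token_ids: list[int], block_size: int = 128) -> list[str]:
    """One key per full block; each key covers the entire prefix up to it."""
    keys: list[str] = []
    running = hashlib.sha256()
    for start in range(0, len(token_ids) - len(token_ids) % block_size, block_size):
        running.update(str(token_ids[start:start + block_size]).encode())
        keys.append(running.hexdigest())
    return keys

# Two dialogues sharing a 256-token system prompt produce identical leading
# keys, so the store can serve those blocks from cache.
sys_prompt = list(range(256))
a = prefix_block_keys(sys_prompt + [1, 2, 3] + [0] * 125)
b = prefix_block_keys(sys_prompt + [9, 9, 9] + [0] * 125)
assert a[:2] == b[:2] and a[2] != b[2]
```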

### How was this patch tested?


- vLLM version: v0.10.2
- vLLM main:
vllm-project/vllm@5963b98

---------

Signed-off-by: LCAIZJ <[email protected]>
Signed-off-by: fems14 <[email protected]>
Co-authored-by: fems14 <[email protected]>
Co-authored-by: Dreamerleader <[email protected]>
Co-authored-by: Pz1116 <[email protected]>
Co-authored-by: lizy124 <[email protected]>
Co-authored-by: zouyida2052 <[email protected]>
### What this PR does / why we need it?
This PR depends on the merge of vllm-project#2707 and adapts the aclgraph
functionality to support MTP.

### How was this patch tested?


- vLLM version: v0.10.2
- vLLM main:
vllm-project/vllm@2b85697

---------

Signed-off-by: xuyexiong <[email protected]>
(vllm-project#2901)

### What this PR does / why we need it?
[Bugfix]: replace npu_incre_flash_attention with
npu_fused_infer_attention_score so that the tiling can be updated.
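An illustrative before/after of the substitution, assuming an Ascend NPU environment with torch_npu available; the head count, scale, and layout below are placeholders rather than the PR's exact arguments:

```python
import torch
import torch_npu  # requires an Ascend NPU environment

q = torch.randn(4, 1, 1024, dtype=torch.float16, device="npu")    # BSH layout
k = torch.randn(4, 256, 1024, dtype=torch.float16, device="npu")
v = torch.randn(4, 256, 1024, dtype=torch.float16, device="npu")

# Before: torch_npu.npu_incre_flash_attention(q, k, v, ...), whose tiling
# cannot be refreshed between invocations.
# After: the fused kernel, which supports tiling updates.
out, _ = torch_npu.npu_fused_infer_attention_score(
    q, k, v, num_heads=8, input_layout="BSH", scale=1.0 / 128 ** 0.5)
```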

### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?


- vLLM version: v0.10.2
- vLLM main:
vllm-project/vllm@2b85697

Signed-off-by: p00465316 <[email protected]>
Co-authored-by: p00465316 <[email protected]>
### What this PR does / why we need it?
The current linear.py has the following issues:

- There is redundant conditional logic in the `comm_group` and `forward`
selection for classes such as `AscendMergedColumnParallelLinear`.

- Inconsistent comm_group selection logic exists among
`AscendMergedColumnParallelLinear`, `AscendColumnParallelLinear`, and
`AscendQKVParallelLinear`.

To address these two issues, this PR encapsulates `comm_group` and
`forward` into classes and extracts the class-selection logic into
common functions. For future additions of custom communication groups or
forward methods, it will only be necessary to extend
`CustomColumnParallelOp` or `CustomRowParallelOp` and add the new
selection logic, as in the sketch below.
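A minimal sketch of the pattern under stated assumptions (every name except `CustomColumnParallelOp` is hypothetical): each op class bundles a comm group with its matching forward, and one shared selector replaces the per-class conditionals.

```python
from abc import ABC, abstractmethod

class CustomColumnParallelOp(ABC):
    """Pairs a communication group with the forward method that uses it."""
    comm_group: str

    @abstractmethod
    def forward(self, x):
        ...

class TPColumnParallelOp(CustomColumnParallelOp):
    comm_group = "tp"
    def forward(self, x):
        return x  # placeholder: a real op gathers output across the TP group

class DPColumnParallelOp(CustomColumnParallelOp):
    comm_group = "dp"
    def forward(self, x):
        return x  # placeholder

def select_column_parallel_op(prefix: str) -> CustomColumnParallelOp:
    # One selection function shared by AscendMergedColumnParallelLinear,
    # AscendColumnParallelLinear, and AscendQKVParallelLinear; a new op
    # class only needs a new branch here.
    if "shared_experts" in prefix:
        return DPColumnParallelOp()
    return TPColumnParallelOp()
```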

### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?


- vLLM version: v0.10.2
- vLLM main:
vllm-project/vllm@dd39baf

---------

Signed-off-by: realliujiaxu <[email protected]>
Co-authored-by: weijinqian0 <[email protected]>
### What this PR does / why we need it?
Add multi-node ray backend tutorial for Qwen235B-A3B

### How was this patch tested?

- vLLM version: v0.10.2
- vLLM main:
vllm-project/vllm@f4cd80f

---------

Signed-off-by: wangli <[email protected]>
(vllm-project#2681)

This PR fixes two problems that occur when `multistream_moe` is enabled
in torchair graph mode:
1. check for `TorchairAscendW8A8DynamicFusedMoEMethod` instead of the
incorrect `AscendW8A8DynamicFusedMoEMethod`
2. `mc2_mask` should be chunked in the forward function of
`TorchairAscendFusedMoE` regardless of whether `replace_allreduce` is
True or False (see the sketch below)
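A runnable sketch of the second fix (the `tp_size`/`tp_rank` plumbing is assumed): the chunking is hoisted out of the `replace_allreduce` branch so that every path hands each TP rank its own slice of `mc2_mask`.

```python
import torch

def chunk_mc2_mask(mc2_mask: torch.Tensor, tp_size: int, tp_rank: int) -> torch.Tensor:
    # Applied whether or not replace_allreduce is set.
    return torch.tensor_split(mc2_mask, tp_size, dim=0)[tp_rank]

mask = torch.ones(16, dtype=torch.bool)
local = chunk_mc2_mask(mask, tp_size=4, tp_rank=1)
assert local.shape[0] == 4  # this rank's 4-token slice
```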

- vLLM version: v0.10.2
- vLLM main:
vllm-project/vllm@0fb2551

Signed-off-by: linfeng-yuan <[email protected]>

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description to help reviewers and future developers understand.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

@github-actions bot added the documentation (Improvements or additions to documentation), module:core, module:tests, and module:ops labels on Sep 18, 2025
@LookAround0301 merged commit 2f5102f into LookAround0301:long_seq_pr on Sep 19, 2025
2 checks passed