forked from vllm-project/vllm-ascend

Merge main to 79a910ef4730d3f1be14496a1681eee2566f64a0 #25
Merged: LookAround0301 merged 10 commits into LookAround0301:long_seq_pr from zhangsicheng5:long_seq_pr on Sep 19, 2025
Conversation
### What this PR does / why we need it?
Adds an option to enable frozen parameters.

### How was this patch tested?
- vLLM version: v0.10.2
- vLLM main: vllm-project/vllm@68dbde5

Signed-off-by: 1Fire4 <[email protected]>
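The message does not show the option's name or mechanism. As a loosely hedged guess, a "frozen parameter" option in a torchair-based graph mode could map onto torchair's experimental compiler config, roughly like this (the attribute is an assumption, not confirmed by the commit):

```python
# Speculative sketch only: assumes the option maps onto torchair's
# frozen-parameter experimental config, which this commit message
# does not confirm.
import torchair

config = torchair.CompilerConfig()
config.experimental_config.frozen_parameter = True  # assumed attribute
npu_backend = torchair.get_npu_backend(compiler_config=config)
```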
### What this PR does / why we need it?
vllm-project#2849 moves the implementation of `shared_expert_dp` to torchair deepseek modeling. However, calling `set_forward_context` with `enforce_eager` and `shared_expert_dp` falls back to the implementation in model_runner_v1.py, which sets the global attn_metadata as a dictionary. This leads to a RuntimeError when attn_metadata is retrieved from the forward context and used in torchair_deepseek_v2.py. This PR fixes the problem by transforming attn_metadata in that file. Note that current E2E testing lacks a deepseek case with `shared_expert_dp`; we need to add an ST with `shared_expert_dp` to the testing workflow.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
e2e vLLM serving with `enable_shared_expert_dp: true` passed.
- vLLM version: v0.10.2
- vLLM main: vllm-project/vllm@de3e53a

Signed-off-by: linfeng-yuan <[email protected]>
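A minimal sketch of the transformation the description implies; `get_attn_metadata_for_layer` and `layer_name` are hypothetical stand-ins for the real code in torchair_deepseek_v2.py:

```python
# Sketch of normalizing attn_metadata pulled from the forward context;
# the helper name and layer_name argument are hypothetical.
from vllm.forward_context import get_forward_context

def get_attn_metadata_for_layer(layer_name: str):
    """Return per-layer attention metadata whether the forward context
    stored a single object or a per-layer dict."""
    attn_metadata = get_forward_context().attn_metadata
    if isinstance(attn_metadata, dict):
        # With enforce_eager + shared_expert_dp, model_runner_v1.py sets
        # a dict keyed by layer name; unwrap it before use.
        attn_metadata = attn_metadata[layer_name]
    return attn_metadata
```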
Fixes invalid DeepSeek R1 output when `enable_kv_nz` is true.
- vLLM version: v0.10.2
- vLLM main: vllm-project/vllm@2b85697

Signed-off-by: realliujiaxu <[email protected]>
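For reference, `enable_kv_nz` is a torchair graph option typically toggled through vLLM's additional config; a minimal sketch, assuming the usual vllm-ascend key layout (the exact keys are an assumption here):

```python
# Minimal sketch, assuming enable_kv_nz is toggled via additional_config
# as in vllm-ascend's torchair graph options; key layout is an assumption.
from vllm import LLM

llm = LLM(
    model="deepseek-ai/DeepSeek-R1",  # placeholder model id
    additional_config={
        "torchair_graph_config": {
            "enabled": True,
            "enable_kv_nz": True,
        },
    },
)
```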
### What this PR does / why we need it?
Adds a new connector for Mooncake store integration to enable kvcache reuse in scenarios with system prompts or multi-turn dialogues.

### How was this patch tested?
- vLLM version: v0.10.2
- vLLM main: vllm-project/vllm@5963b98

---------

Signed-off-by: LCAIZJ <[email protected]>
Signed-off-by: fems14 <[email protected]>
Co-authored-by: fems14 <[email protected]>
Co-authored-by: Dreamerleader <[email protected]>
Co-authored-by: Pz1116 <[email protected]>
Co-authored-by: lizy124 <[email protected]>
Co-authored-by: zouyida2052 <[email protected]>
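A hedged sketch of how such a connector is usually wired up through vLLM's KV-transfer config; the connector name below is an assumption, not taken from this PR:

```python
# Sketch of enabling a KV-transfer connector for kvcache reuse; the
# connector name "MooncakeConnectorStoreV1" is an assumed placeholder.
from vllm import LLM
from vllm.config import KVTransferConfig

llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",  # placeholder model id
    kv_transfer_config=KVTransferConfig(
        kv_connector="MooncakeConnectorStoreV1",  # assumed connector name
        kv_role="kv_both",
    ),
)
```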
### What this PR does / why we need it?
This PR depends on the merge of vllm-project#2707 and adapts the aclgraph functionality to support MTP.

### How was this patch tested?
- vLLM version: v0.10.2
- vLLM main: vllm-project/vllm@2b85697

---------

Signed-off-by: xuyexiong <[email protected]>
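For context, MTP (multi-token prediction) is enabled through vLLM's speculative config; a minimal sketch, where the method name and token count are assumptions for illustration:

```python
# Minimal sketch of enabling MTP-style speculative decoding; the method
# name "deepseek_mtp" and the token count are illustrative assumptions.
from vllm import LLM

llm = LLM(
    model="deepseek-ai/DeepSeek-R1",  # placeholder model id
    speculative_config={
        "method": "deepseek_mtp",      # assumed method name
        "num_speculative_tokens": 1,
    },
)
```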
vllm-project#2901)

### What this PR does / why we need it?
[Bugfix]: replace npu_incre_flash_attention with npu_fused_infer_attention_score so that tiling can be updated.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- vLLM version: v0.10.2
- vLLM main: vllm-project/vllm@2b85697

Signed-off-by: p00465316 <[email protected]>
Co-authored-by: p00465316 <[email protected]>
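A sketch of the replacement call; shapes, head count, and scale are illustrative, and the real call site passes tiling-related arguments omitted here:

```python
# Requires an Ascend NPU with torch_npu installed; values are illustrative.
import torch
import torch_npu

q = torch.randn(1, 1, 1024, dtype=torch.float16).npu()    # BSH layout
k = torch.randn(1, 128, 1024, dtype=torch.float16).npu()
v = torch.randn(1, 128, 1024, dtype=torch.float16).npu()

# Previously: torch_npu.npu_incre_flash_attention(q, k, v, ...)
attn_output, _ = torch_npu.npu_fused_infer_attention_score(
    q, k, v, num_heads=8, input_layout="BSH", scale=1.0 / (128 ** 0.5),
)
```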
### What this PR does / why we need it?
The current linear.py has the following issues:
- There is redundant conditional logic in the `comm_group` and `forward` selection for classes such as `AscendMergedColumnParallelLinear`.
- Inconsistent comm_group selection logic exists among `AscendMergedColumnParallelLinear`, `AscendColumnParallelLinear`, and `AscendQKVParallelLinear`.

To address these two issues, this PR encapsulates `comm_group` and `forward` into classes and extracts the class-selection logic into common functions. For future additions of custom communication groups or forward methods, it will only be necessary to extend `CustomColumnParallelOp` or `CustomRowParallelOp` and add new selection logic, as sketched below.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- vLLM version: v0.10.2
- vLLM main: vllm-project/vllm@dd39baf

---------

Signed-off-by: realliujiaxu <[email protected]>
Co-authored-by: weijinqian0 <[email protected]>
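An illustrative sketch of the pattern described: comm_group and forward encapsulated in op classes, with selection pulled into one common factory. The base-class name follows the PR text; the subclass, factory, and method bodies are made up:

```python
# Sketch only: class internals and the factory are illustrative.
class CustomColumnParallelOp:
    """Base class: one subclass per communication/forward strategy."""

    def __init__(self, layer):
        self.layer = layer

    @property
    def comm_group(self):
        raise NotImplementedError

    def apply(self, input_):
        raise NotImplementedError


class DefaultColumnParallelOp(CustomColumnParallelOp):
    @property
    def comm_group(self):
        return "tp"  # placeholder for the tensor-parallel group object

    def apply(self, input_):
        return self.layer.quant_method.apply(self.layer, input_)


def get_column_parallel_op(layer, prefix: str) -> CustomColumnParallelOp:
    # Common selection function: adding a new comm group or forward only
    # requires a new CustomColumnParallelOp subclass and a branch here.
    return DefaultColumnParallelOp(layer)
```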
### What this PR does / why we need it?
Adds a multi-node Ray backend tutorial for Qwen235B-A3B.

### How was this patch tested?
- vLLM version: v0.10.2
- vLLM main: vllm-project/vllm@f4cd80f

---------

Signed-off-by: wangli <[email protected]>
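A minimal sketch of what multi-node serving over Ray looks like in vLLM's Python API; the model id and parallel sizes are placeholders rather than values from the tutorial:

```python
# Assumes a Ray cluster is already up: `ray start --head` on the head
# node and `ray start --address=<head-ip>:6379` on each worker node.
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen3-235B-A22B",        # placeholder model id
    distributed_executor_backend="ray",
    tensor_parallel_size=8,
    pipeline_parallel_size=2,
)
```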
vllm-project#2681)

This PR fixes two problems when `multistream_moe` is enabled in torchair graph mode:
1. Check `TorchairAscendW8A8DynamicFusedMoEMethod` instead of the incorrect `AscendW8A8DynamicFusedMoEMethod`.
2. mc2_mask should be chunked regardless of whether `replace_allreduce` is True or False in the forward function of `TorchairAscendFusedMoE` (see the sketch below).

- vLLM version: v0.10.2
- vLLM main: vllm-project/vllm@0fb2551

Signed-off-by: linfeng-yuan <[email protected]>
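A sketch of the second fix: the mask gets chunked on both branches of `replace_allreduce`, where previously one branch skipped it. Names, shapes, and the chunking scheme are illustrative:

```python
import torch

def select_mc2_mask_chunk(mc2_mask: torch.Tensor, tp_size: int,
                          tp_rank: int) -> torch.Tensor:
    # Chunk unconditionally; callers on both the replace_allreduce=True
    # and replace_allreduce=False paths use this helper.
    return torch.tensor_split(mc2_mask, tp_size, dim=0)[tp_rank]
```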
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.