
Conversation

@LookAround0301 (Contributor) commented Sep 16, 2025

What this PR does / why we need it?

Does this PR introduce any user-facing change?

How was this patch tested?

LookAround0301 and others added 11 commits September 16, 2025 10:42
Signed-off-by: weiguihua2 <[email protected]>
Signed-off-by: tanwenqin <[email protected]>
Signed-off-by: tanwenqin <[email protected]>
Signed-off-by: zhaoyifan <[email protected]>
Signed-off-by: zhangsicheng5 <[email protected]>
Signed-off-by: LookAround <[email protected]>
…nto long_seq_pr

# Conflicts:
#	vllm_ascend/attention/attention_v1.py
#	vllm_ascend/ops/fused_moe.py
#	vllm_ascend/worker/model_runner_v1.py
Signed-off-by: LookAround <[email protected]>
Signed-off-by: LookAround <[email protected]>
Copy link

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing, smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request introduces support for context parallelism (CP) and sequence parallelism (SP) to enable efficient inference for models with long sequences on Ascend NPUs. The changes are extensive, touching upon the scheduler, model runner, attention implementations (vanilla and MLA), and various operational layers. New metadata structures and logic have been added to manage the distributed state across CP and SP ranks. The implementation appears to leverage hardware-specific features for ring attention and parallel computations. Overall, this is a significant feature addition that enhances long-context capabilities. My review has identified a critical issue related to state management when reordering requests, which needs to be addressed.
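The review summary above is the only description of the sharding scheme in this thread. As a purely illustrative aside: context parallelism for causal attention commonly assigns each CP rank an interleaved pair of sequence chunks, so that per-rank attention cost is balanced (later tokens attend to more keys than earlier ones). A minimal sketch of that idea — the function name, the two-chunk scheme, and the padding requirement are assumptions for illustration, not code from this PR:

```python
# Hypothetical load-balanced context-parallel sharding. The PR's actual
# partitioning logic lives in vllm_ascend/worker/model_runner_v1.py and
# may differ. With causal attention, pairing chunk r with chunk
# (2*cp_size - 1 - r) gives every rank one "cheap" and one "expensive"
# chunk, balancing attention FLOPs across ranks.

def cp_shard(tokens: list, cp_rank: int, cp_size: int) -> list:
    """Return the slice of `tokens` owned by `cp_rank` out of `cp_size` ranks."""
    assert len(tokens) % (2 * cp_size) == 0, "pad the sequence to 2*cp_size first"
    chunk = len(tokens) // (2 * cp_size)
    chunks = [tokens[i * chunk:(i + 1) * chunk] for i in range(2 * cp_size)]
    # Each rank owns one early chunk and its mirror-image late chunk.
    return chunks[cp_rank] + chunks[2 * cp_size - 1 - cp_rank]

# Example: 8 tokens across 2 CP ranks
seq = list(range(8))
print(cp_shard(seq, 0, 2))  # rank 0 gets chunks 0 and 3 -> [0, 1, 6, 7]
print(cp_shard(seq, 1, 2))  # rank 1 gets chunks 1 and 2 -> [2, 3, 4, 5]
```

Note that the two shards together cover the full sequence exactly once, which is what the ring-attention exchange between CP ranks relies on.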

…nto long_seq_pr

# Conflicts:
#	vllm_ascend/models/deepseek_v2.py

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Signed-off-by: Apocalypse990923-qshi <[email protected]>
Signed-off-by: Apocalypse990923-qshi <[email protected]>
weiguihua2 and others added 4 commits September 17, 2025 14:38
Signed-off-by: weiguihua2 <[email protected]>
Signed-off-by: Apocalypse990923-qshi <[email protected]>
@wangxiyuan added the ready (ready for review) and ready-for-test (start test by label for PR) labels Sep 18, 2025
@github-actions bot added the merge-conflicts label and removed the ready (ready for review) label Sep 18, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Signed-off-by: tanwenqin <[email protected]>
@MengqingCao (Collaborator) commented

Please add more details to the PR message and link the related vLLM PR here.


This pull request has conflicts, please resolve those before we can evaluate the pull request.

tensor_npu = _list_to_tensor(value, self.device)
self.kv_idx_names[key] = tensor_npu

# Handle sequence-length-related tensors (original comment: 处理序列长度相关张量)
Collaborator

It's better to use English comments.
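For context, the hunk quoted above converts Python lists to device tensors through a `_list_to_tensor` helper. The real helper is part of this PR and presumably returns a `torch.Tensor` placed on the NPU; as a hedged, dependency-free stand-in, the list-preprocessing step it likely performs (padding a ragged list of lists to a rectangle before tensor construction) can be sketched as follows — the function name and padding behavior are assumptions, not the PR's code:

```python
# Hypothetical stand-in for the padding step inside _list_to_tensor.
# The actual helper presumably follows this with something like
# torch.tensor(rows).to(device); torch is omitted here to keep the
# sketch self-contained.

def list_to_padded_rows(values, pad=0):
    """Pad a ragged list of lists to a rectangular 2-D list of lists,
    the usual preprocessing step before building a device tensor."""
    width = max((len(v) for v in values), default=0)
    return [list(v) + [pad] * (width - len(v)) for v in values]

rows = list_to_padded_rows([[1, 2, 3], [4], [5, 6]])
print(rows)  # [[1, 2, 3], [4, 0, 0], [5, 6, 0]]
```

Padding to a rectangle is what lets per-request metadata such as KV indices or sequence lengths be moved to the device in a single transfer rather than one tensor per request.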

9 participants