[rocm][V0] fix selection logic for custom PA in V0 #16426
Conversation
Signed-off-by: Divakar Verma <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI, which starts running only a small and essential subset of CI tests to quickly catch errors. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge. 🚀
LGTM. Let's make sure that we run the AMD CI on this PR before merging
/ready
Could we just improve the comment as the logic is a little obscure?
and (not envs.VLLM_USE_V1 or sliding_window == 0
     or sliding_window == (-1, -1))
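For readers outside the diff, here is a minimal sketch of the kind of selection helper this condition sits in. The function name, signature, and the surrounding shape checks are illustrative assumptions, not the exact vLLM code:

```python
# Hypothetical sketch of the custom paged-attention selection logic
# under review; names and the surrounding checks are assumptions.
import vllm.envs as envs


def _use_rocm_custom_paged_attention(gqa_ratio: int, head_size: int,
                                     sliding_window) -> bool:
    # By convention here, sliding_window of 0 or (-1, -1) means
    # "no sliding window is in effect".
    return (
        1 <= gqa_ratio <= 16  # assumed shape check, for illustration
        and head_size == 128  # assumed shape check, for illustration
        # Custom paged attention is always supported on V0; on V1 it is
        # only supported when no sliding window is in effect.
        and (not envs.VLLM_USE_V1 or sliding_window == 0
             or sliding_window == (-1, -1))
    )
```

The comment the reviewers ask for below would be the two-line note above the V0/V1 condition.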
Do you mind adding a comment here? Something like: "custom paged attn always supported on V0, only [with(out)...] sliding window on V1", where the [...] is a short description of why the sliding window checks are there.
Done
This PR enables custom paged attention for vLLM V0. When measured on Mistral-7B-v0.1-FP8-KV with input=2048, output=128, bs=128, the end-to-end latency differs by about 2-3x before and after this PR. This extends the previous PR: #15982.