[V1] reduce block size for tree attention correctness test to fix 'ou… #22207
…t of resource' triton error Signed-off-by: Giancarlo Delfin <[email protected]>
Code Review
This pull request addresses an 'out of resource' error in the test_tree_attn_correctness test by reducing the block_size from 128 to 32. While this fixes the immediate issue on hardware with limited shared memory, it also reduces test coverage for a valid and important configuration. My review includes a suggestion to conditionally set the block_size based on available GPU resources. This approach maintains test coverage on capable hardware while keeping the test suite stable in more constrained environments.
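The conditional block_size suggestion could look roughly like the sketch below. It is not code from this PR: pick_block_size and linear_estimate are hypothetical names, and the linear cost model (anchored at the ~38K figure for block size 32 quoted in the PR description) is an assumption made only for illustration. Real shared memory usage depends on the kernel configuration.

```python
from typing import Callable, Sequence


def pick_block_size(
    candidates: Sequence[int],
    smem_limit_bytes: int,
    estimate_smem_bytes: Callable[[int], int],
) -> int:
    """Return the largest candidate block size whose estimated
    shared memory usage fits within the given per-block limit."""
    fitting = [b for b in candidates if estimate_smem_bytes(b) <= smem_limit_bytes]
    if not fitting:
        raise ValueError("no candidate block size fits in shared memory")
    return max(fitting)


def linear_estimate(block_size: int) -> int:
    # HYPOTHETICAL cost model: assumes usage grows linearly with block
    # size, anchored at ~38K for block size 32. This is an illustration,
    # not a measurement of kernel_unified_attention_2d.
    return 38 * 1024 * block_size // 32


# Constrained device: only the small block size fits.
small = pick_block_size([32, 128], 48 * 1024, linear_estimate)
# Roomier device: the larger block size keeps its test coverage.
large = pick_block_size([32, 128], 200 * 1024, linear_estimate)
```

A test could then be parametrized with the returned value, with the limit queried from the device at collection time.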
Stamping to avoid broken CI. Let's follow up to add a check that tree attention backend is not chosen when block size is too large then
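One possible shape for that follow-up check is sketched here. Everything in it is an assumption for illustration: select_backend is a hypothetical helper, the backend names are placeholder strings rather than vLLM identifiers, and the shared memory estimator is supplied by the caller.

```python
from typing import Callable


def select_backend(
    preferred: str,
    fallback: str,
    block_size: int,
    smem_limit_bytes: int,
    estimate_smem_bytes: Callable[[int], int],
) -> str:
    """Return the preferred backend unless its estimated per-block
    shared memory usage exceeds the limit, in which case fall back."""
    if estimate_smem_bytes(block_size) > smem_limit_bytes:
        return fallback
    return preferred


def toy_estimate(block_size: int) -> int:
    # HYPOTHETICAL estimator: 1 KiB of shared memory per unit of block size.
    return block_size * 1024


limit = 64 * 1024  # placeholder limit, not tied to any specific GPU
too_large = select_backend("tree", "fallback", 128, limit, toy_estimate)
fits = select_backend("tree", "fallback", 32, limit, toy_estimate)
```

Doing the check at selection time would surface an unsupported configuration before the kernel launch, instead of as a Triton error during execution.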
Ditto
vllm-project#22207) Signed-off-by: Giancarlo Delfin <[email protected]>
vllm-project#22207) Signed-off-by: Giancarlo Delfin <[email protected]> Signed-off-by: Jinzhen Lin <[email protected]>
vllm-project#22207) Signed-off-by: Giancarlo Delfin <[email protected]> Signed-off-by: Noam Gat <[email protected]>
vllm-project#22207) Signed-off-by: Giancarlo Delfin <[email protected]> Signed-off-by: Paul Pak <[email protected]>
vllm-project#22207) Signed-off-by: Giancarlo Delfin <[email protected]> Signed-off-by: Diego-Castan <[email protected]>
vllm-project#22207) Signed-off-by: Giancarlo Delfin <[email protected]> Signed-off-by: Xiao Yu <[email protected]>
Purpose
Fix the "out of resource" Triton error that occurs during execution of the kernel_unified_attention_2d kernel. The error is currently triggered by the test_tree_attention test and results from using a large block size (128). I have reduced the block size to 32, which lowers the per-block shared memory usage to ~38K, an amount supported by the majority of modern hardware.
Test Plan
(py312conda) bash-5.1$ pytest tests/v1/spec_decode/test_tree_attention.py -k test_tree_attn_correctness
============================================================================================================================================ test session starts ============================================================================================================================================
platform linux -- Python 3.12.9, pytest-8.4.1, pluggy-1.6.0
rootdir: /data/users/gdelfin/gitrepos/vllm
configfile: pyproject.toml
plugins: anyio-4.9.0, asyncio-1.1.0
asyncio: mode=Mode.STRICT, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 1 item
tests/v1/spec_decode/test_tree_attention.py . [100%]
============================================================================================================================================ 1 passed in 34.33s =============================================================================================================================================