Conversation

Contributor

@TheEpicDolphin TheEpicDolphin commented Aug 4, 2025

Purpose

Fix the following "out of resource" error, which occurs during execution of the kernel_unified_attention_2d kernel:

[2025-08-04T06:55:28Z] v1/spec_decode/test_tree_attention.py:110: in forward_attention
[2025-08-04T06:55:28Z]     return instance.forward(
[2025-08-04T06:55:28Z] /usr/local/lib/python3.12/dist-packages/vllm/v1/attention/backends/tree_attn.py:432: in forward
[2025-08-04T06:55:28Z]     unified_attention(
[2025-08-04T06:55:28Z] /usr/local/lib/python3.12/dist-packages/vllm/attention/ops/triton_unified_attention.py:664: in unified_attention
[2025-08-04T06:55:28Z]     kernel_unified_attention_2d[(
[2025-08-04T06:55:28Z] /usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py:347: in <lambda>
[2025-08-04T06:55:28Z]     return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
[2025-08-04T06:55:28Z] /usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py:591: in run
[2025-08-04T06:55:28Z]     kernel.run(grid_0, grid_1, grid_2, stream, kernel.function, kernel.packed_metadata,
[2025-08-04T06:55:28Z] /usr/local/lib/python3.12/dist-packages/triton/compiler/compiler.py:413: in __getattribute__
[2025-08-04T06:55:28Z]     self._init_handles()
[2025-08-04T06:55:28Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[2025-08-04T06:55:28Z]
[2025-08-04T06:55:28Z] self = <triton.compiler.compiler.CompiledKernel object at 0x7f4cedee5df0>
[2025-08-04T06:55:28Z]
[2025-08-04T06:55:28Z]     def _init_handles(self):
[2025-08-04T06:55:28Z]         if self.module is not None:
[2025-08-04T06:55:28Z]             return
[2025-08-04T06:55:28Z]         device = driver.active.get_current_device()
[2025-08-04T06:55:28Z]         # create launcher
[2025-08-04T06:55:28Z]         self.run = driver.active.launcher_cls(self.src, self.metadata)
[2025-08-04T06:55:28Z]         # not enough shared memory to run the kernel
[2025-08-04T06:55:28Z]         max_shared = driver.active.utils.get_device_properties(device)["max_shared_mem"]
[2025-08-04T06:55:28Z]         if self.metadata.shared > max_shared:
[2025-08-04T06:55:28Z] >           raise OutOfResources(self.metadata.shared, max_shared, "shared memory")
[2025-08-04T06:55:28Z] E           triton.runtime.errors.OutOfResources: out of resource: shared memory, Required: 155648, Hardware limit: 101376. Reducing block sizes or `num_stages` may help.
[2025-08-04T06:55:28Z]
[2025-08-04T06:55:28Z] /usr/local/lib/python3.12/dist-packages/triton/compiler/compiler.py:401: OutOfResources
[2025-08-04T06:55:28Z] =============================== warnings summary ===============================
[2025-08-04T06:55:28Z] ../../usr/local/lib/python3.12/dist-packages/schemathesis/generation/coverage.py:305
[2025-08-04T06:55:28Z]   /usr/local/lib/python3.12/dist-packages/schemathesis/generation/coverage.py:305: DeprecationWarning: jsonschema.exceptions.RefResolutionError is deprecated as of version 4.18.0. If you wish to catch potential reference resolution errors, directly catch referencing.exceptions.Unresolvable.
[2025-08-04T06:55:28Z]     ref_error: type[Exception] = jsonschema.RefResolutionError,
[2025-08-04T06:55:28Z]
[2025-08-04T06:55:28Z] -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
[2025-08-04T06:55:28Z] =========================== short test summary info ============================
[2025-08-04T06:55:28Z] FAILED v1/spec_decode/test_tree_attention.py::test_tree_attn_correctness - triton.runtime.errors.OutOfResources: out of resource: shared memory, Required: 155648, Hardware limit: 101376. Reducing block sizes or `num_stages` may help.
[2025-08-04T06:55:28Z] ============= 1 failed, 24 passed, 1 warning in 233.80s (0:03:53) ==============
[2025-08-04T06:55:31Z] 🚨 Error: The command exited with status 1

It is currently triggered by the test_tree_attention test and is caused by the large block size (128) used there. I have reduced the block size to 32, which brings the per-block shared memory usage down to roughly 38 KB, within the shared-memory limits of most modern hardware.
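For a quick sanity check on that ~38 KB figure: the kernel's shared-memory footprint scales roughly linearly with the block size, so scaling down the measurement from the failing run gives a close estimate. This is illustrative arithmetic only; the authoritative number is the `metadata.shared` value reported by the Triton compiler.

```python
# Back-of-the-envelope check using the numbers from the error above.
# Shared-memory usage grows roughly linearly with the block size, so scaling
# the failing measurement from block size 128 down to 32 approximates the new
# requirement. This is an estimate, not the compiler-reported value.
required_at_block_128 = 155648   # bytes, "Required" in the OutOfResources error
hardware_limit = 101376          # bytes, "Hardware limit" in the same error

required_at_block_32 = required_at_block_128 * 32 // 128
print(f"~{required_at_block_32 / 1024:.0f} KB")   # ~38 KB
assert required_at_block_32 < hardware_limit      # fits comfortably within the limit
```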

Test Plan

(py312conda) bash-5.1$ pytest tests/v1/spec_decode/test_tree_attention.py -k test_tree_attn_correctness
============================================================================================================================================ test session starts ============================================================================================================================================
platform linux -- Python 3.12.9, pytest-8.4.1, pluggy-1.6.0
rootdir: /data/users/gdelfin/gitrepos/vllm
configfile: pyproject.toml
plugins: anyio-4.9.0, asyncio-1.1.0
asyncio: mode=Mode.STRICT, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 1 item

tests/v1/spec_decode/test_tree_attention.py . [100%]

============================================================================================================================================ 1 passed in 34.33s =============================================================================================================================================

…t of resource' triton error

Signed-off-by: Giancarlo Delfin <[email protected]>

github-actions bot commented Aug 4, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small but essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request addresses an 'out of resource' error in the test_tree_attn_correctness test by reducing the block_size from 128 to 32. While this fixes the immediate issue on hardware with limited shared memory, it also reduces test coverage for a valid and important configuration. My review includes a suggestion to conditionally set the block_size based on available GPU resources. This approach maintains test coverage on capable hardware while ensuring the test suite remains stable on more constrained environments, thus improving the overall robustness of the tests.
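A rough sketch of that conditional approach follows. The helper name and the 155648-byte threshold (taken from the error above) are illustrative, not part of vLLM, and the import path for the Triton driver object may vary between Triton versions; it reuses the same driver query that appears in the traceback to read the device's shared-memory limit.

```python
# Illustrative sketch only: choose the test's block size from the device's
# shared-memory capacity instead of hard-coding it. The 155648-byte threshold
# comes from the "Required" value in the error above.
from triton.runtime import driver  # import path may differ between Triton versions

def pick_test_block_size(large: int = 128, small: int = 32) -> int:
    device = driver.active.get_current_device()
    max_shared = driver.active.utils.get_device_properties(device)["max_shared_mem"]
    # Only exercise the large block size on GPUs whose shared memory can hold
    # what the kernel needed at block size 128 in the failing run.
    return large if max_shared >= 155648 else small
```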

@TheEpicDolphin TheEpicDolphin marked this pull request as ready for review August 4, 2025 18:23

@sgrigory sgrigory left a comment

Stamping to avoid broken CI. Let's follow up with a check that the tree attention backend is not chosen when the block size is too large.
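A minimal version of that follow-up check might look like the sketch below. The function name and the 32-block threshold are illustrative, not part of vLLM's backend-selection code; the device-capability query shown earlier could replace the constant.

```python
# Illustrative guard: skip the tree attention backend when the configured block
# size exceeds what the unified-attention Triton kernel has been validated to
# fit in shared memory. Name and threshold are placeholders, not vLLM APIs.
MAX_VALIDATED_BLOCK_SIZE = 32

def tree_attention_backend_supported(block_size: int) -> bool:
    return block_size <= MAX_VALIDATED_BLOCK_SIZE
```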

Member

@mgoin mgoin left a comment

Ditto

@mgoin mgoin added ready ONLY add when PR is ready to merge/full CI is needed ci-failure Issue about an unexpected test failure in CI labels Aug 4, 2025
@vllm-bot vllm-bot merged commit 5ea71ff into vllm-project:main Aug 5, 2025
27 of 30 checks passed
npanpaliya pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Aug 6, 2025
myselvess pushed a commit to myselvess/vllm that referenced this pull request Aug 7, 2025
jinzhen-lin pushed a commit to jinzhen-lin/vllm that referenced this pull request Aug 9, 2025
noamgat pushed a commit to noamgat/vllm that referenced this pull request Aug 9, 2025
paulpak58 pushed a commit to paulpak58/vllm that referenced this pull request Aug 13, 2025
diegocastanibm pushed a commit to diegocastanibm/vllm that referenced this pull request Aug 15, 2025
epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 28, 2025
xiao-llm pushed a commit to xiao-llm/vllm that referenced this pull request Aug 28, 2025
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Aug 28, 2025
Labels
ci-failure (Issue about an unexpected test failure in CI), ready (ONLY add when PR is ready to merge/full CI is needed), speculative-decoding, v1
Projects
Status: Done
4 participants