allow calc_kv_scales #23906

frank-wei · 2025-08-29T07:02:23Z

Summary:
When running gpt-oss, I found a bug that occurs when calculate_kv_scales is enabled:

self.calc_kv_scales() should be invoked without checking attn_metadata.
attn_metadata is available only in cudagraph full graph mode. If a user does not use that mode, checking attn_metadata.enable_kv_scales_calculation raises a NoneType error.
This PR should fix the above problem.

However, torch.compile cannot be used when calculate_kv_scales=True: it complains about the use of .item() in calc_kv_scales().
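
The NoneType failure described in this summary can be reproduced without vLLM. The following is a minimal sketch, assuming stand-in `AttnMetadata` and `ForwardContext` classes; both are hypothetical and model only the attribute names that appear in the diff, not vLLM's actual types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AttnMetadata:
    # Hypothetical stand-in; only the flag named in this PR is modeled.
    enable_kv_scales_calculation: bool = True


@dataclass
class ForwardContext:
    # Hypothetical stand-in for the object returned by get_forward_context().
    attn_metadata: Optional[AttnMetadata] = None


def should_calc_unguarded(ctx: ForwardContext) -> bool:
    # Mirrors the pre-PR check: attribute access on None raises AttributeError.
    return ctx.attn_metadata.enable_kv_scales_calculation


def reproduce() -> str:
    # Simulate a forward pass where no attention metadata was populated.
    try:
        should_calc_unguarded(ForwardContext(attn_metadata=None))
    except AttributeError as exc:
        return str(exc)
    return "no error"
```

Calling `reproduce()` returns the error message Python raises for attribute access on `None`, which is the class of error the summary refers to.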

Differential Revision: D81300417

facebook-github-bot · 2025-08-29T07:02:38Z

This pull request was exported from Phabricator. Differential Revision: D81300417

gemini-code-assist

Code Review

This pull request fixes a crash that occurs when calculate_kv_scales is enabled but the code is not running in cudagraph full graph mode. The fix removes a dependency on attn_metadata, which can be None in this scenario. While this correctly addresses the crash, it introduces a potential new issue: calc_kv_scales is called unconditionally, but it will fail if key or value tensors are None. The existence of checks for key is not None and value is not None later in the forward method suggests this is a valid possibility. I've added a review comment to guard the call to calc_kv_scales to prevent this potential crash.

gemini-code-assist · 2025-08-29T07:03:44Z

vllm/attention/layer.py

-            attn_metadata = get_forward_context().attn_metadata
-            if attn_metadata.enable_kv_scales_calculation:
-                self.calc_kv_scales(query, key, value)
+            self.calc_kv_scales(query, key, value)


The call to self.calc_kv_scales here could lead to a TypeError if key or value is None, as torch.abs(None) would be executed. Later in this method (lines 260-263), there are checks for key is not None and value is not None, which implies they can indeed be None. To prevent a potential crash, it's crucial to ensure key and value are not None before calling calc_kv_scales.

Suggested change

-            self.calc_kv_scales(query, key, value)
+            if key is not None and value is not None:
+                self.calc_kv_scales(query, key, value)

22quinn · 2025-09-05T06:52:40Z

@heheda12345 Do you mind reviewing this? I saw you touched this code in #12536. Thanks!

heheda12345 · 2025-09-15T07:32:19Z

@mgoin Can you help to take a look?

ProExpertProg

I don't think we want to always enable scale calculation if not necessary.
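
One way to reconcile the review comments in this thread is a guard that tolerates a missing attn_metadata, skips None inputs, and still honors the enable flag. The following is a minimal sketch under those assumptions; `should_calc_kv_scales` is a hypothetical helper, not code from this PR or from vLLM.

```python
from types import SimpleNamespace
from typing import Any, Optional


def should_calc_kv_scales(
    attn_metadata: Optional[Any],
    key: Optional[Any],
    value: Optional[Any],
) -> bool:
    """Hypothetical predicate deciding whether to run calc_kv_scales."""
    # Skip when key or value is missing, per the review comment on the diff.
    if key is None or value is None:
        return False
    # getattr with a default returns False when attn_metadata is None,
    # so a missing metadata object no longer raises.
    return bool(getattr(attn_metadata, "enable_kv_scales_calculation", False))


# Illustrative stand-ins for the metadata object in each scenario.
enabled = SimpleNamespace(enable_kv_scales_calculation=True)
disabled = SimpleNamespace(enable_kv_scales_calculation=False)
```

Note the trade-off in this sketch: when attn_metadata is None, scale calculation is silently skipped rather than forced on, so it avoids the crash but would not by itself make calculate_kv_scales take effect outside full graph mode.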

gemini-code-assist bot reviewed Aug 29, 2025

22quinn requested a review from heheda12345 September 5, 2025 06:51

heheda12345 mentioned this pull request Sep 15, 2025

[Bugfix] guard missing attn_metadata in KV scales path #24290

Open

5 tasks

ProExpertProg requested changes Sep 15, 2025
