@WoosukKwon WoosukKwon commented Aug 28, 2025

Using float32 for uniform_probs can be dangerous because an exact 0 can be sampled with non-negligible probability (see pytorch/pytorch#16706).

This PR mitigates the issue by using float64 instead of float32.
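
For a rough sense of scale, the sketch below estimates how often an exact 0 would show up. It assumes a float32 uniform sampler draws from a grid of 2^24 equally likely values in [0, 1) and a float64 sampler from a grid of 2^53. These grid sizes, the draw count, and the helper name `expected_zero_draws` are illustrative assumptions, not taken from this PR.

```python
# Assumed grid sizes: the number of equally likely values a uniform sampler
# can return in [0, 1). Illustrative assumption based on the significand
# widths of float32 (24 bits) and float64 (53 bits).
FLOAT32_GRID = 2 ** 24
FLOAT64_GRID = 2 ** 53


def expected_zero_draws(num_draws: int, grid_size: int) -> float:
    """Expected number of exact-zero samples among num_draws uniform draws."""
    return num_draws / grid_size


if __name__ == "__main__":
    # Hypothetical workload: one uniform draw per draft token.
    num_draws = 10 ** 9
    print(f"float32: {expected_zero_draws(num_draws, FLOAT32_GRID):.3g}")
    print(f"float64: {expected_zero_draws(num_draws, FLOAT64_GRID):.3g}")
```

Under these assumptions, a float32 sampler would be expected to hit exact 0 dozens of times per billion draws, while a float64 sampler makes the event vanishingly rare. It does not make it impossible, which matches the "mitigates" wording above.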

@mergify mergify bot added documentation Improvements or additions to documentation speculative-decoding v1 labels Aug 28, 2025
@WoosukKwon WoosukKwon added the ready ONLY add when PR is ready to merge/full CI is needed label Aug 28, 2025

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request addresses a critical bug in the rejection sampler for speculative decoding. By changing the data type for uniform_probs from torch.float32 to torch.float64, it mitigates the risk of torch.rand producing an exact zero, which could lead to incorrect token acceptance. The change is well-justified and correctly implemented. The pull request also includes a fix in the spec_decode.py example to correctly handle prompt inputs. Both changes improve the correctness and robustness of the codebase. The changes look good to me.
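
To illustrate how an exact zero could lead to an incorrect acceptance, here is a minimal pure-Python sketch of a rejection-sampling acceptance test. It assumes the test has the form `target_prob / draft_prob >= uniform_prob`. That form and the function name `accept_draft_token` are assumptions for illustration, not quoted from the vLLM source.

```python
def accept_draft_token(
    target_prob: float, draft_prob: float, uniform_prob: float
) -> bool:
    """Hypothetical acceptance test for one draft token.

    Accepts when the target/draft probability ratio is at least the
    uniform draw. The >= comparison is an assumption of this sketch.
    """
    if draft_prob <= 0.0:
        return False
    return target_prob / draft_prob >= uniform_prob


if __name__ == "__main__":
    # A token the target model assigns zero probability to.
    print(accept_draft_token(0.0, 0.5, 1e-9))  # uniform draw slightly above 0
    print(accept_draft_token(0.0, 0.5, 0.0))   # uniform draw exactly 0
```

With a strictly positive draw, the zero-probability token is rejected. With a draw of exactly 0, the comparison `0.0 >= 0.0` holds and the token is accepted even though the target model would never produce it.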

@WoosukKwon WoosukKwon enabled auto-merge (squash) August 28, 2025 08:03
@WoosukKwon WoosukKwon merged commit a3432f1 into main Aug 28, 2025
50 checks passed
@WoosukKwon WoosukKwon deleted the woosuk/fix-reject-sample branch August 28, 2025 12:26
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Aug 28, 2025
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Sep 3, 2025
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025

2 participants