Collaborator

@sarckk sarckk commented Sep 5, 2025

Purpose

Raise an error if EAGLE speculative decoding is used with fast prefill. EAGLE expects the logits for all tokens to be correct after prefill, and fast prefill violates this because it produces incorrect logits for prompt tokens.

Also consolidate warnings about fast prefill in one place.

Test Plan

vllm serve meta-llama/Llama-3.1-8B-Instruct --speculative-config='{"method": "eagle", "model": "yuhuili/EAGLE-LLaMA3.1-Instruct-8B", "num_speculative_tokens": 3}' -tp 1 --kv-sharing-fast-prefill

Test Result

(APIServer pid=1041573)   File "/data/users/yhshin/gitrepos/vllm/vllm/config/__init__.py", line 3676, in __post_init__
(APIServer pid=1041573)     raise NotImplementedError(
(APIServer pid=1041573) NotImplementedError: Fast prefill optimization for KV sharing is not compatible with EAGLE as EAGLE requires correct logits for all tokens while fast prefill gives incorrect logits for prompt tokens.
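
For readers who want to see the shape of this check, here is a minimal, self-contained sketch of a config-time guard of this kind. It is not the code from this PR: the class names `SpeculativeConfigSketch` and `EngineConfigSketch`, the `use_eagle` helper, and the field layout are all hypothetical stand-ins. Only the `__post_init__` location, the `NotImplementedError` type, and the message text are taken from the traceback above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeculativeConfigSketch:
    """Hypothetical stand-in for a speculative decoding config."""

    method: str
    num_speculative_tokens: int = 1

    def use_eagle(self) -> bool:
        # Simplifying assumption: only the literal method name "eagle"
        # counts as EAGLE here; a real check may cover more variants.
        return self.method == "eagle"


@dataclass
class EngineConfigSketch:
    """Hypothetical stand-in for the top-level engine config."""

    kv_sharing_fast_prefill: bool = False
    speculative_config: Optional[SpeculativeConfigSketch] = None

    def __post_init__(self) -> None:
        # Nothing to validate unless fast prefill was requested.
        if not self.kv_sharing_fast_prefill:
            return
        spec = self.speculative_config
        if spec is not None and spec.use_eagle():
            # Fail at config construction time, before any model is loaded.
            raise NotImplementedError(
                "Fast prefill optimization for KV sharing is not compatible "
                "with EAGLE as EAGLE requires correct logits for all tokens "
                "while fast prefill gives incorrect logits for prompt tokens."
            )
```

In this sketch, constructing `EngineConfigSketch(kv_sharing_fast_prefill=True, speculative_config=SpeculativeConfigSketch("eagle", 3))` raises immediately, while either option on its own is accepted. Validating in `__post_init__` means the incompatible combination is rejected at startup rather than surfacing later as silently degraded speculative decoding.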

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly raises a NotImplementedError when EAGLE-style speculative decoding is used with fast prefill, addressing an incompatibility between the two features. The implementation is sound, with a clear error message. The change also refactors related checks and warnings into a more suitable location, improving code organization. The changes are well-executed and I have no suggestions for improvement.

@heheda12345 heheda12345 changed the title from "Raise error if using eagle with fast prefill" to "[KV Sharing] Raise error if using eagle with fast prefill" Sep 6, 2025
@heheda12345 heheda12345 enabled auto-merge (squash) September 6, 2025 00:16
@github-actions github-actions bot added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) Sep 6, 2025
@vllm-bot vllm-bot merged commit 3c529fc into vllm-project:main Sep 6, 2025
47 of 49 checks passed
@sarckk sarckk deleted the eagle-yoco branch September 6, 2025 22:16
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
skyloevil pushed a commit to skyloevil/vllm that referenced this pull request Sep 13, 2025
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
3 participants