[BugFix] Fix EXAONE4 rotary embeddings #23918
Conversation
Signed-off-by: lkm2835 <[email protected]>
Code Review
This pull request addresses a bug in the application of rotary embeddings for EXAONE4 models, particularly distinguishing between model variants with and without sliding window attention. The logic is corrected to apply RoPE to all layers only when no sliding window attention is used, and conditionally on sliding window layers otherwise. This change is supported by improved benchmark results. My review focuses on a minor but important maintainability issue: a comment that has become outdated and misleading due to the logic change.
  # apply rotary embeddings to every layer
- self.apply_all_layers = not is_sliding
+ self.apply_rope_all_layers = "sliding_attention" not in config.layer_types
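For context, here is a minimal sketch of how a flag computed this way gates per-layer RoPE. This is not the actual vLLM source; the helper name, its signature, and the standalone `layer_types` argument are assumptions for illustration:

```python
from typing import List


def should_apply_rope(layer_types: List[str], layer_idx: int) -> bool:
    """Hypothetical helper illustrating the fixed condition."""
    # If no layer uses sliding-window attention (e.g. EXAONE-4.0-1.2B),
    # rotary embeddings are applied on every layer.
    if "sliding_attention" not in layer_types:
        return True
    # Otherwise (e.g. EXAONE-4.0-32B), RoPE is applied only on the
    # sliding-window layers; full-attention layers skip it.
    return layer_types[layer_idx] == "sliding_attention"
```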
While this logic correctly fixes the bug, the comment on the preceding line (`# apply rotary embeddings to every layer`) is now misleading. With this change, rotary embeddings are not always applied to every layer: when any layer uses sliding window attention, RoPE is applied only to those sliding window layers. Please update the comment to accurately reflect this conditional logic, to keep the code clear and prevent future misunderstandings.
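One possible rewording, shown as a suggestion (the exact phrasing is of course up to the author):

```python
# Apply RoPE on every layer only when no layer uses sliding-window
# attention; otherwise apply RoPE only on the sliding-window layers.
self.apply_rope_all_layers = "sliding_attention" not in config.layer_types
```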
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI. You ask your reviewers to trigger select CI tests on top of fastcheck CI. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
Signed-off-by: lkm2835 <[email protected]>
Hi, @DarkLight1337 @hmellor.
Signed-off-by: Harry Mellor <[email protected]>
LGTM! Thanks for the fix
Signed-off-by: lkm2835 <[email protected]> Signed-off-by: Harry Mellor <[email protected]> Co-authored-by: Harry Mellor <[email protected]> Signed-off-by: 子悬 <[email protected]>
Signed-off-by: lkm2835 <[email protected]> Signed-off-by: Harry Mellor <[email protected]> Co-authored-by: Harry Mellor <[email protected]>
Purpose
c498483#diff-167e1581ca70123b5871f46e08770bc343bfbcb897aab95603a1cd9adf9b2a35L162-R167
In this commit, the code was refactored to be cleaner and more readable, but a bug was introduced because the value of `apply_all_layers` changed.

For EXAONE-4.0-1.2B, sliding_window is not applied to any layer, and rotary embeddings are always applied (`apply_all_layers = True`). In contrast, for EXAONE-4.0-32B, rotary embeddings should only be applied to the layers where sliding_window is used (`is_sliding = True`), as shown in the sketch below.
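The sketch uses assumed layer patterns; the layer counts and the sliding/full ratio below are illustrative, not copied from the released configs:

```python
# Illustrative layer_types for the two EXAONE4 variants (patterns assumed).
layer_types_1_2b = ["full_attention"] * 30
layer_types_32b = (["sliding_attention"] * 3 + ["full_attention"]) * 16

# EXAONE-4.0-1.2B: no sliding-window layers -> apply RoPE on every layer.
assert "sliding_attention" not in layer_types_1_2b

# EXAONE-4.0-32B: hybrid attention -> apply RoPE only on sliding-window layers.
rope_layer_ids = [
    i for i, t in enumerate(layer_types_32b) if t == "sliding_attention"
]
assert rope_layer_ids  # RoPE is applied on these layers only
```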
Test Plan
AIME25 benchmark evaluation test.
Test Result
Essential Elements of an Effective PR Description Checklist
- Documentation update: `supported_models.md` and `examples` for a new model.