[BugFix] Fix for IMA in FA3 varlen combine #22967
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only the fastcheck subset runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Code Review
This pull request updates the flash-attention dependency to a new commit, presumably to incorporate a bug fix for an IMA in FlashAttention v3, as indicated by the title. My review focuses on improving the maintainability of this dependency management. I've suggested adding a comment to the cmake file to provide context for the specific commit hash being used; this will help future developers understand the reason for this pin. It is also highly recommended to provide details on how this change was tested in the pull request description, to ensure it fixes the intended bug without introducing regressions, especially given the 'footgun' warning in the cmake file regarding this dependency.
For better maintainability, it's good practice to add a comment explaining why this specific commit hash is being used. This provides context for future developers who might need to update this dependency. You could reference the bug fix or the associated pull request. [1]
# From https://github.com/vllm-project/flash-attention/pull/80 for IMA fix in FA3
GIT_TAG 4c6a539a4d69241c612fea88b6af6c46d05eb542
Style Guide References
Footnotes
[1] Code should be well-commented to explain the purpose and origin of 'magic' values like raw commit hashes, which improves long-term maintainability.
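For illustration, here is a minimal sketch of how such a commented pin could sit in a FetchContent-style declaration. The target name, repository URL, and surrounding options are assumptions for the sketch and may not match vLLM's actual cmake setup; only the comment and commit hash come from the suggestion above.

```cmake
# Sketch only: names and options are illustrative, not vLLM's exact cmake.
include(FetchContent)

# Pin vllm-flash-attn to a specific commit rather than a branch so builds are
# reproducible, and record where the hash came from so future updates have context.
# From https://github.com/vllm-project/flash-attention/pull/80 for IMA fix in FA3
FetchContent_Declare(
  vllm-flash-attn
  GIT_REPOSITORY https://github.com/vllm-project/flash-attention.git
  GIT_TAG        4c6a539a4d69241c612fea88b6af6c46d05eb542
  GIT_PROGRESS   TRUE
)
FetchContent_MakeAvailable(vllm-flash-attn)
```

The key point is the comment directly above GIT_TAG, which ties the otherwise opaque hash to the upstream fix it carries.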
Force-pushed from abbbe77 to 3c1a3a3
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from 3c1a3a3 to 6bbcf5b
The fastcheck failure is due to a temporary AWS S3 connection error, FYI.
Purpose
vLLM side of vllm-project/flash-attention#80
Test Plan + Result
@sarckk confirms this fixes his issue; it may also help with some of the H100-related issues in #19483 (but not the A100-related ones; that issue appears to involve multiple independent IMAs).
(Optional) Documentation Update
Essential Elements of an Effective PR Description Checklist
Update supported_models.md and examples for a new model.