As of #12093, Flash Attention 3 is now supported in vLLM for Hopper GPUs (SM 9.0).
It can also be enabled for SM 8.0 and 8.7 by setting `VLLM_FLASH_ATTN_VERSION=3` (see the sketch below).
For SM 8.6 and 8.9 it is fully disabled, since those GPUs don't have enough shared memory for the current implementation; some work still needs to be done here.
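A minimal sketch of opting in on an SM 8.0 / 8.7 GPU, assuming the environment variable is read when vLLM is imported; the model name is just an illustrative placeholder:

```python
# Sketch: force Flash Attention 3 via the env var named in this issue.
# Set VLLM_FLASH_ATTN_VERSION before importing vLLM so its environment
# handling picks it up.
import os

os.environ["VLLM_FLASH_ATTN_VERSION"] = "3"

from vllm import LLM, SamplingParams

# Placeholder model; any model you already run should work the same way.
llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")
params = SamplingParams(temperature=0.0, max_tokens=32)
outputs = llm.generate(["Hello, world!"], params)
print(outputs[0].outputs[0].text)
```

Equivalently, the variable can be exported in the shell before launching a server, e.g. `VLLM_FLASH_ATTN_VERSION=3 vllm serve <model>`.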
This issue tracks the remaining features that have yet to be implemented:
Hardware Support
- SM 8.9 Ada Lovelace (L4, L40s) Support
- SM 8.6 Ampere (A6000) Support
Optimizations
- FP8 Attention