
Conversation

@amd-xiaoyu12 amd-xiaoyu12 commented Jul 9, 2025

Please direct your PRs to the upstream vllm (https://github.com/vllm-project/vllm.git)

Accepting PRs into the ROCm fork (https://github.com/ROCm/vllm) will require a clear, previously communicated exception

Summary:
Support full fp8 MFMA with warp-level dynamic query quantization to improve fp8 performance on MI308; this can also benefit other MI300X accelerators and newer hardware.

  • Performance: benchmark results (screenshot attached)
  • Unit test: attention output comparison (screenshot attached)
  • lm-eval-harness perplexity (ppl) test (screenshot attached)
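
For readers unfamiliar with the approach, here is a minimal, hedged sketch of what warp-level dynamic query quantization means in principle: each group of values (roughly one wavefront's worth) derives its own fp8 scale from its runtime absolute maximum instead of a single static scale. This is illustrative only, not the PR's actual HIP kernel; the function name, group size, and use of the OCP e4m3 format via PyTorch are assumptions.

```python
import torch

FP8_E4M3_MAX = 448.0  # max finite value of torch.float8_e4m3fn (OCP e4m3)

def quantize_query_per_group(q: torch.Tensor, group_size: int = 64):
    """Quantize a query tensor group-wise to fp8 with per-group dynamic scales.

    Assumes q.numel() is divisible by group_size (e.g. head_dim 128 split
    into two groups of 64, mimicking one wavefront's lanes on MI300-class GPUs).
    """
    orig_shape = q.shape
    groups = q.float().reshape(-1, group_size)
    # Per-group absolute maximum determines the dynamic scale.
    amax = groups.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8)
    scale = amax / FP8_E4M3_MAX          # per-group dequantization scale
    q_fp8 = (groups / scale).clamp(-FP8_E4M3_MAX, FP8_E4M3_MAX)
    q_fp8 = q_fp8.to(torch.float8_e4m3fn).reshape(orig_shape)
    return q_fp8, scale

# Rough accuracy check: dequantize with q_fp8.float().reshape(-1, 64) * scale
# and compare against the original q.
```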

gshtras and others added 30 commits February 17, 2025 15:42
* Enabling ROCm CI on MI250 machines:
- correct build target
- correct queue

Signed-off-by: Alexei V. Ivanov <[email protected]>

---------

Signed-off-by: Alexei V. Ivanov <[email protected]>
* Optimization for quantized gemm skinny sizes

* lint fix

* Add support for bf16/fp16

* code cleanup

* code cleanup

* lint fix2

* cleanup

* Moved the logic into tuned gemm to preserve API compatibility

---------

Co-authored-by: Gregory Shtrasberg <[email protected]>
Co-authored-by: Gregory Shtrasberg <[email protected]>
* Removing gfx940 and gfx941 targets. These have been deprecated in favor of gfx942 for MI300X

Signed-off-by: Gregory Shtrasberg <[email protected]>

* Remove from custom kernels as well

---------

Signed-off-by: Gregory Shtrasberg <[email protected]>
* Advance torch commit to be past pytorch/pytorch#144942 to fix tunable ops

* Make sure to use the submodule commit compatible with the main aiter commit
Signed-off-by: Sage Moore <[email protected]>
Signed-off-by: Sage Moore <[email protected]>
Signed-off-by: Sage Moore <[email protected]>
* Using aiter branch that can be built into a whl with PREBUILD_KERNELS=1

* Using fail fast on aiter build to see compilation errors in the log since it fails silently

* Check for build success without installing whl
* Using proposed fix from ROCm/aiter#115

* Build fix
* tuning adjustment for quantized skinny gemm.

* lint fix
Signed-off-by: Gregory Shtrasberg <[email protected]>
Signed-off-by: Sage Moore <[email protected]>
@amd-xiaoyu12 amd-xiaoyu12 changed the title Update fp8 paged attention Update fp8 paged attention for MI308 Jul 9, 2025
@amd-xiaoyu12 amd-xiaoyu12 changed the title Update fp8 paged attention for MI308 Update fp8 paged attention Aug 4, 2025
@gshtras gshtras force-pushed the main branch 2 times, most recently from 1d2c43d to eb9d4de on September 9, 2025 16:43