[Draft] Enable CUTLASS without host compiler #1967
This is a draft PR to enable CUTLASS in torch-xpu-ops so that we can test CUTLASS kernels' accuracy and performance in PyTorch once the SDPA/GEMM kernels are ready.
Since there is no settled plan yet for how to import the cutlass-sycl repo, I download it in CMake for debugging convenience.

I put all the PyTorch/ATen wrapper functions in `ATen/native/cutlass/*.cpp`. They extract the problem shape and device pointers from `at::Tensor`, and they are compiled and linked into `libtorch_xpu_ops.a` with plain gcc. They call the kernel launch functions from `ATen/native/cutlass/sycl/*.cpp` (a sketch of such a wrapper is shown below).
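A minimal sketch of what one of these wrappers might look like, assuming a hypothetical SDPA-backward entry point (the file layout and the `cutlass_sdpa_backward` / `sdpa_backward_launch` names are placeholders, not the actual symbols in this PR):

```cpp
// ATen/native/cutlass/Attention.cpp (illustrative only; built with plain gcc)
#include <ATen/core/Tensor.h>
#include <cstdint>

namespace at::native::cutlass_sycl {
// Declaration of the kernel launch function that lives in
// ATen/native/cutlass/sycl/*.cpp and is compiled separately with icpx.
// Only plain C++ types cross this boundary, so this TU needs no SYCL headers.
void sdpa_backward_launch(
    const void* query, const void* key, const void* value,
    const void* grad_out, void* grad_query, void* grad_key, void* grad_value,
    int64_t batch, int64_t num_heads, int64_t seq_len, int64_t head_dim);
} // namespace at::native::cutlass_sycl

namespace at::native {

// Wrapper: extracts the problem shape and device pointers from at::Tensor
// and forwards them to the CUTLASS SYCL kernel launcher.
void cutlass_sdpa_backward(
    const at::Tensor& query, const at::Tensor& key, const at::Tensor& value,
    const at::Tensor& grad_out, at::Tensor& grad_query, at::Tensor& grad_key,
    at::Tensor& grad_value) {
  const int64_t batch = query.size(0);
  const int64_t num_heads = query.size(1);
  const int64_t seq_len = query.size(2);
  const int64_t head_dim = query.size(3);

  cutlass_sycl::sdpa_backward_launch(
      query.const_data_ptr(), key.const_data_ptr(), value.const_data_ptr(),
      grad_out.const_data_ptr(), grad_query.mutable_data_ptr(),
      grad_key.mutable_data_ptr(), grad_value.mutable_data_ptr(),
      batch, num_heads, seq_len, head_dim);
}

} // namespace at::native
```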
I put all the CUTLASS SYCL kernel functions in `ATen/native/cutlass/sycl/*.cpp`. Since CUTLASS and syclcompat don't support `-fsycl-host-compiler=g++`, I compile the `.cpp` files of the CUTLASS kernels into a separate `libcutlass_kernels.so` library with plain icpx and then link it into `libtorch_xpu_ops.a` with the gcc linker; a skeleton of that icpx-compiled side is shown below.
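For illustration, the icpx-compiled side could look roughly like the skeleton below. Only plain C++ types cross the library boundary, so the gcc-built wrappers can link against it; the actual CUTLASS kernel invocation is elided because it depends on the cutlass-sycl API this PR pulls in, and the file/function names are hypothetical:

```cpp
// ATen/native/cutlass/sycl/Attention.cpp (illustrative skeleton; built with icpx,
// without -fsycl-host-compiler=g++, and shipped in libcutlass_kernels.so)
#include <sycl/sycl.hpp>
#include <cstdint>

namespace at::native::cutlass_sycl {

void sdpa_backward_launch(
    const void* query, const void* key, const void* value,
    const void* grad_out, void* grad_query, void* grad_key, void* grad_value,
    int64_t batch, int64_t num_heads, int64_t seq_len, int64_t head_dim) {
  // A real implementation would take the current XPU queue from PyTorch and
  // build a CUTLASS SYCL problem description from (batch, num_heads, seq_len,
  // head_dim); that part is elided here because it depends on cutlass-sycl.
  sycl::queue q{sycl::gpu_selector_v};

  // ... configure and run the CUTLASS SDPA backward kernel on `q` ...
  (void)q;
  (void)query; (void)key; (void)value; (void)grad_out;
  (void)grad_query; (void)grad_key; (void)grad_value;
  (void)batch; (void)num_heads; (void)seq_len; (void)head_dim;
}

} // namespace at::native::cutlass_sycl
```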
Currently, because

- `libcutlass_kernels.so` is linked into `libtorch_xpu_ops.a`,
- `libtorch_xpu_ops.a` is linked into `libtorch_xpu.so`, and
- `libtorch_xpu.so` compiles/links `aten/src/ATen/native/mkldnn/xpu/detail/Attention.cpp` and `aten/src/ATen/native/mkldnn/xpu/Attention.cpp`,

torch-xpu-ops is exposed to `libtorch_xpu.so`, and `pytorch/aten/src/ATen/native/mkldnn/xpu/Attention.cpp` can call the wrapper functions from `torch-xpu-ops/src/ATen/native/cutlass/*.h` directly, as in the sketch below.
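As a rough illustration only (the header path and symbol names are hypothetical placeholders for the wrappers this PR adds), the call site could look like:

```cpp
// Sketch of a call site in pytorch/aten/src/ATen/native/mkldnn/xpu/Attention.cpp;
// the included header and the functions it declares are hypothetical.
#include <ATen/core/Tensor.h>
#include <ATen/native/cutlass/Attention.h>  // hypothetical wrapper header from torch-xpu-ops

namespace at::native {

// Inside the XPU overrideable SDPA backward path, forward to the CUTLASS wrapper,
// which extracts shapes/pointers and launches the CUTLASS SYCL kernel.
void sdpa_backward_via_cutlass(
    const at::Tensor& query, const at::Tensor& key, const at::Tensor& value,
    const at::Tensor& grad_out, at::Tensor& grad_query, at::Tensor& grad_key,
    at::Tensor& grad_value) {
  cutlass_sdpa_backward(query, key, value, grad_out,
                        grad_query, grad_key, grad_value);
}

} // namespace at::native
```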
I have verified that `overrideable_sdpa_backward` can now call into YuanKun's CUTLASS SDPA backward kernel, and it passes a few accuracy UTs.