Discussed in #6008
Originally posted by cybershaman August 10, 2025
Hello all!
Been hitting my head against this issue for some time now, so I thought I might reach out to the community here for advice :-)
Host - Proxmox 8.4 Bare Metal Install
- AMD RX 9060XT (gfx1200)
- Kernel 6.11.11 to leverage proper GPU detection
- amdgpu kernel module compiled and inserted via DKMS (probably not strictly needed though?)
- "rocm-smi" functional
- "rocminfo" shows only 1 Agent (CPU)
Container (LXC) - Ubuntu 22.04.5 LTS
- ROCm 6.4.2 installed via amdgpu-install (of course no DKMS)
- the following devices passed through from the host, with the appropriate cgroup permissions for the container (see the config sketch after this list):
- /dev/kfd
- /dev/dri/*
- "rocm-smi" functional
- "rocminfo" functional (2 Agents, CPU & GPU)
LocalAI compiled from git source:
REBUILD=true BUILD_TYPE=hipblas GPU_TARGETS=gfx1200 GO_TAGS=stablediffusion,tts BUILD_GRPC_FOR_BACKEND_LLAMA=true BUILD_GRPC=true make build
- using the environment variable HSA_OVERRIDE_GFX_VERSION=12.0.0, just in case
Testing with a LLaMA model; the relevant config parts (a fuller sketch follows this list):
- backend: rocm-llama-cpp
- f16: true
- threads: 0
- gpu_layers: 200
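Putting those parts together, the model definition looks roughly like the following (the model name and GGUF filename here are placeholders, not the exact ones in use):

```yaml
# models/discolm_german.yaml -- hypothetical names/paths
name: discolm_german
backend: rocm-llama-cpp
f16: true
threads: 0
gpu_layers: 200
parameters:
  model: discolm-german.Q4_K_M.gguf   # assumed filename
```

LocalAI is then started with the override exported, e.g. something like `HSA_OVERRIDE_GFX_VERSION=12.0.0 ./local-ai`.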
LocalAI appears to be recognising and utilising the GPU, as there is VRAM movement and a small amount of GPU usage visible while querying the API.
However, it eventually throws an error:
11:04AM DBG GRPC(discolm_german-127.0.0.1:37515): stderr ggml_cuda_init: found 1 ROCm devices:
11:04AM DBG GRPC(discolm_german-127.0.0.1:37515): stderr Device 0: AMD Radeon RX 9060 XT, gfx1200 (0x1200), VMM: no, Wave Size: 32
[...]
11:04AM DBG GRPC(discolm_german-127.0.0.1:37515): stderr load_tensors: offloading 32 repeating layers to GPU
11:04AM DBG GRPC(discolm_german-127.0.0.1:37515): stderr load_tensors: offloading output layer to GPU
11:04AM DBG GRPC(discolm_german-127.0.0.1:37515): stderr load_tensors: offloaded 33/33 layers to GPU
11:04AM DBG GRPC(discolm_german-127.0.0.1:37515): stderr load_tensors: CPU_Mapped model buffer size = 70.32 MiB
11:04AM DBG GRPC(discolm_german-127.0.0.1:37515): stderr load_tensors: ROCm0 model buffer size = 3877.56 MiB
[...]
11:05AM DBG GRPC(discolm_german-127.0.0.1:37515): stderr ggml_cuda_compute_forward: MUL_MAT failed
11:05AM DBG GRPC(discolm_german-127.0.0.1:37515): stderr ROCm error: invalid device function
11:05AM DBG GRPC(discolm_german-127.0.0.1:37515): stderr current device: 0, in function ggml_cuda_compute_forward at /LocalAI/backend/cpp/llama-cpp-fallback-build/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:2513
11:05AM DBG GRPC(discolm_german-127.0.0.1:37515): stderr err
11:05AM DBG GRPC(discolm_german-127.0.0.1:37515): stderr /LocalAI/backend/cpp/llama-cpp-fallback-build/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:84: ROCm error
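For context, HIP's "invalid device function" generally means the kernels loaded at runtime were not compiled for the GPU's gfx architecture. A rough way to compare what a backend library ships against what the runtime reports (the library path below is an assumption; point it at whichever backend build is actually loaded):

```sh
# gfx targets embedded in the backend library (path assumed)
strings /path/to/libggml-hip.so | grep -oE 'gfx[0-9a-f]+' | sort -u
# gfx target reported by the runtime for the installed GPU
rocminfo | grep -oE 'gfx[0-9a-f]+' | sort -u
```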
Anyone have any ideas and/or pointers?
Thank you very much in advance!