Discussed in #6008
Originally posted by cybershaman August 10, 2025
Hello all!
Been hitting my head against this issue for some time now, so I thought I might reach out to the community here for advice :-)
Host - Proxmox 8.4 Bare Metal Install
- AMD RX 9060XT (gfx1200)
- Kernel 6.11.11 to leverage proper GPU detection
- amdgpu kernel module compiled and inserted via DKMS (probably not strictly needed though?)
- "rocm-smi" functional
- "rocminfo" shows only 1 Agent (CPU)
Container (LXC) - Ubuntu 22.04.5 LTS
- ROCm 6.4.2 installed via amdgpu-install (of course no DKMS)
- the following devices passed through from the host, with the appropriate cgroup permissions for the container (see the config sketch after this list):
- /dev/kfd
- /dev/dri/*
- "rocm-smi" functional
- "rocminfo" functional (2 Agents, CPU & GPU)
LocalAI compiled from git source:
REBUILD=true BUILD_TYPE=hipblas GPU_TARGETS=gfx1200 GO_TAGS=stablediffusion,tts BUILD_GRPC_FOR_BACKEND_LLAMA=true BUILD_GRPC=true make build
- using the environment variable HSA_OVERRIDE_GFX_VERSION=12.0.0, just in case
Testing with a LLaMA model; the relevant config parts (a fuller sketch follows this list):
- backend: rocm-llama-cpp
- f16: true
- threads: 0
- gpu_layers: 200
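Putting those parts together, the model definition looks roughly like the following (the model name and GGUF filename here are placeholders, not the exact ones in use):

```yaml
# models/discolm_german.yaml -- hypothetical names/paths
name: discolm_german
backend: rocm-llama-cpp
f16: true
threads: 0
gpu_layers: 200
parameters:
  model: discolm-german.Q4_K_M.gguf   # assumed filename
```

LocalAI is then started with the override exported, e.g. something like `HSA_OVERRIDE_GFX_VERSION=12.0.0 ./local-ai`.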
LocalAI appears to be recognising and utilising the GPU, as there is VRAM movement and a small amount of GPU usage visible while querying the API.
However, it eventually throws an error:
11:04AM DBG GRPC(discolm_german-127.0.0.1:37515): stderr ggml_cuda_init: found 1 ROCm devices:
11:04AM DBG GRPC(discolm_german-127.0.0.1:37515): stderr Device 0: AMD Radeon RX 9060 XT, gfx1200 (0x1200), VMM: no, Wave Size: 32
[...]
11:04AM DBG GRPC(discolm_german-127.0.0.1:37515): stderr load_tensors: offloading 32 repeating layers to GPU
11:04AM DBG GRPC(discolm_german-127.0.0.1:37515): stderr load_tensors: offloading output layer to GPU
11:04AM DBG GRPC(discolm_german-127.0.0.1:37515): stderr load_tensors: offloaded 33/33 layers to GPU
11:04AM DBG GRPC(discolm_german-127.0.0.1:37515): stderr load_tensors: CPU_Mapped model buffer size = 70.32 MiB
11:04AM DBG GRPC(discolm_german-127.0.0.1:37515): stderr load_tensors: ROCm0 model buffer size = 3877.56 MiB
[...]
11:05AM DBG GRPC(discolm_german-127.0.0.1:37515): stderr ggml_cuda_compute_forward: MUL_MAT failed
11:05AM DBG GRPC(discolm_german-127.0.0.1:37515): stderr ROCm error: invalid device function
11:05AM DBG GRPC(discolm_german-127.0.0.1:37515): stderr current device: 0, in function ggml_cuda_compute_forward at /LocalAI/backend/cpp/llama-cpp-fallback-build/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:2513
11:05AM DBG GRPC(discolm_german-127.0.0.1:37515): stderr err
11:05AM DBG GRPC(discolm_german-127.0.0.1:37515): stderr /LocalAI/backend/cpp/llama-cpp-fallback-build/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:84: ROCm error
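For context, HIP's "invalid device function" generally means the kernels loaded at runtime were not compiled for the GPU's gfx architecture. A rough way to compare what a backend library ships against what the runtime reports (the library path below is an assumption; point it at whichever backend build is actually loaded):

```sh
# gfx targets embedded in the backend library (path assumed)
strings /path/to/libggml-hip.so | grep -oE 'gfx[0-9a-f]+' | sort -u
# gfx target reported by the runtime for the installed GPU
rocminfo | grep -oE 'gfx[0-9a-f]+' | sort -u
```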
Anyone have any ideas and/or pointers?
Thank you very much in advance!