Description
Name and Version
./llama.cpp/build/bin/llama-cli --version
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 580 Series (RADV POLARIS10) (radv) | uma: 0 | fp16: 0 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
version: 6264 (043fb27)
built with cc (GCC) 15.2.1 20250813 for x86_64-pc-linux-gnu
[1] 13557 segmentation fault (core dumped) ./llama.cpp/build/bin/llama-cli --version
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-bench
Command line
./llama.cpp/build/bin/llama-bench -m /mnt/4TBRaid0/models/Qwen3-4B-Instruct-2507-abliterated.mxfp4.gguf -fa 1 -n 99 -o json
Problem description & steps to reproduce
I'm instructing llama-bench to output JSON, so I expect only JSON as output.
It also prints:
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 580 Series (RADV POLARIS10) (radv) | uma: 0 | fp16: 0 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
This is outside the JSON format and annoying to parse out. It just shouldn't be printed when a format other than md
is requested.
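As a stopgap while the device lines are still printed, the JSON can be recovered from the mixed output. The following is only a rough sketch: `extract_json` is a hypothetical helper (not part of llama.cpp), and it assumes the log lines come before the JSON array and never themselves start with `[` or `{`.

```python
import json


def extract_json(mixed_output: str):
    """Return the first JSON value found in text that also contains log lines.

    Hypothetical helper: scans for the first line that begins with '[' or '{'
    and decodes one JSON value from there, ignoring anything after it.
    """
    decoder = json.JSONDecoder()
    offset = 0  # character offset of the current line within mixed_output
    for line in mixed_output.splitlines(keepends=True):
        stripped = line.lstrip()
        if stripped.startswith(("[", "{")):
            start = offset + (len(line) - len(stripped))
            try:
                # raw_decode stops at the end of the first complete JSON value
                value, _end = decoder.raw_decode(mixed_output, start)
                return value
            except json.JSONDecodeError:
                pass  # not valid JSON from here, keep scanning
        offset += len(line)
    raise ValueError("no JSON value found in output")


if __name__ == "__main__":
    # Shortened stand-in for the llama-bench output shown in this report
    sample = (
        "ggml_vulkan: Found 1 Vulkan devices:\n"
        "ggml_vulkan: 0 = AMD Radeon RX 580 Series (RADV POLARIS10)\n"
        "[\n"
        '  {"n_prompt": 0, "n_gen": 99, "avg_ts": 38.550648}\n'
        "]\n"
    )
    results = extract_json(sample)
    print(results[0]["avg_ts"])
```

This only works around the problem on the consumer side; the request in this issue is still that llama-bench emit nothing but JSON when `-o json` is given.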
First Bad Commit
Irrelevant.
Sidenote though: I also intentionally left in [1] 11372 segmentation fault (core dumped) ./llama.cpp/build/bin/llama-bench -m -fa 1 -n 99 -o json
which started happening a few days ago. I don't know which commit introduced it.
Relevant log output
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 580 Series (RADV POLARIS10) (radv) | uma: 0 | fp16: 0 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
[
{
"build_commit": "043fb27d",
"build_number": 6264,
"cpu_info": "AMD Ryzen Threadripper 1950X 16-Core Processor",
"gpu_info": "AMD Radeon RX 580 Series (RADV POLARIS10)",
"backends": "Vulkan",
"model_filename": "/mnt/4TBRaid0/models/Qwen3-4B-Instruct-2507-abliterated.mxfp4.gguf",
"model_type": "qwen3 4B MXFP4 MoE",
"model_size": 4274448384,
"model_n_params": 4022468096,
"n_batch": 2048,
"n_ubatch": 512,
"n_threads": 16,
"cpu_mask": "0x0",
"cpu_strict": false,
"poll": 50,
"type_k": "f16",
"type_v": "f16",
"n_gpu_layers": 99,
"split_mode": "layer",
"main_gpu": 0,
"no_kv_offload": false,
"flash_attn": true,
"tensor_split": "0.00",
"tensor_buft_overrides": "none",
"use_mmap": true,
"embeddings": false,
"no_op_offload": 0,
"n_prompt": 512,
"n_gen": 0,
"n_depth": 0,
"test_time": "2025-08-25T01:02:39Z",
"avg_ns": 1378257653,
"stddev_ns": 7488379,
"avg_ts": 371.492279,
"stddev_ts": 2.016004,
"samples_ns": [ 1368536333, 1375579069, 1379896477, 1378068835, 1389207552 ],
"samples_ts": [ 374.122, 372.207, 371.042, 371.534, 368.555 ]
},
{
"build_commit": "043fb27d",
"build_number": 6264,
"cpu_info": "AMD Ryzen Threadripper 1950X 16-Core Processor",
"gpu_info": "AMD Radeon RX 580 Series (RADV POLARIS10)",
"backends": "Vulkan",
"model_filename": "/mnt/4TBRaid0/models/Qwen3-4B-Instruct-2507-abliterated.mxfp4.gguf",
"model_type": "qwen3 4B MXFP4 MoE",
"model_size": 4274448384,
"model_n_params": 4022468096,
"n_batch": 2048,
"n_ubatch": 512,
"n_threads": 16,
"cpu_mask": "0x0",
"cpu_strict": false,
"poll": 50,
"type_k": "f16",
"type_v": "f16",
"n_gpu_layers": 99,
"split_mode": "layer",
"main_gpu": 0,
"no_kv_offload": false,
"flash_attn": true,
"tensor_split": "0.00",
"tensor_buft_overrides": "none",
"use_mmap": true,
"embeddings": false,
"no_op_offload": 0,
"n_prompt": 0,
"n_gen": 99,
"n_depth": 0,
"test_time": "2025-08-25T01:02:48Z",
"avg_ns": 2568050318,
"stddev_ns": 505588,
"avg_ts": 38.550648,
"stddev_ts": 0.007532,
"samples_ns": [ 2568450490, 2567664647, 2567498539, 2567965832, 2568672085 ],
"samples_ts": [ 38.5446, 38.5564, 38.5589, 38.5519, 38.5413 ]
}
]
[1] 11372 segmentation fault (core dumped) ./llama.cpp/build/bin/llama-bench -m -fa 1 -n 99 -o json