
Misc. bug: llama-bench json output is too verbose #15554

@markg85

Description


Name and Version

./llama.cpp/build/bin/llama-cli --version
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 580 Series (RADV POLARIS10) (radv) | uma: 0 | fp16: 0 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
version: 6264 (043fb27)
built with cc (GCC) 15.2.1 20250813 for x86_64-pc-linux-gnu
[1] 13557 segmentation fault (core dumped) ./llama.cpp/build/bin/llama-cli --version

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-bench

Command line

./llama.cpp/build/bin/llama-bench -m /mnt/4TBRaid0/models/Qwen3-4B-Instruct-2507-abliterated.mxfp4.gguf -fa 1 -n 99 -o json

Problem description & steps to reproduce

I'm instructing llama-bench to output JSON, so I expect only JSON as output.
Instead, it also prints:

ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 580 Series (RADV POLARIS10) (radv) | uma: 0 | fp16: 0 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none

These lines are outside the JSON and annoying to parse out. llama-bench just shouldn't print them when an output format other than md is requested.
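
As a workaround I currently skip everything up to the line that opens the JSON array before parsing. A minimal sketch in Python (assuming the banner lines are printed before the array and the array itself starts on a line containing only "["; the model path is the one from the command line above):

import json
import subprocess

cmd = [
    "./llama.cpp/build/bin/llama-bench",
    "-m", "/mnt/4TBRaid0/models/Qwen3-4B-Instruct-2507-abliterated.mxfp4.gguf",
    "-fa", "1", "-n", "99", "-o", "json",
]
out = subprocess.run(cmd, capture_output=True, text=True).stdout

# Drop any ggml_vulkan/device banner lines that precede the JSON array.
lines = out.splitlines()
start = next((i for i, line in enumerate(lines) if line.strip() == "["), None)
results = json.loads("\n".join(lines[start:])) if start is not None else []

for r in results:
    print(r["n_prompt"], r["n_gen"], r["avg_ts"])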

First Bad Commit

Irrelevant.

Side note though: I intentionally also left the [1] 11372 segmentation fault (core dumped) ./llama.cpp/build/bin/llama-bench -m -fa 1 -n 99 -o json line in the log below, since that started happening some days ago. I don't know which commit introduced it.

Relevant log output

ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 580 Series (RADV POLARIS10) (radv) | uma: 0 | fp16: 0 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
[
  {
    "build_commit": "043fb27d",
    "build_number": 6264,
    "cpu_info": "AMD Ryzen Threadripper 1950X 16-Core Processor",
    "gpu_info": "AMD Radeon RX 580 Series (RADV POLARIS10)",
    "backends": "Vulkan",
    "model_filename": "/mnt/4TBRaid0/models/Qwen3-4B-Instruct-2507-abliterated.mxfp4.gguf",
    "model_type": "qwen3 4B MXFP4 MoE",
    "model_size": 4274448384,
    "model_n_params": 4022468096,
    "n_batch": 2048,
    "n_ubatch": 512,
    "n_threads": 16,
    "cpu_mask": "0x0",
    "cpu_strict": false,
    "poll": 50,
    "type_k": "f16",
    "type_v": "f16",
    "n_gpu_layers": 99,
    "split_mode": "layer",
    "main_gpu": 0,
    "no_kv_offload": false,
    "flash_attn": true,
    "tensor_split": "0.00",
    "tensor_buft_overrides": "none",
    "use_mmap": true,
    "embeddings": false,
    "no_op_offload": 0,
    "n_prompt": 512,
    "n_gen": 0,
    "n_depth": 0,
    "test_time": "2025-08-25T01:02:39Z",
    "avg_ns": 1378257653,
    "stddev_ns": 7488379,
    "avg_ts": 371.492279,
    "stddev_ts": 2.016004,
    "samples_ns": [ 1368536333, 1375579069, 1379896477, 1378068835, 1389207552 ],
    "samples_ts": [ 374.122, 372.207, 371.042, 371.534, 368.555 ]
  },
  {
    "build_commit": "043fb27d",
    "build_number": 6264,
    "cpu_info": "AMD Ryzen Threadripper 1950X 16-Core Processor",
    "gpu_info": "AMD Radeon RX 580 Series (RADV POLARIS10)",
    "backends": "Vulkan",
    "model_filename": "/mnt/4TBRaid0/models/Qwen3-4B-Instruct-2507-abliterated.mxfp4.gguf",
    "model_type": "qwen3 4B MXFP4 MoE",
    "model_size": 4274448384,
    "model_n_params": 4022468096,
    "n_batch": 2048,
    "n_ubatch": 512,
    "n_threads": 16,
    "cpu_mask": "0x0",
    "cpu_strict": false,
    "poll": 50,
    "type_k": "f16",
    "type_v": "f16",
    "n_gpu_layers": 99,
    "split_mode": "layer",
    "main_gpu": 0,
    "no_kv_offload": false,
    "flash_attn": true,
    "tensor_split": "0.00",
    "tensor_buft_overrides": "none",
    "use_mmap": true,
    "embeddings": false,
    "no_op_offload": 0,
    "n_prompt": 0,
    "n_gen": 99,
    "n_depth": 0,
    "test_time": "2025-08-25T01:02:48Z",
    "avg_ns": 2568050318,
    "stddev_ns": 505588,
    "avg_ts": 38.550648,
    "stddev_ts": 0.007532,
    "samples_ns": [ 2568450490, 2567664647, 2567498539, 2567965832, 2568672085 ],
    "samples_ts": [ 38.5446, 38.5564, 38.5589, 38.5519, 38.5413 ]
  }
]
[1]    11372 segmentation fault (core dumped)  ./llama.cpp/build/bin/llama-bench -m  -fa 1 -n 99 -o json
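
For reference, when scraping these results the throughput fields can be cross-checked against the raw samples. A minimal sketch (assuming each samples_ts entry is tokens / (samples_ns / 1e9), with tokens being n_prompt for the prompt test and n_gen for the generation test, and avg_ts the mean of samples_ts; the output above is consistent with that):

from statistics import mean, stdev

# `results` is the parsed JSON array from the sketch further up.
for r in results:
    tokens = r["n_prompt"] + r["n_gen"]  # one of the two is 0 per test
    ts = [tokens / (ns / 1e9) for ns in r["samples_ns"]]
    print(f'n_prompt={r["n_prompt"]} n_gen={r["n_gen"]}: '
          f'{mean(ts):.2f} ± {stdev(ts):.2f} t/s (reported avg_ts={r["avg_ts"]:.2f})')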
