
Eval bug: llama.cpp/tools/server/server.cpp:3562: GGML_ASSERT(batch.n_tokens > 0) failed #15812

@vt-alt

Description


Name and Version

G:\llama_cpp> llama-server --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
load_backend: loaded CUDA backend from G:\llama_cpp\ggml-cuda.dll
load_backend: loaded RPC backend from G:\llama_cpp\ggml-rpc.dll
load_backend: loaded CPU backend from G:\llama_cpp\ggml-cpu-haswell.dll
version: 6387 (4fd1242)
built with clang version 19.1.5 for x86_64-pc-windows-msvc

Operating systems

Windows

GGML backends

CUDA

Hardware

NVIDIA GeForce RTX 3060

Models

Voxtral-Mini-3B-2507-GGUF:Q4_K_M and mistralai_Voxtral-Small-24B-2507-GGUF

Problem description & steps to reproduce

Voxtral-Mini-3B-2507-GGUF works via llama-mtmd-cli but fails via llama-server.

This command, taken from the tests, works:

llama-mtmd-cli -hf ggml-org/Voxtral-Mini-3B-2507-GGUF:Q4_K_M --image test-2.mp3 -p "what is the publisher name of the newspaper?" --temp 0 -n 128

An API request to llama-server fails with server.cpp:3562: GGML_ASSERT(batch.n_tokens > 0) failed. Full log below.
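For reference, here is a minimal sketch of the kind of request that triggers the assert. The exact payload was not captured in this report, so the server address and the input_audio content-part shape below are assumptions based on llama-server's OpenAI-compatible API:

```python
import base64
import json
import urllib.request

# Assumption: llama-server running locally on its default port.
SERVER = "http://localhost:8080"

# Base64-encode the audio file, following the OpenAI-style
# input_audio content part.
with open("test-2.mp3", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode()

payload = {
    "messages": [{
        "role": "user",
        "content": [
            {"type": "input_audio",
             "input_audio": {"data": audio_b64, "format": "mp3"}},
            {"type": "text",
             "text": "what is the publisher name of the newspaper?"},
        ],
    }],
    "temperature": 0,
    "max_tokens": 128,
}

req = urllib.request.Request(
    f"{SERVER}/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# On an affected build, this crashes the server with
# GGML_ASSERT(batch.n_tokens > 0) instead of returning a completion.
print(urllib.request.urlopen(req).read().decode())
```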

A similar bug report: #13433

First Bad Commit

No response

Relevant log output

srv  log_server_r: request: GET /props 192.168.1.1 200
slot launch_slot_: id  0 | task 0 | processing task
slot update_slots: id  0 | task 0 | new prompt, n_ctx_slot = 131072, n_keep = 0, n_prompt_tokens = 190
slot update_slots: id  0 | task 0 | kv cache rm [0, end)
slot update_slots: id  0 | task 0 | prompt processing progress, n_past = 3, n_tokens = 3, progress = 0.015789
slot update_slots: id  0 | task 0 | kv cache rm [3, end)
srv  process_chun: processing audio...
encoding audio slice...
audio slice encoded in 5542 ms
decoding audio batch 1/1, n_tokens_batch = 187
audio decoded (batch 1/1) in 15 ms
srv  process_chun: audio processed in 5560 ms
slot update_slots: id  0 | task 0 | prompt processing progress, n_past = 190, n_tokens = 0, progress = 1.000000
D:/a/llama.cpp/llama.cpp/tools/server/server.cpp:3562: GGML_ASSERT(batch.n_tokens > 0) failed
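Reading the log: the slot processes 3 text tokens, then the audio chunk advances n_past straight to 190 = n_prompt_tokens, so the follow-up text batch is left with n_tokens = 0 and the assert at server.cpp:3562 fires. A toy sketch of that arithmetic, reconstructed from the numbers in the log (not llama.cpp code):

```python
# Toy model of the prompt-processing arithmetic from the log above.
# Illustrates why batch.n_tokens ends up 0; not llama.cpp code.
n_prompt_tokens = 190     # "n_prompt_tokens = 190" in the log
n_past = 3                # text tokens processed before the audio chunk
audio_chunk_tokens = 187  # "n_tokens_batch = 187" in the log

# The multimodal helper decodes the audio chunk itself and advances n_past:
n_past += audio_chunk_tokens
assert n_past == n_prompt_tokens  # 190 == 190: the prompt ends in audio

# The server then builds a text batch from the remaining prompt tokens:
batch_n_tokens = n_prompt_tokens - n_past
print(batch_n_tokens)  # 0 -> GGML_ASSERT(batch.n_tokens > 0) fails
```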
