Name and Version
```
G:\llama_cpp> llama-server --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
load_backend: loaded CUDA backend from G:\llama_cpp\ggml-cuda.dll
load_backend: loaded RPC backend from G:\llama_cpp\ggml-rpc.dll
load_backend: loaded CPU backend from G:\llama_cpp\ggml-cpu-haswell.dll
version: 6387 (4fd1242)
built with clang version 19.1.5 for x86_64-pc-windows-msvc
```
Operating systems
Windows
GGML backends
CUDA
Hardware
NVIDIA GeForce RTX 3060
Models
Voxtral-Mini-3B-2507-GGUF:Q4_K_M and mistralai_Voxtral-Small-24B-2507-GGUF
Problem description & steps to reproduce
Voxtral-Mini-3B-2507-GGUF works via llama-mtmd-cli but fails via llama-server.
This command works (taken from the tests):

```
llama-mtmd-cli -hf ggml-org/Voxtral-Mini-3B-2507-GGUF:Q4_K_M --image test-2.mp3 -p "what is the publisher name of the newspaper?" --temp 0 -n 128
```

The equivalent API request to llama-server fails with `server.cpp:3562: GGML_ASSERT(batch.n_tokens > 0) failed`. Full log below.
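For reference, a sketch of the kind of request that triggers the failure. The original failing request isn't captured above, so the endpoint (`/v1/chat/completions` on the default port 8080), the `input_audio` payload shape, and the helper commands are assumptions, written as a Unix-style shell session:

```sh
# Reproduction sketch; payload shape, port, and file names are assumed,
# since the exact failing request isn't shown in this report.
llama-server -hf ggml-org/Voxtral-Mini-3B-2507-GGUF:Q4_K_M

# In another shell: base64-encode the same test file used with llama-mtmd-cli
# and send it as an OpenAI-style input_audio content part.
AUDIO_B64=$(base64 -w0 test-2.mp3)

curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @- <<EOF
{
  "temperature": 0,
  "max_tokens": 128,
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "what is the publisher name of the newspaper?"},
      {"type": "input_audio", "input_audio": {"data": "$AUDIO_B64", "format": "mp3"}}
    ]
  }]
}
EOF
```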
A similar bug report: #13433
First Bad Commit
No response
Relevant log output
```
srv log_server_r: request: GET /props 192.168.1.1 200
slot launch_slot_: id 0 | task 0 | processing task
slot update_slots: id 0 | task 0 | new prompt, n_ctx_slot = 131072, n_keep = 0, n_prompt_tokens = 190
slot update_slots: id 0 | task 0 | kv cache rm [0, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 3, n_tokens = 3, progress = 0.015789
slot update_slots: id 0 | task 0 | kv cache rm [3, end)
srv process_chun: processing audio...
encoding audio slice...
audio slice encoded in 5542 ms
decoding audio batch 1/1, n_tokens_batch = 187
audio decoded (batch 1/1) in 15 ms
srv process_chun: audio processed in 5560 ms
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 190, n_tokens = 0, progress = 1.000000
D:/a/llama.cpp/llama.cpp/tools/server/server.cpp:3562: GGML_ASSERT(batch.n_tokens > 0) failed
```
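Note: in the last progress line, n_past reaches n_prompt_tokens (190) with n_tokens = 0, i.e. the audio chunk accounts for every remaining prompt token. The assertion then fires on what appears to be an empty text batch, which would explain why the same prompt succeeds via llama-mtmd-cli (presumably a different batching path) but not via llama-server.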