Can't compute multiple embeddings in a single call #2051

@jeberger

Description

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

Running this code:

model = llama_cpp.Llama("mxbai-embed-xsmall-v1-q8_0.gguf", embedding=True)
embeddings = model.embed(["Hello", "World"])

used to work in v0.3.14, returning one embedding per input string.

Current Behavior

The same code now raises RuntimeError: llama_decode returned -1, and the following messages are printed to the console:

init: invalid seq_id[3][0] = 1 >= 1
encode: failed to initialize batch
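
The invalid seq_id message suggests that the batch assigns the second input to sequence id 1 while the context only has room for a single sequence; that is my reading of the error text, not a confirmed diagnosis. A single-input call should isolate the failure to the multi-sequence path (a minimal check, assuming the same model and settings as above):

# If this succeeds, the regression is specific to the multi-input
# (multi-sequence) batching path rather than to embeddings in general.
single = model.embed("Hello")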

Environment and Context

llama-cpp-python was compiled with CUDA support.

Failure Information (for bugs)

Steps to Reproduce

Python 3.11.2 (main, Apr 28 2025, 14:11:48) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import llama_cpp
>>> model = llama_cpp.Llama("../models/mxbai-embed-xsmall-v1-q8_0.gguf", embedding=True)
...
>>> embeddings = model.embed(["Hello", "World"])
decode: cannot decode batches with this context (calling encode() instead)
init: invalid seq_id[3][0] = 1 >= 1
encode: failed to initialize batch
llama_decode: failed to decode, ret = -1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../site-packages/llama_cpp/llama.py", line 1108, in embed
    decode_batch(s_batch)
  File ".../site-packages/llama_cpp/llama.py", line 1045, in decode_batch
    self._ctx.decode(self._batch)
  File ".../site-packages/llama_cpp/_internals.py", line 327, in decode
    raise RuntimeError(f"llama_decode returned {return_code}")
RuntimeError: llama_decode returned -1
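
As a possible workaround (untested on this exact build; it assumes embed() still accepts a plain string), embedding the inputs one at a time avoids building a multi-sequence batch:

# Hypothetical workaround: one call per input keeps each batch on a
# single sequence; embed() returns the vector directly for str input.
embeddings = [model.embed(text) for text in ["Hello", "World"]]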
