Can't compute multiple embeddings in a single call #2051

@jeberger

Description

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

Running this code:

model = llama_cpp.Llama("mxbai-embed-xsmall-v1-q8_0.gguf", embedding=True)
embeddings = model.embed(["Hello", "World"])

used to work in v0.3.14, returning one embedding per input string.

Current Behavior

The same code now raises RuntimeError: llama_decode returned -1, and the following messages are printed to the console:

init: invalid seq_id[3][0] = 1 >= 1
encode: failed to initialize batch
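
The invalid seq_id message suggests that the batch assigns the second input to sequence id 1 while the context only has room for a single sequence; that is my reading of the error text, not a confirmed diagnosis. A single-input call should isolate the failure to the multi-sequence path (a minimal check, assuming the same model and settings as above):

# If this succeeds, the regression is specific to the multi-input
# (multi-sequence) batching path rather than to embeddings in general.
single = model.embed("Hello")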

Environment and Context

llama-cpp-python was compiled with CUDA support.

Failure Information (for bugs)

Steps to Reproduce

Python 3.11.2 (main, Apr 28 2025, 14:11:48) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import llama_cpp
>>> model = llama_cpp.Llama("../models/mxbai-embed-xsmall-v1-q8_0.gguf", embedding=True)
...
>>> embeddings = model.embed(["Hello", "World"])
decode: cannot decode batches with this context (calling encode() instead)
init: invalid seq_id[3][0] = 1 >= 1
encode: failed to initialize batch
llama_decode: failed to decode, ret = -1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../site-packages/llama_cpp/llama.py", line 1108, in embed
    decode_batch(s_batch)
  File ".../site-packages/llama_cpp/llama.py", line 1045, in decode_batch
    self._ctx.decode(self._batch)
  File ".../site-packages/llama_cpp/_internals.py", line 327, in decode
    raise RuntimeError(f"llama_decode returned {return_code}")
RuntimeError: llama_decode returned -1
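
As a possible workaround (untested on this exact build; it assumes embed() still accepts a plain string), embedding the inputs one at a time avoids building a multi-sequence batch:

# Hypothetical workaround: one call per input keeps each batch on a
# single sequence; embed() returns the vector directly for str input.
embeddings = [model.embed(text) for text in ["Hello", "World"]]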
