
Add method to reset model KV cache (llama_reset) #50

@shashankb-cc

Feature Request

I’m using LLM.swift for on-device inference in a macOS app. I noticed that there is currently no exposed way to reset the model context or KV cache between runs without reinitializing the entire model. This becomes an issue in apps that perform multiple inferences in sequence (e.g., analyzing transcript segments or handling multiple short prompts in a session).

Problem

Calling .predict(...) multiple times on the same LLMModel instance can cause:

  • Errors like:
      • decode: failed to find KV cache slot for ubatch of size 512
      • llama_decode: failed to decode, ret = 1
  • Memory buildup or inference inconsistencies over time.

Reinitializing the model before every run avoids this, but it is inefficient and degrades performance, especially on smaller prompts. A sketch of the repeated-inference pattern that triggers these errors is shown below.
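
For context, this is roughly the usage pattern involved: several sequential prompts against one long-lived model instance. The LLMModel(modelPath:) initializer and predict(_:) signature below are assumptions made for illustration; only the repeated-call pattern matters.

import LLM  // LLM.swift package; module name assumed

// Hypothetical sketch of the repeated-inference pattern described above.
// LLMModel(modelPath:) and predict(_:) mirror the calls mentioned in this
// report; their exact signatures are assumptions, not LLM.swift's real API.
func analyzeTranscript(_ segments: [String]) async throws {
    let model = try LLMModel(modelPath: "model.gguf")

    for segment in segments {
        // After a few iterations the KV cache fills up and llama_decode
        // starts failing ("failed to find KV cache slot for ubatch of size 512").
        let summary = try await model.predict("Summarize: \(segment)")
        print(summary)
    }
}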

Proposed Solution

Expose a method on LLMModel like this:

public func resetContext() {
  llama_reset(context)
}
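
For reference, here is a rough sketch of how such a method could be implemented inside LLM.swift, assuming the bundled llama.cpp module exposes llama_kv_cache_clear (the KV-cache clearing entry point has been renamed across llama.cpp versions) and that the wrapper keeps an OpaquePointer context plus a position counter; the member names below are illustrative, not LLM.swift's actual internals.

import llama  // llama.cpp module bundled by LLM.swift; module name assumed

extension LLMModel {
    /// Clears the llama.cpp KV cache and resets the wrapper's own position
    /// tracking so the next predict(...) starts from an empty context.
    /// `context` and `pastTokenCount` are assumed members, named only for
    /// illustration here.
    public func resetContext() {
        // KV-cache clear call; exposed as llama_kv_cache_clear in many
        // llama.cpp releases (newer versions rename it).
        llama_kv_cache_clear(context)
        pastTokenCount = 0
    }
}

Callers could then invoke model.resetContext() between predict(...) calls to give each prompt a fresh context without paying the cost of reloading the model weights.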
