Feature Request
I’m using LLM.swift for on-device inference in a macOS app. I noticed that there is currently no exposed way to reset the model context or KV cache between runs without reinitializing the entire model. This becomes an issue in apps that perform multiple inferences in sequence (e.g., analyzing transcript segments or handling multiple short prompts in a session).
Problem
Calling .predict(...) multiple times on the same LLMModel instance can cause:
- Errors like:
  - decode: failed to find KV cache slot for ubatch of size 512
  - llama_decode: failed to decode, ret = 1
- Memory buildup or inference inconsistencies over time.
- The only workaround, reinitializing the model before every run, is inefficient and causes noticeable performance degradation, especially for short prompts (see the sketch below).
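To make the scenario concrete, here is a minimal repro-style sketch of the sequential-inference pattern that triggers the errors above. The LLMModel(from:) initializer and predict(_:) method follow the naming used in this report and are illustrative, not verified LLM.swift API.

import Foundation

// Illustrative only: LLMModel(from:) and predict(_:) mirror the names used
// in this report, not necessarily the actual LLM.swift surface.
let modelURL = URL(fileURLWithPath: "/path/to/model.gguf")
let segments = ["transcript chunk 1", "transcript chunk 2", "transcript chunk 3"]

let model = try LLMModel(from: modelURL)
for segment in segments {
    // Every call appends to the same context; once the KV cache is full,
    // decoding fails ("failed to find KV cache slot for ubatch of size 512").
    let summary = try await model.predict("Summarize: \(segment)")
    print(summary)
}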
Proposed Solution
Expose a method on LLMModel like this:
public func resetContext() {
    // Clear the KV cache without reloading the weights; the exact llama.cpp
    // symbol (e.g. llama_kv_cache_clear) depends on the bundled version.
    llama_kv_cache_clear(context)
}
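For context, a sketch of how such a method would be used between independent prompts, again assuming the hypothetical LLMModel / predict(_:) names from above:

let model = try LLMModel(from: modelURL)

let first = try await model.predict("Analyze segment A")
model.resetContext()   // clear the KV cache instead of reinitializing the model

let second = try await model.predict("Analyze segment B")

This keeps the weights loaded while discarding the per-run context state, which is the cheap part to rebuild compared to reinitializing the whole model.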