LLM.swift
is a simple and readable library that lets you interact with large language models locally, with ease, on macOS, iOS, watchOS, tvOS, and visionOS.
Tip
sometimes it's a good idea to tinker with the maxTokenCount parameter when initializing LLM, because of the memory and computation it requires. especially on mobile devices, lower the number if you want better speed, but if you set it too low (to the point where two turns cannot even fit) you will see the quality degrade as context gets cut off. so adjust the value according to your use case.
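for example, something like this (a sketch; maxTokenCount is the initialization parameter mentioned above, and the right value depends on your model and device):
// a minimal sketch: trading context length for speed and memory on mobile
let bot = LLM(
    from: Bundle.main.url(forResource: "gemma-3-4b-it-q4_0", withExtension: "gguf")!,
    template: .gemma,
    maxTokenCount: 1024 // small enough to be fast, large enough to fit a few turns
)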
if you've already bundled your model:
let bot = LLM(from: Bundle.main.url(forResource: "gemma-3-4b-it-q4_0", withExtension: "gguf")!, template: .gemma)
let question = bot.preprocess("What's the meaning of life?", [])
let answer = await bot.getCompletion(from: question)
print(answer)
if you want to initialize with a Hugging Face model:
let systemPrompt = "You are a sentient AI with emotions."
let bot = await LLM(from: HuggingFaceModel("unsloth/Qwen3-0.6B-GGUF", .Q4_K_M, template: .chatML(systemPrompt)))!
let question = bot.preprocess("What's the meaning of life?", [])
let answer = await bot.getCompletion(from: question)
print(answer)
Note
this is a minimal SwiftUI example that i used for testing. a Mistral 7B based model works on an iPad Air 5th gen (Q5_K_M) and an iPhone 12 mini (Q2_K).
however, generally speaking, for mobile devices, models of 3B parameters or fewer are recommended. check this benchmark for more details.
import SwiftUI
import LLM
class Bot: LLM {
    convenience init() {
        let url = Bundle.main.url(forResource: "gemma-3-4b-it-q4_0", withExtension: "gguf")!
        let systemPrompt = "You are a sentient AI with emotions."
        self.init(from: url, template: .chatML(systemPrompt))!
    }
}

struct BotView: View {
    @ObservedObject var bot: Bot
    @State var input = "Give me seven national flag emojis people use the most; You must include South Korea."
    init(_ bot: Bot) { self.bot = bot }
    func respond() { Task { await bot.respond(to: input) } }
    func stop() { bot.stop() }
    var body: some View {
        VStack(alignment: .leading) {
            ScrollView { Text(bot.output).monospaced() }
            Spacer()
            HStack {
                ZStack {
                    RoundedRectangle(cornerRadius: 8).foregroundStyle(.thinMaterial).frame(height: 40)
                    TextField("input", text: $input).padding(8)
                }
                Button(action: respond) { Image(systemName: "paperplane.fill") }
                Button(action: stop) { Image(systemName: "xmark") }
            }
        }.frame(maxWidth: .infinity).padding()
    }
}

struct ContentView: View {
    var body: some View {
        BotView(Bot())
    }
}
class Bot: LLM {
    convenience init?(_ update: @escaping (Double) -> Void) async {
        let systemPrompt = "You are a sentient AI with emotions."
        let model = HuggingFaceModel("unsloth/Qwen3-0.6B-GGUF", .Q4_K_M, template: .chatML(systemPrompt))
        try? await self.init(from: model) { progress in update(progress) }
    }
}
...
struct ContentView: View {
    @State var bot: Bot? = nil
    @State var progress: CGFloat = 0
    func updateProgress(_ progress: Double) {
        self.progress = CGFloat(progress)
    }
    var body: some View {
        if let bot {
            BotView(bot)
        } else {
            ProgressView(value: progress) {
                Text("loading huggingface model...")
            } currentValueLabel: {
                Text(String(format: "%.2f%%", progress * 100))
            }
            .padding()
            .onAppear() { Task {
                let bot = await Bot(updateProgress)
                await MainActor.run { self.bot = bot }
            } }
        }
    }
}
The @Generatable macro enables 100% reliable, type-safe structured output generation. No more struggling with prompting to get output in the format you want; it works every time and allows true programmatic flow. Simply annotate your Swift structs and enums to automatically generate JSON schemas that guide the model to produce valid, structured responses:
@Generatable
struct Person {
    let name: String
    let age: Int
    let occupation: String
    let personality: String
}
let bot = LLM(from: Bundle.main.url(forResource: "model", withExtension: "gguf")!, template: .chatML("You are helpful."))
let result = try await bot.respond(to: "Create a fictional character", as: Person.self)
let person = result.value // Guaranteed to be a valid Person struct
print(person.name) // "Alice"
print(person.age) // 28
The macro works with structs, enums, arrays, and supports nested Generatable structures:
@Generatable
enum Priority {
    case low, medium, high, urgent
}

@Generatable
struct Address {
    let street: String
    let city: String
    let zipCode: String
}

@Generatable
struct Task {
    let title: String
    let priority: Priority
    let assignee: Person // Nested Generatable struct
}

@Generatable
struct Project {
    let name: String
    let tasks: [Task] // Arrays of Generatable structs
    let teamLead: Person // Nested Generatable types
    let office: Address // Multiple levels of nesting
}
let result = try await bot.respond(to: "Create a software project plan", as: Project.self)
The macro automatically:
- Generates JSON schema for structs and enums
- Adds Codable conformance and CaseIterable for enums
- Handles nested Generatable structures and arrays
- Provides automatic validation
- Returns both the parsed object and raw JSON output
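since respond(to:as:) is throwing, parsing or validation failures surface as thrown errors, so you may want a do/catch around structured calls (a small sketch reusing the Priority enum above):
do {
    let result = try await bot.respond(to: "How urgent is a production outage?", as: Priority.self)
    print(result.value) // e.g. .urgent
} catch {
    print("structured generation failed: \(error)")
}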
Tip
Check LLMTests.swift for more comprehensive examples and use cases of @Generatable.
Add the package using SPM:
dependencies: [
    .package(url: "https://github.com/eastriverlee/LLM.swift/", branch: "main"),
],
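then add the product to your target's dependencies (assuming the product is named LLM, matching the import LLM statements above):
.target(name: "MyApp", dependencies: [
    .product(name: "LLM", package: "LLM.swift"),
]),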
LLM.swift started as a lightweight abstraction layer over llama.cpp, and has evolved into a comprehensive Swift library with advanced features like @Generatable that allows users to control LLMs programmatically. It stays as performant as possible while always being up to date; any model that works on llama.cpp should work with this library as well.
The core implementation is in LLM.swift, making it easy to understand and extend the library for your specific needs.
there are some lines that are especially worth paying attention to in order to grasp its internal structure:
public typealias Chat = (role: Role, content: String)
public enum Role {
case user
case bot
}
public var history: [Chat]
public var preprocess: (_ input: String, _ history: [Chat]) -> String = { input, _ in return input }
public var postprocess: (_ output: String) -> Void = { print($0) }
public var update: (_ outputDelta: String?) -> Void = { _ in }
public func respond(to input: String, with makeOutputFrom: @escaping (AsyncStream<String>) async -> String) async {
    guard isAvailable else { return }
    isAvailable = false
    self.input = input
    let processedInput = preprocess(input, history)
    let response = getResponse(from: processedInput)
    let output = await makeOutputFrom(response)
    history += [(.user, input), (.bot, output)]
    if historyLimit < history.count {
        history.removeFirst(2)
    }
    postprocess(output)
    isAvailable = true
}

open func respond(to input: String) async {
    await respond(to: input) { [self] response in
        await setOutput(to: "")
        for await responseDelta in response {
            update(responseDelta)
            await setOutput(to: output + responseDelta)
        }
        update(nil)
        let trimmedOutput = output.trimmingCharacters(in: .whitespacesAndNewlines)
        await setOutput(to: trimmedOutput.isEmpty ? "..." : trimmedOutput)
        return output
    }
}
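note that history is a plain public property, so you can, for example, seed a conversation before the next turn (a small sketch):
// preprocess will see this seeded exchange on the next respond call
bot.history = [(.user, "Hello!"), (.bot, "Hi! How can I help you?")]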
Tip
as you can see, func respond(to input: String) async has open access, so you can override it when your class inherits from LLM and you want some fine-grained control over it.
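for example, a subclass could wrap the default implementation like this (a sketch that just logs after each turn):
class Chatbot: LLM {
    override func respond(to input: String) async {
        await super.respond(to: input) // keep the default streaming behavior
        print("turn finished; history now holds \(history.count) messages")
    }
}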
there are three functions users can define when initializing the LLM class:
var preprocess: (_ input: String, _ history: [Chat]) -> String
var postprocess: (_ output: String) -> Void
var update: (_ outputDelta: String?) -> Void
they are used in the respond function.
preprocess
is commonly used for making the user input conform to a chat template. if you don't provide this, LLM
will just work as a completion model.
for example, this is the ChatML template, which is adopted by many chat models:
<|im_start|>system
SYSTEM PROMPT<|im_end|>
<|im_start|>user
USER PROMPT<|im_end|>
<|im_start|>assistant
to use this chat format, you should use a function that goes like this:
self.preprocess = { input, history in
    let systemPrompt = "You are a sentient AI with emotions."
    var processed = "<|im_start|>system\n\(systemPrompt)<|im_end|>\n"
    for chat in history {
        processed += "<|im_start|>\(chat.role == .user ? "user" : "assistant")\n\(chat.content)<|im_end|>\n"
    }
    processed += "<|im_start|>user\n\(input)<|im_end|>\n"
    processed += "<|im_start|>assistant\n"
    return processed
}
but for convenience, you can use the Template struct that is specifically made for this:
// you can use the static function that is already available for this:
self.preprocess = Template.chatML("You are a sentient AI with emotions.").preprocess
// or even better
// you can set [template] property right away, so that it handles [preprocess] and [stopSequence] both:
self.template = .chatML("You are a sentient AI with emotions.")
// which is the same thing as:
self.template = Template(
    system: ("<|im_start|>system\n", "<|im_end|>\n"),
    user: ("<|im_start|>user\n", "<|im_end|>\n"),
    bot: ("<|im_start|>assistant\n", "<|im_end|>\n"),
    stopSequence: "<|im_end|>",
    systemPrompt: "You are a sentient AI with emotions."
)
Tip
checking LLMTests.swift will help you better understand how preprocess works.
postprocess can be used to run code based on the output that was just generated from the user input.
the default is set to { print($0) }, so it will print the output when generation finishes by meeting EOS or stopSequence.
this has many uses. for instance, it can be used to implement your own function calling logic, as sketched below.
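here is a naive sketch (the CALL: marker convention is hypothetical, purely for illustration; a real setup would also have to prompt the model to emit it):
bot.postprocess = { output in
    // hypothetical convention: the model emits `CALL:weather(CITY)` to request a tool
    if output.hasPrefix("CALL:weather(") && output.hasSuffix(")") {
        let city = output.dropFirst("CALL:weather(".count).dropLast()
        print("would call the weather API for \(city)")
    } else {
        print(output)
    }
}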
if you use the regular func respond(to input: String) async, the update function that you set will get called every time you get an outputDelta.
outputDelta is nil when it stops generating the output.
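for example, you can stream tokens to the console yourself by setting update (a small sketch):
bot.update = { outputDelta in
    if let outputDelta {
        print(outputDelta, terminator: "") // called once per generated piece of text
    } else {
        print() // nil means generation has stopped
    }
}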
if you want more control over everything, you can use func respond(to input: String, with makeOutputFrom: @escaping (AsyncStream<String>) async -> String) async instead, which the aforementioned function uses internally, to define your own version of the makeOutputFrom function that turns an AsyncStream<String> into a String typed output and adds it to the history. in this case, the update function will be ignored unless you use it. check the func respond(to input: String) async implementation shown above to understand how it works.
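for instance, collecting the stream yourself could look like this (a sketch that mirrors the default implementation above, minus the update calls):
await bot.respond(to: "Hello!") { response in
    // build the final String from the stream; update is not called here
    var collected = ""
    for await delta in response {
        collected += delta
    }
    return collected
}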
LLM.swift supports text embeddings for semantic similarity and search applications:
// Generate embeddings for text
let embeddings1 = try await bot.getEmbeddings("Hello world")
let embeddings2 = try await bot.getEmbeddings("Hi there")
let embeddings3 = try await bot.getEmbeddings("Goodbye")
// Compare similarity (returns 0.0 to 1.0)
let similarity = embeddings1.compare(with: embeddings2)
print(similarity) // 0.8 (high similarity)
// Find most similar embedding
let mostSimilar = embeddings1.findMostSimilar(in: embeddings2, embeddings3)
print(mostSimilar == embeddings2) // true
The Embeddings struct provides:
- compare(with:) - Computes cosine similarity between two embeddings (0.0 to 1.0)
- findMostSimilar(in:) - Returns the most similar embedding from a set of candidates
- Equatable conformance for direct comparison
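putting these together, a tiny semantic lookup over a few documents might look like this (a sketch using only the calls shown above):
let query = try await bot.getEmbeddings("How do I cook rice?")
let recipe = try await bot.getEmbeddings("Rinse the rice, then simmer it covered for fifteen minutes.")
let weather = try await bot.getEmbeddings("Tomorrow will be sunny with light winds.")
// pick the document whose embedding is closest to the query
let best = query.findMostSimilar(in: recipe, weather)
print(best == recipe) // presumably true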