
Commit 81c2e1a

docs: Add batch inference use case (#5116)
## Changes Made

Add batch inference use case
1 parent f1505a1 commit 81c2e1a

File tree

3 files changed: +109 -4 lines changed

docs/SUMMARY.md

Lines changed: 2 additions & 0 deletions

@@ -2,6 +2,8 @@
 * [Overview](index.md)
 * [Quickstart](quickstart.md)
 * [Installation](install.md)
+* Common Use Cases
+    * [Batch Inference](use-case/batch-inference.md)
 * Working with Modalities
     * [Overview](modalities/index.md)
     * [Custom Modalities](modalities/custom.md)

docs/connectors/custom.md

Lines changed: 4 additions & 4 deletions

@@ -19,7 +19,7 @@ You can also take a look at actual code references on how we implemented:

 ### Step 1: Implement the `DataSource` and `DataSourceTask` Interfaces

-Create a class that inherits from the [`DataSource` class](../api/io.md#daft.io.source.DataSource) and [`DataSourceTask` class](../api/io.md#daft.io.source.DataSourceTask), and implements the required methods. Here's a simple example doing this with a custom local file reader that reads each line from a file as a single String row.
+Create a class that inherits from [`DataSource`](../api/io.md#daft.io.source.DataSource), a class that inherits from [`DataSourceTask`](../api/io.md#daft.io.source.DataSourceTask), and implement the required methods. Here's a simple example doing this with a custom local file reader that reads each line from a file as a single String row.

 === "🐍 Python"
     ```python

@@ -163,7 +163,7 @@ data_source = TextFileDataSource([sample_file])

 ### Step 1: Implement the `DataSink` Interface

-Create a class that inherits the [`DataSink` class](../api/io.md#daft.io.sink.DataSink) and implements the required methods. Here's a simple example doing this with a custom local file writer.
+Create a class that inherits from [`DataSink`](../api/io.md#daft.io.sink.DataSink) and implements the required methods. Here's a simple example doing this with a custom local file writer.

 === "🐍 Python"
     ```python

@@ -186,7 +186,7 @@ class LocalFileDataSink(DataSink[dict]):
         self,
         output_dir: str | Path,
         filename_prefix: str = "data",
-        max_rows_per_file: int = 100
+        max_rows_per_file: int = 10
     ):
         """Initialize the local file data sink.

@@ -332,7 +332,7 @@ data = {
 local_file_data_sink = LocalFileDataSink(
     output_dir="./output_folder",
     filename_prefix="users",
-    max_rows_per_file=100
+    max_rows_per_file=10
 )

 (

docs/use-case/batch-inference.md

Lines changed: 103 additions & 0 deletions

@@ -0,0 +1,103 @@

# Batch Inference

Run prompts, embeddings, and model scoring over large datasets, then stream the results to durable storage. Daft is a reliable engine to express batch inference pipelines and scale them from your laptop to a distributed cluster.

## When to use Daft for batch inference

- **You need to run models over your data:** Express inference on a column (e.g., [`llm_generate`](#example-text-generation-with-openai), [`embed_text`](../modalities/text.md#how-to-use-the-embed_text-function), [`embed_image`](../api/ai.md)) and let Daft handle batching, concurrency, and backpressure.
- **You have large objects in cloud storage:** Daft has [record-setting](https://www.daft.ai/blog/announcing-daft-02) performance when reading from and writing to S3, and provides flexible APIs for working with [URLs and Files](../modalities/urls.md).
- **You're working with multimodal data:** Daft supports datatypes like [images](../modalities/images.md) and [video](../modalities/videos.md), and lets you define [custom data sources and sinks](../connectors/custom.md) and [custom functions over this data](../custom-code/udfs.md).
- **You want end-to-end pipelines where data sizes expand and shrink:** For example, downloading images from URLs, decoding them, then embedding them; [Daft streams across stages to keep memory well-behaved](https://www.daft.ai/blog/processing-300k-images-without-oom). A minimal sketch of this pattern follows this list.
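
As a rough sketch of that expand-and-shrink pattern: download image bytes from URLs, decode them into images, embed them, and write the much smaller vectors back out. The input path, column names, and the bare `embed_image` call are illustrative assumptions; in practice you would pass a provider and model, as in the embedding example later on this page.

```python
import daft
from daft.functions.ai import embed_image

(
    daft.read_parquet("s3://my-bucket/image_urls.parquet")             # hypothetical table with an "image_url" column
    .with_column("image_bytes", daft.col("image_url").url.download())  # rows expand as raw bytes are fetched
    .with_column("image", daft.col("image_bytes").image.decode())      # decode bytes into an image column
    .with_column("embedding", embed_image(daft.col("image")))          # shrink back down to fixed-size vectors
    .write_parquet("s3://my-bucket/image_embeddings.parquet/")         # stream results to durable storage
)
```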

If you're new to Daft, see the [quickstart](../quickstart.md) first. For distributed execution, see our docs on [Scaling Out and Deployment](../distributed.md).

## Core idea

Daft provides first-class APIs for model inference. Under the hood, Daft pipelines data operations so that reading, inference, and writing overlap automatically, and the engine is optimized for throughput.

## Example: Text generation with OpenAI

=== "🐍 Python"
    ```python
    import daft
    from daft.functions import llm_generate

    (
        daft.read_huggingface("fka/awesome-chatgpt-prompts")
        .with_column(  # Generate model outputs in a new column
            "output",
            llm_generate(
                daft.col("prompt"),
                model="gpt-4o",      # Any chat/completions-capable model
                provider="openai",   # Switch providers by changing this; e.g. to "vllm"
                api_key="...",       # Pass via environment variable or secret manager
                temperature=0.2,
                max_tokens=256,
            ),
        )
        .write_parquet("output.parquet/", write_mode="overwrite")  # Write to Parquet as the pipeline runs
    )
    ```

What this does:

- Uses [`llm_generate()`](../../api/functions/llm_generate) to express inference.
- Streams rows through OpenAI concurrently while reading from Hugging Face and writing to Parquet.
- Requires no explicit async, batching, rate limiting, or retry code in your script.
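
One practical note on the `api_key` argument above: rather than hard-coding the key, a common approach is to read it from the environment. A minimal sketch, assuming the key is exported as `OPENAI_API_KEY`:

```python
import os

import daft
from daft.functions import llm_generate

df = daft.read_huggingface("fka/awesome-chatgpt-prompts").with_column(
    "output",
    llm_generate(
        daft.col("prompt"),
        model="gpt-4o",
        provider="openai",
        api_key=os.environ["OPENAI_API_KEY"],  # Read the key from the environment instead of hard-coding it
    ),
)
```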

## Example: Local text embedding with LM Studio

=== "🐍 Python"
    ```python
    import daft
    from daft.ai.provider import load_provider
    from daft.functions.ai import embed_text

    provider = load_provider("lm_studio")
    model = "text-embedding-nomic-embed-text-v1.5"

    (
        daft.read_huggingface("Open-Orca/OpenOrca")
        .with_column("embedding", embed_text(daft.col("response"), provider=provider, model=model))
        .show()
    )
    ```
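
In a real batch job you would typically persist the embeddings rather than display them. A minimal sketch, assuming the same dataset and model as above, swaps `.show()` for the `write_parquet` call used in the earlier example (the output path is a placeholder):

```python
import daft
from daft.ai.provider import load_provider
from daft.functions.ai import embed_text

provider = load_provider("lm_studio")
model = "text-embedding-nomic-embed-text-v1.5"

(
    daft.read_huggingface("Open-Orca/OpenOrca")
    .with_column("embedding", embed_text(daft.col("response"), provider=provider, model=model))
    # Stream results to durable storage instead of displaying them
    .write_parquet("embeddings.parquet/", write_mode="overwrite")
)
```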

Notes:

- [LM Studio](https://lmstudio.ai/) is a local AI model platform that lets you run Large Language Models like Qwen, Mistral, Gemma, or gpt-oss on your own machine. By using Daft with LM Studio, you can perform inference with any model locally, and utilize accelerators like [Apple's Metal Performance Shaders (MPS)](https://developer.apple.com/documentation/metalperformanceshaders).

## Scaling out on Ray

Turn on distributed execution with a single line, then run the same script on a Ray cluster.

```python
import daft
daft.context.set_runner_ray()  # Enable Daft's distributed runner
```

Daft partitions the data, schedules remote execution, and orchestrates your workload across the cluster, with no pipeline rewrites.
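
If a Ray cluster is already running, a common pattern is to point the runner at the cluster's address before building your DataFrame. A minimal sketch, assuming `set_runner_ray` accepts a Ray address in your Daft version (the address below is a placeholder):

```python
import daft

# Connect to an existing Ray cluster; "ray://head-node:10001" is a placeholder address.
daft.context.set_runner_ray(address="ray://head-node:10001")

df = daft.read_huggingface("fka/awesome-chatgpt-prompts")
print(df.count_rows())  # Work is scheduled across the Ray cluster
```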

## Patterns that work well

- **Read → Preprocess → Infer → Write**: Daft parallelizes and pipelines automatically to maximize throughput and resource utilization.
- **Provider-agnostic pipelines**: Switch between OpenAI and local LLMs by changing a single parameter; see the sketch after this list.
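
As a rough illustration of that single-parameter switch, the sketch below builds the same `llm_generate` pipeline twice: once against OpenAI and once against a local vLLM engine. The vLLM model name is a placeholder; other parameters follow the text-generation example above.

```python
import daft
from daft.functions import llm_generate

df = daft.read_huggingface("fka/awesome-chatgpt-prompts")

# Hosted provider
hosted = df.with_column(
    "output",
    llm_generate(daft.col("prompt"), model="gpt-4o", provider="openai"),
)

# Local provider: same pipeline, only the provider and model arguments change
local = df.with_column(
    "output",
    llm_generate(daft.col("prompt"), model="Qwen/Qwen2.5-7B-Instruct", provider="vllm"),
)
```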

## Case Studies

For inspiration and real-world scale:

- [Processing 24 trillion tokens with 0 crashes: How Essential AI built Essential-Web v1.0 with Daft](https://www.daft.ai/blog/how-essential-ai-built-essential-web-v1-with-daft)
- [Processing 300K Images Without OOMs](https://www.daft.ai/blog/processing-300k-images-without-oom)
- [Embedding millions of text documents with Qwen3, achieving near 100% GPU utilization](https://www.daft.ai/blog/embedding-millions-of-text-documents-with-qwen3)

## Next Steps

Ready to explore Daft further? Check out these topics:

- [AI functions](../api/ai.md)
- Reading from and writing to common data sources:
    - [S3](../connectors/aws.md)
    - [Hugging Face 🤗](../connectors/huggingface.md)
    - [Turbopuffer](../connectors/turbopuffer.md)
- [Scaling out and deployment](../distributed.md)
