
Commit 1b2fb81

chore(docs): update list of supported backends
Signed-off-by: Ettore Di Giacinto <[email protected]>
1 parent be132fe commit 1b2fb81

3 files changed: +119 −17 lines

README.md

Lines changed: 54 additions & 0 deletions
@@ -233,6 +233,60 @@ Roadmap items: [List of issues](https://github.com/mudler/LocalAI/issues?q=is%3A
 - 🔊 Voice activity detection (Silero-VAD support)
 - 🌍 Integrated WebUI!
 
+## 🧩 Supported Backends & Acceleration
+
+LocalAI supports a comprehensive range of AI backends with multiple acceleration options:
+
+### Text Generation & Language Models
+| Backend | Description | Acceleration Support |
+|---------|-------------|---------------------|
+| **llama.cpp** | LLM inference in C/C++ | CUDA 11/12, ROCm, Intel SYCL, Vulkan, Metal, CPU |
+| **vLLM** | Fast LLM inference with PagedAttention | CUDA 12, ROCm, Intel |
+| **transformers** | HuggingFace transformers framework | CUDA 11/12, ROCm, Intel, CPU |
+| **exllama2** | GPTQ inference library | CUDA 12 |
+| **MLX** | Apple Silicon LLM inference | Metal (M1/M2/M3+) |
+| **MLX-VLM** | Apple Silicon Vision-Language Models | Metal (M1/M2/M3+) |
+
+### Audio & Speech Processing
+| Backend | Description | Acceleration Support |
+|---------|-------------|---------------------|
+| **whisper.cpp** | OpenAI Whisper in C/C++ | CUDA 12, ROCm, Intel SYCL, Vulkan, CPU |
+| **faster-whisper** | Fast Whisper with CTranslate2 | CUDA 12, ROCm, Intel, CPU |
+| **bark** | Text-to-audio generation | CUDA 12, ROCm, Intel |
+| **bark-cpp** | C++ implementation of Bark | CUDA, Metal, CPU |
+| **coqui** | Advanced TTS with 1100+ languages | CUDA 12, ROCm, Intel, CPU |
+| **kokoro** | Lightweight TTS model | CUDA 12, ROCm, Intel, CPU |
+| **chatterbox** | Production-grade TTS | CUDA 11/12, CPU |
+| **piper** | Fast neural TTS system | CPU |
+| **kitten-tts** | Kitten TTS models | CPU |
+| **silero-vad** | Voice Activity Detection | CPU |
+
+### Image & Video Generation
+| Backend | Description | Acceleration Support |
+|---------|-------------|---------------------|
+| **stablediffusion.cpp** | Stable Diffusion in C/C++ | CUDA 12, Intel SYCL, Vulkan, CPU |
+| **diffusers** | HuggingFace diffusion models | CUDA 11/12, ROCm, Intel, Metal, CPU |
+
+### Specialized AI Tasks
+| Backend | Description | Acceleration Support |
+|---------|-------------|---------------------|
+| **rfdetr** | Real-time object detection | CUDA 12, Intel, CPU |
+| **rerankers** | Document reranking API | CUDA 11/12, ROCm, Intel, CPU |
+| **local-store** | Vector database | CPU |
+| **huggingface** | HuggingFace API integration | API-based |
+
+### Hardware Acceleration Matrix
+
+| Acceleration Type | Supported Backends | Hardware Support |
+|-------------------|-------------------|------------------|
+| **NVIDIA CUDA 11** | llama.cpp, whisper, stablediffusion, diffusers, rerankers, bark, chatterbox | NVIDIA hardware |
+| **NVIDIA CUDA 12** | All CUDA-compatible backends | NVIDIA hardware |
+| **AMD ROCm** | llama.cpp, whisper, vllm, transformers, diffusers, rerankers, coqui, kokoro, bark | AMD graphics |
+| **Intel oneAPI** | llama.cpp, whisper, stablediffusion, vllm, transformers, diffusers, rfdetr, rerankers, exllama2, coqui, kokoro, bark | Intel Arc, Intel iGPUs |
+| **Apple Metal** | llama.cpp, whisper, diffusers, MLX, MLX-VLM, bark-cpp | Apple M1/M2/M3+ |
+| **Vulkan** | llama.cpp, whisper, stablediffusion | Cross-platform GPUs |
+| **NVIDIA Jetson** | llama.cpp, whisper, stablediffusion, diffusers, rfdetr | ARM64 embedded AI |
+| **CPU Optimized** | All backends | AVX/AVX2/AVX512, quantization support |
 
 ### 🔗 Community and integrations
 

backend/index.yaml

Lines changed: 1 addition & 1 deletion
@@ -147,7 +147,7 @@
   uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-mlx-vlm"
   icon: https://avatars.githubusercontent.com/u/102832242?s=200&v=4
   urls:
-    - https://github.com/ml-explore/mlx-vlm
+    - https://github.com/Blaizzy/mlx-vlm
   mirrors:
     - localai/localai-backends:latest-metal-darwin-arm64-mlx-vlm
   license: MIT

docs/content/docs/reference/compatibility-table.md

Lines changed: 64 additions & 16 deletions
@@ -14,29 +14,77 @@ LocalAI will attempt to automatically load models which are not explicitly confi
 
 {{% /alert %}}
 
+## Text Generation & Language Models
+
 {{< table "table-responsive" >}}
 | Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
 |----------------------------------------------------------------------------------|-----------------------|--------------------------|---------------------------|-----------------------------------|----------------------|--------------|
-| [llama.cpp]({{%relref "docs/features/text-generation#llama.cpp" %}}) | LLama, Mamba, RWKV, Falcon, Starcoder, GPT-2, [and many others](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#description) | yes | GPT and Functions | yes | yes | CUDA, openCL, cuBLAS, Metal |
-| [whisper](https://github.com/ggerganov/whisper.cpp) | whisper | no | Audio | no | no | N/A |
+| [llama.cpp]({{%relref "docs/features/text-generation#llama.cpp" %}}) | LLama, Mamba, RWKV, Falcon, Starcoder, GPT-2, [and many others](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#description) | yes | GPT and Functions | yes | yes | CUDA 11/12, ROCm, Intel SYCL, Vulkan, Metal, CPU |
+| [vLLM](https://github.com/vllm-project/vllm) | Various GPTs and quantization formats | yes | GPT | no | no | CUDA 12, ROCm, Intel |
+| [transformers](https://github.com/huggingface/transformers) | Various GPTs and quantization formats | yes | GPT, embeddings, Audio generation | yes | yes* | CUDA 11/12, ROCm, Intel, CPU |
+| [exllama2](https://github.com/turboderp-org/exllamav2) | GPTQ | yes | GPT only | no | no | CUDA 12 |
+| [MLX](https://github.com/ml-explore/mlx-lm) | Various LLMs | yes | GPT | no | no | Metal (Apple Silicon) |
+| [MLX-VLM](https://github.com/Blaizzy/mlx-vlm) | Vision-Language Models | yes | Multimodal GPT | no | no | Metal (Apple Silicon) |
 | [langchain-huggingface](https://github.com/tmc/langchaingo) | Any text generators available on HuggingFace through API | yes | GPT | no | no | N/A |
-| [piper](https://github.com/rhasspy/piper) ([binding](https://github.com/mudler/go-piper)) | Any piper onnx model | no | Text to voice | no | no | N/A |
-| [sentencetransformers](https://github.com/UKPLab/sentence-transformers) | BERT | no | Embeddings only | yes | no | N/A |
-| `bark` | bark | no | Audio generation | no | no | yes |
-| `autogptq` | GPTQ | yes | GPT | yes | no | N/A |
-| `diffusers` | SD,... | no | Image generation | no | no | N/A |
-| `vllm` | Various GPTs and quantization formats | yes | GPT | no | no | CPU/CUDA |
-| `exllama2` | GPTQ | yes | GPT only | no | no | N/A |
-| `transformers-musicgen` | | no | Audio generation | no | no | N/A |
-| stablediffusion | no | Image | no | no | N/A |
-| `coqui` | Coqui | no | Audio generation and Voice cloning | no | no | CPU/CUDA |
-| [rerankers](https://github.com/AnswerDotAI/rerankers) | Reranking API | no | Reranking | no | no | CPU/CUDA |
-| `transformers` | Various GPTs and quantization formats | yes | GPT, embeddings, Audio generation | yes | yes* | CPU/CUDA/XPU |
-| [bark-cpp](https://github.com/PABannier/bark.cpp) | bark | no | Audio-Only | no | no | yes |
-| [stablediffusion-cpp](https://github.com/leejet/stable-diffusion.cpp) | stablediffusion-1, stablediffusion-2, stablediffusion-3, flux, PhotoMaker | no | Image | no | no | N/A |
+{{< /table >}}
+
+## Audio & Speech Processing
+
+{{< table "table-responsive" >}}
+| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
+|----------------------------------------------------------------------------------|-----------------------|--------------------------|---------------------------|-----------------------------------|----------------------|--------------|
+| [whisper.cpp](https://github.com/ggml-org/whisper.cpp) | whisper | no | Audio transcription | no | no | CUDA 12, ROCm, Intel SYCL, Vulkan, CPU |
+| [faster-whisper](https://github.com/SYSTRAN/faster-whisper) | whisper | no | Audio transcription | no | no | CUDA 12, ROCm, Intel, CPU |
+| [piper](https://github.com/rhasspy/piper) ([binding](https://github.com/mudler/go-piper)) | Any piper onnx model | no | Text to voice | no | no | CPU |
+| [bark](https://github.com/suno-ai/bark) | bark | no | Audio generation | no | no | CUDA 12, ROCm, Intel |
+| [bark-cpp](https://github.com/PABannier/bark.cpp) | bark | no | Audio-Only | no | no | CUDA, Metal, CPU |
+| [coqui](https://github.com/idiap/coqui-ai-TTS) | Coqui TTS | no | Audio generation and Voice cloning | no | no | CUDA 12, ROCm, Intel, CPU |
+| [kokoro](https://github.com/hexgrad/kokoro) | Kokoro TTS | no | Text-to-speech | no | no | CUDA 12, ROCm, Intel, CPU |
+| [chatterbox](https://github.com/resemble-ai/chatterbox) | Chatterbox TTS | no | Text-to-speech | no | no | CUDA 11/12, CPU |
+| [kitten-tts](https://github.com/KittenML/KittenTTS) | Kitten TTS | no | Text-to-speech | no | no | CPU |
 | [silero-vad](https://github.com/snakers4/silero-vad) with [Golang bindings](https://github.com/streamer45/silero-vad-go) | Silero VAD | no | Voice Activity Detection | no | no | CPU |
 {{< /table >}}
 
+## Image & Video Generation
+
+{{< table "table-responsive" >}}
+| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
+|----------------------------------------------------------------------------------|-----------------------|--------------------------|---------------------------|-----------------------------------|----------------------|--------------|
+| [stablediffusion.cpp](https://github.com/leejet/stable-diffusion.cpp) | stablediffusion-1, stablediffusion-2, stablediffusion-3, flux, PhotoMaker | no | Image | no | no | CUDA 12, Intel SYCL, Vulkan, CPU |
+| [diffusers](https://github.com/huggingface/diffusers) | SD, various diffusion models,... | no | Image/Video generation | no | no | CUDA 11/12, ROCm, Intel, Metal, CPU |
+| [transformers-musicgen](https://github.com/huggingface/transformers) | MusicGen | no | Audio generation | no | no | CUDA, CPU |
+{{< /table >}}
+
+## Specialized AI Tasks
+
+{{< table "table-responsive" >}}
+| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
+|----------------------------------------------------------------------------------|-----------------------|--------------------------|---------------------------|-----------------------------------|----------------------|--------------|
+| [rfdetr](https://github.com/roboflow/rf-detr) | RF-DETR | no | Object Detection | no | no | CUDA 12, Intel, CPU |
+| [rerankers](https://github.com/AnswerDotAI/rerankers) | Reranking API | no | Reranking | no | no | CUDA 11/12, ROCm, Intel, CPU |
+| [local-store](https://github.com/mudler/LocalAI) | Vector database | no | Vector storage | yes | no | CPU |
+| [huggingface](https://huggingface.co/docs/hub/en/api) | HuggingFace API models | yes | Various AI tasks | yes | yes | API-based |
+{{< /table >}}
+
+## Acceleration Support Summary
+
+### GPU Acceleration
+- **NVIDIA CUDA**: CUDA 11.7, CUDA 12.0 support across most backends
+- **AMD ROCm**: HIP-based acceleration for AMD GPUs
+- **Intel oneAPI**: SYCL-based acceleration for Intel GPUs (F16/F32 precision)
+- **Vulkan**: Cross-platform GPU acceleration
+- **Metal**: Apple Silicon GPU acceleration (M1/M2/M3+)
+
+### Specialized Hardware
+- **NVIDIA Jetson (L4T)**: ARM64 support for embedded AI
+- **Apple Silicon**: Native Metal acceleration for Mac M1/M2/M3+
+- **Darwin x86**: Intel Mac support
+
+### CPU Optimization
+- **AVX/AVX2/AVX512**: Advanced vector extensions for x86
+- **Quantization**: 4-bit, 5-bit, 8-bit integer quantization support
+- **Mixed Precision**: F16/F32 mixed precision support
+
 Note: any backend name listed above can be used in the `backend` field of the model configuration file (See [the advanced section]({{%relref "docs/advanced" %}})).
 
 - \* Only for CUDA and OpenVINO CPU/XPU acceleration.
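For reference, the note above points at the `backend` field of a model configuration file. The sketch below shows what such a file might look like; it is not part of this commit, and the file name, model name, backend identifier, and model file shown here are illustrative assumptions rather than values from the LocalAI documentation.

```yaml
# models/my-model.yaml (hypothetical example; adjust names to your setup)
name: my-model                   # name the model is exposed under via the API
backend: llama-cpp               # a backend name from the tables above (assumed identifier)
parameters:
  model: my-model.Q4_K_M.gguf    # model file expected in the models directory
```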
