|[llama.cpp]({{%relref "docs/features/text-generation#llama.cpp" %}}) | LLama, Mamba, RWKV, Falcon, Starcoder, GPT-2, [and many others](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#description)| yes | GPT and Functions | yes | yes | CUDA, openCL, cuBLAS, Metal |
21
-
|[whisper](https://github.com/ggerganov/whisper.cpp)| whisper | no | Audio | no | no | N/A |
22
+
|[llama.cpp]({{%relref "docs/features/text-generation#llama.cpp" %}}) | LLama, Mamba, RWKV, Falcon, Starcoder, GPT-2, [and many others](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#description)| yes | GPT and Functions | yes | yes | CUDA 11/12, ROCm, Intel SYCL, Vulkan, Metal, CPU |
23
+
|[vLLM](https://github.com/vllm-project/vllm)| Various GPTs and quantization formats | yes | GPT | no | no | CUDA 12, ROCm, Intel |
24
+
|[transformers](https://github.com/huggingface/transformers)| Various GPTs and quantization formats | yes | GPT, embeddings, Audio generation | yes | yes*| CUDA 11/12, ROCm, Intel, CPU |
25
+
|[exllama2](https://github.com/turboderp-org/exllamav2)| GPTQ | yes | GPT only | no | no | CUDA 12 |
26
+
|[MLX](https://github.com/ml-explore/mlx-lm)| Various LLMs | yes | GPT | no | no | Metal (Apple Silicon) |
27
+
|[MLX-VLM](https://github.com/Blaizzy/mlx-vlm)| Vision-Language Models | yes | Multimodal GPT | no | no | Metal (Apple Silicon) |
22
28
|[langchain-huggingface](https://github.com/tmc/langchaingo)| Any text generators available on HuggingFace through API | yes | GPT | no | no | N/A |
23
-
|[piper](https://github.com/rhasspy/piper) ([binding](https://github.com/mudler/go-piper)) | Any piper onnx model | no | Text to voice | no | no | N/A |
24
-
|[sentencetransformers](https://github.com/UKPLab/sentence-transformers)| BERT | no | Embeddings only | yes | no | N/A |
25
-
|`bark`| bark | no | Audio generation | no | no | yes |
26
-
|`autogptq`| GPTQ | yes | GPT | yes | no | N/A |
27
-
|`diffusers`| SD,... | no | Image generation | no | no | N/A |
28
-
|`vllm`| Various GPTs and quantization formats | yes | GPT | no | no | CPU/CUDA |
29
-
|`exllama2`| GPTQ | yes | GPT only | no | no | N/A |
30
-
|`transformers-musicgen`|| no | Audio generation | no | no | N/A |
31
-
| stablediffusion | no | Image | no | no | N/A |
32
-
|`coqui`| Coqui | no | Audio generation and Voice cloning | no | no | CPU/CUDA |
33
-
|[rerankers](https://github.com/AnswerDotAI/rerankers)| Reranking API | no | Reranking | no | no | CPU/CUDA |
34
-
|`transformers`| Various GPTs and quantization formats | yes | GPT, embeddings, Audio generation | yes | yes*| CPU/CUDA/XPU |
35
-
|[bark-cpp](https://github.com/PABannier/bark.cpp)| bark | no | Audio-Only | no | no | yes |
36
-
|[stablediffusion-cpp](https://github.com/leejet/stable-diffusion.cpp)| stablediffusion-1, stablediffusion-2, stablediffusion-3, flux, PhotoMaker | no | Image | no | no | N/A |
29
+
{{< /table >}}
30
+
31
+
## Audio & Speech Processing
32
+
33
+
{{< table "table-responsive" >}}
34
+
| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
35
+
|---|---|---|---|---|---|---|
36
+
|[whisper.cpp](https://github.com/ggml-org/whisper.cpp)| whisper | no | Audio transcription | no | no | CUDA 12, ROCm, Intel SYCL, Vulkan, CPU |
37
+
|[faster-whisper](https://github.com/SYSTRAN/faster-whisper)| whisper | no | Audio transcription | no | no | CUDA 12, ROCm, Intel, CPU |
38
+
|[piper](https://github.com/rhasspy/piper) ([binding](https://github.com/mudler/go-piper)) | Any piper onnx model | no | Text to voice | no | no | CPU |
39
+
|[bark](https://github.com/suno-ai/bark)| bark | no | Audio generation | no | no | CUDA 12, ROCm, Intel |
40
+
|[bark-cpp](https://github.com/PABannier/bark.cpp)| bark | no | Audio-Only | no | no | CUDA, Metal, CPU |
41
+
|[coqui](https://github.com/idiap/coqui-ai-TTS)| Coqui TTS | no | Audio generation and Voice cloning | no | no | CUDA 12, ROCm, Intel, CPU |
42
+
|[kokoro](https://github.com/hexgrad/kokoro)| Kokoro TTS | no | Text-to-speech | no | no | CUDA 12, ROCm, Intel, CPU |
43
+
|[chatterbox](https://github.com/resemble-ai/chatterbox)| Chatterbox TTS | no | Text-to-speech | no | no | CUDA 11/12, CPU |
44
+
|[kitten-tts](https://github.com/KittenML/KittenTTS)| Kitten TTS | no | Text-to-speech | no | no | CPU |
37
45
|[silero-vad](https://github.com/snakers4/silero-vad) with [Golang bindings](https://github.com/streamer45/silero-vad-go)| Silero VAD | no | Voice Activity Detection | no | no | CPU |
38
46
{{< /table >}}
39
47
48
+
## Image & Video Generation
49
+
50
+
{{< table "table-responsive" >}}
51
+
| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
52
+
|---|---|---|---|---|---|---|
53
+
|[stablediffusion.cpp](https://github.com/leejet/stable-diffusion.cpp)| stablediffusion-1, stablediffusion-2, stablediffusion-3, flux, PhotoMaker | no | Image | no | no | CUDA 12, Intel SYCL, Vulkan, CPU |
54
+
|[diffusers](https://github.com/huggingface/diffusers)| SD, various diffusion models,... | no | Image/Video generation | no | no | CUDA 11/12, ROCm, Intel, Metal, CPU |
55
+
|[transformers-musicgen](https://github.com/huggingface/transformers)| MusicGen | no | Audio generation | no | no | CUDA, CPU |
56
+
{{< /table >}}
57
+
58
+
## Specialized AI Tasks
59
+
60
+
{{< table "table-responsive" >}}
61
+
| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
62
+
|---|---|---|---|---|---|---|
63
+
|[rfdetr](https://github.com/roboflow/rf-detr)| RF-DETR | no | Object Detection | no | no | CUDA 12, Intel, CPU |
64
+
|[rerankers](https://github.com/AnswerDotAI/rerankers)| Reranking API | no | Reranking | no | no | CUDA 11/12, ROCm, Intel, CPU |
65
+
|[local-store](https://github.com/mudler/LocalAI)| Vector database | no | Vector storage | yes | no | CPU |
66
+
|[huggingface](https://huggingface.co/docs/hub/en/api)| HuggingFace API models | yes | Various AI tasks | yes | yes | API-based |
67
+
{{< /table >}}
68
+
69
+
## Acceleration Support Summary
70
+
71
+
### GPU Acceleration
72
+
- **NVIDIA CUDA**: CUDA 11.7, CUDA 12.0 support across most backends
73
+
- **AMD ROCm**: HIP-based acceleration for AMD GPUs
74
+
- **Intel oneAPI**: SYCL-based acceleration for Intel GPUs (F16/F32 precision)
75
+
- **Vulkan**: Cross-platform GPU acceleration
76
+
- **Metal**: Apple Silicon GPU acceleration (M1/M2/M3+)
77
+
78
+
### Specialized Hardware
79
+
- **NVIDIA Jetson (L4T)**: ARM64 support for embedded AI
80
+
- **Apple Silicon**: Native Metal acceleration for Mac M1/M2/M3+
81
+
- **Darwin x86**: Intel Mac support
82
+
83
+
### CPU Optimization
84
+
- **AVX/AVX2/AVX512**: Advanced vector extensions for x86
85
+
- **Quantization**: 4-bit, 5-bit, 8-bit integer quantization support
86
+
- **Mixed Precision**: F16/F32 mixed precision support
87
+
40
88
Note: any backend name listed above can be used in the `backend` field of the model configuration file (See [the advanced section]({{%relref "docs/advanced" %}})).
41
89
42
90
- \* Only for CUDA and OpenVINO CPU/XPU acceleration.
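
As a rough illustration of the note above, a model configuration file that selects a backend by name might look like the sketch below. The model name and the model identifier are hypothetical placeholders, and the exact set of accepted keys may differ between versions, so treat this as an assumption-laden example rather than a reference configuration.

```yaml
# Minimal sketch of a model configuration file (all values are placeholders).
# "name" is the identifier clients would use in API requests (hypothetical).
name: my-model
# "backend" takes one of the backend names listed in the tables above.
backend: transformers
parameters:
  # Hypothetical model identifier; replace with the model you actually use.
  model: my-org/my-model
```

Switching to another backend from the tables should only require changing the `backend` value and pointing `parameters.model` at a model that backend can load.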