# Multimodal

llama.cpp supports multimodal input via `libmtmd`. Currently, two tools support this feature:
- [llama-mtmd-cli](../tools/mtmd/README.md)
- [llama-server](../tools/server/README.md) via the OpenAI-compatible `/chat/completions` API (see the request sketch below)

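Once `llama-server` is running (see the launch examples further below), images can be sent through this API. A minimal request sketch, assuming the server listens on the default port 8080 and the image is passed as a base64 data URI (the standard OpenAI content format):

```sh
# encode the image as base64 (on macOS, use: base64 -i my-image.jpg)
IMG_B64=$(base64 -w 0 my-image.jpg)

# send a text prompt together with the image
curl http://localhost:8080/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "What is in this image?" },
          { "type": "image_url", "image_url": { "url": "data:image/jpeg;base64,'"$IMG_B64"'" } }
        ]
      }
    ]
  }'
```
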
To enable it, use one of the two methods below:

- Use the `-hf` option with a [supported model](../../docs/multimodal.md)
  - To load a model using `-hf` while disabling multimodal, use `--no-mmproj`
  - To load a model using `-hf` while using a custom mmproj file, use `--mmproj local_file.gguf`
- Use the `-m model.gguf` option with `--mmproj file.gguf` to specify the text model and the multimodal projector respectively

By default, the multimodal projector will be offloaded to the GPU. To disable this, add `--no-mmproj-offload`.

For example:

```sh
# simple usage with CLI
llama-mtmd-cli -hf ggml-org/gemma-3-4b-it-GGUF

# simple usage with server
llama-server -hf ggml-org/gemma-3-4b-it-GGUF

# using local file
llama-server -m gemma-3-4b-it-Q4_K_M.gguf --mmproj mmproj-gemma-3-4b-it-Q4_K_M.gguf

# no GPU offload
llama-server -hf ggml-org/gemma-3-4b-it-GGUF --no-mmproj-offload
```
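
The `--no-mmproj` and `--mmproj` overrides described above can be combined with `-hf` in the same way. A quick sketch, reusing the Gemma 3 model from the examples (the local projector filename is only illustrative):

```sh
# use -hf but disable multimodal entirely
llama-server -hf ggml-org/gemma-3-4b-it-GGUF --no-mmproj

# use -hf but load the projector from a local file (filename is illustrative)
llama-server -hf ggml-org/gemma-3-4b-it-GGUF --mmproj my-mmproj.gguf
```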

## Pre-quantized models

These are ready-to-use models; most of them come with `Q4_K_M` quantization by default.

Replace `(tool_name)` with the name of the binary you want to use, for example `llama-mtmd-cli` or `llama-server`.

NOTE: some models may require a large context window, for example: `-c 8192`

```sh
# Gemma 3
(tool_name) -hf ggml-org/gemma-3-4b-it-GGUF
(tool_name) -hf ggml-org/gemma-3-12b-it-GGUF
(tool_name) -hf ggml-org/gemma-3-27b-it-GGUF

# SmolVLM
(tool_name) -hf ggml-org/SmolVLM-Instruct-GGUF
(tool_name) -hf ggml-org/SmolVLM-256M-Instruct-GGUF
(tool_name) -hf ggml-org/SmolVLM-500M-Instruct-GGUF
(tool_name) -hf ggml-org/SmolVLM2-2.2B-Instruct-GGUF
(tool_name) -hf ggml-org/SmolVLM2-256M-Video-Instruct-GGUF
(tool_name) -hf ggml-org/SmolVLM2-500M-Video-Instruct-GGUF

# Pixtral 12B
(tool_name) -hf ggml-org/pixtral-12b-GGUF

# Qwen 2 VL
(tool_name) -hf ggml-org/Qwen2-VL-2B-Instruct-GGUF
(tool_name) -hf ggml-org/Qwen2-VL-7B-Instruct-GGUF

# Qwen 2.5 VL
(tool_name) -hf ggml-org/Qwen2.5-VL-3B-Instruct-GGUF
(tool_name) -hf ggml-org/Qwen2.5-VL-7B-Instruct-GGUF
(tool_name) -hf ggml-org/Qwen2.5-VL-32B-Instruct-GGUF
(tool_name) -hf ggml-org/Qwen2.5-VL-72B-Instruct-GGUF

# Mistral Small 3.1 24B (IQ2_M quantization)
(tool_name) -hf ggml-org/Mistral-Small-3.1-24B-Instruct-2503-GGUF
```
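
If a model needs a larger context window (see the note above), append the `-c` flag to any of these commands. An illustrative sketch using one of the Qwen models; the context size an individual model actually needs may differ:

```sh
(tool_name) -hf ggml-org/Qwen2.5-VL-7B-Instruct-GGUF -c 8192
```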