* feat: add Dockerfiles for each platform that use ./server instead of ./main
* feat: update .github/workflows/docker.yml to build server-first docker containers
* doc: add information about running the server with Docker to README.md
* doc: add information about running with docker to the server README
* doc: update n-gpu-layers to show correct GPU usage
* fix(doc): update container tag from `server` to `server-cuda` for README example on running server container with CUDA
README.md: 13 additions & 1 deletion
@@ -931,17 +931,20 @@ Place your desired model into the `~/llama.cpp/models/` directory and execute th
* Create a folder to store big models & intermediate files (e.g. /llama/models)

#### Images

- We have two Docker images available for this project:
+ We have three Docker images available for this project:

1. `ghcr.io/ggerganov/llama.cpp:full`: This image includes both the main executable file and the tools to convert LLaMA models into ggml and convert into 4-bit quantization. (platforms: `linux/amd64`, `linux/arm64`)
2. `ghcr.io/ggerganov/llama.cpp:light`: This image only includes the main executable file. (platforms: `linux/amd64`, `linux/arm64`)
+ 3. `ghcr.io/ggerganov/llama.cpp:server`: This image only includes the server executable file. (platforms: `linux/amd64`, `linux/arm64`)

Additionally, there are the following images, similar to the above:

- `ghcr.io/ggerganov/llama.cpp:full-cuda`: Same as `full` but compiled with CUDA support. (platforms: `linux/amd64`)
- `ghcr.io/ggerganov/llama.cpp:light-cuda`: Same as `light` but compiled with CUDA support. (platforms: `linux/amd64`)
+ - `ghcr.io/ggerganov/llama.cpp:server-cuda`: Same as `server` but compiled with CUDA support. (platforms: `linux/amd64`)
- `ghcr.io/ggerganov/llama.cpp:full-rocm`: Same as `full` but compiled with ROCm support. (platforms: `linux/amd64`, `linux/arm64`)
- `ghcr.io/ggerganov/llama.cpp:light-rocm`: Same as `light` but compiled with ROCm support. (platforms: `linux/amd64`, `linux/arm64`)
+ - `ghcr.io/ggerganov/llama.cpp:server-rocm`: Same as `server` but compiled with ROCm support. (platforms: `linux/amd64`, `linux/arm64`)

The GPU-enabled images are not currently tested by CI beyond being built. They are not built with any variation from the ones in the Dockerfiles defined in [.devops/](.devops/) and the GitHub Action defined in [.github/workflows/docker.yml](.github/workflows/docker.yml). If you need different settings (for example, a different CUDA or ROCm library), you'll need to build the images locally for now.
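As a quick way to try the new server image end to end, a minimal sketch; the published `ghcr.io` image is assumed to accept the same server flags as the locally built `server-cuda` example later in this diff, and 8080 is assumed as the exposed port:

```bash
# Sketch: run the pre-built server image and map its HTTP port to the host.
# The image tag and port are assumptions; the flags mirror the server-cuda example below.
docker run -p 8080:8080 -v /path/to/models:/models \
  ghcr.io/ggerganov/llama.cpp:server \
  -m /models/7B/ggml-model-q4_0.gguf --host 0.0.0.0 --port 8080 -n 512
```

Once the model has loaded, the server should answer HTTP requests on the mapped port.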
@@ -967,6 +970,12 @@ or with a light image:
docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:light -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512
Assuming one has the [nvidia-container-toolkit](https://github.com/NVIDIA/nvidia-container-toolkit) properly installed on Linux, or is using a GPU enabled cloud, `cuBLAS` should be accessible inside the container.
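A quick way to confirm that GPU passthrough works before pulling the llama.cpp images is to run `nvidia-smi` from a stock CUDA container; this is a hedged example and the exact `nvidia/cuda` tag is an assumption:

```bash
# If the nvidia-container-toolkit is set up correctly, this prints the host GPUs
# from inside the container; a failure here points at the container runtime, not llama.cpp.
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```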
@@ -976,6 +985,7 @@ Assuming one has the [nvidia-container-toolkit](https://github.com/NVIDIA/nvidia
You may want to pass in some different `ARGS`, depending on the CUDA environment supported by your container host, as well as the GPU architecture.
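For example, a sketch of overriding build arguments for a local CUDA build; the `CUDA_VERSION` and `CUDA_DOCKER_ARCH` argument names and the Dockerfile path are assumptions taken from the `.devops/` Dockerfiles, so verify them against your checkout:

```bash
# Sketch: build the CUDA server image locally with a specific CUDA toolkit
# version and target GPU architecture (argument names assumed from .devops/).
docker build -t local/llama.cpp:server-cuda \
  --build-arg CUDA_VERSION=12.2.0 \
  --build-arg CUDA_DOCKER_ARCH=sm_86 \
  -f .devops/server-cuda.Dockerfile .
```

If the arguments differ in your checkout, the Dockerfiles under `.devops/` are the source of truth.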
@@ -989,6 +999,7 @@ The resulting images are essentially the same as the non-CUDA images:
1. `local/llama.cpp:full-cuda`: This image includes both the main executable file and the tools to convert LLaMA models into ggml and convert into 4-bit quantization.
2. `local/llama.cpp:light-cuda`: This image only includes the main executable file.
+ 3. `local/llama.cpp:server-cuda`: This image only includes the server executable file.
#### Usage
@@ -997,6 +1008,7 @@ After building locally, usage is similar to the non-CUDA examples, but you'll ne
```bash
docker run --gpus all -v /path/to/models:/models local/llama.cpp:full-cuda --run -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512 --n-gpu-layers 1
docker run --gpus all -v /path/to/models:/models local/llama.cpp:light-cuda -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512 --n-gpu-layers 1
+ docker run --gpus all -v /path/to/models:/models local/llama.cpp:server-cuda -m /models/7B/ggml-model-q4_0.gguf --port 8000 --host 0.0.0.0 -n 512 --n-gpu-layers 1
```
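With the server container from the last command running, it can be exercised over HTTP; a hedged sketch (the `/completion` endpoint and JSON fields are assumed from the server example's API and are not part of this diff):

```bash
# Sketch: send a completion request to the server started above on port 8000.
curl --request POST \
  --url http://localhost:8000/completion \
  --header "Content-Type: application/json" \
  --data '{"prompt": "Building a website can be done in 10 simple steps:", "n_predict": 128}'
```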