
Commit 6c65a42

Improve Docker instructions
1 parent 91bf8fa commit 6c65a42

1 file changed, 24 insertions(+), 25 deletions(-)


docker/README.md

Lines changed: 24 additions & 25 deletions
### Install Docker Server

> [!IMPORTANT]
> This was tested with Docker running on Linux. If you can get it working on Windows or MacOS, please update this `README.md` with a PR!
[Install Docker Engine](https://docs.docker.com/engine/install)

## Simple Dockerfiles for building the llama-cpp-python server with external model bin files

### openblas_simple

A simple Dockerfile for non-GPU OpenBLAS, where the model is located outside the Docker image:
```
cd ./openblas_simple
docker build -t openblas_simple .
docker run -e USE_MLOCK=0 -e MODEL=/var/model/<model-path> -v <model-root-path>:/var/model -t openblas_simple
```
where `<model-root-path>/<model-path>` is the full path to the model file on the Docker host system.
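
As a concrete (hypothetical) substitution, suppose the model lives under `/srv/models` on the host; the placeholders then resolve as below. The command is wrapped in `echo` so you can inspect the fully expanded form before running it:

```shell
# Hypothetical host-side paths; substitute your own.
MODEL_ROOT=/srv/models
MODEL_FILE=open-llama-3b-q5_1.bin

# Print the fully substituted command for inspection; drop the `echo` to run it.
echo docker run -e USE_MLOCK=0 \
    -e MODEL=/var/model/"$MODEL_FILE" \
    -v "$MODEL_ROOT":/var/model -t openblas_simple
```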

### cuda_simple

> [!WARNING]
> Nvidia GPU CuBLAS support requires an Nvidia GPU with sufficient VRAM (approximately as much as the size in the table below) and Docker Nvidia support (see [container-toolkit/install-guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html)).

A simple Dockerfile for CUDA-accelerated CuBLAS, where the model is located outside the Docker image:
```
cd ./cuda_simple
docker build -t cuda_simple .
docker run --gpus=all --cap-add SYS_RESOURCE -e USE_MLOCK=0 -e MODEL=/var/model/<model-path> -v <model-root-path>:/var/model -t cuda_simple
```
where `<model-root-path>/<model-path>` is the full path to the model file on the Docker host system.
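
Before running the CUDA image, it can help to confirm that Docker can see the GPU at all. A common sanity check (assuming the NVIDIA Container Toolkit from the guide above is installed; the CUDA image tag here is just an example) is:

```shell
# If the NVIDIA runtime is wired up correctly, this prints the same
# GPU table that `nvidia-smi` shows on the host.
docker run --rm --gpus=all nvidia/cuda:12.1.1-base-ubuntu22.04 nvidia-smi
```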

--------------------------------------------------------------------------

### "Open-Llama-in-a-box"

Download an Apache V2.0 licensed 3B-parameter Open LLaMA model and install it into a Docker image that runs an OpenBLAS-enabled llama-cpp-python server:
```
cd ./open_llama
./build.sh
./start.sh
```

### Manually choose your own Llama model from Hugging Face

`python3 ./hug_model.py -a TheBloke -t llama`
You should now have a model in the current directory, with `model.bin` symlinked to it for the subsequent Docker build-and-copy step, e.g.:
```
docker $ ls -lh *.bin
-rw-rw-r-- 1 user user 4.8G May 23 18:30 <downloaded-model-file>q5_1.bin
lrwxrwxrwx 1 user user   24 May 23 18:30 model.bin -> <downloaded-model-file>q5_1.bin
```
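
If you obtained a model some other way (or the symlink is missing), the link that `hug_model.py` creates can be reproduced by hand; the filename below is a stand-in for your actual download:

```shell
# Stand-in for a downloaded model file (hypothetical name).
touch open-llama-7b-q5_1.bin

# Point model.bin at it, replacing any previous link (-f).
ln -sf open-llama-7b-q5_1.bin model.bin

readlink model.bin   # → open-llama-7b-q5_1.bin
```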

> [!NOTE]
> Make sure you have enough disk space to download the model. As the model is then copied into the image, you will need at least **TWICE** as much disk space as the size of the model:
| Model | Quantized size |
|------:|---------------:|
|     … |              … |
|   33B |          25 GB |
|   65B |          50 GB |
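
The "twice the model size" rule is easy to budget with shell arithmetic; the size here is taken from the table (33B ≈ 25 GB), and `df -h .` shows what is actually free:

```shell
# Model size from the table, doubled because the file is both on disk
# and copied into the image during `docker build`.
MODEL_GB=25
NEEDED_GB=$((MODEL_GB * 2))
echo "Budget at least ${NEEDED_GB} GB free (check with: df -h .)"
```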

> [!NOTE]
> If you want to pass or tune additional parameters, customise `./start_server.sh` before running `docker build ...`
