This guide provides detailed instructions for deploying Ollama in Docker containers, enabling consistent, isolated environments and streamlined deployment across different systems.
Why Use Ollama with Docker?
Docker provides several advantages for running Ollama:
Isolation: Run Ollama in a contained environment without affecting the host system
Portability: Deploy the same Ollama setup across different environments
Resource control: Limit CPU, memory, and GPU resources allocated to Ollama (see the sketch after this list)
Version management: Easily switch between different Ollama versions
Orchestration: Integrate with Kubernetes or Docker Swarm for scaling
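As a quick illustration of resource control, CPU and memory caps can be applied directly on docker run using standard Docker flags (a minimal sketch; adjust values to your hardware):

# Cap the container at 4 CPUs and 8 GB of RAM
docker run -d --name ollama --cpus=4 --memory=8g -p 11434:11434 ollama/ollama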
Prerequisites
Before getting started, ensure you have:
Docker installed on your system:
# Linux
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
# Log out and back in to apply group changes
# Verify Docker installation
docker --version
Docker Compose (optional but recommended):
# Install Docker Compose V2
sudo apt update && sudo apt install -y docker-compose-plugin
# Verify installation
docker compose version
At least 8GB of RAM and sufficient disk space for models (~5-10GB per model)
Basic Ollama Docker Setup
Using Official Docker Image
Pull and run the official Ollama Docker image:
# Pull the latest Ollama image
docker pull ollama/ollama:latest
# Create a volume for persistent storage
docker volume create ollama-data
# Run Ollama container
docker run -d \
  --name ollama \
  -p 11434:11434 \
  -v ollama-data:/root/.ollama \
  ollama/ollama
Testing Your Ollama Container
# Check if the container is running
docker ps
# Download and run a model
docker exec -it ollama ollama run mistral "Hello, how are you?"
# Access the API from the host
curl http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "What is Docker?"
}'
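By default, /api/generate streams the response as a series of JSON objects. To receive a single JSON object instead, set "stream": false in the request body:

curl http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "What is Docker?",
  "stream": false
}'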
Docker Compose Setup
Docker Compose provides a more manageable way to configure and run Ollama.
Basic Docker Compose Configuration
Create a file named docker-compose.yml:
version: '3'

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    volumes:
      - ollama-data:/root/.ollama
    ports:
      - "11434:11434"
    restart: unless-stopped

volumes:
  ollama-data:
Run with:

docker compose up -d
Advanced Docker Compose with Resource Limits
For more control over container resources:
version: '3'

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    volumes:
      - ollama-data:/root/.ollama
      - ./modelfiles:/modelfiles
    ports:
      - "11434:11434"
    environment:
      - OLLAMA_HOST=0.0.0.0:11434
      - OLLAMA_KEEP_ALIVE=15m
    deploy:
      resources:
        limits:
          cpus: '8'
          memory: 16G
        reservations:
          cpus: '4'
          memory: 8G
    restart: unless-stopped

volumes:
  ollama-data:
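Once the stack is up, live usage can be checked against the configured limits with the standard docker stats command:

docker compose up -d
docker stats ollama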
GPU-Accelerated Docker Setup
NVIDIA GPU Support
To enable NVIDIA GPU acceleration with Docker:
Install the NVIDIA Container Toolkit:
# Add the NVIDIA Container Toolkit repository
# (current documented method; the older apt-key approach is deprecated)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Install nvidia-container-toolkit
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit

# Configure Docker runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Run Ollama with GPU support:
docker run -d \
  --name ollama-gpu \
  --gpus all \
  -p 11434:11434 \
  -v ollama-data:/root/.ollama \
  ollama/ollama
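To confirm the container can actually see the GPU, run nvidia-smi inside it (the NVIDIA runtime injects the utility into GPU-enabled containers):

docker exec -it ollama-gpu nvidia-smi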
Docker Compose with NVIDIA GPUs
version: '3'

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama-gpu
    volumes:
      - ollama-data:/root/.ollama
    ports:
      - "11434:11434"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    restart: unless-stopped

volumes:
  ollama-data:
AMD ROCm GPU Support
For AMD GPUs with ROCm, use the dedicated ollama/ollama:rocm image:

docker run -d \
  --name ollama-rocm \
  --device=/dev/kfd \
  --device=/dev/dri \
  --security-opt seccomp=unconfined \
  --group-add video \
  -p 11434:11434 \
  -v ollama-data:/root/.ollama \
  ollama/ollama:rocm
Multi-Container Setups
Ollama with Open WebUI
This setup combines Ollama with the Open WebUI for a more user-friendly interface:
version: '3'

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    volumes:
      - ollama-data:/root/.ollama
    ports:
      - "11434:11434"
    restart: unless-stopped

  open-webui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: open-webui
    volumes:
      - open-webui-data:/app/backend/data
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama
    restart: unless-stopped

volumes:
  ollama-data:
  open-webui-data:
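Start the stack and the interface becomes reachable on the mapped host port; Open WebUI talks to the bundled Ollama service over the internal Docker network:

docker compose up -d
# Then browse to http://localhost:3000 and create the initial account on first visit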
Ollama for DevOps
A setup designed for DevOps workflows with Ollama and RAG capabilities:
version: '3'

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama-devops
    volumes:
      - ollama-data:/root/.ollama
      - ./models:/models
      - ./devops-docs:/data
    ports:
      - "11434:11434"
    environment:
      - OLLAMA_MODELS=/models
    restart: unless-stopped

  vector-db:
    image: chromadb/chroma:latest
    container_name: chroma-db
    volumes:
      - chroma-data:/chroma/data
    ports:
      - "8000:8000"
    restart: unless-stopped

  rag-service:
    image: ghcr.io/yourusername/ollama-rag-service:latest
    container_name: rag-service
    volumes:
      - ./data:/data
    ports:
      - "5000:5000"
    environment:
      - OLLAMA_HOST=ollama:11434
      - CHROMA_HOST=vector-db:8000
    depends_on:
      - ollama
      - vector-db
    restart: unless-stopped

volumes:
  ollama-data:
  chroma-data:
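The rag-service image above is a placeholder for your own RAG application; swap in whatever image implements your retrieval pipeline. Bring everything up and confirm all three services are running:

docker compose up -d
docker compose ps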
Docker Network Configuration
Creating an Isolated Network
For multi-container deployments, create an isolated network:
# Create a dedicated network
docker network create ollama-network

# Run Ollama in the network
docker run -d \
  --name ollama \
  --network ollama-network \
  -p 11434:11434 \
  -v ollama-data:/root/.ollama \
  ollama/ollama
Accessing Ollama from Other Containers
Other containers can access Ollama using the container name as hostname:
docker run -it --rm --network ollama-network alpine/curl \
  -X POST http://ollama:11434/api/generate \
  -d '{"model": "mistral", "prompt": "Hello!"}'
Custom Ollama Docker Images
Creating a Custom Dockerfile
Create a Dockerfile with pre-loaded models and custom configuration:
FROM ollama/ollama:latest

# Set environment variables
ENV OLLAMA_HOST=0.0.0.0:11434
ENV OLLAMA_KEEP_ALIVE=5m

# Copy custom Modelfiles
COPY ./modelfiles /modelfiles

# Pre-download models during build (optional); the server must be running
# in the background for pull/create to work
RUN ollama serve & sleep 5 && \
    ollama pull mistral:7b && \
    ollama pull codellama:7b && \
    ollama create devops-assistant -f /modelfiles/DevOps-Assistant

# Expose port
EXPOSE 11434

# The base image already sets ENTRYPOINT ["/bin/ollama"], so the default
# command is just the subcommand
CMD ["serve"]
Build and run your custom image:
# Build the image
docker build -t custom-ollama:latest .

# Run the container without mounting a volume over /root/.ollama,
# since an existing volume would shadow the models baked into the image
docker run -d \
  --name custom-ollama \
  -p 11434:11434 \
  custom-ollama:latest
Production Best Practices
Security Considerations
TLS Encryption: Ollama serves plain HTTP and has no built-in TLS support, so terminate TLS at a reverse proxy in front of the container and mount your certificates into the proxy rather than into Ollama.
Authentication and TLS termination (using a reverse proxy like Nginx):
# Example nginx.conf snippet
server {
    listen 443 ssl;
    server_name ollama.example.com;

    ssl_certificate /etc/nginx/certs/cert.pem;
    ssl_certificate_key /etc/nginx/certs/key.pem;

    auth_basic "Ollama API";
    auth_basic_user_file /etc/nginx/.htpasswd;

    location / {
        proxy_pass http://ollama:11434;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
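The .htpasswd file referenced above can be generated with the htpasswd tool from the apache2-utils package (Debian/Ubuntu package naming):

# Create a credentials file with a single user; prompts for the password
sudo apt install -y apache2-utils
sudo htpasswd -c /etc/nginx/.htpasswd admin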
Health Checks and Monitoring
Add health checks to your Docker Compose:
services:
  ollama:
    # ...
    healthcheck:
      # Use the bundled CLI to probe the local API; the official image does not ship curl
      test: ["CMD", "ollama", "list"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 20s
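Docker records the result, which can be queried once the container is running:

docker inspect --format='{{.State.Health.Status}}' ollama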
Docker Swarm and Kubernetes
Docker Swarm Deployment
# Initialize Swarm if not already done
docker swarm init

# Deploy the Ollama stack
docker stack deploy -c docker-compose.yml ollama-stack
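Assuming the service is named ollama in the compose file, the deployment can be verified with:

docker stack services ollama-stack
docker service logs ollama-stack_ollama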
Kubernetes Deployment
Create a kubernetes.yaml file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
      - name: ollama
        image: ollama/ollama:latest
        ports:
        - containerPort: 11434
        volumeMounts:
        - name: ollama-data
          mountPath: /root/.ollama
        resources:
          limits:
            memory: "16Gi"
            cpu: "8"
          requests:
            memory: "8Gi"
            cpu: "4"
      volumes:
      - name: ollama-data
        persistentVolumeClaim:
          claimName: ollama-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
spec:
  selector:
    app: ollama
  ports:
  - port: 11434
    targetPort: 11434
  type: ClusterIP
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ollama-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
Apply with:
kubectl apply -f kubernetes.yaml
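Standard kubectl commands verify the rollout and let you reach the ClusterIP service from a workstation:

kubectl get pods -l app=ollama
kubectl port-forward svc/ollama 11434:11434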
Troubleshooting Docker Issues
Check logs with docker logs ollama
Verify volume permissions with docker exec -it ollama ls -la /root/.ollama
Test the API from inside the container with docker exec -it ollama ollama list (the image does not include curl)
Increase memory limits in Docker settings
For GPU problems, verify the NVIDIA Container Toolkit installation and check the container logs for GPU initialization errors
Next Steps
After setting up Ollama in Docker: