This guide provides detailed instructions for configuring Ollama to use GPU acceleration on NVIDIA, AMD, and Intel hardware.
## NVIDIA GPU Setup

Install the CUDA toolkit (11.4 or newer recommended):

```bash
# Download and install the CUDA toolkit
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
sudo sh cuda_11.8.0_520.61.05_linux.run
```
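Once the installer finishes, you can confirm that both the toolkit and the driver are visible before moving on:

```bash
# Toolkit version reported by the CUDA compiler
nvcc --version

# Driver status, attached GPUs, and the highest CUDA version the driver supports
nvidia-smi
```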
Ollama automatically detects NVIDIA GPUs when available. You can customize GPU utilization with environment variables:

```bash
# Use specific GPUs (zero-indexed)
export CUDA_VISIBLE_DEVICES=0,1

# Limit memory usage per GPU (as a percentage of available VRAM)
export GPU_MEMORY_UTILIZATION=90

# Start Ollama with GPU acceleration
ollama serve
```
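With the server running, a quick sanity check is to query the local REST API; the default port is 11434 and the response lists whatever models you have pulled:

```bash
# List locally available models via the Ollama API
curl http://localhost:11434/api/tags
```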
### Verifying GPU Usage

```bash
# Check if CUDA is detected
ollama run mistral "Are you using my GPU?" --verbose

# Monitor GPU usage
nvidia-smi -l 1
```
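Recent Ollama releases also include an `ollama ps` command that reports whether a loaded model is running on the GPU or the CPU, which is a quicker check than reading verbose output:

```bash
# Show loaded models and the processor (GPU/CPU) they are using
ollama ps
```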
### NVIDIA Docker Setup

For Docker-based deployments:

```bash
# Install NVIDIA Container Toolkit
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit

# Configure Docker
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```
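Before starting Ollama itself, it is worth confirming that containers can see the GPU at all; any CUDA-enabled image will do, and the tag below is only an example:

```bash
# Run nvidia-smi inside a throwaway CUDA container to verify GPU passthrough
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```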
Then start the Ollama container with GPU access:

```bash
# Run Ollama with GPU support
docker run --gpus all -p 11434:11434 ollama/ollama
```
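Once the container is up you can pull and run a model inside it. The command below assumes the container was started with `--name ollama` (and ideally `-d -v ollama:/root/.ollama` so downloaded models persist):

```bash
# Run a model inside the running Ollama container
docker exec -it ollama ollama run mistral
```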
## Intel GPU Verification

```bash
# Check oneAPI configuration
sycl-ls

# Test with Ollama
ollama run mistral "Are you using my GPU?" --verbose
```
## Troubleshooting GPU Issues

### Common NVIDIA Issues

| Issue | Solution |
|-------|----------|
| CUDA not found | Verify the CUDA installation: `nvcc --version` |
| Insufficient memory | Use a smaller quantization (e.g. `ollama run mistral:7b-q4_0`) or reduce the context window (`num_ctx`) |
| Multiple GPU conflict | Specify a device: `export CUDA_VISIBLE_DEVICES=0` |
| Driver/CUDA mismatch | Update the NVIDIA driver to a version that supports the installed CUDA toolkit (compare `nvidia-smi` with `nvcc --version`) |
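For the memory-related issues above, checking free VRAM before loading a model makes it clear whether the GPU actually has headroom:

```bash
# Show per-GPU VRAM usage in an easily readable form
nvidia-smi --query-gpu=name,memory.used,memory.total --format=csv
```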
### Common AMD Issues

| Issue | Solution |
|-------|----------|
| ROCm device not found | Check the installation: `rocm-smi` |
| HIP runtime error | Set `HSA_OVERRIDE_GFX_VERSION=10.3.0` |
| Permission issues | Add your user to the render group: `sudo usermod -aG render $USER` |
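A minimal sketch of applying these fixes together on a consumer RDNA 2 card (the override value `10.3.0` matches many RX 6000-series GPUs; other generations need a different value):

```bash
# Allow the current user to access the GPU device nodes (log out and back in afterwards)
sudo usermod -aG render,video $USER

# Report a supported gfx target when ROCm does not recognize the card natively
export HSA_OVERRIDE_GFX_VERSION=10.3.0

# Confirm the device is visible, then start the server
rocm-smi
ollama serve
```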
### Common Intel Issues

| Issue | Solution |
|-------|----------|
| GPU not detected | Verify driver installation: `clinfo` |
| Memory allocation failed | Set `-cl-intel-greater-than-4GB-buffer-required` |
| Driver too old | Update the Intel GPU driver |
## Performance Optimization

### NVIDIA Performance Tips

```bash
# Use mixed precision for better performance
export OLLAMA_COMPUTE_TYPE=float16

# For large models on limited VRAM
export OLLAMA_GPU_LAYERS=35
```
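When tuning how many layers are offloaded, it helps to watch VRAM usage while the model loads and lower the value if you run out of memory:

```bash
# Refresh VRAM usage every second while experimenting with layer counts
watch -n 1 nvidia-smi --query-gpu=memory.used,memory.total --format=csv
```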
### AMD Performance Tips

```bash
# Adjust compute type for better performance
export OLLAMA_COMPUTE_TYPE=float16

# For large models on limited VRAM
export HIP_VISIBLE_DEVICES=0
export OLLAMA_GPU_LAYERS=28
```
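The AMD equivalent for monitoring VRAM while you adjust the layer count:

```bash
# Refresh VRAM usage every second
watch -n 1 rocm-smi --showmeminfo vram
```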
### Intel Performance Tips

```bash
# Optimize for Intel GPUs
export OLLAMA_COMPUTE_TYPE=float16
export SYCL_CACHE_PERSISTENT=1
```
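On Intel hardware, `intel_gpu_top` (from the intel-gpu-tools package) provides a similar live view of GPU utilization:

```bash
# Live GPU utilization on Intel hardware (usually requires root)
sudo intel_gpu_top
```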
## Multi-GPU Configuration

For systems with multiple GPUs:

```bash
# Use specific GPUs (comma-separated, zero-indexed)
export CUDA_VISIBLE_DEVICES=0,1   # NVIDIA
export HIP_VISIBLE_DEVICES=0,1    # AMD

# Set number of GPUs to use
export OLLAMA_NUM_GPU=2
```
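Another common pattern is one Ollama instance per GPU, each bound to its own port. Ollama reads its bind address from `OLLAMA_HOST`, so a two-GPU split might look like this (port numbers are arbitrary):

```bash
# Instance 1 pinned to GPU 0 on the default port
CUDA_VISIBLE_DEVICES=0 OLLAMA_HOST=0.0.0.0:11434 ollama serve &

# Instance 2 pinned to GPU 1 on a second port
CUDA_VISIBLE_DEVICES=1 OLLAMA_HOST=0.0.0.0:11435 ollama serve &
```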
## Real-World Deployment Examples

### High-Performance Server (4x NVIDIA RTX 4090)

```bash
# Create a systemd service
sudo nano /etc/systemd/system/ollama.service
```
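A minimal sketch of the unit file, assuming the `ollama` binary lives at `/usr/local/bin/ollama` and a dedicated `ollama` user exists (adjust the `Environment=` lines to match the variables discussed earlier):

```bash
# Write an example unit file non-interactively (same content you would paste into nano)
sudo tee /etc/systemd/system/ollama.service > /dev/null <<'EOF'
[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
# Expose the API on all interfaces and use all four GPUs
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="CUDA_VISIBLE_DEVICES=0,1,2,3"

[Install]
WantedBy=multi-user.target
EOF

# Reload systemd and start the service at boot
sudo systemctl daemon-reload
sudo systemctl enable --now ollama
```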
The same machine can also dedicate specific GPUs to separate Ollama instances, for example when mixing NVIDIA and AMD cards:

```bash
# For NVIDIA
CUDA_VISIBLE_DEVICES=0 ollama serve

# For AMD (in a separate instance, on its own port)
HIP_VISIBLE_DEVICES=0 OLLAMA_COMPUTE_TYPE=rocm OLLAMA_HOST=0.0.0.0:11435 ollama serve
```
### NixOS GPU Configuration

For NixOS users, configure GPU acceleration in `configuration.nix`: