Configuration
This guide covers essential configuration options for optimizing Ollama performance, managing resources, and customizing model behavior.
Environment Variables
Ollama's behavior can be controlled using environment variables, which can be set before running the ollama command:
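For example, in a Linux or macOS shell you might set a couple of variables before starting the server (the path and values below are illustrative):

```bash
# Store models on a larger data disk and keep them loaded longer (minutes, per the table below)
export OLLAMA_MODELS=/data/ollama/models
export OLLAMA_KEEP_ALIVE=10

# Start the Ollama server with these settings
ollama serve
```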
Core Environment Variables
| Variable | Description | Default |
|---|---|---|
| OLLAMA_HOST | Network address to listen on | 127.0.0.1:11434 |
| OLLAMA_MODELS | Directory to store models | ~/.ollama/models |
| OLLAMA_KEEP_ALIVE | How long to keep models loaded in memory (minutes) | 5 |
| OLLAMA_TIMEOUT | Request timeout (seconds) | 30 |
Performance Environment Variables
| Variable | Description | Default |
|---|---|---|
| CUDA_VISIBLE_DEVICES | Controls which NVIDIA GPUs are used | All available |
| OLLAMA_NUM_GPU | Number of GPUs to use | All available |
| OLLAMA_NUM_THREAD | Number of CPU threads to use | Auto-detected |
| OLLAMA_COMPUTE_TYPE | Compute type for inference (float16, float32, auto) | auto |
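For example, to pin Ollama to the first NVIDIA GPU and a fixed number of CPU threads (the values are illustrative):

```bash
# Expose only GPU 0 to Ollama and use 8 CPU threads for inference
CUDA_VISIBLE_DEVICES=0 OLLAMA_NUM_THREAD=8 ollama serve
```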
Security Environment Variables
| Variable | Description | Default |
|---|---|---|
| OLLAMA_ORIGINS | CORS origins to allow | All (*) |
| OLLAMA_TLS_CERT | Path to TLS certificate | None |
| OLLAMA_TLS_KEY | Path to TLS key | None |
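For example, to allow cross-origin requests only from a single web application (the origin shown is illustrative):

```bash
# Permit only this origin to call the Ollama API from a browser
export OLLAMA_ORIGINS="https://app.example.com"
ollama serve
```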
Configuration File
Ollama supports a JSON configuration file located at ~/.ollama/config.json:
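A minimal sketch of what such a file might contain, assuming keys that mirror the environment variables above; the key names are an assumption rather than a documented schema:

```bash
# Write a simple config file; adjust the values (and verify the key names) for your setup
cat > ~/.ollama/config.json << 'EOF'
{
  "host": "127.0.0.1:11434",
  "models": "~/.ollama/models",
  "keep_alive": 5,
  "timeout": 30
}
EOF
```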
GPU Configuration
NVIDIA GPU Setup
For NVIDIA GPUs, ensure you have the CUDA toolkit installed:
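For example, on Debian or Ubuntu you can verify the driver is working and install the toolkit from the distribution packages (package names vary by distribution):

```bash
# Confirm the driver can see the GPU (nvidia-smi ships with the NVIDIA driver)
nvidia-smi

# Install the CUDA toolkit (Ubuntu package name shown; other distributions differ)
sudo apt install nvidia-cuda-toolkit
```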
AMD ROCm Setup
For AMD GPUs with ROCm support:
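For example, confirm the GPU is visible to the ROCm runtime and, if needed, restrict Ollama to a single device; ROCR_VISIBLE_DEVICES plays the same role for ROCm that CUDA_VISIBLE_DEVICES plays for NVIDIA:

```bash
# List the devices ROCm can see (rocminfo ships with ROCm)
rocminfo | grep -i "marketing name"

# Expose only the first ROCm device to Ollama
ROCR_VISIBLE_DEVICES=0 ollama serve
```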
Intel GPU Setup
For Intel Arc GPUs:
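Intel Arc support depends on the Ollama build and GPU runtime you have installed; as a first step, confirm the operating system can see the GPU at all:

```bash
# Check that the Arc GPU shows up as a display/VGA device
lspci | grep -iE 'vga|display'
```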
Memory Management
Optimize Ollama's memory usage with these settings:
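For example, unloading models as soon as a request finishes and choosing a smaller model both reduce memory pressure (the model tag is illustrative):

```bash
# 0 minutes: unload each model immediately after the request completes
export OLLAMA_KEEP_ALIVE=0

# Prefer a small quantized model on memory-constrained machines
ollama run llama3.2:3b
```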
Network Configuration
Binding to External Interfaces
To make Ollama accessible from other machines on your network:
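For example, bind to all interfaces on the default port:

```bash
# Listen on all interfaces instead of loopback so other machines can reach the API
export OLLAMA_HOST=0.0.0.0:11434
ollama serve
```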
Configuring TLS
For secure communications:
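Assuming the OLLAMA_TLS_CERT and OLLAMA_TLS_KEY variables from the table above, a sketch with example certificate paths:

```bash
# Point Ollama at a certificate/key pair (paths are examples)
export OLLAMA_TLS_CERT=/etc/ollama/certs/server.crt
export OLLAMA_TLS_KEY=/etc/ollama/certs/server.key
ollama serve
```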
Model Configuration with Modelfiles
Create custom models with Modelfiles:
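For example, the following writes a small Modelfile and builds a custom model from it (the name my-assistant is just an example):

```bash
# Define a custom model based on Mistral with a fixed temperature and system message
cat > Modelfile << 'EOF'
FROM mistral:latest
PARAMETER temperature 0.7
SYSTEM You are a helpful assistant
EOF

# Build and run the custom model
ollama create my-assistant -f Modelfile
ollama run my-assistant
```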
Modelfile Commands
| Command | Purpose | Example |
|---|---|---|
| FROM | Base model | FROM mistral:latest |
| PARAMETER | Set an inference parameter | PARAMETER temperature 0.7 |
| SYSTEM | Set the system message | SYSTEM You are a helpful assistant |
| TEMPLATE | Define the prompt template | TEMPLATE <s>{{.System}}</s>{{.Prompt}} |
| LICENSE | Specify the model license | LICENSE MIT |
Real-world Configuration Examples
High-Performance Server Setup
For a dedicated Ollama server with multiple powerful GPUs:
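A sketch using the variables from the tables above; the specific numbers are illustrative and should be matched to your hardware:

```bash
export OLLAMA_HOST=0.0.0.0:11434   # accept requests from other machines
export OLLAMA_NUM_GPU=4            # spread load across four GPUs
export OLLAMA_NUM_THREAD=32        # match the number of physical CPU cores
export OLLAMA_KEEP_ALIVE=60        # keep frequently used models loaded for an hour
ollama serve
```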
Low Resource Environment
For systems with limited resources:
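A sketch with illustrative values for a laptop or small VPS; the model tag is an example of a small model:

```bash
export OLLAMA_KEEP_ALIVE=0   # free memory as soon as a request finishes
export OLLAMA_NUM_THREAD=4   # leave CPU headroom for other processes
ollama run llama3.2:1b       # use a small model to keep RAM usage low
```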
API Configuration
Configure the Ollama API for integration with other tools:
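For example, you can verify the server is reachable and send a simple generation request with curl:

```bash
# List the models the server currently has available
curl http://localhost:11434/api/tags

# Request a single, non-streamed completion
curl http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```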
API Rate Limiting
Add rate limiting with a reverse proxy like Nginx:
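A minimal sketch of an Nginx site that proxies to Ollama and limits each client to a modest request rate; the file path, port, and limits are illustrative:

```bash
# Write a proxy config with per-client rate limiting, then reload Nginx
sudo tee /etc/nginx/conf.d/ollama.conf > /dev/null << 'EOF'
limit_req_zone $binary_remote_addr zone=ollama:10m rate=10r/m;

server {
    listen 8080;

    location / {
        limit_req zone=ollama burst=5;
        proxy_pass http://127.0.0.1:11434;
    }
}
EOF

sudo nginx -t && sudo systemctl reload nginx
```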
Multi-User Setup
For shared environments, use Docker with multiple containers:
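A sketch that gives each user an isolated container with its own model volume and host port (container names, volumes, and ports are illustrative):

```bash
# One Ollama container per user, each with a private model store and its own port
docker run -d --name ollama-alice -v ollama-alice:/root/.ollama -p 11434:11434 ollama/ollama
docker run -d --name ollama-bob   -v ollama-bob:/root/.ollama   -p 11435:11434 ollama/ollama
```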
Troubleshooting Configuration Issues
| Issue | Suggested fix |
|---|---|
| Model loads slowly | Check OLLAMA_NUM_THREAD and OLLAMA_COMPUTE_TYPE |
| High memory usage | Reduce the context size or use smaller models |
| Network timeouts | Increase OLLAMA_TIMEOUT or check firewall rules |
| Permission errors | Check file ownership of the OLLAMA_MODELS directory |
Next Steps
After configuring Ollama, move on to pulling models, running them, and integrating the Ollama API with your own applications.