Models and Fine-tuning
This guide covers the available models in Ollama, how to use them, and techniques for customizing models to suit your specific requirements.
Available Models
Ollama supports a variety of open-source LLMs. Here are some of the most commonly used models:
General-Purpose Models
| Model | Sizes | Description | Pull Command |
|-------|-------|-------------|--------------|
| Llama 2 | 7B to 70B | Meta's general-purpose model | `ollama pull llama2` |
| Mistral | 7B | High-quality open-source model | `ollama pull mistral` |
| Mixtral | 8x7B | Mixture-of-experts model | `ollama pull mixtral` |
| Phi-2 | 2.7B | Microsoft's compact model | `ollama pull phi` |
| Neural Chat | 7B | Optimized for chat | `ollama pull neural-chat` |
| Vicuna | 7B to 33B | Fine-tuned LLaMA model | `ollama pull vicuna` |
Code-Specialized Models
| Model | Sizes | Description | Pull Command |
|-------|-------|-------------|--------------|
| CodeLlama | 7B to 34B | Code-focused Llama variant | `ollama pull codellama` |
| WizardCoder | 7B to 34B | Fine-tuned for code tasks | `ollama pull wizardcoder` |
| DeepSeek Coder | 6.7B to 33B | Specialized for code | `ollama pull deepseek-coder` |
Small/Efficient Models
| Model | Sizes | Description | Pull Command |
|-------|-------|-------------|--------------|
| TinyLlama | 1.1B | Compact model for limited resources | `ollama pull tinyllama` |
| Gemma | 2B to 7B | Google's lightweight model | `ollama pull gemma` |
| Phi-2 | 2.7B | Efficient and compact | `ollama pull phi` |
Multilingual Models
| Model | Description | Pull Command |
|-------|-------------|--------------|
| BLOOM | Multilingual capabilities | `ollama pull bloom` |
| Qwen | Chinese and English | `ollama pull qwen` |
| Japanese Stable LM | Japanese language | `ollama pull stablej` |
Model Management
Listing Models
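See which models are already downloaded locally:

```bash
# Show locally available models, their size, and when they were last modified
ollama list
```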
Pulling Models
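Download a model from the Ollama library, optionally pinning a specific size or tag:

```bash
ollama pull mistral
ollama pull llama2:13b
```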
Removing Models
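Delete a local model to free disk space:

```bash
ollama rm vicuna
```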
Model Parameters
Control model behavior with these parameters:
| Parameter | Description | Range |
|-----------|-------------|-------|
| `temperature` | Controls randomness | 0.0 - 2.0 |
| `top_p` | Nucleus sampling threshold | 0.0 - 1.0 |
| `top_k` | Limits vocabulary to top K tokens | 1 - 100+ |
| `num_ctx` | Maximum context window size (context length) | Model dependent |
| `seed` | Random seed for reproducibility | Any integer |
Example usage:
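One way to apply these settings is the `options` field of the local REST API; the values below are illustrative:

```bash
# Generate a completion with explicit sampling parameters
curl http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "Explain blue-green deployments in two sentences.",
  "stream": false,
  "options": {
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "num_ctx": 4096,
    "seed": 42
  }
}'
```

The same parameters can also be baked into a custom model with `PARAMETER` lines in a Modelfile, as described in the next section.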
Customizing Models with Modelfiles
Ollama uses Modelfiles (similar to Dockerfiles) to create custom model configurations.
Basic Modelfile Example
Save the following in a file named `Modelfile`, then create a custom model from it:
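A minimal sketch; the model name `my-assistant`, the base model, and the system prompt are illustrative choices:

```
# Modelfile - minimal customization of a base model
FROM mistral:latest

# Make answers a little more deterministic
PARAMETER temperature 0.5

# Persona applied to every conversation
SYSTEM You are a concise, friendly assistant that answers in plain language.
```

```bash
# Build the custom model from the Modelfile, then chat with it
ollama create my-assistant -f Modelfile
ollama run my-assistant
```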
Advanced Modelfile Example
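A more elaborate sketch, assuming a Llama 2 base; the parameter values, system prompt, and example messages are all illustrative:

```
# Modelfile - advanced configuration
FROM llama2:13b

# Sampling and context settings
PARAMETER temperature 0.4
PARAMETER top_p 0.9
PARAMETER top_k 40
PARAMETER num_ctx 4096
PARAMETER stop "</s>"

SYSTEM You are a senior site reliability engineer. Answer with concrete commands first, followed by a short explanation.

# Example turns that prime the model's style
MESSAGE user How do I check disk usage on Linux?
MESSAGE assistant Run `df -h` for filesystems and `du -sh <directory>` for a single directory.
```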
Modelfile Commands Reference
| Command | Purpose | Example |
|---------|---------|---------|
| `FROM` | Base model | `FROM mistral:latest` |
| `PARAMETER` | Set inference parameter | `PARAMETER temperature 0.7` |
| `SYSTEM` | Set system message | `SYSTEM You are a helpful assistant` |
| `TEMPLATE` | Define chat template | `TEMPLATE <s>{{.System}}</s>{{.Prompt}}` |
| `ADAPTER` | Apply LoRA adapter | `ADAPTER ./adapter.bin` |
| `MESSAGE` | Add example conversation | `MESSAGE user "Hi"` |
| `PROMPT` | Default prompt | `PROMPT Answer in bullet points` |
| `LICENSE` | Specify model license | `LICENSE MIT` |
Fine-tuning with Custom Data
While Ollama doesn't directly support fine-tuning, you can use pre-fine-tuned models and adapt them with Modelfiles.
Using External Fine-tuned Models
1. Convert the fine-tuned model to GGUF format (for example, with the conversion scripts from the `llama.cpp` project).
2. Import it into Ollama with a Modelfile whose `FROM` line points at the GGUF file, then build it with `ollama create` (see the sketch below).
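A sketch of the import step; the file name `./fine-tuned-model.gguf` and the custom model name are placeholders:

```
# Modelfile - wrap an externally fine-tuned GGUF model
FROM ./fine-tuned-model.gguf

PARAMETER temperature 0.7
SYSTEM You are an assistant fine-tuned on our internal documentation.
```

```bash
ollama create my-finetuned-model -f Modelfile
```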
Behavior Fine-tuning with Examples
You can "soft fine-tune" model behavior by providing examples in the Modelfile:
Model Quantization
Ollama supports various quantization levels to balance performance and resource usage:
| Quantization | Approx. Size (7B model) | Quality | Example |
|--------------|-------------------------|---------|---------|
| Q4_K_M | 3-4 GB | Good | `ollama pull mistral:7b-q4_k_m` |
| Q5_K_M | 4-5 GB | Better | `ollama pull mistral:7b-q5_k_m` |
| Q8_0 | 7-8 GB | Best | `ollama pull mistral:7b-q8_0` |
For resource-constrained environments, use more aggressive quantization:
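For example (the tag below follows the naming pattern shown above; check the model's tag list in the Ollama library for the quantizations that are actually published):

```bash
# Smaller, lower-precision variant: less memory, noticeably lower quality
ollama pull mistral:7b-q2_k
```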
RAG (Retrieval-Augmented Generation)
Enhance models with external knowledge using RAG:
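Ollama supplies the embedding and generation endpoints, while retrieval itself happens in your own vector store. A minimal sketch of the flow using the local REST API; the model choice and prompt wording are illustrative, and the retrieval step is left out:

```bash
# 1. Embed the user's question (and, ahead of time, your document chunks)
curl http://localhost:11434/api/embeddings -d '{
  "model": "mistral",
  "prompt": "How do we rotate database credentials?"
}'

# 2. Look up the most similar chunks in your vector store (not shown), then
#    pass them as context in the generation request
curl http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "stream": false,
  "prompt": "Use the following context to answer.\n\nContext:\n<retrieved chunks>\n\nQuestion: How do we rotate database credentials?"
}'
```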
Practical Model Selection Guide
| Use Case | Recommended Model | Why |
|----------|-------------------|-----|
| General chat | `mistral:7b` | Good balance of size and capability |
| Code assistance | `codellama:7b` | Specialized for code understanding/generation |
| Resource-constrained | `tinyllama:1.1b` | Small memory footprint |
| Technical documentation | `neural-chat:7b` | Clear instruction following |
| Complex reasoning | `mixtral:8x7b` or `llama2:70b` | Sophisticated reasoning capabilities |
DevOps-Specific Model Configuration
For DevOps-specific tasks, create a specialized model configuration:
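A sketch of such a configuration; the base model, parameter values, and system prompt are illustrative choices rather than a shipped preset:

```
# Modelfile - DevOps-focused assistant
FROM codellama:7b

PARAMETER temperature 0.3
PARAMETER num_ctx 8192

SYSTEM You are a DevOps assistant. Prefer concrete shell commands, Kubernetes manifests, and Terraform snippets, and call out destructive operations before suggesting them.
```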
Create this model:
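Assuming the Modelfile above is in the current directory:

```bash
ollama create devops-assistant -f Modelfile
ollama run devops-assistant
```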
Next Steps
Now that you understand Ollama's models: