Ollama is a powerful, open-source platform that lets you run large language models locally on your Linux machine. Think of it as Docker for AI models. It simplifies downloading, managing, and running models like Llama 3, Mistral, Gemma, and Phi with just a few commands.
Whether you want complete data privacy, freedom from API costs, or the ability to work offline, Ollama delivers all of that in a lightweight package optimized for both CPU and GPU execution. With access to over 100 open-source models, it is the go-to tool for running LLMs locally in 2026.
Why Run LLMs Locally with Ollama
Running language models on your own hardware offers advantages that cloud-based APIs simply cannot match.
Privacy and control stand out as the biggest benefit. Every prompt and response stays on your machine. No data leaves your network, making Ollama ideal for sensitive projects, proprietary code analysis, or personal assistants that handle confidential information.
Cost savings add up quickly. Cloud APIs charge per token, and heavy usage can cost hundreds of dollars a month. With Ollama, once you own the hardware, queries carry no per-token fees; the ongoing cost is mainly electricity.
Offline capability means you can work anywhere. Once a model is downloaded, you never need an internet connection to use it. This is perfect for air-gapped environments, travel, or unreliable network situations.
Customization gives you full control over model behavior. Create custom Modelfiles, adjust parameters, and build specialized assistants tuned exactly to your needs.
System Requirements
Before installing Ollama, make sure your Linux system meets these requirements.
Minimum Hardware
| Component | Specification |
|---|---|
| OS | Ubuntu 20.04+, Debian 11+, Fedora 38+, RHEL 9+, Arch Linux |
| CPU | 2+ cores (x86_64 or ARM64) |
| RAM | 4GB minimum, 8GB+ recommended |
| Storage | 10GB free space, models range from 2GB to 50GB+ |
| Kernel | Linux kernel 5.4 or newer |
Recommended Production Setup
For serious work with larger models, aim for 16GB or more RAM, an NVIDIA GPU with at least 6GB VRAM or an AMD GPU with ROCm support, an NVMe SSD with 50GB+ free space, and stable internet for initial model downloads.
Installation Methods
Method 1: One-Line Installer (Recommended)
The official installation script handles everything automatically. It installs dependencies, sets up the systemd service, and detects GPU support.
curl -fsSL https://ollama.com/install.sh | sh
This command downloads the latest Ollama binary, configures systemd for auto-start, sets proper permissions, and detects any available GPU hardware.
Verify the installation succeeded by checking the version and service status.
ollama --version
You should see output similar to ollama version is 0.5.0 (your version number will differ). Then confirm the service is running.
sudo systemctl status ollama
The output should show active (running).
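An active unit does not always mean the API is answering. A minimal bash sketch can check that by querying the /api/version endpoint. The function name ollama_up is illustrative. It assumes OLLAMA_HOST, if set, is a plain host:port value with no scheme.

```shell
#!/usr/bin/env bash
# Sketch: report whether an Ollama server answers on HOST:PORT.
# Assumption: OLLAMA_HOST (if set) is host:port without an http:// prefix.
ollama_up() {
  local hostport="${1:-${OLLAMA_HOST:-127.0.0.1:11434}}"
  # /api/version returns a small JSON object containing the server version.
  if curl -fsS --max-time 3 "http://$hostport/api/version"; then
    echo    # the JSON reply has no trailing newline
    return 0
  fi
  echo "No Ollama server reachable at $hostport" >&2
  return 1
}
```

Run ollama_up with no argument to check 127.0.0.1:11434. If the server responds, the function prints its version JSON and returns success. Otherwise it returns a non-zero status, which makes it easy to use in scripts.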
Method 2: Manual Installation
Use this method for air-gapped environments or when you need a specific version.
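Download the release archive for your version from the official GitHub releases page. Recent releases ship a .tgz archive that contains the binary and its bundled GPU libraries, not a single binary. On ARM64, use ollama-linux-arm64.tgz instead.
cd /tmp
curl -fL -o ollama-linux-amd64.tgz https://github.com/ollama/ollama/releases/download/v0.5.0/ollama-linux-amd64.tgz
Extract it under /usr/local. This places the binary at /usr/local/bin/ollama and its libraries in /usr/local/lib/ollama.
sudo tar -C /usr/local -xzf ollama-linux-amd64.tgz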
Create a dedicated system user for running the Ollama service.
sudo useradd -r -s /bin/false -U -m -d /usr/share/ollama ollama
sudo mkdir -p /usr/share/ollama/.ollama
sudo chown -R ollama:ollama /usr/share/ollama
Set up the systemd service by creating the service file.
sudo tee /etc/systemd/system/ollama.service > /dev/null <<EOF
[Unit]
Description=Ollama Service
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
User=ollama
Group=ollama
ExecStart=/usr/local/bin/ollama serve
Environment="HOME=/usr/share/ollama"
Environment="OLLAMA_HOST=127.0.0.1:11434"
Restart=always
RestartSec=5
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
EOF
Start the service and enable it on boot.
sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama
Method 3: Package Managers
Several Linux distributions offer Ollama through their package management systems.
Arch Linux ships Ollama in its official repositories.
sudo pacman -S ollama
For GPU acceleration, install ollama-cuda (NVIDIA) or ollama-rocm (AMD) instead. Community variants are also available in the AUR.
NixOS users can install temporarily or permanently.
nix-shell -p ollama
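For a permanent installation, use nix-env -iA nixos.ollama. On NixOS you can also enable the service declaratively with services.ollama.enable = true; in configuration.nix.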
Fedora and RHEL users can check for community builds.
sudo dnf copr enable jwillikers/ollama
sudo dnf install ollama
Post-Installation Setup
Add Your User to the Ollama Group
This step ensures your user account has the proper permissions to interact with the Ollama service.
sudo usermod -aG ollama $USER
Log out and back in for the group change to take effect.
Configure Firewall Rules
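By default, Ollama listens only on 127.0.0.1:11434, and local connections need no firewall rule. For remote access, first bind the server to all interfaces (OLLAMA_HOST=0.0.0.0:11434, covered in the next section), then open the port. Ollama has no built-in authentication, so restrict access to a trusted subnet where possible.
sudo ufw allow from 192.168.1.0/24 to any port 11434 proto tcp
Replace 192.168.1.0/24 with your own network range. You can open the port to everyone, but do so only on fully trusted networks.
sudo ufw allow 11434/tcp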
Set Environment Variables
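Ollama reads the following variables when the server starts. If you export them in your shell (for example from ~/.bashrc), they affect only an ollama serve process launched from that shell. The systemd service reads neither ~/.bashrc nor /etc/environment. For the service, set the same variables as Environment= lines under [Service] in a drop-in, using sudo systemctl edit ollama.
export OLLAMA_MODELS=/path/to/large/storage
export OLLAMA_HOST=0.0.0.0:11434
export OLLAMA_MAX_LOADED_MODELS=2
export OLLAMA_KEEP_ALIVE=30m
OLLAMA_MODELS changes where models are stored. The default is ~/.ollama/models, or /usr/share/ollama/.ollama/models when Ollama runs as the ollama service user. OLLAMA_HOST=0.0.0.0:11434 binds the server to all interfaces for container or network access. The ollama client also reads OLLAMA_HOST to decide which server to contact. OLLAMA_MAX_LOADED_MODELS controls how many models can stay loaded at the same time. OLLAMA_KEEP_ALIVE sets how long a model remains in memory after its last request.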
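The service ignores shell startup files, so one way to persist the server-side variables is a systemd drop-in. The bash sketch below writes one. The function name write_ollama_dropin is hypothetical, and the values are the examples from this section, so edit them first. You can add OLLAMA_MODELS as another Environment= line, as long as the ollama user can write to that directory.

```shell
#!/usr/bin/env bash
# Sketch: persist Ollama server variables in a systemd drop-in.
# Assumptions: the unit is named ollama.service; the values are examples to edit.
write_ollama_dropin() {
  # Optional argument: target directory (defaults to the unit's drop-in directory).
  local dir="${1:-/etc/systemd/system/ollama.service.d}"
  mkdir -p "$dir" || return 1
  # The quoted 'EOF' stops the shell from expanding anything in the file body.
  cat > "$dir/override.conf" <<'EOF'
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_MAX_LOADED_MODELS=2"
Environment="OLLAMA_KEEP_ALIVE=30m"
EOF
}
```

To use it, open a root shell (for example with sudo -i), paste the function, and run write_ollama_dropin. Then apply the change with systemctl daemon-reload followed by systemctl restart ollama. If you would rather type the [Service] block by hand, sudo systemctl edit ollama opens the same override.conf in an editor.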
GPU Acceleration Guide
GPU acceleration dramatically improves inference speed. Ollama supports both NVIDIA and AMD GPUs.
NVIDIA GPU Setup
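On Ubuntu, install an NVIDIA driver series your GPU supports, then reboot.
sudo apt install nvidia-driver-550
sudo reboot
Ollama ships the CUDA runtime libraries it needs, so the full CUDA toolkit is optional. Install it only if you also build CUDA software.
sudo apt install nvidia-cuda-toolkit
If you plan to run Ollama in Docker, also install the NVIDIA Container Toolkit so containers can reach the GPU. The native systemd service does not need it. The old nvidia-docker repository and apt-key are deprecated, so use NVIDIA's current libnvidia-container repository with a dedicated keyring.
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker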
Verify GPU detection by checking that your GPU appears in the system.
nvidia-smi
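Next, load a model and check where it is running. ollama run --verbose prints timing statistics, such as the eval rate, after each response. While the model is loaded, run ollama ps from a second terminal. Its PROCESSOR column should read 100% GPU, or show a CPU/GPU split if the model does not fit entirely in VRAM.
ollama run llama3.2 --verbose
ollama ps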
AMD GPU Setup with ROCm
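Install ROCm on Ubuntu 22.04 (jammy). The installer package is versioned, so adjust the version directory and file name for other ROCm releases or Ubuntu versions.
wget https://repo.radeon.com/amdgpu-install/6.1/ubuntu/jammy/amdgpu-install_6.1.60100-1_all.deb
sudo apt install ./amdgpu-install_6.1.60100-1_all.deb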
sudo amdgpu-install -y --usecase=rocm
sudo reboot
Verify ROCm is working correctly.
rocminfo
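Your AMD GPU should appear in the output. After loading a model, run ollama ps and check that the PROCESSOR column reports GPU. The service log (sudo journalctl -u ollama) also records which GPUs Ollama detected at startup.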
Running Your First Model
Pull and Run a Model
Download your first model with the pull command.
ollama pull llama3.2
This downloads the Llama 3.2 3B model, which is approximately 2GB in size.
List all downloaded models on your system.
ollama list
Start an interactive chat session.
ollama run llama3.2
Type your prompt at the >>> marker and press Enter. The AI responds directly in your terminal. Type /bye to exit the session.
For a single query without entering interactive mode, pass your prompt directly.
ollama run llama3.2 "Explain quantum computing in simple terms"
Model Size Reference
Choosing the right model depends on your available hardware. Here is a quick reference for popular models.
| Model | Size | RAM Required | VRAM Recommended |
|---|---|---|---|
| llama3.2:3b | 2.0GB | 4GB | 4GB |
| llama3.1:8b | 4.9GB | 8GB | 6GB |
| llama3:70b | 39GB | 64GB | 48GB |
| mistral:7b | 4.2GB | 8GB | 6GB |
| phi3:mini | 2.3GB | 4GB | 3GB |
| gemma2:9b | 5.3GB | 8GB | 6GB |
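To turn the table into a quick decision, here is a rough bash sketch that maps available RAM onto one model per tier. The function suggest_model and its tier choices are illustrative assumptions based on the RAM Required column, and VRAM is ignored. Because /proc/meminfo reports slightly less than the installed RAM, pass the nominal size explicitly when you know it.

```shell
#!/usr/bin/env bash
# Sketch: pick one model per RAM tier from the size reference table.
# Assumptions: tiers follow the "RAM Required" column; GPU VRAM is not considered.
suggest_model() {
  # $1 = RAM in GB; if omitted, read MemTotal (reported in kB) from /proc/meminfo.
  # MemTotal is a little below the nominal size, so a 64GB box may land one tier lower.
  local ram_gb="${1:-}"
  if [ -z "$ram_gb" ]; then
    ram_gb=$(awk '/^MemTotal:/ { printf "%.0f", $2 / 1048576 }' /proc/meminfo)
  fi
  if [ "$ram_gb" -ge 64 ]; then
    echo "llama3:70b"
  elif [ "$ram_gb" -ge 8 ]; then
    echo "mistral:7b"
  elif [ "$ram_gb" -ge 4 ]; then
    echo "llama3.2:3b"
  else
    echo "Less than 4GB of RAM: no model in the table fits" >&2
    return 1
  fi
}
```

For example, ollama pull "$(suggest_model 16)" pulls mistral:7b on a 16GB machine.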
Performance Tuning
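Ollama does not read thread or GPU-layer settings from environment variables. They are per-model runtime options. To set them for the current session, enter these at the >>> prompt of ollama run:
/set parameter num_thread 4
/set parameter num_gpu 32
num_thread sets the number of CPU threads used for inference, and num_gpu controls how many model layers are offloaded to the GPU. The same option names work as PARAMETER lines in a Modelfile and in the options object of an API request. Ollama has no documented variable for capping total memory. To reduce memory use, lower num_ctx, keep fewer models loaded with OLLAMA_MAX_LOADED_MODELS, or limit the service with systemd's MemoryMax= setting.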
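To make thread and GPU-layer choices persist across sessions, you can bake them into a derived model. The minimal bash sketch below assumes Ollama's num_thread and num_gpu runtime options. The helper tuned_modelfile and the model name llama3.2-tuned are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print a Modelfile that pins CPU threads (and optionally GPU layers).
# Assumptions: num_thread / num_gpu are accepted as PARAMETER lines; values are examples.
tuned_modelfile() {
  local base="$1"
  local threads="${2:-$(nproc)}"   # default: all logical CPUs reported by nproc
  local gpu_layers="${3:-}"        # empty: leave offloading to Ollama's automatic choice
  printf 'FROM %s\n' "$base"
  printf 'PARAMETER num_thread %s\n' "$threads"
  if [ -n "$gpu_layers" ]; then
    printf 'PARAMETER num_gpu %s\n' "$gpu_layers"
  fi
}
```

Running tuned_modelfile llama3.2 8 32 > Modelfile.tuned && ollama create llama3.2-tuned -f Modelfile.tuned creates a model that always starts with 8 threads and 32 offloaded layers. If you omit the third argument, Ollama decides the GPU offload itself.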
Configuration and Optimization
Custom Model Storage Location
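If your home partition is small, move model storage to a larger drive. The service does not read /etc/environment, so set OLLAMA_MODELS in a systemd drop-in instead. Existing models are not moved. If you want to keep them, copy the old models directory into the new location first.
sudo systemctl stop ollama
sudo mkdir -p /mnt/models/ollama
sudo chown ollama:ollama /mnt/models/ollama
sudo mkdir -p /etc/systemd/system/ollama.service.d
printf '[Service]\nEnvironment="OLLAMA_MODELS=/mnt/models/ollama"\n' | sudo tee /etc/systemd/system/ollama.service.d/models.conf
sudo systemctl daemon-reload
sudo systemctl start ollama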
Service Management
Monitor Ollama logs in real time.
sudo journalctl -u ollama -f
Restart the service after configuration changes.
sudo systemctl restart ollama
Stop the service when not needed.
sudo systemctl stop ollama
REST API Access
Ollama exposes a REST API on port 11434 for programmatic access.
Generate text from a model.
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "Why is the sky blue?",
"stream": false
}'
List available models on your system.
curl http://localhost:11434/api/tags
Pull a model through the API.
curl -X POST http://localhost:11434/api/pull -d '{
"name": "mistral:7b"
}'
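In scripts, a hand-written JSON body breaks as soon as a prompt contains quotes or newlines. The bash sketch below escapes the prompt before posting it to /api/generate. The names json_escape, generate_payload, and ollama_generate and the OLLAMA_URL variable are hypothetical. The escaping covers only backslashes, double quotes, newlines, tabs, and carriage returns, not every control character JSON requires.

```shell
#!/usr/bin/env bash
# Sketch: call /api/generate with a safely escaped prompt.
# Assumption: the server URL comes from OLLAMA_URL (default http://localhost:11434).
json_escape() {
  local s=$1 nl=$'\n' tab=$'\t' cr=$'\r'
  s=${s//\\/\\\\}        # escape backslashes first so later escapes are not doubled
  s=${s//\"/\\\"}        # double quotes
  s=${s//"$nl"/\\n}      # newlines
  s=${s//"$tab"/\\t}     # tabs
  s=${s//"$cr"/\\r}      # carriage returns
  printf '%s' "$s"
}

generate_payload() {
  # $1 = model name, $2 = prompt; stream=false asks for a single JSON reply.
  printf '{"model":"%s","prompt":"%s","stream":false}' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

ollama_generate() {
  local url="${OLLAMA_URL:-http://localhost:11434}"
  curl -fsS "$url/api/generate" -d "$(generate_payload "$1" "$2")"
}
```

ollama_generate llama3.2 'Why is the sky blue?' prints the raw JSON reply. If jq is installed, append | jq -r .response to extract just the generated text. When jq is available, building the body with jq -n --arg is the more complete option.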
Creating Custom Models with Modelfiles
Modelfiles let you create specialized AI assistants with custom system prompts and parameters.
cat > Modelfile <<EOF
FROM llama3.2
SYSTEM "You are a helpful Linux expert. Always provide command-line solutions."
PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF
Build and run your custom model.
ollama create linux-assistant -f Modelfile
ollama run linux-assistant
This creates a Linux-focused assistant that defaults to providing terminal commands and CLI solutions.
Troubleshooting Common Issues
Port Already in Use
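If Ollama fails to start because port 11434 is occupied, find out which process holds it. Often it is the Ollama service itself, for example when you run ollama serve while the systemd service is already active. In that case, use the running service or stop it with sudo systemctl stop ollama. Otherwise, stop the conflicting process. Use kill -9 only if the process ignores a normal kill.
sudo lsof -i :11434
sudo kill <PID>
Alternatively, run Ollama on a different port. The client connects to 127.0.0.1:11434 by default, so export the same OLLAMA_HOST in every shell where you run ollama commands.
export OLLAMA_HOST=127.0.0.1:11435
ollama serve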
Permission Denied Errors
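Ollama talks to its clients over TCP on port 11434, not through a Unix socket, so permission errors usually come from the model directory. Make sure the service user owns its data (adjust the path if you set OLLAMA_MODELS) and add your user to the ollama group. Then restart the service, and log out and back in.
sudo usermod -aG ollama $USER
sudo chown -R ollama:ollama /usr/share/ollama/.ollama
sudo systemctl restart ollama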
GPU Not Detected
Verify your NVIDIA drivers are properly installed.
nvidia-smi
Check Ollama logs for GPU-related messages.
sudo journalctl -u ollama | grep -i gpu
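If the machine has several GPUs, or Ollama picks the wrong one, pin the service to a specific device with CUDA_VISIBLE_DEVICES. Setting this variable in your shell does not affect the systemd service, so add it to a drop-in.
sudo mkdir -p /etc/systemd/system/ollama.service.d
printf '[Service]\nEnvironment="CUDA_VISIBLE_DEVICES=0"\n' | sudo tee /etc/systemd/system/ollama.service.d/gpu.conf
sudo systemctl daemon-reload
sudo systemctl restart ollama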
Model Download Failures
Check available disk space first.
df -h /usr/share/ollama
Verify network connectivity to the Ollama servers.
curl -I https://ollama.com
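You can also run the free-space check automatically before a large pull. In the bash sketch below, has_free_gb is a hypothetical helper. It treats GB as GiB and reads the Available column of POSIX-format df output. Compare its threshold against the model sizes in the reference table.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if the filesystem holding DIR has at least NEED_GB free.
# Assumption: "GB" means GiB (1024^3 bytes); df -P prints one line per filesystem.
has_free_gb() {
  local dir="$1" need_gb="$2" avail_kb
  # Column 4 of POSIX df output is the available space in 1024-byte blocks.
  avail_kb=$(df -Pk "$dir" 2>/dev/null | awk 'NR == 2 { print $4 }')
  [ -n "$avail_kb" ] || return 2    # directory missing or df failed
  [ "$avail_kb" -ge $(( need_gb * 1024 * 1024 )) ]
}
```

For example, has_free_gb /usr/share/ollama 40 || echo "Not enough space for llama3:70b" warns you before you pull the 39GB model. The function returns 2 when the directory does not exist.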
Out of Memory Errors
Switch to a smaller model that fits your available RAM.
ollama pull phi3:mini
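Reduce the context window size to lower memory usage. ollama run has no context-size flag. Instead, set it at the >>> prompt of an interactive session, or pass "num_ctx": 2048 in the options object of an API request.
/set parameter num_ctx 2048
Unload models you are not using. ollama ps lists the loaded models, and ollama stop unloads one immediately.
ollama ps
ollama stop llama3.2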
Slow Response Times
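Limit the server to one loaded model at a time to free resources. The server reads this variable at startup, so for the systemd service, set it with sudo systemctl edit ollama rather than in your shell.
export OLLAMA_MAX_LOADED_MODELS=1
On machines with spare cores, raise the CPU thread count for faster processing. Thread count is a per-model option, so set it at the >>> prompt of an interactive session:
/set parameter num_thread 8
Use a smaller quantization for faster inference at a small quality trade-off. Most default tags are already 4-bit quantized, so expect modest gains from switching between 4-bit variants.
ollama pull llama3.2:3b-instruct-q4_0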
Uninstalling Ollama
Complete Removal
Stop and disable the service, then remove all Ollama files.
sudo systemctl stop ollama
sudo systemctl disable ollama
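sudo rm /usr/local/bin/ollama
sudo rm -rf /usr/local/lib/ollama
sudo rm /etc/systemd/system/ollama.service
sudo rm -rf /etc/systemd/system/ollama.service.d
sudo rm -rf /usr/share/ollama
sudo rm -rf ~/.ollama
sudo userdel ollama
sudo groupdel ollama
sudo systemctl daemon-reload
This removes the binary and its bundled libraries, the service file and any drop-in overrides, all downloaded models, and the dedicated system user. If sudo groupdel ollama reports that the group does not exist, userdel has already removed it.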
Partial Cleanup
Remove specific models while keeping Ollama installed.
ollama rm llama3.2
ollama rm mistral:7b
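To purge all model data without removing the Ollama binary, delete the whole models directory. For the systemd service, it lives under /usr/share/ollama/.ollama (or wherever OLLAMA_MODELS points). For a user-run ollama serve, it is ~/.ollama/models. Deleting only the blobs directory would leave manifests behind for models whose data is gone.
sudo systemctl stop ollama
sudo rm -rf /usr/share/ollama/.ollama/models
sudo systemctl start ollama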
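To remove every model through Ollama itself rather than by deleting files, you can feed the names printed by ollama list to ollama rm. The small bash sketch below assumes the current ollama list layout: a header line, then NAME in the first column. model_names is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: extract model names from `ollama list` output read on stdin.
# Assumption: the first line is a header and NAME is the first column.
model_names() {
  awk 'NR > 1 && NF > 0 { print $1 }'
}
```

ollama list | model_names | xargs -r -n1 ollama rm removes each model in turn. The -r flag (GNU xargs) skips running ollama rm when no models are installed.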
Frequently Asked Questions
Can I run Ollama without root or sudo access?
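Yes. Skip the systemd service setup and run ollama serve directly in your terminal. The official installer needs sudo, but you can extract the release archive into a directory you own, such as ~/.local, and run ~/.local/bin/ollama serve from there.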
How do I update Ollama to the latest version?
Re-run the installation script. Your existing models remain intact.
curl -fsSL https://ollama.com/install.sh | sh
Can I use Ollama with Docker?
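Yes. Use the official Docker image for containerized deployments.
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
On hosts with an NVIDIA GPU and the NVIDIA Container Toolkit installed, add --gpus=all to give the container GPU access.
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama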
Which models work best on CPU only?
Phi-3 Mini at 2.3GB, Gemma 2B, and TinyLlama are small enough to deliver reasonable performance on CPU alone, without GPU acceleration.
How much disk space should I plan for?
Each model varies in size. Plan for at least 20GB of free space if you want to experiment with two or three medium-sized models.
Is Ollama production-ready?
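It can be. Ollama runs as a long-lived service with a REST API, but you need to set up monitoring, load balancing, and backups yourself. Ollama has no built-in authentication, so place it behind a reverse proxy or firewall before exposing it beyond localhost.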
Can I fine-tune models through Ollama?
Ollama supports custom Modelfiles for parameter tuning and system prompt configuration. For full model fine-tuning with custom datasets, use dedicated tools like Unsloth or Axolotl alongside Ollama.
What to Do Next
Once Ollama is running smoothly on your Linux system, explore these next steps.
Browse the model library at ollama.com/library and fetch models with ollama pull <model-name>. Build custom assistants using Modelfiles for specialized tasks. Integrate Ollama into your applications through the REST API using Python, Node.js, or Go. Set up monitoring with Prometheus and Grafana for production deployments. Scale horizontally using load balancers across multiple Ollama instances.
Conclusion
Ollama brings the power of large language models to your Linux system with remarkable simplicity. Whether you are an AI researcher, developer, or technology enthusiast, this guide covers everything you need to install, configure, and optimize Ollama for your specific workflow.
Running LLMs locally means complete data privacy, no per-token API costs, and full control over your AI infrastructure. With GPU acceleration enabled, small and mid-sized models respond quickly enough for interactive use, entirely on your own hardware.
The one-line installer covers most setups. GPU acceleration delivers significant performance improvements. Custom Modelfiles offer considerable flexibility for building specialized assistants. Local deployment keeps your data on your own hardware and avoids vendor lock-in.
Start experimenting today. Pull your first model and unlock the full potential of open-source AI on your Linux machine.