Ollama Linux Installation: Complete Setup Guide 2026

Suresh S

Ollama is a powerful, open-source platform that lets you run large language models locally on your Linux machine. Think of it as Docker for AI models. It simplifies downloading, managing, and running models like Llama 3, Mistral, Gemma, and Phi with just a few commands.

Whether you want complete data privacy, freedom from API costs, or the ability to work offline, Ollama delivers all of that in a lightweight package optimized for both CPU and GPU execution. With access to over 100 open-source models, it is the go-to tool for running LLMs locally in 2026.

Why Run LLMs Locally with Ollama

Running language models on your own hardware offers advantages that cloud-based APIs simply cannot match.

Privacy and control stand out as the biggest benefit. Every prompt and response stays on your machine. No data leaves your network, making Ollama ideal for sensitive projects, proprietary code analysis, or personal assistants that handle confidential information.

Cost savings add up quickly. Cloud APIs charge per token, and heavy usage can cost hundreds of dollars a month. With Ollama, after the initial hardware investment, queries carry no per-token charge; electricity is the only running cost.

Offline capability means you can work anywhere. Once a model is downloaded, you never need an internet connection to use it. This is perfect for air-gapped environments, travel, or unreliable network situations.

Customization gives you full control over model behavior. Create custom Modelfiles, adjust parameters, and build specialized assistants tuned exactly to your needs.

System Requirements

Before installing Ollama, make sure your Linux system meets these requirements.

Minimum Hardware

Component | Specification
OS | Ubuntu 20.04+, Debian 11+, Fedora 38+, RHEL 9+, Arch Linux
CPU | 2+ cores (x86_64 or ARM64)
RAM | 4GB minimum, 8GB+ recommended
Storage | 10GB free space; models range from 2GB to 50GB+
Kernel | Linux kernel 5.4 or newer

For serious work with larger models, aim for 16GB or more RAM, an NVIDIA GPU with at least 6GB VRAM or an AMD GPU with ROCm support, an NVMe SSD with 50GB+ free space, and stable internet for initial model downloads.
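The minimums above can be checked from a terminal before installing anything. The following is a minimal sketch rather than anything Ollama ships: the helper names version_ge, mem_total_gb, and preflight are invented for illustration, and it assumes a Linux system with /proc/meminfo and a sort that supports -V.

```shell
#!/bin/sh
# Illustrative preflight check against the minimums listed above.
# All function names here are hypothetical helpers, not Ollama commands.

# version_ge A B: succeeds when version A is at least version B.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# mem_total_gb: total RAM rounded to whole GB (/proc/meminfo reports kB).
mem_total_gb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 + 0.5 }' /proc/meminfo
}

preflight() {
    kernel=$(uname -r | cut -d- -f1)   # 6.8.0-45-generic -> 6.8.0
    if version_ge "$kernel" 5.4; then
        echo "kernel $kernel: ok"
    else
        echo "kernel $kernel: older than 5.4"
    fi

    case "$(uname -m)" in
        x86_64|aarch64|arm64) echo "architecture: ok" ;;
        *) echo "architecture: $(uname -m) is not x86_64 or ARM64" ;;
    esac

    ram=$(mem_total_gb)
    if [ "$ram" -ge 4 ]; then
        echo "RAM ${ram}GB: ok"
    else
        echo "RAM ${ram}GB: below the 4GB minimum"
    fi
}
```

Source the file and call preflight; each line of output reports one check.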

Installation Methods

Method 1: Official Install Script

The official installation script handles everything automatically. It installs dependencies, sets up the systemd service, and detects GPU support.

curl -fsSL https://ollama.com/install.sh | sh

This command downloads the latest Ollama binary, configures systemd for auto-start, sets proper permissions, and detects any available GPU hardware.

Verify the installation succeeded by checking the version and service status.

ollama --version

You should see output similar to ollama version 0.5.0. Then confirm the service is running.

sudo systemctl status ollama

The output should show active (running).
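The service can report active a moment before the API starts answering, so scripts that depend on Ollama may want to poll it first. Below is a hedged sketch: wait_for_ollama is a hypothetical helper name, and it assumes curl is installed and that the server exposes the GET /api/version endpoint on the default address.

```shell
#!/bin/sh
# wait_for_ollama [BASE_URL] [ATTEMPTS] [DELAY_SECONDS]
# Hypothetical helper: polls GET /api/version until the server answers
# or the attempts run out.
wait_for_ollama() {
    base_url=${1:-http://127.0.0.1:11434}
    attempts=${2:-10}
    delay=${3:-1}
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -f makes curl fail on HTTP errors; output is discarded either way.
        if curl -fs --max-time 2 "$base_url/api/version" >/dev/null 2>&1; then
            echo "Ollama is answering at $base_url"
            return 0
        fi
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    echo "No response from $base_url after $attempts attempts" >&2
    return 1
}
```

Calling wait_for_ollama with no arguments checks the default address up to ten times, one second apart, and returns a non-zero status if the server never responds.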

Method 2: Manual Installation

Use this method for air-gapped environments or when you need a specific version.

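Download the release archive from the official GitHub releases page. Recent releases ship a .tgz archive containing the binary and its bundled libraries rather than a bare executable.

cd /tmp
wget https://github.com/ollama/ollama/releases/download/v0.5.0/ollama-linux-amd64.tgz

Extract the archive into /usr/local. This places the binary at /usr/local/bin/ollama and the libraries under /usr/local/lib/ollama.

sudo tar -C /usr/local -xzf ollama-linux-amd64.tgz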

Create a dedicated system user for running the Ollama service.

sudo useradd -r -s /bin/false -U -m -d /usr/share/ollama ollama
sudo mkdir -p /usr/share/ollama/.ollama
sudo chown -R ollama:ollama /usr/share/ollama

Set up the systemd service by creating the service file.

sudo tee /etc/systemd/system/ollama.service > /dev/null <<EOF
[Unit]
Description=Ollama Service
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=ollama
Group=ollama
ExecStart=/usr/local/bin/ollama serve
Environment="HOME=/usr/share/ollama"
Environment="OLLAMA_HOST=127.0.0.1:11434"
Restart=always
RestartSec=5
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF

Start the service and enable it on boot.

sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama

Method 3: Package Managers

Several Linux distributions offer Ollama through their package management systems.

Arch Linux users can install from the AUR.

yay -S ollama-bin

Alternatively, use paru -S ollama-bin. Ollama is also packaged in Arch's official extra repository, so sudo pacman -S ollama installs it without an AUR helper.

NixOS users can install temporarily or permanently.

nix-shell -p ollama

For a permanent installation, use nix-env -iA nixos.ollama.

Fedora and RHEL users can check for community builds.

sudo dnf copr enable jwillikers/ollama
sudo dnf install ollama

Post-Installation Setup

Add Your User to the Ollama Group

This step ensures your user account has the proper permissions to interact with the Ollama service.

sudo usermod -aG ollama $USER

Log out and back in for the group change to take effect.

Configure Firewall Rules

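By default, Ollama listens only on localhost (127.0.0.1:11434), and ufw's default rules already accept loopback traffic, so local use needs no firewall change. The explicit rule below is harmless but redundant on a default ufw setup.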

sudo ufw allow from 127.0.0.1 to any port 11434

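For remote access from other machines on your network, open the port more broadly. The service must also bind to a non-loopback address through OLLAMA_HOST, otherwise nothing is listening on the opened port. The Ollama API has no built-in authentication, so use this with caution and only on trusted networks.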

sudo ufw allow 11434/tcp
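A narrower alternative is to admit only one subnet instead of every source address. The sketch below only prints the ufw command so it can be reviewed before running; lan_rule is a hypothetical helper name, and 192.168.1.0/24 is a placeholder subnet, not something taken from your network.

```shell
#!/bin/sh
# lan_rule CIDR [PORT]: print a ufw command that admits only the given subnet.
# Hypothetical helper; it prints the rule and changes nothing on the system.
lan_rule() {
    cidr=$1
    port=${2:-11434}
    case "$cidr" in
        */*) ;;
        *) echo "expected a CIDR such as 192.168.1.0/24" >&2; return 1 ;;
    esac
    printf 'sudo ufw allow from %s to any port %s proto tcp\n' "$cidr" "$port"
}

# Placeholder subnet; substitute the range your clients actually use.
lan_rule 192.168.1.0/24
```

Run the printed command by hand once the subnet is right for your network.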

Set Environment Variables

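These variables customize Ollama's behavior. Exporting them in ~/.bashrc affects only an ollama serve process started from your own shell; the systemd service does not read ~/.bashrc or /etc/environment. For the service, run sudo systemctl edit ollama.service, add each variable as an Environment="NAME=value" line under [Service], and restart the service.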

export OLLAMA_MODELS=/path/to/large/storage
export OLLAMA_HOST=0.0.0.0:11434
export OLLAMA_MAX_LOADED_MODELS=2
export OLLAMA_KEEP_ALIVE=30m

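OLLAMA_MODELS changes the model storage location from its default of ~/.ollama/models (/usr/share/ollama/.ollama/models when Ollama runs as the ollama service user). OLLAMA_HOST=0.0.0.0:11434 binds to all interfaces for container or network access; the API has no authentication, so keep the default 127.0.0.1 unless you need remote clients. OLLAMA_MAX_LOADED_MODELS controls how many models stay loaded simultaneously, and OLLAMA_KEEP_ALIVE determines how long a model remains in memory after its last request (5 minutes by default).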
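Variables exported in a shell reach only processes started from that shell. For the systemd service, one option is a drop-in file. The sketch below generates its contents; render_dropin is a hypothetical helper, and the override.conf filename in the comments is an arbitrary choice.

```shell
#!/bin/sh
# render_dropin NAME=value...: print a systemd drop-in that sets each variable.
# Hypothetical helper; it writes to stdout only.
render_dropin() {
    echo "[Service]"
    for pair in "$@"; do
        case "$pair" in
            *=*) printf 'Environment="%s"\n' "$pair" ;;
            *)   echo "skipping '$pair': expected NAME=value" >&2 ;;
        esac
    done
}

# Preview the file contents; nothing is installed by this line.
render_dropin OLLAMA_MODELS=/path/to/large/storage OLLAMA_KEEP_ALIVE=30m

# To apply it (assumed filename override.conf):
#   sudo mkdir -p /etc/systemd/system/ollama.service.d
#   render_dropin OLLAMA_KEEP_ALIVE=30m | sudo tee /etc/systemd/system/ollama.service.d/override.conf
#   sudo systemctl daemon-reload && sudo systemctl restart ollama
```

Keeping the settings in a drop-in leaves the main unit file untouched, so reinstalling or upgrading Ollama does not overwrite them.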

GPU Acceleration Guide

GPU acceleration dramatically improves inference speed. Ollama supports both NVIDIA and AMD GPUs.

NVIDIA GPU Setup

Install NVIDIA drivers for your distribution.

sudo apt install nvidia-driver-550
sudo reboot

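Optionally install the CUDA toolkit. Ollama bundles the CUDA runtime libraries it needs, so a native install works with the driver alone; the toolkit matters only if you build CUDA software yourself.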

sudo apt install nvidia-cuda-toolkit

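If you run Ollama inside Docker, install the NVIDIA Container Toolkit to enable GPU passthrough. A native install does not need it. The older nvidia-docker repository and apt-key are deprecated; the commands below use the current repository and a signed-by keyring.

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker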

Verify GPU detection by checking that your GPU appears in the system.

nvidia-smi

Then run a model with verbose output to confirm GPU usage.

ollama run llama3.2 --verbose

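The --verbose flag prints timing statistics such as the eval rate in tokens per second, not the device in use. To confirm GPU acceleration, run ollama ps in a second terminal while the model is loaded; the PROCESSOR column should read GPU rather than CPU.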

AMD GPU Setup with ROCm

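Install ROCm on Ubuntu or Debian systems. The package filename below is tied to a specific ROCm release and Ubuntu codename (jammy is 22.04); check repo.radeon.com for the file that matches your system.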

wget https://repo.radeon.com/amdgpu-install/latest/ubuntu/jammy/amdgpu-install_6.1.60100-1_all.deb
sudo apt install ./amdgpu-install_6.1.60100-1_all.deb
sudo amdgpu-install -y --usecase=rocm
sudo reboot

Verify ROCm is working correctly.

rocminfo

Your AMD GPU should appear in the output. To confirm Ollama is using it, load a model and check that ollama ps reports GPU in the PROCESSOR column.

Running Your First Model

Pull and Run a Model

Download your first model with the pull command.

ollama pull llama3.2

This downloads the Llama 3.2 3B model, approximately 2GB in size.

List all downloaded models on your system.

ollama list

Start an interactive chat session.

ollama run llama3.2

Type your prompt at the >>> marker and press Enter. The AI responds directly in your terminal. Type /bye to exit the session.

For a single query without entering interactive mode, pass your prompt directly.

ollama run llama3.2 "Explain quantum computing in simple terms"

Model Size Reference

Choosing the right model depends on your available hardware. Here is a quick reference for popular models.

Model | Size (approx.) | RAM Required | VRAM Recommended
llama3.2:3b | 2.0GB | 4GB | 4GB
llama3.1:8b | 4.9GB | 8GB | 6GB
llama3:70b | 39GB | 64GB | 48GB
mistral:7b | 4.2GB | 8GB | 6GB
phi3:mini | 2.3GB | 4GB | 3GB
gemma2:9b | 5.3GB | 8GB | 6GB
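The RAM column can be turned into a quick lookup. This is only a sketch of that idea: suggest_model is a hypothetical helper, the thresholds are the 4GB, 8GB, and 64GB figures from the table, and the model chosen for each tier is one arbitrary pick among the rows that fit.

```shell
#!/bin/sh
# suggest_model RAM_GB: print one model from the table whose RAM requirement
# fits the given amount. Hypothetical helper; tiers mirror the table above.
suggest_model() {
    ram_gb=$1
    if [ "$ram_gb" -ge 64 ]; then
        echo "llama3:70b"
    elif [ "$ram_gb" -ge 8 ]; then
        echo "mistral:7b"
    elif [ "$ram_gb" -ge 4 ]; then
        echo "phi3:mini"
    else
        echo "no model in the table fits ${ram_gb}GB of RAM" >&2
        return 1
    fi
}

# Example with a fixed value; on Linux the real figure comes from /proc/meminfo.
suggest_model 16
```

The function only considers system RAM. On a GPU machine, the VRAM column is the tighter constraint and deserves the same check.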

Performance Tuning

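Fine-tune performance with these environment variables.

export OLLAMA_NUM_PARALLEL=1
export OLLAMA_FLASH_ATTENTION=1
export OLLAMA_KV_CACHE_TYPE=q8_0

OLLAMA_NUM_PARALLEL sets how many requests each loaded model serves concurrently. OLLAMA_FLASH_ATTENTION enables flash attention, which lowers memory use at larger context sizes. OLLAMA_KV_CACHE_TYPE quantizes the key/value cache and requires flash attention. CPU thread count and GPU layer offload are not environment variables; they are the per-model parameters num_thread and num_gpu, set with PARAMETER in a Modelfile or in the options field of an API request. Ollama has no variable that caps total memory; to bound it, lower OLLAMA_MAX_LOADED_MODELS or choose a smaller model.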

Configuration and Optimization

Custom Model Storage Location

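If the partition holding your models is small, move model storage to a larger drive. The systemd service does not read /etc/environment, so set OLLAMA_MODELS in a service drop-in instead.

sudo systemctl stop ollama
sudo mkdir -p /mnt/models/ollama
sudo chown ollama:ollama /mnt/models/ollama
sudo mkdir -p /etc/systemd/system/ollama.service.d
printf '[Service]\nEnvironment="OLLAMA_MODELS=/mnt/models/ollama"\n' | sudo tee /etc/systemd/system/ollama.service.d/models.conf
sudo systemctl daemon-reload
sudo systemctl start ollama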

Service Management

Monitor Ollama logs in real time.

sudo journalctl -u ollama -f

Restart the service after configuration changes.

sudo systemctl restart ollama

Stop the service when not needed.

sudo systemctl stop ollama

REST API Access

Ollama exposes a REST API on port 11434 for programmatic access.

Generate text from a model.

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
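Hand-written JSON breaks as soon as a prompt contains quotes or newlines. One way around that, sketched below, is to let jq build the request body and extract the reply. build_payload and ask are hypothetical helper names, and the sketch assumes jq and curl are installed and the server is on the default port.

```shell
#!/bin/sh
# build_payload MODEL PROMPT: JSON body for /api/generate; jq does the escaping.
build_payload() {
    jq -n -c --arg model "$1" --arg prompt "$2" \
        '{model: $model, prompt: $prompt, stream: false}'
}

# ask MODEL PROMPT: send one non-streaming request and print only the text
# from the "response" field of the reply.
ask() {
    build_payload "$1" "$2" |
        curl -fsS http://localhost:11434/api/generate -d @- |
        jq -r '.response'
}
```

With the server running, ask llama3.2 'Why is the sky blue?' prints just the generated answer instead of the full JSON object.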

List available models on your system.

curl http://localhost:11434/api/tags

Pull a model through the API.

curl -X POST http://localhost:11434/api/pull -d '{
  "name": "mistral:7b"
}'

Creating Custom Models with Modelfiles

Modelfiles let you create specialized AI assistants with custom system prompts and parameters.

cat > Modelfile <<EOF
FROM llama3.2
SYSTEM "You are a helpful Linux expert. Always provide command-line solutions."
PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF

Build and run your custom model.

ollama create linux-assistant -f Modelfile
ollama run linux-assistant

This creates a Linux-focused assistant that defaults to providing terminal commands and CLI solutions.
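When you need several assistants that differ only in prompt and temperature, generating the Modelfile from a small function saves repetition. The following is an illustrative sketch: write_modelfile is a hypothetical helper, and the git-assistant name in the comments is made up for the example.

```shell
#!/bin/sh
# write_modelfile BASE SYSTEM_PROMPT [TEMPERATURE]: print a Modelfile to stdout.
# Hypothetical helper; triple quotes let the system prompt contain plain quotes.
write_modelfile() {
    base=$1
    system_prompt=$2
    temperature=${3:-0.7}
    printf 'FROM %s\n' "$base"
    printf 'SYSTEM """%s"""\n' "$system_prompt"
    printf 'PARAMETER temperature %s\n' "$temperature"
}

# Preview a second assistant built on the same base model.
write_modelfile llama3.2 "You are a concise Git expert." 0.3

# To build it (assumed names):
#   write_modelfile llama3.2 "You are a concise Git expert." 0.3 > Modelfile.git
#   ollama create git-assistant -f Modelfile.git
```

A lower temperature such as 0.3 makes answers more deterministic, which tends to suit command-line advice.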

Troubleshooting Common Issues

Port Already in Use

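If Ollama fails to start because port 11434 is occupied, find and stop the conflicting process. If the listener turns out to be the Ollama service itself, stop it with sudo systemctl stop ollama instead of kill, because Restart=always would respawn it.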

sudo lsof -i :11434
sudo kill -9 <PID>

Alternatively, run Ollama on a different port.

export OLLAMA_HOST=127.0.0.1:11435
ollama serve

Permission Denied Errors

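Ollama communicates over TCP rather than a Unix socket, so permission errors almost always come from file ownership on the model directory. Add your user to the ollama group and make sure the service user owns its data directory.

sudo usermod -aG ollama $USER
sudo chown -R ollama:ollama /usr/share/ollama
sudo systemctl restart ollama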

GPU Not Detected

Verify your NVIDIA drivers are properly installed.

nvidia-smi

Check Ollama logs for GPU-related messages.

sudo journalctl -u ollama | grep -i gpu

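Select a specific GPU with the CUDA_VISIBLE_DEVICES variable. An export in your shell does not reach the systemd service, so set it in a service drop-in and restart.

sudo mkdir -p /etc/systemd/system/ollama.service.d
printf '[Service]\nEnvironment="CUDA_VISIBLE_DEVICES=0"\n' | sudo tee /etc/systemd/system/ollama.service.d/gpu.conf
sudo systemctl daemon-reload
sudo systemctl restart ollama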

Model Download Failures

Check available disk space first.

df -h /usr/share/ollama
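The same check can be scripted so a pull is attempted only when there is room. This is a rough sketch: free_gb and has_space are hypothetical helpers, and the 10GB figure in the usage note is an example threshold, not an Ollama requirement.

```shell
#!/bin/sh
# free_gb DIR: free space on the filesystem holding DIR, in whole GB.
free_gb() {
    # -P prints one POSIX-format line per filesystem and -k reports
    # 1024-byte blocks; column 4 is the available space.
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# has_space DIR NEED_GB: succeeds when DIR has at least NEED_GB free.
has_space() {
    [ "$(free_gb "$1")" -ge "$2" ]
}
```

For example, has_space /usr/share/ollama 10 && ollama pull mistral:7b starts the download only if at least 10GB is free.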

Verify network connectivity to the Ollama servers.

curl -I https://ollama.com/api/versions

Out of Memory Errors

Switch to a smaller model that fits your available RAM.

ollama pull phi3:mini

Reduce the context window size to lower memory usage. The run command has no context-size flag; set the parameter inside the interactive session instead.

/set parameter num_ctx 2048

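Ollama has no setting that caps its total memory use. To free memory sooner, unload each model as soon as its request finishes by setting the keep-alive to zero.

export OLLAMA_KEEP_ALIVE=0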

Slow Response Times

Limit concurrent model loading to free resources.

export OLLAMA_MAX_LOADED_MODELS=1

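Adjust CPU thread allocation if the automatic choice is too low. Thread count is the per-model parameter num_thread rather than an environment variable; set it inside the interactive session, ideally to your number of physical cores.

/set parameter num_thread 8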

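Use a more aggressively quantized variant for faster inference at a small quality trade-off. The default llama3.2 tag is already 4-bit quantized, and exact tag names vary by model, so check the model's Tags page on ollama.com before pulling.

ollama pull llama3.2:3b-instruct-q4_0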

Uninstalling Ollama

Complete Removal

Stop and disable the service, then remove all Ollama files.

sudo systemctl stop ollama
sudo systemctl disable ollama
sudo rm /usr/local/bin/ollama
sudo rm -rf /usr/local/lib/ollama
sudo rm /etc/systemd/system/ollama.service
sudo rm -rf /usr/share/ollama
sudo rm -rf ~/.ollama
sudo userdel ollama
sudo groupdel ollama
sudo systemctl daemon-reload

This removes the binary, service file, all downloaded models, and the dedicated system user.

Partial Cleanup

Remove specific models while keeping Ollama installed.

ollama rm llama3.2
ollama rm mistral:7b

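Purge all model data without removing the Ollama binary. Remove the manifests along with the blobs, otherwise ollama list keeps showing models whose data is gone. When Ollama runs as the systemd service, the models live under /usr/share/ollama/.ollama/models instead.

rm -rf ~/.ollama/models/blobs ~/.ollama/models/manifests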

Frequently Asked Questions

Can I run Ollama without root or sudo access?

Partly. The official install script needs root to create the service and system user. Without root, download the release archive, extract it into a directory you own, skip the systemd service setup, and run ollama serve directly in your terminal.

How do I update Ollama to the latest version?

Re-run the installation script. Your existing models remain intact.

curl -fsSL https://ollama.com/install.sh | sh

Can I use Ollama with Docker?

Yes. Use the official Docker image for containerized deployments.

docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

Which models work best on CPU only?

Phi-3 Mini at 2.3GB, Gemma 2B, and TinyLlama are all optimized for CPU inference and deliver reasonable performance without GPU acceleration.

How much disk space should I plan for?

Each model varies in size. Plan for at least 20GB of free space if you want to experiment with two or three medium-sized models.

Is Ollama production-ready?

It can be, with care. Ollama ships without authentication or TLS, so production deployments need a reverse proxy in front of it, along with proper monitoring, load balancing, and backup strategies.

Can I fine-tune models through Ollama?

Ollama supports custom Modelfiles for parameter tuning and system prompt configuration. For full model fine-tuning with custom datasets, use dedicated tools like Unsloth or Axolotl alongside Ollama.

What to Do Next

Once Ollama is running smoothly on your Linux system, explore these next steps.

Browse the model library at ollama.com/library and fetch any of the 100+ available models with ollama pull <model-name>. Build custom assistants using Modelfiles for specialized tasks. Integrate Ollama into your applications through the REST API using Python, Node.js, or Go. Set up monitoring with Prometheus and Grafana for production deployments. Scale horizontally using load balancers across multiple Ollama instances.

Conclusion

Ollama brings the power of large language models to your Linux system with remarkable simplicity. Whether you are an AI researcher, developer, or technology enthusiast, this guide covers everything you need to install, configure, and optimize Ollama for your specific workflow.

Running LLMs locally means complete data privacy, no per-token API costs, and full control over your AI infrastructure. With GPU acceleration enabled and a model that fits in VRAM, responses are fast enough for interactive use entirely on your own hardware.

The one-line installer covers most setups. GPU acceleration delivers significant performance improvements. Custom Modelfiles let you build specialized assistants from any base model. Local deployment keeps your data on your own hardware and avoids vendor lock-in.

Start experimenting today. Pull your first model and unlock the full potential of open-source AI on your Linux machine.

Written by Suresh S

Founder of FreeTechLearner, a technology blog dedicated to Linux, Open Source, Cybersecurity, Cloud Computing, Self-Hosting, and AI. I create practical tutorials and learning resources that help students, beginners, and tech enthusiasts build real-world skills and stay updated with modern technology.
