After this lesson you'll know:
- How to install and configure Ollama on macOS, Linux, and Windows
- How to pull, run, and manage models from the command line
- How to use the Ollama API for programmatic access
- The essential Ollama commands every user should know
What Is Ollama?
Ollama is the Docker of local AI. It packages large language models into a simple command-line interface -- pull a model, run it, done. No Python environments, no dependency hell, no CUDA driver nightmares. It handles model downloading, quantization selection, memory management, and GPU acceleration automatically.
Ollama supports hundreds of open-source models: Llama 3.1, Mistral, Gemma 2, Qwen 2.5, DeepSeek, Phi-3, and more. It runs on macOS (Apple Silicon and Intel), Linux, and Windows. It exposes a local API on port 11434 that any application can connect to -- making it the foundation for everything we build in this course.
Installation
macOS:
curl -fsSL https://ollama.com/install.sh | sh
Or download the .dmg from ollama.com. Both methods install the CLI and the background service. Apple Silicon Macs get automatic GPU acceleration through Metal.
Linux:
curl -fsSL https://ollama.com/install.sh | sh
Supports Ubuntu 20.04+, Debian 11+, Fedora 36+, and most modern distributions. NVIDIA GPU acceleration requires CUDA drivers (installed separately).
Windows:
Download the installer from ollama.com. Requires Windows 10 or later. NVIDIA GPU support included. AMD GPU support is in preview.
Verify installation:
ollama --version
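You should see the version number. If the command is not found, check that the installation finished and that ollama is on your PATH. If the version prints alongside a warning that it could not connect to a running instance, start the Ollama service.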
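If the CLI is installed but you are unsure whether the background service is up, you can also probe the local API directly. The following is a minimal Python sketch, assuming the default address http://localhost:11434 and the /api/version endpoint; the helper names (version_url, parse_version, server_version) are hypothetical and chosen for this example.

```python
import json
import urllib.error
import urllib.request
from typing import Optional

# Assumed default address; adjust if you changed OLLAMA_HOST.
DEFAULT_BASE_URL = "http://localhost:11434"


def version_url(base_url: str = DEFAULT_BASE_URL) -> str:
    """Build the URL of the version endpoint, tolerating a trailing slash."""
    return base_url.rstrip("/") + "/api/version"


def parse_version(body: bytes) -> str:
    """Extract the version string from a JSON body of the form {"version": "..."}."""
    return json.loads(body)["version"]


def server_version(base_url: str = DEFAULT_BASE_URL, timeout: float = 2.0) -> Optional[str]:
    """Return the server's version string, or None if the service is unreachable."""
    try:
        with urllib.request.urlopen(version_url(base_url), timeout=timeout) as resp:
            return parse_version(resp.read())
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or similar: treat as "not running".
        return None


if __name__ == "__main__":
    version = server_version()
    print(f"Ollama {version} is running" if version else "Ollama service not reachable")
```

Returning None instead of raising keeps the check easy to drop into a startup script that should wait for the service.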
Your First Model
Pull and run a model in two commands:
ollama pull llama3.1:8b
ollama run llama3.1:8b
The first command downloads the model (about 4.7GB for the 8B quantized version). The second launches an interactive chat session. Type your prompt, get a response, no API key required.
Recommended starter models by hardware:
- 8GB RAM: llama3.1:8b or gemma2:2b (fast, lightweight)
- 16GB RAM: qwen2.5:14b or mistral:7b (good balance)
- 32GB RAM: qwen2.5:32b or deepseek-r1:32b (strong reasoning)
- 64GB+ RAM: llama3.1:70b or qwen2.5:72b (near-frontier quality)
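If you script your setup, the recommendations above can be written as a small lookup. This is only a sketch: the tiers are copied from the list above, the function name starter_models is hypothetical, and returning nothing for machines under 8GB is an assumption, since the list gives no advice for them.

```python
# Tiers copied from the list above: (minimum RAM in GB, suggested models).
STARTER_MODELS = [
    (8, ["llama3.1:8b", "gemma2:2b"]),
    (16, ["qwen2.5:14b", "mistral:7b"]),
    (32, ["qwen2.5:32b", "deepseek-r1:32b"]),
    (64, ["llama3.1:70b", "qwen2.5:72b"]),
]


def starter_models(ram_gb: float) -> list[str]:
    """Return the suggestions for the largest tier that fits in ram_gb.

    Assumption: machines below the smallest tier get an empty list.
    """
    best: list[str] = []
    for min_ram, models in STARTER_MODELS:
        if ram_gb >= min_ram:
            best = models  # keep upgrading while the tier still fits
    return best


if __name__ == "__main__":
    for ram in (8, 24, 128):
        print(ram, "GB ->", ", ".join(starter_models(ram)))
```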
Quick Test Prompts
Once your model is running, try these to verify it works:
>>> Explain quantum computing in 3 sentences.
>>> Write a Python function that reverses a string.
>>> Summarize the key differences between TCP and UDP.
If you get coherent responses, your local AI lab is operational.
Essential Commands
These are the commands you'll use daily:
# List all downloaded models
ollama list
# Pull a specific model
ollama pull mistral:7b
# Run a model interactively
ollama run llama3.1:8b
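# Run a single prompt non-interactively (the quoted text is sent as the prompt,
# not as a system prompt; use /set system inside an interactive session for that)
ollama run llama3.1:8b "Write a Python function that reverses a string."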
# Show model details (size, parameters, license)
ollama show llama3.1:8b
# Remove a model to free disk space
ollama rm gemma2:2b
# List running models
ollama ps
# Copy a model (for creating custom variants)
ollama cp llama3.1:8b my-custom-model
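The same inventory that ollama list prints is also available over the local API, which is handy when a script needs to check what is installed. Here is a minimal Python sketch, assuming the default address http://localhost:11434 and the /api/tags endpoint, whose JSON reply holds a "models" array with "name" and "size" (bytes) fields. The helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:11434"  # assumed default address


def model_names(tags_response: dict) -> list[str]:
    """Pull the model names out of a decoded /api/tags response."""
    return [m["name"] for m in tags_response.get("models", [])]


def total_size_gb(tags_response: dict) -> float:
    """Sum the reported sizes (bytes) and convert to gigabytes (10**9 bytes)."""
    return sum(m.get("size", 0) for m in tags_response.get("models", [])) / 1e9


def fetch_tags(base_url: str = BASE_URL, timeout: float = 5.0) -> dict:
    """Fetch and decode the list of downloaded models from a running service."""
    with urllib.request.urlopen(base_url + "/api/tags", timeout=timeout) as resp:
        return json.load(resp)
```

With the service running, model_names(fetch_tags()) gives the downloaded model names, and total_size_gb(fetch_tags()) shows roughly how much disk space they occupy before you decide what to remove.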
Multiline input: In the interactive session, use triple quotes for long prompts:
>>> """
Analyze the following code for security vulnerabilities:
[paste code here]
"""
The Ollama API
Ollama exposes a REST API on localhost:11434 that lets any application use your local models. This is what makes Ollama a platform, not just a chat tool.
# Generate a completion
curl http://localhost:11434/api/generate -d '{
"model": "llama3.1:8b",
"prompt": "What is the capital of France?",
"stream": false
}'
# Chat format (multi-turn conversation)
curl http://localhost:11434/api/chat -d '{
"model": "llama3.1:8b",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain Docker in one paragraph."}
],
"stream": false
}'
# Generate embeddings (for search/RAG)
curl http://localhost:11434/api/embed -d '{
"model": "nomic-embed-text",
"input": "This is a document to embed."
}'
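The three curl calls above translate directly to Python. The sketch below uses only the standard library and assumes the default address http://localhost:11434; the helper names (post_json, generate, chat, embed) are hypothetical. It relies on the non-streaming reply shapes: "response" for /api/generate, "message" for /api/chat, and "embeddings" for /api/embed.

```python
import json
import urllib.request

BASE_URL = "http://localhost:11434"  # assumed default address


def post_json(path: str, payload: dict, base_url: str = BASE_URL, timeout: float = 120.0) -> dict:
    """POST a JSON payload to the local API and return the decoded JSON reply."""
    request = urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


def generate_payload(model: str, prompt: str) -> dict:
    """Body for /api/generate; stream=False asks for one JSON object, not a stream."""
    return {"model": model, "prompt": prompt, "stream": False}


def chat_payload(model: str, user_message: str, system: str = "") -> dict:
    """Body for /api/chat with an optional system message before the user turn."""
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": user_message})
    return {"model": model, "messages": messages, "stream": False}


def generate(model: str, prompt: str) -> str:
    """Single-turn completion: returns the generated text."""
    return post_json("/api/generate", generate_payload(model, prompt))["response"]


def chat(model: str, user_message: str, system: str = "") -> str:
    """One chat turn: returns the assistant's reply text."""
    return post_json("/api/chat", chat_payload(model, user_message, system))["message"]["content"]


def embed(model: str, text: str) -> list[float]:
    """Embed one string: returns the first (and only) vector in the reply."""
    return post_json("/api/embed", {"model": model, "input": text})["embeddings"][0]
```

With the service running and the models pulled, chat("llama3.1:8b", "Explain Docker in one paragraph.", system="You are a helpful assistant.") mirrors the second curl call. The generous default timeout is a guess to allow for the first request, which has to load the model into memory.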
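Alongside its native API, Ollama exposes an OpenAI-compatible endpoint under /v1, meaning tools built for OpenAI's API often work with Ollama by just changing the base URL to http://localhost:11434/v1. Clients that insist on an API key accept any placeholder value, since Ollama ignores it.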
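To make the base-URL swap concrete, here is a minimal sketch that sends an OpenAI-style chat completion request to the /v1 base URL mentioned above, using only the Python standard library. The function names are hypothetical, and the sketch assumes the usual OpenAI reply shape, where the text sits at choices[0].message.content.

```python
import json
import urllib.request

OPENAI_STYLE_BASE_URL = "http://localhost:11434/v1"  # base URL from the text above


def completion_request(model: str, messages: list[dict],
                       base_url: str = OPENAI_STYLE_BASE_URL) -> urllib.request.Request:
    """Build an OpenAI-style POST request for <base_url>/chat/completions."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps({"model": model, "messages": messages}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def first_reply(response: dict) -> str:
    """Read the assistant text from an OpenAI-style chat completion reply."""
    return response["choices"][0]["message"]["content"]


def ask(model: str, question: str, timeout: float = 120.0) -> str:
    """Send one user message and return the reply text (needs a running service)."""
    request = completion_request(model, [{"role": "user", "content": question}])
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return first_reply(json.load(resp))
```

Because only the base URL is specific to Ollama, the same request builder can target another OpenAI-compatible server by passing a different base_url.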
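By default the API only accepts connections from the local machine. To make it reachable from other devices, start the server with the environment variable OLLAMA_HOST=0.0.0.0. Only do this on trusted networks -- there's no authentication built in.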
Quiz
1. What port does the Ollama API run on by default?
2. What is the recommended model for a machine with 16GB RAM?