
Private Embeddings & Vector Search.

Turn your documents into searchable vectors -- entirely on your machine, with zero data leaving your network.

After this lesson you'll know

  • What embeddings are and why they matter for AI search
  • How to generate embeddings locally with Ollama
  • How to set up a local vector database with ChromaDB or SQLite
  • How to build semantic search over your own documents

What Are Embeddings?

An embedding is a list of numbers -- a vector -- that represents the meaning of a piece of text. The word "dog" and the phrase "canine companion" have very different spellings but similar embeddings because they mean similar things. This is the foundation of semantic search: finding documents by meaning, not just keyword matching.

Traditional search (like grep or Ctrl+F) finds exact text matches. Semantic search with embeddings finds conceptually related content. Search for "employee burnout" and find documents about "staff retention challenges" and "workplace wellness programs" even if they never use the word "burnout."
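
"Similar" here is measurable: embedding vectors are typically compared with cosine similarity, which is 1.0 for vectors pointing in the same direction and near 0 for unrelated ones. A minimal sketch with made-up 3-dimensional vectors (real embeddings have hundreds of dimensions), just to show the arithmetic:

import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings
dog = [0.9, 0.1, 0.3]
canine_companion = [0.8, 0.2, 0.4]
airplane = [0.1, 0.9, 0.2]

print(cosine_similarity(dog, canine_companion))  # ~0.98 -- close in meaning
print(cosine_similarity(dog, airplane))          # ~0.27 -- not so much

A vector database does this comparison at scale, ranking stored document chunks by similarity to your query.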

When embeddings are generated by a cloud API, every document you embed is sent to a third-party server. With local embeddings, the model runs on your machine -- your documents never leave.

Generating Local Embeddings

Ollama makes local embedding generation trivial:

# Pull an embedding model
ollama pull nomic-embed-text

# Generate an embedding via API
curl http://localhost:11434/api/embed -d '{
  "model": "nomic-embed-text",
  "input": "This is my private document content."
}'

The response includes a vector (array of 768 floating-point numbers for nomic-embed-text). This vector is a mathematical fingerprint of that text's meaning.
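
The raw JSON looks roughly like this (abridged, with illustrative numbers; the embeddings field holds one vector per input string, which is why the Python example below indexes embeddings[0]):

{
  "model": "nomic-embed-text",
  "embeddings": [[0.0123, -0.0456, 0.0789, ...]]
}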

In Python:

import requests

def get_embedding(text):
    # Ask the local Ollama server to embed the text
    response = requests.post(
        "http://localhost:11434/api/embed",
        json={"model": "nomic-embed-text", "input": text}
    )
    response.raise_for_status()
    # The API returns one vector per input, so take the first (and only) one
    return response.json()["embeddings"][0]

# Embed a document
vector = get_embedding("Quarterly revenue increased 15% year over year.")
print(f"Vector dimension: {len(vector)}")  # 768

Chunking matters: Embedding models have a maximum input length (typically 512-8192 tokens). Split long documents into overlapping chunks of 200-500 words. Overlap by 50 words to preserve context at chunk boundaries. The quality of your chunks determines the quality of your search.
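
A minimal sketch of that chunking step (chunk_text is a hypothetical helper and report.txt a placeholder for your own file; a token-aware splitter would follow the model's limit more precisely, but word counts are a reasonable approximation):

def chunk_text(text, chunk_size=300, overlap=50):
    # Slide a window of chunk_size words across the document,
    # overlapping by `overlap` words so context isn't cut at boundaries
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + chunk_size])
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(words):
            break
    return chunks

# Embed each chunk separately, then store the vectors
document = open("report.txt").read()
vectors = [get_embedding(chunk) for chunk in chunk_text(document)]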