Build Your First RAG
Theory is over. In this lesson, you build a complete RAG system from scratch — loading documents, chunking them, embedding them, storing them in a vector database, and querying them with an LLM. Every line of code is explained. By the end, you will have a working system you can adapt to any knowledge base.
What You Are Building
A knowledge-base chatbot that can answer questions about any collection of documents. You will load text files, split them into searchable chunks, embed them with OpenAI, store them in either Chroma (for quick prototyping) or Supabase pgvector (for production), and query them with Claude. The same architecture powers customer support bots, internal knowledge systems, and documentation search at companies of every size.
pip install openai anthropic chromadb
Step 1: Load Your Documents
Your RAG system can only answer questions about information it has seen. The first step is loading the documents that will become your knowledge base.
import os
from pathlib import Path
def load_documents(directory):
    """Load all text files from a directory."""
    docs = []
    for filepath in Path(directory).glob("*.txt"):
        text = filepath.read_text(encoding="utf-8")
        docs.append({
            "content": text,
            "source": filepath.name,
            "char_count": len(text)
        })
    print(f"Loaded {len(docs)} documents")
    return docs

# Load your knowledge base
documents = load_documents("./knowledge-base")
In production, you would also support PDFs (PyPDF2), web pages (BeautifulSoup), Markdown files, and database exports. The pattern is always the same: extract the text, then attach metadata such as the source name so you can cite it later.
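Supporting another plain-text format is mostly a matter of widening the glob. Here is a minimal sketch of the same load-and-attach-metadata pattern extended to Markdown files; the `load_text_files` name and `extensions` parameter are illustrative, not part of the lesson's code:

```python
from pathlib import Path

def load_text_files(directory, extensions=(".txt", ".md")):
    """Load every file matching the given extensions, attaching metadata."""
    docs = []
    for ext in extensions:
        for filepath in Path(directory).glob(f"*{ext}"):
            text = filepath.read_text(encoding="utf-8")
            docs.append({
                "content": text,
                "source": filepath.name,
                "char_count": len(text),
            })
    return docs
```

Binary formats like PDF need a real extraction library, but they still end up in the same `{"content", "source", ...}` dictionaries, so everything downstream stays unchanged.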
Step 2: Chunk the Documents
Most documents are too long to embed as a single vector: embedding models have input limits, and a whole-document vector blurs unrelated topics together, hurting retrieval precision. We split each document into focused chunks, carrying a sentence of overlap between consecutive chunks so information at chunk boundaries is not lost.
import re
def chunk_document(doc, max_words=200, overlap_sentences=1):
    """Split a document into sentence-based chunks with overlap."""
    sentences = re.split(r'(?<=[.!?])\s+', doc["content"])
    chunks = []
    current = []
    word_count = 0
    for sentence in sentences:
        s_words = len(sentence.split())
        if word_count + s_words > max_words and current:
            chunks.append({
                "content": " ".join(current),
                "source": doc["source"],
                "chunk_index": len(chunks)
            })
            # Carry the last sentence(s) into the next chunk as overlap
            current = current[-overlap_sentences:]
            word_count = sum(len(s.split()) for s in current)
        current.append(sentence)
        word_count += s_words
    if current:
        chunks.append({
            "content": " ".join(current),
            "source": doc["source"],
            "chunk_index": len(chunks)
        })
    return chunks

# Chunk all documents
all_chunks = []
for doc in documents:
    all_chunks.extend(chunk_document(doc))
print(f"Created {len(all_chunks)} chunks from {len(documents)} documents")
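The regex doing the heavy lifting splits on whitespace that follows sentence-ending punctuation, keeping the punctuation attached to each sentence. A quick standalone check of that splitter (the sample text here is just for illustration):

```python
import re

text = "RAG retrieves relevant chunks. The LLM reads them! Does it help? Yes."
sentences = re.split(r'(?<=[.!?])\s+', text)
print(sentences)
# Four sentences, each ending with its original punctuation
```

This naive splitter stumbles on abbreviations like "Dr." or "e.g.", which is usually acceptable for a first version; swap in a proper sentence tokenizer if your documents are abbreviation-heavy.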
Step 3: Embed All Chunks
Convert every chunk into a vector with the OpenAI embeddings API. Sending chunks in batches is far faster than one request per chunk and keeps you well under rate limits; the per-token price is the same either way.
from openai import OpenAI
client = OpenAI()
def embed_chunks(chunks, batch_size=100):
    """Embed all chunks in batches for efficiency."""
    for i in range(0, len(chunks), batch_size):
        batch = chunks[i:i + batch_size]
        texts = [c["content"] for c in batch]
        response = client.embeddings.create(
            input=texts,
            model="text-embedding-3-small"
        )
        for j, item in enumerate(response.data):
            chunks[i + j]["embedding"] = item.embedding
        print(f"Embedded batch {i // batch_size + 1}/{(len(chunks) - 1) // batch_size + 1}")
    return chunks

all_chunks = embed_chunks(all_chunks)
print(f"All {len(all_chunks)} chunks embedded (1536 dimensions each)")
Cost estimate: embedding 10,000 chunks of 200 words each costs about $0.06 with text-embedding-3-small. Embedding is the cheapest part of a RAG system.
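The estimate works out as follows, assuming a rough rule of thumb of about 1.33 tokens per English word and the $0.02 per million tokens list price for text-embedding-3-small (verify current pricing before budgeting):

```python
chunks = 10_000
words_per_chunk = 200
tokens_per_word = 1.33        # rough rule of thumb for English text (assumed)
price_per_million = 0.02      # USD per 1M tokens, text-embedding-3-small (assumed list price)

total_tokens = chunks * words_per_chunk * tokens_per_word
cost = total_tokens / 1_000_000 * price_per_million
print(f"{total_tokens:,.0f} tokens -> ${cost:.2f}")
```

A few cents for millions of tokens; retrieval and generation, not embedding, will dominate your bill.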