
Build Your First RAG

Theory is over. In this lesson, you build a complete RAG system from scratch — loading documents, chunking them, embedding them, storing them in a vector database, and querying them with an LLM. Every line of code is explained. By the end, you will have a working system you can adapt to any knowledge base.

What You Are Building

A knowledge-base chatbot that can answer questions about any collection of documents. You will load text files, split them into searchable chunks, embed them with OpenAI, store them in either Chroma (for quick prototyping) or Supabase pgvector (for production), and query them with Claude. The same architecture powers customer support bots, internal knowledge systems, and documentation search at companies of every size.

Prerequisites: Python 3.9+, an OpenAI API key (for embeddings), and an Anthropic API key (for generation). Install dependencies: pip install openai anthropic chromadb
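Both SDKs read their API keys from environment variables by default (OPENAI_API_KEY and ANTHROPIC_API_KEY). A quick sanity check like the one below, a minimal sketch, saves you a confusing authentication error three steps from now:

```python
import os

# Variable names the OpenAI and Anthropic SDKs read by default.
REQUIRED_KEYS = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY"]

def check_api_keys(keys=REQUIRED_KEYS):
    """Return the names of any required API keys missing from the environment."""
    return [k for k in keys if not os.environ.get(k)]

missing = check_api_keys()
if missing:
    print(f"Missing keys: {', '.join(missing)}")
else:
    print("All API keys found")
```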

Step 1: Load Your Documents

Your RAG system can only answer questions about information it has seen. The first step is loading the documents that will become your knowledge base.

from pathlib import Path

def load_documents(directory):
    """Load all text files from a directory."""
    docs = []
    for filepath in Path(directory).glob("*.txt"):
        text = filepath.read_text(encoding="utf-8")
        docs.append({
            "content": text,
            "source": filepath.name,
            "char_count": len(text)
        })
    print(f"Loaded {len(docs)} documents")
    return docs

# Load your knowledge base
documents = load_documents("./knowledge-base")

In production, you would also support PDFs (PyPDF2), web pages (BeautifulSoup), Markdown, and databases. The pattern is always the same: extract the text and attach metadata.
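As a sketch of that extension pattern, here is a variant of the loader that also picks up Markdown files; the function name and glob patterns are illustrative, and a real multi-format loader would dispatch a different extractor per file type:

```python
from pathlib import Path

def load_documents_multi(directory, patterns=("*.txt", "*.md")):
    """Load every file matching the given glob patterns as plain text."""
    docs = []
    for pattern in patterns:
        for filepath in sorted(Path(directory).glob(pattern)):
            text = filepath.read_text(encoding="utf-8")
            docs.append({
                "content": text,
                "source": filepath.name,
                "char_count": len(text),
            })
    return docs
```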

Step 2: Chunk the Documents

A whole document is too long to embed as a single vector, and even when it fits, one vector dilutes many topics into a single point. We split documents into focused chunks, with sentence overlap between neighbors so information at chunk boundaries is not lost.

import re

def chunk_document(doc, max_words=200, overlap_sentences=1):
    """Split a document into sentence-based chunks with overlap."""
    sentences = re.split(r'(?<=[.!?])\s+', doc["content"])
    chunks = []
    current = []
    word_count = 0

    for sentence in sentences:
        s_words = len(sentence.split())
        if word_count + s_words > max_words and current:
            chunks.append({
                "content": " ".join(current),
                "source": doc["source"],
                "chunk_index": len(chunks)
            })
            current = current[-overlap_sentences:]
            word_count = sum(len(s.split()) for s in current)
        current.append(sentence)
        word_count += s_words

    if current:
        chunks.append({
            "content": " ".join(current),
            "source": doc["source"],
            "chunk_index": len(chunks)
        })
    return chunks

# Chunk all documents
all_chunks = []
for doc in documents:
    all_chunks.extend(chunk_document(doc))
print(f"Created {len(all_chunks)} chunks from {len(documents)} documents")
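To see the overlap at work, here is a condensed, text-only version of the same chunker run on a toy document. The max_words=8 setting is artificially small so the shared boundary sentences are easy to spot:

```python
import re

def chunk_text(text, max_words=8, overlap_sentences=1):
    """Condensed version of chunk_document above, operating on raw text."""
    sentences = re.split(r'(?<=[.!?])\s+', text)
    chunks, current, word_count = [], [], 0
    for sentence in sentences:
        s_words = len(sentence.split())
        if word_count + s_words > max_words and current:
            chunks.append(" ".join(current))
            current = current[-overlap_sentences:]
            word_count = sum(len(s.split()) for s in current)
        current.append(sentence)
        word_count += s_words
    if current:
        chunks.append(" ".join(current))
    return chunks

toy = "Cats sleep a lot. Dogs bark at mail. Fish swim in tanks. Birds sing at dawn."
for chunk in chunk_text(toy):
    print(chunk)
```

Each chunk repeats the last sentence of its predecessor, so a fact that straddles two sentences is always retrievable from at least one chunk.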

Step 3: Embed All Chunks

Convert every chunk into a vector using the OpenAI embedding API. Batch processing is significantly cheaper and faster than embedding one at a time.

from openai import OpenAI

client = OpenAI()

def embed_chunks(chunks, batch_size=100):
    """Embed all chunks in batches for efficiency."""
    for i in range(0, len(chunks), batch_size):
        batch = chunks[i:i + batch_size]
        texts = [c["content"] for c in batch]

        response = client.embeddings.create(
            input=texts,
            model="text-embedding-3-small"
        )

        for j, item in enumerate(response.data):
            chunks[i + j]["embedding"] = item.embedding

        print(f"Embedded batch {i // batch_size + 1}/{(len(chunks) - 1) // batch_size + 1}")

    return chunks

all_chunks = embed_chunks(all_chunks)
print(f"All {len(all_chunks)} chunks embedded (1536 dimensions each)")

Cost estimate: embedding 10,000 chunks of 200 words each costs about $0.06 with text-embedding-3-small. Embedding is the cheapest part of a RAG system.
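That estimate is easy to reproduce yourself. The sketch below assumes the common rule of thumb of roughly 4/3 tokens per English word and $0.02 per million tokens for text-embedding-3-small; check current pricing before budgeting from it:

```python
def embedding_cost(num_chunks, words_per_chunk,
                   tokens_per_word=4 / 3, price_per_million_tokens=0.02):
    """Estimate embedding cost in dollars for a batch of chunks."""
    total_tokens = num_chunks * words_per_chunk * tokens_per_word
    return total_tokens / 1_000_000 * price_per_million_tokens

# 10,000 chunks of 200 words: ~2.7M tokens, a nickel's worth of embeddings.
print(f"${embedding_cost(10_000, 200):.2f}")
```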
