
Advanced RAG Patterns

Basic RAG handles straightforward questions. But real-world queries are messy — vague, multi-step, requiring calculations, or spanning multiple knowledge bases. Four advanced patterns handle these cases: Multi-Step RAG, Self-RAG, RAG+Tools, and Agentic RAG. This lesson teaches you when to use each and how to implement them.

When Basic RAG Is Not Enough

Basic RAG works beautifully for direct questions with clear answers in your knowledge base: "What is the refund policy?" retrieves the policy chunk and generates a grounded answer. But consider these harder cases:

"What causes that disease where you forget things?" → The query is too vague for precise retrieval. Basic RAG might return general cognitive decline articles instead of targeted Alzheimer's content.

"What is 2 + 2?" → This does not need retrieval at all. Fetching documents wastes time and money.

"How much did we spend on marketing last quarter?" → The answer requires a calculation on retrieved data, not just a summary.

"Compare our 2023 and 2024 product roadmaps" → The answer spans multiple document collections and requires synthesis.

Each of these scenarios needs a different pattern. Here is when to use each:

Decision Guide:
Is the query vague or phrased informally? → Multi-Step RAG
Does the query even need retrieval? → Self-RAG
Does the answer need math, API calls, or live data? → RAG + Tools
Does the question span multiple databases or need planning? → Agentic RAG
None of the above? → Basic RAG is fine.
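The decision guide above can be sketched as a simple router. The keyword heuristics below are illustrative assumptions only — in a real system you would use an LLM classifier or embedding similarity — and the returned pattern names are just labels for dispatch:

```python
def route_query(query):
    """Pick a RAG pattern for a query using rough, illustrative heuristics.

    These keyword checks are placeholder assumptions for the decision
    guide, not a production classifier.
    """
    q = query.lower()
    # Spans multiple collections or needs planning -> Agentic RAG
    if any(w in q for w in ("compare", "across", "roadmap")):
        return "agentic_rag"
    # Needs math, API calls, or live data -> RAG + Tools
    if any(w in q for w in ("how much", "calculate", "total", "current")):
        return "rag_tools"
    # Vague or informal phrasing -> Multi-Step RAG
    if any(w in q for w in ("that thing", "you know", "where you")):
        return "multi_step_rag"
    # Otherwise, let Self-RAG decide whether retrieval is needed at all
    return "self_rag"

print(route_query("Compare our 2023 and 2024 product roadmaps"))        # agentic_rag
print(route_query("How much did we spend on marketing last quarter?"))  # rag_tools
print(route_query("What causes that disease where you forget things?")) # multi_step_rag
```

Note the default branch: Self-RAG subsumes Basic RAG here, because its first step is deciding whether retrieval is needed.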

Pattern 1: Multi-Step RAG

Multi-Step RAG works like asking follow-up questions: the first retrieval finds relevant documents, the LLM extracts better search terms from those documents, and a second retrieval uses the refined query for more precise results.

Best for: Vague, colloquial queries that need technical vocabulary. Multi-hop questions requiring information from different sections.

def multi_step_rag(question, rag_search, generate):
    """Refine the query using first-round retrieval, then search again."""

    # Step 1: Initial retrieval with the vague query
    initial_chunks = rag_search(question, top_k=3)
    initial_context = "\n".join([c["content"] for c in initial_chunks])

    # Step 2: Ask the LLM to refine the query
    refined = generate(
        system="Based on the context below, rewrite the user's question "
               "using precise technical terms found in the documents. "
               "Return ONLY the refined query, nothing else.",
        user=f"Context:\n{initial_context}\n\nOriginal query: {question}"
    )

    # Step 3: Second retrieval with the refined query
    final_chunks = rag_search(refined, top_k=5)

    # Step 4: Generate the answer from the better context
    # (generate_grounded_answer is the grounded-generation helper from basic RAG)
    return generate_grounded_answer(question, final_chunks)

# Example: "that disease where you forget things"
# Step 1 retrieves general cognitive decline articles
# Step 2 refines to "Alzheimer's disease amyloid plaques tau proteins"
# Step 3 retrieves precise Alzheimer's research papers
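To see the refinement loop work end to end without an LLM or a vector store, here is a toy harness: word-overlap scoring stands in for embedding search, a keyword extractor stands in for the LLM refinement call, and the three documents are invented for illustration.

```python
import re

def multi_step_rag_demo():
    """Toy end-to-end run of the multi-step pattern with canned components."""
    # Hypothetical mini knowledge base (content invented for illustration)
    docs = [
        "Cognitive decline has many causes, including Alzheimer's disease.",
        "Alzheimer's disease is linked to amyloid plaques and tau proteins.",
        "Amyloid plaques disrupt communication between neurons.",
    ]

    def tokens(text):
        return set(re.findall(r"[a-z']+", text.lower()))

    def rag_search(query, top_k=3):
        # Rank docs by word overlap with the query (stand-in for embeddings)
        q = tokens(query)
        ranked = sorted(docs, key=lambda d: -len(q & tokens(d)))
        return [{"content": d} for d in ranked[:top_k]]

    def refine(question, context):
        # Stand-in for the LLM refinement call: pull technical terms
        terms = [w for w in ("alzheimer's", "amyloid", "plaques", "tau")
                 if w in context.lower()]
        return " ".join(terms) or question

    question = "that disease where you forget things"
    first = rag_search(question, top_k=2)                 # Step 1: vague query
    context = "\n".join(c["content"] for c in first)
    refined = refine(question, context)                   # Step 2: refine
    final = rag_search(refined, top_k=2)                  # Step 3: precise query
    return refined, [c["content"] for c in final]

refined, chunks = multi_step_rag_demo()
print(refined)  # alzheimer's amyloid plaques tau
```

The vague query only weakly matches the general article, but the refined query pulls the precise Alzheimer's documents to the top — the same dynamic the real pattern relies on.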

Pattern 2: Self-RAG

The LLM decides whether it needs to retrieve at all, then self-evaluates the quality of its answer after generating. This saves retrieval costs for simple questions and catches hallucinations through self-critique.

Best for: High-volume systems where many queries are simple and do not need retrieval.

def self_rag(question, rag_search, generate):
    """Let the LLM decide if retrieval is needed, then self-evaluate."""

    # Step 1: Should we retrieve?
    needs_retrieval = generate(
        system="Does this question require looking up specific information "
               "from a knowledge base? Answer YES or NO only.",
        user=question
    ).strip().upper()

    if needs_retrieval.startswith("YES"):  # tolerate "YES." or trailing text
        chunks = rag_search(question, top_k=5)
        answer = generate_grounded_answer(question, chunks)
    else:
        answer = generate(system="Answer directly.", user=question)

    # Step 2: Self-evaluate
    evaluation = generate(
        system="Rate the answer's quality 1-5. If below 3, say RETRY.",
        user=f"Q: {question}\nA: {answer}"
    )

    if "RETRY" in evaluation:
        # Force retrieval on retry
        chunks = rag_search(question, top_k=8)
        answer = generate_grounded_answer(question, chunks)

    return answer
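As written, the self-evaluation step retries exactly once. If you let the evaluator trigger repeated retries, cap the loop so a stubborn evaluator cannot run up LLM costs. A minimal sketch, with hypothetical function names:

```python
def evaluate_and_retry(answer_fn, evaluate_fn, max_retries=2):
    """Generate an answer, retrying while the evaluator says RETRY.

    answer_fn(attempt) returns an answer string; evaluate_fn(answer)
    returns an evaluation that may contain the token "RETRY".
    The cap bounds total LLM calls even if the evaluator never passes.
    """
    answer = answer_fn(0)
    for attempt in range(1, max_retries + 1):
        if "RETRY" not in evaluate_fn(answer):
            break
        answer = answer_fn(attempt)
    return answer

# Toy components: the first attempt fails evaluation, the second passes
attempts = []

def answer_fn(attempt):
    attempts.append(attempt)
    return "weak answer" if attempt == 0 else "grounded answer"

def evaluate_fn(answer):
    return "2 RETRY" if answer == "weak answer" else "5"

result = evaluate_and_retry(answer_fn, evaluate_fn)
print(result)  # grounded answer
```

In `self_rag` above, `answer_fn` would wrap the generate/retrieve branch (forcing retrieval and a larger `top_k` on retries), and `evaluate_fn` would wrap the rating prompt.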