Prompt Augmentation
The "A" in RAG. You have retrieved the right documents — now you need to hand them to the LLM in a way that produces accurate, grounded answers instead of confident hallucinations. Prompt engineering for RAG is a discipline of its own. This lesson teaches you the templates, the rules, and the code to get it right.
The Critical Moment
You have retrieved your top-K chunks. They contain the information the user needs. But the answer quality now depends entirely on how you present that context to the LLM. A poorly constructed prompt can produce hallucinations even with perfect retrieval; a well-constructed prompt produces grounded, citable answers consistently.
There are three elements to get right: grounding instructions (tell the LLM to use only the context), context formatting (make the context easy to parse), and fallback handling (what to do when the context does not contain the answer).
Same Question, Dramatically Different Answers
Without RAG Context
"What is the refund policy for the Pro plan?"
Based on industry standards, most SaaS companies offer a 30-day money-back guarantee for their Pro plans. You can typically request a refund through the settings page or by emailing support@company.com.
3 hallucinations. Sounds confident but completely fabricated.
With RAG Context
"What is the refund policy for the Pro plan?"
According to the billing documentation, the Pro plan has a 14-day refund window from the date of purchase. To request a refund, contact billing@acme.io with your order number. After 14 days, refunds are handled on a case-by-case basis.
0 hallucinations. Every claim traced to retrieved context.
The difference is not the LLM — it is the prompt. The grounded answer uses a template that forces the model to answer from context, not from its training data.
The RAG Prompt Template
Here is a production-grade template built to prevent hallucination:
import anthropic

claude = anthropic.Anthropic()

SYSTEM_PROMPT = """You are a knowledgeable assistant. Follow these rules strictly:

1. Answer based ONLY on the provided context documents.
2. If the context does not contain enough information to answer,
   say "I don't have that information in my knowledge base."
3. Cite the source document for each claim using [Source: filename].
4. If multiple sources agree, mention all of them.
5. If sources contradict each other, note the discrepancy.
6. Keep answers concise: 2-4 sentences for simple questions,
   structured paragraphs for complex ones.
7. Never speculate or fill gaps with general knowledge."""

def build_augmented_prompt(question, chunks):
    """Build a RAG prompt with retrieved context."""
    # Format each chunk with clear delimiters, its source, and its retrieval score
    context_sections = []
    for i, chunk in enumerate(chunks):
        source = chunk.get("source", "unknown")
        score = chunk.get("similarity", 0)
        context_sections.append(
            f"--- Document {i+1} [Source: {source}] (relevance: {score:.2f}) ---\n"
            f"{chunk['content']}"
        )
    context = "\n\n".join(context_sections)

    user_message = f"""Context documents:

{context}

---
Question: {question}"""
    return user_message

def generate_answer(question, chunks):
    """Generate a grounded answer from retrieved chunks."""
    user_message = build_augmented_prompt(question, chunks)
    response = claude.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        temperature=0.1,  # Low temperature for factual answers
        system=SYSTEM_PROMPT,
        messages=[{"role": "user", "content": user_message}],
    )
    return response.content[0].text
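To see the whole flow end to end, here is a minimal usage sketch. The two chunks are hypothetical stand-ins for whatever your retriever returns; the only assumption is the source / similarity / content keys that build_augmented_prompt reads above.

# Hypothetical retrieved chunks; a real retriever would supply these.
retrieved_chunks = [
    {
        "source": "billing.md",
        "similarity": 0.91,
        "content": (
            "The Pro plan has a 14-day refund window from the date of purchase. "
            "To request a refund, contact billing@acme.io with your order number."
        ),
    },
    {
        "source": "faq.md",
        "similarity": 0.78,
        "content": "After 14 days, refunds are handled on a case-by-case basis.",
    },
]

answer = generate_answer("What is the refund policy for the Pro plan?", retrieved_chunks)
print(answer)  # Expect citations like [Source: billing.md] in the output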
Six Rules for RAG Prompts
These rules are the difference between a RAG system that hallucinates 30% of the time and one that hallucinates 3% of the time:
1. Ground every answer. "Answer ONLY based on the provided context" is the most important instruction in any RAG prompt. Without it, the LLM blends context with training data, introducing potential hallucinations even with perfect retrieval.
2. Define a fallback. "If the context doesn't contain the answer, say 'I don't have that information.'" This prevents the model from filling gaps with plausible-sounding inventions. A truthful "I don't know" is infinitely more valuable than a confident hallucination.
3. Require citations. "Cite the source for each claim" makes answers verifiable: users can check the original document. It also catches hallucinations; if the model cites a source that does not exist or does not support the claim, you know something went wrong. (The sketch after this list shows a cheap automated check for rules 2 and 3.)
4. Delimit the context. Separate context blocks with --- or triple backticks, and label each chunk with its source. This helps the LLM distinguish context from instructions and different documents from each other.
5. Separate roles. Put grounding instructions in the system message and the context plus question in the user message. The system message sets persistent behavior that the model follows more reliably than instructions mixed into the user message.
6. Constrain the format. "Answer in 2-3 sentences" or "Use bullet points" keeps responses focused. Long, rambling answers are harder to verify and more likely to contain unsupported claims buried in the text.