Claude Structured Output: JSON Guide (2026)

Force Claude to return valid JSON every time. Complete guide to structured output: system prompts, Pydantic models, TypeScript schemas, and real examples.

Claude does not have a native "JSON mode" toggle like some other APIs. What it has is better: flexible, instruction-driven structured output that works with any schema you define — simple dictionaries, Pydantic models, nested objects, or TypeScript interfaces. When configured correctly, you get valid JSON every time.

This guide covers the complete pattern: system prompt setup, forcing JSON, validation, error handling, and the library options that make it production-ready.

The Core Pattern: System Prompt + Instruction

The most reliable way to get structured JSON from Claude is to specify the exact format in your system prompt and repeat the instruction in the user message. Two-layer reinforcement dramatically reduces malformed output:

import anthropic
import json

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="""You are a data extraction assistant.
Always respond with valid JSON only. No explanation, no markdown fences, no extra text.
Return exactly the fields requested in the exact format specified.""",
    messages=[
        {
            "role": "user",
            "content": """Extract the contact information from this text and return JSON:

Text: 'Please reach out to Sarah Johnson at [email protected] or call 415-555-0192.'

Return this exact JSON structure:
{"name": "string", "email": "string or null", "phone": "string or null"}"""
        }
    ]
)

# Parse the response
data = json.loads(response.content[0].text)
print(data)  # {'name': 'Sarah Johnson', 'email': '[email protected]', 'phone': '415-555-0192'}

Two rules that prevent most failures:

Say "valid JSON only" in the system prompt — not just "return JSON"
Include "No explanation, no markdown fences, no extra text" — Claude sometimes wraps JSON in a code block unless told not to

Using Pydantic for Validated Output

For production systems, parse Claude output directly into Pydantic models. This gives you type validation, clear error messages, and IDE autocompletion:

from pydantic import BaseModel, EmailStr
from typing import Optional
import anthropic
import json

client = anthropic.Anthropic()

class ContactInfo(BaseModel):
    name: str
    email: Optional[EmailStr] = None
    phone: Optional[str] = None
    company: Optional[str] = None

def extract_contact(text: str) -> ContactInfo:
    schema = ContactInfo.model_json_schema()

    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=512,
        system="You extract contact information. Return valid JSON only matching the schema provided. No extra text.",
        messages=[
            {
                "role": "user",
                "content": f"Extract contact info from this text. Schema: {json.dumps(schema)}\n\nText: {text}"
            }
        ]
    )

    raw = response.content[0].text.strip()
    return ContactInfo.model_validate_json(raw)

result = extract_contact("Contact James Lee at [email protected], VP of Engineering at TechCo")
print(result.name)   # James Lee
print(result.email)  # [email protected]

Passing the Pydantic JSON schema directly to Claude removes ambiguity — Claude sees the exact field names, types, and constraints it must match.

TypeScript with Zod

The TypeScript equivalent uses Zod schemas for the same pattern:

import Anthropic from "@anthropic-ai/sdk";
import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";

const client = new Anthropic();

const ProductSchema = z.object({
  name: z.string(),
  price: z.number(),
  currency: z.string(),
  inStock: z.boolean(),
  category: z.string().optional(),
});

type Product = z.infer<typeof ProductSchema>;

async function extractProduct(text: string): Promise<Product> {
  const schema = zodToJsonSchema(ProductSchema);

  const response = await client.messages.create({
    model: "claude-sonnet-4-6",
    max_tokens: 512,
    system: "Extract product data. Return valid JSON only matching the schema. No extra text.",
    messages: [
      {
        role: "user",
        content: `Extract product info. Schema: ${JSON.stringify(schema)}\n\nText: ${text}`,
      },
    ],
  });

  const raw = response.content[0].text.trim();
  return ProductSchema.parse(JSON.parse(raw));
}

const product = await extractProduct(
  "The AeroPress Coffee Maker is $29.95 USD, in stock, category: Kitchen"
);
console.log(product.price); // 29.95

Handling Markdown Fences

Even with explicit instructions, Claude occasionally wraps JSON in markdown code fences (```json ... ```). Add a cleanup step before parsing:

import re

def clean_json_response(text: str) -> str:
    """Strip markdown fences from Claude JSON output."""
    text = text.strip()
    # Remove ```json ... ``` or ``` ... ``` wrappers
    match = re.search(r'```(?:json)?\s*([\s\S]*?)```', text)
    if match:
        return match.group(1).strip()
    return text

raw = response.content[0].text
clean = clean_json_response(raw)
data = json.loads(clean)

This one-liner handles the majority of wrapper cases and makes your parsing resilient regardless of whether Claude adds fencing.

Prefill the Assistant Turn

A powerful technique for forcing JSON output: start the assistant response with an opening brace. Claude continues from where you left off, almost always producing valid JSON:

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You extract data. Return JSON only.",
    messages=[
        {"role": "user", "content": f"Extract invoice data from: {invoice_text}"},
        {"role": "assistant", "content": "{"}  # <-- prefill the opening brace
    ]
)

# Claude's response continues from "{", so prepend it back
raw_json = "{" + response.content[0].text
data = json.loads(raw_json)

This is the most reliable technique for strict JSON output. When Claude sees it has already begun a JSON object, it completes the structure rather than adding explanation text. Note: strip markdown cleanup is not needed here since the prefill bypasses Claude adding fences.

Complex Nested Structures

Claude handles nested objects and arrays reliably when you provide a clear example:

system = """You analyze customer feedback. Return valid JSON only. No extra text.
Format:
{
  "sentiment": "positive" | "negative" | "neutral",
  "score": 1-10,
  "themes": ["string"],
  "action_items": [
    {"priority": "high" | "medium" | "low", "action": "string"}
  ]
}"""

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=system,
    messages=[{"role": "user", "content": f"Analyze: {feedback_text}"}]
)

result = json.loads(response.content[0].text)
print(result["sentiment"])      # "positive"
print(result["action_items"])   # [{"priority": "high", "action": "..."}, ...]

Batch Structured Extraction

For processing many documents, structure your prompt to extract from all documents in one call and return an array:

documents = [
    "Invoice #1042 from Acme Corp, $4,500 due 2026-07-15",
    "Invoice #1043 from Globex Inc, $2,200 due 2026-07-20",
    "Invoice #1044 from Initech LLC, $8,100 due 2026-07-10"
]

batch_text = "\n---\n".join(f"[{i+1}] {doc}" for i, doc in enumerate(documents))

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    system="Extract invoice data. Return a JSON array with one object per document. Fields: invoice_number, vendor, amount, currency, due_date. No extra text.",
    messages=[{"role": "user", "content": batch_text}]
)

invoices = json.loads(response.content[0].text)
for inv in invoices:
    print(f"{inv['invoice_number']}: ${inv['amount']} due {inv['due_date']}")

Error Handling and Retries

Even with best practices, JSON parsing will occasionally fail. Build retry logic that includes the error in the next attempt:

import json
from typing import Any

def structured_call(prompt: str, schema_description: str, max_retries: int = 2) -> Any:
    messages = [{"role": "user", "content": prompt}]

    for attempt in range(max_retries + 1):
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            system=f"Return valid JSON only. Schema: {schema_description}. No extra text.",
            messages=messages
        )

        raw = clean_json_response(response.content[0].text)

        try:
            return json.loads(raw)
        except json.JSONDecodeError as e:
            if attempt == max_retries:
                raise
            # Include the failed output and error in the retry
            messages.append({"role": "assistant", "content": raw})
            messages.append({
                "role": "user",
                "content": f"That response was not valid JSON. Error: {e}. Please return only valid JSON matching the schema."
            })

    raise RuntimeError("Max retries exceeded")

This retry pattern uses the failed response as context, which often resolves the issue in one additional call.

Temperature for Structured Output

Set temperature to 0 for structured extraction tasks. Higher temperatures increase the chance that Claude adds explanatory text, reformats the structure, or deviates from the schema. Deterministic sampling is your friend for any pipeline that parses Claude output programmatically.

When to Use Streaming with Structured Output

Streaming and JSON parsing conflict — you cannot parse a partial JSON object. For structured output, use non-streaming calls and parse the complete response. The exception: if you are streaming to a frontend that displays the JSON incrementally, stream the raw text and parse only after the stream completes.

The Bottom Line

Claude produces reliable structured output with the right setup: specify the schema explicitly in the system prompt, reinforce it in the user message, use the prefill technique for maximum reliability, and validate output with Pydantic or Zod. Add a cleanup step for markdown fences and retry logic for production resilience.

For reducing the token cost of repeated extraction pipelines, combine structured output with prompt caching — cache your schema instructions and system prompt, pay full price only once per 5-minute window.