← Back to Blog

Claude Agent SDK: How to Build Autonomous AI Agents in Python (2026)

Build your first autonomous AI agent in Python with Anthropic's Agent SDK. Complete tutorial with real code: tool use, multi-step planning, error recovery, and production deployment. From zero to working agent in one guide.


Anthropic shipped the Claude Agent SDK in May 2025 and quietly changed the trajectory of AI agent development. Before this, building an autonomous AI agent meant stitching together LangChain, custom tool registries, retry logic, and prayer. Now you import one package, define your tools, and let Claude handle the planning, execution, and error recovery.

This is not a toy framework. It is the same architecture that powers Claude Code, Anthropic's own coding agent that ships production software. The SDK exposes that capability as a Python library you can build on directly.

We use the Agent SDK across our entire infrastructure at Like One — from autonomous content generation to grant research to codebase maintenance. This guide covers everything you need to build production agents, from first import to deployment.

What the Claude Agent SDK Actually Is

The Claude Agent SDK is a Python framework for building AI agents that can plan multi-step tasks, use tools, handle errors, and operate with minimal human oversight. It wraps the Claude API with an agentic loop — the model calls tools, observes results, decides what to do next, and repeats until the task is complete.

The key components:

  • Agent — The core class. Wraps a Claude model with a system prompt, tools, and configuration.
  • Tools — Python functions decorated as tools that the agent can call. File I/O, API calls, database queries, shell commands — anything you can write in Python.
  • Agentic loop — The execution engine. The agent receives a task, plans its approach, calls tools, observes results, and iterates until done. No manual orchestration required.
  • MCP integration — Native support for Model Context Protocol servers, so your agent can connect to any MCP-compatible tool without writing wrapper code.
  • Guardrails — Built-in safety controls for tool approval, output validation, and scope limiting.

The SDK is open source, Apache 2.0 licensed, and available on PyPI. Install it with pip install claude-agent-sdk.

Your First Agent in 20 Lines

Here is a minimal agent that can read files and answer questions about code:

from claude_agent_sdk import Agent, tool
import os

@tool
def read_file(path: str) -> str:
    """Read a file and return its contents."""
    with open(path, 'r') as f:
        return f.read()

@tool
def list_directory(path: str) -> list[str]:
    """List files in a directory."""
    return os.listdir(path)

agent = Agent(
    model="claude-sonnet-4-6",
    tools=[read_file, list_directory],
    system="You are a code analysis agent. Read files and answer questions about the codebase."
)

result = agent.run("What does the main function do in ./src/app.py?")
print(result)

That is it. The agent receives the task, decides it needs to read the file, calls read_file, analyzes the contents, and returns a coherent answer. If the file imports other modules, it might call list_directory and read_file again to understand dependencies. The agentic loop handles all of this automatically.

Compare this to the equivalent LangChain implementation, which requires an LLM wrapper, a tool registry, an agent executor, a prompt template, output parsers, and roughly 60 lines of boilerplate before you get the same result. The SDK eliminates the middleware.

Tools: The Agent's Capabilities

Tools are Python functions that the agent can call during execution. The @tool decorator registers them with type information and documentation that Claude uses to decide when and how to call them.

Good tool design follows three principles:

  1. Single responsibility. Each tool does one thing. A search_database tool searches. A write_file tool writes. Do not combine search and write into one tool — the agent is better at composing simple tools than parsing complex ones.
  2. Clear documentation. The docstring is the tool's interface contract. Claude reads it to understand what the tool does, what parameters it accepts, and what it returns. Vague docstrings produce vague tool usage.
  3. Explicit error handling. Return error messages as strings rather than raising exceptions. The agent can reason about error messages and retry with different parameters. An unhandled exception terminates the loop.

Here is a production-quality tool example:

@tool
def query_database(sql: str, database: str = "main.db") -> str:
    """Execute a read-only SQL query against the specified SQLite database.
    Returns results as a formatted table. Maximum 100 rows returned.
    Only SELECT queries are allowed — mutations will be rejected."""
    import sqlite3
    if not sql.strip().upper().startswith("SELECT"):
        return "Error: Only SELECT queries are allowed."
    try:
        conn = sqlite3.connect(database)
        cursor = conn.execute(sql)
        rows = cursor.fetchmany(100)
        columns = [d[0] for d in cursor.description]
        conn.close()
        if not rows:
            return "Query returned no results."
        header = " | ".join(columns)
        lines = [header, "-" * len(header)]
        for row in rows:
            lines.append(" | ".join(str(v) for v in row))
        return "\n".join(lines)
    except Exception as e:
        return f"Query error: {e}"

Notice the safety constraints baked into the tool itself. Read-only enforcement, row limits, and error-as-string returns. The agent does not need to know about these constraints — they are enforced at the tool level. This is defense in depth.

MCP Integration: Connect to Everything

The Agent SDK has native MCP support, which means your agent can connect to any MCP server without writing tool wrappers. This is the killer feature for production systems.

MCP servers expose tools, resources, and prompts through a standardized protocol. There are MCP servers for GitHub, Slack, databases, file systems, browsers, and hundreds of other integrations. Instead of writing a custom GitHub tool, you point your agent at the GitHub MCP server and it gets full API access automatically.

from claude_agent_sdk import Agent
from claude_agent_sdk.mcp import MCPServerStdio

agent = Agent(
    model="claude-sonnet-4-6",
    mcp_servers=[
        MCPServerStdio(command="npx", args=["-y", "@modelcontextprotocol/server-github"]),
        MCPServerStdio(command="npx", args=["-y", "@modelcontextprotocol/server-filesystem", "./src"]),
    ],
    system="You are a development agent with access to GitHub and the local filesystem."
)

result = agent.run("Review the latest PR and check if the changed files follow our coding conventions.")

The agent discovers the available MCP tools at startup, uses them as needed during execution, and handles the protocol details internally. You never write serialization code, manage connections, or parse responses. If you have built your own MCP server, it plugs in the same way.

This is where the ecosystem effects get powerful. Every MCP server ever built is instantly available to your agent. The security considerations still apply — audit any MCP server before connecting it to an autonomous agent — but the integration cost drops to near zero.

The Agentic Loop: How Agents Think

The agentic loop is the execution engine that makes autonomous behavior possible. Understanding how it works is essential for debugging and optimizing your agents.

Here is what happens when you call agent.run():

  1. Planning. Claude receives the task and its available tools. It generates a plan — either explicitly or implicitly through its first tool call.
  2. Tool execution. The agent calls a tool. The SDK executes the function and captures the result.
  3. Observation. The tool result is appended to the conversation. Claude reads it and decides what to do next.
  4. Iteration. Steps 2-3 repeat until Claude determines the task is complete and returns a final response.
  5. Termination. The loop ends when Claude responds without a tool call, indicating it has finished the task.

This is the same agentic loop pattern used across all modern AI agent frameworks. The difference is execution quality — Claude's reasoning about when to use which tool, how to recover from errors, and when to stop is significantly better than wrapper-based approaches that bolt reasoning onto weaker models.

You can configure loop behavior with parameters:

agent = Agent(
    model="claude-opus-4-6",
    tools=[...],
    max_turns=50,          # Maximum tool call iterations
    system="..."
)

For complex tasks, Claude Opus 4.6 is the right model choice. It handles longer reasoning chains, makes fewer tool call errors, and produces higher quality output on ambiguous tasks. For simpler, high-volume tasks, Claude Sonnet 4.6 offers the best speed-to-quality ratio. Claude Haiku 4.5 works for classification, routing, and lightweight operations where cost matters more than depth.

Multi-Agent Architectures

Single agents hit limits on complex workflows. The solution is multi-agent architectures where specialized agents coordinate to complete larger tasks. The SDK supports this through agent composition.

The two primary patterns:

Orchestrator-worker pattern. A planning agent breaks the task into subtasks and delegates to specialized worker agents. The orchestrator uses a powerful model (Opus) while workers use faster models (Sonnet or Haiku) for execution.

from claude_agent_sdk import Agent, tool

researcher = Agent(
    model="claude-sonnet-4-6",
    tools=[web_search, read_url],
    system="You are a research agent. Find and summarize information."
)

writer = Agent(
    model="claude-sonnet-4-6",
    tools=[read_file, write_file],
    system="You are a writing agent. Create content based on research provided."
)

@tool
def delegate_research(query: str) -> str:
    """Delegate a research task to the research agent."""
    return researcher.run(query)

@tool
def delegate_writing(brief: str) -> str:
    """Delegate a writing task to the writing agent."""
    return writer.run(brief)

orchestrator = Agent(
    model="claude-opus-4-6",
    tools=[delegate_research, delegate_writing],
    system="You are a content director. Break tasks into research and writing phases."
)

result = orchestrator.run("Write a blog post about accessibility testing trends in 2026.")

Pipeline pattern. Agents execute in sequence, each transforming the output of the previous one. Research agent feeds writer agent feeds editor agent feeds publisher agent. This is simpler to debug because each stage has clear inputs and outputs.

Multi-agent systems introduce coordination complexity. The orchestrator must pass enough context to workers without overwhelming their context windows. Workers must return structured enough output for the orchestrator to synthesize. Start with single agents and only add multi-agent coordination when you hit genuine capability limits.

Guardrails and Safety

Autonomous agents need constraints. An agent with unrestricted file system access, network capabilities, and shell execution is a security incident waiting to happen. The SDK provides multiple layers of control.

Tool-level safety. Build constraints into tools themselves. Read-only database access, allowlisted directories for file operations, rate limits on API calls. The agent never sees these constraints — they are enforced at execution time.

Human-in-the-loop. Configure tools that require approval before execution. Destructive operations (file deletion, database writes, external API calls) can require human confirmation while read operations proceed automatically.

System prompt boundaries. The system prompt defines what the agent should and should not do. Claude respects these boundaries more reliably than any other model — Anthropic's constitutional AI training makes instruction-following a core capability, not an afterthought.

Output validation. Post-process agent output to catch policy violations, PII leakage, or off-topic responses before they reach the user. This is your last line of defense and should be implemented for any user-facing agent.

The general principle: give the agent the minimum capabilities it needs to complete its task. An agent that reviews code does not need file write access. An agent that answers questions does not need shell execution. Capability restriction is not a limitation — it is architecture.

Persistent Memory for Agents

Agents without memory repeat the same discoveries every session. They rediscover your project structure, re-learn your preferences, and re-investigate solved problems. Adding persistent memory transforms a stateless tool into a system that compounds knowledge.

The simplest approach is a memory tool:

import json

MEMORY_FILE = "agent_memory.json"

def load_memory() -> dict:
    try:
        with open(MEMORY_FILE, 'r') as f:
            return json.load(f)
    except FileNotFoundError:
        return {}

@tool
def remember(key: str, value: str) -> str:
    """Store a fact or decision for future sessions."""
    memory = load_memory()
    memory[key] = {"value": value, "timestamp": str(datetime.now())}
    with open(MEMORY_FILE, 'w') as f:
        json.dump(memory, f, indent=2)
    return f"Stored: {key}"

@tool
def recall(query: str) -> str:
    """Retrieve stored facts matching the query."""
    memory = load_memory()
    matches = {k: v for k, v in memory.items() if query.lower() in k.lower()}
    if not matches:
        return "No matching memories found."
    return json.dumps(matches, indent=2)

For production systems, graduate to a hybrid memory architecture with vector search, structured storage, and temporal decay. The memory system we built at Like One stores over 900 entries with semantic search across 2,900+ vectors, graph-connected knowledge, and automatic staleness detection. Every session starts by loading relevant context from memory, and every discovery gets written back.

Production Deployment Patterns

Getting an agent working locally is step one. Deploying it reliably is where most teams stall. Here are the patterns that work.

Daemon agents. Long-running agents that execute on a schedule or in response to events. Use launchd (macOS), systemd (Linux), or container orchestration. Our content agent runs as a scheduled daemon, checking for content tasks and executing them without human involvement.

API-backed agents. Wrap the agent in a FastAPI or Flask endpoint. External systems send tasks via HTTP, the agent processes them, and returns results. Add a task queue (Redis, SQS) for async processing when tasks take longer than an HTTP timeout.

CLI agents. Package the agent as a command-line tool for developer workflows. Accept task descriptions as arguments, output results to stdout. This is the simplest deployment model and often the right one for internal tools.

Cost management. Agent loops consume tokens proportional to the number of iterations and tool results. A 10-turn agent conversation with Claude Sonnet 4.6 costs roughly $0.05-0.15. With Opus 4.6, expect $0.30-1.00 for complex tasks. Set max_turns to prevent runaway loops, and use Haiku for subtasks where quality requirements are lower.

Observability. Log every tool call, its arguments, and its results. When an agent produces wrong output, the tool call log is your debugging lifeline. The SDK provides hooks for intercepting the loop at each stage.

Agent SDK vs LangChain vs CrewAI

The agent framework landscape is crowded. Here is how the Claude Agent SDK compares to the major alternatives.

vs LangChain. LangChain is a general-purpose framework that supports multiple LLM providers. It is more flexible but significantly more complex. The abstraction layers (chains, agents, callbacks, memory modules) add cognitive overhead without proportional capability gains. If you are building exclusively with Claude, the Agent SDK is simpler, faster, and produces better results. If you need multi-provider support, LangChain still has a role.

vs CrewAI. CrewAI focuses on multi-agent orchestration with role-based agents. It is opinionated about agent architecture in ways that can be helpful for prototyping but limiting for production. The Claude Agent SDK gives you the primitives to build your own orchestration patterns without being locked into CrewAI's abstractions.

vs building from scratch. You can build an agentic loop with raw API calls in about 100 lines. The SDK saves you from reimplementing tool parsing, error recovery, MCP integration, and loop management. It is not a heavy framework — it is a well-tested implementation of patterns you would build yourself.

The honest answer: if you are building with Claude, use the Agent SDK. If you need multiple providers, evaluate LangChain or LiteLLM. If you need opinionated multi-agent orchestration for prototyping, try CrewAI. Do not use more than one.

Real-World Agent Architectures

Abstract patterns are useful. Concrete examples are better. Here are three agent architectures running in production.

Code review agent. Monitors a GitHub repository for new pull requests. When a PR is opened, the agent reads the diff, checks for security vulnerabilities, style violations, and logical errors, then posts a review comment. Uses the GitHub MCP server for repository access and a custom security_scan tool that runs static analysis. Runs as a GitHub Actions workflow triggered on PR events.

Content research agent. Receives a topic, searches the web for current information, reads and summarizes sources, cross-references claims against a fact database, and produces a research brief. Uses web search, URL reader, and fact-checking tools. The output feeds into a writing agent that produces the final content. Total pipeline cost: approximately $0.20 per article.

Customer support agent. Handles incoming support tickets by searching a knowledge base, querying order history, and drafting responses. Escalates to humans for refund requests over a threshold or complaints that require empathy beyond template responses. Uses MCP servers for the CRM and knowledge base. Resolves 60 percent of tickets without human intervention.

None of these architectures are revolutionary individually. The revolution is that they each took less than a day to build. The SDK handles the hard parts — tool orchestration, error recovery, context management — while you focus on the domain logic that makes the agent useful.

Common Mistakes and How to Avoid Them

After building dozens of agents on this SDK, these are the mistakes that cost the most time:

  • Too many tools. An agent with 30 tools spends more tokens deciding which tool to use than actually using them. Keep tool counts under 10 per agent. Split into multi-agent architectures if you need more capabilities.
  • Vague system prompts. "You are a helpful assistant" produces generic behavior. "You are a Python code reviewer. Check for security issues, type errors, and PEP 8 violations. Respond with specific line numbers and fixes" produces targeted behavior. Specificity in the system prompt is the highest-leverage optimization.
  • No exit conditions. Without max_turns, an agent stuck on an impossible task will loop until it hits API limits. Always set a maximum. 20-30 turns handles most tasks. 50 for complex multi-step workflows.
  • Ignoring cost. A single agent run can cost anywhere from $0.01 to $5.00 depending on model, turns, and context size. Monitor costs from day one. Use cheaper models for subtasks. Cache tool results when possible.
  • Testing in production. Build a test harness with known inputs and expected outputs. Run it before every deployment. Agent behavior is probabilistic — the same input can produce different tool call sequences. Test for outcomes, not for exact execution paths.

Getting Started Today

Here is the fastest path from zero to a working agent:

  1. Install the SDK. pip install claude-agent-sdk. Set your ANTHROPIC_API_KEY environment variable.
  2. Define two tools. Start with file reading and web search. These cover most research and analysis tasks.
  3. Write a specific system prompt. Tell the agent exactly what it is, what it should do, and what it should not do.
  4. Run it. Give it a real task from your workflow. Watch the tool calls. Observe where it succeeds and where it struggles.
  5. Iterate. Add tools for the capabilities it needs. Refine the system prompt based on observed behavior. Add guardrails for any destructive operations.

The Agent SDK is not the future of AI development. It is the present. Every week you spend building custom orchestration code is a week you could spend building the domain logic that makes your agent uniquely valuable. Start with the SDK. Build the agent. Ship it.

For the broader context on AI agent patterns, read our guide to agentic loops. For framework comparisons, see our agent framework analysis. And for connecting your agent to external tools via MCP, start with our MCP server tutorial.


Keep learning — for free

52 AI courses. 520+ lessons. No paywall for starters.

Need help building this?

We build MCP servers, Claude workflows, and AI agents for teams. Strategy calls start at $150/hr.