Production Deployment

Architecture patterns, error handling, cost tracking, and monitoring for real agent products

From Prototype to Production

Building an agent that works on your laptop was the subject of lesson 2. Shipping an agent that serves real users, handles failures gracefully, stays within budget, and runs 24/7 is the subject of this lesson. Production agents need infrastructure that prototypes do not: error recovery, cost tracking, graceful shutdowns, monitoring, and operational runbooks.

Real-world analogy: A prototype is a food truck. Production is a restaurant. Both serve food, but the restaurant needs a kitchen that can handle the dinner rush, health inspections, supply chain management, and a plan for when the oven breaks at 7 PM on a Friday.

Error Handling Architecture

In production, everything eventually fails: APIs time out, rate limits kick in, tools crash, and network connections drop. Your agent needs to recognize each failure mode and respond deliberately, whether that means retrying, stopping cleanly, or reporting the error:

TypeScript — production error handling

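import { Claude, BudgetExceededError } from "@anthropic-ai/claude-agent";

// Plain promise-based delay helper (not part of the SDK)
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

// Loose result shape: a success flag plus whatever details apply
interface AgentOutcome {
  success: boolean;
  [key: string]: unknown;
}

async function runAgentSafely(prompt: string, attempt = 1): Promise<AgentOutcome> {
  const agent = new Claude({
    model: "claude-sonnet-4-6",
    tools: "defaults",
    maxBudgetUsd: 2.00,  // hard spending cap for this run
    maxTurns: 25,        // cap on agent turns
  });

  try {
    const result = await agent.query(prompt);
    return {
      success: true,
      text: result.text,
      cost: result.cost,
      toolCalls: result.toolCalls,
    };
  } catch (error: any) {
    if (error instanceof BudgetExceededError) {
      // Agent hit the cost limit: an expected outcome, not a crash
      return { success: false, reason: "budget_exceeded", spent: error.spent };
    }
    if (error?.code === "rate_limit") {
      if (attempt > 1) {
        // Already retried once: give up instead of looping forever
        return { success: false, reason: "rate_limit" };
      }
      // Wait for the suggested delay (5 s fallback is an assumption), then retry once
      await sleep((error.retryAfter ?? 5) * 1000);
      return runAgentSafely(prompt, attempt + 1);
    }
    // Unknown error: log and report
    console.error("Agent error:", error);
    return { success: false, reason: "unknown", error: error?.message ?? String(error) };
  }
}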
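The example above retries a rate-limited call after the delay the error suggests. A common generalization is exponential backoff with jitter: retry only transient failures, double the wait ceiling after each attempt, and pick a random wait below that ceiling so many clients do not retry in lockstep. The sketch below is a minimal, SDK-independent version. The names withRetry and isTransient are hypothetical, and the assumption that errors carry a numeric status property (plus the particular set of retryable codes) should be adapted to whatever error shape your SDK actually throws.

```typescript
// Assumption: errors expose an HTTP-style numeric `status`.
// 429 = rate limited, 408/5xx = transient server or network trouble.
const RETRYABLE_STATUS = new Set([408, 429, 500, 502, 503, 504]);

interface RetryOptions {
  maxAttempts: number; // total tries, including the first one
  baseDelayMs: number; // ceiling for the wait before the first retry
  maxDelayMs: number;  // upper bound for any single wait
}

// Hypothetical classifier: true only for errors worth retrying
function isTransient(error: unknown): boolean {
  const status = (error as { status?: number } | null)?.status;
  return typeof status === "number" && RETRYABLE_STATUS.has(status);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(
  fn: () => Promise<T>,
  { maxAttempts, baseDelayMs, maxDelayMs }: RetryOptions,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      // Permanent errors and exhausted attempt budgets propagate to the caller
      if (!isTransient(error) || attempt >= maxAttempts) throw error;
      // Exponential ceiling (base, 2x base, 4x base, ...) capped at maxDelayMs,
      // with "full jitter": wait a random time in [0, ceiling)
      const ceiling = Math.min(maxDelayMs, baseDelayMs * 2 ** (attempt - 1));
      await sleep(Math.random() * ceiling);
    }
  }
}
```

Wrapping an agent call would then look something like withRetry(() => agent.query(prompt), { maxAttempts: 4, baseDelayMs: 500, maxDelayMs: 8000 }), once isTransient recognizes the SDK's rate-limit error (for example a code of "rate_limit"). If the server supplies a retry-after hint, preferring it over the computed delay is usually the better choice.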

Abort Controllers

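Long-running agents need a way to be stopped, whether by users, by timeouts, or by your own system. The standard AbortController API provides clean cancellation: you pass its signal into the call and abort the controller when the run should stop: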

TypeScript — abort controller with timeout

// Assumes `agent` was created as in the previous example.
// Create a controller that aborts after 60 seconds
const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), 60_000);

try {
  const result = await agent.query(
    "Analyze the codebase and generate a report.",
    { signal: controller.signal }  // pass the abort signal
  );
  console.log(result.text);
} catch (e: any) {
  if (e?.name === "AbortError") {
    console.log("Agent timed out after 60 seconds.");
  } else {
    throw e;  // not a cancellation: let the caller handle it
  }
} finally {
  clearTimeout(timeout);  // always clear the timer, on success or failure
}
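The same mechanism covers the "stopped by your system" case. When an orchestrator sends SIGTERM during a deploy, the process can abort every run still in flight instead of being killed mid-task. The sketch below is a minimal illustration, assuming Node.js 17.3+ (for signal.throwIfAborted()) and an agent that checks its signal between steps. The names fakeAgentRun, runTracked, inFlight, and shutdown are hypothetical, and fakeAgentRun merely stands in for a real agent call.

```typescript
// Hypothetical stand-in for an agent run: performs `steps` units of work
// and checks the signal between steps, as a cancellable agent loop would.
async function fakeAgentRun(steps: number, signal: AbortSignal): Promise<string> {
  for (let i = 0; i < steps; i++) {
    signal.throwIfAborted(); // throws signal.reason once aborted
    await new Promise((resolve) => setTimeout(resolve, 10));
  }
  return `done after ${steps} steps`;
}

// Registry of in-flight runs so the process can stop them all at once
const inFlight = new Set<AbortController>();

async function runTracked(steps: number, timeoutMs: number): Promise<string> {
  const controller = new AbortController();
  // Per-run timeout: aborts only this run, with a descriptive reason
  const timer = setTimeout(() => controller.abort(new Error("timeout")), timeoutMs);
  inFlight.add(controller);
  try {
    return await fakeAgentRun(steps, controller.signal);
  } finally {
    clearTimeout(timer);          // no stray timer after the run settles
    inFlight.delete(controller);  // run is no longer in flight
  }
}

// Abort every in-flight run and report how many were signalled
function shutdown(): number {
  const count = inFlight.size;
  for (const controller of inFlight) controller.abort(new Error("shutdown"));
  return count;
}

// Abort in-flight runs on SIGTERM; the process can exit once they settle
process.once("SIGTERM", () => {
  console.log(`SIGTERM: aborting ${shutdown()} agent run(s)`);
});
```

Cancellation here is cooperative: an abort takes effect at the run's next signal check, not instantly. In a real service you would typically also stop accepting new requests before calling shutdown, so no fresh runs start while the old ones wind down.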
