From Prototype to Production
Building an agent that works on your laptop is lesson 2. Shipping an agent that serves real users, handles failures gracefully, stays within budget, and runs 24/7 — that is this lesson. Production agents need infrastructure that prototypes do not: error recovery, cost tracking, graceful shutdowns, monitoring, and operational runbooks.
Error Handling Architecture
In production, everything fails. APIs time out, rate limits kick in, tools crash, and network connections drop. Your agent needs to handle every failure mode gracefully:
import { Claude, AgentError, BudgetExceededError } from "@anthropic-ai/claude-agent";

// Small helper: resolve after `ms` milliseconds
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function runAgentSafely(prompt: string, retriesLeft = 1) {
  const agent = new Claude({
    model: "claude-sonnet-4-6",
    tools: "defaults",
    maxBudgetUsd: 2.00,
    maxTurns: 25,
  });

  try {
    const result = await agent.query(prompt);
    return {
      success: true,
      text: result.text,
      cost: result.cost,
      toolCalls: result.toolCalls,
    };
  } catch (error: any) {
    if (error instanceof BudgetExceededError) {
      // Agent hit the cost limit: an expected outcome, not a crash
      return { success: false, reason: "budget_exceeded", spent: error.spent };
    }
    if (error.code === "rate_limit") {
      if (retriesLeft <= 0) {
        // Already retried: give up instead of looping forever
        return { success: false, reason: "rate_limited" };
      }
      // API rate limit: wait for the suggested delay, then retry once
      await sleep((error.retryAfter ?? 1) * 1000);
      return runAgentSafely(prompt, retriesLeft - 1);
    }
    // Unknown error: log and report
    console.error("Agent error:", error);
    return { success: false, reason: "unknown", error: error.message };
  }
}
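A single fixed-delay retry is often not enough under sustained rate limiting. The sketch below shows one common alternative, bounded retries with exponential backoff, written against plain promises so it does not depend on any SDK. The names `withRetry`, `RetryOptions`, `RetryableError`, and `backoffDelayMs` are hypothetical, and the `code` / `retryAfter` error shape is an assumption carried over from the example above rather than a documented contract.

```typescript
// Hypothetical error shape: a `code` string plus an optional `retryAfter` in seconds.
interface RetryableError extends Error {
  code?: string;
  retryAfter?: number;
}

interface RetryOptions {
  maxAttempts: number; // total attempts, including the first one
  baseDelayMs: number; // delay before the first retry
  maxDelayMs: number;  // upper bound on any single delay
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Delay before retry number `retry` (0-based): use the server hint when present,
// otherwise double the base delay each time. Both paths are capped at maxDelayMs.
function backoffDelayMs(retry: number, err: RetryableError, opts: RetryOptions): number {
  if (typeof err.retryAfter === "number") {
    return Math.min(err.retryAfter * 1000, opts.maxDelayMs);
  }
  return Math.min(opts.baseDelayMs * 2 ** retry, opts.maxDelayMs);
}

// Run `task`, retrying only errors that `isRetryable` accepts, up to maxAttempts.
async function withRetry<T>(
  task: () => Promise<T>,
  opts: RetryOptions,
  isRetryable: (err: RetryableError) => boolean = (err) => err.code === "rate_limit",
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (raw) {
      const err = raw as RetryableError;
      const isLastAttempt = attempt + 1 >= opts.maxAttempts;
      // Non-retryable errors and exhausted attempts propagate to the caller.
      if (isLastAttempt || !isRetryable(err)) throw err;
      await sleep(backoffDelayMs(attempt, err, opts));
    }
  }
}
```

Under these assumptions, a call such as `withRetry(() => agent.query(prompt), { maxAttempts: 3, baseDelayMs: 500, maxDelayMs: 8000 })` would wrap the agent call. In a real deployment you would likely add random jitter to the delay so that many clients do not retry in lockstep.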
Abort Controllers
Long-running agents need a way to be stopped — by users, by timeouts, or by your system. Abort controllers provide clean cancellation:
// `agent` is a Claude instance configured as in the previous example.
// Create a controller that aborts after 60 seconds
const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), 60_000);

try {
  const result = await agent.query(
    "Analyze the codebase and generate a report.",
    { signal: controller.signal } // pass the abort signal
  );
  console.log(result.text);
} catch (e: any) {
  if (e.name === "AbortError") {
    console.log("Agent timed out after 60 seconds.");
  } else {
    throw e; // do not swallow unrelated failures
  }
} finally {
  clearTimeout(timeout); // clear the timer on success and on failure
}
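Because an agent can be stopped by a user, a timeout, or the system, it helps to funnel all of these into one signal. The following is a minimal sketch using only the standard `AbortController` and `AbortSignal` APIs. `runWithDeadline` and `cancellableSleep` are hypothetical helpers, and `cancellableSleep` merely stands in for a long-running call that honors an abort signal.

```typescript
// Run `task` with a signal that aborts when the deadline passes
// or when the optional `parent` signal (user cancel, shutdown) aborts.
async function runWithDeadline<T>(
  task: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
  parent?: AbortSignal,
): Promise<T> {
  const controller = new AbortController();
  const onParentAbort = () => controller.abort(parent?.reason);

  if (parent?.aborted) {
    controller.abort(parent.reason); // already cancelled before we started
  } else {
    parent?.addEventListener("abort", onParentAbort, { once: true });
  }

  const timer = setTimeout(
    () => controller.abort(new Error(`timed out after ${timeoutMs} ms`)),
    timeoutMs,
  );

  try {
    return await task(controller.signal);
  } finally {
    // Always release the timer and the listener, whatever the outcome.
    clearTimeout(timer);
    parent?.removeEventListener("abort", onParentAbort);
  }
}

// Stand-in for a long agent call: resolves after `ms`,
// or rejects early with the abort reason if the signal fires first.
function cancellableSleep(ms: number, signal: AbortSignal): Promise<string> {
  return new Promise((resolve, reject) => {
    if (signal.aborted) {
      reject(signal.reason);
      return;
    }
    const onAbort = () => {
      clearTimeout(timer);
      reject(signal.reason);
    };
    const timer = setTimeout(() => {
      signal.removeEventListener("abort", onAbort);
      resolve("done");
    }, ms);
    signal.addEventListener("abort", onAbort, { once: true });
  });
}
```

One way to use this for graceful shutdown in Node.js is to keep a process-wide `AbortController`, call its `abort()` from a `SIGTERM` handler, and pass its signal as `parent` to every in-flight run. Each run then stops at its next cancellation point instead of being killed mid-operation.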