Not every Claude request needs to happen in real time. If you are processing 10,000 support tickets overnight, scoring a backlog of resumes, or running nightly content audits, you do not need synchronous responses — you need throughput and a lower bill. That is exactly what the Message Batches API is for: submit up to 100,000 requests in one call, let Anthropic process them asynchronously, and pay half the price of standard API calls.
This guide covers how to create, monitor, and retrieve results from a Claude batch, the cost math that makes it worth using, and when batch processing is the wrong tool for the job.
What Is the Message Batches API?
The Batch API is an asynchronous endpoint that accepts a list of individual Messages API requests, processes them as a group, and returns results once the batch finishes — typically within a few hours, with a hard ceiling of 24 hours per batch. Each request in the batch is independent: different prompts, different models, even different system prompts can live in the same batch. You get a single batch ID to track, and results as a JSONL file you download and parse once processing completes.
The trade-off is latency for cost. You lose the ability to get an answer in seconds, but you cut the per-token price in half — a meaningful saving for any workload that does not require an immediate response.
Creating a Batch in Python
Each item in a batch is a Request object with a custom_id (yours to assign, used to match results back to inputs) and a params object identical to what you would pass to client.messages.create():
```python
import anthropic
from anthropic.types.message_create_params import MessageCreateParamsNonStreaming
from anthropic.types.messages.batch_create_params import Request

client = anthropic.Anthropic()

requests = [
    Request(
        custom_id="ticket-001",
        params=MessageCreateParamsNonStreaming(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            messages=[{"role": "user", "content": "Classify this support ticket: 'My order never arrived.'"}],
        ),
    ),
    Request(
        custom_id="ticket-002",
        params=MessageCreateParamsNonStreaming(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            messages=[{"role": "user", "content": "Classify this support ticket: 'How do I reset my password?'"}],
        ),
    ),
]

batch = client.messages.batches.create(requests=requests)
print(f"Batch ID: {batch.id}")
print(f"Status: {batch.processing_status}")
```

The custom_id is critical: results come back in whatever order Anthropic processes them, not the order you submitted them in. Your code matches each result to its original request using custom_id, so make it something you can map back to your source data (a database row ID, a ticket number, a document filename).
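Hardcoding requests works for two tickets; for a real backlog you would generate them from your source data. The sketch below is one way to do that, under a few assumptions: the SDK's Request and MessageCreateParamsNonStreaming types are TypedDicts, so plain dicts of the same shape should be accepted by client.messages.batches.create; build_batch_requests is a hypothetical helper name; and the custom_id format rule in the comment reflects our reading of the current docs, so verify it before relying on it. The point is to fail fast on duplicate or malformed IDs before you submit, since those IDs are your only link back to the source rows.

```python
import re

# Assumed custom_id rule (1-64 letters, digits, "_" or "-"); check the current docs.
CUSTOM_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")


def build_batch_requests(tickets: list[tuple[str, str]], model: str = "claude-sonnet-4-6") -> list[dict]:
    """Turn (ticket_id, ticket_text) pairs into batch request dicts."""
    requests = []
    seen: set[str] = set()
    for ticket_id, text in tickets:
        custom_id = f"ticket-{ticket_id}"  # maps each result back to its source ticket
        if not CUSTOM_ID_PATTERN.fullmatch(custom_id):
            raise ValueError(f"invalid custom_id: {custom_id!r}")
        if custom_id in seen:
            raise ValueError(f"duplicate custom_id: {custom_id!r}")
        seen.add(custom_id)
        requests.append({
            "custom_id": custom_id,
            "params": {
                "model": model,
                "max_tokens": 1024,
                "messages": [
                    {"role": "user", "content": f"Classify this support ticket: {text!r}"},
                ],
            },
        })
    return requests
```

Calling build_batch_requests([("001", "My order never arrived.")]) yields one request with custom_id "ticket-001", which you can pass to client.messages.batches.create(requests=...).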
Creating a Batch in TypeScript
```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

// Uses top-level await, so run this as an ES module (or inside an async function)
const batch = await client.messages.batches.create({
  requests: [
    {
      custom_id: "ticket-001",
      params: {
        model: "claude-sonnet-4-6",
        max_tokens: 1024,
        messages: [{ role: "user", content: "Classify this support ticket: 'My order never arrived.'" }],
      },
    },
    {
      custom_id: "ticket-002",
      params: {
        model: "claude-sonnet-4-6",
        max_tokens: 1024,
        messages: [{ role: "user", content: "Classify this support ticket: 'How do I reset my password?'" }],
      },
    },
  ],
});

console.log(`Batch ID: ${batch.id}`);
```

Checking Batch Status
Batches move through a small set of processing states: in_progress, canceling, and ended. Poll the batch by ID until processing_status reaches ended:
```python
import time

def wait_for_batch(batch_id: str, poll_seconds: int = 30):
    """Poll the batch until its processing_status reaches "ended"."""
    while True:
        batch = client.messages.batches.retrieve(batch_id)
        if batch.processing_status == "ended":
            return batch
        print(f"Status: {batch.processing_status}, checking again in {poll_seconds}s")
        time.sleep(poll_seconds)

finished_batch = wait_for_batch(batch.id)
print(finished_batch.request_counts)
```

request_counts breaks down how many requests in the batch succeeded, errored, were canceled, or expired. Check it before assuming every request came back clean.
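A fixed 30-second poll is fine for small batches, but a batch that runs for hours generates a lot of pointless retrieve calls. A variant with exponential backoff and a client-side deadline is sketched below. wait_with_backoff is a hypothetical helper; its retrieve argument is assumed to behave like client.messages.batches.retrieve (take a batch ID, return an object with processing_status), which also lets you test it with a fake. The delay values are illustrative, not recommendations.

```python
import time
from typing import Any, Callable


def wait_with_backoff(
    retrieve: Callable[[str], Any],
    batch_id: str,
    initial_delay: float = 30.0,
    max_delay: float = 600.0,
    timeout: float = 24 * 3600,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Any:
    """Poll retrieve(batch_id) until processing_status is "ended"."""
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        batch = retrieve(batch_id)
        if batch.processing_status == "ended":
            return batch
        # Give up rather than sleep past the client-side deadline
        if clock() + delay > deadline:
            raise TimeoutError(f"batch {batch_id} still {batch.processing_status}")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # double the wait, capped at max_delay
```

In real use you would call wait_with_backoff(client.messages.batches.retrieve, batch.id). The 24-hour default mirrors the processing ceiling: requests still unprocessed at that point expire and the batch ends anyway, so the deadline only guards against a loop that never exits.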
Retrieving Batch Results
Once a batch reaches ended, stream the results as JSONL — one JSON object per line, each tagged with the custom_id you assigned:
```python
for result in client.messages.batches.results(finished_batch.id):
    if result.result.type == "succeeded":
        message = result.result.message
        # Assumes the first content block is text, as it is for these prompts
        print(f"{result.custom_id}: {message.content[0].text}")
    elif result.result.type == "errored":
        print(f"{result.custom_id} failed: {result.result.error}")
    elif result.result.type == "canceled":
        print(f"{result.custom_id} was canceled before processing")
    elif result.result.type == "expired":
        print(f"{result.custom_id} expired before processing")
```

Results are available for 29 days after the batch is created, after which they can no longer be downloaded. Persist anything you need to keep before that window closes.
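The SDK's results() iterator decodes the JSONL for you. If you archive the raw file instead (for example, to keep it past the retention window) and parse it later, each line is a standalone JSON object. The sketch below assumes each line has the shape {"custom_id": ..., "result": {"type": ..., ...}}, mirroring the fields the loop above reads; parse_batch_results is a hypothetical helper that groups results by type and then by custom_id so they can be joined back to your source data.

```python
import json
from collections import defaultdict


def parse_batch_results(jsonl_text: str) -> dict[str, dict[str, dict]]:
    """Group batch result lines by result type, then by custom_id."""
    grouped: dict[str, dict[str, dict]] = defaultdict(dict)
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline
        record = json.loads(line)
        result = record["result"]
        grouped[result["type"]][record["custom_id"]] = result
    return dict(grouped)
```

parse_batch_results(text)["succeeded"] then gives every successful result keyed by the custom_id you assigned, and the "errored", "canceled", and "expired" keys (when present) give you the retry candidates.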
Batch Limits
| Limit | Value |
|---|---|
| Max requests per batch | 100,000 |
| Max batch size | 256 MB total request payload |
| Processing window | Usually a few hours, up to 24 hours |
| Results availability | 29 days after batch creation |
| Streaming within a batch request | Not supported — each request is non-streaming |
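If a workload exceeds either cap, split it across several batches. A minimal sketch follows; chunk_requests is a hypothetical helper that approximates each request's size as its UTF-8 encoded JSON length. That approximation, and reading "256 MB" conservatively as 256,000,000 bytes, are assumptions, so leave extra headroom for request-envelope overhead in practice.

```python
import json

MAX_REQUESTS = 100_000   # per-batch request cap from the table above
MAX_BYTES = 256_000_000  # "256 MB", read conservatively as 10**6-byte MB (assumption)


def chunk_requests(requests, max_requests=MAX_REQUESTS, max_bytes=MAX_BYTES):
    """Split a list of request dicts into chunks under both the count and size caps."""
    chunks, current, current_bytes = [], [], 0
    for req in requests:
        # Approximate each request's payload size by its UTF-8 encoded JSON length
        size = len(json.dumps(req).encode("utf-8"))
        if size > max_bytes:
            raise ValueError(f"request {req.get('custom_id')!r} alone exceeds max_bytes")
        # Start a new chunk if adding this request would break either cap
        if current and (len(current) >= max_requests or current_bytes + size > max_bytes):
            chunks.append(current)
            current, current_bytes = [], 0
        current.append(req)
        current_bytes += size
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then go to its own client.messages.batches.create(requests=chunk) call, with one batch ID tracked per chunk.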
Cost Savings
Batch requests cost 50% of standard synchronous API pricing, on both input and output tokens, for every model. There is no minimum batch size required to qualify — even a batch of 10 requests gets the discount. The math is simple: whatever a workload costs synchronously, it costs half that through the Batch API, with no change to output quality.
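To make the arithmetic concrete, the sketch below compares synchronous and batch cost for one workload. The prices in the usage example ($3 input and $15 output per million tokens) are placeholders chosen for round numbers, not a quote of any model's current price; substitute the figures from Anthropic's pricing page. batch_savings is a hypothetical helper.

```python
def batch_savings(
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
    batch_discount: float = 0.5,
) -> tuple[float, float]:
    """Return (synchronous_cost, batch_cost) in dollars for one workload."""
    sync_cost = (input_tokens * input_price_per_mtok
                 + output_tokens * output_price_per_mtok) / 1_000_000
    # The batch discount applies to both input and output tokens
    return sync_cost, sync_cost * (1 - batch_discount)
```

For 10,000 tickets averaging 500 input and 200 output tokens each (5M input and 2M output tokens), batch_savings(5_000_000, 2_000_000, 3.0, 15.0) returns (45.0, 22.5): $45 synchronously versus $22.50 batched.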
For any workload with more than a handful of requests and no real-time requirement, batching is close to a free cost cut. Combine it with prompt caching for compounding savings: a batch of requests that share a large static system prompt can get the caching discount on top of the batch discount. Because batch requests are processed concurrently, cache hits are best-effort, so expect some requests to miss the cache.
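A rough way to estimate the stacked saving for shared-prefix workloads is sketched below. It rests on assumptions worth checking against the current pricing docs: 5-minute cache writes bill at 1.25x the base input price, cache reads at 0.1x, and the batch discount applies multiplicatively on top. It also simplifies cache behavior: the first request always writes the cache, every miss pays the write rate, and hit_rate stands in for the best-effort hit rate. Only input cost is modeled, and stacked_input_cost is a hypothetical helper.

```python
def stacked_input_cost(
    n_requests: int,
    shared_tokens: int,
    unique_tokens: int,
    base_price_per_mtok: float,
    hit_rate: float = 1.0,
    batch_discount: float = 0.5,
    write_multiplier: float = 1.25,  # assumed 5-minute cache-write rate
    read_multiplier: float = 0.10,   # assumed cache-read rate
) -> float:
    """Estimate batch input cost (dollars) when requests share a cached prefix."""
    if n_requests <= 0:
        return 0.0
    # The first request has to write the cache; later ones hit it at hit_rate
    hits = (n_requests - 1) * hit_rate
    misses = n_requests - hits  # simplification: every miss pays the write rate
    shared_billed = shared_tokens * (misses * write_multiplier + hits * read_multiplier)
    unique_billed = unique_tokens * n_requests
    pre_discount = (shared_billed + unique_billed) * base_price_per_mtok / 1_000_000
    return pre_discount * (1 - batch_discount)
```

With 1,000 requests sharing a 10,000-token system prompt plus 500 unique tokens each, at a placeholder $3 per million input tokens, stacked_input_cost(1000, 10_000, 500, 3.0) comes to about $2.27, against $31.50 for the same input sent synchronously without caching. Lowering hit_rate shows how quickly that gap narrows when cache hits are sparse.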
Handling Per-Request Errors
A batch does not fail as a unit — individual requests inside it can fail while others succeed. Always check result.result.type for every item rather than assuming success:
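```python
succeeded, errored, canceled, expired = [], [], [], []
for result in client.messages.batches.results(finished_batch.id):
    if result.result.type == "succeeded":
        succeeded.append(result)
    elif result.result.type == "errored":
        errored.append(result)
    elif result.result.type == "canceled":
        canceled.append(result)
    elif result.result.type == "expired":
        expired.append(result)

print(f"{len(succeeded)} succeeded, {len(errored)} errored, "
      f"{len(canceled)} canceled, {len(expired)} expired")

# Retry errored and expired requests in a new batch.
# build_request is a hypothetical helper you supply: it rebuilds the original
# Request for a custom_id from your source data. Requests that errored because
# they were invalid will fail again until fixed, so inspect result.result.error
# before resubmitting.
if errored or expired:
    retry_requests = [build_request(r.custom_id) for r in errored + expired]
    retry_batch = client.messages.batches.create(requests=retry_requests)
```

Canceling a Batch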
If you submit a batch by mistake or need to stop processing early, cancel it by ID. Requests already completed are kept; requests still queued are dropped:
```python
client.messages.batches.cancel(batch.id)
```

The batch moves to canceling status immediately, then to ended once in-flight requests finish. Results for anything processed before cancellation are still retrievable; requests that never ran come back with the canceled result type.
Batch vs Streaming vs Synchronous: When to Use Each
| Use Case | Recommendation |
|---|---|
| User-facing chat | Synchronous or streaming — needs real-time response |
| Large one-off backlog (classification, scoring, extraction) | Batch — 50% cheaper, latency does not matter |
| Nightly / scheduled jobs | Batch — submit before you sleep, results ready by morning |
| Interactive agent loop | Synchronous — each step depends on the previous result |
| Bulk document processing (thousands of files) | Batch — avoids hitting rate limits from sequential sync calls |
The Bottom Line
The Batch API is the easiest cost optimization in the entire Claude API surface — half price, same model quality, zero code complexity beyond tracking a custom_id and polling for completion. Use it for anything that does not need a response in the next few seconds: classification jobs, content audits, backlog processing, nightly reports. Anything interactive stays synchronous or streamed.
For related cost and performance patterns, see our guides on prompt caching (stacks with batch discounts), streaming (the right choice for real-time UIs), and structured output (pairs well with batch classification and extraction jobs).