What is Claude's Batch API?

The Message Batches API is an asynchronous endpoint that accepts up to 100,000 individual Claude requests in one submission, processes them as a group (usually within a few hours, up to 24 hours), and returns results as a JSONL file at half the cost of standard synchronous API calls.

How much cheaper is Claude's Batch API than regular requests?

Batch requests cost 50% of standard synchronous pricing on both input and output tokens, for every Claude model. There is no minimum batch size to qualify for the discount.

How long does a Claude batch take to process?

Most batches complete within a few hours, with a hard ceiling of 24 hours. Poll the batch by ID and check processing_status until it reaches 'ended' rather than assuming a fixed completion time.

How do I check if a Claude batch is finished?

Call client.messages.batches.retrieve(batch_id) and check the processing_status field. It moves through in_progress, canceling, and ended. Poll on an interval (e.g. every 30 seconds) until processing_status equals 'ended', then retrieve results.

What happens if one request in my Claude batch fails?

A batch does not fail as a whole — each request succeeds or fails independently. Check result.result.type for every item in the results: 'succeeded', 'errored', 'canceled', or 'expired'. Use batch.request_counts to see the breakdown before processing individual results.

How long are Claude batch results available?

Batch results are available for 29 days after the batch is created, not after it completes. Once that window closes, the batch itself is still viewable but its results can no longer be downloaded. Download and persist any results you need to keep before the 29-day expiration.

Can I cancel a Claude batch after submitting it?

Yes. Call client.messages.batches.cancel(batch_id). The batch moves to 'canceling' status, then to 'ended' once any in-flight requests finish. Requests already completed before cancellation remain in the results; requests that had not started are marked 'canceled' in the results and are not billed.

Does Claude's Batch API support streaming responses?

No. Every request inside a batch is non-streaming — you get the complete response for each request in the results file once the batch ends, not incremental chunks. For real-time streaming, use the standard synchronous Messages API with the stream() method instead.

When should I use Claude's Batch API instead of synchronous requests?

Use batch processing for large volumes of independent requests with no real-time requirement: classification jobs, content audits, backlog scoring, nightly reports, or bulk document extraction. Use synchronous or streaming requests for anything interactive, like user-facing chat or multi-step agent loops where each step depends on the previous result.

Claude Batch API Guide: 50% Cost Savings (2026)

What is custom_id used for in the Batch API?

custom_id is a value you assign to each request in a batch so you can match results back to their original inputs. Results are returned in whatever order Anthropic processes them, not the submission order, so custom_id (a database row ID, ticket number, or filename) is the only reliable way to reconnect a result to its source.

How Claude's Batch API works: 50% cost savings, async processing, polling for status, JSONL results, and Python/TypeScript code examples.

Not every Claude request needs to happen in real time. If you are processing 10,000 support tickets overnight, scoring a backlog of resumes, or running nightly content audits, you do not need synchronous responses — you need throughput and a lower bill. That is exactly what the Message Batches API is for: submit up to 100,000 requests in one call, let Anthropic process them asynchronously, and pay half the price of standard API calls.

This guide covers how to create, monitor, and retrieve results from a Claude batch, the cost math that makes it worth using, and when batch processing is the wrong tool for the job.

What Is the Message Batches API?

The Batch API is an asynchronous endpoint that accepts a list of individual Messages API requests, processes them as a group, and returns results once the batch finishes — typically within a few hours, with a hard ceiling of 24 hours per batch. Each request in the batch is independent: different prompts, different models, even different system prompts can live in the same batch. You get a single batch ID to track, and results as a JSONL file you download and parse once processing completes.

The trade-off is latency for cost. You lose the ability to get an answer in seconds, but you cut the per-token price in half — a meaningful saving for any workload that does not require an immediate response.

Creating a Batch in Python

Each item in a batch is a Request object with a custom_id (yours to assign, used to match results back to inputs) and a params object identical to what you would pass to client.messages.create():

import anthropic
from anthropic.types.message_create_params import MessageCreateParamsNonStreaming
from anthropic.types.messages.batch_create_params import Request

client = anthropic.Anthropic()

requests = [
    Request(
        custom_id="ticket-001",
        params=MessageCreateParamsNonStreaming(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            messages=[{"role": "user", "content": "Classify this support ticket: 'My order never arrived.'"}]
        )
    ),
    Request(
        custom_id="ticket-002",
        params=MessageCreateParamsNonStreaming(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            messages=[{"role": "user", "content": "Classify this support ticket: 'How do I reset my password?'"}]
        )
    )
]

batch = client.messages.batches.create(requests=requests)
print(f"Batch ID: {batch.id}")
print(f"Status: {batch.processing_status}")

The custom_id is critical — results come back in whatever order Anthropic processes them, not the order you submitted them in. Your code matches each result to its original request using custom_id, so make it something you can map back to your source data (a database row ID, a ticket number, a document filename).
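One way to keep custom_id tied to your source data is to derive it directly from the record's key when you build the batch. The sketch below assumes a hypothetical tickets dict keyed by ticket ID and builds plain-dict requests, which the SDK accepts in place of the typed Request objects. The helper name build_requests and the sample data are illustrative, not part of the SDK.

```python
def build_requests(tickets: dict[str, str], model: str = "claude-sonnet-4-6") -> list[dict]:
    """Build one batch request per ticket, using the ticket ID as custom_id."""
    requests = []
    for ticket_id, text in tickets.items():
        requests.append({
            "custom_id": ticket_id,  # dict keys are unique, so custom_ids are too
            "params": {
                "model": model,
                "max_tokens": 1024,
                "messages": [
                    {"role": "user", "content": f"Classify this support ticket: {text!r}"}
                ],
            },
        })
    return requests


# Hypothetical source data: ticket ID -> ticket text
tickets = {
    "ticket-001": "My order never arrived.",
    "ticket-002": "How do I reset my password?",
}
requests = build_requests(tickets)
```

When results come back, tickets[result.custom_id] recovers the original text regardless of the order in which the results arrive.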

Creating a Batch in TypeScript

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const batch = await client.messages.batches.create({
  requests: [
    {
      custom_id: "ticket-001",
      params: {
        model: "claude-sonnet-4-6",
        max_tokens: 1024,
        messages: [{ role: "user", content: "Classify this support ticket: 'My order never arrived.'" }]
      }
    },
    {
      custom_id: "ticket-002",
      params: {
        model: "claude-sonnet-4-6",
        max_tokens: 1024,
        messages: [{ role: "user", content: "Classify this support ticket: 'How do I reset my password?'" }]
      }
    }
  ]
});

console.log(`Batch ID: ${batch.id}`);

Checking Batch Status

Batches move through a small set of processing states: in_progress, canceling, and ended. Poll the batch by ID until processing_status reaches ended:

import time

def wait_for_batch(batch_id: str, poll_seconds: int = 30):
    while True:
        batch = client.messages.batches.retrieve(batch_id)
        if batch.processing_status == "ended":
            return batch
        print(f"Status: {batch.processing_status} — checking again in {poll_seconds}s")
        time.sleep(poll_seconds)

finished_batch = wait_for_batch(batch.id)
print(finished_batch.request_counts)

request_counts breaks down how many requests in the batch are still processing and how many succeeded, errored, were canceled, or expired. Check it before assuming every request came back clean.
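For long-running batches, a fixed 30-second loop with no exit condition can be wasteful or can hang a job indefinitely. The following is a minimal sketch of polling with exponential backoff and an overall timeout. The function name, the backoff cap, and the 25-hour default timeout (chosen to sit just above the 24-hour ceiling) are assumptions for illustration. The retrieve callable is injected so the loop can be exercised without network access.

```python
import time
from typing import Any, Callable


def wait_until_ended(
    retrieve: Callable[[str], Any],
    batch_id: str,
    poll_seconds: float = 30.0,
    max_poll_seconds: float = 300.0,
    timeout_seconds: float = 25 * 60 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll until processing_status is 'ended', backing off between checks."""
    waited = 0.0
    delay = poll_seconds
    while True:
        batch = retrieve(batch_id)
        if batch.processing_status == "ended":
            return batch
        if waited >= timeout_seconds:
            raise TimeoutError(
                f"Batch {batch_id} still {batch.processing_status} after {waited:.0f}s"
            )
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_poll_seconds)  # double the wait, up to a cap
```

With the SDK client from the earlier example, a call would look like wait_until_ended(client.messages.batches.retrieve, batch.id).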

Retrieving Batch Results

Once a batch reaches ended, stream the results as JSONL — one JSON object per line, each tagged with the custom_id you assigned:

# Results stream one at a time; order is not guaranteed, so key on custom_id
for result in client.messages.batches.results(finished_batch.id):
    if result.result.type == "succeeded":
        message = result.result.message
        print(f"{result.custom_id}: {message.content[0].text}")
    elif result.result.type == "errored":
        print(f"{result.custom_id} failed: {result.result.error}")
    elif result.result.type == "canceled":
        print(f"{result.custom_id} was canceled before processing")
    elif result.result.type == "expired":
        print(f"{result.custom_id} expired before processing")

Results are available for 29 days after the batch is created (not after it completes); after that they can no longer be downloaded. Download and persist anything you need to keep before that window closes.
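Since results eventually become unavailable, it can help to write them to your own storage as soon as the batch ends. The sketch below writes each result as one JSON line to a local file. It assumes the SDK's result objects expose a to_dict() method; if your SDK version differs, pass a different to_dict callable. The function name persist_results and the file path are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Callable, Iterable


def persist_results(
    results: Iterable[Any],
    path: str,
    to_dict: Callable[[Any], dict] = lambda r: r.to_dict(),  # assumed SDK helper
) -> int:
    """Write each result as one JSON line; return how many were written."""
    count = 0
    with Path(path).open("w", encoding="utf-8") as f:
        for result in results:
            # default=str is a fallback for any value json cannot encode directly
            line = json.dumps(to_dict(result), ensure_ascii=False, default=str)
            f.write(line + "\n")
            count += 1
    return count
```

A call against a finished batch might look like persist_results(client.messages.batches.results(finished_batch.id), "results.jsonl").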

Batch Limits

Limit	Value
Max requests per batch	100,000
Max batch size	256 MB total request payload
Processing window	Usually a few hours, up to 24 hours
Results availability	29 days after batch creation
Streaming within a batch request	Not supported — each request is non-streaming
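If a job exceeds either the request-count or the payload-size limit, it has to be split across several batches. Here is a rough sketch of a splitter that respects both limits. It estimates payload size from the serialized JSON of each request, which may not match exactly how the service measures size, and it assumes 256 MB means binary megabytes. Both are assumptions, so leave headroom in practice. The name chunk_requests is illustrative.

```python
import json
from typing import Iterator

MAX_REQUESTS = 100_000
MAX_BYTES = 256 * 1024 * 1024  # assumed: 256 MB as binary megabytes


def chunk_requests(
    requests: list[dict],
    max_requests: int = MAX_REQUESTS,
    max_bytes: int = MAX_BYTES,
) -> Iterator[list[dict]]:
    """Yield consecutive groups of requests that stay under both limits."""
    chunk: list[dict] = []
    chunk_bytes = 0
    for request in requests:
        size = len(json.dumps(request).encode("utf-8"))  # estimated payload size
        if size > max_bytes:
            raise ValueError(
                f"Request {request.get('custom_id')!r} alone exceeds {max_bytes} bytes"
            )
        # Start a new chunk if adding this request would break either limit
        if chunk and (len(chunk) >= max_requests or chunk_bytes + size > max_bytes):
            yield chunk
            chunk, chunk_bytes = [], 0
        chunk.append(request)
        chunk_bytes += size
    if chunk:
        yield chunk
```

Each yielded chunk can then be submitted as its own batch, and custom_id keeps results traceable across all of them.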

Cost Savings

Batch requests cost 50% of standard synchronous API pricing, on both input and output tokens, for every model. There is no minimum batch size required to qualify — even a batch of 10 requests gets the discount. The math is simple: whatever a workload costs synchronously, it costs half that through the Batch API, with no change to output quality.
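To make the arithmetic concrete, here is a small worked example. The per-million-token prices and the workload size are placeholders chosen for round numbers, not actual published pricing, so substitute the current rates for your model. The function name batch_savings is hypothetical.

```python
def batch_savings(
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
    batch_discount: float = 0.5,
) -> dict[str, float]:
    """Compare synchronous cost with batch cost for the same token volume."""
    sync_cost = (
        input_tokens / 1_000_000 * input_price_per_mtok
        + output_tokens / 1_000_000 * output_price_per_mtok
    )
    batch_cost = sync_cost * (1 - batch_discount)
    return {"sync": sync_cost, "batch": batch_cost, "saved": sync_cost - batch_cost}


# Hypothetical workload: 10,000 tickets at roughly 500 input and 100 output
# tokens each, with placeholder prices of $3 and $15 per million tokens.
costs = batch_savings(10_000 * 500, 10_000 * 100, 3.00, 15.00)
print(costs)  # {'sync': 30.0, 'batch': 15.0, 'saved': 15.0}
```

Under these assumed numbers the job costs $30 synchronously and $15 as a batch; the ratio holds at any scale because the discount applies to both input and output tokens.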

For any workload with more than a handful of requests and no real-time requirement, batching is close to a free cost cut. Combine it with prompt caching for compounding savings: requests in a batch that share a large static system prompt can get the caching discount on top of the batch discount. Because batch requests are processed concurrently and in no fixed order, cache hits inside a batch are best-effort rather than guaranteed.
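As an illustration of combining the two discounts, the sketch below builds batch requests that share one system prompt marked with cache_control. The prompt text and the helper name cached_request are hypothetical. Whether a given request actually hits the cache depends on processing order and on the prompt meeting the minimum cacheable length, so treat any caching savings as best-effort.

```python
# Hypothetical shared prompt; in practice this would be a long, static block
SHARED_SYSTEM_PROMPT = (
    "You are a support-ticket classifier. "
    "Label each ticket as one of: shipping, account, billing, other."
)


def cached_request(custom_id: str, ticket_text: str) -> dict:
    """Build a batch request whose system prompt is marked cacheable."""
    return {
        "custom_id": custom_id,
        "params": {
            "model": "claude-sonnet-4-6",
            "max_tokens": 1024,
            # Identical system blocks across requests are what make cache hits possible
            "system": [
                {
                    "type": "text",
                    "text": SHARED_SYSTEM_PROMPT,
                    "cache_control": {"type": "ephemeral"},
                }
            ],
            "messages": [{"role": "user", "content": ticket_text}],
        },
    }


requests = [
    cached_request("ticket-001", "My order never arrived."),
    cached_request("ticket-002", "How do I reset my password?"),
]
```

Only the per-ticket user message varies between requests, so the shared prefix is the part eligible for caching.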

Handling Per-Request Errors

A batch does not fail as a unit — individual requests inside it can fail while others succeed. Always check result.result.type for every item rather than assuming success:

# Index the original requests by custom_id so failed ones can be resubmitted
# (assumes the `requests` list built when the batch was created)
requests_by_id = {r["custom_id"]: r for r in requests}

succeeded, errored, expired = [], [], []

for result in client.messages.batches.results(finished_batch.id):
    if result.result.type == "succeeded":
        succeeded.append(result)
    elif result.result.type == "errored":
        errored.append(result)
    elif result.result.type == "expired":
        expired.append(result)

print(f"{len(succeeded)} succeeded, {len(errored)} errored, {len(expired)} expired")

# Retry errored and expired requests in a new batch.
# Requests that errored because they were invalid will fail again unless fixed first.
if errored or expired:
    retry_requests = [requests_by_id[r.custom_id] for r in errored + expired]
    retry_batch = client.messages.batches.create(requests=retry_requests)

Canceling a Batch

If you submit a batch by mistake or need to stop processing early, cancel it by ID. Requests already completed are kept; requests that had not started are marked canceled in the results and are not billed:

client.messages.batches.cancel(batch.id)

The batch moves to canceling status immediately, then ended once in-flight requests finish. Results for anything already processed before cancellation are still retrievable.

Batch vs Streaming vs Synchronous: When to Use Each

Use Case	Recommendation
User-facing chat	Synchronous or streaming — needs real-time response
Large one-off backlog (classification, scoring, extraction)	Batch — 50% cheaper, latency does not matter
Nightly / scheduled jobs	Batch — submit before you sleep, results ready by morning
Interactive agent loop	Synchronous — each step depends on the previous result
Bulk document processing (thousands of files)	Batch — avoids hitting rate limits from sequential sync calls

The Bottom Line

The Batch API is the easiest cost optimization in the entire Claude API surface — half price, same model quality, zero code complexity beyond tracking a custom_id and polling for completion. Use it for anything that does not need a response in the next few seconds: classification jobs, content audits, backlog processing, nightly reports. Anything interactive stays synchronous or streamed.

For related cost and performance patterns, see our guides on prompt caching (stacks with batch discounts), streaming (the right choice for real-time UIs), and structured output (pairs well with batch classification and extraction jobs).