Batch API Cost Calculator

Compare standard API pricing with Batch API pricing and see your savings.

OpenAI's Batch API processes requests asynchronously within a 24-hour window in exchange for a 50% discount on both input and output tokens — the same pattern is now offered by Anthropic, Google, and others. If your workload tolerates latency, batching can roughly halve your API bill. Use this calculator to quantify the savings on your real request volume.

Example inputs: 100,000 requests, with 1,000 input tokens and 500 output tokens per request.

Standard API cost: $2,000.00 ($5.00/M input + $30.00/M output)
Batch API cost: $1,000.00 ($2.50/M input + $15.00/M output)
You save with Batch API: $1,000.00 (50.0%)

On 100,000 requests (100,000,000 input + 50,000,000 output tokens) using GPT-5.5.
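
If you'd rather sanity-check the arithmetic yourself, here is a minimal sketch of the same calculation in Python. The function name and rate constants are ours for illustration; the rates are the GPT-5.5 figures shown above, so swap in your own model's pricing.

```python
STANDARD_RATES = {"input": 5.00, "output": 30.00}  # $ per million tokens

def batch_savings(requests, input_tokens, output_tokens, rates, discount=0.5):
    """Return (standard_cost, batch_cost, savings) in dollars."""
    million_in = requests * input_tokens / 1_000_000
    million_out = requests * output_tokens / 1_000_000
    standard = million_in * rates["input"] + million_out * rates["output"]
    batch = standard * (1 - discount)  # batch rates are 50% of standard
    return standard, batch, standard - batch

standard, batch, saved = batch_savings(100_000, 1_000, 500, STANDARD_RATES)
print(f"standard ${standard:,.2f}  batch ${batch:,.2f}  saved ${saved:,.2f}")
# standard $2,000.00  batch $1,000.00  saved $1,000.00
```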

What batch API pricing is

The Batch API on OpenAI, Anthropic, and Google takes your request file and returns results within 24 hours. In exchange for that latency, you pay 50% of standard input and output rates for most models. It is not a separate model — the underlying inference is identical. The API is ideal for offline jobs: nightly summarisation, large-scale evaluation, embedding generation, content moderation backfills, structured data extraction across an archive, and similar.
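
The submission flow is similar across providers: upload a JSONL file of requests, then create a batch job against it. A sketch of the typical flow using OpenAI's Python SDK (the file name, custom_id, model, and prompt are placeholders; check the provider docs for current details):

```python
from openai import OpenAI

client = OpenAI()

# requests.jsonl contains one request per line, e.g.
# {"custom_id": "doc-1", "method": "POST", "url": "/v1/chat/completions",
#  "body": {"model": "gpt-4o-mini",
#           "messages": [{"role": "user", "content": "Summarise: ..."}]}}
batch_file = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")

batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",  # the only window currently offered
)
print(batch.id, batch.status)
```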

When batch is the right choice

Batch wins for any non-interactive workload above ~10,000 requests per day. Below that, the absolute savings are usually small. Batch is wrong for user-facing chat, real-time agents, or anything that needs responses in seconds. A common pattern: send latency-sensitive requests through the standard API and route everything else (analytics, indexing, evals) through batch. Many teams cut their inference bill by 30–45% just by routing the right traffic through the batch endpoint, as the sketch below illustrates.
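
That 30–45% figure follows directly from the share of traffic you can route: overall savings equal the routable fraction times the 50% discount. A quick illustration with hypothetical numbers:

```python
def blended_savings(monthly_spend, batchable_fraction, discount=0.5):
    """Overall bill reduction when only part of the traffic tolerates batch latency."""
    return monthly_spend * batchable_fraction * discount

# A team spending $20,000/month that can route 70% of traffic through batch:
print(blended_savings(20_000, 0.70))  # 7000.0 -> a 35% overall reduction
```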

Frequently Asked Questions

How much cheaper is the OpenAI Batch API?
The OpenAI Batch API is exactly 50% cheaper than the standard API on most models. For example, GPT-4o is $2.50/M input and $10.00/M output standard — batch rates are $1.25/M input and $5.00/M output. The same 50% discount applies to GPT-4o mini, GPT-5, GPT-5 mini, GPT-5 nano, GPT-5.2, GPT-5.4, and the o-series reasoning models.
What is the catch with batch API pricing?
Two things: latency and queue limits. Batch results return within 24 hours (often much faster, but no SLA guarantees sub-24-hour completion). And there are concurrent-batch and per-day token limits per organisation. For most non-interactive workloads neither is a problem.
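Because completion time is only guaranteed within 24 hours, batch jobs are normally polled rather than awaited. A sketch using OpenAI's Python SDK (the batch id is a placeholder from an earlier batches.create call):

```python
import time
from openai import OpenAI

client = OpenAI()

# Poll until the batch reaches a terminal state; non-terminal statuses
# include "validating" and "in_progress".
while True:
    batch = client.batches.retrieve("batch_abc123")
    if batch.status in ("completed", "failed", "expired", "cancelled"):
        break
    time.sleep(60)

if batch.status == "completed":
    results = client.files.content(batch.output_file_id)  # JSONL of responses
```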
Do all providers offer batch pricing?
OpenAI, Anthropic, and Google offer first-class batch APIs with a 50% discount. xAI, DeepSeek, Mistral, and Cohere do not currently expose a public batch tier. The model selector in this calculator only shows models with a published batch rate.
How do I estimate my batch savings before switching?
Enter your average request volume (per day or month), average input and output tokens per request, and the model you currently use. The calculator returns absolute and percentage savings. As a rule of thumb: at 50% off, every $1,000 of routable batch traffic saves $500.

Batch pricing applies to asynchronous requests completed within 24 hours. Rates are sourced from official provider pricing pages.