AI Inference Cost Calculator
Estimate the run-rate of serving LLM completions or chat responses at scale. Enter your average prompt and completion token counts, expected monthly request load, published per-1K-token rates, and any per-1K-token surcharge for GPUs, orchestration, or guardrail services to see the all-in cost projection.
Examples
- 1,200 prompt tokens + 800 completion tokens per request, 5,000 requests/month, $0.0015/$0.0020 per-1K token rates, $0.0005/1K surcharge ⇒ $22.00 per month
- 800 prompt tokens + 1,200 completion tokens per request, 10,000 requests/month, $0.0010/$0.0025 per-1K token rates, no surcharge ⇒ $38.00 per month
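The examples above all follow one formula. Here is a minimal sketch of that calculation in Python (the function name and signature are illustrative, not part of the calculator itself):

```python
def monthly_cost(prompt_tokens, completion_tokens, requests,
                 prompt_rate, completion_rate, surcharge=0.0):
    """All-in monthly cost. Rates and surcharge are per 1K tokens;
    token counts are per-request averages."""
    per_request = (prompt_tokens * prompt_rate
                   + completion_tokens * completion_rate
                   + (prompt_tokens + completion_tokens) * surcharge) / 1000
    return per_request * requests

# First example from above:
print(round(monthly_cost(1200, 800, 5000, 0.0015, 0.0020, 0.0005), 2))  # 22.0
# Second example, with no surcharge:
print(round(monthly_cost(800, 1200, 10000, 0.0010, 0.0025), 2))  # 38.0
```

Note that input and output tokens are priced at separate rates, while the surcharge applies to the combined token volume.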
FAQ
How do I factor in reserved capacity discounts?
Reduce the per-1K token prices to reflect committed-use or annual contract discounts; the calculator will then reflect your negotiated rate card.
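As a quick sketch of that adjustment (the list price and discount below are hypothetical):

```python
# Hypothetical: 20% committed-use discount on a $0.0020 per-1K list price
list_rate = 0.0020
discount = 0.20
effective_rate = list_rate * (1 - discount)
print(round(effective_rate, 4))  # 0.0016 — enter this as the per-1K rate
```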
Can I include GPU hosting costs?
Yes. Convert GPU or accelerator infrastructure into a per-1K token surcharge—divide total infra spend by the monthly token volume and enter the result in the Surcharge field.
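The conversion described above can be sketched as follows (the spend and token-volume figures are hypothetical):

```python
# Hypothetical figures: $4,000/month of GPU hosting, 800M tokens/month
infra_spend = 4000.0
monthly_tokens = 800_000_000
surcharge_per_1k = infra_spend / (monthly_tokens / 1000)
print(round(surcharge_per_1k, 4))  # 0.005 — enter this in the Surcharge field
```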
What if I expect streaming tokens?
Estimate the average streamed tokens per response and add them to the completion field; the formula multiplies both input and output token counts by the total request count automatically.
How do I compare providers with tiered pricing?
Model each tier separately and sum the totals, or input the blended per-1K token rate you expect after volume discounts to approximate the effective cost.
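The "model each tier, then blend" approach can be sketched like this (the tier schedule below is hypothetical, not any specific provider's price list):

```python
# Hypothetical tier schedule: (tokens billed in tier, per-1K rate for that tier)
tiers = [(1_000_000, 0.0020),   # first 1M tokens
         (4_000_000, 0.0015),   # next 4M tokens
         (5_000_000, 0.0010)]   # remaining 5M tokens

total_cost = sum(tokens / 1000 * rate for tokens, rate in tiers)
total_tokens = sum(tokens for tokens, _ in tiers)
blended_rate = total_cost / (total_tokens / 1000)
print(round(blended_rate, 4))  # 0.0013 — effective per-1K rate to enter
```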
Additional Information
- Input and output token pricing are calculated separately to match provider invoices, then combined with any infrastructure surcharge for a blended unit cost.
- Layer this forecast with the API Rate Limit Planner to ensure throughput aligns with vendor quotas, and with the Cloud Storage Cost calculator to estimate embedding or cache storage.
- For multi-region deployments, convert the total using the Import Currency Hedge Coverage calculator so you understand exposure to currency swings.