AI Inference Cost Calculator
Estimate the run-rate for serving AI prompts at scale. Combine average prompt and completion tokens, monthly traffic, published per-1K token prices, and any infrastructure surcharge to project the total monthly inference spend.
Educational information, not professional advice.
Examples
- 1,200 prompt tokens + 800 completion tokens, 5,000 requests, $0.0015/$0.0020 pricing, $0.0005 surcharge ⇒ $22.00 per month
- 800 prompt tokens + 1,200 completion tokens, 10,000 requests, $0.0010/$0.0025 pricing, no surcharge ⇒ $38.00 per month
- 2,000 prompt tokens + 1,000 completion tokens, 50,000 requests, $0.0020/$0.0030 pricing, $0.0007 surcharge ⇒ $455.00 per month
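The arithmetic behind these examples can be sketched in a few lines of Python. This is a minimal sketch, not the calculator's actual source: the function name `monthly_inference_cost` and its parameter names are hypothetical, and it assumes the surcharge is charged per 1K tokens on input and output tokens combined, which is what the first example implies ($9.00 input + $8.00 output + $5.00 surcharge).

```python
def monthly_inference_cost(prompt_tokens, completion_tokens, requests,
                           input_price_per_1k, output_price_per_1k,
                           surcharge_per_1k=0.0):
    """Estimate monthly spend in dollars (hypothetical helper for illustration)."""
    # Monthly token volumes, expressed in thousands of tokens.
    input_k = prompt_tokens * requests / 1000
    output_k = completion_tokens * requests / 1000
    # Provider charges: each token type is billed at its own per-1K rate.
    token_cost = input_k * input_price_per_1k + output_k * output_price_per_1k
    # Assumption: the surcharge applies per 1K tokens to input and output alike.
    surcharge_cost = (input_k + output_k) * surcharge_per_1k
    return token_cost + surcharge_cost


first = monthly_inference_cost(1200, 800, 5000, 0.0015, 0.0020, 0.0005)
second = monthly_inference_cost(800, 1200, 10000, 0.0010, 0.0025)
print(f"${first:.2f} per month")   # $22.00 per month
print(f"${second:.2f} per month")  # $38.00 per month
```

The two calls reproduce the first two examples in the list.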
FAQ
How do I factor in reserved capacity discounts?
Reduce the per-1K token prices to reflect committed-use or annual contract discounts; the calculator will then reflect your negotiated rate card.
Can I include GPU hosting costs?
Yes. Convert your infrastructure expenses into a per-1K token surcharge by dividing total monthly GPU or orchestration spend by the monthly token volume expressed in thousands of tokens (input plus output), then enter the result in the Surcharge field.
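As a rough illustration of that conversion, the sketch below divides a monthly infrastructure bill by the token volume in thousands. The function name `infra_surcharge_per_1k` and the dollar and token figures are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def infra_surcharge_per_1k(infra_spend, monthly_tokens):
    """Convert a monthly infrastructure bill into a per-1K token surcharge."""
    if monthly_tokens <= 0:
        raise ValueError("monthly_tokens must be positive")
    # Divide by the volume in thousands so the result is a per-1K rate.
    return infra_spend / (monthly_tokens / 1000)


# Hypothetical inputs: $500 of GPU hosting spread over 10,000,000 tokens.
rate = infra_surcharge_per_1k(500.0, 10_000_000)
print(f"${rate:.4f} per 1K tokens")  # $0.0500 per 1K tokens
```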
What if I expect streaming tokens?
Add the average streamed tokens to the completion (output) token field; the same formula then covers prompt tokens and streamed completions together.
How do I compare providers with tiered pricing?
Run the calculator once per pricing tier, entering only the traffic that falls in that tier, and sum the outputs. Alternatively, enter a blended per-1K token rate based on your expected volume to approximate the effective monthly charge.
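One way to derive a blended rate is sketched below. It assumes graduated tiers, where each tier's price applies only to the tokens that fall inside that tier; some providers instead apply a single volume price to all tokens, so check the rate card first. The function name `blended_rate_per_1k` and the tier boundaries and prices are hypothetical.

```python
def blended_rate_per_1k(volume_k, tiers):
    """Effective per-1K rate for a monthly volume given in thousands of tokens.

    tiers: ascending list of (upper_bound_k, price_per_1k) pairs; use None as
    the upper bound of the final, unbounded tier.
    """
    if volume_k <= 0:
        raise ValueError("volume_k must be positive")
    total, lower = 0.0, 0.0
    for upper, price in tiers:
        # Portion of the volume that falls inside this tier.
        top = volume_k if upper is None else min(volume_k, upper)
        total += max(top - lower, 0.0) * price
        if upper is None or volume_k <= upper:
            return total / volume_k
        lower = upper
    raise ValueError("volume exceeds the last tier; make the final tier unbounded")


# Hypothetical rate card: first 10,000K tokens at $0.0020, the rest at $0.0015.
tiers = [(10_000, 0.0020), (None, 0.0015)]
rate = blended_rate_per_1k(30_000, tiers)
print(f"${rate:.6f} per 1K tokens")  # $0.001667 per 1K tokens
```

The resulting rate can be entered as the per-1K price for that token type.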
Can this estimate per-user costs?
Divide the monthly spend by your active users or sessions to calculate unit economics like cost per MAU or per chat conversation.
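A minimal sketch of that division follows. The helper name `cost_per_unit` and the count of 400 monthly active users are hypothetical; the $22.00 figure is the monthly total from the first example.

```python
def cost_per_unit(monthly_spend, units):
    """Spread monthly spend across users, sessions, or conversations."""
    if units <= 0:
        raise ValueError("units must be positive")
    return monthly_spend / units


# Hypothetical: $22.00 of monthly spend shared by 400 monthly active users.
per_mau = cost_per_unit(22.00, 400)
print(f"${per_mau:.4f} per MAU")  # $0.0550 per MAU
```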
Additional Information
- The calculation mirrors provider invoices by multiplying total input and output tokens by their respective per-1K rates and then layering in any internal surcharge.
- Feed in observed averages from production logs to improve accuracy, because token lengths vary significantly across customer cohorts and prompt templates.
- Combine this forecast with the API Rate Limit Planner to ensure throughput aligns with vendor quotas, and with Cloud Storage Cost to capture vector database expenses.
- Operate globally? Hedge currency swings by pairing the output with the Import Currency Hedge calculator or convert the total using real-time FX rates.
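For the point above about feeding in observed averages, the sketch below shows one way to derive the two token inputs from request logs. The record layout (dictionaries with `prompt_tokens` and `completion_tokens` keys) and the sample values are assumptions; real log schemas will differ.

```python
from statistics import mean

# Hypothetical log records; in practice these would be loaded from production logs.
logs = [
    {"prompt_tokens": 1100, "completion_tokens": 750},
    {"prompt_tokens": 1300, "completion_tokens": 850},
    {"prompt_tokens": 1200, "completion_tokens": 800},
]


def average_tokens(records):
    """Return (average prompt tokens, average completion tokens) per request."""
    avg_prompt = mean(r["prompt_tokens"] for r in records)
    avg_completion = mean(r["completion_tokens"] for r in records)
    return avg_prompt, avg_completion


avg_prompt, avg_completion = average_tokens(logs)
print(avg_prompt, avg_completion)
```

Computing the averages separately per customer cohort or prompt template, and running the calculator once per group, keeps heavy users from being hidden inside a single global average.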