AI Inference Cost Calculator

Estimate the run-rate for serving AI prompts at scale. Combine average prompt and completion tokens, monthly traffic, published per-1K token prices, and any infrastructure surcharge to project the total monthly inference spend.

Average prompt length in tokens, including system messages and tool calls.
Average completion size in tokens returned to the user or downstream system.
Projected monthly request volume. If you count sessions instead, measure the token averages above per session.
Provider price for 1,000 prompt tokens in your billing currency.
Provider price for 1,000 completion tokens; usually higher than the input price.
Optional internal surcharge per 1,000 tokens (prompt plus completion) covering GPUs, vector databases, guardrails, or observability.

Educational information, not professional advice.

Examples

  • 1,200 prompt tokens + 800 completion tokens, 5,000 requests, $0.0015/$0.0020 pricing, $0.0005 surcharge ⇒ $22.00 per month
  • 800 prompt tokens + 1,200 completion tokens, 10,000 requests, $0.0010/$0.0025 pricing, no surcharge ⇒ $38.00 per month
  • 2,000 prompt tokens + 1,000 completion tokens, 50,000 requests, $0.0020/$0.0030 pricing, $0.0007 surcharge ⇒ $455.00 per month
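The arithmetic behind these examples can be sketched in Python as follows. This is a minimal sketch, not the calculator's actual implementation: the function and parameter names are hypothetical, and it assumes the surcharge is charged per 1,000 total tokens (prompt plus completion), which is the reading that reproduces the first example.

```python
def monthly_inference_cost(
    prompt_tokens: float,
    completion_tokens: float,
    requests: float,
    input_price_per_1k: float,
    output_price_per_1k: float,
    surcharge_per_1k: float = 0.0,
) -> float:
    """Estimate monthly spend (hypothetical helper).

    Assumption: the surcharge applies to every 1,000 tokens, prompt plus completion.
    """
    input_k = prompt_tokens * requests / 1000       # thousands of prompt tokens per month
    output_k = completion_tokens * requests / 1000  # thousands of completion tokens per month
    token_cost = input_k * input_price_per_1k + output_k * output_price_per_1k
    surcharge = (input_k + output_k) * surcharge_per_1k
    return token_cost + surcharge


examples = [
    (1200, 800, 5000, 0.0015, 0.0020, 0.0005),
    (800, 1200, 10000, 0.0010, 0.0025, 0.0),
    (2000, 1000, 50000, 0.0020, 0.0030, 0.0007),
]
for args in examples:
    print(f"${monthly_inference_cost(*args):,.2f}")
```

The first two inputs come out to $22.00 and $38.00. The third works out to $455.00: $200 for input tokens, $150 for output tokens, and $105 of surcharge on 150,000 thousand-token units.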

FAQ

How do I factor in reserved capacity discounts?

Lower the per-1K token prices to your committed-use or annual-contract rates; the estimate then reflects your negotiated rate card.
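If your contract is expressed as a flat percentage off list price, the adjusted rates can be computed as below. This is a sketch under that assumption; `discounted_price` is a hypothetical helper, and contracts built on spend credits or minimum commitments need their own math.

```python
def discounted_price(list_price_per_1k: float, discount_pct: float) -> float:
    """Apply a flat percentage discount, e.g. discount_pct=20 for 20% off (hypothetical helper)."""
    if not 0 <= discount_pct <= 100:
        raise ValueError("discount_pct must be between 0 and 100")
    return list_price_per_1k * (1 - discount_pct / 100)


# Hypothetical 20% committed-use discount on $0.0020 input / $0.0030 output list prices
input_rate = discounted_price(0.0020, 20)
output_rate = discounted_price(0.0030, 20)
print(f"{input_rate:.4f} / {output_rate:.4f}")
```

Enter the resulting rates (here roughly $0.0016 and $0.0024) in the price fields.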

Can I include GPU hosting costs?

Yes. Convert your monthly infrastructure spend into a per-1K token surcharge: divide total GPU or orchestration cost by the monthly token volume expressed in thousands (total prompt plus completion tokens ÷ 1,000), then enter the result in the Surcharge field.
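As a rough illustration, the conversion can be written as a one-line division. The $1,500 monthly spend is a hypothetical figure, the token volume reuses the third example's inputs, and `surcharge_per_1k` is an illustrative name rather than part of the calculator.

```python
def surcharge_per_1k(monthly_infra_spend: float, monthly_tokens: float) -> float:
    """Spread fixed monthly infrastructure spend across every 1,000 tokens served."""
    if monthly_tokens <= 0:
        raise ValueError("monthly_tokens must be positive")
    return monthly_infra_spend / (monthly_tokens / 1000)


# Hypothetical: $1,500/month of GPU and orchestration spend
tokens = (2000 + 1000) * 50_000  # (prompt + completion tokens) x requests = 150M tokens
print(f"{surcharge_per_1k(1500, tokens):.4f}")
```

Here $1,500 spread over 150,000 thousand-token units gives a surcharge of $0.0100 per 1K tokens.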

What if I expect streaming tokens?

Streaming changes how tokens are delivered, not how they are counted. Include streamed tokens in the average completion tokens so they are billed at the completion rate like any other output.

How do I compare providers with tiered pricing?

Either split your expected monthly volume across the tiers, run the calculator once per tier with that tier's rates, and sum the results, or enter a volume-weighted blended per-1K rate to approximate the effective monthly charge.
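A volume-weighted blended rate can be computed as sketched below. The tier boundaries and prices are hypothetical, and the sketch assumes you already know how many thousand-token units fall into each tier.

```python
def blended_rate(tiers: list[tuple[float, float]]) -> float:
    """Volume-weighted average per-1K price.

    tiers: (thousands_of_tokens_in_tier, price_per_1k) pairs describing how
    the expected monthly volume falls into each pricing tier.
    """
    total_k = sum(volume_k for volume_k, _ in tiers)
    if total_k <= 0:
        raise ValueError("tiers must contain positive volume")
    return sum(volume_k * price for volume_k, price in tiers) / total_k


# Hypothetical tiers: first 50,000K tokens at $0.0020, next 50,000K at $0.0015
rate = blended_rate([(50_000, 0.0020), (50_000, 0.0015)])
print(f"{rate:.5f}")
```

Weighting by volume matters: a simple average of tier prices overstates the rate when most traffic lands in the cheaper tiers.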

Can this estimate per-user costs?

Divide the monthly spend by your active users or sessions to calculate unit economics like cost per MAU or per chat conversation.
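For example, using the $38.00 monthly figure from the second example and a hypothetical 2,500 monthly active users (`unit_cost` is an illustrative helper):

```python
def unit_cost(monthly_spend: float, units: int) -> float:
    """Monthly spend divided by active users, sessions, or conversations."""
    if units <= 0:
        raise ValueError("units must be positive")
    return monthly_spend / units


print(f"${unit_cost(38.00, 2_500):.4f} per MAU")      # hypothetical 2,500 MAU
print(f"${unit_cost(38.00, 10_000):.4f} per request")  # the example's 10,000 requests
```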

Additional Information

  • The calculation mirrors provider invoices: total monthly input and output tokens, in thousands, are multiplied by their respective per-1K rates, and the internal surcharge is then added per 1,000 total tokens.
  • Feed in observed averages from production logs to improve accuracy; token lengths vary significantly across customer cohorts and prompt templates.
  • Combine this forecast with the API Rate Limit Planner to ensure throughput aligns with vendor quotas, and with Cloud Storage Cost to capture vector database expenses.
  • Operate globally? Hedge currency swings by pairing the output with the Import Currency Hedge calculator or convert the total using real-time FX rates.
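To derive the token averages from production logs, a grouping step like the one below can help, since averages differ by cohort or template. This is a sketch assuming hypothetical per-request records; the `template`, `prompt_tokens`, and `completion_tokens` field names are illustrative and should be mapped to whatever your logging pipeline actually records.

```python
from collections import defaultdict
from statistics import mean


def average_tokens_by_template(records):
    """Return {template: (avg_prompt_tokens, avg_completion_tokens)}."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["template"]].append(record)
    return {
        name: (
            mean(r["prompt_tokens"] for r in rows),
            mean(r["completion_tokens"] for r in rows),
        )
        for name, rows in grouped.items()
    }


# Hypothetical per-request usage records
usage_log = [
    {"template": "support", "prompt_tokens": 1150, "completion_tokens": 760},
    {"template": "support", "prompt_tokens": 1250, "completion_tokens": 840},
    {"template": "summarize", "prompt_tokens": 2100, "completion_tokens": 950},
]
print(average_tokens_by_template(usage_log))
```

Running the estimate once per template, with that template's averages and request share, usually tracks invoices more closely than a single global average.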