AI Inference Cost Calculator

Estimate the monthly run-rate cost of serving LLM completions or chat responses at scale. Enter your average prompt and completion token counts, expected monthly request load, published per-1K token rates, and any per-1K token surcharge for GPUs, orchestration, or guardrail services to see the all-in monthly cost projection.

Average number of input tokens per request (prompt length).
Average number of output tokens returned per response.
Number of requests (or sessions) you expect to serve per month.
Provider's price for 1,000 input tokens in your chosen currency.
Provider's price for 1,000 output tokens (often higher than input).
Optional per-1K token surcharge covering GPUs, vector search, or guardrails, applied to input and output tokens combined.

Examples

  • 1,200 prompt tokens + 800 completion tokens, 5,000 requests, $0.0015/$0.0020 per-1K input/output rates, $0.0005 per-1K surcharge ⇒ $9.00 input + $8.00 output + $5.00 surcharge = $22.00 per month
  • 800 prompt tokens + 1,200 completion tokens, 10,000 requests, $0.0010/$0.0025 per-1K input/output rates, no surcharge ⇒ $8.00 input + $30.00 output = $38.00 per month
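
The two examples above can be reproduced with a short Python sketch. The function and parameter names are illustrative and not part of the calculator. The surcharge is assumed to apply to prompt and completion tokens combined, which is the reading that yields the $22.00 figure in the first example.

```python
def monthly_inference_cost(
    prompt_tokens: float,
    completion_tokens: float,
    requests: int,
    input_rate_per_1k: float,
    output_rate_per_1k: float,
    surcharge_per_1k: float = 0.0,
) -> float:
    """Return the estimated monthly cost, in the currency of the rates."""
    # Input and output tokens are priced separately, per 1,000 tokens.
    input_cost = prompt_tokens / 1000 * input_rate_per_1k
    output_cost = completion_tokens / 1000 * output_rate_per_1k
    # Assumption: the surcharge covers every token, in both directions.
    surcharge = (prompt_tokens + completion_tokens) / 1000 * surcharge_per_1k
    # Per-request cost scaled by the monthly request count.
    return (input_cost + output_cost + surcharge) * requests


# First example: prints $22.00
print(f"${monthly_inference_cost(1200, 800, 5000, 0.0015, 0.0020, 0.0005):.2f}")
# Second example: prints $38.00
print(f"${monthly_inference_cost(800, 1200, 10000, 0.0010, 0.0025):.2f}")
```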

FAQ

How do I factor in reserved capacity discounts?

Reduce the per-1K token prices to match your committed-use or annual contract discounts; the projection will then follow your negotiated rate card instead of list prices.
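
As a minimal sketch of that adjustment, the Python helper below applies a percentage discount to a list price before it is entered into the calculator. The helper name and the 20% discount are hypothetical, chosen only for illustration.

```python
def discounted_rate(list_rate_per_1k: float, discount_pct: float) -> float:
    """Return the per-1K token rate after a percentage discount."""
    if not 0 <= discount_pct <= 100:
        raise ValueError("discount_pct must be between 0 and 100")
    return list_rate_per_1k * (1 - discount_pct / 100)


# Hypothetical 20% committed-use discount on a $0.0015 list price.
print(f"{discounted_rate(0.0015, 20):.4f}")  # prints 0.0012
```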

Can I include GPU hosting costs?

Yes. Convert GPU or accelerator infrastructure into a per-1K token surcharge: divide total monthly infra spend by the monthly token volume expressed in thousands (total input plus output tokens ÷ 1,000) and enter the result in the Surcharge field.
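
A small Python sketch of that conversion follows. The function name and the $5.00 infrastructure figure are assumptions for illustration; the token counts and request volume are taken from the first example above, so the result matches that example's $0.0005 surcharge.

```python
def surcharge_per_1k_tokens(
    monthly_infra_spend: float,
    prompt_tokens: float,
    completion_tokens: float,
    requests: int,
) -> float:
    """Spread a monthly infrastructure bill across each 1,000 tokens served."""
    monthly_tokens = (prompt_tokens + completion_tokens) * requests
    if monthly_tokens <= 0:
        raise ValueError("monthly token volume must be positive")
    # Divide by the volume in thousands so the result is a per-1K rate.
    return monthly_infra_spend / (monthly_tokens / 1000)


# Hypothetical $5.00 of monthly infra spend over 10,000,000 tokens.
print(f"{surcharge_per_1k_tokens(5.00, 1200, 800, 5000):.4f}")  # prints 0.0005
```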

What if I expect streaming tokens?

Estimate the average number of tokens streamed per response and count them in the output field; the formula multiplies both input and output tokens by the total request count automatically.

How do I compare providers with tiered pricing?

Model each tier separately and sum the totals, or input the blended per-1K token rate you expect after volume discounts to approximate the effective cost.
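
The sketch below illustrates both approaches in Python under one assumption: tiers are graduated, so each tier's rate applies only to the tokens that fall inside it. The tier boundaries, rates, and names are hypothetical and do not reflect any provider's rate card.

```python
from typing import Optional, Sequence, Tuple

# (upper bound in cumulative monthly tokens, or None for no cap; rate per 1K)
Tier = Tuple[Optional[int], float]

# Hypothetical rate card: first 5M tokens at $0.0015, the rest at $0.0012.
TIERS: Sequence[Tier] = [(5_000_000, 0.0015), (None, 0.0012)]


def tiered_cost(total_tokens: int, tiers: Sequence[Tier]) -> float:
    """Price each tier's share of the volume separately and sum the totals."""
    cost = 0.0
    lower = 0  # tokens already priced by earlier tiers
    for upper, rate_per_1k in tiers:
        if total_tokens <= lower:
            break
        tier_end = total_tokens if upper is None else min(total_tokens, upper)
        cost += (tier_end - lower) / 1000 * rate_per_1k
        lower = tier_end
    if lower < total_tokens:
        raise ValueError("tiers do not cover the full token volume")
    return cost


total = tiered_cost(6_000_000, TIERS)
blended_rate = total / (6_000_000 / 1000)  # effective per-1K rate
print(f"total: ${total:.2f}, blended per-1K rate: ${blended_rate:.5f}")
```

The blended rate printed at the end is the single value you would enter in the calculator's per-1K price field when you prefer not to model each tier by hand.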

Additional Information

  • Input and output token costs are calculated separately to match provider invoices, then combined with any infrastructure surcharge to give a blended unit cost.
  • Layer this forecast with the API Rate Limit Planner to ensure throughput aligns with vendor quotas, and with the Cloud Storage Cost calculator to estimate embedding or cache storage.
  • For multi-region deployments, convert the total using the Import Currency Hedge Coverage calculator so you understand exposure to currency swings.
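
To make the blended unit cost from the first note concrete, the Python sketch below divides a monthly total by the request count and by the token volume in thousands. The function name is illustrative, and the inputs come from the first example on this page ($22.00 per month).

```python
from typing import Tuple


def blended_unit_costs(
    monthly_total: float,
    prompt_tokens: float,
    completion_tokens: float,
    requests: int,
) -> Tuple[float, float]:
    """Return (cost per request, blended cost per 1,000 tokens)."""
    monthly_tokens = (prompt_tokens + completion_tokens) * requests
    if requests <= 0 or monthly_tokens <= 0:
        raise ValueError("requests and token volume must be positive")
    per_request = monthly_total / requests
    per_1k_tokens = monthly_total / (monthly_tokens / 1000)
    return per_request, per_1k_tokens


per_request, per_1k = blended_unit_costs(22.00, 1200, 800, 5000)
print(f"per request: ${per_request:.4f}, per 1K tokens: ${per_1k:.4f}")
# prints per request: $0.0044, per 1K tokens: $0.0022
```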