Generative AI Inference Unit Cost Calculator

Convert token pricing into actionable unit economics for your AI product. Enter published input and output token rates, typical prompt and completion sizes, optional cache hit rate, rerun surcharges, and subscriber usage to see per-request cost, cost per million inferences, monthly infrastructure cost per subscriber, and the breakeven subscription price at your target margin.

Input token price per 1,000 (USD)

Published price per 1,000 prompt tokens for your model tier.

Output token price per 1,000 (USD)

Published price per 1,000 completion tokens for your model tier.

Average input tokens per request

Include system prompt, user prompt, and retrieved context per call.

Average output tokens per request

Typical completion length returned to the end user.

Premium surcharges per call (optional, USD)

Defaults to $0. Add any guardrail rerun or compliance surcharge per request.

Cache hit rate (optional, %)

Defaults to 0%. The percentage of requests served from cache, which incur no token spend.

Requests per subscriber (optional, monthly)

Defaults to 200 calls per month when estimating breakeven pricing.

Target gross margin (optional, %)

Defaults to 30%. The breakeven subscription price is the monthly cost per subscriber divided by one minus this margin, so the margin is measured against price rather than cost.

Results assume vendor pricing remains constant. Monitor API announcements for rate changes or new token bundles.

Examples

$0.0015 input price per 1,000 tokens, $0.0020 output price per 1,000 tokens, 750 input tokens, 1,200 output tokens, 10% cache hit rate, $0.0005 surcharge, 200 requests per subscriber, 30% margin ⇒ Per-request cost: $0.0037 • Cost per 1,000,000 inferences: $3,672.50 • Monthly infrastructure cost per subscriber: $0.73 before margin • Usage basis: 200 requests per subscriber with 30% gross margin target • Breakeven subscription price: $1.05 per month • Cache hit rate reduces token spend by 10% • Surcharges per call included: $0.0005.
$0.0030 input price per 1,000 tokens, $0.0040 output price per 1,000 tokens, 1,200 input tokens, 1,600 output tokens, no cache, no surcharge, 500 requests per subscriber, 40% margin ⇒ Per-request cost: $0.0100 • Cost per 1,000,000 inferences: $10,000.00 • Monthly infrastructure cost per subscriber: $5.00 before margin • Usage basis: 500 requests per subscriber with 40% gross margin target • Breakeven subscription price: $8.33 per month.
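The arithmetic behind these two examples can be reproduced with a short Python sketch. The function and parameter names are illustrative assumptions, not the calculator's actual source. The formula is inferred from the field descriptions and the example outputs, with the surcharge assumed to be added after the cache discount.

```python
def unit_economics(
    input_price_per_1k: float,
    output_price_per_1k: float,
    input_tokens: float,
    output_tokens: float,
    cache_hit_rate: float = 0.0,          # fraction, e.g. 0.10 for 10%
    surcharge: float = 0.0,               # USD per request
    requests_per_subscriber: float = 200,
    target_margin: float = 0.30,          # fraction of price
) -> dict:
    """Hypothetical re-implementation of the calculator's math."""
    # Token spend for one uncached request.
    token_cost = (
        (input_tokens / 1000) * input_price_per_1k
        + (output_tokens / 1000) * output_price_per_1k
    )
    # Cache hits avoid token spend; the surcharge is assumed to apply to every call.
    per_request = token_cost * (1 - cache_hit_rate) + surcharge
    monthly = per_request * requests_per_subscriber
    # The margin is capped at 95%, as the FAQ describes.
    margin = min(target_margin, 0.95)
    return {
        "per_request": per_request,
        "per_million": per_request * 1_000_000,
        "monthly_per_subscriber": monthly,
        "breakeven_price": monthly / (1 - margin),
    }


first = unit_economics(0.0015, 0.0020, 750, 1200, cache_hit_rate=0.10,
                       surcharge=0.0005, requests_per_subscriber=200,
                       target_margin=0.30)
second = unit_economics(0.0030, 0.0040, 1200, 1600,
                        requests_per_subscriber=500, target_margin=0.40)

for label, result in (("first", first), ("second", second)):
    print(label, {name: round(value, 4) for name, value in result.items()})
```

Rounded to the precision shown on the page, the first call gives $3,672.50 per million inferences and a $1.05 breakeven price, and the second gives $10,000.00 and $8.33.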

FAQ

How do I account for multiple models in one workflow?

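Calculate the per-request cost for each model separately and add them together. To roll every pass into one metric, enter the primary model's tokens and prices as usual and enter the combined per-request cost of the remaining passes as the surcharge, so that no pass is counted twice.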
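As a rough illustration of summing passes, the Python sketch below prices a two-model workflow. The prices, token counts, and the `pass_cost` helper are hypothetical and chosen only to show the arithmetic.

```python
def pass_cost(input_price_per_1k, output_price_per_1k, input_tokens, output_tokens):
    """Token cost in USD for a single model pass."""
    return (
        (input_tokens / 1000) * input_price_per_1k
        + (output_tokens / 1000) * output_price_per_1k
    )


# Hypothetical workflow: a small routing model, then a larger generation model.
passes = [
    {"input_price_per_1k": 0.0005, "output_price_per_1k": 0.0015,
     "input_tokens": 300, "output_tokens": 20},
    {"input_price_per_1k": 0.0030, "output_price_per_1k": 0.0040,
     "input_tokens": 1200, "output_tokens": 1600},
]

workflow_cost = sum(pass_cost(**p) for p in passes)
print(f"Combined per-request cost: ${workflow_cost:.5f}")
```

With these made-up inputs the routing pass adds about $0.0002 to the $0.0100 generation pass.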

What if I sell pay-as-you-go instead of subscriptions?

Use the per-request output directly as your floor price. Add your desired markup to arrive at a profitable usage-based rate.
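A minimal Python sketch of that pricing step follows, assuming the markup is expressed as a fraction of cost. The helper names and the 50% markup are hypothetical. The second helper shows the gross margin a given markup implies, since the two are easy to confuse.

```python
def usage_price(per_request_cost: float, markup: float) -> float:
    """Price per request after adding a markup expressed as a fraction of cost."""
    return per_request_cost * (1 + markup)


def implied_margin(per_request_cost: float, price: float) -> float:
    """Gross margin (fraction of price) that a given price yields."""
    return 1 - per_request_cost / price


floor = 0.0100                     # per-request cost from the second example
price = usage_price(floor, 0.50)   # hypothetical 50% markup
print(f"Price per request: ${price:.4f}")
print(f"Implied gross margin: {implied_margin(floor, price):.1%}")
```

A 50% markup on cost corresponds to a gross margin of roughly one third of price, not one half.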

How can I model request batching or higher cache efficiency?

Reduce the average input and output tokens or increase the cache hit rate to reflect batching improvements. The calculator automatically lowers per-request cost when fewer fresh tokens are billed.
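To see how sensitive per-request cost is to caching, the following Python sketch sweeps the cache hit rate while holding token spend and surcharge fixed. It assumes the surcharge is not reduced by cache hits, which is consistent with the first example. The function name is illustrative.

```python
def per_request_cost(token_cost: float, cache_hit_rate: float,
                     surcharge: float = 0.0) -> float:
    """Per-request cost when a fraction of requests is served from cache."""
    return token_cost * (1 - cache_hit_rate) + surcharge


token_cost = 0.003525   # uncached token spend from the first example
surcharge = 0.0005

for rate in (0.0, 0.10, 0.25, 0.50):
    cost = per_request_cost(token_cost, rate, surcharge)
    print(f"cache hit {rate:.0%}: ${cost:.6f} per request")
```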

Can I include retriever and embedding costs?

Yes. Estimate the additional token or vector charges per call and enter them as a surcharge so the calculator folds them into unit economics.

What happens if my gross margin target exceeds 95%?

The tool caps the gross margin at 95% because the breakeven formula divides by one minus the margin, which approaches zero as the margin nears 100%. If you require a higher margin, treat the breakeven subscription price as a baseline and apply your own markup manually to the monthly cost per subscriber output.
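The effect of the cap can be sketched in Python as follows. The `breakeven_price` helper and `MARGIN_CAP` constant are hypothetical names. Only the 95% figure comes from the answer above.

```python
MARGIN_CAP = 0.95  # cap stated in the FAQ answer


def breakeven_price(cost_per_subscriber: float, target_margin: float) -> float:
    """Breakeven price with the margin clamped to the cap."""
    margin = min(target_margin, MARGIN_CAP)
    return cost_per_subscriber / (1 - margin)


cost = 5.00  # monthly cost per subscriber from the second example
for requested in (0.40, 0.95, 0.99):
    print(f"target {requested:.0%}: ${breakeven_price(cost, requested):.2f}")

# A 99% margin has to be applied by hand, outside the capped formula.
manual_price = cost / (1 - 0.99)
print(f"manual 99% margin: ${manual_price:.2f}")
```

In this sketch a 99% target returns the same price as a 95% target, while the manual calculation is five times higher.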

Additional Information

Cache hit rate reduces both prompt and completion tokens proportionally to reflect avoided recomputation or response reuse.
Monthly infrastructure cost per subscriber multiplies per-request economics by expected calls so you can compare against subscription ARPU or seat pricing.
Breakeven subscription pricing divides cost per subscriber by one minus your target gross margin, that is, by the share of revenue left to cover cost.
Surcharges capture guardrail reruns, safety classifier passes, vector lookups, or other workflow costs beyond raw token usage.
Raising the target margin increases the breakeven subscription price; the tool caps margins at 95% to keep the math stable.
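Taken together, these notes amount to three formulas. The symbols are introduced here for illustration and do not appear in the calculator: p_in and p_out are the prices per 1,000 tokens, t_in and t_out the average tokens per request, h the cache hit rate, s the surcharge per call, n the monthly requests per subscriber, and m the target gross margin. The placement of the surcharge outside the cache discount is an assumption consistent with the first example.

```latex
\begin{aligned}
c_{\text{req}} &= (1 - h)\left(\frac{t_{\text{in}}}{1000}\,p_{\text{in}}
                 + \frac{t_{\text{out}}}{1000}\,p_{\text{out}}\right) + s \\
C_{\text{sub}} &= n \cdot c_{\text{req}} \\
P_{\text{breakeven}} &= \frac{C_{\text{sub}}}{1 - \min(m,\ 0.95)}
\end{aligned}
```

For the first example, c_req = 0.9 × 0.003525 + 0.0005 = 0.0036725, C_sub = 200 × 0.0036725 ≈ 0.73, and P_breakeven ≈ 0.7345 / 0.70 ≈ 1.05.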
