Generative AI Inference Unit Cost Calculator
Convert token pricing into actionable unit economics for your AI product. Enter published input and output token rates (per 1,000 tokens, the unit the examples below work out to), typical prompt and completion sizes, optional cache hit rate, rerun surcharges, and subscriber usage to see per-request cost, cost per million inferences, monthly infrastructure cost per subscriber, and the breakeven subscription price at your target margin.
Results assume vendor pricing remains constant. Monitor API announcements for rate changes or new token bundles.
Examples
- $0.0015 input price, $0.0020 output price, 750 input tokens, 1,200 output tokens, 10% cache hit, $0.0005 surcharge, 200 requests, 30% margin ⇒ Per-request cost: $0.0037 (0.0037 per call) • Cost per 1,000,000 inferences: $3,672.50 • Monthly infrastructure cost per subscriber: $0.73 before margin • Usage basis: 200.00 requests per subscriber with 30% gross margin target • Breakeven subscription price: $1.05 per month • Cache hit rate reduces token spend by 10% • Surcharges per call included: $0.0005.
- $0.0030 input price, $0.0040 output price, 1,200 input tokens, 1,600 output tokens, no cache, no surcharge, 500 requests, 40% margin ⇒ Per-request cost: $0.0100 (0.0100 per call) • Cost per 1,000,000 inferences: $10,000.00 • Monthly infrastructure cost per subscriber: $5.00 before margin • Usage basis: 500.00 requests per subscriber with 40% gross margin target • Breakeven subscription price: $8.33 per month.
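The arithmetic behind both examples can be reproduced with a short sketch. This is not the calculator's actual code: the function name `unit_economics` and its parameter names are made up for illustration, and it assumes rates are quoted per 1,000 tokens, the cache hit rate scales token spend only, and the surcharge is added per call afterwards. Those assumptions are the ones that make the published example figures come out.

```python
def unit_economics(input_price, output_price, input_tokens, output_tokens,
                   cache_hit_rate=0.0, surcharge=0.0,
                   requests_per_month=0.0, target_margin=0.0):
    """Illustrative reconstruction of the calculator's outputs.

    Assumption: prices are USD per 1,000 tokens (inferred from the examples).
    """
    # Raw token spend for one request, before any cache savings.
    token_cost = (input_tokens / 1000) * input_price \
               + (output_tokens / 1000) * output_price
    # Cache hits reduce token spend; the surcharge is added in full.
    per_request = token_cost * (1 - cache_hit_rate) + surcharge
    monthly = per_request * requests_per_month
    # The tool is described as capping the margin at 95%.
    margin = min(target_margin, 0.95)
    return {
        "per_request": per_request,
        "per_million": per_request * 1_000_000,
        "monthly_per_subscriber": monthly,
        "breakeven_price": monthly / (1 - margin),
    }


first = unit_economics(0.0015, 0.0020, 750, 1200,
                       cache_hit_rate=0.10, surcharge=0.0005,
                       requests_per_month=200, target_margin=0.30)
second = unit_economics(0.0030, 0.0040, 1200, 1600,
                        requests_per_month=500, target_margin=0.40)

for result in (first, second):
    print(f"{result['per_million']:.2f} "
          f"{result['monthly_per_subscriber']:.2f} "
          f"{result['breakeven_price']:.2f}")
# prints "3672.50 0.73 1.05" then "10000.00 5.00 8.33"
```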
FAQ
How do I account for multiple models in one workflow?
Calculate the per-request cost for each model separately and add them together. To roll every pass into one metric, enter the combined cost of the additional passes as the surcharge input, and keep the primary model in the token fields so that it is not counted twice.
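A minimal sketch of that per-pass arithmetic, assuming rates per 1,000 tokens and a hypothetical two-pass workflow that reuses the rates from the examples above. The helper name `pass_cost` and the variable names are illustrative only.

```python
def pass_cost(input_price, output_price, input_tokens, output_tokens):
    """Token cost of one model pass; prices assumed to be per 1,000 tokens."""
    return (input_tokens / 1000) * input_price \
         + (output_tokens / 1000) * output_price


# Hypothetical workflow: a primary generation pass plus a cheaper second pass.
primary = pass_cost(0.0030, 0.0040, 1200, 1600)
secondary = pass_cost(0.0015, 0.0020, 750, 1200)

# Total cost of one end-to-end request across both models.
workflow_cost = primary + secondary

print(f"primary:   ${primary:.6f}")
print(f"secondary: ${secondary:.6f}")
print(f"workflow:  ${workflow_cost:.6f}")
```

Whichever passes are not already represented in the token fields are the ones whose summed cost belongs in the surcharge.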
What if I sell pay-as-you-go instead of subscriptions?
Use the per-request output directly as your floor price. Add your desired markup to arrive at a profitable usage-based rate.
How can I model request batching or higher cache efficiency?
Reduce the average input and output tokens or increase the cache hit rate to reflect batching improvements. The calculator automatically lowers per-request cost when fewer fresh tokens are billed.
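To see how sensitive per-request cost is to caching, a small sweep can help. The sketch below assumes rates per 1,000 tokens and that the cache hit rate scales token spend but not the surcharge, as the first example implies. `per_request_cost` is a hypothetical helper, not part of the tool.

```python
def per_request_cost(input_price, output_price, input_tokens, output_tokens,
                     cache_hit_rate=0.0, surcharge=0.0):
    """Per-request cost; prices assumed to be per 1,000 tokens."""
    token_cost = (input_tokens / 1000) * input_price \
               + (output_tokens / 1000) * output_price
    # Only the fresh (non-cached) share of tokens is billed.
    return token_cost * (1 - cache_hit_rate) + surcharge


# Sweep the cache hit rate using the inputs from the first example.
costs = []
for rate in (0.0, 0.10, 0.25, 0.50):
    cost = per_request_cost(0.0015, 0.0020, 750, 1200, rate, 0.0005)
    costs.append(cost)
    print(f"cache hit {rate:.0%}: ${cost:.7f} per request")
```

Lowering the token counts passed to the same function models batching gains in the same way.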
Can I include retriever and embedding costs?
Yes. Estimate the additional token or vector charges per call and enter them as a surcharge so the calculator folds them into unit economics.
What happens if my gross margin target exceeds 95%?
The tool caps gross margin at 95% to avoid division by zero as the margin approaches 100%. If you require a higher margin, treat the capped breakeven subscription price as a baseline and apply your own markup manually to the cost per subscriber output.
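The effect of the cap can be sketched as follows. The names `MAX_MARGIN`, `breakeven_price`, and `manual_price` are illustrative, and the manual formula is one possible way to price above the cap, not something the tool itself provides.

```python
MAX_MARGIN = 0.95  # cap described in the answer above


def breakeven_price(cost_per_subscriber, target_margin):
    """Price at the target gross margin, with the margin capped at 95%."""
    margin = min(target_margin, MAX_MARGIN)
    return cost_per_subscriber / (1 - margin)


def manual_price(cost_per_subscriber, target_margin):
    """Uncapped variant for manual use (assumption: same formula, no cap)."""
    if target_margin >= 1:
        raise ValueError("a gross margin of 100% or more has no finite price")
    return cost_per_subscriber / (1 - target_margin)


print(round(breakeven_price(5.00, 0.40), 2))  # 8.33, as in the second example
print(round(breakeven_price(5.00, 0.99), 2))  # 100.0, same as a 95% target
print(round(manual_price(5.00, 0.98), 2))     # 250.0
```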
Additional Information
- Cache hit rate reduces both prompt and completion tokens proportionally to reflect avoided recomputation or response reuse.
- Monthly infrastructure cost per subscriber multiplies per-request economics by expected calls so you can compare against subscription ARPU or seat pricing.
- Breakeven subscription pricing divides cost per subscriber by (1 − target gross margin), the share of revenue that remains to cover infrastructure cost once the margin is set aside.
- Surcharges capture guardrail reruns, safety classifier passes, vector lookups, or other workflow costs beyond raw token usage.
- Raising the target margin increases the breakeven subscription price; the tool caps margins at 95% to keep the math stable.
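Taken together, the notes above correspond to the following relationships. The symbols are introduced here for illustration and do not come from the tool: p_in and p_out are the input and output rates (assumed to be per 1,000 tokens), T_in and T_out the token counts, h the cache hit rate, s the per-call surcharge, N the monthly requests per subscriber, and m the target gross margin.

```latex
\begin{align*}
c_{\text{req}} &= (1 - h)\left(\frac{T_{\text{in}}}{1000}\,p_{\text{in}}
                 + \frac{T_{\text{out}}}{1000}\,p_{\text{out}}\right) + s \\
C_{\text{1M}}  &= 10^{6}\, c_{\text{req}} \\
C_{\text{sub}} &= N\, c_{\text{req}} \\
P_{\text{breakeven}} &= \frac{C_{\text{sub}}}{1 - \min(m,\ 0.95)}
\end{align*}
```

With the first example's inputs, c_req = 0.9 × (0.001125 + 0.0024) + 0.0005 = 0.0036725, which matches the published cost of $3,672.50 per million inferences.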