AI Inference GPU Cost Parity Calculator
Convert traffic, utilisation, and throughput assumptions into cost per 1,000 tokens so you can judge whether owning inference GPUs beats paying managed API rates.
Outputs are illustrative and exclude model training, storage, and bandwidth expenses. Validate with your actual infrastructure telemetry before making capacity commitments.
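For reference, here is a minimal sketch of the arithmetic behind the outputs, reconstructed from the worked examples below; the function name and signature are illustrative, not the calculator's actual code:

```python
def gpu_vs_api(tokens_per_month, gpu_rate_hr, utilisation,
               api_price_per_1k, sec_per_1k_tokens):
    """Compare monthly self-hosted GPU cost against a managed API quote."""
    units = tokens_per_month / 1_000                   # thousand-token units
    active_hours = units * sec_per_1k_tokens / 3_600   # pure decode time
    billed_hours = active_hours / utilisation          # idle/warmup time inflates the bill
    gpu_cost = billed_hours * gpu_rate_hr
    api_cost = units * api_price_per_1k
    return {
        "gpu_hours": billed_hours,
        "gpu_cost": gpu_cost,
        "gpu_cost_per_1k": gpu_cost / units,   # also the breakeven API price
        "api_cost": api_cost,
        "monthly_delta": gpu_cost - api_cost,  # positive means self-hosting costs more
    }
```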
Examples
- Tokens 120,000,000, GPU $2.40/hr, utilisation 55%, API $0.012 ⇒ GPUs run 1,200.00 hours monthly for $2,880.00 total ($0.024 per 1,000 tokens). The API at $0.012 per 1,000 tokens bills $1,440.00, so self-hosting costs $1,440.00 more each month unless API pricing rises to $0.024 per 1,000 tokens.
- Tokens 45,000,000, GPU $1.80/hr, utilisation 80%, API $0.018, 34.56 sec/1K ⇒ Higher latency drives 540.00 GPU hours for $972.00 total ($0.0216 per 1,000 tokens). The API quote of $0.018 per 1,000 tokens totals $810.00, so GPUs cost $162.00 more per month unless API pricing rises to $0.0216 per 1,000 tokens.
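Both examples check out against the sketch above. The first does not state a throughput, so 19.8 seconds per 1,000 tokens is the value implied by its 1,200 billed hours at 55% utilisation, not a figure from the calculator:

```python
gpu_vs_api(120_000_000, 2.40, 0.55, 0.012, 19.8)   # throughput implied (assumption)
# → 1,200.00 GPU hours, $2,880.00 total, $0.024/1K; API $1,440.00; delta +$1,440.00

gpu_vs_api(45_000_000, 1.80, 0.80, 0.018, 34.56)
# → 540.00 GPU hours, $972.00 total, $0.0216/1K; API $810.00; delta +$162.00
```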
FAQ
Can I compare multiple GPU types?
Yes. Run the calculator with each GPU's hourly rate, utilisation, and throughput to compare per-token costs before provisioning hardware.
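As a sketch, a short loop over candidate cards makes the comparison concrete; the names, rates, and throughputs below are hypothetical, and the loop reuses the gpu_vs_api function shown earlier:

```python
# Hypothetical candidates: (name, $/hr, utilisation, sec per 1K tokens)
candidates = [
    ("gpu-a", 2.40, 0.55, 19.8),
    ("gpu-b", 1.80, 0.80, 34.56),
]
for name, rate, util, sec_per_1k in candidates:
    result = gpu_vs_api(45_000_000, rate, util, 0.018, sec_per_1k)
    print(f"{name}: ${result['gpu_cost_per_1k']:.4f} per 1,000 tokens")
```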
How should I account for spot pricing or reserved instances?
Adjust the GPU hourly rate to the blended price you expect to pay after factoring in spot interruptions, reservation discounts, or committed use contracts.
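One way to estimate that blended rate, assuming you can forecast the share of monthly GPU hours served at each pricing tier (all shares and rates below are made up for illustration):

```python
# (share of monthly GPU hours, $/hr). Work interrupted on spot is assumed
# to be re-run at the on-demand rate, which is why it gets its own tier.
tiers = [
    (0.60, 0.90),  # spot
    (0.10, 2.40),  # on-demand backfill after spot interruptions
    (0.30, 1.50),  # reserved / committed-use discount
]
blended_rate_hr = sum(share * rate for share, rate in tiers)  # $1.23/hr
```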
Does this include networking or engineering costs?
No. Add those costs to the GPU hourly rate or as a separate per-token surcharge when evaluating build-versus-buy decisions.
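A sketch of both approaches, using a hypothetical $4,000/month networking and engineering overhead against the second example's figures:

```python
overhead_monthly = 4_000.0    # hypothetical networking + engineering cost, $/month
tokens_per_month = 45_000_000
billed_hours = 540.0          # from the second example above
base_rate_hr = 1.80

# Option 1: fold the overhead into the GPU hourly rate.
loaded_rate_hr = base_rate_hr + overhead_monthly / billed_hours   # ≈ $9.21/hr

# Option 2: keep it as a separate per-token surcharge.
surcharge_per_1k = overhead_monthly / (tokens_per_month / 1_000)  # ≈ $0.089/1K
```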
Additional Information
- Seconds per 1,000 tokens captures decode latency and batching efficiency; larger or unbatched models require higher values.
- Utilisation reflects the real duty cycle, including idle and warmup time; pushing it higher spreads the same GPU spend across more tokens.
- Breakeven pricing is expressed in USD per 1,000 tokens so you can line it up with API quotes or use it in enterprise negotiations; see the one-line derivation below.
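The monthly token volume cancels out of the breakeven price, so under the same assumed formula as the sketch above it reduces to a one-liner:

```python
def breakeven_per_1k(gpu_rate_hr, sec_per_1k_tokens, utilisation):
    """API price per 1,000 tokens at which self-hosting reaches cost parity."""
    return gpu_rate_hr * sec_per_1k_tokens / (3_600 * utilisation)

breakeven_per_1k(1.80, 34.56, 0.80)  # → 0.0216, matching the second example
```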