Generative AI API Rate Card Profitability Checker

Plug in prompt size, wholesale token cost, price points, and usage to confirm that a generative AI API hits your unit economics. Layer in overhead and an optional target margin to surface the minimum viable price before publishing rate cards.

Average total tokens processed per user request, including prompt and completion.
Wholesale rate you pay the model provider per 1,000 tokens.
What you intend to charge customers for each request or metered call.
Average number of billable requests each user sends per month.
Include support, infrastructure, or moderation costs per request. Leave blank to assume $0.00.
Enter your desired gross margin to see the minimum price per request that meets it. Leave blank to skip.

Pricing model reference only—validate against real usage telemetry before publishing rate cards.

Examples

  • 2,800 tokens, $0.0040 provider cost, $0.180 price, 60 requests, no overhead, 65% target margin ⇒ Cost per request: $0.01 | Gross margin per request: $0.17 | Monthly margin per user: $10.13 | Breakeven price per 1K tokens: $0.0040 | Target price per request (@ margin input): $0.03
  • 1,800 tokens, $0.0035 provider cost, $0.140 price, 120 requests, $0.015 overhead, 55% target margin ⇒ Cost per request: $0.01 | Gross margin per request: $0.12 | Monthly margin per user: $14.24 | Breakeven price per 1K tokens: $0.0118 | Target price per request (@ margin input): $0.05

FAQ

How do cached responses affect the result?

Lower the average tokens per request or add a negative overhead line reflecting cache hit savings to represent memoization benefits.

Can I model multiple providers?

Average the per-1K token cost using expected routing percentages or run separate scenarios for each provider mix.

What if I bill per token instead of per request?

Set the planned price per request to the average bill you expect for the token bucket so the calculator returns comparable margins.

How should I choose a target margin percentage?

Match your SaaS gross margin benchmark or board target. Growth-stage teams often aim for 60–70% on usage-based features to offset R&D and support costs.

Additional Information

  • Provider costs scale linearly with token usage—refresh the inputs whenever you expand context windows or add tool calls.
  • Include optional overhead to capture observability, vector search, or moderation expenses triggered on every request.
  • The target margin helper converts your margin goal into the minimum viable price per request.
  • Run heavy- and light-user scenarios to stress-test pricing tiers and auto-scaling commitments.