RAG Query Cost Calculator
Estimate the marginal cost of serving one RAG response by combining retrieved context size (chunks × tokens per chunk), model and embedding pricing, cache hit rate, and any reranker or vector database charges. Fill in the optional fields to mirror your actual architecture.
Pricing varies by provider and changes over time. Update the rates regularly, and for production systems validate the estimates against live billing dashboards.
Examples
- 8 chunks × 1,500 tokens, 600-token answer, LLM $0.0020 per 1K tokens, embedding $0.0001 per 1K tokens, 0% cache hit rate ⇒ Cost per query: $0.0264 (prompt 12,000 tokens + completion 600 tokens) • Cache savings: $0.0000 per query • Model spend share: 95.45% • 1K queries/week ≈ $26.40.
- 6 chunks × 900 tokens, 400-token answer, LLM $0.0015 per 1K tokens, embedding $0.0001 per 1K tokens, 40% cache hit rate, reranker $0.0010 per query, vector DB $0.0002 per 1K tokens ⇒ Cost per query: $0.0074 (prompt 5,400 tokens + completion 400 tokens) • Cache savings: $0.0039 per query • Model spend share: 70.16% • 1K queries/week ≈ $7.44.
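The figures in both examples are consistent with the per-query model sketched below. It is reconstructed from the example arithmetic, not taken from the calculator's source, and it assumes: LLM and embedding prices are per 1K tokens (embedding applied to the retrieved context tokens), the vector DB charge is per 1K context tokens, the reranker fee is a flat per-query charge, the second example uses the same $0.0001 embedding price as the first, and the cache hit rate discounts only model and reranker spend. The function and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CostBreakdown:
    prompt_tokens: int
    completion_tokens: int
    model: float          # generator spend after cache hits
    reranker: float       # reranker spend after cache hits
    vector: float         # vector DB spend (not reduced by caching)
    embedding: float      # embedding spend (not reduced by caching)
    cache_savings: float  # model + reranker spend avoided by cache hits

    @property
    def total(self) -> float:
        return self.model + self.reranker + self.vector + self.embedding

    @property
    def model_share(self) -> float:
        return self.model / self.total if self.total else 0.0


def rag_query_cost(
    chunks: int,
    tokens_per_chunk: int,
    answer_tokens: int,
    llm_per_1k: float,
    embedding_per_1k: float = 0.0,
    cache_hit_rate: float = 0.0,
    reranker_per_query: float = 0.0,
    vector_per_1k: float = 0.0,
) -> CostBreakdown:
    """Hypothetical per-query cost model inferred from the worked examples."""
    prompt = chunks * tokens_per_chunk   # retrieved context tokens
    miss = 1.0 - cache_hit_rate          # fraction of queries that reach the LLM
    model_full = (prompt + answer_tokens) / 1000 * llm_per_1k
    cacheable = model_full + reranker_per_query
    return CostBreakdown(
        prompt_tokens=prompt,
        completion_tokens=answer_tokens,
        model=model_full * miss,
        reranker=reranker_per_query * miss,
        vector=prompt / 1000 * vector_per_1k,
        embedding=prompt / 1000 * embedding_per_1k,
        cache_savings=cacheable * cache_hit_rate,
    )


ex1 = rag_query_cost(8, 1500, 600, llm_per_1k=0.0020, embedding_per_1k=0.0001)
ex2 = rag_query_cost(6, 900, 400, llm_per_1k=0.0015, embedding_per_1k=0.0001,
                     cache_hit_rate=0.40, reranker_per_query=0.0010,
                     vector_per_1k=0.0002)
for ex in (ex1, ex2):
    print(f"${ex.total:.4f}/query  savings ${ex.cache_savings:.4f}  "
          f"model share {ex.model_share:.2%}  1K/week ${ex.total * 1000:.2f}")
# $0.0264/query  savings $0.0000  model share 95.45%  1K/week $26.40
# $0.0074/query  savings $0.0039  model share 70.16%  1K/week $7.44
```

The second example's unrounded cost is $0.00744 per query, which is why it displays as $0.0074 but scales to $7.44 per 1K queries.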
FAQ
How do I include fixed infrastructure cost?
Use the per-query cost here for marginal spend and add a separate fixed monthly line item (e.g., hosting) when building your full P&L.
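As a rough illustration, a monthly budget is the fixed line item plus per-query cost times volume. The $500 hosting bill and 100,000 queries per month below are hypothetical placeholders; the $0.0264 per-query figure comes from the first example.

```python
def monthly_cost(per_query: float, queries_per_month: int, fixed_monthly: float) -> float:
    """Fixed infrastructure spend plus marginal (per-query) spend for one month."""
    return fixed_monthly + per_query * queries_per_month


# Hypothetical inputs: $500/month hosting and 100,000 queries/month;
# the $0.0264 per-query cost is the first worked example.
queries = 100_000
total = monthly_cost(per_query=0.0264, queries_per_month=queries, fixed_monthly=500.0)
print(f"monthly total: ${total:,.2f}")
print(f"effective cost per query: ${total / queries:.4f}")
```

Keeping the two terms apart shows how the fixed share per query shrinks as volume grows, which a single blended per-query figure would hide.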
What if I use two models (rerank + generator)?
Enter the reranker (or reranking API) fee in its dedicated field and keep the generator's token price in the LLM cost input. The calculator reports them as separate line items, so the model spend share shows how much of each query's cost goes to the generator.
Can I model semantic cache warm-up?
Yes. Set the cache hit rate to the hit ratio you expect at each traffic level (e.g., 10% during launch vs. 60% at steady state) and compare the resulting per-query costs.
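A small sweep makes the warm-up effect concrete. This sketch reuses the second example's inputs and assumes, as the examples imply, per-1K-token LLM, embedding, and vector DB prices, a flat per-query reranker fee, and caching that removes only generator and reranker spend. The `cost_per_query` helper is hypothetical.

```python
def cost_per_query(hit_rate: float) -> float:
    """Per-query cost for the second example's inputs at a given cache hit rate.

    Hypothetical model inferred from the worked examples: cache hits remove
    generator and reranker spend, but vector DB and embedding spend remain.
    """
    prompt_tokens, answer_tokens = 6 * 900, 400
    llm = (prompt_tokens + answer_tokens) / 1000 * 0.0015  # generator, per 1K tokens
    reranker = 0.0010                                       # flat fee per query
    vector = prompt_tokens / 1000 * 0.0002                  # not reduced by caching
    embedding = prompt_tokens / 1000 * 0.0001               # not reduced by caching
    return (llm + reranker) * (1 - hit_rate) + vector + embedding


for phase, hit_rate in [("launch", 0.10), ("steady state", 0.60)]:
    print(f"{phase:>12}: ${cost_per_query(hit_rate):.4f}/query")
```

Because vector and embedding spend is untouched by caching, the cost never falls below that floor even at a 100% hit rate.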
Additional Information
- Embedding cost covers fresh query embeddings; cached responses still require embedding unless you skip retrieval entirely.
- A cache hit removes both model and reranker spend for responses served from cache, while vector and embedding costs still apply.
- Set tokens per chunk to the length that remains after any truncation following reranking; overestimating it inflates the marginal cost.
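To see what an overstated chunk length costs, this sketch compares the first example's 1,500 tokens per chunk with a hypothetical post-reranking length of 1,000 tokens. It assumes, as the worked examples imply, that prompt tokens equal chunks × tokens per chunk and that the LLM and embedding prices both apply per 1K tokens (embedding on the retrieved context). The helper name is hypothetical.

```python
def uncached_cost(chunks: int, tokens_per_chunk: int, answer_tokens: int,
                  llm_per_1k: float, embedding_per_1k: float) -> float:
    """Hypothetical uncached per-query cost: generator on prompt + answer,
    embedding on the retrieved context tokens."""
    prompt = chunks * tokens_per_chunk
    return (prompt + answer_tokens) / 1000 * llm_per_1k + prompt / 1000 * embedding_per_1k


full = uncached_cost(8, 1500, 600, 0.0020, 0.0001)       # first worked example
truncated = uncached_cost(8, 1000, 600, 0.0020, 0.0001)  # hypothetical truncated length
print(f"overestimate adds ${full - truncated:.4f}/query "
      f"({(full - truncated) / truncated:.0%} above the truncated cost)")
```

Since the prompt dominates token count in both examples, the error in tokens per chunk passes almost directly into the per-query cost.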