RAG Recall@K Calculator

Measure retrieval coverage for retrieval-augmented generation by dividing the number of relevant documents retrieved within the top-K results by the total number of relevant documents available.

Inputs

  • Relevant documents in corpus: count of ground-truth documents that fully answer the query.
  • Relevant documents retrieved: how many of the ground-truth documents appeared in the first K results.
  • Total documents returned: defaults to the relevant retrieved count; supply it to compute Precision@K.
  • β: defaults to 1; increase above 1 to weight recall more heavily than precision in the Fβ score.
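
A minimal Python sketch of the same arithmetic (the function and parameter names are illustrative, not part of the calculator):

    def recall_precision_fbeta(relevant_total, relevant_retrieved,
                               retrieved_total=None, beta=1.0):
        # A blank "total returned" falls back to the relevant retrieved
        # count, matching the calculator's default behaviour.
        if retrieved_total is None:
            retrieved_total = relevant_retrieved
        # Cap at 1.0 so labelling errors cannot push a metric above 100%.
        recall = min(1.0, relevant_retrieved / relevant_total)
        precision = min(1.0, relevant_retrieved / retrieved_total) if retrieved_total else 0.0
        b2 = beta ** 2
        denom = b2 * precision + recall
        fbeta = (1 + b2) * precision * recall / denom if denom else 0.0
        return recall, precision, fbeta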

Evaluation helper; pair scores with qualitative review and production telemetry before shipping ranking changes.

Examples

  • 12 relevant docs, 8 retrieved, K = 20, β = 1 ⇒ Recall 0.67 (66.67%), Precision 0.40 (40.00%), F1 0.50 (50.00%).
  • 5 relevant docs, 4 retrieved, leave K blank, β = 2 ⇒ Recall 0.80 (80.00%), Precision defaults to 1.00, F2 0.83 (83.33%).
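
Both rows can be reproduced with the sketch above:

    print(recall_precision_fbeta(12, 8, retrieved_total=20))  # ≈ (0.67, 0.40, 0.50)
    print(recall_precision_fbeta(5, 4, beta=2.0))             # ≈ (0.80, 1.00, 0.83)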

FAQ

Why measure Recall@K for RAG systems?

High recall indicates that answer-bearing passages consistently appear in the retrieved context window, which reduces hallucinations and makes grounded responses more likely.

What should I set for K?

Choose the number of documents you actually stuff into the generator prompt or reranker—common values are 5, 10, or 20 depending on chunk size and token budget.
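
A quick way to sanity-check the choice is to sweep K on a labelled query and watch where recall plateaus; in this sketch, relevant_ids and ranked_ids stand in for your ground-truth labels and your retriever's ranked output:

    relevant_ids = {"d3", "d7", "d9", "d12"}        # hypothetical ground truth
    ranked_ids = ["d7", "d1", "d3", "d5", "d9",
                  "d2", "d8", "d12", "d4", "d6"]    # hypothetical ranking

    for k in (5, 10, 20):
        hits = len(relevant_ids & set(ranked_ids[:k]))
        print(f"Recall@{k} = {hits / len(relevant_ids):.2f}")  # 0.75, 1.00, 1.00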

How do I count relevant documents in the corpus?

Label a validation set manually or with weak supervision, ensuring you count every passage that contains sufficient information to answer the query without additional context.
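
With even a small labelled set you can then average Recall@K across queries; the structure below (queries, passage ids, and judgments) is purely illustrative:

    # Query -> passage ids judged sufficient to answer it.
    qrels = {"q1": {"p2", "p5"}, "q2": {"p1", "p3", "p8"}}
    # Query -> the retriever's top-K passage ids.
    runs = {"q1": ["p5", "p9", "p2"], "q2": ["p4", "p1", "p7"]}

    recalls = [len(qrels[q] & set(runs[q])) / len(qrels[q]) for q in qrels]
    print(sum(recalls) / len(recalls))  # mean Recall@K, here ≈ 0.67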

Can recall exceed 1?

No. If the retrieved relevant count exceeds the total number of relevant documents, the result is capped at 1, so review your labelling or deduplicate overlapping passages (see the sketch below).
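
If overlapping chunks are the culprit, a simple normalisation pass before counting can help; this sketch collapses exact duplicates after lower-casing and whitespace cleanup (real pipelines often need fuzzier matching):

    def dedupe(passages):
        seen, unique = set(), []
        for text in passages:
            key = " ".join(text.lower().split())  # normalise case and spacing
            if key not in seen:
                seen.add(key)
                unique.append(text)
        return unique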

Additional Information

  • Recall@K expresses coverage of the relevant knowledge base; values range from 0 to 1.
  • Precision@K is optional and only computed when the total documents returned is supplied.
  • The Fβ score emphasises recall when β > 1 and precision when β < 1.
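
For reference, the standard Fβ definition combines Precision@K (P) and Recall@K (R) as:

    Fβ = (1 + β²) · P · R / (β² · P + R)

β = 1 recovers the familiar F1 harmonic mean, while β = 2 treats recall as twice as important as precision.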