GenAI P95 Latency Budget Calculator
Estimate a 95th percentile (P95) latency from observed median and p99 values, assuming a log-normal distribution, then add configurable overheads and compare the result against a target budget.
Intended for SLO design; validate with production telemetry before committing to user-facing guarantees.
Examples
- Median 420 ms, p99 980 ms, 60 ms stream warm-up, 35 ms jitter, target 750 ms ⇒ 859.59 ms P95 latency (over budget by 109.59 ms)
- Median 275 ms, p99 520 ms, 25 ms warm-up, 18 ms queueing, target 650 ms ⇒ 474.47 ms P95 latency (headroom 175.53 ms)
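The two examples above can be reproduced with a short script. This is a minimal sketch, not the calculator's actual source: the function name `p95_budget` and its parameters are hypothetical, and it assumes the standard log-normal fit in which the median fixes the location and the p99 fixes the spread.

```python
from math import exp, log
from statistics import NormalDist

# Standard normal quantiles for the 95th and 99th percentiles.
_STD_NORMAL = NormalDist()
Z95 = _STD_NORMAL.inv_cdf(0.95)
Z99 = _STD_NORMAL.inv_cdf(0.99)


def p95_budget(median_ms, p99_ms, overheads_ms, target_ms):
    """Return (total P95, headroom) in ms, rounded to two decimals.

    Hypothetical helper: assumes a log-normal fit through the median
    and p99, with overheads added on top of the modelled P95.
    """
    if not 0 < median_ms < p99_ms:
        raise ValueError("expected 0 < median < p99")
    # The median of a log-normal is exp(mu), so only sigma needs solving.
    sigma = log(p99_ms / median_ms) / Z99
    modelled_p95 = median_ms * exp(Z95 * sigma)
    total_p95 = modelled_p95 + sum(overheads_ms)
    headroom = target_ms - total_p95  # negative means over budget
    return round(total_p95, 2), round(headroom, 2)


print(p95_budget(420, 980, [60, 35], 750))  # (859.59, -109.59)
print(p95_budget(275, 520, [25, 18], 650))  # (474.47, 175.53)
```

The overheads are passed as a list so that any mix of warm-up, jitter, or queueing terms can be summed without changing the function.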
FAQ
Why use a log-normal distribution?
Latency in distributed systems is typically right-skewed because delays from successive stages tend to compound multiplicatively. A log-normal distribution captures that shape and is a reasonable approximation when only a few percentiles are known.
How often should I refresh the percentiles?
Update the median and p99 inputs whenever the model, tokenizer, or infrastructure changes materially. Many teams recompute daily or per deployment to capture drift.
Can I model batching or streaming separately?
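Yes. Run the calculator for each processing mode with its own percentile data, then combine the resulting P95 figures weighted by request share. Treat the weighted figure as a rough blend: the true P95 of the combined traffic is the 95th percentile of the mixed distribution, which a weighted average only approximates.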
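When a blended figure matters, the P95 of the combined traffic can be estimated directly from the mixture of per-mode distributions. The sketch below is illustrative only: `mixture_p95` and the tuple layout `(share, median_ms, p99_ms, overhead_ms)` are hypothetical, each mode is assumed to be log-normal with a fixed additive overhead, and the quantile is found by bisection on the mixture CDF.

```python
from math import log
from statistics import NormalDist

_STD_NORMAL = NormalDist()
Z99 = _STD_NORMAL.inv_cdf(0.99)


def mixture_p95(modes):
    """Estimate the P95 (ms) of traffic mixed from several modes.

    modes: list of (share, median_ms, p99_ms, overhead_ms) tuples.
    Assumes each mode is log-normal, shifted by its overhead.
    """
    total_share = sum(share for share, *_ in modes)
    params = []
    for share, median, p99, overhead in modes:
        mu = log(median)
        sigma = log(p99 / median) / Z99
        params.append((share / total_share, mu, sigma, overhead))

    def cdf(x):
        # Share-weighted sum of the shifted log-normal CDFs.
        total = 0.0
        for weight, mu, sigma, overhead in params:
            if x > overhead:
                z = (log(x - overhead) - mu) / sigma
                total += weight * _STD_NORMAL.cdf(z)
        return total

    # Bracket: the mixture CDF is at most 0.5 at the smallest shifted
    # median and at least 0.99 at the largest shifted p99.
    lo = min(median + overhead for _, median, _, overhead in modes)
    hi = max(p99 + overhead for _, _, p99, overhead in modes)
    for _ in range(100):
        mid = (lo + hi) / 2
        if cdf(mid) < 0.95:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical 70/30 split between two modes.
blended = mixture_p95([(0.7, 275, 520, 43), (0.3, 420, 980, 95)])
print(round(blended, 2))
```

With a single mode the function reduces to the ordinary P95 plus overhead, which gives a quick sanity check before mixing several modes.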
What if my p99 is unstable?
Use a longer observation window or trimmed p99.5 values to reduce noise. You can also enter a synthetic p99 taken from load tests.
Additional Information
- Assumes latency samples follow a log-normal distribution parameterised by the median and p99 values.
- Overhead inputs (such as warm-up, jitter, or queueing) are added to the modelled P95 of the prompt runtime.
- Negative headroom indicates the pipeline exceeds the target P95 budget.
- Outputs are rounded to two decimal places in milliseconds.
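The assumptions listed above can be written out as formulas. This is a sketch of the standard log-normal fit rather than a statement of the calculator's internals; the symbols m (median), p_99, o_k (overhead inputs), and T (target) are introduced here for illustration, and z_q denotes the standard normal quantile.

```latex
\begin{align*}
\mu &= \ln m, \qquad
\sigma = \frac{\ln(p_{99}/m)}{z_{0.99}}, \qquad z_{0.99} \approx 2.3263 \\
P_{95}^{\text{model}} &= m \, \exp\!\left(z_{0.95}\,\sigma\right), \qquad z_{0.95} \approx 1.6449 \\
P_{95}^{\text{total}} &= P_{95}^{\text{model}} + \sum_k o_k, \qquad
\text{headroom} = T - P_{95}^{\text{total}}
\end{align*}
```

For the first example (m = 420 ms, p_99 = 980 ms), this gives σ ≈ 0.3642 and a modelled P95 of about 764.59 ms; adding 95 ms of overhead yields 859.59 ms, which matches the figure quoted in Examples.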