Synthetic Data Coverage Score Calculator

Combine structural coverage, distribution alignment, and explicit treatment of rare scenarios to benchmark the completeness of a synthetic data portfolio before deployment.

  • Total production scenarios: distinct production scenarios or classes the synthetic set must represent.
  • Covered scenarios: number of production scenarios that have at least one high-quality synthetic analogue.
  • Distribution divergence index: Jensen-Shannon divergence or population stability index, scaled between 0 (perfect alignment) and 1 (complete mismatch).
  • Critical scenario coverage: fraction of high-risk or rare scenarios replicated. Defaults to 0.50 when blank.
  • Critical scenario weight: optional weight applied to critical scenario coverage. Defaults to 0.15 when omitted.

Data governance aid. Validate synthetic datasets with qualitative reviews, privacy assessments, and downstream model testing before production use.

Examples

  • 320 production scenarios, 290 covered, divergence 0.18, rare coverage 0.75 with weight 0.20 ⇒ Base coverage 90.63%, alignment 82.00%, final score 89.01%.
  • 480 scenarios, 360 covered, divergence 0.32, leaving optional fields blank ⇒ Base coverage 75.00%, alignment 68.00%, rare coverage default 50.00% at 15.00% weight, score 68.80%.
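Both examples are consistent with a plain weighted sum: 50% of base coverage, 35% of alignment (1 minus divergence), plus the critical weight times rare coverage. The sketch below reproduces them under that reading. The function name `coverage_score` and the clamp to the 0–1 range are assumptions for illustration, not the calculator's published implementation.

```python
BASE_WEIGHT = 0.50          # fixed, per the FAQ
DISTRIBUTION_WEIGHT = 0.35  # fixed, per the FAQ


def coverage_score(total, covered, divergence, rare_coverage=None, rare_weight=None):
    """Return (base coverage, alignment, composite score) as fractions."""
    if total <= 0 or not 0 <= covered <= total:
        raise ValueError("need total > 0 and 0 <= covered <= total")
    if not 0.0 <= divergence <= 1.0:
        raise ValueError("divergence must be scaled to the 0-1 band")

    # Blank optional fields fall back to the documented defaults.
    rare_coverage = 0.50 if rare_coverage is None else rare_coverage
    rare_weight = 0.15 if rare_weight is None else rare_weight

    base = covered / total
    alignment = 1.0 - divergence
    raw = (BASE_WEIGHT * base
           + DISTRIBUTION_WEIGHT * alignment
           + rare_weight * rare_coverage)

    # Assumption: clamp so the result stays inside the stated 0%-100% range,
    # since a critical weight above 0.15 lets the raw sum exceed 1.
    return base, alignment, min(1.0, max(0.0, raw))


# The two worked examples: explicit optional fields, then defaults.
for args in [(320, 290, 0.18, 0.75, 0.20), (480, 360, 0.32)]:
    base, alignment, score = coverage_score(*args)
    print(f"base {base:.4f}, alignment {alignment:.4f}, score {score:.4f}")
```

For the first example the terms are 0.453125 + 0.287 + 0.15 = 0.890125, which rounds to the 89.01% shown above. For the second they are 0.375 + 0.238 + 0.075 = 0.688, or 68.80%.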

FAQ

What qualifies as a critical scenario?

Label scenarios that drive safety, compliance, or revenue exposure as critical. Examples include rare failure modes or high-value customer journeys that require higher coverage guarantees.

How do I derive the divergence index?

Compute a Jensen-Shannon divergence, population stability index, or maximum mean discrepancy between real and synthetic feature distributions, then normalise the result to the 0–1 band used here.
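As one possible way to do this, the sketch below computes a Jensen-Shannon divergence between two histograms of the same feature. With base-2 logarithms the value already lies between 0 and 1, so no further normalisation is needed. The function name `js_divergence` and the use of pre-binned counts are assumptions; how you bin features and combine per-feature values into one index is left to your own pipeline.

```python
import math


def js_divergence(real_counts, synth_counts):
    """Jensen-Shannon divergence (log base 2) between two histograms on shared bins."""
    if len(real_counts) != len(synth_counts):
        raise ValueError("histograms must share the same bins")
    real_total, synth_total = sum(real_counts), sum(synth_counts)
    if real_total <= 0 or synth_total <= 0:
        raise ValueError("each histogram needs a positive total")

    # Convert counts to probabilities and build the midpoint mixture.
    p = [c / real_total for c in real_counts]
    q = [c / synth_total for c in synth_counts]
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Bins with a_i == 0 contribute nothing; b_i > 0 whenever a_i > 0
        # because b is the mixture of both distributions.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical bin counts for a single feature.
print(js_divergence([120, 300, 80], [100, 310, 90]))
```

Identical histograms give 0 and histograms with no overlapping bins give 1, which matches the 0 (perfect alignment) to 1 (complete mismatch) scale the calculator expects.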

Can I change the component weights?

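Yes, but only one of them. Adjust the critical scenario weight field to emphasise or down-weight rare cases. The base coverage and distribution alignment weights stay fixed at 50% and 35% to maintain comparability across teams. The three weights sum to 100% only at the default critical weight of 0.15, so compare scores across teams only when they use the same critical weight.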

Additional Information

  • Result unit: composite score expressed as a percentage from 0% to 100%.
  • Distribution divergence must be scaled between 0 and 1. PSI and KL divergence are unbounded, so rescale them to this band before entry.
  • When the optional fields are blank, the calculator assumes 50% coverage of critical (rare) scenarios at a 15% weight.
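The page does not say how PSI or KL divergence should be converted to the 0–1 scale. The sketch below shows one option, assuming binned proportions as input: compute PSI, then squash it with 1 − exp(−x). The names `psi` and `squash`, the `eps` floor for empty bins, and the exponential mapping itself are illustrative assumptions, so agree on a single mapping within your team before comparing scores.

```python
import math


def psi(real_props, synth_props, eps=1e-6):
    """Population stability index between two binned proportion vectors."""
    if len(real_props) != len(synth_props):
        raise ValueError("both vectors must use the same bins")
    total = 0.0
    for p, q in zip(real_props, synth_props):
        # Floor empty bins so the log ratio stays finite (assumed workaround).
        p, q = max(p, eps), max(q, eps)
        total += (p - q) * math.log(p / q)
    return total


def squash(value):
    """Map a non-negative, unbounded divergence into [0, 1).

    Assumed mapping for illustration; not the calculator's documented method.
    """
    if value < 0:
        raise ValueError("divergence must be non-negative")
    return 1.0 - math.exp(-value)


# Hypothetical bin proportions for one feature.
raw = psi([0.70, 0.30], [0.50, 0.50])
print(raw, squash(raw))
```

The mapping keeps 0 at 0, grows monotonically, and approaches 1 as the raw divergence becomes large, so the ordering of datasets by divergence is preserved.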