Synthetic Data Coverage Score Calculator
Combine structural coverage, distribution alignment, and explicit treatment of rare scenarios to benchmark the completeness of a synthetic data portfolio before deployment.
Data governance aid. Validate synthetic datasets with qualitative reviews, privacy assessments, and downstream model testing before production use.
Examples
- 320 production scenarios, 290 covered, divergence 0.18, rare coverage 0.75 with weight 0.20 ⇒ Base coverage 90.63%, alignment 82.00%, final score 89.01%.
- 480 scenarios, 360 covered, divergence 0.32, optional fields left blank ⇒ Base coverage 75.00%, alignment 68.00%, rare coverage default 50.00% at 15.00% weight, score 68.80%.
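Both examples match a weighted sum: 50% of base coverage (covered ÷ total scenarios), plus 35% of alignment (1 minus divergence), plus the critical weight times rare coverage. The sketch below reproduces them under that reading. The function name `coverage_score` is hypothetical and is not the calculator's published code. The final clamp to the 0 to 1 range is also an assumption, added because a critical weight above 15% pushes the weight total past 100%.

```python
from typing import Optional

# Weights stated on this page; only the critical weight is user-adjustable.
BASE_WEIGHT = 0.50
ALIGNMENT_WEIGHT = 0.35
DEFAULT_RARE_COVERAGE = 0.50
DEFAULT_CRITICAL_WEIGHT = 0.15


def coverage_score(total_scenarios: int, covered_scenarios: int, divergence: float,
                   rare_coverage: Optional[float] = None,
                   critical_weight: Optional[float] = None) -> dict:
    """Hypothetical helper: return each component and the composite, all as fractions."""
    if total_scenarios <= 0:
        raise ValueError("total_scenarios must be positive")
    if not 0 <= covered_scenarios <= total_scenarios:
        raise ValueError("covered_scenarios must lie between 0 and total_scenarios")
    if not 0.0 <= divergence <= 1.0:
        raise ValueError("divergence must already be normalised to [0, 1]")

    # Blank optional fields fall back to the documented defaults (50% at 15% weight).
    rare = DEFAULT_RARE_COVERAGE if rare_coverage is None else rare_coverage
    weight = DEFAULT_CRITICAL_WEIGHT if critical_weight is None else critical_weight
    if not (0.0 <= rare <= 1.0 and 0.0 <= weight <= 1.0):
        raise ValueError("rare_coverage and critical_weight must lie in [0, 1]")

    base = covered_scenarios / total_scenarios
    alignment = 1.0 - divergence  # inferred from the worked examples (0.18 -> 82%)
    raw = BASE_WEIGHT * base + ALIGNMENT_WEIGHT * alignment + weight * rare
    # Assumption: clamp to the 0-100% result range stated under Additional Information.
    score = min(max(raw, 0.0), 1.0)
    return {"base": base, "alignment": alignment, "rare": rare,
            "weight": weight, "score": score}
```

Calling `coverage_score(320, 290, 0.18, rare_coverage=0.75, critical_weight=0.20)` gives a score of about 0.890125 (89.01%), and `coverage_score(480, 360, 0.32)` gives about 0.688 (68.80%). Both values are subject to floating-point error.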
FAQ
What qualifies as a critical scenario?
Label scenarios that drive safety, compliance, or revenue exposure as critical. Examples include rare failure modes or high-value customer journeys that require higher coverage guarantees.
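Once scenarios are labelled, the scenario counts and the rare coverage fraction can be derived from a scenario inventory. The sketch below assumes several things, none of which the page defines. The inventory is a hypothetical `Scenario` record per production scenario. A scenario counts as covered once it has at least `min_examples` synthetic records, which is a hypothetical threshold. Rare coverage is the share of critical scenarios that are covered, matching the "half of rare cases are represented" default.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class Scenario:
    # Hypothetical inventory record: one row per production scenario.
    name: str
    critical: bool        # drives safety, compliance, or revenue exposure
    synthetic_count: int  # synthetic records that exercise this scenario


def coverage_inputs(scenarios: Iterable[Scenario],
                    min_examples: int = 1) -> Tuple[int, int, Optional[float]]:
    """Return (total, covered, rare_coverage); rare_coverage is None if nothing is critical."""
    items = list(scenarios)
    # Assumption: "covered" means at least min_examples synthetic records.
    covered = [s for s in items if s.synthetic_count >= min_examples]
    critical = [s for s in items if s.critical]
    critical_covered = [s for s in critical if s.synthetic_count >= min_examples]
    rare = len(critical_covered) / len(critical) if critical else None
    return len(items), len(covered), rare
```

A `None` rare coverage corresponds to leaving the optional field blank, so the calculator's 50% default applies.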
How do I derive the divergence index?
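Compute a Jensen-Shannon divergence, population stability index (PSI), or maximum mean discrepancy (MMD) between real and synthetic feature distributions, then normalise the result to the 0–1 band used here. Jensen-Shannon divergence computed with base-2 logarithms already lies in that band. PSI is unbounded and the scale of MMD depends on the kernel, so both need rescaling before entry. The calculator uses 1 minus the entered value as the alignment component.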
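As an illustration of the first option, the sketch below computes Jensen-Shannon divergence with base-2 logarithms for one feature that has already been binned into matching histograms of real and synthetic counts. It needs no further normalisation because the result always falls between 0 (identical distributions) and 1 (disjoint support). The function name `js_divergence` is illustrative.

```python
import math
from typing import Sequence


def js_divergence(real_counts: Sequence[float], synthetic_counts: Sequence[float]) -> float:
    """Jensen-Shannon divergence (log base 2) between two histograms over the same bins."""
    if len(real_counts) != len(synthetic_counts):
        raise ValueError("histograms must share the same bins")
    p_total, q_total = sum(real_counts), sum(synthetic_counts)
    if p_total <= 0 or q_total <= 0:
        raise ValueError("each histogram needs a positive total count")
    p = [c / p_total for c in real_counts]
    q = [c / q_total for c in synthetic_counts]
    m = [(a + b) / 2 for a, b in zip(p, q)]  # mixture distribution

    def kl(a: Sequence[float], b: Sequence[float]) -> float:
        # Bins with a_i == 0 contribute nothing; elsewhere b_i >= a_i / 2 > 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return min(max(jsd, 0.0), 1.0)  # guard against floating-point drift
```

How per-feature values are combined into the single divergence index (for example, a plain or weighted average) is not specified here and is left to the team.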
Can I change the component weights?
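Partly. Adjust the critical scenario weight field to emphasise or down-weight rare cases. The base and distribution weights stay fixed at 50% and 35% to maintain comparability across teams. Because those fixed weights already total 85%, the three weights sum to 100% only at the default 15% critical weight; at 20%, as in the first example, they total 105%.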
Additional Information
- Result unit: composite score expressed as a percentage from 0% to 100%.
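- Enter distribution divergence on a 0 to 1 scale; the calculator uses 1 minus this value as the alignment component (a divergence of 0.18 gives 82% alignment). PSI and KL divergence are unbounded, so rescale them before entry.
- When the optional fields are left blank, the calculator assumes that half of rare cases are represented (rare coverage 50%) and applies a 15% critical scenario weight.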
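PSI and KL divergence must be rescaled before entry, but the page does not prescribe a mapping. The sketch below computes PSI from matching real and synthetic histograms and then applies one hypothetical monotone squash, 1 − exp(−x), which maps any non-negative divergence into [0, 1). The function names, the `eps` floor for empty bins, and the choice of squashing function are all assumptions. Whatever mapping a team picks should be applied consistently so scores stay comparable.

```python
import math
from typing import Sequence


def psi(expected_counts: Sequence[float], actual_counts: Sequence[float],
        eps: float = 1e-6) -> float:
    """Population stability index between real (expected) and synthetic (actual) histograms."""
    if len(expected_counts) != len(actual_counts):
        raise ValueError("histograms must share the same bins")
    e_total, a_total = sum(expected_counts), sum(actual_counts)
    if e_total <= 0 or a_total <= 0:
        raise ValueError("each histogram needs a positive total count")
    total = 0.0
    for e, a in zip(expected_counts, actual_counts):
        # Floor empty bins at eps so the logarithm stays finite (a common workaround).
        e_p = max(e / e_total, eps)
        a_p = max(a / a_total, eps)
        total += (a_p - e_p) * math.log(a_p / e_p)
    return total


def to_unit_interval(divergence: float) -> float:
    """Hypothetical squashing: maps [0, inf) monotonically onto [0, 1)."""
    if divergence < 0:
        raise ValueError("divergence must be non-negative")
    return 1.0 - math.exp(-divergence)
```

The same `to_unit_interval` call can be applied to a KL divergence value, because KL divergence is also non-negative and unbounded.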