Validation Theory: Backtest Power and Multiple Testing

A convincing backtest needs enough power to detect realistic effects and control over false discoveries produced by data mining.

Explanation

Backtest power is the probability that the test rejects "no edge" when a real effect exists. When power is low, a failure to reach significance says little about whether the strategy works.
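The following is a minimal sketch of how power depends on the length of the backtest. It assumes i.i.d. returns, so that the t-statistic of a strategy with true annualised Sharpe ratio SR over T years is roughly normal with mean SR·√T and unit variance. The function name `backtest_power` and the parameter values are illustrative assumptions, not taken from the source.

```python
from math import sqrt
from statistics import NormalDist


def backtest_power(sharpe: float, years: float, alpha: float = 0.05) -> float:
    """Approximate power of a one-sided test of H0: Sharpe = 0.

    Assumption: i.i.d. returns, so the t-statistic is approximately
    N(sharpe * sqrt(years), 1) when the true annualised Sharpe is `sharpe`.
    """
    z = NormalDist()
    critical = z.inv_cdf(1.0 - alpha)  # reject H0 when the t-statistic exceeds this
    return 1.0 - z.cdf(critical - sharpe * sqrt(years))


# Power for a hypothetical strategy with a true Sharpe ratio of 0.5
for years in (2, 5, 10, 20):
    print(years, round(backtest_power(0.5, years), 3))
```

Under these assumptions, a strategy with a true Sharpe ratio of 0.5 has less than a 50% chance of reaching significance at the 5% level even with ten years of data, so a failed backtest of that length is weak evidence against it.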

Multiple testing without correction inflates the chance of at least one ‘significant’ but spurious result.
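Suppose N strategies with no true edge are each tested independently at level α. The family-wise error (FWE), meaning the probability of at least one false discovery, and the Bonferroni correction are then:

```latex
\mathrm{FWE}(\alpha, N) = 1 - (1 - \alpha)^{N},
\qquad
\alpha_B = \frac{\alpha}{N},
\qquad
\mathrm{FWE}(\alpha_B, N) \le N\,\alpha_B = \alpha .
```

The first formula requires independence. The Bonferroni bound follows from the union (Boole) inequality and holds without it.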

Robust validation combines pre-specification, out-of-sample testing, and multiple-testing control procedures.
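To make the control procedures concrete, here is a sketch of the Bonferroni adjustment alongside Holm's step-down method. Holm is not mentioned in the source. It is included here as a standard alternative that controls the same family-wise error while rejecting at least as many hypotheses. The function names and p-values are hypothetical.

```python
def bonferroni_reject(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Reject H0_i when p_i <= alpha / N (controls family-wise error)."""
    n = len(p_values)
    return [p <= alpha / n for p in p_values]


def holm_reject(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Holm step-down: same FWE guarantee as Bonferroni, never fewer rejections."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # indices, smallest p first
    reject = [False] * n
    for rank, i in enumerate(order):
        # Compare the rank-th smallest p-value with alpha / (n - rank)
        if p_values[i] <= alpha / (n - rank):
            reject[i] = True
        else:
            break  # stop at the first non-rejection; larger p-values are all kept
    return reject


p = [0.001, 0.011, 0.02, 0.04, 0.30]  # hypothetical p-values from five backtests
print(bonferroni_reject(p))
print(holm_reject(p))
```

With these five p-values, Bonferroni rejects only the first hypothesis. Holm also rejects the second, because its threshold relaxes from α/5 to α/4 after the first rejection.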


Interactive visualisation

Every backtest has limited power, and testing many strategies creates a multiple-testing problem. With many candidate strategies, a few will look "significant" by luck alone. Adjusting for this reduces false discoveries but also makes it harder for genuine signals to pass.

Settings: number of backtested strategies N = 50; per-test significance level α = 5.0%.
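This trade-off can be reproduced with a small Monte Carlo sketch using the settings above and three genuine signals. Several assumptions are not from the source: noise strategies have uniformly distributed p-values, and each genuine strategy's z-statistic is drawn from N(3, 1) and given a one-sided p-value. The effect size, seed and function name `simulate` are all hypothetical.

```python
import random
from statistics import NormalDist


def simulate(n_strategies=50, n_true=3, effect=3.0, alpha=0.05, seed=0):
    """Count (true signals detected, noise flagged) at nominal and Bonferroni levels."""
    rng = random.Random(seed)
    z = NormalDist()
    # Genuine signals: one-sided p-value of a z-statistic drawn from N(effect, 1)
    true_p = [1.0 - z.cdf(rng.gauss(effect, 1.0)) for _ in range(n_true)]
    # Noise strategies: under the null, p-values are uniform on [0, 1)
    noise_p = [rng.random() for _ in range(n_strategies - n_true)]
    counts = {}
    for label, level in (("nominal", alpha), ("bonferroni", alpha / n_strategies)):
        counts[label] = (
            sum(p <= level for p in true_p),   # genuine signals detected
            sum(p <= level for p in noise_p),  # noise strategies flagged
        )
    return counts


print(simulate())
```

The counts change with the seed. However, Bonferroni can never flag more strategies, genuine or spurious, than the nominal level, because its threshold is stricter.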

Numbers

Target family-wise error (FWE) ≈ 5.0% (via Bonferroni)

FWE at nominal α ≈ 92.3%

FWE at Bonferroni α_B = α/N = 0.1% ≈ 4.9%

True signals: 3; detected (nominal / Bonferroni) ≈ 3 / 1

Noise strategies flagged (nominal / Bonferroni) ≈ 2 / 0
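The analytic figures can be checked directly. The following sketch assumes independent tests. The FWE values shown match treating all N = 50 strategies as nulls, and the expected number of noise strategies flagged is (N − 3)·α. The helper name `family_wise_error` is illustrative.

```python
def family_wise_error(alpha: float, n: int) -> float:
    """P(at least one false discovery) for n independent tests with no true effect."""
    return 1.0 - (1.0 - alpha) ** n


N, ALPHA, N_TRUE = 50, 0.05, 3
alpha_b = ALPHA / N  # Bonferroni per-test level, 0.001

print(f"FWE (nominal)    = {family_wise_error(ALPHA, N):.1%}")
print(f"FWE (Bonferroni) = {family_wise_error(alpha_b, N):.1%}")
# Expected number of the N - N_TRUE noise strategies that pass each threshold
print(f"Expected noise flagged (nominal)    = {(N - N_TRUE) * ALPHA:.2f}")
print(f"Expected noise flagged (Bonferroni) = {(N - N_TRUE) * alpha_b:.3f}")
```

This prints 92.3%, 4.9%, 2.35 and 0.047. The last two values round to the 2 and 0 noise strategies shown above.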

Interpretation

As N grows with α fixed, more noise strategies produce p-values below the significance threshold purely by chance. As a result, the family-wise probability of at least one false discovery climbs towards 100%. Bonferroni shrinks the per-test threshold to α/N to hold that error near the 5% target, but some genuine signals then have p-values above the stricter threshold and are missed.

Robust validation means combining multiplicity control with other defences: out-of-sample tests, hold-out periods, economic priors and independent replication. The goal is not zero false discoveries, but a transparent and controlled trade-off between power and reliability.