Validation Theory: Backtest Power and Multiple Testing
A convincing backtest needs power against realistic alternatives and control of data-mining false discoveries.
Backtest power is the probability of detecting a true effect; low power makes failures inconclusive.
Multiple testing without correction inflates the chance of at least one ‘significant’ but spurious result.
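To see how quickly uncorrected testing inflates the chance of a spurious hit, the family-wise error rate under independence is 1 − (1 − α)^N. A minimal sketch with illustrative numbers (α and the values of N are assumptions, not from the text):

```python
# Family-wise error rate (FWER) under independence:
# FWER = 1 - (1 - alpha)^N is the probability of at least one false
# positive among N independent tests of true null hypotheses.
alpha = 0.05
for n in (1, 10, 50, 100):
    fwer = 1 - (1 - alpha) ** n
    print(f"N={n:3d}  FWER={fwer:.3f}")
```

Even at N = 50, the chance of at least one "significant" noise strategy is above 90%.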
Robust validation combines pre-specification, out-of-sample testing, and multiple-testing control procedures.
Backtests face both a power problem and a multiple-testing problem. With many candidate strategies, a few will look “significant” by luck alone. Adjusting for this multiplicity reduces false discoveries, but it also makes it harder for true signals to pass.
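The "a few look significant by luck" effect can be checked directly by Monte Carlo. The sketch below tests pure-noise strategies (true mean return zero) with an uncorrected 5% t-test; the strategy count, sample length, and return volatility are illustrative assumptions:

```python
# Monte-Carlo sketch: simulate N pure-noise strategies and count how
# many clear an uncorrected two-sided 5% significance test.
import math
import random
import statistics

random.seed(0)
N, T = 200, 252          # 200 noise strategies, one year of daily returns
CRIT = 1.96              # two-sided 5% critical value (normal approximation)

lucky = 0
for _ in range(N):
    r = [random.gauss(0.0, 0.01) for _ in range(T)]  # zero true mean
    t = statistics.mean(r) / (statistics.stdev(r) / math.sqrt(T))
    if abs(t) > CRIT:
        lucky += 1
print(f"{lucky} of {N} noise strategies look 'significant' at 5%")
```

With 200 strategies, roughly ten spurious "discoveries" are expected at the 5% level.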
As N grows, testing every strategy at a fixed level α pushes more and more noise strategies below the blue line: the family-wise chance of at least one false discovery, 1 − (1 − α)^N under independence, climbs towards 100%. A Bonferroni correction shrinks the per-test level to α/N to keep that error near the 5% target, but the stricter threshold now rejects some genuine signals (green), which are missed.
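The power cost of Bonferroni can be quantified with a normal approximation: shrinking the per-test level from α to α/N raises the critical value and lowers the probability of detecting a true effect. The effect size and N below are illustrative assumptions:

```python
# Power of a two-sided z-test at level alpha against a standardized
# true effect size, using the normal approximation.
from statistics import NormalDist

Z = NormalDist()

def power(alpha: float, effect: float) -> float:
    """Approximate power: P(|z| > z_crit) when the test statistic is
    centred at `effect` instead of zero."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    return (1 - Z.cdf(z_crit - effect)) + Z.cdf(-z_crit - effect)

N = 100        # number of strategies tested (illustrative)
effect = 3.0   # standardized true effect size (illustrative)

for a, label in [(0.05, "uncorrected"), (0.05 / N, "Bonferroni")]:
    print(f"{label:11s} alpha={a:.5f}  power={power(a, effect):.3f}")
```

For this effect size, power falls from roughly 0.85 uncorrected to about 0.32 after Bonferroni: the same correction that controls false discoveries also silences real signals.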
Robust validation means combining multiplicity control with other defences: out-of-sample tests, hold-out periods, economic priors and independent replication. The goal is not zero false discoveries, but a transparent and controlled trade-off between power and reliability.