Quant with Vahab
Quant Systems Lab · Control Systems for Quantitative Finance

Validation Theory: Backtest Power and Multiple Testing

A convincing backtest needs power against realistic alternatives and control of data-mining false discoveries.

Explanation

Backtest power is the probability of detecting a true effect; low power makes failures inconclusive.
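A rough sense of scale helps here. The sketch below approximates the power of a one-sided test of "Sharpe ratio = 0", assuming i.i.d., roughly normal returns so that the test statistic is approximately normal with mean equal to the annualised Sharpe ratio times the square root of the number of years. The function name `backtest_power` and the Sharpe value of 0.5 are illustrative choices, not taken from the page.

```python
from statistics import NormalDist


def backtest_power(annual_sharpe: float, years: float, alpha: float = 0.05) -> float:
    """Approximate power of a one-sided z-test of H0: Sharpe = 0.

    Assumption: returns are i.i.d. and roughly normal, so the test
    statistic is approximately Normal(annual_sharpe * sqrt(years), 1).
    """
    z = NormalDist()
    critical = z.inv_cdf(1.0 - alpha)              # rejection threshold under H0
    expected_stat = annual_sharpe * years ** 0.5   # mean of the statistic under H1
    return 1.0 - z.cdf(critical - expected_stat)


# Hypothetical strategy with a true annualised Sharpe ratio of 0.5.
for years in (1, 3, 5, 10):
    print(f"{years:>2} years: power = {backtest_power(0.5, years):.2f}")
```

Under these assumptions, even ten years of data leaves the power below 50% for a true Sharpe ratio of 0.5, so a failed test over that span says little about whether the strategy works.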

Multiple testing without correction inflates the chance of at least one ‘significant’ but spurious result.
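As a worked illustration, suppose all N strategies are pure noise and their tests are independent, each run at level α. Both conditions are simplifying assumptions; correlated strategies, common in practice, change the exact figure. The chance of at least one spurious rejection is then:

```latex
\mathrm{FWE}(N, \alpha)
  \;=\; P(\text{at least one false rejection})
  \;=\; 1 - (1 - \alpha)^{N}
```

The union bound gives FWE ≤ Nα without any independence assumption, which is why testing each strategy at α/N (Bonferroni) keeps the family-wise error at or below α.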

Robust validation combines pre-specification, out-of-sample testing, and multiple-testing control procedures.
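As one example of a multiple-testing control procedure, here is a minimal sketch of the Holm step-down method, which controls the same family-wise error as Bonferroni but is never less powerful. It is an illustration chosen for this note, not a method the page itself prescribes, and the p-values are made up.

```python
def holm_reject(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Holm step-down procedure: one reject flag per input p-value."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    reject = [False] * m
    for rank, idx in enumerate(order):
        # Compare the (rank + 1)-th smallest p-value with alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # step-down: stop at the first p-value that fails
    return reject


# Hypothetical p-values from five backtests (illustrative only).
p_vals = [0.001, 0.011, 0.04, 0.20, 0.012]
print(holm_reject(p_vals))  # [True, True, False, False, True]
```

With these inputs the sketch rejects the three smallest p-values (0.001, 0.011, 0.012), whereas plain Bonferroni at 0.05 / 5 = 0.01 would reject only the first.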


Interactive visualisation

Backtests face two linked problems: limited power and multiple testing. With many candidate strategies, a few will look “significant” by luck alone. Correcting for this reduces false discoveries, but it also makes it harder for true signals to pass.

[Figure panel: Sorted p-values for 50 backtests. Green = real signals, grey = noise. P-value axis from 0.00 to 1.00, with thresholds at α = 0.050 (5.0%) and α_B = 0.001 (Bonferroni).]
[Figure panel: False-positive probability vs detection rate for true signals. FWE (nominal) 92.3%, FWE (Bonf) 4.9%, Detect (nominal) 100.0%, Detect (Bonf) 33.3%.]
Numbers
Target family-wise error ≈ 5.0% (via Bonferroni)
FWE (nominal α) ≈ 92.3%
FWE (Bonferroni α_B) ≈ 4.9%
True signals: 3; detected (nominal / Bonf) = 3 / 1
Noise strategies flagged (nominal / Bonf) = 2 / 0
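The two FWE figures above can be reproduced with a few lines of arithmetic. The sketch assumes, as the reported values imply, that all N = 50 tests are treated as independent true nulls; the function and variable names are illustrative.

```python
N = 50         # number of backtests in the panel
ALPHA = 0.05   # nominal per-test level and target family-wise error


def family_wise_error(per_test_alpha: float, n_tests: int) -> float:
    """P(at least one false positive) across n independent true-null tests."""
    return 1.0 - (1.0 - per_test_alpha) ** n_tests


alpha_bonf = ALPHA / N                       # Bonferroni per-test threshold
fwe_nominal = family_wise_error(ALPHA, N)
fwe_bonf = family_wise_error(alpha_bonf, N)

print(f"alpha_B       = {alpha_bonf:.3f}")   # 0.001
print(f"FWE (nominal) = {fwe_nominal:.1%}")  # 92.3%
print(f"FWE (Bonf)    = {fwe_bonf:.1%}")     # 4.9%
```

The detection counts (3 / 1 and 2 / 0) depend on the particular p-values drawn in the panel, so they cannot be recovered from N and α alone.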
Interpretation

As N grows, more and more noise strategies fall below the fixed-α threshold (the blue line), and the family-wise chance of at least one false discovery climbs towards 100%. Bonferroni shrinks the per-test threshold to α/N to keep that error near the 5% target, but some genuine signals (green) now fall above the threshold and are missed.
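The trade-off can be tabulated directly. The sketch below assumes independent tests and a hypothetical genuine signal whose test statistic is normal with mean 3 and unit variance; both the value 3 and the name `power_at` are illustrative. It prints, for several N, the nominal family-wise error alongside the power at the nominal and Bonferroni thresholds.

```python
from statistics import NormalDist

Z = NormalDist()
ALPHA = 0.05
TRUE_Z = 3.0  # hypothetical expected z-score of a genuine signal


def power_at(per_test_alpha: float, expected_z: float = TRUE_Z) -> float:
    """Power of a one-sided z-test when the statistic is Normal(expected_z, 1)."""
    return 1.0 - Z.cdf(Z.inv_cdf(1.0 - per_test_alpha) - expected_z)


print("   N  FWE(nominal)  power(nominal)  power(Bonf)")
for n in (1, 10, 50, 200, 1000):
    fwe = 1.0 - (1.0 - ALPHA) ** n  # assumes n independent true nulls
    print(f"{n:>4}  {fwe:>12.1%}  {power_at(ALPHA):>14.1%}  {power_at(ALPHA / n):>11.1%}")
```

The nominal power column stays constant while its family-wise error rises with N; the Bonferroni column holds the error near 5% but loses power as the threshold α/N tightens.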

Robust validation means combining multiplicity control with other defences: out-of-sample tests, hold-out periods, economic priors and independent replication. The goal is not zero false discoveries, but a transparent and controlled trade-off between power and reliability.