Crisis stress · three regimes

How honest is our 90% CI in a crisis?

Three numbers tell you whether a probabilistic forecast is honest: the information coefficient (does score predict return?), the conformal coverage rate (does the 90% CI actually contain reality 90% of the time?), and a permutation-test p-value that puts the IC in context against random shuffles. We run all three on the production engine across the three crisis windows below: a slow rate-hike grind, a flash crash, and a leverage-driven financial collapse. Different regimes, different results.
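A minimal TypeScript sketch of the first two checks. It assumes the IC is a Spearman rank correlation between score and realised forward return (the text does not say which correlation the engine uses) and that each forecast carries explicit interval bounds. The `Forecast` shape and the function names are hypothetical, not the production engine's API.

```typescript
// Hypothetical record shape; the engine's real types are not shown in the source.
interface Forecast {
  score: number;    // composite score as of the test date
  lower: number;    // lower bound of the 90% interval on forward return
  upper: number;    // upper bound of the 90% interval
  realised: number; // realised forward return
}

// 1-based ranks; tied values share the mean of the positions they occupy.
function ranks(values: number[]): number[] {
  const order = values.map((v, i) => ({ v, i })).sort((a, b) => a.v - b.v);
  const out = new Array<number>(values.length).fill(0);
  let start = 0;
  while (start < order.length) {
    let end = start;
    while (end + 1 < order.length && order[end + 1].v === order[start].v) end++;
    const avg = (start + end) / 2 + 1;
    for (let k = start; k <= end; k++) out[order[k].i] = avg;
    start = end + 1;
  }
  return out;
}

// Pearson correlation of two equal-length series.
function pearson(x: number[], y: number[]): number {
  const n = x.length;
  const mx = x.reduce((s, v) => s + v, 0) / n;
  const my = y.reduce((s, v) => s + v, 0) / n;
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i] - mx;
    const dy = y[i] - my;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}

// Rank IC: Spearman correlation = Pearson correlation of the ranks.
function rankIC(forecasts: Forecast[]): number {
  return pearson(
    ranks(forecasts.map((f) => f.score)),
    ranks(forecasts.map((f) => f.realised)),
  );
}

// Empirical coverage: share of realised returns inside their stated interval.
function coverage(forecasts: Forecast[]): number {
  const hits = forecasts.filter(
    (f) => f.realised >= f.lower && f.realised <= f.upper,
  ).length;
  return hits / forecasts.length;
}
```

An honest 90% interval should give a `coverage` value near 0.9 on out-of-sample data; the crisis windows below report how far the engine falls from that.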

2022-01-04 · as-of

Rate-hike bear market

S&P 500 closed at all-time high 4796.56 the day before. Fed pivot to rate hikes started Jan 4.

~280 calendar days (about 195 trading days); S&P fell 25% to the Oct 12, 2022 trough

Tickers tested: 26
Composite IC: 0.123
30d CI cov.: 65.4%
Permutation p: 0.481
Bearish hit rate: 6/7 (86%)
Bullish hit rate: 2/5 (40%)

Slow, grinding bear market. The engine's bearish calls held up (6 of 7 correct), while its bullish calls were over-confident (2 of 5 correct): the engine fought the regime. Permutation p = 0.48: with n = 26, the IC of 0.123 is not statistically significant (this needs the 374-ticker universe).
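For readers unfamiliar with the permutation p-value, here is a minimal sketch of how one can be computed. It assumes a two-sided test on the absolute correlation and a fixed number of random shuffles; the source does not state which variant `permutation-test.test.ts` exercises, and every name below is hypothetical. Passing ranks instead of raw values turns the statistic into a rank IC.

```typescript
// Small seeded PRNG (mulberry32) so the sketch is reproducible.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Pearson correlation of two equal-length series.
function corr(x: number[], y: number[]): number {
  const n = x.length;
  const mx = x.reduce((s, v) => s + v, 0) / n;
  const my = y.reduce((s, v) => s + v, 0) / n;
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < n; i++) {
    sxy += (x[i] - mx) * (y[i] - my);
    sxx += (x[i] - mx) ** 2;
    syy += (y[i] - my) ** 2;
  }
  return sxy / Math.sqrt(sxx * syy);
}

// Two-sided permutation p-value: how often does a random re-pairing of
// scores and returns produce a correlation at least as extreme as observed?
function permutationP(
  scores: number[],
  returns: number[],
  nPerm = 10000,
  rng: () => number = mulberry32(42),
): number {
  const observed = Math.abs(corr(scores, returns));
  const shuffled = returns.slice();
  let asExtreme = 0;
  for (let p = 0; p < nPerm; p++) {
    // Fisher-Yates shuffle of the returns, breaking any score/return link.
    for (let i = shuffled.length - 1; i > 0; i--) {
      const j = Math.floor(rng() * (i + 1));
      const tmp = shuffled[i];
      shuffled[i] = shuffled[j];
      shuffled[j] = tmp;
    }
    if (Math.abs(corr(scores, shuffled)) >= observed - 1e-12) asExtreme++;
  }
  // The +1 terms count the observed pairing itself, so p is never exactly 0.
  return (asExtreme + 1) / (nPerm + 1);
}
```

A p-value near 0.5, as in the 2022 window, means roughly half of the random shuffles matched or beat the observed IC, so the sample cannot separate the signal from noise.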

2020-02-19 · as-of

COVID flash crash

S&P 500 closed at all-time high 3386.15. Bear market started the next day (Feb 20). Trough at 2237.40 on Mar 23, 2020.

23 trading days to trough, fastest 30%+ drawdown on record

Tickers tested: 22
Composite IC: -0.045
30d CI cov.: 13.6%
Permutation p: (not reported)

Engine had near-zero predictive power on this 23-day exogenous shock — IC = -0.045 means scores were essentially uncorrelated with realised forward returns. 90% CI covered only 14% of outcomes. Honest reading: factor models built for normal market dynamics are not designed to predict pandemic-driven flash crashes. This is the failure mode investors should know about — and we publish it rather than hide it.
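To put the coverage shortfall in perspective, 13.6% of 22 tickers corresponds to 3 covered outcomes (a count inferred from the figures above, not stated in the source). The sketch below computes the probability of seeing 3 or fewer hits if the 90% interval were truly calibrated. It assumes independent outcomes across tickers, which is a strong and unrealistic assumption in a crisis where returns move together, so the result should be read as an illustration rather than a formal test. The function name is hypothetical.

```typescript
// P(X <= k) for X ~ Binomial(n, p), summed term by term.
function binomialCdf(k: number, n: number, p: number): number {
  let total = 0;
  let coeff = 1; // C(n, 0)
  for (let i = 0; i <= k; i++) {
    total += coeff * Math.pow(p, i) * Math.pow(1 - p, n - i);
    coeff = (coeff * (n - i)) / (i + 1); // C(n, i) -> C(n, i + 1)
  }
  return total;
}

// Chance of at most 3 covered outcomes out of 22 under nominal 90% coverage,
// treating tickers as independent (an assumption, see the lead-in).
const tailProbability = binomialCdf(3, 22, 0.9);
```

Even allowing for heavy cross-ticker correlation, a shortfall this large is far below any conventional threshold for attributing it to chance, which is consistent with the reading that the intervals were too narrow for this regime.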

2008-09-12 · as-of

Lehman / financial crisis

Friday before Lehman Brothers bankruptcy weekend. S&P 1251.70 → 752.44 by Nov 20.

~50 trading days to interim trough; full bear ran to Mar 9, 2009 (-57% peak-to-trough)

Tickers tested: 19
Composite IC: 0.261
30d CI cov.: 15.8%
Permutation p: (not reported)

IC = 0.261 is the strongest of the three windows: the engine ranked tickers well even during the avalanche. But the 90% CI covered only 16% of outcomes. Honest reading: in a leverage-driven liquidation, our prediction intervals were far too narrow. Rank order survived; magnitude estimates did not. Mondrian conformal calibration (post-2026-05-16) widens halfwidths when residuals expand, and this is exactly the regime that loop is meant to address. Tickers: the 19 members of the modern universe with sufficient pre-2008 history (META, TSLA, AVGO and ABBV were not yet publicly traded).
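The idea behind Mondrian (group-conditional) conformal calibration can be sketched as follows. This is an illustration under stated assumptions, not the engine's calibration loop: it assumes calibration residuals are bucketed by a regime label and that each bucket gets its own split-conformal quantile. The `CalibrationPoint` shape, the group labels and the function names are all hypothetical.

```typescript
// Hypothetical calibration record; the engine's real grouping is not described.
interface CalibrationPoint {
  group: string;       // regime bucket, e.g. "calm" or "stressed"
  absResidual: number; // |realised - predicted| on held-out calibration data
}

// Split-conformal halfwidth: the ceil((n + 1) * (1 - alpha))-th smallest
// absolute residual. With too few points the level cannot be certified,
// so the interval is left unbounded.
function conformalHalfwidth(residuals: number[], alpha: number): number {
  const sorted = residuals.slice().sort((a, b) => a - b);
  const rank = Math.ceil((sorted.length + 1) * (1 - alpha));
  if (rank > sorted.length) return Infinity;
  return sorted[rank - 1];
}

// Mondrian variant: calibrate each group separately, so a bucket with large
// residuals gets a wide interval without inflating the calm bucket.
function mondrianHalfwidths(
  points: CalibrationPoint[],
  alpha = 0.1,
): Record<string, number> {
  const byGroup: Record<string, number[]> = {};
  for (const p of points) {
    if (!(p.group in byGroup)) byGroup[p.group] = [];
    byGroup[p.group].push(p.absResidual);
  }
  const out: Record<string, number> = {};
  for (const g of Object.keys(byGroup)) {
    out[g] = conformalHalfwidth(byGroup[g], alpha);
  }
  return out;
}
```

The per-group guarantee only holds when each group's calibration residuals resemble the residuals it will face in production, so a bucket with no history of a 2008-style liquidation would still under-cover until such residuals enter its calibration set.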

How to read these numbers

Why three crises matter

A model that does well on one crisis window may have been lucky or overfit to that regime. The three windows here are structurally different: 2022 was a slow grind driven by interest-rate policy; 2020 was a flash collapse of 23 trading days (33 calendar days) driven by an exogenous health shock; 2008 was a 6-month avalanche driven by leverage unwinding. A model that produces honest coverage across all three has captured something general about equity dynamics. A model that fails one of them has a known weakness, and we publish it.

What is coming next

46 invariants live · FDR observability · Test suite · Math coherence · 13-factor methodology · Aggregate backtest · Live track record
Test sources: src/lib/data/__tests__/crisis-stress-{2022,2020,2008}.test.ts + permutation-test.test.ts. Reconstructors: src/lib/data/historical-reconstructor.ts.