How we validate alpha

Every strategy that ships at WOLFX clears the same harness. Every rejection is published next to every pass on the scoreboard. This page is the rule book.

Walk-forward 70/15/15

We split the 2016-2026 historical sample into three consecutive slices, in time order: train (70 %), validation (15 %), and test (15 %). Strategy parameters are fit on train only. Validation is used to tune hyperparameters. Test is touched once, at the end. Anything that "works" on train but fails on test is rejected.
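
A minimal sketch of a chronological 70/15/15 split. It assumes the sample is a list of (timestamp, value) rows; `walk_forward_split` is a hypothetical helper, not WOLFX's harness code. Integer arithmetic avoids float rounding at the slice boundaries, and any remainder rows fall into the test slice.

```python
from datetime import date, timedelta

# Hypothetical helper: the page does not say how the harness implements the split.
def walk_forward_split(rows, train_pct=70, val_pct=15):
    """Split (timestamp, value) rows into chronological train/val/test slices.

    Rows are sorted by timestamp first, so the test slice is always the most
    recent data and never feeds parameter fitting or hyperparameter tuning.
    """
    ordered = sorted(rows, key=lambda row: row[0])
    n = len(ordered)
    train_end = n * train_pct // 100
    val_end = train_end + n * val_pct // 100
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]


if __name__ == "__main__":
    # 100 synthetic daily observations starting 2016-01-01 (illustrative only).
    start = date(2016, 1, 1)
    rows = [(start + timedelta(days=i), float(i)) for i in range(100)]
    train, val, test = walk_forward_split(rows)
    print(len(train), len(val), len(test))  # 70 15 15
    # Every test timestamp is later than every train timestamp.
    print(max(r[0] for r in train) < min(r[0] for r in test))  # True
```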

The point of this discipline is to catch overfitting. A strategy that beats noise in-sample but collapses on data it has never seen has no edge; it is pattern-matching on training-set artifacts.

Nine hard gates (v7)

  1. Test slice Sharpe ≥ 0.40: the headline edge metric
  2. Test slice MaxDD ≤ 15 %: the worst single drawdown can't bury the account
  3. Test slice profit factor ≥ 1.2: gross dollars won divided by gross dollars lost clears breakeven (1.0) with margin
  4. Test slice trade count ≥ 50–100 (varies by strategy class): a sample large enough that the Sharpe estimate is statistically firm
  5. Full-window Sharpe > 0: the strategy must work across the entire 10-year span, not just one regime. (This gate was added after Round 11, when Treasury Curve Carry passed the test slice but lost 18 % over the full window. We don't ship regime-conditional bets.)
  6. Data source named and verified: the API endpoint and units must be confirmed live before the harness runs. (Added after Round 16 and RRP Carry's units bug.)
  7. Backtest data parity with live data: if the backtest uses one source and live trading uses another, the gap is documented and signal quality is assessed. (Added after Round 18.)
  8. Train → val → test trajectory hypothesis: the proposer must state in advance what the trajectory will look like (monotonic improvement, stable, etc.), and the actual run must match. (Added after Round 19's regime flip and Round 20's monotonic decay.)
  9. Conservative canary sizing alongside spec: every passing strategy ships at sizing tighter than the backtest spec until 100+ live trades confirm that the regime persists
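
Gates 1 to 5 are numeric, so they can be written as a single check over test-slice and full-window results. This is a minimal sketch, not the harness's code: it assumes fractional daily returns, Sharpe annualized by √252 with a zero risk-free rate, drawdown measured on the compounded equity curve, and dollar P&L per trade. All function names are hypothetical. Gates 6 to 9 are process checks and are not modeled here.

```python
import math
import statistics

# Hypothetical sketch: conventions (sqrt(252) annualization, zero risk-free
# rate, compounded equity curve) are assumptions, not the WOLFX harness.

def sharpe(daily_returns, periods_per_year=252):
    """Annualized Sharpe of fractional daily returns (risk-free rate taken as 0)."""
    sd = statistics.stdev(daily_returns)
    if sd == 0:
        return 0.0
    return statistics.mean(daily_returns) / sd * math.sqrt(periods_per_year)


def max_drawdown(daily_returns):
    """Largest peak-to-trough decline of the compounded equity curve, as a fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in daily_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst


def profit_factor(trade_pnls):
    """Gross profit divided by gross loss; infinite if there are no losing trades."""
    gains = sum(p for p in trade_pnls if p > 0)
    losses = -sum(p for p in trade_pnls if p < 0)
    return math.inf if losses == 0 else gains / losses


def failed_gates(test_returns, test_trades, full_returns, min_trades=50):
    """Return the names of numeric gates 1-5 that fail; an empty list means pass."""
    checks = {
        "1 test Sharpe >= 0.40": sharpe(test_returns) >= 0.40,
        "2 test MaxDD <= 15 %": max_drawdown(test_returns) <= 0.15,
        "3 test PF >= 1.2": profit_factor(test_trades) >= 1.2,
        f"4 test trades >= {min_trades}": len(test_trades) >= min_trades,
        "5 full-window Sharpe > 0": sharpe(full_returns) > 0,
    }
    return [name for name, ok in checks.items() if not ok]


if __name__ == "__main__":
    test_returns = [0.002, -0.001] * 50     # 100 synthetic test-slice days
    test_trades = [100.0, -50.0] * 30       # 60 synthetic trades, PF = 2.0
    full_returns = [0.001, -0.0005] * 500   # synthetic full window
    print(failed_gates(test_returns, test_trades, full_returns))  # []
```

The `min_trades` parameter stands in for gate 4's class-dependent threshold of 50 to 100 trades.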

What we test for besides the gates

We gauntlet our own live strategies too

The harness doesn't run only on candidate strategies; it also runs on the strategies already deployed in production. Sniper mean-reversion (Round 21), news alpha (Round 22), and quantum convergence (Round 23) have all been put through the same walk-forward used for candidates, and each has a scoreboard row documenting the result.

This is uncomfortable but disciplined: production results on small samples (20-50 trades) can easily be noise. Walk-forward on 1,000+ trades over 10 years is harder to fool. When the two disagree, we trust the harness and tighten the live strategy.
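
One way to put a number on "easily noise": under the IID approximation from Andrew Lo's "The Statistics of Sharpe Ratios" (2002), the standard error of an estimated Sharpe ratio is roughly √((1 + SR²/2) / n). The sketch below applies it to a hypothetical per-trade Sharpe of 0.10. Real trade returns are rarely IID, so treat this as a rough guide, not as the harness's own statistical test.

```python
import math

def sharpe_standard_error(sharpe_per_trade, n_trades):
    """Approximate standard error of an estimated Sharpe ratio.

    IID approximation (Lo, 2002): Var(SR_hat) ~ (1 + SR^2 / 2) / n.
    Sharpe here is per trade (not annualized), matching the per-trade n.
    """
    return math.sqrt((1.0 + 0.5 * sharpe_per_trade ** 2) / n_trades)


if __name__ == "__main__":
    sr = 0.10  # hypothetical per-trade Sharpe, for illustration only
    for n in (20, 50, 1000):
        se = sharpe_standard_error(sr, n)
        print(f"n={n:5d}  SR={sr:.2f}  SE={se:.3f}  SR/SE={sr / se:.2f}")
```

At that Sharpe the standard error is about 0.22 with 20 trades and about 0.03 with 1,000, so only the larger sample can distinguish a 0.10 edge from zero.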

What gets shipped

A strategy that passes the gauntlet ships as a shadow scaffold: flag off, no live capital. The autonomous shadow scheduler (V170/V174/V185) records its signal decisions daily for ~30 sessions. Once the shadow record shows a rolling Sharpe ≥ 0.5, no 2-week drawdown of -3 % or worse, and at least 20 trades, the auto-promote cron (V179b) flips the live flag and the order-submission path (V180a/b/c) starts placing real trades.
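
The three promotion criteria can be sketched as one predicate over the shadow record. This is an illustration under stated assumptions, not the V179b code: it reads "2-week" as any 10-session window, annualizes Sharpe by √252 with a zero risk-free rate, and computes the rolling Sharpe over whatever shadow returns are passed in. Function names are hypothetical.

```python
import math
import statistics

# Hypothetical sketch: names, the 10-session reading of "2-week", and the
# sqrt(252) annualization are assumptions, not the V179b implementation.

def annualized_sharpe(daily_returns, periods_per_year=252):
    """Sharpe of fractional daily returns, risk-free rate taken as zero."""
    if len(daily_returns) < 2:
        return 0.0
    sd = statistics.stdev(daily_returns)
    if sd == 0:
        return 0.0
    return statistics.mean(daily_returns) / sd * math.sqrt(periods_per_year)


def worst_window_drawdown(daily_returns, window=10):
    """Worst peak-to-trough decline inside any span of `window` sessions."""
    worst = 0.0
    for start in range(len(daily_returns)):
        equity = peak = 1.0
        for r in daily_returns[start:start + window]:
            equity *= 1.0 + r
            peak = max(peak, equity)
            worst = max(worst, (peak - equity) / peak)
    return worst


def ready_to_promote(shadow_returns, shadow_trade_count):
    """True when all three promotion criteria on the shadow record hold."""
    return (
        annualized_sharpe(shadow_returns) >= 0.5
        and worst_window_drawdown(shadow_returns) < 0.03
        and shadow_trade_count >= 20
    )


if __name__ == "__main__":
    steady = [0.004, -0.001] * 15          # 30 synthetic shadow sessions
    print(ready_to_promote(steady, 24))    # True
    print(ready_to_promote(steady, 12))    # False: too few trades
    shocked = steady[:10] + [-0.035] + steady[10:]
    print(ready_to_promote(shocked, 24))   # False: 2-week drawdown breached
```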

Promotion is automatic, and so is demotion: the V184 circuit breaker auto-disables any live strategy whose 30-day rolling PF drops below 1.0. No human in the loop is required.
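
A minimal sketch of the demotion rule, assuming trades are recorded as (close date, dollar P&L) pairs and that "30-day" means calendar days. The page does not say what happens when the window holds no trades; this sketch assumes the breaker simply does not trip. Both function names are hypothetical, not the V184 code.

```python
from datetime import date, timedelta

# Hypothetical sketch of the V184 rule; the calendar-day window and the
# "no trades, no trip" behaviour are assumptions, not documented behaviour.

def rolling_profit_factor(trades, as_of, days=30):
    """Profit factor of trades closed in the `days` calendar days up to `as_of`.

    `trades` is a list of (close_date, pnl) pairs. Returns None when the window
    holds no winning or losing trades, and inf when it holds wins but no losses.
    """
    cutoff = as_of - timedelta(days=days)
    recent = [pnl for closed, pnl in trades if cutoff < closed <= as_of]
    gains = sum(p for p in recent if p > 0)
    losses = -sum(p for p in recent if p < 0)
    if losses == 0:
        return float("inf") if gains > 0 else None
    return gains / losses


def should_disable(trades, as_of):
    """Trip the breaker when the 30-day rolling PF is below 1.0."""
    pf = rolling_profit_factor(trades, as_of)
    return pf is not None and pf < 1.0


if __name__ == "__main__":
    trades = [
        (date(2025, 11, 1), 1000.0),  # outside the 30-day window, ignored
        (date(2026, 1, 5), 100.0),
        (date(2026, 1, 10), -150.0),
    ]
    print(should_disable(trades, as_of=date(2026, 1, 20)))  # True: PF = 100/150
```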

What's currently in the pipeline

Questions about methodology: hello@wolfx.trade