The WOLFX Backtest Gauntlet — Live Scoreboard

WOLFX Research · Updated 2026-04-24

Every strategy WOLFX trades has cleared a walk-forward backtest (70/15/15 train/validation/test split) with hard pass/fail gates. Every strategy WOLFX considers but rejects is published here; the rejections are how you know the filter is real.
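The 70/15/15 walk-forward split described above can be sketched as three contiguous, time-ordered slices with no shuffling, so the test slice is always the most recent data. This is a minimal illustration that assumes the split is taken by row count over a single chronological series; `walk_forward_split` is a hypothetical name, not the WOLFX harness.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def walk_forward_split(
    rows: Sequence[T], train: float = 0.70, val: float = 0.15
) -> Tuple[List[T], List[T], List[T]]:
    """Split time-ordered rows into contiguous train / validation / test slices.

    Nothing is shuffled: each slice begins where the previous one ends, so the
    test slice is always the most recent, never-seen data.
    """
    if not (0 < train < 1 and 0 < val < 1 and train + val < 1):
        raise ValueError("train and val must be fractions that sum to less than 1")
    n = len(rows)
    i_val = round(n * train)           # first row of the validation slice
    i_test = round(n * (train + val))  # first row of the test slice
    return list(rows[:i_val]), list(rows[i_val:i_test]), list(rows[i_test:])


# Usage: ten years of (hypothetical) trading-day indices.
train_rows, val_rows, test_rows = walk_forward_split(list(range(2520)))
print(len(train_rows), len(val_rows), len(test_rows))  # → 1764 378 378
```

Keeping the slices contiguous is what makes the test Sharpe an out-of-sample number: any parameter tuned on train or validation never sees the final 15 %.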

Running score: 3 PASS / 15 rounds (20 %). One documented near-miss (Round 13).

The passes

| Round | Strategy | Test Sharpe | Shipped |
|---|---|---|---|
| 7 | Cross-Asset Futures Trend (ES/NQ/RTY/YM/6E/6J, 12-1 skip-month) | 0.895 | V167 · 2026-04-24 · whitepaper |
| 8 | VIX Contango Carry (short VXX / long VXZ, regime-gated) | 1.41 | V169 · 2026-04-24 · whitepaper |
| 14 | Overnight Drift Reversal (SPY MOC→MOO, 5d-intraday filter) | 1.229 | V174 · 2026-04-25 · whitepaper |
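Two of the signal ideas named in the pass table can be sketched in a few lines: the 12-1 skip-month momentum behind Round 7, and a VIX futures contango check of the kind that could gate Round 8. Only the 12-1 definition (return from 12 months ago to 1 month ago, skipping the latest month) is standard; the sign-based position rule, the front/second-month contango test, and all function names are assumptions, not the shipped V167/V169 logic.

```python
from typing import Sequence


def momentum_12_1(monthly_closes: Sequence[float]) -> float:
    """12-1 skip-month momentum: the return from 12 months ago to 1 month ago.

    monthly_closes[-1] is the latest month-end close; it is deliberately left
    out so the most recent month does not drive the signal.
    """
    if len(monthly_closes) < 13:
        raise ValueError("need at least 13 month-end closes")
    close_12m_ago = monthly_closes[-13]
    close_1m_ago = monthly_closes[-2]
    return close_1m_ago / close_12m_ago - 1.0


def trend_direction(monthly_closes: Sequence[float]) -> int:
    """Hypothetical time-series rule: +1 long, -1 short, 0 flat by momentum sign."""
    m = momentum_12_1(monthly_closes)
    return (m > 0) - (m < 0)


def in_contango(front_month: float, second_month: float, min_slope: float = 0.0) -> bool:
    """Hypothetical regime gate: second-month VIX future priced above the front month."""
    return second_month / front_month - 1.0 > min_slope


closes = [100.0 + i for i in range(13)]  # 13 month-ends rising from 100 to 112
print(round(momentum_12_1(closes), 4), trend_direction(closes))  # → 0.11 1
print(in_contango(18.0, 20.0), in_contango(22.0, 20.0))          # → True False
```

In a cross-asset book the same `trend_direction` call would run once per contract (ES, NQ, RTY, YM, 6E, 6J); how WOLFX sizes or combines those legs is not described here.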

(Round 15 was intentionally skipped: FOMC Pre-Announcement Drift fires only 8 times per year, so it cannot clear the v5 gate of ≥ 50 trades. It is proposed as a sizing multiplier on Round 14 rather than as a standalone strategy.)
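The trade-count arithmetic behind skipping Round 15 is worth spelling out. This sketch assumes, as the Round 14 and Round 16 rows suggest, that the ≥ 50-trade gate is counted on the 15 % test slice; the 10-year window is illustrative (it matches the Round 11 window), and the function names are hypothetical.

```python
def expected_test_trades(events_per_year: float, window_years: float,
                         test_frac: float = 0.15) -> float:
    """Trades that land in the test slice if the signal fires on every event."""
    return events_per_year * window_years * test_frac


def years_needed(events_per_year: float, min_trades: int,
                 test_frac: float = 0.15) -> float:
    """Total history required before the test slice alone reaches min_trades."""
    return min_trades / (events_per_year * test_frac)


# FOMC: 8 scheduled meetings per year against the v5 50-trade gate.
print(expected_test_trades(8, 10))    # about 12 trades in a 10-year window's test slice
print(round(years_needed(8, 50), 1))  # → 41.7 years of history
```

At roughly 12 test-slice trades per decade, the gate would need more than 40 years of clean FOMC history, which is why a sizing overlay is the more practical use of the signal.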

The Round 7 and Round 8 strategies are running in a 30-day paper-shadow canary via the V170 scheduler. The flag flip to live execution happens only after rolling Sharpe reaches ≥ 0.5 with no monthly drawdown > 3 %.
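The canary promotion rule can be expressed as a small check over the shadow-fill series. This is a sketch under stated assumptions: "rolling Sharpe" is taken as the annualized Sharpe over the whole shadow window (252 trading days, zero risk-free rate), and "monthly drawdown" as the peak-to-trough drawdown inside each calendar month. The function names are hypothetical; the V170 scheduler's actual logic is not shown in this report.

```python
import math
import statistics
from collections import defaultdict
from datetime import date
from typing import Dict, List, Sequence, Tuple


def annualized_sharpe(daily_returns: Sequence[float], periods: int = 252) -> float:
    """Mean / stdev of daily returns, scaled by sqrt(periods); risk-free rate taken as 0."""
    if len(daily_returns) < 2:
        return float("nan")
    sd = statistics.stdev(daily_returns)
    return float("nan") if sd == 0 else statistics.fmean(daily_returns) / sd * math.sqrt(periods)


def max_drawdown(returns: Sequence[float]) -> float:
    """Largest peak-to-trough loss of the compounded equity curve, as a positive fraction."""
    equity = peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


def ready_for_live(days: Sequence[Tuple[date, float]],
                   min_sharpe: float = 0.5, max_monthly_dd: float = 0.03) -> bool:
    """Promote only if Sharpe >= min_sharpe and no calendar month draws down more than max_monthly_dd."""
    if not days:
        return False
    by_month: Dict[Tuple[int, int], List[float]] = defaultdict(list)
    for d, r in days:
        by_month[(d.year, d.month)].append(r)
    worst_month = max(max_drawdown(rets) for rets in by_month.values())
    sharpe = annualized_sharpe([r for _, r in days])
    return sharpe >= min_sharpe and worst_month <= max_monthly_dd  # a NaN Sharpe fails


# Usage: 30 hypothetical shadow days alternating +0.2 % and -0.1 %.
shadow = [(date(2026, 4, 1 + i), 0.002 if i % 2 == 0 else -0.001) for i in range(30)]
print(ready_for_live(shadow))  # → True
```

Both conditions must hold at once, so a strong Sharpe cannot buy its way past a single bad month.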

The rejections

| Round | Strategy | Test Sharpe | Verdict | Finding |
|---|---|---|---|---|
| 1 | Overnight Gap Continuation | -8.11 | NO-GO | Signal evaporated once realistic fill costs were applied. Infrastructure gap on premarket data. |
| 2 | Crypto Funding Arbitrage v1 | 3.15 IS / -5.68 OOS | NO-GO | Textbook overfit. Great in-sample, dead out-of-sample. |
| 3 | Crypto Funding v2 (long-only, z < -3) | 7.22 (spurious) | NO-GO | Forensic finding: the claimed 71.4 % win rate was implicitly bundled with a "price at 20-day low" filter. The filter, not the funding signal, was doing the work. |
| 4 | Cointegration Pairs Trading | -1.51 | NO-GO | Mega-cap dispersion in 2025-2026 broke the cointegration assumptions that made this work in 2015. |
| 5 | Momentum + VIX Long/Short | -1.23 | NO-GO | Benign-VIX regimes produce short-squeeze spikes that shred the short leg. MaxDD 25 %. |
| 6 | Momentum long-only (decomposed) | +0.88 | NO-GO (drawdown) | Sharpe OK, but MaxDD 12.85 % eats the risk budget. |
|  | PEAD (Post-Earnings-Announcement Drift) | -2.77 | HARD NO-GO | THE SIGNAL HAS INVERTED. A positive earnings surprise now predicts a -0.61 % forward return. A classical, decades-old anomaly is now an anti-signal. |
| 9 | G10 FX Trend+Carry | 0.164 | NO-GO | The strategy lost money over 9 years. The profitable AUD/USD and USD/JPY legs couldn't offset losses in EUR/USD, GBP/USD, CHF, etc. |
| 10 | Commodity Basis Carry | -1.58 | NO-GO | The proxy was rejected, not the underlying premium: using inverted momentum as a basis proxy shorted the 2024-2026 gold/palladium rally. Real term-structure data is required. |
| 11 | Treasury Curve Carry 2s10s (Yahoo futures ratio) | 0.54 test / -1.08 train | NO-GO | The test slice looked fine (3 of 4 gates passed), but the strategy lost 18 % over the full 10-year window and only worked post-QE. A regime bet, not a carry premium. |
| 12 | Treasury Curve Carry 2s10s (FRED daily yields, IEF/SHY) | -3.34 | NO-GO | Retested Round 11 with real yield data to falsify "maybe the proxy was the problem." Result: real yields were worse than the proxy. Full-window Sharpe -0.99, final NAV down 36 %. One trade alone (Oct 2023 short SHY at 3.78× weight) lost $44.8K when 2Y yields fell into the rate-cut cycle. The underlying signal, not the proxy, is wrong for the 2022-2026 hiking/cutting regime. |
| 13 | DXY Regime Switch (long-only, Variant B) | 1.20 test / 0.61 full | NEAR-MISS / NO-GO | Sharpe 1.20, PF (profit factor) 2.95, MaxDD -1.47 %, full-window Sharpe positive: every substantive gate clears with margin. Fails only on trade count (2 vs. a gate of 20). The 20-trade gate is miscalibrated for a regime classifier that fires ~3 times per test slice by design. Honest verdict under strict rules: NO-GO. Under the same gate-calibration argument that Round 7 (Trend) accepted, this would be a PASS; that is a calibration decision, not a statistical one. Flagged for re-evaluation if the trade-count gate is recalibrated per signal class. |
| 16 | RRP-Driven Treasury Carry Reversal (FRED RRPONTSYD → SHY) | -0.89 test / 0.24 full | NO-GO | Test slice fails 3 of 5 gates (Sharpe -0.89, PF 0.88, only 29 trades). Walk-forward Sharpe: train -0.03, val +1.81, test -0.89, the textbook in-sample-fit / out-of-sample-collapse pattern. The validation slice caught the late-2023 RRP drain wave during the Fed pivot; the test slice sits in a post-RRP-trough regime where the facility is near zero and meaningful drains stop happening. Two side findings: (1) the v5 proposal had a units bug (RRPONTSYD is in billions, not millions), since corrected in the harness; (2) the premium, if it existed, has likely been arbitraged away in the four years since Copeland-Duffie-Yang published. |
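The gate logic that produces these verdicts can be sketched as a set of boolean checks on test-slice statistics. The five gates below (test Sharpe, profit factor, MaxDD, positive full-window Sharpe, trade count) are inferred from the rows above and may not match the harness exactly. Only the trade-count floors (20 under v4, 50 under v5) are stated in this report; the other thresholds are placeholders chosen merely to agree with the published verdicts, and all names are hypothetical. The `min_trades` override illustrates the per-signal-class recalibration floated for Round 13.

```python
from dataclasses import dataclass
from typing import Dict, Sequence

# Placeholder thresholds. Only the trade-count floors (20 under v4, 50 under v5)
# appear in the report; the others are guesses consistent with the published rows.
MIN_TEST_SHARPE = 0.5
MIN_PROFIT_FACTOR = 1.2
MAX_DRAWDOWN = 0.10
V5_MIN_TRADES = 50


@dataclass
class SliceStats:
    test_sharpe: float
    profit_factor: float  # gross profit / gross loss on test-slice trades
    max_drawdown: float   # positive fraction: 0.0836 means -8.36 %
    full_sharpe: float    # Sharpe over the whole train + val + test window
    n_trades: int         # trades inside the test slice


def profit_factor(trade_pnls: Sequence[float]) -> float:
    """Gross profit divided by gross loss; infinite when there are no losing trades."""
    gains = sum(p for p in trade_pnls if p > 0)
    losses = -sum(p for p in trade_pnls if p < 0)
    return float("inf") if losses == 0 else gains / losses


def evaluate_gates(s: SliceStats, min_trades: int = V5_MIN_TRADES) -> Dict[str, bool]:
    """Five hard gates; min_trades can be overridden per signal class."""
    return {
        "test_sharpe": s.test_sharpe >= MIN_TEST_SHARPE,
        "profit_factor": s.profit_factor >= MIN_PROFIT_FACTOR,
        "max_drawdown": s.max_drawdown <= MAX_DRAWDOWN,
        "full_window": s.full_sharpe > 0,
        "trade_count": s.n_trades >= min_trades,
    }


def verdict(gates: Dict[str, bool]) -> str:
    return "PASS" if all(gates.values()) else "NO-GO"


# Figures copied from the Round 14 and Round 13 write-ups.
round_14 = SliceStats(1.229, 1.25, 0.0836, 0.447, 163)
round_13 = SliceStats(1.20, 2.95, 0.0147, 0.61, 2)
print(verdict(evaluate_gates(round_14)))                 # → PASS
print(verdict(evaluate_gates(round_13, min_trades=20)))  # → NO-GO (trade count only)
print(verdict(evaluate_gates(round_13, min_trades=2)))   # → PASS under a per-class floor
```

The last line makes the Round 13 point concrete: with every other gate unchanged, the verdict flips purely on where the trade-count floor is set.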

Methodology

Meta-insights the swarm has learned

  1. The 2015-2021 equity-factor playbook is upside-down in 2026. Six out of the first eight rejections were classical equity factors. PEAD inverting was the clincher. Mega-cap concentration, passive flows, and retail options gamma have rewired the single-name tape.
  2. The first two passes (Rounds 7 and 8) are macro / commodity / vol strategies with multi-decade academic lineage. The alpha-research pivot into this territory produced two consecutive passes after six consecutive equity-factor rejections. That is not coincidence; that is structure.
  3. Data proxies die in out-of-sample testing. Round 10 and Round 11 both had to use Yahoo-price proxies because the true signal requires curve / yield data. Both failed. The lesson: before we backtest another carry signal, procure the data.

Round 14 — first v5 round, first PASS under v5 constraints

Variant B (mean-reversion-filtered overnight long): test Sharpe 1.229, MaxDD -8.36 %, PF 1.25, 163 trades, full-window Sharpe 0.447. All five v5 gates cleared. Train/Val/Test Sharpe of 0.51 / 1.35 / 1.23 is a clean OOS pattern with no overfit signature. Variant A (always-on overnight long, no filter) is a NO-GO at PF 1.09, which confirms the filter is doing real work.
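A minimal sketch of the Variant A vs. Variant B comparison, assuming "MOC→MOO" means buying at day t's close and selling at day t+1's open, and assuming the 5d-intraday filter holds overnight only after five days of net intraday (open-to-close) weakness. The report does not state the filter's exact rule or threshold, so that condition, the toy prices, and the function name are all hypothetical.

```python
from typing import List, Sequence


def overnight_trades(opens: Sequence[float], closes: Sequence[float],
                     lookback: int = 5, use_filter: bool = True) -> List[float]:
    """Per-trade returns of buying at the close (MOC) and selling at the next open (MOO).

    use_filter=False mimics Variant A (hold every night). use_filter=True is a
    hypothetical Variant B: hold only when the trailing `lookback`-day compounded
    open-to-close return is negative, i.e. bet on the overnight session reverting
    recent intraday weakness. Both variants start on the same day so their
    trade lists are comparable.
    """
    if len(opens) != len(closes):
        raise ValueError("opens and closes must be the same length")
    trades: List[float] = []
    for t in range(lookback - 1, len(closes) - 1):
        if use_filter:
            intraday = 1.0
            for k in range(t - lookback + 1, t + 1):
                intraday *= closes[k] / opens[k]  # day k open-to-close growth
            if intraday >= 1.0:
                continue  # no recent intraday weakness: stay flat tonight
        trades.append(opens[t + 1] / closes[t] - 1.0)  # close(t) to open(t+1)
    return trades


opens = [100.0, 100.0, 100.0, 100.0, 100.0, 101.0, 100.0]
closes = [99.0, 99.0, 99.0, 99.0, 99.0, 110.0, 100.0]
print(len(overnight_trades(opens, closes, use_filter=False)))  # → 2
print(len(overnight_trades(opens, closes)))                    # → 1 (night after the 110 close skipped)
```

Running both variants over the same price history and comparing the resulting trade lists (for example, by profit factor) is the shape of the A-vs-B check described above.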

The strategy fills the high-frequency gap that v4 was structurally unable to test, and it diversifies cleanly from Trend (monthly futures momentum) and VIX Carry (monthly vol selling).

What's next

---

WOLFX publishes every signal and every realized fill. Past performance, including walk-forward backtest performance, is not predictive of future results. Every strategy here is informational — nothing is investment advice.

Edge-served from Cloudflare R2.