We unleashed Claude Code and OpenAI Codex to create a machine learning model trained on 25+ years of NCAA tournament history. Here are their picks, plus a quick tutorial on how machine learning works.
This is a research project for educational purposes. You can learn how to do this on your own below.
Claude and ChatGPT, shown as equal model paths — click any team for details
Model vs market — sorted by ML probability
Where our model disagrees with Vegas — sorted by edge
| TEAM | SEED | REGION | MODEL CHAMP% | MARKET% | EDGE (Δ) | WIN% | PPG |
|---|---|---|---|---|---|---|---|
From raw data to bracket predictions — a walkthrough for a business audience
Every matchup probability is built from three independent signals blended into a single number. Each captures a different dimension of tournament reality — statistical dominance, historical seed behavior, and direct head-to-head history.
For every matchup, the model starts from current-season stats for both teams. Rather than feeding those raw values in directly, it computes the difference between the two teams' values for each stat — a "delta feature" — forcing the model to reason about relative strength, not individual team identity.
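The delta-feature step can be sketched in a few lines. This is a minimal illustration assuming a simple dict-per-team layout; the stat names mirror the table below, but the project's actual schema isn't shown in the write-up.

```python
# Illustrative delta-feature construction (stat names mirror the table;
# the real project's schema is an assumption here).
STATS = ["seed", "win_pct", "ppg", "opp_ppg", "pt_diff", "srs", "sos"]

def delta_features(team_a, team_b):
    """Return (A - B) for every stat, so the model sees only relative strength."""
    return [round(team_a[s] - team_b[s], 3) for s in STATS]

duke = {"seed": 1, "win_pct": 0.882, "ppg": 84.3, "opp_ppg": 63.1,
        "pt_diff": 21.2, "srs": 24.1, "sos": 13.8}
tcu = {"seed": 9, "win_pct": 0.647, "ppg": 73.8, "opp_ppg": 67.4,
       "pt_diff": 6.4, "srs": 8.9, "sos": 9.4}

print(delta_features(duke, tcu))
# → [-8, 0.235, 10.5, -4.3, 14.8, 15.2, 4.4]
```

The output reproduces the DELTA row of the table: the model never sees "Duke" or "TCU", only the gaps between them.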
| TEAM | SEED | WIN % | PPG | OPP PPG | PT DIFF | SRS | SOS |
|---|---|---|---|---|---|---|---|
| Duke | 1 | .882 | 84.3 | 63.1 | +21.2 | 24.1 | 13.8 |
| TCU | 9 | .647 | 73.8 | 67.4 | +6.4 | 8.9 | 9.4 |
| DELTA (A − B) → MODEL INPUT | −8 | +.235 | +10.5 | −4.3 | +14.8 | +15.2 | +4.4 |
↑ The DELTA row is the only row the model sees. Positive values generally favor Team A, but not always: a lower seed number and a lower Opp PPG are both advantages, so the −8 seed delta and the −4.3 Opp PPG delta each favor Duke — relationships the model learns from historical outcomes.
Gradient boosted trees trained on 1,600+ tournament games from 2000–2025. The model learns which statistical gaps matter most in March — seed difference alone explains ~37% of variance, but points-per-game differential and strength of schedule also carry significant weight. Gets 70% of the blend because it's trained directly on tournament outcomes, not just regular-season strength.
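The training step might look like the sketch below, assuming scikit-learn's `GradientBoostingClassifier` and a fabricated dataset standing in for the real 1,600+ tournament games; hyperparameters are illustrative, not the project's actual settings.

```python
# Sketch of training gradient boosted trees on delta features.
# The dataset here is synthetic — the real model trains on 1,600+
# tournament games from 2000-2025.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(seed=42)
n_games = 500
X = rng.normal(size=(n_games, 7))  # 7 delta features per historical matchup
# Fabricated labels: larger positive deltas make a Team A win more likely
y = (X.sum(axis=1) + rng.normal(scale=2.0, size=n_games) > 0).astype(int)

model = GradientBoostingClassifier(
    n_estimators=200, max_depth=3, learning_rate=0.05, random_state=0
)
model.fit(X, y)

# For a new matchup, the model outputs P(Team A wins)
new_matchup = np.array([[-8, 0.235, 10.5, -4.3, 14.8, 15.2, 4.4]])
p_team_a = model.predict_proba(new_matchup)[0, 1]
print(round(p_team_a, 3))
```

Because boosted trees expose `feature_importances_`, this is also how one would measure which statistical gaps (seed difference, PPG differential, strength of schedule) carry the most weight.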
12-seeds beat 5-seeds 35% of the time historically — a fact the model can learn from the data, and one this signal explicitly reinforces
Historical win rates for each seed matchup since 1985. These rates are remarkably stable across eras. Weighted more heavily in early rounds (R64/R32) where seeding is most predictive — by the Elite Eight, any remaining team has proven itself regardless of original seed, so the ML model carries more weight in later rounds.
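One way to wire up the round-dependent weighting is a simple lookup. The 1-vs-16 (99%) and 5-vs-12 (65%) rates come from the article, and the R64 weight matches the stated 70/30 ML/seed split; the weights for later rounds are illustrative assumptions, not the project's actual values.

```python
# Excerpt of historical seed-matchup rates: P(better seed wins), since 1985.
# Only the two values quoted in the article are included here.
SEED_HISTORY = {(1, 16): 0.99, (5, 12): 0.65}

# Weight given to the seed-history signal per round. R64 matches the
# article's 70/30 split; later rounds are hypothetical placeholders that
# follow the stated pattern (seed history fades, ML takes over).
SEED_WEIGHT = {"R64": 0.30, "R32": 0.25, "S16": 0.15,
               "E8": 0.10, "F4": 0.05, "NCG": 0.05}

def seed_weight(round_name):
    """Seed history counts for less in later rounds; the ML model gets the rest."""
    return SEED_WEIGHT[round_name]

# In the Round of 64, a 5-vs-12 game leans 30% on the 65% historical rate
w = seed_weight("R64")
print(w, 1 - w)  # → 0.3 0.7
```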
When teams have played each other, their head-to-head record nudges the final probability by up to ±5%. Capped intentionally — college rosters turn over completely every 4 years, and old matchups between different player generations are weak predictors. Acts as a small momentum signal, not a dominant one.
Real example — Duke (1) vs. Siena (16): ML model outputs 88% based on statistical gap. Historic 1-vs-16 seed rate is 99%. No H2H history between these programs. Blended: (0.70 × 0.88) + (0.30 × 0.99) + 0 = 0.616 + 0.297 = 0.913, i.e. 91.3%. The seed history pulls the prediction higher than the ML model alone, anchoring it to 40 years of tournament data. This is by design — the model occasionally underestimates dominant seeds, and the seed history acts as a calibration floor.
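The whole blend fits in one small function. The 70/30 split, the ±5% head-to-head cap, and the Duke–Siena numbers all come from the write-up; the function name and signature are ours.

```python
# Blending the three signals, reproducing the Duke-Siena worked example.
def blend(ml_prob, seed_prob, h2h_adj=0.0):
    """0.70 * ML model + 0.30 * seed history, plus a head-to-head
    nudge that is intentionally capped at +/-5%."""
    h2h_adj = max(-0.05, min(0.05, h2h_adj))  # enforce the cap
    return 0.70 * ml_prob + 0.30 * seed_prob + h2h_adj

# Duke (1) vs. Siena (16): ML says 88%, seed history says 99%, no H2H
p = blend(ml_prob=0.88, seed_prob=0.99, h2h_adj=0.0)
print(f"{p:.1%}")  # → 91.3%
```

Note that the cap means even a lopsided head-to-head record can only move the final number five points — the "small momentum signal, not a dominant one" described above.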
This project was built using Claude (Anthropic) as a coding collaborator. Here's an honest breakdown of what was machine-generated versus where human judgment shaped the final product.