Head-to-head statistical profiles, win probability, projected starters, and bullpen context for every D1 game. Click any team to see their full profile.
Betting Intelligence Dashboard

Every matchup card is powered by our Skill Model — a z-score-based engine that evaluates teams on predictive skill stats rather than surface outcomes. Here's exactly how it works.
Win Prob is the model's estimated likelihood that each team wins. It is a statistical estimate — not a guarantee.
The model evaluates six weighted components:
| Component | Key Stats Used | Weight |
|---|---|---|
| Team Strength | Run differential per game, RPI value (schedule-adjusted) | 30% |
| Pitching Quality | FIP (fielding-independent pitching), K-BB% (stuff + command), WHIP | 22% |
| Starting Pitcher | Projected starter quality score (ERA, WHIP, K/9, IP/start) | 15% |
| Offense Quality | OPS, OBP, ISO (power), plate discipline (BB%-K%) | 22% |
| H2H Pitching | Starter quality vs opponent offense (confidence-gated) | 5% |
| Context | Home-field advantage (reduced), recent form | 6% |
Instead of raw stat comparisons, each team's stats are converted to z-scores — measuring how many standard deviations they are above or below the D1 average. This normalizes stats on different scales so they can be combined meaningfully.
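As a minimal sketch of the idea (the league mean and standard deviation values below are illustrative, not the model's actual D1 figures), z-scoring a stat looks like this:

```python
def z_score(value, league_mean, league_std):
    """Standard deviations above or below the league average."""
    return (value - league_mean) / league_std

# Hypothetical D1 context: mean FIP 4.80, std 0.70. Lower FIP is
# better, so its z-score is negated to make "higher = better"
# consistent across all stats before they are combined.
team_fip_z = -z_score(3.95, 4.80, 0.70)
team_ops_z = z_score(0.842, 0.780, 0.045)

print(round(team_fip_z, 2))  # 1.21
print(round(team_ops_z, 2))  # 1.38
```

Because both numbers are now on the same unitless scale, a FIP advantage and an OPS advantage can be weighted and summed directly.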
The model prioritizes skill indicators over outcome stats. FIP strips out fielding luck to isolate pitching skill. ISO isolates power from batting average. K-BB% measures a pitcher's ability to strike out batters while limiting walks.
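For readers unfamiliar with these stats, here are the standard formulas in sketch form (the FIP constant and the sample inputs are illustrative; the model's exact inputs and league constant may differ):

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching: only HR, walks, HBP, and Ks,
    which the pitcher controls. The constant scales FIP to league
    ERA; 3.10 here is a placeholder, and the D1 constant differs."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant

def iso(slg, avg):
    """Isolated power: slugging percentage minus batting average."""
    return slg - avg

def k_bb_pct(k, bb, batters_faced):
    """Strikeout rate minus walk rate, as a share of batters faced."""
    return (k - bb) / batters_faced

print(round(fip(6, 20, 4, 85, 90.0), 2))   # 2.88
print(round(iso(0.480, 0.300), 3))         # 0.18
print(round(k_bb_pct(85, 30, 400), 4))     # 0.1375
```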
The final score is converted to a probability using a logistic function, then passed through a calibration layer that corrects for systematic compression toward 50%. A time-aware backtest on late-season games (March 15–26, n=483, when season stats are most converged) was used to select the calibration factor — no end-of-season stats were used to evaluate games before they were played. The raw model underrated strong favorites by 15–22 percentage points and was poorly calibrated in the 10–40% range; the calibration step reduces those gaps. The final probability is clamped between 10% and 88%.
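One way to sketch the logistic-plus-calibration step (the calibration factor of 1.8 is an illustrative assumption, not the model's fitted value; only the 10%/88% clamp comes from the description above):

```python
import math

def win_probability(score_margin, calibration=1.8, lo=0.10, hi=0.88):
    """Map the weighted z-score margin to a win probability.
    A calibration factor > 1 stretches the logistic curve away
    from 50% to undo the raw model's compression (1.8 is a
    placeholder, not the backtested value). The result is then
    clamped to [10%, 88%] as stated above."""
    p = 1 / (1 + math.exp(-calibration * score_margin))
    return min(hi, max(lo, p))

print(win_probability(5.0))   # clamped to 0.88
print(win_probability(-5.0))  # clamped to 0.1
```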
Team A has a better FIP, stronger run differential, and higher OPS. Team B has a slight edge in recent form. The model combines these z-score advantages across all six components, applies calibration, and might output Team A 75% — Team B 25%. Before calibration, this same matchup would have shown 63%–37%.
Model Edge is the difference between the model's win probability and the market's true implied probability — after removing sportsbook vig. It reveals where the model disagrees with a fair-market price.
How it's calculated:
Team A is +130 (away), Team B is −150 (home). Raw implied: 43.5% + 60.0% = 103.5% total. Vig-free: Team A = 43.5/103.5 = 42.0%. If the model gives Team A 50%, Model Edge = 50% − 42% = +8%.
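That arithmetic can be reproduced directly (the helper names here are illustrative, not the dashboard's actual code):

```python
def implied_prob(american_odds):
    """Raw implied probability from American odds (vig included)."""
    if american_odds > 0:
        return 100 / (american_odds + 100)
    return -american_odds / (-american_odds + 100)

def model_edge(model_prob, team_odds, opp_odds):
    """Model win probability minus the vig-free market probability."""
    raw_team = implied_prob(team_odds)
    raw_opp = implied_prob(opp_odds)
    fair = raw_team / (raw_team + raw_opp)  # normalize out the vig
    return model_prob - fair

# The +130 / -150 example from above:
edge = model_edge(0.50, 130, -150)
print(round(edge * 100, 1))  # 8.0 percentage points
```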
Plus-money teams can still rate well when the matchup is genuinely closer than the odds imply. Large underdogs (+200 or longer) now require stronger evidence before being tagged as value — they must clear a higher edge threshold, and displayed edge is capped to reduce noise from model uncertainty at long prices.
Edges are classified into tiers:
| Edge | Classification | Meaning |
|---|---|---|
| 9%+ | Premium Edge | Rare — strongest model-vs-market disagreement |
| 6–8% | Strong Edge | Meaningful edge worth attention |
| 3–5% | Slight Edge | Some model lean — within wider variance |
| <3% | No Edge | Model and market roughly agree |
Underdog value thresholds are price-aware: short underdogs (+100–149) need 4%+ edge, mid dogs (+150–199) need 6%+, and long dogs (+200+) need 8%+ before being flagged as Underdog Value.
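Putting the tier table and the price-aware underdog thresholds together, the classification logic can be sketched like this (function names are illustrative):

```python
def edge_tier(edge_pct):
    """Classify a model edge (in percentage points) per the tier table."""
    if edge_pct >= 9:
        return "Premium Edge"
    if edge_pct >= 6:
        return "Strong Edge"
    if edge_pct >= 3:
        return "Slight Edge"
    return "No Edge"

def underdog_value(edge_pct, american_odds):
    """Price-aware underdog flag: longer prices need larger edges."""
    if american_odds < 100:
        return False  # not a plus-money underdog
    if american_odds >= 200:
        return edge_pct >= 8
    if american_odds >= 150:
        return edge_pct >= 6
    return edge_pct >= 4

print(edge_tier(7))             # Strong Edge
print(underdog_value(7, 220))   # False: long dogs need 8%+
print(underdog_value(7, 160))   # True: mid dogs need 6%+
```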
A positive edge does not mean a team will win. It means the model's estimate diverges from the fair-market price. Odds change. No model predicts outcomes perfectly.
Each matchup card shows a Bullpen row with a status label and composite score (0–100) for each team. Higher scores mean a more vulnerable bullpen.
The score combines three factors:
| Factor | What It Measures | Weight |
|---|---|---|
| Workload Stress | Recent bullpen usage — innings, back-to-back appearances, batters faced in the last 1–5 days | 50% |
| Bullpen Quality | Season-long reliever performance — ERA, WHIP, K/BB ratio, strand rate, depth | 30% |
| Starter Support | How deep starters go — avg IP per start, bullpen innings share, reliever depth | 20% |
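A minimal sketch of the weighted composite, assuming each factor has already been expressed as a 0–100 risk sub-score (higher = more vulnerable); the model's internal scaling may differ, but the weights match the table above:

```python
def bullpen_score(workload_stress, quality_risk, starter_support_risk):
    """Composite 0-100 bullpen vulnerability score, weighted
    50% workload, 30% quality, 20% starter support."""
    return (0.50 * workload_stress
            + 0.30 * quality_risk
            + 0.20 * starter_support_risk)

# Hypothetical team: heavy recent usage, middling quality,
# starters going reasonably deep.
print(bullpen_score(70, 40, 35))  # 54.0
```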
Status labels are assigned by score range:
| Label | Score | Meaning |
|---|---|---|
| Fresh | 0–25 | Rested, deep, and performing well. Low risk in late innings. |
| Stable | 26–45 | Normal workload, adequate quality. No major concern. |
| At Risk | 46–65 | Elevated stress or below-average quality. Potential late-game vulnerability. |
| Burned | 66–100 | Heavy recent usage combined with poor quality and thin depth. Strong late-game risk. |
When two teams differ by 10+ points, a Bullpen Edge callout names the team with the healthier pen. This is an additional context signal — it does not override the main win-probability model.
Team A shows At Risk 54 and Team B shows Stable 31. The 23-point gap triggers Bullpen Edge: Team B — meaning Team B has a meaningful pitching advantage in the late innings.
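The label ranges and the 10-point edge rule can be sketched as follows (names are illustrative; the example reproduces the 54-vs-31 matchup above):

```python
def bullpen_label(score):
    """Map a composite score to its status label per the table above."""
    if score <= 25:
        return "Fresh"
    if score <= 45:
        return "Stable"
    if score <= 65:
        return "At Risk"
    return "Burned"

def bullpen_edge(team_a_score, team_b_score, threshold=10):
    """Name the team with the healthier (lower-scoring) pen when the
    gap is at least `threshold` points; otherwise no callout."""
    if abs(team_a_score - team_b_score) < threshold:
        return None
    return "Team A" if team_a_score < team_b_score else "Team B"

print(bullpen_label(54))     # At Risk
print(bullpen_edge(54, 31))  # Team B
```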
Data is sourced from stats.ncaa.org box scores (recent game-by-game reliever usage) combined with season-aggregate pitching statistics. Pitch counts are not available from NCAA box scores, so batters faced is used as the workload proxy.
All probabilities are data-driven estimates based on publicly available team statistics and schedule-adjusted rankings. They update as new data becomes available. Odds are sourced from third-party sportsbooks and may change at any time. This is informational — not betting advice.