An independent, statistically rigorous calibration audit of your NFL prediction model. We don't sell picks. We tell you, with multiple-comparisons discipline, which situations your model is calibrated on, which show drift, and which are noise, so you size into the edges that survive scrutiny and stop bleeding on the ones that don't.
Most models are evaluated on a single aggregate number. That number averages your real edges together with the segments where you're flat or losing. The edge is real; it's just buried.
A 54% lifetime model often hides a 58% core that lives in two or three situations, dragged down by segments where you have no edge at all. Size uniformly across all of it and you dilute your best spots without ever seeing which ones they are.
Open-to-close line movement is the easy number, and it flatters you. It says the line came to your side, not that you got that price at your fill. Measured from the moment you actually bet, the story is usually less comfortable, and most desks never look at it that way.
A cover rate that beats breakeven on paper can still sit inside a confidence interval wide enough to include breakeven. Most operators have never put an error bar on their own number, so they can't tell a real edge from a hot stretch they are about to give back.
Rule changes, scoring shifts, a retrain that quietly broke something: an edge that was real three seasons ago may be gone now. A lifetime cover rate blends the dead years with the live ones, so the decay stays invisible until a full season of results makes it obvious.
A full Calibration Audit runs fifteen distinct analyses on your model's historical record against the AGINT NFL foundation. Below are the ones operators find most revealing. The differentiator across all of them is discipline: every situational finding is corrected for multiple comparisons and labeled by the confidence it actually earns.
Decompose calibration across the full record and at the facet level, by weather, rest, role (home/road favorite/dog), divisional context, and market regime, to locate drift, not just measure it.
Line-movement CLV (open versus close) is the cleanest available proxy for genuine edge, measured overall and within facets. When you supply bet timestamps and prices, the full Audit adds bet-time-versus-close CLV, the measure that captures the value you actually realized, not just how the line moved.
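To make the difference between the two CLV measures concrete, here is a minimal sketch for point-spread bets. The `Bet` fields and the sign convention (each line quoted from the side you backed, so +3.5 means you are getting 3.5 points) are assumptions for illustration, not the Audit's actual schema.

```python
from dataclasses import dataclass

@dataclass
class Bet:
    # Hypothetical record layout; all lines are quoted from the side you backed.
    open_line: float   # spread at market open
    bet_line: float    # spread you actually got at fill
    close_line: float  # spread at close

def line_movement_clv(bet: Bet) -> float:
    """Open-to-close movement in points: how far the market moved toward your side."""
    return bet.open_line - bet.close_line

def bet_time_clv(bet: Bet) -> float:
    """Fill-to-close value in points: what you actually beat the close by."""
    return bet.bet_line - bet.close_line

# Illustrative numbers only: a dog opens +4, you fill at +3, it closes +2.5.
example = Bet(open_line=4.0, bet_line=3.0, close_line=2.5)
movement = line_movement_clv(example)  # 1.5 points of movement
realized = bet_time_clv(example)       # 0.5 points actually captured
```

In this made-up case the line moved 1.5 points toward the bettor, but only 0.5 of that was captured at the fill, which is the gap the bet-time measure is meant to expose.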
Evaluation is walk-forward, so findings reflect what your model would have known in real time, not a backtest tuned to the answer.
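As a rough sketch of what walk-forward means in practice, each season is scored using only information from the seasons before it. The helper name `walk_forward_splits` and the `min_train` parameter are hypothetical; the Audit's actual windowing may differ.

```python
def walk_forward_splits(seasons, min_train=1):
    """Yield (prior_seasons, test_season) pairs in chronological order.

    Anything fitted for a test season (for example a recalibration map)
    may only use the prior seasons, never the test season or later ones.
    """
    ordered = sorted(set(seasons))
    for i in range(min_train, len(ordered)):
        yield ordered[:i], ordered[i]

# Illustrative season labels only.
splits = list(walk_forward_splits([2021, 2019, 2020, 2022], min_train=2))
```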
NFL scoring environments shift. The Audit tests your model across structural eras and against model-versus-market divergence buckets, so an edge that has quietly decayed since a rule change or a retrain shows up as decay rather than hiding inside a lifetime average.
A 54% cover rate on 200 picks may not be statistically distinguishable from breakeven. The Audit reports bootstrap confidence intervals (10,000 resamples) on every headline metric, so an apparent edge is never presented as real when the sample cannot support it.
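The 54%-on-200-picks case can be checked with a simple percentile bootstrap. This is a minimal sketch, not the Audit's implementation: the function name is hypothetical, and the breakeven of 110/210 assumes standard -110 pricing, which may not match your book.

```python
import random

def bootstrap_cover_ci(outcomes, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a cover rate.

    outcomes: list of 1 (covered) or 0 (did not cover); pushes are assumed
    to have been dropped upstream.
    """
    rng = random.Random(seed)
    n = len(outcomes)
    rates = []
    for _ in range(n_resamples):
        resample = rng.choices(outcomes, k=n)  # sample with replacement
        rates.append(sum(resample) / n)
    rates.sort()
    lower = rates[int((alpha / 2) * n_resamples)]
    upper = rates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# 108 covers in 200 picks is a 54% cover rate.
picks = [1] * 108 + [0] * 92
BREAKEVEN = 110 / 210  # assumption: -110 pricing, roughly 52.4%

low, high = bootstrap_cover_ci(picks)
includes_breakeven = low <= BREAKEVEN <= high
```

On this sample the interval is wide enough to contain the assumed breakeven, so the 54% headline alone does not establish an edge.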
If your model outputs cover probabilities, the Audit decomposes the Brier score into reliability, resolution, and uncertainty, and recommends whether Platt scaling or isotonic regression would tighten your calibration. Pick-only models get a reliability diagram and the constraint stated plainly.
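For readers who want to see the decomposition itself, the sketch below computes the standard Murphy decomposition, Brier = reliability - resolution + uncertainty, over probability bins. The function name, the ten-bin default, and the toy data are assumptions for illustration. The identity is exact only when every forecast in a bin is the same value; with wider bins it holds approximately.

```python
from collections import defaultdict

def brier_decomposition(probs, outcomes, n_bins=10):
    """Return the Brier score and its reliability, resolution, uncertainty terms."""
    n = len(probs)
    base_rate = sum(outcomes) / n
    bins = defaultdict(list)
    for p, o in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into the top bin
        bins[idx].append((p, o))

    reliability = 0.0
    resolution = 0.0
    for members in bins.values():
        n_k = len(members)
        mean_forecast = sum(p for p, _ in members) / n_k
        observed_rate = sum(o for _, o in members) / n_k
        reliability += n_k * (mean_forecast - observed_rate) ** 2
        resolution += n_k * (observed_rate - base_rate) ** 2

    brier = sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / n
    return {
        "brier": brier,
        "reliability": reliability / n,   # lower is better
        "resolution": resolution / n,     # higher is better
        "uncertainty": base_rate * (1 - base_rate),
    }

# Toy data: forecasts of 0.3 that cover 40% of the time, 0.7 that cover 60%.
toy_probs = [0.3] * 10 + [0.7] * 10
toy_outcomes = [1] * 4 + [0] * 6 + [1] * 6 + [0] * 4
parts = brier_decomposition(toy_probs, toy_outcomes)
```

In the toy data the model is overconfident in both directions, which shows up as a nonzero reliability term even though the overall Brier score matches that of a coin-flip forecast.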
Beyond single facets, the Audit surfaces the specific situational combinations where your model is strongest and weakest, cross-referenced against an independent intelligence layer so a finding is corroborated, not just a slice that happened to look good.
Across eight or more facets the Audit tests thirty-plus cells, and some will look significant by chance. Every facet is tested under multiple-comparisons correction. A finding only graduates to HIGH CONFIDENCE if it survives; the rest are honestly labeled directional or noise.
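To show why correction matters, here is a sketch using the Holm-Bonferroni step-down procedure. The page does not say which correction the Audit applies, so Holm is a stand-in chosen for illustration; the cell names and p-values are invented.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm-Bonferroni step-down: True where the finding survives correction."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    survives = [False] * m
    for rank, i in enumerate(order):
        # The smallest p-value faces alpha/m, the next alpha/(m-1), and so on.
        if p_values[i] <= alpha / (m - rank):
            survives[i] = True
        else:
            break  # once one test fails, every larger p-value fails too
    return survives

# Hypothetical cells and p-values, for illustration only.
cells = ["wind_gt_15_road", "short_rest_home_dog", "divisional_favorite", "dome"]
p_vals = [0.0004, 0.02, 0.04, 0.30]

naive = [p <= 0.05 for p in p_vals]   # uncorrected: three cells look significant
corrected = holm_reject(p_vals)       # corrected: only the first cell survives
```

With only four cells, two apparent findings already drop out under correction; with thirty-plus cells the thresholds tighten further.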
The Snapshot covers calibration error, line-movement CLV, walk-forward evaluation, and two situational facets as observations to investigate. The items marked Full Audit add the statistical rigor, situational depth, and multiple-comparisons control that turn observations into named, actionable findings.
Survives multiple-comparisons correction. Size into these.
Suggestive but not yet statistically validated. Worth watching, not yet worth sizing.
Looks like signal, behaves like chance. Explicitly flagged so you don't chase it.
Anonymized, illustrative of the format, not a real customer.
The model's lifetime edge is real but concentrated. In wind > 15 mph games where it backs the road side, calibration runs roughly 11 points off baseline. Strip that facet out and the surviving core covers at ~58% against a ~54% blended lifetime number. The leakage isn't a bad model; it's uniform sizing across uneven edges.
Real deliverables name the facets, quantify the drift, and recommend sizing changes, every claim tied to its confidence label.
Start with the Snapshot to see whether there's something worth chasing. Step up to the full Calibration Audit when you need statistically validated, named recommendations you can act on.
Snapshot credits in full against a Calibration Audit within 60 days. Audit terms: 50% on signing, 50% on delivery.
Independent validation of where your pooled-capital edge actually lives, before you scale the bet.
Externally audited blind spots, with an audit trail and data isolation your risk team can defend.
A rigor benchmark against your internal calibration tooling, on a foundation dataset you don't maintain.
Third-party diligence on an operator's reported edge for LP communications and model-risk review.
Know which slices of your model to size into and which to drop, before next season's bankroll plan.
Hand the methodology to your quant. It's built to survive technical scrutiny, not to dodge it.
Every engagement runs in a dedicated, isolated tenant. Your data is never co-mingled with another customer's or with fund operations.
We need your historical predictions and outcomes, not your model code, weights, or feature engineering. The edge stays yours.
Every query and finding is traceable to its inputs, so a risk or compliance reviewer can reconstruct exactly how a conclusion was reached.
No deck, no pitch. A 30-minute conversation about your model, what you'd want to learn, and whether a Snapshot or a full Audit is the right first step.
mj@fanvatic.com
AGINT Analyze is an independent, customer-funded diagnostic service that evaluates a customer's own prediction model. It is separate from AGINT Capital fund operations. Analyze does not provide picks, wagering selections, or investment advice, and does not disclose AGINT Capital's models, signals, or performance. Findings describe the customer's historical model behavior and are not a guarantee of future results. Not financial advice.