Game Model Coefficients
In the interest of full disclosure, and for my fellow uber-geeks, here is the actual model I'll be using to estimate outcome probabilities for each NFL game.
It is a logit (logistic) regression model based on the outcomes of all regular-season games from 2002 through 2006. I looked at each game twice, once from the point of view of the home team and once from the point of view of the visiting team. I labeled the two teams Team A and Team B. To identify the home team, I used a dummy variable, AHome, which was 1 if Team A was at home and 0 if it was away. The dependent variable is AWon, which is 1 if Team A won and 0 if Team B won. There were 2560 cases in all (1,280 games, each counted twice).
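The doubled-up data layout described above can be sketched as follows. This is a minimal illustration, not the actual code behind the model: the `Game` record, the team names, and the results are hypothetical, and the sketch assumes there are no tied games. It also shows where the 2560 figure comes from, assuming 256 regular-season games per season.

```python
from dataclasses import dataclass


@dataclass
class Game:
    home: str
    away: str
    home_won: bool  # assumption: no ties; a tie would need separate handling


def to_cases(games):
    """Expand each game into two regression cases, one per point of view."""
    cases = []
    for g in games:
        # Home team's point of view: Team A is at home.
        cases.append({"A": g.home, "B": g.away, "AHome": 1, "AWon": int(g.home_won)})
        # Visitor's point of view: Team A is on the road.
        cases.append({"A": g.away, "B": g.home, "AHome": 0, "AWon": int(not g.home_won)})
    return cases


# Regular-season games per season (32 teams x 16 games / 2), over 2002-2006.
GAMES_PER_SEASON = 256
SEASONS = 5
TOTAL_CASES = 2 * GAMES_PER_SEASON * SEASONS  # 2560 cases from 1,280 games

if __name__ == "__main__":
    # Made-up results, purely for illustration.
    sample = [Game("PHI", "DAL", True), Game("NYG", "WAS", False)]
    for case in to_cases(sample):
        print(case)
    print(TOTAL_CASES)
```

Because every game appears once as a win and once as a loss, the cases are balanced by construction, and AHome separates the home-field effect from team strength.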
| VARIABLE | COEFFICIENT | STD ERROR | T-STAT | SLOPE AT MEAN |
|---|---|---|---|---|
| const | -0.26 | 1.36 | -0.19 | |
| AHOME | 0.74 | 0.09 | 8.29 | 0.19 |
| AOPASS | 0.45 | 0.07 | 6.56 | 0.11 |
| AORUN | 0.27 | 0.10 | 2.65 | 0.07 |
| ADPASS | -0.54 | 0.09 | -5.90 | -0.13 |
| ADRUN | -0.21 | 0.11 | -1.87 | -0.05 |
| AOINTRATE | -15.90 | 6.26 | -2.54 | -3.98 |
| ADINTRATE | 17.68 | 5.16 | 3.43 | 4.42 |
| AOFUMRATE | -20.50 | 7.79 | -2.63 | -5.12 |
| APENRATE | -1.49 | 0.72 | -2.07 | -0.37 |
| BOPASS | -0.45 | 0.07 | -6.54 | -0.11 |
| BORUN | -0.27 | 0.10 | -2.64 | -0.07 |
| BDPASS | 0.53 | 0.09 | 5.83 | 0.13 |
| BDRUN | 0.20 | 0.11 | 1.79 | 0.05 |
| BOINTRATE | 15.71 | 6.26 | 2.51 | 3.93 |
| BDINTRATE | -18.95 | 5.16 | -3.67 | -4.74 |
| BOFUMRATE | 21.01 | 7.79 | 2.70 | 5.25 |
| BPENRATE | 1.47 | 0.72 | 2.04 | 0.37 |
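The SLOPE AT MEAN column appears consistent with the standard logit marginal effect, coefficient * p * (1 - p), evaluated where the predicted probability is about 0.5. At that point each slope is simply one quarter of its coefficient. A quick check, assuming that is how the column was computed (the function name `logit_slope` is illustrative):

```python
# Coefficients and reported slopes copied from the table above (constant omitted).
COEF = {
    "AHOME": 0.74, "AOPASS": 0.45, "AORUN": 0.27, "ADPASS": -0.54, "ADRUN": -0.21,
    "AOINTRATE": -15.90, "ADINTRATE": 17.68, "AOFUMRATE": -20.50, "APENRATE": -1.49,
    "BOPASS": -0.45, "BORUN": -0.27, "BDPASS": 0.53, "BDRUN": 0.20,
    "BOINTRATE": 15.71, "BDINTRATE": -18.95, "BOFUMRATE": 21.01, "BPENRATE": 1.47,
}
REPORTED_SLOPE = {
    "AHOME": 0.19, "AOPASS": 0.11, "AORUN": 0.07, "ADPASS": -0.13, "ADRUN": -0.05,
    "AOINTRATE": -3.98, "ADINTRATE": 4.42, "AOFUMRATE": -5.12, "APENRATE": -0.37,
    "BOPASS": -0.11, "BORUN": -0.07, "BDPASS": 0.13, "BDRUN": 0.05,
    "BOINTRATE": 3.93, "BDINTRATE": -4.74, "BOFUMRATE": 5.25, "BPENRATE": 0.37,
}


def logit_slope(coef, p=0.5):
    """Logit marginal effect dP/dx = coef * p * (1 - p), evaluated at probability p."""
    return coef * p * (1 - p)


if __name__ == "__main__":
    for name, b in COEF.items():
        print(f"{name:10s} computed {logit_slope(b):6.2f}  reported {REPORTED_SLOPE[name]:6.2f}")
```

The small mismatches in the last digit come from rounding the coefficients to two decimals.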
Retrodictively, the model picks the winner correctly in 69.5% of the games. But keep in mind that there are many evenly matched games and unpredictable upsets, so it may be impossible for even a perfect model to get past 75% or so.
I realize the numbers in the table above are meaningless to most people, but I want to ensure everything I do is out in the open.
Key (a prefix of A or B indicates whether the statistic belongs to Team A or Team B):
OPASS = (offensive pass yds - sack yds) / pass plays
ORUN = offensive run yds / run plays
DPASS = (defensive pass yds - sack yds) / pass plays
DRUN = defensive run yds / run plays
OINTRATE = offensive interceptions / pass attempts
DINTRATE = defensive interceptions / pass attempts
OFUMRATE = fumbles / offensive plays
PENRATE = team penalty yds / total plays
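To make the table and key concrete, here is a minimal sketch of how the coefficients could turn two teams' rate stats into a win probability, using the usual logit link P = 1 / (1 + e^(-z)), where z is the constant plus the sum of each coefficient times its variable. The `TeamRates` class, the helper names, and the sample numbers are hypothetical; the coefficients are copied from the table, and the sketch assumes each variable is entered exactly as the key defines it.

```python
import math
from dataclasses import dataclass, fields

# Coefficients copied from the table above.
COEF = {
    "const": -0.26, "AHOME": 0.74,
    "AOPASS": 0.45, "AORUN": 0.27, "ADPASS": -0.54, "ADRUN": -0.21,
    "AOINTRATE": -15.90, "ADINTRATE": 17.68, "AOFUMRATE": -20.50, "APENRATE": -1.49,
    "BOPASS": -0.45, "BORUN": -0.27, "BDPASS": 0.53, "BDRUN": 0.20,
    "BOINTRATE": 15.71, "BDINTRATE": -18.95, "BOFUMRATE": 21.01, "BPENRATE": 1.47,
}


@dataclass
class TeamRates:
    """One team's rate stats, as defined in the key (hypothetical container)."""
    opass: float     # (offensive pass yds - sack yds) / pass plays
    orun: float      # offensive run yds / run plays
    dpass: float     # (defensive pass yds - sack yds) / pass plays
    drun: float      # defensive run yds / run plays
    ointrate: float  # offensive interceptions / pass attempts
    dintrate: float  # defensive interceptions / pass attempts
    ofumrate: float  # fumbles / offensive plays
    penrate: float   # team penalty yds / total plays


def net_pass_yards_per_play(pass_yds, sack_yds, pass_plays):
    """OPASS (or DPASS) from raw season totals, per the key."""
    return (pass_yds - sack_yds) / pass_plays


def win_probability(a, b, a_home):
    """Probability that Team A beats Team B under the logit model."""
    z = COEF["const"] + COEF["AHOME"] * (1 if a_home else 0)
    for f in fields(TeamRates):
        key = f.name.upper()
        z += COEF["A" + key] * getattr(a, f.name)
        z += COEF["B" + key] * getattr(b, f.name)
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Made-up numbers: two identical teams, so only home field separates them.
    team = TeamRates(opass=net_pass_yards_per_play(3600, 200, 560), orun=4.2,
                     dpass=6.0, drun=4.2, ointrate=0.03, dintrate=0.03,
                     ofumrate=0.01, penrate=0.4)
    print(round(win_probability(team, team, a_home=True), 3))
    print(round(win_probability(team, team, a_home=False), 3))
```

Because the A and B coefficients are nearly mirror images, two identical teams end up separated almost entirely by the AHOME term.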
The t-stat indicates the statistical significance of each variable. With a sample this large the t distribution is essentially normal, so a t-stat of about 1.96 or greater in absolute value indicates significance at p = 0.05 for a two-tailed test, or about 1.65 for a one-tailed test.
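As a rough check on the significance threshold, the sketch below computes large-sample critical values from the standard normal distribution (a reasonable approximation with 2560 cases) and lists which variables in the table fall short. It assumes a significance level of 0.05; the t-stats are copied from the table, and the helper name `not_significant` is illustrative.

```python
from statistics import NormalDist

# t-stats copied from the table above.
T_STATS = {
    "const": -0.19, "AHOME": 8.29, "AOPASS": 6.56, "AORUN": 2.65, "ADPASS": -5.90,
    "ADRUN": -1.87, "AOINTRATE": -2.54, "ADINTRATE": 3.43, "AOFUMRATE": -2.63,
    "APENRATE": -2.07, "BOPASS": -6.54, "BORUN": -2.64, "BDPASS": 5.83,
    "BDRUN": 1.79, "BOINTRATE": 2.51, "BDINTRATE": -3.67, "BOFUMRATE": 2.70,
    "BPENRATE": 2.04,
}

ALPHA = 0.05
STD_NORMAL = NormalDist()
TWO_TAILED = STD_NORMAL.inv_cdf(1 - ALPHA / 2)  # about 1.96
ONE_TAILED = STD_NORMAL.inv_cdf(1 - ALPHA)      # about 1.645


def not_significant(threshold):
    """Variables whose |t| falls below the given critical value."""
    return sorted(name for name, t in T_STATS.items() if abs(t) < threshold)


if __name__ == "__main__":
    print(f"two-tailed cutoff {TWO_TAILED:.3f}:", not_significant(TWO_TAILED))
    print(f"one-tailed cutoff {ONE_TAILED:.3f}:", not_significant(ONE_TAILED))
```

The run-defense terms (ADRUN, BDRUN) sit between the two cutoffs, so whether they count as significant depends on which test is intended.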
Below is a graph of the spectrum of game probabilities, with the cases divided randomly into two sets: training cases for the regression and validation (test) cases. The graph plots the actual outcome rates against the model's predicted probabilities.
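The training/validation split and the actual-vs-predicted comparison can be sketched as below. This is an illustrative reconstruction, not the software used for the graph: the 50/50 split, the ten equal-width bins, and the function names are assumptions, and fitting the regression on the training set is left out.

```python
import random


def random_split(cases, train_fraction=0.5, seed=0):
    """Shuffle cases and split them into training and validation sets."""
    rng = random.Random(seed)
    shuffled = list(cases)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def calibration_table(probs, outcomes, n_bins=10):
    """Group predictions into equal-width probability bins.

    Returns (mean predicted probability, actual win rate, count) for each
    non-empty bin; a well-calibrated model has the first two roughly equal.
    """
    counts = [0] * n_bins
    wins = [0] * n_bins
    prob_sums = [0.0] * n_bins
    for p, won in zip(probs, outcomes):
        i = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the top bin
        counts[i] += 1
        wins[i] += won
        prob_sums[i] += p
    return [
        (prob_sums[i] / counts[i], wins[i] / counts[i], counts[i])
        for i in range(n_bins)
        if counts[i]
    ]
```

Applied separately to the training and validation cases, each row gives one point on an actual-vs-predicted plot. If the model generalizes, both sets of points stay close to the diagonal and to each other.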
9 comments:
By 69.5% accuracy retrospectively, do you mean that you're testing on games that the model was trained on?
By which I mean, did you test the 2002-2006 games using the logistic regression model produced by training on that same set of games?
Derek-Yes.
The word is "retrodictive", not "retrospective," incidentally.
And the more interesting number in that case isn't the prediction accuracy - that's determined by the set of games used for the test - but the expected accuracy versus the observed accuracy (i.e. the error).
The observed accuracy is mostly irrelevant without knowing the expected accuracy: if one model expected 60% accuracy, observed 70% accuracy, and another model expected 65% accuracy and observed 65%, the second is likely a better model, as the first, in all likelihood, just got lucky.
Pat-Thanks.
Retrodictive--I couldn't remember that word.
Help me out. You're saying my observed accuracy is 69.5%. But how does one know what the expected accuracy should be? Last year, this model (or one very close to it) was correct 65% of the time and was well calibrated, i.e. teams given an 80% chance to win won 80% of the time, etc. But 2006 was a very odd year in which home teams won only 53% of games, when they normally win 58%. I would expect the model to fall somewhere between 65% and 70% correct for future games. Anyway, how would I calculate the error?
Patrick-Are you referring to "calibration"? I think we just have some different terminology. Here's how last year's calibration numbers looked.
http://www.bbnflstats.com/2007/03/assessing-models-accuracy.html
But how does one know what an expected accuracy would be?
Run through the season. Predict each game. The regression will give you some number that is related, somehow, to the probability that team A will beat team B (and obviously the probability that team B will beat team A). You obviously know that conversion from the calibration. Average the larger of the two numbers over all games (the larger one represents the predicted winner).
Then, subtract that number from your observed accuracy. That's the error.
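The procedure in this comment can be written out directly. A minimal sketch, assuming each entry of `probs` is the model's calibrated probability that Team A wins and `a_won` holds 1 or 0 for the actual result; the function names and sample numbers are hypothetical:

```python
def expected_accuracy(probs):
    """Average of the larger of the two win probabilities in each game."""
    return sum(max(p, 1 - p) for p in probs) / len(probs)


def observed_accuracy(probs, a_won):
    """Share of games where the team favored by the model actually won."""
    correct = sum((p >= 0.5) == bool(w) for p, w in zip(probs, a_won))
    return correct / len(probs)


def accuracy_error(probs, a_won):
    """Observed accuracy minus expected accuracy."""
    return observed_accuracy(probs, a_won) - expected_accuracy(probs)


if __name__ == "__main__":
    probs = [0.7, 0.6, 0.8, 0.55]  # made-up model probabilities for Team A
    a_won = [1, 0, 1, 1]           # made-up results
    print(expected_accuracy(probs), observed_accuracy(probs, a_won))
    print(accuracy_error(probs, a_won))
```

With these made-up numbers the model expects to be right about 66% of the time, gets 3 of 4, and so shows an error of about +0.0875.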
Now, interpreting that number is a bit of work: see a post in my Eagles blog here, although I think you have to register, so sorry about that.
But the basic idea is simple: suppose you have 4 games, you expected to get 70% of them right, and you got 3/4 of them right. The error in that case would be 5%, but the uncertainty on that error is huge because of the low statistics. If the same games had been played 25 times as often, 3/4 would be perfectly consistent with 90/100, so in truth your "error" is really 5+/-40% or so.
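The size of that uncertainty can be estimated by treating each game as an independent trial that the favorite wins with probability max(p, 1 - p). A minimal sketch under that assumption, giving one standard deviation on the observed accuracy (the function name is hypothetical):

```python
import math


def accuracy_std_error(probs):
    """One-standard-deviation uncertainty on observed accuracy.

    Assumes each game is an independent Bernoulli trial that the model's
    favorite wins with probability q = max(p, 1 - p).
    """
    n = len(probs)
    variance = sum(q * (1 - q) for q in (max(p, 1 - p) for p in probs))
    return math.sqrt(variance) / n


if __name__ == "__main__":
    print(accuracy_std_error([0.7] * 4))    # four games, 70% favorite in each
    print(accuracy_std_error([0.7] * 100))  # 25 times as many games
```

For four games with a 70% favorite in each, this gives about 0.23; for 100 such games it drops to about 0.046, shrinking as 1/sqrt(n).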
There's one other thing that is important for comparing ranking systems and is often overlooked: convergence speed. In your case, the "team ranking" is based on real statistics, so the question there is "how fast does the combination of those statistics stabilize?"
That number essentially tells you the uncertainty in your prediction for each game. That is, if you say "team A is going to beat team B 70% of the time", how precise is that 70%? The errors in the regression give you an estimate of that precision, but those are just the uncertainties in your model; you also have to accept that there are uncertainties in the data, too.
Patrick-Thanks. All great stuff I was not familiar with. I used some different software that can randomly select cases as training cases and validation cases. I posted the results in graph format to the original post.
My interpretation is that you'd want to see two things. One, the training and validation plots are tightly intertwined. And two, they both follow the diagonal closely, so that actual and expected don't diverge too far.
I don't have an exact error number yet. That will take a bit of work in Excel. But my reaction to the graph is that there is not much divergence between expected and actual.
The thing to watch for, year to year, is years where the error is significantly larger. You can figure out what the "expected error distribution" is by assuming the error truly is binomial (which is what you're presuming in the regression anyway, since it's a chi-squared fit) and running a Monte Carlo simulation.
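The Monte Carlo idea can be sketched as follows, assuming each game's outcome is an independent Bernoulli draw with the model's probability. The number of simulations, the seed, the 95% band, and the all-0.7 season of 256 games are arbitrary choices for illustration, and the function names are hypothetical:

```python
import random
import statistics


def simulate_accuracy(probs, n_sims=10000, seed=0):
    """Simulate many seasons and return the model's accuracy in each one."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_sims):
        correct = 0
        for p in probs:
            a_won = rng.random() < p  # draw the game's outcome
            picked_a = p >= 0.5       # the model picks Team A when p >= 0.5
            correct += a_won == picked_a
        results.append(correct / len(probs))
    return results


def percentile_band(values, lo=0.025, hi=0.975):
    """Empirical percentile band from sorted simulation results."""
    s = sorted(values)
    return s[int(lo * (len(s) - 1))], s[int(hi * (len(s) - 1))]


if __name__ == "__main__":
    season = [0.7] * 256  # hypothetical: a 70% favorite in every game
    sims = simulate_accuracy(season)
    print(statistics.mean(sims), percentile_band(sims))
```

If a season's observed accuracy falls outside the simulated band, that year's error is larger than the model's own probabilities would explain.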
If the error is always within the expected error distribution, then you've got a model which almost perfectly represents the game. It almost certainly won't be, since, well, it's a model, and the game is more complex.