Tuesday, November 17, 2009

Belichick's Freakonomics Greatness

[Updated to correct misspellings of Bill Belichick's name.]

By now everyone knows about Bill Belichick's much-criticized decision to go for a first down "on fourth-and-2 from the Pats’ own 28-yard line with a six-point lead and 2:08 remaining in the fourth quarter."

Going against the grain, Steve Levitt argued yesterday, in a post titled "Bill Belichick Is Great," that Belichick made the right decision. Levitt bases his argument on a well-known paper by Berkeley economist David Romer, "Do Firms Maximize? Evidence from Professional Football."

I think Levitt is off base. There are two issues.

First, Romer's article (self-consciously) says nothing about late-game situations: it deliberately focuses on first-quarter situations precisely to avoid issues related to the end of games. Thus, even if Levitt's conclusion is right, Romer's article can't be the evidence for it.

Second, I show below that, once one thinks carefully about the actual situation, the right way to frame the problem is in terms of a threshold probability, m*, of making the first down. When the actual probability of converting is above this threshold, going for it maximizes New England's (NE's) win probability; otherwise, punting does. The threshold m* depends only on two conditional probabilities: the probability that NE wins when it punts, and the probability that NE wins when it goes for the first down but fails to make it. What I regard as plausible bounds on these conditional probabilities put the threshold m* between 17 and 78 percent. It seems very likely that the true probability of converting on fourth and 2 lies between these bounds. In other words, without knowing the conditional probabilities described above, all we can say is that Levitt might be right, but he might also be wrong.

This issue is one that could be resolved with data -- but not the Romer data that Levitt cites. Here are the details.

I. Romer's Paper

Romer's abstract states:
Play-by-play data and dynamic programming are used to estimate the average payoffs to kicking and trying for a first down under different circumstances. Examination of actual decisions shows systematic, clear-cut, and overwhelmingly statistically significant departures from the decisions that would maximize teams’ chances of winning.
Translated, Romer is saying that teams should go for it on fourth down much more often than they actually do. Romer's paper is very well done, and it presents compelling evidence. A key feature of Romer's paper is that he accounts not only for the immediate consequences of a team's fourth-down decisions, but also for their subsequent effects:
The choice between kicking and going for it leads to an immediate payoff in terms of points (which may be zero) and to one team having a first down somewhere on the field. That first down leads to additional scoring (which again may be zero) and to another possession and first down. And so on. (Page 342.)
But Romer is the first to note that, by design, his evidence has little to do with situations like the one in the Indy-Pats game the other night. His paper doesn't, and given its design can't, address late-game situations. Romer writes on page 344 (emphases mine):

By describing the values of situations in terms of expected point differences, I am implicitly assuming that a team that wants to maximize its chances of winning should be risk-neutral over points scored. Although this is clearly not a good assumption late in a game, I show in Section IV that it is an excellent approximation for the early part. For that reason, I focus on the first quarter.

In the present context, the situation is essentially the end of the game: given that Indy has just one timeout remaining, if New England makes the first down, the game is effectively over.* So it doesn't matter whether New England scores if the Pats make the first down. (*It is possible that NE gets the first down on the play in question but then turns the ball over to Indy on a later fourth down with only a few seconds left. But this possibility only reduces the value of going for it now, so I will ignore it.)

Now consider what happens if Indy gets the ball back now, however that happens. Should Indy score a touchdown, there is no chance Indy would go for 2, so the relevant downside for Belichick is 7 points: a touchdown plus the extra point, turning a 6-point lead into a 1-point deficit. If we assume that Indy will make the PAT with probability 1, which is approximately right, then we might as well treat the outcome as binary: lose or not lose.

In sum, Romer's objective function is the wrong one for this situation, as he clearly understands (see the passage from page 344 quoted above). So Romer's paper really doesn't provide any empirical support for Levitt's claim that Belichick did the right thing. (This is true even if one thinks that Belichick is risk-neutral over game wins in this situation: what matters is the binary variable representing Indy-gets-a-TD, not expected point difference.)

II. The Right Way to Look at This Situation

Look at it this way.
  • Assume the Pats will win the game for sure if they go for it and make the first down. Let m be the probability that NE makes the first down if the Pats go for it. Also let q be the probability that the Pats win even when they go for it and don't get the first down (so q is the probability that NE either prevents an Indy TD or gives one up and then scores themselves).
  • If New England punts, NE will either win the game (by not letting Indy score a TD, or by letting them do so but subsequently scoring themselves), or it won't. Let p be the probability that NE wins if it punts.
Levitt makes a lot of hay about his belief that Belichick's decision maximized win probability (even under Levitt's postulated principal-agent problem). So let's assume that Belichick's goal indeed is to maximize win probability. That means he should go for it in our situation if and only if the probability of winning when he goes for it is greater than the probability of winning when he kicks (I'll ignore the possibility that the win-probabilities are equal). Given our definitions above:

  • If New England goes for it, the probability of winning is m + (1-m)q. That is, the Pats win when they make the first down or, having not made it, win anyway.
  • If New England punts, the probability of winning is p.
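To make the go-for-it formula concrete, here is a minimal Python sketch of the two-stage model described above: NE converts with probability m (and, by assumption, wins), and otherwise still wins with probability q. The function names and the values m = 0.6, q = 0.25 are hypothetical and purely illustrative, not estimates of the actual game; the Monte Carlo simulation is just a sanity check that m + (1-m)q is the right win probability under these assumptions.

```python
import random


def win_prob_go(m: float, q: float) -> float:
    """Win probability if NE goes for it: convert (prob. m) or fail and still win (prob. q)."""
    return m + (1 - m) * q


def win_prob_punt(p: float) -> float:
    """Win probability if NE punts: p, by definition."""
    return p


def simulate_go(m: float, q: float, trials: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the go-for-it win probability under the two-stage model."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        if rng.random() < m:
            wins += 1  # converted: assume the game is over and NE wins
        elif rng.random() < q:
            wins += 1  # failed, but NE still wins (stops Indy, or answers an Indy TD)
    return wins / trials


if __name__ == "__main__":
    m, q = 0.6, 0.25            # hypothetical values, for illustration only
    print(win_prob_go(m, q))    # exact value: 0.6 + 0.4 * 0.25
    print(simulate_go(m, q))    # should land close to the exact value
```

The simulation adds nothing the algebra doesn't already say; it simply shows that the formula counts each winning path exactly once.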

So Belichick should go for it whenever m + (1-m)q > p. Rearranging, this is m(1 - q) > p - q; since 1 - q > 0, dividing both sides by it preserves the inequality, so the condition amounts to

m > (p-q) / (1 - q) = m*.

The threshold's denominator, 1 - q, is the probability that NE loses given that it goes for the first down and doesn't make it, giving Indy excellent field position. The numerator, p - q, is the difference between NE's win probability when it punts (p) and when it goes for it and fails to make the first down (q). Since 0 < q < p < 1, m* always lies strictly between 0 and 1: our threshold is a proper probability.
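The threshold rule can be checked mechanically. The minimal Python sketch below (function names are hypothetical) computes m* = (p - q)/(1 - q) and confirms, on an exact grid of probabilities with q < p < 1, that "go for it iff m > m*" gives the same answer as comparing m + (1-m)q with p directly, and that m* is always strictly between 0 and 1. It uses Python's `fractions.Fraction` so that exact ties are not blurred by floating-point rounding.

```python
from fractions import Fraction


def m_star(p, q):
    """Threshold conversion probability: go for it iff m > (p - q) / (1 - q).

    Requires q < 1, so the denominator (NE's chance of losing after failing) is positive.
    """
    if not q < 1:
        raise ValueError("q must be less than 1")
    return (p - q) / (1 - q)


def should_go(m, p, q):
    """Direct comparison of win probabilities: going for it vs. punting."""
    return m + (1 - m) * q > p


def check_threshold_rule(n=20):
    """Compare the threshold rule with the direct comparison on an exact grid (q < p < 1)."""
    grid = [Fraction(k, n) for k in range(n + 1)]
    for p in grid:
        for q in grid:
            if not q < p < 1:
                continue
            t = m_star(p, q)
            if not 0 < t < 1:  # m* should be a proper probability
                return False
            for m in grid:
                if should_go(m, p, q) != (m > t):
                    return False
    return True


if __name__ == "__main__":
    print(check_threshold_rule())
```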

If I had to guess, I'd think the following:
  • NE's win probability when it punts, p, is at least 50 percent here, but not more than 80 percent (yes, Indy's offense is great, but the Pats had intercepted Manning twice, and Indy had scored only 28 points after 58 minutes, after all; plus, there are only about two minutes left, a punt will put Indy somewhere between its own 20 and 30, and Indy needs a TD, not just a field goal).
  • NE's win probability when it goes for it and doesn't get the first down, q, is at least 10 percent and not more than 40 percent. My thinking here is that going 30 yards is a LOT easier than going 70 yards, especially with so little time remaining.

Given these bounds, we can bound the threshold value of m as follows:

  • NE wins 50 percent of the time when it punts and 40 percent of the time when it goes for it and doesn't get the first down: p = 0.5, q = 0.4. This is the most friendly-to-Belichick-and-Levitt case my bounds allow, because m* rises with p and falls with q (its derivative with respect to q is (p - 1)/(1 - q)^2, which is negative). It implies that Belichick should go for it if and only if m is greater than 1/6: NE has to have at least a 17 percent chance to convert on 4th and 2.
  • NE wins 80 percent of the time when it punts and only 10 percent of the time when it goes for it and doesn't get the first down: p = 0.8, q = 0.1. This implies that Belichick should go for it if and only if m is greater than 7/9: NE has to have at least a 78 percent chance to convert on 4th and 2.
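As a check on the two corner cases, the minimal Python sketch below evaluates m* = (p - q)/(1 - q) over an exact grid covering the box 0.5 ≤ p ≤ 0.8, 0.1 ≤ q ≤ 0.4 and reports the smallest and largest thresholds. The function names and grid resolution are illustrative choices, not anything from the original analysis; because m* rises with p and falls with q, the extremes should sit at (p, q) = (0.5, 0.4) and (0.8, 0.1).

```python
from fractions import Fraction


def m_star(p, q):
    """Threshold conversion probability (p - q) / (1 - q)."""
    return (p - q) / (1 - q)


def threshold_bounds(p_lo, p_hi, q_lo, q_hi, steps=30):
    """Min and max of m* over an exact grid covering [p_lo, p_hi] x [q_lo, q_hi].

    m* rises with p and falls with q, so the extremes sit at the corners
    (p_lo, q_hi) and (p_hi, q_lo); the grid search just confirms that.
    """
    ps = [p_lo + (p_hi - p_lo) * Fraction(i, steps) for i in range(steps + 1)]
    qs = [q_lo + (q_hi - q_lo) * Fraction(j, steps) for j in range(steps + 1)]
    values = [m_star(p, q) for p in ps for q in qs]
    return min(values), max(values)


if __name__ == "__main__":
    lo, hi = threshold_bounds(Fraction(1, 2), Fraction(4, 5), Fraction(1, 10), Fraction(2, 5))
    print(lo, hi)                                 # 1/6 7/9
    print(f"{float(lo):.1%} to {float(hi):.1%}")  # 16.7% to 77.8%
```

Rounded to whole percentages, these are the 17 and 78 percent figures above.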
Thus my bounds on the threshold m* are 17 percent and 78 percent. Personally I think the truth is probably closer to the high end, so I am skeptical that the decision was a good one. But even if you disagree, it seems very likely that NE's chance of getting the first down lies between 17 and 78 percent. If you accept my bounds on p and q, then you must agree that it is an empirical question whether Belichick made the right call.
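The "empirical question" point can be phrased as a three-way classification. In the minimal Python sketch below (the names `verdict`, `M_STAR_LOW`, and `M_STAR_HIGH` are hypothetical), a conversion probability above 7/9 favors going for it for every (p, q) within the bounds, one at or below 1/6 never makes going for it strictly better, and anything in between depends on the unknown p and q. The sample values of m are made up purely to exercise each branch; they are not estimates of NE's actual conversion rate.

```python
from fractions import Fraction


def m_star(p, q):
    """Threshold conversion probability (p - q) / (1 - q)."""
    return (p - q) / (1 - q)


# Range of m* implied by the bounds 0.5 <= p <= 0.8 and 0.1 <= q <= 0.4,
# attained at the two corner cases discussed above.
M_STAR_LOW = m_star(Fraction(1, 2), Fraction(2, 5))    # 1/6
M_STAR_HIGH = m_star(Fraction(4, 5), Fraction(1, 10))  # 7/9


def verdict(m):
    """Classify a conversion probability m against the whole range of thresholds."""
    if m > M_STAR_HIGH:
        return "go for it: better for every (p, q) within the bounds"
    if m <= M_STAR_LOW:
        return "punt: going for it is never strictly better within the bounds"
    return "depends on p and q: an empirical question"


if __name__ == "__main__":
    # Hypothetical conversion probabilities, chosen only to illustrate each branch.
    for m in (Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)):
        print(m, "->", verdict(m))
```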

I'll be curious to see what values Levitt thinks p and q take (maybe he'll even estimate them!).