SumerSports In-Game Coaching Model: An Introduction

In this article we’re going to talk about quantifying coaching decisions in four different facets of play: fourth-down decisions, two-point conversion decisions, timeouts, and delays of game.
by Eric Eager|February 28, 2023


I know we’ve been around for a little while, and even had a Wall Street Journal article written about us, but this is our first foray into written content after months of building our business-to-business product aimed at helping teams win games through roster optimization.

In this article we’re going to talk about quantifying coaching decisions in four different facets of play: fourth-down decisions, two-point conversion decisions, timeouts, and delays of game. The football analytics community is indebted to Ben Baldwin, Sebastian Carl, the team at nflfastR, and many others for both the data to do this work and some of the work that has already been done to try to understand this problem in the past. Here are a few examples:

As with anything, though, we can lose a great deal of what is really happening if we don’t build a quantified, well-reasoned way of evaluating coaches’ decisions into our process. Then, we can use that information to make coaching better, or at least to exploit the difference between good decisions and not-so-good decisions in the fantasy football space (stay tuned for more on that in the future).

To that aim, we’ve created four models to determine how well a head coach is doing in these realms relative to their peers. Our first model looks at the discrepancy in win probabilities between going for it and kicking (where kicking takes the higher win probability of punting and attempting a field goal). The response variable is a Boolean, which is 1 if the coach makes the correct choice: namely, they go for it when the go win probability is higher than the kick win probability, or they kick when the opposite is true. It is 0 if they make an incorrect choice. Since the league is generally still biased towards kicking, the output of this model is the probability that the coach makes the correct choice, and their “right decision over expected” is the actual result of the decision (1 or 0) minus this probability. The modeling framework we chose for this is an xgboost model in R with the following features: season, the pre-game market win probability for the offense, the in-game win probability for the offense, down, distance, and yards to go for a touchdown.
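To make the definition concrete, here is a minimal Python sketch of the labeling step and the “right decision over expected” arithmetic. It is not our production code (the model itself is an xgboost model in R): the `FourthDown` class, its field names, and the toy numbers are all hypothetical, and `p_correct` stands in for whatever probability the fitted model would return.

```python
from dataclasses import dataclass


@dataclass
class FourthDown:
    """One fourth-down decision. All field names are hypothetical."""
    go_wp: float       # win probability if the offense goes for it
    punt_wp: float     # win probability if the offense punts
    fg_wp: float       # win probability if the offense attempts a field goal
    went_for_it: bool  # what the coach actually did
    p_correct: float   # stand-in for the model's P(coach chooses correctly)


def correct_choice(play: FourthDown) -> int:
    """Return 1 if the coach picked the higher win probability option, else 0."""
    kick_wp = max(play.punt_wp, play.fg_wp)  # "kicking" = best of punt and field goal
    go_is_better = play.go_wp > kick_wp      # assumption: ties count as "kick"
    return int(play.went_for_it == go_is_better)


def right_decision_over_expected(play: FourthDown) -> float:
    """Actual result (1 or 0) minus the modeled probability of a correct choice."""
    return correct_choice(play) - play.p_correct


# Toy, made-up plays for illustration only.
plays = [
    FourthDown(0.55, 0.52, 0.50, True, 0.40),   # went for it, going was better
    FourthDown(0.48, 0.50, 0.47, True, 0.70),   # went for it, kicking was better
    FourthDown(0.30, 0.33, 0.35, False, 0.90),  # kicked, kicking was better
]
total = sum(right_decision_over_expected(p) for p in plays)
```

A correct call that most coaches miss (a low `p_correct`) earns the most credit, while a wrong call in a spot that most coaches get right costs the most.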

We fit a similar model for the win probability gained/surrendered with the same features. Unlike the model above for a correct or incorrect decision, this quantifies the magnitude of the choice a coach is making when they decide to go for it or kick. The model type (other than being a regression and not a classification) and features are the same. We call this “win probability gain over expected”.
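As a rough illustration of the magnitude version, the sketch below assumes that the win probability gained or surrendered on a play is the win probability of the option the coach chose minus that of the option they passed up. That definition, the function names, and the `expected_gain` argument (a stand-in for the regression model's prediction) are assumptions made for this example.

```python
def wp_gained(go_wp: float, kick_wp: float, went_for_it: bool) -> float:
    """Signed win probability of the chosen option relative to the alternative.

    Positive when the coach picked the better option, negative otherwise.
    Assumed definition for illustration only.
    """
    if went_for_it:
        chosen, passed_up = go_wp, kick_wp
    else:
        chosen, passed_up = kick_wp, go_wp
    return chosen - passed_up


def wp_gain_over_expected(go_wp: float, kick_wp: float,
                          went_for_it: bool, expected_gain: float) -> float:
    """Actual gain minus the gain a regression model expects in this situation."""
    return wp_gained(go_wp, kick_wp, went_for_it) - expected_gain


# Made-up example: going is worth 3 points of win probability more than kicking,
# and the (hypothetical) model expects coaches to capture 1 point here.
example = wp_gain_over_expected(0.55, 0.52, went_for_it=True, expected_gain=0.01)
```

Summing this quantity over a coach's fourth-down decisions in a season would give a season total under the same assumptions.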

There are 199 coaches that, from 2014 to 2022, had 50 or more fourth-down decisions in consecutive years. For those coaches, right decision over expected is correlated from one year to the next at a rate of 0.247. Total win probability gain over expected is correlated at a rate of 0.261 for those same coaches.
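For readers who want to run this kind of stability check on their own data, here is a small sketch. It treats each pair of consecutive qualifying seasons for a coach as one observation and computes a Pearson correlation across those pairs, which is an assumption about the procedure rather than a description of our exact code. The row layout and the toy numbers are made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def year_to_year_pairs(rows, min_decisions=50):
    """Pair each qualifying coach-season with the same coach's next season.

    rows: iterable of (coach, season, n_decisions, metric) tuples (hypothetical layout).
    Seasons with fewer than min_decisions decisions are dropped before pairing.
    """
    by_key = {(coach, season): metric
              for coach, season, n, metric in rows if n >= min_decisions}
    pairs = []
    for (coach, season), metric in by_key.items():
        next_metric = by_key.get((coach, season + 1))
        if next_metric is not None:
            pairs.append((metric, next_metric))
    return pairs


# Toy, made-up data: coach A's 2022 season is dropped (only 40 decisions).
rows = [
    ("A", 2020, 60, 0.05), ("A", 2021, 70, 0.04), ("A", 2022, 40, 0.10),
    ("B", 2020, 55, -0.02), ("B", 2021, 58, 0.00),
    ("C", 2021, 65, 0.01), ("C", 2022, 80, 0.03),
]
pairs = year_to_year_pairs(rows)
stability = pearson([a for a, _ in pairs], [b for _, b in pairs])
```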

This shows that, even after stripping away a lot of what goes into a decision that coaches don’t have control over, there is signal in the decision-making profile of a head coach. We did the same thing for two-point conversions with the features being season, the pre-game market win probability for the offense, the in-game win probability for the offense, the score differential, and number of seconds remaining in the game. We similarly constructed a model for the win probability gain or loss as a result of these decisions. These are correlated at a rate of 0.133 and 0.194, respectively, for the same 199 coaches as above. This shows that, again, there’s signal in these decisions.

It’s instructive to see that the magnitude in win probability added over a whole season on fourth-down decisions is much higher at the extremes than two-point conversions. This is consistent with Ben Baldwin’s intuition as well:

The next thing to discuss is timeout usage.  Timeouts are a useful tool at the end of halves or games, but often, due to factors within the control of the coach and their staff, are wasted early in both halves.  A significant fraction of NFL games are within one score at the end of the fourth quarter, so spending these timeouts too early is generally a bad thing.

Thus, we looked at the number of times a head coach used a timeout in the first half (minus the final two minutes) and the second half (minus the final four minutes) in each game. We used a simple multinomial logistic regression – with season as the only predictor – to set the expectation for the number of timeouts wasted in the first and second half. This metric is called “timeouts taken above expectation” – and is broken up by first and second half. The average win probability loss on such a timeout is between one and two percent, depending on the half, and is applied uniformly to create a win probability gained over expected for timeouts as well. The year-to-year stability for win probability added during the first half is 0.543 and the second half is 0.316, demonstrating that, after adjusting for expectation, not taking timeouts is a repeatable skill in both halves.
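The timeout arithmetic can be sketched as follows. If season is treated as a categorical predictor, a multinomial logistic regression with season as its only input reproduces each season's observed distribution of early-timeout counts, so this sketch uses the per-season average directly instead of fitting a model. That shortcut, the tuple layout, and the flat 1.5 percent cost (picked from inside the one-to-two percent range above) are assumptions for illustration.

```python
from collections import Counter, defaultdict

# Assumed flat cost per early timeout; the text only gives a 1-2 percent range.
WP_COST_PER_TIMEOUT = 0.015


def expected_timeouts_by_season(games):
    """Expected early timeouts per game for each season.

    games: iterable of (season, coach, early_timeouts) tuples (hypothetical layout).
    Computes sum over k of k * P(k | season) from the empirical distribution.
    """
    counts = defaultdict(Counter)
    for season, _coach, n_timeouts in games:
        counts[season][n_timeouts] += 1
    expected = {}
    for season, dist in counts.items():
        total_games = sum(dist.values())
        expected[season] = sum(k * v for k, v in dist.items()) / total_games
    return expected


def timeouts_above_expectation(games, coach):
    """Sum of (actual - expected) early timeouts over one coach's games."""
    expected = expected_timeouts_by_season(games)
    return sum(n - expected[season] for season, c, n in games if c == coach)


def timeout_wp_over_expected(games, coach):
    """Fewer early timeouts than expected gives a positive win probability value."""
    return -WP_COST_PER_TIMEOUT * timeouts_above_expectation(games, coach)


# Toy, made-up first-half data for a single season.
games = [(2022, "A", 0), (2022, "A", 1), (2022, "B", 2), (2022, "B", 1)]
coach_a = timeout_wp_over_expected(games, "A")
coach_b = timeout_wp_over_expected(games, "B")
```

In this toy data the league averages one early timeout per game, so coach A (one fewer than expected in total) comes out ahead and coach B (one more than expected) comes out behind by the same amount.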

Finally, we look at the number of delay of game penalties that a coach’s offense commits. While there are some instances where a delay of game is preferred – see Nick Sirianni in the Super Bowl – at best a delay of game penalty is simply the best of a bunch of bad outcomes for a team. Hence, avoiding them is important to great coaching. As with timeouts, we apply the same win probability across all delay of game penalties, about 1.5 percent, to create win probability gained over expected by delay of game avoidance, which is correlated year-to-year at a rate of 0.252.

Without further ado, we look at who the best performing coaches were during the 2022 season in terms of making these decisions and the order of magnitude of their decisions through adding up all of the win probabilities gained by fourth-down decisions, two-point conversion decisions, and timeouts taken:

This list does jibe with what we believe, especially after watching Sirianni in the playoffs and Super Bowl. Dan Campbell’s ability to press an edge was also on display the entire second half of the 2022 season, as he took a 1-6 Detroit Lions team to 9-8. People might be surprised by Brandon Staley, who was first in this metric in 2021, as he appeared to have backslid in 2022. However, he is still among the league leaders in decision making despite that regression. John Harbaugh of the Ravens is perennially one of the best coaches in this department.

There are obviously many other things that go into coaching than in-game decisions on fourth downs, two-point conversions, timeouts, and delays of game, but as the win shares shown here imply, they matter at the margins. For more of a discussion on coaching value, check out this clip from the SumerSports Show with Eric Eager and Thomas Dimitroff:

If you liked this article, come back to SumerSports over the next few months as we preview free agency, the draft, and the 2023 season using methods like these.

