Beating Pythagoras: A Pirates Tale

The Pittsburgh Pirates are a bit of an anomaly this season. Most of the year they've sat around the .500 mark while having an abysmal run differential — a combination of things that tends to spell disaster for the team going forward.

Just how stark is the contrast? Here's a graph of the Pirates' Run Differential and Win Differential (a.k.a. their games above .500):

In theory, we'd expect that for every 10 runs below zero in Run Differential, the Pirates would be a corresponding 1 game below .500. More concretely, the Pirates' -78 Run Differential would predict them to be about 8 games below .500, not just 4. We can see how divergent these lines are and wonder: is this remotely sustainable?
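The 10-runs-per-game rule of thumb above can be sketched in a few lines; this is a minimal illustration of the arithmetic, not any official Pythagorean formula (the function name is my own):

```python
# Rule-of-thumb sketch: roughly 10 runs of differential ~ 1 win,
# so expected games above/below .500 is about RD / 10.

def expected_games_above_500(run_differential: int) -> float:
    """Approximate games above (+) or below (-) .500 implied by run differential."""
    return run_differential / 10

print(expected_games_above_500(-78))  # about -7.8, i.e. ~8 games below .500
```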

On average, the answer is no. This can be, and has been, demonstrated simply by taking end-of-season records and run differentials and graphing them against each other. Below is a graph of end-of-season Run Differential and Winning Percentage, where we can see how close the relationship between the two is.

In mathematical terms, the r-squared is 86%; that is, 86% of the variance in Winning Percentage is explained by Run Differential (and vice versa). In visual terms, there is very little 'spread' to the data, and the trend can pretty clearly be spotted: the better your run differential, the better your record tends to be.

The Pirates are contending not only in the most competitive division in baseball, but also against the Pythagorean Expectation itself.

Given the closeness of this relationship, the question then becomes: what might cause this relationship to break down?

If we look at the original graph of the Pirates' run differential over the course of the season, we start to get some idea of how that may happen. On that run differential graph, the line has steep downward movements and shallow upward movements; in less technical language, the Pirates are getting blown out in their losses while winning close games.

As a brief example, look at the Pirates' 7-4 West Coast road trip against the Cardinals, Diamondbacks, and Padres from May 9th to the 19th: the Pirates' net run differential was -11, scoring 49 runs to their opponents' 60. To further highlight the point, over this stretch the Pirates were outscored by an average of 7.5 runs per loss, while outscoring opponents by 2.7 runs per win. This hardly seems like a recipe for success.

Ultimately, the question becomes 'do blowouts matter?' There are essentially two schools of thought. On one hand, a loss is a loss, no matter how badly the game is lost. On the other, good teams don't get blown out, but rather do the blowing out. Both schools have a point: losing is losing regardless of margin, and good teams should tend to do more blowing out than getting blown out.

Fortunately, this is an empirical question. If we strip out 'blowout runs' from a team's Run Differential, do we get a better or worse prediction of their end of year result, relative to keeping those blowout runs in?

To check this, I pulled game results for the last 5 seasons, calculated game and season run differentials at each game, and compared them to each team's final record to calculate the correlation between the two. Additionally, I made an adjustment for blowout games: if a team was blown out by more than the 'blowout margin', they stopped being debited runs at the blowout margin; likewise, the team doing the blowing out stopped receiving credit at the blowout margin.

This is clearer with an example. Let's say we classify a blowout as 8 runs; then the Pirates' 17-4 loss to the Cardinals on May 9th would have cost the Pirates only -8 runs in this 'Blowout Adjusted Run Differential' (BARD), rather than the traditional -13. Put another way, the Cardinals would only be credited with scoring 12 runs, since anything beyond that is just 'blowout runs', while still being responsible for all 4 of the Pirates' runs against.
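The capping rule described above can be sketched as a simple clip of each game's run differential; this is a minimal illustration under the article's definitions, with the function name being my own:

```python
# Sketch of the Blowout Adjusted Run Differential (BARD) idea:
# each game's run differential is capped at +/- the blowout margin.

def bard_game(runs_for: int, runs_against: int, margin: int = 8) -> int:
    """Per-game run differential, clipped at the blowout margin."""
    diff = runs_for - runs_against
    return max(-margin, min(margin, diff))

# Pirates' 17-4 loss to the Cardinals: -13 raw, capped to -8
print(bard_game(4, 17, margin=8))   # -8
# The Cardinals' side of the same game: +13 raw, capped to +8
print(bard_game(17, 4, margin=8))   # 8
```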

We can take both the RD and the BARD at various points in the season for all 30 teams, and find which is more predictive of end-of-season record. If blowouts contain predictive signal, traditional Run Differential will have better predictive power throughout the course of the season; however, if blowouts just create noise in the data, we will see the Blowout Adjusted number perform better as a predictor of the team's end result.

If we graph these correlations out over, say, games 20 through 161, we can see the trends in the forecasting ability of both Run Differential and its Blowout Adjusted counterpart.

The way to read this graph is, for instance, at game 80, Run Differential has a correlation to end of season record of just under 0.7, while Blowout Adjusted Run Differential has a correlation of just over 0.7. This is an interesting finding: beyond about the 40th game of the season, adjusting for blowouts is consistently more predictive than traditional run differential over the past 5 seasons.

This yields another question — what is the best blowout margin to set?

We can play around with the data here to determine the best margin to use. Since the majority of games won't get to 8 runs, it makes more sense to shrink our margin rather than increase it; we'll halve it, going down to 4 runs. Here's what we get:

Aha! We further boosted the predictive power by excluding blowouts of 5 or more runs. This would imply that blowouts are not without meaning, but are far less impactful on a season's ultimate results than they are on the morale of fans and players alike.

Blowouts are Pythagoras’ Achilles’ heel. By taking them out, we get more insight into a team’s abilities than keeping them in.

But that's not all. Imagine we shrunk our margin all the way to 1 run. This would be exactly the same as just counting wins and losses: you'd get +1 for a win and -1 for a loss, so we could instead just use mid-season winning percentage to predict end-of-season winning percentage. This also makes a good relative comparison for Run Differential's predictive power.
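The equivalence claimed above is easy to verify in code: with a margin of 1, every capped game differential is +1 or -1, so the season total is just wins minus losses. A minimal check, using hypothetical game scores:

```python
# Sketch: a blowout margin of 1 reduces capped run differential
# to win/loss counting (+1 per win, -1 per loss).

def capped_diff(runs_for, runs_against, margin):
    diff = runs_for - runs_against
    return max(-margin, min(margin, diff))

games = [(5, 3), (2, 9), (1, 0), (4, 6)]  # hypothetical (RF, RA) results
capped = [capped_diff(rf, ra, margin=1) for rf, ra in games]
wins = sum(rf > ra for rf, ra in games)
losses = sum(rf < ra for rf, ra in games)
print(capped, wins - losses)  # capped sum equals wins minus losses
```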

We'll add Win% to the graph from above to produce this:

The key takeaway from this graph is that after about the 45th game, a team's actual Win% is more predictive than Run Differential/Pythagorean Wins; after the 50th game, the best predictor of end-of-season record is a team's mid-season record itself. While we'd obviously expect Win% to have a near-perfect correlation toward the end of the season, the fact that it has superior predictive power so early in the season is a rather interesting finding.

What if, instead of having a hard cutoff at 4 runs, we smoothed the runs out? In other words, rather than saying that runs above 4 have no value, we just reduce their importance the higher they go. In economics this idea is called decreasing marginal value: the first sandwich you eat when you're hungry is worth a lot more than the second, which is worth more than the third, and so on. In baseball, you can only win the game once, by scoring one more run than the opposing team; the fourth, fifth, and sixth runs have decreasing value to the team. Let's imagine we had a graph like the one below to transform actual game run differentials into 'Smoothed Run Differentials' (SMRD):

If you find a game's actual run differential on the X-axis and trace up to the line, the 'Smoothed Run Differential' is simply the value on the Y-axis. The advantage of this method is that we don't discard the runs beyond the 4th; we simply depreciate their value. In this function, the most valuable run is the first one above your opponent, with each subsequent run being worth slightly less than the previous. Correspondingly, giving up the first run is the most costly, and each additional run beyond that is slightly less costly.
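The article doesn't specify the exact curve used, so here is one plausible concave choice, a signed square root, that has the properties described: the first run is worth the most, and each additional run is worth slightly less. This is my illustrative assumption, not the author's actual function:

```python
# Sketch of a Smoothed Run Differential (SMRD) transform using a
# signed square root: marginal value of each extra run decreases.

import math

def smoothed_diff(runs_for: int, runs_against: int) -> float:
    """Smoothed per-game run differential with diminishing run value."""
    diff = runs_for - runs_against
    return math.copysign(math.sqrt(abs(diff)), diff)

print(smoothed_diff(5, 4))   # first run is worth 1.0
print(smoothed_diff(17, 4))  # a 13-run blowout counts ~3.6, not 13
```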

Let's run the same analysis as before, only using the Smoothed method, rather than the Blowout Adjusted.

Smoothing the runs scored provides a much more predictive version of Run Differential than the Pythagorean or 10-run rule. In fact, adjusting for 'blowout runs' in this way proves to be slightly better than, or approximately the same as, the team's actual record at predicting the team's ultimate result, up until about the 120th game of the season. This is significantly better than the Pythagorean Expectation framework.

Why does this work? Ultimately, blowouts are mostly noise, with a decreasing amount of signal. We know this since using the smoothing method produces a more insightful metric than simply cutting off at some arbitrary value, or never accounting for blowouts at all. By decreasing the value we place on blowout games, we can more adequately identify the true strength of the team.

Now, what does this mean for the Pirates?

Team                 | Win%  | RD  | SMRD  | SM Win%
Chicago Cubs         | 0.571 | 64  | 38.4  | 0.536
Milwaukee Brewers    | 0.576 | 12  | 23.8  | 0.523
St. Louis Cardinals  | 0.500 | 9   | 2.9   | 0.503
Cincinnati Reds      | 0.453 | 33  | -9.2  | 0.491
Pittsburgh Pirates   | 0.469 | -78 | -19.7 | 0.481

While this is certainly not a rosy picture for the Pirates, it does provide a much more reasonable take on this team: they're nowhere near as bad as their Run Differential might imply. In fact, they are the team in MLB most underrated as a result of blowouts; the Reds, on the other hand, are the team most overrated as a result of blowouts.

It's worth noting that this is just a snapshot view of the standings. Much in the same way Pythagorean Winning Percentages and Run Differentials change from game to game, so too do Smoothed Winning Percentages and SMRD. Even in the time it has taken me to do this research and write the article, there has been some significant movement within the standings.

Perhaps an apt comparison exists in a league that is just finishing up its postseason this week. The idea that the magnitude of losses matters is one that has not made its way into the hockey realm; teams routinely pull their goalie to give themselves a shot at winning a game they're losing. If you score and tie it up, you've got a chance to win; if the other team scores, oh well, you probably weren't going to win anyway.

While not directly comparable to baseball, a major league team might hold back its best bullpen arms in an otherwise losing effort, instead electing to gamble on unproven or otherwise not-as-good hurlers to keep the game close and give the offense a chance. If those pitchers can't keep the game close, oh well, you were probably going to lose anyway, and at least you didn't waste good pitching on a losing effort, pitching that might be needed tomorrow.

Blowouts are certainly a morale killer; no one wants to watch their team lose in a laugher. They're also, however, mostly meaningless in assessing a team's true ability.

There’s seemingly one harrowing path to beating Pythagorean Wins, and the Pirates are taking it.

Nate Werner is a recent graduate from Penn State, where he obtained a B.S. in Economics and currently does analytics for a financial firm. He is a lifelong Pirates fan that uses the tools of statistical analysis to dive deeper into the numbers of baseball. His goal is to take the style of analysis used in front offices across the Major Leagues and bring it to the computer screens of everyday fans. You can read some of Nate's more general analyses of baseball on http://goldboxstats.wordpress.com and follow him on Twitter @GoldBoxStats.
