Written by Alex Singal and Tracy Williams
If there’s one group of people who are as obsessed as we are with rigorously analyzing a complicated domain and figuring out where to prioritize scarce resources, it’s Major League Baseball front offices. With that in mind, we wanted to write this guide comparing some baseball statistics with the metrics we take into consideration when evaluating programs to save and improve lives.
Batting Average
Batting average is simple to calculate and easy to explain, and it was historically considered one of the most important ways of evaluating how good a player was. It remains one of the primary baseball stats you’ll find in the newspaper.
But as a measure of a player’s value, batting average isn’t actually all that helpful—and at times can be actively misleading. Batting average has two primary shortcomings: it treats singles, doubles, triples, and home runs equally, and it completely ignores walks. But walks are really valuable, and not all hits are the same! This points us toward statistics that are more valuable than batting average. On-base percentage incorporates walks and hit-by-pitches, and is therefore a better measure of a player’s ability to reach base. And slugging percentage is a blunt measurement of how far a player advances around the bases per at-bat, indicating not just the ability to hit safely, but also to hit for power. These two statistics do a good job of accounting for the shortcomings of batting average and paint a more useful picture of a player’s offensive value.
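To make the difference concrete, here is a minimal sketch of the three statistics using their standard definitions. The two stat lines are made up for illustration (they are not real players), and the helper names are our own. Both hitters bat exactly .300, yet they reach base and hit for power at very different rates.

```python
def batting_average(hits, at_bats):
    """Hits per at-bat; every hit counts the same and walks are ignored."""
    return hits / at_bats


def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies=0):
    """Times on base divided by the plate appearances that count toward OBP."""
    return (hits + walks + hit_by_pitch) / (at_bats + walks + hit_by_pitch + sac_flies)


def slugging_percentage(singles, doubles, triples, home_runs, at_bats):
    """Total bases per at-bat, so extra-base hits count for more."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


# Hypothetical stat lines (illustrative only) with identical batting averages.
slap_hitter = dict(at_bats=500, singles=140, doubles=10, triples=0,
                   home_runs=0, walks=20, hit_by_pitch=0)
power_hitter = dict(at_bats=500, singles=80, doubles=30, triples=5,
                    home_runs=35, walks=80, hit_by_pitch=0)


def summarize(line):
    """Compute AVG, OBP, and SLG for one stat line."""
    hits = line["singles"] + line["doubles"] + line["triples"] + line["home_runs"]
    return {
        "avg": batting_average(hits, line["at_bats"]),
        "obp": on_base_percentage(hits, line["walks"], line["hit_by_pitch"],
                                  line["at_bats"]),
        "slg": slugging_percentage(line["singles"], line["doubles"],
                                   line["triples"], line["home_runs"],
                                   line["at_bats"]),
    }


for name, line in [("slap hitter", slap_hitter), ("power hitter", power_hitter)]:
    stats = summarize(line)
    print(f"{name}: AVG {stats['avg']:.3f}  OBP {stats['obp']:.3f}  SLG {stats['slg']:.3f}")
```

Batting average alone cannot tell these two hitters apart; the other two numbers can.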
As an example, Juan Pierre and Adam Dunn, who both played in about 2,000 games over their careers, had lifetime batting averages of .295 and .237 respectively. At first glance this might give the impression that Pierre was the more productive hitter. Looking at on-base percentage though, we see that Dunn actually reached base more frequently than Pierre (.364 versus .343). In addition, Dunn hit 462 home runs and had a career slugging percentage of .490 compared to Pierre’s 18 home runs and .361 slugging percentage. Taken together,[1] it’s not hard to see that Dunn was the more valuable career hitter. (Dunn’s negative defensive contributions are another story.)
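As a quick check on the comparison above, this sketch adds each player's on-base and slugging percentages to get On-Base Plus Slugging (OPS). It uses only the career figures quoted in this section; the variable and function names are our own.

```python
# Career figures as quoted in the text above.
career = {
    "Juan Pierre": {"avg": 0.295, "obp": 0.343, "slg": 0.361},
    "Adam Dunn": {"avg": 0.237, "obp": 0.364, "slg": 0.490},
}


def ops(obp, slg):
    """On-Base Plus Slugging: the simple sum of the two rates."""
    return obp + slg


ops_by_player = {
    name: round(ops(stats["obp"], stats["slg"]), 3)
    for name, stats in career.items()
}

# Rank the same two players by each metric.
best_by_avg = max(career, key=lambda name: career[name]["avg"])
best_by_ops = max(ops_by_player, key=ops_by_player.get)

print(ops_by_player)
print("Higher batting average:", best_by_avg)
print("Higher OPS:", best_by_ops)
```

The two metrics rank the players in opposite orders, which is the point of the example.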
In spite of the evidence that on-base and slugging percentage are the more worthwhile statistics, batting average still looms large in Major League Baseball and in some cases receives far more attention. The race for the highest individual batting average is closely followed by the media, and the player who leads each league at the end of the season even receives a trophy and is referred to as the Batting Champion. Alas, no such trophy exists for the leader in on-base or slugging percentage.
As with batting average, when it comes to finding effective programs to save and improve lives, the most commonly cited metrics aren’t always the most valuable. In the broader world of charity evaluation, the ratio of overhead expenses to program expenses has often been used as a measure of an organization’s effectiveness. We at GiveWell have historically disagreed (to say the least) with the idea that overhead ratio can by itself tell you much of anything about an organization, and have instead sought out more sophisticated metrics to measure performance.
Baseball usage: “Juan Pierre has a high batting average, but he’s a slap hitter who rarely walks, which limits his offensive value.”
GiveWell usage: “This organization has a low overhead ratio, but its programs don’t accomplish much and aren’t cost-effective.”
Weighted Runs Created Plus (wRC+)
When baseball statistics were in their infancy, fans compared different players simply by the number of runs they scored. Eventually, people realized that this wasn’t really fair because a number of factors other than a player’s abilities—the talent of their teammates, the kind of ball being used, the atmospheric environment at the stadium, the distance from home plate to the wall—affect the number of runs a player is likely to score.
Metrics like wRC+ adjust for these conditions to make better comparisons between players. For example, was a given player batting during the “deadball era” of the early 1900s, when they were using worn-down baseballs that may have affected the scoring environment?[2] Did the game happen in Colorado, where the elevation is higher, the air is thinner, and balls travel further?
It’s impossible to perfectly calculate these kinds of adjustments, and there is inevitably some subjectivity when it comes to choosing which factors to consider. But wRC+ is a great example of how, even when they include subjective elements, numerical analyses can help us make better (if still imperfect) estimates.
Similarly, this kind of thinking is core to GiveWell’s ability to make good recommendations for how our donors should allocate their funds. It would be easy to take a high-quality academic paper that says supplementing vitamin A for children in a particular time and place could save lives for such-and-such dollars each, then simply assume you can extrapolate that to other times and places. But in reality, it’s not that simple, because there are many important contextual factors that determine whether that finding will still apply in a new setting (in other words, the external validity of the study).
For example, children in Malawi tend to have lower rates of vitamin A deficiency than children in otherwise comparable contexts, perhaps due to higher prevalence of whole fish in the Malawian diet. (One person we spoke with even speculated that this was specifically due to Malawian children being fed fish eyeballs, which may be an especially good source of vitamin A.) As a result, we adjust the cost-effectiveness estimate of distributing vitamin A in Malawi relative to the cost-effectiveness in other countries where pre-existing vitamin A deficiency levels are higher.
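The direction of this adjustment can be sketched in a few lines. This is a deliberately simplified illustration, not GiveWell's actual model: it assumes the benefit of supplementation scales linearly with the share of children who are deficient, and every number below is a placeholder chosen for easy arithmetic.

```python
def adjusted_cost_per_life(study_cost_per_life, study_deficiency_rate,
                           local_deficiency_rate):
    """Scale a study's cost per life saved to a new setting.

    Assumption (ours, for illustration): the benefit is proportional to the
    share of children who are vitamin A deficient.
    """
    relative_benefit = local_deficiency_rate / study_deficiency_rate
    return study_cost_per_life / relative_benefit


# Placeholder inputs, not real estimates.
study_cost = 3_000        # dollars per life saved in the study setting
study_deficiency = 0.50   # share of children deficient in the study setting
local_deficiency = 0.25   # share deficient in the lower-deficiency setting

local_cost = adjusted_cost_per_life(study_cost, study_deficiency, local_deficiency)
print(f"Adjusted cost per life saved: ${local_cost:,.0f}")
```

Under these assumptions, halving the baseline deficiency rate doubles the estimated cost per life saved, which is why a setting with lower deficiency looks less cost-effective.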
Baseball usage: “Dante Bichette scored 104 runs for the Colorado Rockies in 1999, but if you look at his wRC+, which incorporates the atmospheric conditions in Denver, he was only a league-average hitter that season.”
GiveWell usage: “Vitamin A supplementation is promising in general, but this region has a higher baseline level of vitamin A consumption, so we’re going to look elsewhere.”
Wins Above Replacement (WAR)
WAR tries to measure a player’s impact not simply by counting the runs they contribute on offense or prevent on defense, but by figuring out how many games they caused their team to win relative to what we’d expect from the realistic alternative if they hadn’t been on the team.
The basic insight is that if Matt Chapman wasn’t playing for the San Francisco Giants this season, the alternative isn’t the Giants playing without a third baseman—it’s having the best available minor-league third baseman in his place instead. So the real impact of having Chapman on your team is not measured by his raw stats but by the difference between his performance and what the next man up from the Sacramento River Cats would have done in his place.
At GiveWell, this concept is central to our work. For example, when funding the distribution of anti-malarial nets in Africa, we don’t simply count the number of lives that nets save and “credit” that to ourselves; we think about whether other funders might have funded the nets in our place, or whether some of the families would be able to purchase nets or use existing but less-effective nets.
To oversimplify things a bit: if we were to fund a program to distribute 500,000 nets but 400,000 nets would have been distributed without our involvement, our measurable impact would be the 100,000 additional nets that get distributed.
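The arithmetic in this simplified example is just a subtraction, which the sketch below spells out. The function name is our own, and the calculation leaves out the many other adjustments a full cost-effectiveness model would include.

```python
def counterfactual_impact(outcome_with_funding, outcome_without_funding):
    """Impact is what happened minus what would have happened anyway."""
    return outcome_with_funding - outcome_without_funding


nets_with_our_funding = 500_000     # nets distributed by the funded program
nets_without_our_funding = 400_000  # nets that would have been distributed anyway

additional_nets = counterfactual_impact(nets_with_our_funding,
                                        nets_without_our_funding)
share_attributable = additional_nets / nets_with_our_funding

print(f"Additional nets: {additional_nets:,}")
print(f"Share of distributed nets attributable to the funding: {share_attributable:.0%}")
```

The same subtraction underlies WAR: a player's value is measured against the replacement who would otherwise have played, not against an empty position.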
Baseball usage: “Gunnar Henderson is worth 6.4 WAR so far this season, and we’re only at the All-Star break. That’s amazing.”
GiveWell usage: “GiveWell’s program led to the distribution of an additional 100,000 nets, which will (in expectation) avert more than 130 deaths. That’s amazing.”
By deemphasizing incomplete statistics like batting average and focusing instead on more sophisticated ones like WAR and wRC+, modern baseball front offices have shifted their focus to metrics that aim to capture what is most important: How much does this player contribute, relative to the actual alternatives, to helping our team win games (and making sure the other team loses)?
GiveWell started from a similar insight. At the time—way back in 2007—the analysis of charitable organizations was often quite basic, and fixated on some easily measurable numbers like the aforementioned overhead ratio. GiveWell was founded to improve this situation and give donors data and analysis to make more informed decisions. The primary number we focus on is how much good your money can do through different programs. In other words: How cost-effective are they? It’s the core of everything we do.
Every year, our 37-person research team works to figure out how different interventions compare, including making sure that our Top Charities should still be top (since, like baseball players, individual giving opportunities can get better or worse over time). And of course, we’re always on the lookout for breakout candidates that might become the Top Charities of tomorrow.
If you’d like to learn more about our work, check out how we estimate the cost to save a life and the analyses behind it.
Notes
[1] And indeed, they can literally be added together to produce the On-Base Plus Slugging statistic.
[2] Gordon 2018 argues that the “deadball era” had multiple causes, not just worn-down baseballs.
Comments
Thank you for this comparison which I find interesting and full of insights.
I have a question with regards to your statement in the WAR section: “if we were to fund a program to distribute 500,000 nets but 400,000 nets would have been distributed without our involvement, our measurable impact would be the 100,000 additional nets that get distributed”.
Since GiveWell has in the end distributed 500,000 nets, this allows the other funders to use the funds, which they would have used to distribute 400,000 nets, on other projects which will most likely save additional lives. While I assume that it is very difficult to estimate how many more lives are saved (because they are different organisations) shouldn’t GiveWell also take this factor into account?
Thanks!
Hi Marco,
Thanks for your question! Assessing how GiveWell’s funding decisions influence the actions of governments, funders, and other organizations is an important part of figuring out which global health programs are most cost-effective. We include leverage and funging adjustments in our cost-effectiveness models that aim to capture the effect of displacing other funding or encouraging other funders to join us (see an example of the adjustments we make for seasonal malaria chemoprevention here). If you’re curious to learn more about how we approach fungibility, you can read more in this blog post.