# The metrics debate

About a year ago, we participated in a Giving Carnival on the topic of metrics in philanthropy, and laid out the metrics we were planning on using – along with caveats. Seeing a continuing interest in this topic (even including a panel called “Metrics Mania”), I’d like to share a bit of how my thinking on the matter has progressed – specifically, that debating metrics in the abstract seems unlikely to go anywhere useful.

As promised, our research tried to divide charities into causes to make “apples to apples” comparisons; but as many predicted (and as we acknowledged would be true to some extent), even after this narrowing it was impossible to truly put different charities in the same terms. Any two given job training programs serve different groups of people; comparing them directly on “people placed in jobs” is a futile endeavor. We looked at health programs’ abilities to “save lives,” but not all lives are the same, and different health programs have wildly different and difficult-to-quantify impacts on non-life-threatening conditions.

This doesn’t mean that metrics are futile or useless. There’s a big difference between being able to demonstrate emotionally relevant results (such as lives saved) and having no evidence other than an unfamiliar development officer’s report of a gut feeling. And there can be enormous differences in the “cost per person” associated with different approaches – enough, when taken in context and accompanied by intuitive judgment calls, to make a big difference in my view of things. For example, it’s typical for employment assistance programs to cost in the neighborhood of $10,000-$20,000 per person served, while we’re ballparking certain developing-world aid programs at $1,000 per life saved. Though the story doesn’t end there, that difference is larger than I would have guessed and larger than I can ignore.

Bottom line, both the metrics we used and the ways we used them (particularly the weight of metrics vs. intuition) ended up depending, pretty much entirely, on exactly what decision we were trying to make. We took one approach when all we knew was our general areas of focus; we modified our approach when we had more info about our options; and we frankly got nothing out of the completely abstract discussions of whether metrics should be used “in general.”

There are as many metrics as there are charities, and there are as many necessary debates about metrics as there are giving decisions. I agree with those who say metrics are important, and with those who say we can’t let them dictate everything (and it seems that nearly everyone who weighs in on “the metrics debate” says both of these things). But I don’t feel we can have the really important conversation at that level of abstraction. Instead, the conversation we need to have is about the specifics. In our case, and that of anyone who’s interested in NYC employment assistance or global health interventions, that means asking: did we use appropriate metrics for the charities we’ve evaluated? Are there other approaches that would have been a better use of our time and resources? Did we compare our applicants as well as we could have with the resources we had?

This is a conversation with a lot of room for important and fruitful disagreement. If you’re interested in the question of metrics, I ask that you consider engaging in it – or, if you don’t find our causes compelling, conducting and publishing your own evaluations – rather than trying to settle the question for the whole nonprofit sector at once.

### Comments

• Michelle on March 27, 2008 at 9:22 am said:

It seems to me that people have been saying this to you since the beginning of your venture. When others suggested you listen to and consider the views of the people who have spent their lives in nonprofit work, the outcome you arrive at in your last paragraph is what they were trying to say. You’ve received an education, but an expensive and time-consuming one that may have cost you some allies.

• John J. on March 27, 2008 at 2:46 pm said:

Exactly, Michelle. And he gets paid $65,000 a year to write in his GiveWell diary about common-sense conclusions that people in the field knew all along to be obvious.

By his own accounts, he could have saved 65 lives in the developing world for what he was paid in the last year.

• I said this in response to another post, but you need to learn about DALYs (Disability-Adjusted Life-Years) and the essentially identical metric of QALYs (Quality-Adjusted Life-Years). They are the metrics of choice used to measure cost-effectiveness in disease treatment and medicine in the United States, and are used by the World Health Organization.

I was disappointed to find somewhere in your blogs a reference to a WHO explanation of DALYs, along with a link, and a comment indicating the formula was too complicated or difficult to understand. In fact it isn’t that hard at all – and if you find it too hard, I’d agree with some of your critics that you shouldn’t be doing this kind of work, or paying yourselves tens of thousands a year for what you are doing.
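To illustrate that point, here is a minimal sketch of the standard YLL + YLD decomposition of DALYs. It omits the optional discounting and age-weighting refinements in WHO’s full formula, and the numbers in the example are made up purely for illustration:

```python
def daly(deaths, remaining_life_expectancy, cases, disability_weight, duration):
    """Rough DALY estimate: years of life lost plus years lived with disability.

    Omits the discounting and age-weighting terms of WHO's full formula.
    """
    yll = deaths * remaining_life_expectancy    # years of life lost to mortality
    yld = cases * disability_weight * duration  # years lived with disability
    return yll + yld

# Hypothetical illustration: 10 deaths at 30 years of remaining life
# expectancy each, plus 200 non-fatal cases with disability weight 0.2
# lasting half a year each.
print(daly(10, 30, 200, 0.2, 0.5))  # → 320.0
```

The disability weights themselves (0 = full health, 1 = death) come from published tables; the arithmetic that combines them is the easy part.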

I recommend the book Cost-Effectiveness in Health and Medicine (M. R. Gold, J. E. Siegel, L. B. Russell, & M. C. Weinstein, Eds.; New York, NY: Oxford University Press), pp. 82-123, as a good starting point for educating yourselves.

Plenty of CEAs (cost-effectiveness analyses) appear in the medical literature, and overlapping literature on treatment of diseases in developing countries, using the QALY/DALY measure. You can find a lot of them on MedLine.

More generally, you might want to learn about utility measurement in: Baron, J. (2000). Thinking and Deciding. New York, NY: Cambridge University Press.

• Holden on April 1, 2008 at 3:47 pm said:

Michelle: I appreciate the feedback, but this isn’t an accurate description of what’s happened.

• Before starting our project, we talked to as many nonprofit experts as we could, to get feedback on our general plan and model, and to ask whether they thought our project was worth pursuing and would add value. The earliest stages of this process are detailed in our original business plan (Appendix G); as the project progressed, we continued to speak with as many people as we could, and at this point have thoroughly discussed the project with relevant people from a broad variety of institutions, including major foundations (Ford, Hewlett, and Robin Hood), philanthropic advisors (Geneva Global and Rockefeller Philanthropy Advisors), and others including Bridgespan, the Center for Effective Philanthropy, and New Profit. We have gotten suggestions that we immediately took; suggestions that we recognized the value of with time; and suggestions that we still disagree with. But we haven’t once gotten the feedback, when speaking thoroughly with someone experienced in the sector, that our project is not worth doing (even taking into account our own newness to the sector, which we have constantly been explicit about) or already being done. If you are interested in having a similar conversation with us – perhaps over the phone – I’d be happy to arrange it; we value input from people who take the time to engage thoroughly with us and our project.
• Regarding this particular post: my intent is not to take one side or another, or even a particular middle ground, on the metrics debate. It is to urge our peers to engage in a conversation that is currently not happening in any public forum. There is a great deal of conversation about the value of metrics in general (you can see this in the posts I’ve linked to, many by experienced people in the sector, as well as in the “Metrics Mania” panel I mentioned), but none (that we’ve found) that uses specific metrics to inform the question of how best to help people. We’re urging a debate on the specifics, per the end of this post:

In our case, and that of anyone who’s interested in NYC employment assistance or global health interventions, that means asking: did we use appropriate metrics for the charities we’ve evaluated? Are there other approaches that would have been a better use of our time and resources? Did we compare our applicants as well as we could have with the resources we had?

This is a conversation with a lot of room for important and fruitful disagreement. If you’re interested in the question of metrics, I ask that you consider engaging in it – or, if you don’t find our causes compelling, conducting and publishing your own evaluations …

I’m not aware of anyone else in the sector who is publicly publishing their evaluations, and those participating in “the metrics debate” – even as they acknowledge the importance of specifics – are not participating in specific conversations about where to give. I don’t understand your claim that this post is repeating conventional wisdom.

It is true that we are receiving an expensive education in complex issues. Our mission is to share what we learn with donors who don’t have the time or resources for such an education; we aren’t aware of others who are effectively doing this sort of sharing.

• Holden on April 1, 2008 at 3:52 pm said:

Ron:

We recognize that the DALY metric is widely used and accepted, and agree with you that we should provide it to our donors when possible. We do so currently for PSI (though we are using estimates that we don’t have enough information to verify independently), and we will continue to do so when we can.

However, we generally don’t have enough information to calculate DALYs for our applicants under Cause 1 (saving lives, developing world) with any precision at all, because the information we have doesn’t sufficiently distinguish between slight and severe medical cases. We do in most cases have enough information to make estimates of “lives saved,” which could be converted into DALYs, but we have too little information available about non-fatal cases to make a reasonable attempt at capturing the full DALY impact.

It is also true that we personally find the metric to be unhelpful. The relevant discussion is here; we give many reasons for preferring not to use this metric, but these reasons do not include its being “too complicated or difficult to understand” (and I agree with you that it would not be a valid reason – if there is another place where we do appear to be saying this about DALYs, I’d appreciate your pointing me to it). We do make the point that the units measured by DALYs are not emotionally relevant, which is different from saying they’re too difficult to calculate or that the calculation is too difficult to follow.

We also appreciate the reading suggestions, and will check them out when we return to research in this area.

• Phil Steinmeyer on April 2, 2008 at 11:37 am said:

Holden, a quick, and somewhat minor point.

I notice in your writing that you often refer to your causes by their number (i.e. Cause 1).

This may be a convenient shorthand for you and those very close to your organization, but for the general public, this labeling is hard to parse. I don’t have a desire to hunt around the givewell.net site and parse “Cause 1”. It would be more readable to refer to “Health in Africa” or “Education in New York City” or whatnot.

• Phil Steinmeyer on April 2, 2008 at 11:41 am said:

On another note, how are you doing on creating detailed reports for your causes other than “Save lives in Africa” and “Raise incomes in NYC”?

I’m particularly interested in your report on “Fight global poverty”. Even if your report indicates that you were unable to successfully analyze the problem (i.e. point to charities that could be clearly shown to make a positive impact), I’d be interested in the process. Of course, the best case would be that you WERE able to perform reasonably solid analysis and that you found something that impressed you. But in any case, the process might be about as illuminating as the results.

• Holden on April 3, 2008 at 8:57 am said:

Phil:

We are still a few weeks away from publishing our writeups – the research is done and the grants are given, but there are a few rounds of revision (including review by applicants) we have to go through before we can make the writeups public.

I appreciate & agree with your comment on our “Cause 1” type terminology. I’ve edited the comment above to clarify it in that case.

• Holden,

Thanks for the links. It might interest people to know that, generally, when considering medical interventions in the United States, an intervention is considered cost-effective if it costs in the range of $50,000-$100,000 per DALY/QALY. What PSI does is orders of magnitude more cost-effective than that!
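To put rough numbers on “orders of magnitude”: using the $1,000-per-life-saved ballpark from the post, and assuming – purely for illustration, not as a measured value – about 30 DALYs averted per life saved, the comparison looks something like this:

```python
US_THRESHOLD_PER_DALY = 50_000  # low end of the $50k-$100k range above
COST_PER_LIFE_SAVED = 1_000     # ballpark figure from the post
DALYS_PER_LIFE_SAVED = 30       # illustrative assumption, not a measured value

cost_per_daly = COST_PER_LIFE_SAVED / DALYS_PER_LIFE_SAVED  # about $33 per DALY
ratio = US_THRESHOLD_PER_DALY / cost_per_daly               # about 1,500x
print(f"${cost_per_daly:.0f} per DALY, roughly {ratio:.0f}x the US threshold")
```

Even if the DALYs-per-life assumption is off by a factor of several, the gap remains three orders of magnitude.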

I’ve donated to PSI based on your work. What you are attempting to do is really a great thing. If I can be helpful in some small way I’d feel very good about that.

• Holden on April 4, 2008 at 2:35 pm said:

Ron: very glad to hear that you found our research useful.

I’d be interested in your source for the $50,000-$100,000 figure you cite.

If you’d like to get involved with GiveWell, please use this form to submit your contact info. We anticipate having a need for volunteers within a few months, and if you can help out, we’ll appreciate it greatly.

• rob shavell on April 18, 2008 at 7:25 pm said:

if metrics, particularly published common metrics, were not important, huge global financial marketplaces wouldn’t exist.

while *every* CFO knows their manipulability and the difficulty of truly comparing different businesses, they are extraordinarily useful in governing capital flows.

i personally see ZERO reason why the philanthropic sector shouldn’t go much much further in this arena.

• Holden on April 24, 2008 at 11:53 am said:

Rob – it’s possible that we’re wrong and that universal metrics have promise, but there certainly seems to be a challenge here that doesn’t apply to business, which is that different charities have different end goals, not just different operations. We believe that real, useful progress can be made on specific metrics for specific decisions, which may or may not lead to more generalized metrics; on the other hand, we haven’t seen anyone making headway on meaningful metrics that can apply to all charities. This doesn’t mean it can’t be done, but we think the former is the more productive path for now.

• Holden,

Here is a source which I believe contains a reference to $50,000/QALY being considered a “good buy”:

Baron, J. (1997). Biases in the quantitative measurement of values for public decisions. Psychological Bulletin, 122(1), 72-88.

It isn’t the original reference, but it probably cites it – and I think if you search on MedLine, you’d find more references. Also, obviously, those would be dollars of more than a decade ago, so if you adjust for inflation the figure would be higher.

• Holden on May 7, 2008 at 2:01 pm said:

Thanks.

• FYI, Stanford Social Innovation Review recently sponsored a conference on evaluation, which had quite a wealth of information: