Metrics: Between losing your humanity and throwing up your hands

This post is more than 18 years old

We’ve written a heck of a lot about what “measurements” should and shouldn’t be applied to charitable work – that’s what we spend a lot of our time thinking about, seeing as how we’re trying to figure out where to donate and all. Here’s a roundup of what we think. It’s a long post, but there’s candy for you: we offer up our actual, defined, concrete metrics for you to look at and think about, rather than sticking to abstract thoughts about whether you can quantify philanthropy (as I predict most others will).

First, though, the abstract stuff. Like many others, we are concerned about the point where measurement tries to do too much. Improving people’s lives is complex and difficult; you can never really be sure of what you’re accomplishing; and there are philosophical decisions to be made as well. An overambitious metric risks a conclusion like “Building a new charter school in New York City has a GORP (Good over RePlacement) of 17.3, whereas distributing medication to those with AIDS in Africa has a GORP of 18.5 – I think it’s pretty obvious where to donate, no?” More on this pitfall (which we’ve seen a lot of) here.

We also hate metrics that do too little. I’m constantly amazed at the way people will accept any ranking that someone cares to throw together, regardless of whether it makes any sense whatsoever (see the U.S. News and World Report rankings of just about anything, as well as this flagrant violation of common sense that didn’t so much as give me an honorable mention). The charity version of this is what we call the Straw Ratio, a seductively easy-to-calculate metric that is roughly as helpful in deciding between the best charities as this link is. I’ve written no fewer than 10 posts on why this metric, featured by Charity Navigator among others, is the worst thing since, uh, unsliced bread.

But to let the matter end there is unacceptable. There are too many options, and there is too much at stake, to throw up our hands, as I argued here. Metrics can’t do everything, but they have too much to offer for us to abandon them. We need to figure out what our goal is in donating, and measure what we can. So, enough generalizations. If we had all the information, the following would be our ultimate measures of charities within each of our seven areas of focus.

—————

Ideal metrics

Causes 1-3: aid the poorest of the poor, focusing on Africa.
Cause 1: Provide for basic human needs including basic health care, food, water, and shelter.
Ideal metric: number of people for whom all such basic needs are met, and wouldn’t have been met without the charity’s activities, per dollar per year.
Cause 2: Fight epidemic curable/treatable diseases, including malaria, diarrhea, tuberculosis, AIDS, measles, and pneumonia.
Ideal metric: number of people who are alive and functioning, but would have been killed or debilitated without the charity’s activities, per dollar per year.
Cause 3: Enable economic opportunity through microcredit, job assistance and training, and education.
Ideal metric: number of people whose jobs produce the income necessary to give them and their families a relatively comfortable lifestyle (including health, nourishment, relatively clean and comfortable shelter, and some leisure time), but would have been unemployed or working completely non-sustaining jobs without the charity’s activities, per dollar per year. (Systematic differences in family size would complicate this.)
Causes 4-7: remove barriers to opportunity in wealthy societies, focusing on New York City.
Cause 4: Provide for basic human needs including basic health care, food, and shelter.
Ideal metric: number of people for whom all such basic needs are met, and wouldn’t have been met without the charity’s activities, per dollar per year. Note that this cause remains philosophically distinct from Cause 1, because living like this in the developed world is a fundamentally different experience – and means different things to different donors – relative to living like this in the developing world.
Cause 5: Aid youth development (pre-high school) through after-school activities, child care programs, etc.
Ideal metric: number of children who enter high school with normal levels of learning abilities and mental health, but wouldn’t have without the charity’s activities, per dollar per year.
Cause 6: Improve educational opportunities at the high school level through charter schools, summer schools, and public school reform.
Ideal metric: number of children who graduate from high school well equipped for college (as demonstrated by later college grades), but wouldn’t have done so without this charity’s activities, per dollar per year.
Cause 7: Enable economic opportunity through microcredit, job assistance and training.
Ideal metric: number of people whose jobs produce the income necessary to give them and their families a relatively comfortable lifestyle (including health, nourishment, relatively clean and comfortable shelter, some leisure time, and some room in the budget for luxuries), but would have been unemployed or working completely non-sustaining jobs without the charity’s activities, per dollar per year. (Systematic differences in family size would complicate this.)
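
All seven metrics share one shape: people fully served who wouldn't have been otherwise, per dollar per year. Here is a rough Python sketch of that arithmetic. Everything in it is an assumption for illustration: the function name, its parameters, the figures, and above all the idea that the "wouldn't have been otherwise" count could simply be read off a comparable group.

```python
def people_fully_served_per_dollar_year(
    served_with_charity: float,
    served_without_charity: float,
    dollars_spent: float,
    years: float,
) -> float:
    """People fully served because of the charity, per dollar per year.

    served_without_charity is the counterfactual: how many of these people
    would have reached the same outcome anyway. In practice it can only be
    guessed at, e.g. from a comparable group the charity did not serve.
    """
    if dollars_spent <= 0 or years <= 0:
        raise ValueError("dollars_spent and years must be positive")
    # Count only the people for whom the charity made the difference.
    attributable = served_with_charity - served_without_charity
    return attributable / dollars_spent / years


# Hypothetical figures: 500 people end up with all basic needs met, 200 of
# whom would have gotten there anyway, on $60,000 spent over 2 years.
rate = people_fully_served_per_dollar_year(500, 200, 60_000, 2)
print(f"{rate * 1000:.1f} people per $1,000 per year")
```

The subtraction is the whole point: without the counterfactual term, this collapses into a count of people touched rather than people helped.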

—————

OK. On one hand …

I think these metrics rock. There is a “click” for me when I read them – “Yeah, that’s what I want out of this charity! Yeah, that matches with common sense! That’s right – if Group A and Group B are both doing microfinance, and if it can actually be shown [forget for the moment that it can’t be] that $1000 leads to 3 sustainably employed people through A and only 2 through B, I feel good about donating to A!” This quality in a metric is far from given, and I think the key is making sure that everything is measured in people fully served. We make no attempt to make a conversion factor between someone whose life improves a little and someone whose life improves a lot – that factor would be arbitrary and would lead to numbers that don’t have clear meaning. Instead, when we start having to decide between fundamentally different ways of improving people’s lives, we stop comparing charities. This way, we can measure everything in terms of people, not any abstraction.
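
The Group A vs. Group B comparison amounts to something like the following Python sketch. The class and function names are hypothetical, and the figures are the made-up ones from the example, not real results. The one design choice worth noting is that the comparison refuses to run across causes, since we don't use a conversion factor between them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharityEstimate:
    name: str
    cause: str            # hypothetical label, e.g. "microfinance"
    people_served: float  # people fully served who wouldn't have been otherwise
    dollars: float


def better_per_dollar(a: CharityEstimate, b: CharityEstimate) -> CharityEstimate:
    """Return whichever charity fully serves more people per dollar.

    Raises ValueError when the causes differ, because there is no
    conversion factor between different ways of improving lives.
    """
    if a.cause != b.cause:
        raise ValueError("no conversion factor between different causes")
    if a.people_served / a.dollars >= b.people_served / b.dollars:
        return a
    return b


group_a = CharityEstimate("Group A", "microfinance", people_served=3, dollars=1000)
group_b = CharityEstimate("Group B", "microfinance", people_served=2, dollars=1000)
print(better_per_dollar(group_a, group_b).name)  # Group A
```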

And I think it’s cool and useful to have these metrics. They help to focus our thoughts, when we’re trying to evaluate charities, and the fact that they’re well-defined may explain why we tend to have a more detailed idea of what we want than other donors do. We know that when we’re talking to a microfinance organization, it isn’t enough to see $ loaned – we know we need to know how many loans were made, how many were paid back, and (if possible) what ultimately happened to the borrowers; and we know how to weight these things and combine them into a sense of what ultimately got accomplished. So, go us.

On the other hand … we’ll never actually calculate a single one of these things.

Oh, we might estimate them, very, very loosely. We never have yet, because we’ve never even been in the ballpark of enough information and certainty. When we’re looking at public school reform, for example, it’s huge just to see that a program made any impact. Calculating how many lives were changed is a dream.

Once you start actually looking at charities’ specific activities, you find that there are no more generalizations to be made on this topic. We know what we’re aiming for (the metrics above), but what information actually ends up being relevant is completely case-specific. There’s no way to go about it except by combining analysis with judgment, common sense, intuition, and improvisation.

So, we welcome comments on our metrics and we continue to think about them, but we are also wary of hyperfocusing on them. We think that people who hyperfocus on metrics are putting too much faith in numbers and experts. Nobody will ever know for sure how many people they’re helping, and the estimation involves literally hundreds of judgment calls that no degree can qualify anyone to make. That’s why we want to put the focus squarely on blowing up the black box. We seek to be the first people ever – including any advisor or foundation you like – to publicly share everything that goes into our giving decisions. We think that’s the most valuable thing you could ever have in an evaluator.

Comments

Albert Ruesga on April 4, 2007 at 3:57 pm said:

Thanks for the invitation to comment on your metrics.

I’m curious: Did the decision to focus on an entire continent, rather than on the poorest countries in that continent, also flow from some kind of metrics? Africa has some of the poorest countries in the world, but East Timor (with a per capita GDP of less than $500) can give many of them a run for the money, in a manner of speaking. Why not fund in East Timor? Wouldn’t the decision to fund in Africa also need to depend on your ability to identify and work with good organizations on the front lines? to leverage your investments? to be given some assurance that public policies will not undermine the good work you manage to do? etc. etc.

I’ll assume then that your decision to fund in Africa is a given, based on a careful consideration of many factors — as were the decisions to fund causes 4 – 7, say. This is important because there’s simply no deductive argument that takes you from your desire to “remove barriers to opportunity in wealthy societies, focusing on New York City,” to these four causes. Why not put all your money into public policy advocacy, for example? Can any metrics show that these are the kinds of investments that will most effectively remove barriers to opportunity?

My biggest caution about your metrics would be not to have them depend too much on things that would not have happened had the charity not existed. These counterfactuals are very difficult to understand and assess. Why not simply measure what these charities in fact accomplished?

I’m reminded here of the story of Saint Rose who prayed fervently to God that the people of Lima be spared a devastating earthquake. The story goes that the earthquake did not in fact occur, and that Rose started being venerated as a saint shortly after her intercession. Would the absence of an earthquake not have happened had Rose not intervened? These are murky waters indeed.

I offer these thoughts in the spirit of one who, as you know, is fairly agnostic about many of the ways that metrics are used and appealed to. This doesn’t affect in the least my admiration for the work that you and your friends are doing, and the spirit of generosity that animates it. The latter trumps all other considerations, in my view.

Best wishes for your very worthy project.

Holden on April 4, 2007 at 10:19 pm said:

Hi Albert, I appreciate the comments and good wishes.

RE: choice of causes, Africa and NYC, etc.: I didn’t explain that in this post, but I gave some coverage of it earlier. Briefly: we need an achievable goal, so we have narrowed the field in ways that are far from deductive. They are based on our personal values; on the structure of the sector (i.e., most organizations serving NYC focus on NYC, whereas many organizations serving Kenya have broad mandates of serving Africa); on what we think will appeal to a wide enough range of donors to make our site a good (if not comprehensive) resource for its first go-round; and on what we think our approach is best suited to (for various reasons, advocacy is not a very good fit). This doesn’t mean we think these are the only worthwhile causes, but they are where we’re focusing for now.

We have cut some causes since our earlier post. A more complete overview of our scope will be available in our business plan, which will be online soon.

RE: counterfactuals: I, along with basically everyone else I’ve talked to, completely agree with you that it’s impossible to really say what would have happened if not for a charity’s activities. But conceptually, this IS what we want to know, and it is useful as a regulative ideal. When I see that a job training program led to X people getting jobs, I want to know if there is any data on a comparable set of people who didn’t take the program – so I can start to get an idea of the program’s actual impact.

As I say in the post, we know that we will never actually calculate any of our metrics. But it is still important to articulate them, because it helps us to figure out what information we’re looking for. When a charity says to us, “We’ve given training to over N people,” we know not to stop there because we know what we would ultimately and ideally like to measure.

I would think that if we agreed on this purpose for the metrics, we would agree that the counterfactuals are appropriate. The question is whether we have the same vision of what the metrics are and aren’t useful/relevant for. I appreciate your thoughts.

Dontvote on April 5, 2007 at 7:52 am said:

What’s the deal with complete solutions? Wouldn’t it be valuable to give to an organization that specializes in learning or defining problems, or that can provide partial solutions (what about a rice FedEx or vaccine pony express or something)? In these cases, would you judge success by measuring the people saved, or the ability of these organizations to put in infrastructure?

One possibility is that to save lives or educate people, you are not expecting any novel solutions that don’t exist already in the for-profit sector, in which case, there is only funding as a reward for organizational effectiveness. Another is that there are enough charity dollars that.

Holden on April 5, 2007 at 8:49 am said:

“you are not expecting any novel solutions that don’t exist already” is true of us in a sense. We (GiveWell) have a focus on funding proven, effective, scalable methods of helping people. It doesn’t mean we don’t think anyone should fund innovation – but that tends to be what large grantmakers focus on. By contrast, our focus is on serving individual donors, so we look for what scales.

Regarding complete solutions: our metrics refer to outcomes (what ultimately happens due to a charity’s activities), not outputs (what it literally does). A charity that just distributes food will cause a certain number of people to have food, water and shelter who wouldn’t have otherwise (they would have had water and shelter but not food); a charity that just distributes water will also do this; and because having all three of these things is so qualitatively different from having just two, we think it’s appropriate to use this as the yardstick for comparing the food charity vs. the water charity.

I recognize how debatable that is: there is a theoretical imperfection because we aren’t counting more minor benefits at all. That’s because we don’t want to make a conversion factor between helping someone a little and helping them a lot. In deciding what a charity’s “ultimate goal” is, we’ve used philosophy and intuition – for example, deciding that the real reason you’d want to distribute food is ultimately that you’re trying to increase the # of people who have ALL their basic needs met. There’s no ironclad way to do this, which is why we throw it open to discussion. If you disagree with any of the particular philosophical/intuitive choices we’ve made, please jump in.

Mark Petersen on April 5, 2007 at 2:52 pm said:

Holden … thanks for this insightful piece. I came to a similar conclusion, and reflected on it over on my blog under “Scorecarding your way to happiness”.

Comments are closed.
