The GiveWell Blog

The metrics debate

About a year ago, we participated in a Giving Carnival on the topic of metrics in philanthropy, and laid out the metrics we were planning to use – along with caveats. Seeing continuing interest in this topic (including a panel called “Metrics Mania”), I’d like to share how my thinking has evolved – specifically, that debating metrics in the abstract seems unlikely to go anywhere useful.

As promised, our research tried to divide charities into causes to make “apples to apples” comparisons; but as many predicted (and as we acknowledged would be true to some extent), even after this narrowing it was impossible to truly put different charities in the same terms. Any two given job training programs serve different groups of people; comparing them directly on “people placed in jobs” is a futile endeavor. We looked at health programs’ abilities to “save lives,” but not all lives are the same, and different health programs have wildly different and difficult-to-quantify impacts on non-life-threatening conditions.

This doesn’t mean that metrics are futile or useless. There’s a big difference between being able to demonstrate emotionally relevant results (such as lives saved) and having no evidence other than an unfamiliar development officer’s report of a gut feeling. And there can be enormous differences in the “cost per person” associated with different approaches – enough, when taken in context and accompanied by intuitive judgment calls, to make a big difference in my view of things. For example, it’s typical for employment assistance programs to cost in the neighborhood of $10,000-$20,000 per person served, while we’re ballparking certain developing-world aid programs at around $1,000 per life saved. The story doesn’t end there, but that difference is larger than I would have guessed and larger than I can ignore.
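To make the size of that gap concrete, here is a minimal sketch of the arithmetic using the ballpark figures above. Note that “person served” and “life saved” are different outcomes, so this is a rough contrast of orders of magnitude, not an apples-to-apples cost-effectiveness comparison.

```python
# Ballpark figures from the text above (USD); these are rough estimates,
# not precise cost-effectiveness numbers.
employment_low, employment_high = 10_000, 20_000  # per person served
life_saved = 1_000                                # per life saved

# Even at the low end, the per-outcome cost differs by an order of magnitude.
print(employment_low // life_saved, "to", employment_high // life_saved, "x")
```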

Bottom line, both the metrics we used and the ways we used them (particularly the weight of metrics vs. intuition) ended up depending, pretty much entirely, on exactly what decision we were trying to make. We took one approach when all we knew was our general areas of focus; we modified our approach when we had more info about our options; we frankly got nothing out of the completely abstract discussions of whether metrics should be used “in general.” There are as many metrics as there are charities, and there are as many necessary debates about metrics as there are giving decisions.

I agree with those who say metrics are important, and those who say we can’t let them dictate everything (and it seems that nearly everyone who weighs in on “the metrics debate” says both of these things). But I don’t feel we can have the really important conversation at that level of abstraction. Instead, the conversation we need to have is on the specifics. In our case, and that of anyone who’s interested in NYC employment assistance or global health interventions, that means asking: did we use appropriate metrics for the charities we’ve evaluated? Are there other approaches that would have been a better use of our time and resources? Did we compare our applicants as well as we could have with the resources we had?

This is a conversation with a lot of room for important and fruitful disagreement. If you’re interested in the question of metrics, I ask that you consider engaging in it – or, if you don’t find our causes compelling, conducting and publishing your own evaluations – rather than trying to settle the question for the whole nonprofit sector at once.

Politics vs. philanthropy in education research

There’s a big question in my mind about K-12 education. The question is which of the following hypotheses about helping disadvantaged children is the best bet:

  1. Disadvantaged children are so far behind by age 5 that there’s nothing substantial to be done for them in the K-12 system.
  2. Disadvantaged children are so far behind by age 5 that they need special schools, with a special approach, if they’re to have any hope of catching up.
  3. Disadvantaged children generally attend such poor schools that just getting them into “average” schools (for example, parochial schools without the severe behavior and resource problems of bottom-level public schools) would be a huge help.

My view on KIPP vs. the Children’s Scholarship Fund, for example, hinges mostly on my view of #2 vs. #3. Of course, believing #1 would make me want to avoid this cause entirely in the future. We’ve been examining academic and government literature to get better informed on this question, but we’ve noticed a serious disconnect between what we most often want to know and what researchers most often study.

To answer our question, you’d study how students do when they change schools, focusing on school qualities such as class size, available funding, disciplinary records, academic records, and demographics. However, most academic and government studies of voucher/charter programs focus instead on whether a school is designated as “public,” “private,” or “charter.”

Three prominent examples:

  • The New York City Voucher Experiment intended to examine the impact of increased choice (via vouchers) on student achievement; the papers on it (Krueger and Zhu 2003; Mayer et al. 2002; Peterson and Howell 2003) conduct a heated debate over who benefited, and how much (if at all), from getting their choice of school, but do not examine or discuss any of the ways in which the schools chosen differed from the ones students would have attended otherwise.
  • “Test-Score Effects of School Vouchers in Dayton, Ohio, New York City, and Washington, D. C.: Evidence from Randomized Field Trials” (Howell et al. 2000), a review of several voucher experiments, also discusses the impact of vouchers without reference to school qualities.
  • “Apples to Apples: An Evaluation of Charter Schools Serving General Student Populations” (Greene et al. 2003) performs a similar analysis with charter schools, looking broadly at whether charter schools outperform traditional public schools without examining how the two differ aside from their structure.

To be sure, there are exceptions, such as recent studies of charter schools in New York including Hoxby and Murarka 2007. But in trying to examine the three hypotheses above, I’ve been struck by how often researchers pass over the question of “good schools vs. bad schools” (i.e., the 3 hypotheses outlined above) in favor of the question of “private vs. public vs. hybrid schools.”

When a debate is focused on government policy, it makes sense for it to focus on political questions, such as whether the “free market” is better than the “government.” But when you take the perspective of a donor rather than a politician, this question suddenly seems irrelevant. Some public schools are better than others; some private schools are better than others; and I, for one, would expect any huge differences to be driven more by the people, practices and resources of a school than by the structure of its funding (i.e., whether donors, taxes, parents or a mix are paying).

That’s why we’d like to see more studies targeted at donors, rather than politicians. But for that to happen, donors have to demand it.

Reinventing the wheel

I think the following comment, from Andrea, is broadly representative of a common criticism we receive.

One thing Givewell is missing, and has been criticized for in the past, is that people are already working on these issues, and that one small organization like Givewell can’t solve this “problem” where all others have failed. For example, Holden’s point that he thinks global health is a cause ripe for funding has obviously already been discovered by none other than Bill Gates, employs many staff who research causes in the way Givewell proposes, but along the well-tested model Sean describes. Again, this points to the naive quality of the entire Givewell enterprise.

We believe that GiveWell has something unique to offer – but this something is not our analytical abilities, or our research process, or our “focus on results.” We believe that many grantmakers, including the Gates Foundation, may be conducting more thorough research and analysis than we can; we have believed this since the very beginning of our project; and to my knowledge, we haven’t implied otherwise at any point. (If you believe there are instances where we – GiveWell, not others writing about GiveWell – have implied otherwise, please point me to them.)

We don’t think we’re inventing the wheel; but we’re reinventing it, out of necessity, because no one else will share their blueprint. Back in August of 2006, when we were first putting serious effort into figuring out where to give, we started by calling foundations and asking them to share their research; as detailed in Appendix D of our business plan from March 2007, the consistent answer we got was that specific information on grantee results – beyond the highly general, selected parts that foundations choose to share – is confidential.

We believe that information about how to help people should never be secret. GiveWell’s uniqueness is not in its ability to conduct thorough research, but in its willingness to share it.

Microloans vs. payday loans

Phil Cubeta’s recent post about payday loans got me thinking about our choice to grant a microfinance organization in our Global Poverty cause.

We decided to grant Opportunity International for two reasons:

  • The belief that there is likely a significant shortage of access to credit in the developing world.
  • The very fact that someone repays a loan with interest likely demonstrates that the loan was used for something life-improving.

But, doesn’t the same analysis apply to payday loans? I’d bet that there’s a similar lack of credit for very small loans for borrowers with questionable credit-worthiness. And the very fact that lenders run this business likely indicates that borrowers are consistently paying back their loans, even at exorbitant interest rates (400-1000% annualized, according to the Center for Public Policy Research). The same logic that says microfinance is helping people would seem to imply that payday loans are as well.

On the other hand, it’s also possible that many borrowers are only able to repay their loans by taking out another loan – that what we’re witnessing is not a group of people getting back on their feet, but a group of people getting caught in a cycle of debt. Note that this could be numerically consistent with very high (~95%) repayment rates, the statistics commonly cited by microfinance organizations to illustrate their effectiveness in helping people: someone who takes out 20 loans, using each new one to pay off the last, and finally defaults on the 20th has repaid 19 of 20 loans – a 95% repayment rate.
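The rollover scenario above can be sketched in a few lines. This is a hypothetical illustration of the arithmetic only – the borrower and the loan counts are assumptions, not data from any microfinance organization or payday lender.

```python
def repayment_rate(loans_repaid: int, loans_defaulted: int) -> float:
    """Fraction of individual loans that were repaid."""
    return loans_repaid / (loans_repaid + loans_defaulted)

# Hypothetical borrower: 20 loans total, each new loan taken out to pay
# off the previous one, ending in a single default on the last loan.
rate = repayment_rate(loans_repaid=19, loans_defaulted=1)
print(f"{rate:.0%}")  # 95% of loans repaid, yet the borrower ends in default
```

The point of the sketch is that a per-loan repayment statistic says nothing about whether the borrower’s underlying situation improved.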

We’re left with two plausible yet conflicting hypotheses about how the practice of making small loans at relatively high interest rates affects those in need. In one case, those in need access much-needed credit (albeit at high interest rates), which allows them to weather a difficult financial period and potentially pull themselves out of poverty. In the other, those in need borrow and ultimately find themselves in a debt trap, borrowing more to repay previous loans.

We’ve generally been very frustrated with how little information we’ve been able to get on microfinance operations – who is borrowing, what they’re using the loans for, what their standard of living is, and what happens to that standard of living over time. Without this kind of information, we’re still only guessing at whether microfinance organizations and payday loan operations are helping people pull themselves out of poverty, or simply helping them get caught in cycles of debt.

Quick update

I’m going out of town for the next week and will have only sporadic internet access. A few odds and ends:

Flip-flopping

A little over a year ago, we ballparked the cost to save a life from malaria at $200. Now we think it’s closer to $1000. Last August, I wrote that K-12 education is my favorite cause. A week ago, I owned up to moving it to the bottom of my list, and far preferring global health (even if it’s 5x as costly as I originally guessed). And that’s not even mentioning our December Board meeting, where I walked in with three suggestions as to how we should grant and was convinced to change my mind on two of them.

If you like a man who knows where he stands, I’m not your type. I’m a newcomer to all of the many areas we’re studying, and I have a lot to learn. You’re going to see me change my mind again and again, sometimes going in a circle to come back to what seemed all along like common sense (though with updated information and reasoning behind it). Then again, the history of most areas of human knowledge can be described the same way. That’s what you can expect when you (a) have a lot to learn and (b) are willing to learn it.

The reason I’ve been flip-flopping so violently, and will probably continue to do so, is the same reason this project is worthwhile: there is not enough information out there, nor are there the right kind of information aggregators, for a donor to become well-informed in a reasonable period of time. Currently, the only option I know of for an individual donor is to, in effect, take wild guesses about whom to help, how to help them, and which organization to work with in order to do so. Because of this, GiveWell doesn’t need to provide infallible analysis in order to provide a valuable service; instead, we’re working to make our guesses better and better informed. We’re sharing our progress in real-time because we want as many donors as possible to be able to see what we’re finding and improve on their own guesses. We don’t guarantee that our recommendations will stay the same over time; but probabilistically speaking, they will be better and better.

I’m aware that I often sound authoritative and 100% confident, even when I’m not (this is an idiosyncrasy I’ve had all my life, and I am working on it). But if you look past the tone, you’ll see a flip-flopper; and given the high-complexity, low-information work we’re doing, you shouldn’t expect any less.