The GiveWell Blog

Evaluating organizations vs. practices

Sean Stannard-Stockton wants to see more research focused on particular nonprofits, rather than on “techniques” for helping people; his reasoning is that this would be more useful to donors.

I don’t believe it’s possible to evaluate a nonprofit as an organization completely in isolation from what it does and whether it works – especially if I’m trying to make a case to individual donors who don’t know me or the people running the nonprofit. (I’ve argued this more fully in the past.)

Phil Steinmeyer is more interested in techniques than in nonprofits; his reasoning is that differences in the effectiveness of different techniques are large enough to overwhelm organizational differences. (One example of this that I’d give is the question of fighting diarrhea by building wells/latrines or focusing on promotion of oral rehydration therapy; there is little obvious synergy between the two, and little reason to believe that they’d be similar in terms of effectiveness.)

I believe there is some value in evaluating “techniques” in the abstract, but doing so is not sufficient if you’re trying to figure out where to donate. The devil is in the details: it’s essential to know whether a nonprofit is carrying out a “technique” in a manner and context that match up with the “technique” you’ve read about. I don’t know of any “techniques” that are so simple, and so clearly effective, that I would bet on a charity simply because of formal adherence to such “techniques,” regardless of where, when, and with whom (and how faithfully) it’s adhering to them.

That’s why it’s crucial that we look at specific charities, judging them on what they do and what the evidence is that it works. It’s not the only analysis we do (we also look at independent research), and it has been the most intensive and expensive part of our process, but we see it as necessary for anyone trying to produce truly valuable and actionable information for individual donors.

Delegation

Sean asks (via email):

What’s your view on whether funders should do research on techniques and then fund organizations that use those techniques or do research on organizations and let them decide on techniques? I was intrigued with your education research post, but was wondering if it might make more sense to find smart dynamic nonprofits who will figure out the best techniques to use and change strategy as more information becomes available.

My literal response is that it depends on the funder’s priorities and capabilities – I don’t think there is much to be gained by debating the approach “funders” should take in the abstract. But I want to share how we deal with this question, as naive funders (i.e., not experts in the issues) aiming to serve even more naive funders (i.e., individual donors), because we do have a specific philosophy on it and we’d appreciate feedback.

My ideal is to fund at the highest level I can have confidence in, i.e., to delegate as many decisions as possible to someone who I feel confident will make those decisions well.

So, my ideal would be to donate not to a charity, but to another funder. If a major foundation, such as the Gates Foundation, could convince me that they consistently make decisions using (a) a strong process, (b) good reasoning, and (c) subjective/philosophical values that are close to mine, I would give to them and let them do the rest (and get rid of our own, now redundant overhead). This was one of the first things we tried when GiveWell was still a part-time volunteer club. What stopped us was that we couldn’t find a single foundation that publicizes substantive information about how it makes its decisions, why it chooses to do A instead of B, and what evidence there is regarding its past and likely future impact. We couldn’t be confident in the institutions without such information; we couldn’t think of a way to get them to share information, since such institutions generally don’t have incentives that we can affect. So we moved on to trying to find great charities.

Again, the goal was ultimately to find a great organization – one that’s better at what it does than we could ever be, and can make its own compelling, evidence-based case for its effectiveness – and give with no strings attached. In some cases, we found exactly this: for example, the Nurse-Family Partnership’s outcomes evaluation is available via peer-reviewed publications, its basic model is clearly described on its website, and it provided documents to fill in gaps in our understanding. PSI was a similar case: after some independent checks on its estimates, we felt we could trust its process as a whole, even for activities we hadn’t researched.

In other causes, the strongest applicants could provide some pieces of the puzzle, but not the full top-down case for why their approach was the best available. That’s where we had to start looking on our own for information about what approaches are likely to work, and pick organizations that fit with what we had found. There’s a spectrum here. KIPP gave us about 60% of what we needed to have confidence in it, and after some independent analysis, we ended up feeling that it was our best bet. By contrast, our Cause 2 (global poverty) applicants gave us so little to go on that we ended up betting on an approach, more than an organization.

Between blind faith and micromanagement is conditional confidence: trusting an organization to make decisions because of an evidence-based case that they can make them well. That’s our ideal; when it isn’t available, some degree of micromanagement (i.e., picking an organization based on its approach) seems preferable to blind faith.

The metrics debate

About a year ago, we participated in a Giving Carnival on the topic of metrics in philanthropy, and laid out the metrics we were planning on using – along with caveats. Seeing a continuing interest in this topic (even including a panel called “Metrics Mania”), I’d like to share how my thinking has progressed since then – specifically, that debating metrics in the abstract seems unlikely to go anywhere useful.

As promised, our research tried to divide charities into causes to make “apples to apples” comparisons; but as many predicted (and as we acknowledged would be true to some extent), even after this narrowing it was impossible to truly put different charities in the same terms. Any two given job training programs serve different groups of people; comparing them directly on “people placed in jobs” is a futile endeavor. We looked at health programs’ abilities to “save lives,” but not all lives are the same, and different health programs have wildly different and difficult-to-quantify impacts on non-life-threatening conditions.

This doesn’t mean that metrics are futile or useless. There’s a big difference between being able to demonstrate emotionally relevant results (such as lives saved) and having no evidence other than an unfamiliar development officer’s report of a gut feeling. And there can be enormous differences in the “cost per person” associated with different approaches – enough, when taken in context and accompanied by intuitive judgment calls, to make a big difference in my view of things. For example, it’s typical for employment assistance programs to cost in the neighborhood of $10,000-$20,000 per person served, while we’re ballparking certain developing-world aid programs at $1,000 per life saved. Though the story doesn’t end there, that difference is larger than I would have guessed and larger than I can ignore.
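To make the scale of that gap concrete, here’s a minimal back-of-the-envelope sketch. The $15,000 midpoint and the $1 million budget are my own illustrative assumptions, and the two outcomes are of course not directly comparable – a job placement is not a life saved.

```python
# Back-of-the-envelope comparison using the ballpark figures above.
# The $15,000 midpoint and the $1M budget are illustrative assumptions,
# not actual estimates; the point is the order-of-magnitude gap.

programs = {
    "employment assistance (per person served)": 15_000,  # midpoint of $10k-$20k
    "developing-world aid (per life saved)": 1_000,
}

budget = 1_000_000  # a hypothetical $1M donation

for name, cost in programs.items():
    print(f"{name}: ~{budget / cost:,.0f} outcomes per ${budget:,}")
```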

Bottom line, both the metrics we used and the ways we used them (particularly the weight of metrics vs. intuition) ended up depending, pretty much entirely, on exactly what decision we were trying to make. We took one approach when all we knew was our general areas of focus; we modified our approach when we had more info about our options; we frankly got nothing out of the completely abstract discussions of whether metrics should be used “in general.” There are as many metrics as there are charities, and there are as many necessary debates about metrics as there are giving decisions.

I agree with those who say metrics are important, and those who say we can’t let them dictate everything (and it seems that nearly everyone who weighs in on “the metrics debate” says both of these things). But I don’t feel we can have the really important conversation at that level of abstraction. Instead, the conversation we need to have is on the specifics. In our case, and that of anyone who’s interested in NYC employment assistance or global health interventions, that means asking: did we use appropriate metrics for the charities we’ve evaluated? Are there other approaches that would have been a better use of our time and resources? Did we compare our applicants as well as we could have with the resources we had?

This is a conversation with a lot of room for important and fruitful disagreement. If you’re interested in the question of metrics, I ask that you consider engaging in it – or, if you don’t find our causes compelling, conducting and publishing your own evaluations – rather than trying to settle the question for the whole nonprofit sector at once.

Politics vs. philanthropy in education research

There’s a big question in my mind about K-12 education – namely, which of the following hypotheses about helping disadvantaged children is the best bet:

  1. Disadvantaged children are so far behind by age 5 that there’s nothing substantial to be done for them in the K-12 system.
  2. Disadvantaged children are so far behind by age 5 that they need special schools, with a special approach, if they’re to have any hope of catching up.
  3. Disadvantaged children generally attend such poor schools that just getting them into “average” schools (for example, parochial schools without the severe behavior and resource problems of bottom-level public schools) would be a huge help.

My view on KIPP vs. the Children’s Scholarship Fund, for example, hinges mostly on my view of #2 vs. #3. Of course, believing #1 would make me want to avoid this cause entirely in the future. We’ve been examining academic and government literature to get better informed on this question, but we’ve noticed a serious disconnect between what we most often want to know and what researchers most often study.

To answer our question, you’d study how students do when they change schools, focusing on school qualities such as class size, available funding, disciplinary records, academic records, and demographics. However, most academic and government studies of voucher/charter programs focus instead on whether a school is designated as “public,” “private,” or “charter.”
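For illustration, here’s roughly what that kind of study could look like in code – a hedged sketch only, with a hypothetical dataset and column names of my own invention:

```python
# Sketch of the donor-oriented analysis described above: relate student
# outcomes after a school change to measured school qualities, rather
# than to a "public"/"private"/"charter" label. The dataset
# ("school_changers.csv") and all column names are hypothetical.

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical panel: one row per student who changed schools, with
# test-score gains and characteristics of the receiving school.
df = pd.read_csv("school_changers.csv")

model = smf.ols(
    "test_score_gain ~ class_size + per_pupil_funding"
    " + discipline_incidents + pct_low_income",
    data=df,
)
print(model.fit().summary())
```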

Three prominent examples of such label-focused studies:

  • The New York City Voucher Experiment was intended to examine the impact of increased choice (via vouchers) on student achievement; the papers on it (Krueger and Zhu 2003; Mayer et al. 2002; Peterson and Howell 2003) conduct a heated debate over who benefited from getting their choice of school, and by how much (if at all), but do not examine or discuss any of the ways in which the schools chosen differed from the ones students would have attended otherwise.
  • “Test-Score Effects of School Vouchers in Dayton, Ohio, New York City, and Washington, D.C.: Evidence from Randomized Field Trials” (Howell et al. 2000), a review of several voucher experiments, also discusses the impact of vouchers without reference to school qualities.
  • “Apples to Apples: An Evaluation of Charter Schools Serving General Student Populations” (Greene et al. 2003) performs a similar analysis with charter schools, looking broadly at whether charter schools outperform traditional public schools without examining how the two differ beyond their structure.

To be sure, there are exceptions, such as recent studies of New York charter schools, including Hoxby and Murarka 2007. But in trying to examine the three hypotheses above, I’ve been struck by how often researchers pass over the question of “good schools vs. bad schools” in favor of the question of “private vs. public vs. hybrid schools.”

When a debate is focused on government policy, it makes sense for it to focus on political questions, such as whether the “free market” is better than the “government.” But when you take the perspective of a donor rather than a politician, this question suddenly seems irrelevant. Some public schools are better than others; some private schools are better than others; and I, for one, would expect any huge differences to be driven more by the people, practices and resources of a school than by the structure of its funding (i.e., whether donors, taxes, parents or a mix are paying).

That’s why we’d like to see more studies targeted at donors, rather than politicians. But for that to happen, donors have to demand it.

Reinventing the wheel

I think the following comment, from Andrea, is broadly representative of a common criticism we receive.

One thing Givewell is missing, and has been criticized for in the past, is that people are already working on these issues, and that one small organization like Givewell can’t solve this “problem” where all others have failed. For example, Holden’s point that he thinks global health is a cause ripe for funding has obviously already been discovered by none other than Bill Gates, [whose foundation] employs many staff who research causes in the way Givewell proposes, but along the well-tested model Sean describes. Again, this points to the naive quality of the entire Givewell enterprise.

We believe that GiveWell has something unique to offer – but this something is not our analytical abilities, or our research process, or our “focus on results.” We believe that many grantmakers, including the Gates Foundation, may be conducting more thorough research and analysis than we can; we have believed this since the very beginning of our project; and to my knowledge, we haven’t implied otherwise at any point. (If you believe there are instances where we – GiveWell, not others writing about GiveWell – have implied otherwise, please point me to them.)

We don’t think we’re inventing the wheel; but we’re reinventing it, out of necessity, because no one else will share their blueprint. Back in August of 2006, when we were first putting serious effort into figuring out where to give, we started by calling foundations and asking them to share their research; as detailed in Appendix D of our business plan from March 2007, the consistent answer we got was that specific information on grantee results – beyond the highly general, selected parts that foundations choose to share – is confidential.

We believe that information about how to help people should never be secret. GiveWell’s uniqueness is not in its ability to conduct thorough research, but in its willingness to share it.

Microloans vs. payday loans

Phil Cubeta’s recent post about payday loans got me thinking about our decision to make a grant to a microfinance organization in our Global Poverty cause.

We decided to make a grant to Opportunity International for two reasons:

  • The belief that there is likely a significant shortage of access to credit in the developing world.
  • The fact that someone repays a loan with interest likely demonstrates that the loan was used for something life-improving.

But doesn’t the same analysis apply to payday loans? I’d bet that there’s a similar lack of credit for very small loans to borrowers with questionable creditworthiness. And the very fact that lenders run this business likely indicates that borrowers are consistently paying back their loans, even at exorbitant interest rates (400-1000% annualized, according to the Center for Public Policy Research). The same logic that says microfinance is helping people would seem to imply that payday loans are as well.
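For a sense of where annualized figures like those come from, here’s a minimal sketch. The $15-per-$100, two-week fee is a commonly cited payday-lending structure – an assumption of mine, not a number from the sources above:

```python
# How small per-period fees annualize into triple-digit rates.
# The $15-per-$100, two-week fee is a commonly cited payday example,
# not a figure from this post.

fee, principal, term_days = 15, 100, 14

simple_apr = (fee / principal) * (365 / term_days)
print(f"Simple annualized rate: {simple_apr:.0%}")  # ~391%

# If the borrower rolls the loan over every two weeks, compounding:
compounded = (1 + fee / principal) ** (365 / term_days) - 1
print(f"Compounded annualized rate: {compounded:.0%}")  # ~3,700%
```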

On the other hand, it’s also possible that many borrowers are only able to repay their loans by taking out another loan – that what we’re witnessing is not a group of people getting back on their feet, but a group of people getting caught in a cycle of debt. Note that this could be numerically consistent with very high (~95%) repayment rates, a statistic commonly cited by microfinance organizations to illustrate their effectiveness in helping people – someone who borrows to pay off another loan 19 times, before finally defaulting, has a 95% repayment rate.
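Here’s that arithmetic spelled out, as a minimal sketch of a purely hypothetical borrower:

```python
# Hypothetical borrower caught in a debt cycle: each loan is repaid
# by taking out the next one, until the final loan defaults.

repaid_by_reborrowing = 19
defaults = 1

repayment_rate = repaid_by_reborrowing / (repaid_by_reborrowing + defaults)
print(f"Repayment rate: {repayment_rate:.0%}")  # Repayment rate: 95%

# On its own, a ~95% repayment rate cannot distinguish this borrower
# from one who genuinely climbed out of poverty.
```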

We’re left with two plausible yet conflicting hypotheses about how the practice of making small loans at relatively high interest rates affects those in need. In one case, those in need gain access to much-needed credit (albeit at high interest rates), which allows them to weather a difficult financial period and potentially pull themselves out of poverty. In the other, they borrow and ultimately find themselves in a debt trap, borrowing more to repay previous loans.

We’ve generally been very frustrated with how little information we’ve been able to get on microfinance operations – who is borrowing, what they’re using the loans for, what their standard of living is, and what happens to that standard of living over time. Without this kind of information, we’re still only guessing at whether microfinance organizations and payday loan operations are helping people pull themselves out of poverty, or simply helping them get caught in cycles of debt.