The GiveWell Blog

3 . . . 2 . . . 1 . . .

This Monday, we will release the beta for the product we’ve been promising since day one: a full account of what charities we recommend, and why, in five different humanitarian causes. (Only one cause will be available at first, but the others will come online within 5-10 weeks.)

I blogged yesterday only because writing all that out was the only way to think it through properly as we make our decisions and recommendations. We promise nothing tomorrow. And if I don’t answer my email or shower this week, know that it’s because we’re working around the clock getting the site as ready as we can make it.

Get pumped.

“Dollars per life changed” metric: What is it good for?

Many moons ago, I listed the metrics we planned to use in evaluations. Well, here’s a shocker for you: when looking at actual charities’ activities, the reality is 10,000x as complicated as anything that can fit into these metrics.

We always knew we would run into the following problem: if Charity A saves 100 people from death and 1000 people from mild fever, while Charity B saves 150 people from death and 500 people from mild fever (assume the same cost), which is better? And we planned to tackle it in a slightly unusual way – rather than trying to “adjust” everything to the same terms (disability-adjusted life-years, or dollars of income gained, or some such nonsense), we would simply focus on how many lives were deeply changed. So in the case above, Charity B wins because it saved more people from death, and to heck with the fevers. Not perfect, but no solution to this problem is, and at least this approach is easy to understand and makes it easy for the donor to draw their own conclusions.

I reasoned that cases like this hypothetical should be rare anyway, as long as we’re separating charities into different causes (some aiming to educate children, others aiming to save lives, etc.) and only comparing charities when they share the same fundamental goal. For example, the charity that saves more lives is probably preventing more fevers too; when charities meaningfully diverge, it’s probably a sign that they belong in different causes.

Well, it hasn’t turned out to be even as simple as that messy expectation.

We initially wanted a cause to “save lives in Africa,” but we quickly realized that many charities are trying to prevent permanent blindness, debilitating skin disease, or a cleft lip that can lead to permanent malnutrition issues and ostracization. This isn’t exactly the same as saving a life, but doesn’t it seem pretty close? So we changed Cause 1 to “prevent death and extreme debilitation.”

Well, now we are in a pickle. What do you do when Charity A treats 10,000 cases of malaria and Charity B performs 150 corrective surgeries? Bear in mind that malaria can be fatal; or it can lead to brain damage, or anemia; or it can just be a fever and stop there. And we don’t know how often it does each of these things (anyone have a source? Seriously, we can’t find one). Now, about those corrective surgeries. Some of them repair clefts (possibly, though not necessarily, life-ruining in the way described above). Others repair hand deformities or eyelid deformities, which we believe are usually cosmetic issues and not as bad as clefts (though how bad is the penalty for looking weird in these societies? We have no idea). And other surgeries are unclassified. Oh, and we don’t know for sure how many fall into each category – we just have to estimate based on past data.

How do you figure THAT one?

Here’s what I think. By default, I think the more comprehensive a charity is, the better. There are lots of things that make a community program – serving every need that everyone has – better than a narrower program (like running around distributing bednets). A more comprehensive program has tighter integration into the community, probably better relations with it, and a better ability to observe it and make sure that people’s lives are improving on the whole (not trading some problems for others). On the other hand, there is exactly one thing that is worse about this approach – you might help fewer people for the same funds. If malaria is cheap to prevent, running around and preventing everyone’s malaria could save many, many more lives than sitting in a village working on everything from AIDS to hangnails.

So, what I say is that the burden of proof is on the less comprehensive program to show that it’s getting much better results (for the same money) than the more comprehensive one. We’re still estimating our “dollars per life changed” metrics, using largely the same philosophy we started with – just focus on the really huge life changes – because we want to see if one program is obviously more cost-effective than another. If one program saves 10-100x as many lives as another (and I’d include cleft repair in “saving lives” for this purpose only), we’ll take our chances on it. But if it’s at all close, we’ll go with the communal/comprehensive program.
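For concreteness, here’s a minimal sketch (in Python) of the decision rule I’m describing. Everything in it is hypothetical – the function names, the numbers, and the use of 10x (the low end of the range above) as the threshold – it’s just the logic, not real charity data.

```python
# A minimal, hypothetical sketch of the burden-of-proof rule above.
# None of these numbers come from real charities.

def changes_per_dollar(huge_life_changes, cost):
    # "Dollars per life changed," inverted: huge life changes per dollar.
    # Minor benefits (mild fevers, cosmetic fixes) deliberately don't count.
    return huge_life_changes / cost

def choose(narrow, comprehensive, required_ratio=10):
    # Default to the comprehensive program unless the narrow one is
    # dramatically (10-100x) more cost-effective for the same money.
    ratio = changes_per_dollar(*narrow) / changes_per_dollar(*comprehensive)
    return "narrow" if ratio >= required_ratio else "comprehensive"

# (huge life changes, dollars spent) - same budget for both programs
print(choose(narrow=(150, 100_000), comprehensive=(100, 100_000)))    # comprehensive
print(choose(narrow=(1_500, 100_000), comprehensive=(100, 100_000)))  # narrow
```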

Since so much is unknown, that often makes things tough for distribution-type programs, but that matches common sense too. If you’re trying to help people in a faraway land, you’d better be really confident in what you’re doing before you run around the countryside with nets, instead of sitting down in a village where you can see everything that’s going on.

So that’s what I’m thinking. Keep quantifying charity, as messily as we have to, knowing that we’re only going to use the wacky numbers we come up with if they show enormous differences in cost-effectiveness. Otherwise, we’re giving the edge to charities that work with one person/community at a time (instead of one problem at a time). What do you think?

Another frustrating article about fixing education

Eduwonkette (if you’re interested in the cause of education, stop reading so many philanthropy blogs and subscribe to Eduwonkette) links to an exciting-sounding article about a series of innovative charter schools. Imagine how excited I get by a passage like this:

“On graduation rates, on test scores, on teacher pay — on just about anything you associate with school reform — we have kicked the district’s butt. There’s nobody in America who has taken the same kind of kids in the same kinds of areas and the same dollars and narrowed the achievement gap like we have.” …

It’s been six years since the 47-year-old Barr launched his personal variant on the charter-school formula, Green Dot Public Schools, then lured 500 kids (and their supportive parents) away from nearby — and academically disastrous — Lennox High School in Boyle Heights. To the consternation of L.A. Unified officials, Barr created Animo Leadership Charter High School with the aim of showing what he could do with $1,200 less per student than L.A. Unified and most big-city districts in California spend. His goal was to accomplish what California schools have failed to achieve for nearly 30 years: turn functionally illiterate and grossly undereducated urban freshmen into literate, math-competent, college-ready graduates who can compete with the graduates of rich-kid Harvard-Westlake.

Yeehaw! Let’s go! You’re right, I haven’t seen any other school that can convincingly demonstrate it’s pulled off this incredibly difficult task! How did you do it? Is it the hiring practices, the classroom protocols, what’s the secret?

Oh, and by the way, what is the actual evidence that you’ve closed the achievement gap for your students? From the way you talk, it sounds like you’ve just humiliated the public system – so you’ve been following your kids through college and seeing that they turn out just as well as Harvard-Westlake kids, or what?

Here it is:

So far, early returns from his 10 schools show a graduation rate double that of LAUSD’s sad results. While the data is too new to be earth-shatteringly conclusive …

Oh. Hrmm. Early returns … not conclusive … also, only graduation rates are mentioned, and of course graduation rates are one of the easier metrics to play fast and loose with.

And – oh dear –

Duffy has claimed that Barr selectively handpicks only four students out of every 10 in the neighborhoods where he opens a new school … “We know they have the ability to [skim the cream]. We just don’t have any way of verifying what they do.”

So, we have a school that doesn’t admit all comers, we’re not sure exactly how they choose to admit students, they could be creaming or they could not be, and whether or not they are, their graduation rates might represent progress, lowered graduation requirements, or a fluke … and people are celebrating with grants and editorials about the end of the achievement gap.

In other words, I just re-read the ol’ article about NYC charters, pretty much word for word.

My favorite part of this one is this quote: “‘Eli Broad doesn’t write a check if we are marginally better,’ Barr concludes. ‘People don’t write editorials about us because we’re not successful.’” I beg to differ. Apparently, people write checks and editorials because in the absence of any school conducting true randomized trials and thorough followup, a couple preliminary and questionable stats plus some flashy rhetoric is what passes for both news and hope.

That’s where we’re going to be stuck until and unless an education charity not only figures out how to make a difference, but makes a good, methodologically sound demonstration of it. In the meantime, giving to an education charity with poor self-evaluation seems like giving to a cancer research laboratory with no pens, paper or computers.

What I wrote in the Gates Foundation survey

Hey, the Gates Foundation is finally showing some interest in what others think. Here’s what I wrote in response to “Please share any comment or opinions you have about our web site.”

Not enough information. Not even close to enough information. Where are your evaluations & technical reports? Where’s your evidence for whether you’ve saved lives, how many lives you’ve saved, whether you’ve made any traction on education, what you think works in education? All I see is a list of grantees with 2-sentence descriptions (no account of how you choose them over other applicants) and a bunch of generalizing, salesy publications that don’t get specific about what was observed, how it was observed, and what specific practices you advocate. Glad you’re finally inviting feedback, because there are other people in existence trying to figure out where to give. As the leading foundation, you have an opportunity to create dialogue about how to help people as well as possible, and affect others’ giving as well as your own. This website does a great job burying that opportunity. Also, the look&feel is pretty drab.

Eat a peanut, save a child!!

What would you do to make the world a better place?

Would you …

Of course you would. And of course you can’t.

If people buy from (or look at ads for) companies based on their donations to charity, I promise the prices will rise by however much they’re donating. When sponsors give to charity for every home run the White Sox hit, they have a range in mind – and you can bet that if they end up owing way more than expected, somehow this is going to come out in next year’s pledge (even if it’s through their insurance company’s quote, etc., etc. …)

I’ve debated the specifics of various schemes along these lines before. Right now, I want to drop my usual preference for concreteness, step back from all the mechanics, and just be as general as I can be.

If someone tells you you can make a difference without either giving up anything valuable or doing anything useful, they are wrong.

When trying to figure out which schemes work and which don’t, it seems that’s about all you need to know.

As for the notion that these kinds of schemes “raise awareness” … man, I’m sick of hearing about “raising awareness.” If you’re not, just – read this.

I’m surprised by how many otherwise intelligent people get pulled in by promises of saving the world by yawning. Sure, they’re tempting, and it can be very complicated and confusing to figure out precisely why they don’t work … but the fundamental problem couldn’t be more obvious. Please don’t fall for this stuff. That’s all.

What’s so hard about rigorous self-evaluation?

I’m not trying to be a jerk. I honestly want to know.

Most of the self-evaluation we see from charities looks at clients, but doesn’t compare them to (otherwise similar) non-clients. So it’s probably effective at preaching to the choir, but not at winning over skeptics. When we bring up the issues with it, we constantly hear things like “Of course we’d love to do a randomized/longitudinal study, but those are expensive and difficult and the funding isn’t there.”

This is how I imagine an interested charity could evaluate itself rigorously:

  1. Use a lottery to pick clients. Many/most humanitarian charities don’t have enough money to serve everyone in need. (And if they do have enough, there’s an argument that they don’t need more.) Instead of dealing with this issue by looking for the “most motivated” applicants, or using first-come, first-served (as most do), get everyone interested to sign up, fill out whatever information you need (including contact info), etc. – then roll the dice. (There’s a minimal sketch of this after the list.)

    Unethical? I don’t think so. I don’t see how this is any worse than “first-come, first-served.” It could be slightly worse than screening for interest/motivation … but (a) it’s also cheaper; (b) I’m skeptical of how much is to be gained by taking the top 60% of a population vs. randomly selecting from the top 80%; (c) generating knowledge of how to help people has value too.

    Cost: seems marginal, cheaper than using an interest/motivation screen. You do have to enter the names into Excel, but for 1000 people, that’s what, 5 hours of data entry?
    Staff time: seems marginal.

  2. Get contact information from everyone. This is actually part of step 1, as described. These days you can get on the Web in a library, homeless people blog, and I’m guessing that even very low-income people can and do check email. Especially if you give them a reason to (see below).

    Cost: see above
    Staff time: none

  3. Follow up with both clients and randomly selected non-clients, offering some incentive to respond. Incentives can vary (often, for clients, providing information can be made a condition of continuing support services). But if push comes to shove, I don’t think a ton of people would turn down $50.

    Cost: worst case, sample 100 clients and 100 non-clients per class and pay them $50 each. That’s a decent sample size. Follow up with each cohort at 2, 5, and 10 years; in any given year, that means three cohorts of 200 people are due for follow-up, for a total of 600 people = $30k/yr, absolute maximum.
    Staff time: you do have to decide how to follow up, but once you’ve done that, it’s a matter of sending emails.

  4. Check and record the followup responses. If possible, get applicants to provide proof (of employment, of test scores) for their $50. Have them mail it to you, and get temps to audit it and punch it into Excel.

    Cost: assuming each of the ~600 annual responses takes 30 min to process and data entry costs $10/hr, that’s $3k/yr.
    Staff time: none.

  5. And remember: Have Fun! Did you think I was going to put something about rigorous statistical analysis here? Forget it. Data can be analyzed by anyone at any time; only you, today, can collect it. When you have the time/funding/inclination, you can produce a report. But in the meantime, just having the data is incredibly valuable. If some wealthy foundation (or a punk like GiveWell) comes asking for results, dump it on them and say “Analyze it yourself.” They’re desperate for things to spend money on; they can handle it.

    (Also, I’m not a statistics expert, but it seems to me that if you have data that’s actually randomized like this, analyzing it is a matter of minutes, not hours. Writing up your methodology nicely and footnoting it and getting peer-reviewed and all that is different, but you don’t need to do that if you’re just trying to gauge yourself internally.)
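Here’s the minimal sketch promised in step 1. It’s entirely hypothetical – the file name, the field layout, and the slot count are all made up – but the point is that “roll the dice” is a few lines of code:

```python
# Hypothetical sketch of the step-1 lottery; "applicants.csv" and the
# numbers are made up for illustration.
import csv
import random

def run_lottery(applicants, slots, seed):
    # A fixed seed makes the draw reproducible (and auditable) later.
    rng = random.Random(seed)
    pool = list(applicants)
    rng.shuffle(pool)
    return pool[:slots], pool[slots:]  # clients, non-clients

with open("applicants.csv") as f:      # sign-up sheet: names, contact info, etc.
    applicants = list(csv.DictReader(f))

clients, non_clients = run_lottery(applicants, slots=100, seed=2007)
```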

The big key here, to me, is randomization. Trying to make a good study out of a non-randomized sample can get very complicated and problematic indeed. But if you separate people randomly and check up on their school grades or incomes (even if you just use proxies like public assistance), you have a data set that is probably pretty clean and easy to analyze in a meaningful way. And as a charity deciding whom to serve, you’re the only one who can take this step that makes everything else so much easier.
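To make that concrete (with the same caveat that I’m not a statistics expert): once the split is random, the headline number is just a difference in group averages. A hypothetical sketch, continuing the lottery code above, with a made-up follow-up field:

```python
# Hypothetical continuation of the lottery sketch; "employed" is a
# made-up follow-up field recorded as 0 or 1.
def mean(xs):
    return sum(xs) / len(xs)

def effect_estimate(clients, non_clients, outcome="employed"):
    # Average outcome among clients minus average among non-clients.
    # Randomization is what makes this simple difference meaningful.
    treated = [float(person[outcome]) for person in clients]
    control = [float(person[outcome]) for person in non_clients]
    return mean(treated) - mean(control)
```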

I tried to be generous in estimating costs, and came out to ~$35k/yr, almost all of it in incentives to get people to respond. Nothing to sneeze at, but for a $10m+ charity, this doesn’t seem unworkable. (Maybe that’s what I’m wrong about?) And this isn’t $35k per study – this is $35k/yr to follow every cohort at 2, 5, and 10 years. That dataset wouldn’t be “decent,” it would be drool-worthy.
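For anyone checking my math, here’s the back-of-the-envelope version, using the same guesses as above:

```python
# Back-of-the-envelope check on the ~$35k/yr figure (worst case).
cohort = 100 + 100                       # clients + non-clients per class
active = 3                               # cohorts hitting the 2-, 5-, and 10-year marks
incentives = cohort * active * 50        # $50 each        -> $30,000/yr
processing = cohort * active * 0.5 * 10  # 30 min @ $10/hr ->  $3,000/yr
print(incentives + processing)           # $33,000/yr; call it ~$35k with slack
```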

And the benefits just seem so enormous. First and foremost, for the organization itself – unless its directors are divinely inspired, don’t they want to know if their program is having an impact? Secondly, as a fundraising tool for the growing set of results-oriented foundations. Finally, just for the sake of knowledge and all that it can accomplish. Other charities can learn from you, and help other people better on their own dime.

The government can learn from you – stop worrying about charity replacing government and instead use charity (and its data) as an argument for expanding it. In debates from Head Start to charter schools to welfare, the research consensus is that we need more research – and the charities on the ground are the ones who are positioned to do it, just by adding a few tweaks onto the programs they’re already conducting.

So what’s holding everyone back? I honestly want to know. I haven’t spent much time around disadvantaged people and I’ve never run a large operation, so I could easily be missing something fundamental. I want to know how difficult and expensive good self-evaluation is, and why. Please share.