The GiveWell Blog

Finals

Over the last four weeks, Elie and I have read through the ~160 applications we received, and now we’re wrapping up Round One and getting ready to get deeper into the issues and the charities we’ve picked as finalists.

Unfortunately, putting the apps themselves on the web is going to take a while, just because of boring technological reasons (as well as the need to remove all the confidential stuff, which fortunately is mostly just salary info). We’ve been focusing entirely on picking finalists, so that we can get our Round 2 apps to them as quickly as possible. In the meantime, here’s a rundown of what we’ve done & where we stand.

I believe this is the most complete description you will find anywhere (in public) of what criteria a grantmaker used to pick between applicants. If I’m wrong, send me the link.

Basic approach

Round One is mostly about practicality. If we tried to do back-and-forth questioning with – and gain full understanding of – all our applicants, we’d have no grant decisions and nothing to show for years. So the basic question we’ve been asking is less “Which charities have great programs?” and more “Which charities are going to be able to demonstrate their effectiveness, using facts rather than impressions?”

It’s possible that if we spent 10 years working with a charity, we could be convinced of its greatness despite no hard data (although I think this is less possible than most people think: when you’re aiming to make a permanent difference in clients’ lives, you can’t just rely on how they behave for the time you know them). As we seek to serve individual donors, we need to make decisions that not only ring true to us, but that we can justify to people who’ve never met us or our applicants.

We asked each applicant to pick one program to focus on for now – rather than trying to start broad, we are looking for charities that have at least one instance of documenting and evaluating a program rigorously.

Africa (Causes 1 and 2)

Saving lives and improving living standards are so intimately connected to each other – and so many of our finalists do quite a bit of both – that for now we’re treating this applicant pool as one. We had a total of 71 applications in these causes.

The Aga Khan Foundation, Food for the Hungry, Population Services International, Partners in Health, the American Red Cross, Opportunities Industrialization Centers International, and Project HOPE are all mega-charities that do many things in many places, but gave us reason to think we can understand them by submitting very detailed, concrete accounts of their featured programs. In each case, we could get a feel for how many people benefited from the program and how, whether because the charity tracked clients directly (as Partners in Health did) or because it submitted a strong independent research case for similar programs (as the Red Cross did with its bednet distribution program). We sent each of these charities The Matrix, talked to each on the phone, and got mixed reactions: some are sending us what they have, others are sending us their own “dashboards,” and one (Red Cross) has decided that the time cost of Round 2 is too high.

The International Eye Foundation and Helen Keller International are less broad, each focusing on eye care and preventable blindness. Like the megacharities above, each sent us a very strong, clear, quantified account of an individual program. We sent them “mini-Matrix” applications to get a better sense of their eye care activities.

Opportunity International and the Grameen Foundation both focus on microfinance. Opportunity International did the best job of any microfinance organization giving a sense that its activities have led to improved lives (in other words, going beyond “number of loans” and giving livings-standards-related figures); Grameen Foundation’s application led us, through a series of footnotes, to the best overview I’ve seen of the general effects of microfinance (although as I’ll write in the future, it leaves much to be desired). We sent each a motherlode of questions about the regions they work in and the people they lend to, so that we can get a better understanding of how this complex strategy impacts their clients’ lives.

Interplast, KickStart, the HealthStore Foundation, and the Global Network for Neglected Tropical Diseases all have relatively simple, unified, and highly compelling models for helping people. KickStart develops and markets irrigation technology to improve living standards; Interplast treats congenital defects; the HealthStore Foundation franchises locals to sell medicine; and GNNTDC focuses specifically on intervention campaigns for the “diseases of poverty” that don’t get much mainstream attention (onchocerciasis, anyone?). Of these, only KickStart provided the depth of information and measurement that we wanted to see, but we’re hopeful that we can get there with the others more easily than we could with more sprawling charities. We sent each of these four charities highly tailored applications, asking very specifically for the information we need to understand their effects.

And those are the 15 finalists for helping people in Africa. No other applicants gave us a concrete sense of how many people they’d helped and how – they gave descriptions of their activities, anecdotes, newspaper articles, survey data (which I’m very skeptical of, as I explained recently), and often very strong evidence of the size of the problems they were attacking (i.e., disease X kills Y people a year). My take is, when you’re helping people thousands of miles away and choosing between hundreds of possible strategies, none of that is enough, because none of that tells you whether (and how much, and how cost-effectively) your program has worked (improved people’s lives). You need a sense for how many lives your activities have changed … at least for one program. Others may disagree, but that’s how we made the cut.

Early childhood care (Cause 3)

We are basically stalled on this one. Our 14 applicants sent piles of data about how they interact with children … but how does this translate to later life outcomes? We asked in our application, but no one answered – we got nothing on test scores, grades, or anything else that happens once the children enter school.

We haven’t yet named finalists. We are going to continue our research on early childhood care, this time looking specifically for proven best practices, then return to the applications to compare these best practices with the practices of our applicants.

K-12 education (Cause 4)

Note: This section was edited on 9/8/2007 to reflect our announcement of finalists.

We received 50 applications for this cause. Because the achievement gap is such a thorny problem, we focused as much as possible on charities with rigorous evidence that they’ve improved academic outcomes for disadvantaged kids. We used the following principles:

  • What counts in this cause is academic performance. Graduation rates, college enrollment rates, attendance, grade promotion, test scores. Raising children’s self-esteem may be valuable, but if that can’t be shown to translate to better performance (specifically or generally), it doesn’t make a strong Cause 4 applicant. And we trust survey data (kids’, parents’ and teachers’ perceptions of their performance) much less than behavior data, which in turn we trust much less than performance data. I believe there are a million psychology studies that will back me up on this mistrust of surveys, but I don’t have time to compile them now – we will do so by the time we go live with our website.
  • Context is key. It isn’t enough to see that test scores improved over some time period; test scores usually improve with age (as you can see when you look at citywide and state-wide data, which the stronger applicants provide). To us, evidence that a program worked means evidence that the participants outperformed some comparable “control group.” Some control groups are better designed than others: showing that your students had a higher graduation rate than the city as a whole is less compelling than showing that they had a higher graduation rate than students from similar areas.
  • Selection bias is dangerous. This famous paper is one example of the danger of selection bias: if you offer a program to help children succeed, the more motivated children (and families) will likely be the ones to take advantage. Child and family motivation is incredibly important in education (we will also be sure to provide the research case for this when the time comes, though it should intuitively click). One applicant referred us to studies showing that participants in chess programs outperform participants in other extracurriculars … common sense tells us that the teaching power of chess likely has less to do with this than the fact that, to put it bluntly, kids who are nerds get better grades. Our strongest applicants showed or referred us to evidence that at least attempts to get around this tricky problem, either through randomization or through the (less powerful but still something) technique of comparing changes in test scores (there’s a small sketch of that comparison just after this list).
  • We want evidence that a program has worked before. A couple applicants submitted extremely rigorous, methodologically strong studies showing practically no difference between their participants and a control group. I love these applicants, honestly. It’s fantastic to measure yourself, find failure where you didn’t expect it, and openly share it with others. I wish I saw more of it, and in fact I’ve been toying with the idea of offering a “Failure Grant” sometime in the future to organizations that can convincingly show why a program is failing and what they can do about it. But for our pilot year, we are looking for proven, effective, scalable ways of helping people, not just strong research techniques.
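
To make that “changes in test scores” point concrete, here is a minimal sketch in Python of the kind of comparison we mean: look at how much participants’ scores changed relative to how much a comparable group’s scores changed, rather than at participants’ scores alone. The numbers (and the idea of percentile-rank scores) are entirely made up for illustration; this is our sketch, not any applicant’s actual data or study.

```python
# Minimal sketch of a "difference-in-differences" style comparison.
# All numbers are hypothetical, for illustration only.

def mean(xs):
    return sum(xs) / len(xs)

# Hypothetical pre/post test scores (say, percentile ranks)
participants_pre  = [34, 41, 28, 50, 39]
participants_post = [45, 52, 40, 58, 47]

controls_pre  = [36, 40, 30, 48, 38]   # similar students, no program
controls_post = [41, 44, 36, 51, 42]   # scores tend to rise with age anyway

participant_gain = mean(participants_post) - mean(participants_pre)
control_gain     = mean(controls_post) - mean(controls_pre)

# The naive read is participant_gain alone; the more honest read is the
# difference between the two gains.
print(f"Participant gain:         {participant_gain:.1f} points")
print(f"Control gain:             {control_gain:.1f} points")
print(f"Estimated program effect: {participant_gain - control_gain:.1f} points")
```

Randomization accomplishes the same thing more convincingly, because it removes the worry that participants and non-participants differed in motivation (or anything else) to begin with.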

Learning through an Expanded Arts Program (LEAP) promotes a specific in-school curriculum, and attached a study showing that children randomly selected to receive this curriculum tested better than those who had been randomly selected not to. Children’s Scholarship Fund provides scholarships for private school to K-8 students, and attached a study on very similar programs that also employed randomization (although it showed better results only for African-American children, something we find fishy – we need to investigate more to see whether this study may have cherry-picked results). Teach for America, which trains and subsidizes recent college graduates to teach in disadvantaged districts, attached a broad study of effects at several of its sites, finding that its teachers’ students perform about the same on reading and better on math. KIPP, Achievement First, and Replications Inc. all are in the business of creating new schools, and all examined the changes in their students’ test scores versus those of students in nearby districts – far from a perfect examination, but more convincing than anything else we’ve seen from similar programs. New Visions for Public Schools, a school support organization that we’ve known about for a long time (it’s even one of our old recommended orgs from the part-time version of our project last year), looked at a series of schools it created by matching its students to students in similar districts with similar academic records, and found consistently improved attendance and grade promotion rates (though test score outcomes were mixed).

Student Sponsor Partners claims to seek out “academically below-average” children and pay their way through private high school – we need more detail on its selection process, but if it is avoiding bias, then its results (substantially higher graduation rates, and enrollment in better colleges, compared to kids from the same middle schools) are impressive. The LEDA Scholars program, an extremely selective program that attempts to raise college expectations for top disadvantaged/minority students, attached a study showing much better results for its scholars than for those who made its final round. (How comparable are these two groups? We need to find out a lot more about this.) Harlem Center for Education and Double Discovery Zone both run the Federal Talent Search program, which, as HCE pointed out, has been shown preliminarily to increase college matriculation rates compared to a somewhat reasonable (though not perfect, as I’ll discuss next week) comparison group. Finally, the St. Aloysius School runs all the way from preschool through 8th grade, with high school support – an intensity of intervention that we haven’t seen in any other program – and claims a 98% graduation rate. We need more info, but we felt we had to check this one out a bit further.

That’s 12 finalists. Some of the studies leave a lot of questions unanswered, some have small sample sizes, etc., but in the end, we feel that this is a good representative group of different kinds of programs, and that these are the programs with the best shot at really convincing us that they’ve improved academic outcomes. Other submissions had no data on academic outcomes, provided no useful context for that data (i.e., a comparison group there’s at least some reason to believe is appropriate), or showed little to no apparent impact of their program.

We didn’t take any afterschool/summer programs, because none were able to show improved academic outcomes either for their own program or for a very similar one, and we haven’t encountered any independent evidence that afterschool/summer programs in general (whether academically or recreationally focused) have the kind of impact we’re looking for.

Helping NYC adults become self-supporting (Cause 5)

We’ve already been over our take on this cause: most applicants run similar programs, and we are continuing with the ones that sent us data not just on how many people have been placed in jobs, but on how many have remained in those jobs 3-24 months later. The HOPE program, Vocational Foundation, Year Up, Catholic Charities of NY, and Covenant House all fit this bill; Highbridge Community Life Center and St. Nick’s Community Preservation Corp. did not provide retention data, but gave such a thorough description of the jobs (and job markets) they prepare clients for that we felt we had a strong sense of likely outcomes for their clients.

That’s seven finalists, from a field of 21.

A couple things we didn’t consider

We didn’t give any points for newspaper articles, third-party awards, etc. unless the relevant attachment linked us to some useful data. We believe that donors and the media (and possibly foundations too) evaluate charities on the wrong criteria, which is why we exist in the first place.

We took a mostly “black box” approach, paying much more attention to the evidence that a tactic has worked than to the details of the tactic. The fact is, nearly every charity we looked at is doing something that makes sense and could logically improve people’s lives. Almost none of them are doing something that makes tons more sense than the others. We read through all the descriptions of programs, and we made sure to include at least one organization for any model we found especially distinctive and compelling, but it was the evidence for effectiveness that carried most of the weight in determining finalists.

We paid as little attention as we could to how well written the application was, and anything else that would favor a good grantwriter over a bad one. We simply see no reason to think that the organizations with better fundraisers would be the ones with better programs. Of course, a grantwriter who didn’t send any evidence of effectiveness killed their org’s chances, which is something we have to live with. But I think we asked for this evidence pretty clearly in our application.

On fallibility, uncertainty, and the transient nature of truth

I think I said it best in my email to the organizations that didn’t pass Round 1:

Practical considerations have made it necessary to make our decisions based on limited information. We know we haven’t gotten a complete picture of the issues or of any organization. Our decision doesn’t reflect a belief that your organization is ineffective; it merely reflects that the application was not one of the strongest we received.

Might there be applicants who have all the data we want, and just misread the application? Yes. Might there be applicants who do wonderful work and don’t measure it? Absolutely. Might we be making faulty assumptions about the issues and what matters? You betcha.

But our goal isn’t perfection; our goal is to spend money as well as possible and start a dialogue about how we did it. We had to cut a field of over 150 applications down to something we can work with. What I’ve done above is give you a full account of how we did it, and once we post the applications (and announce it here) you’ll be able to compare our principles with the applications themselves. Is our reasoning clear? Concrete? Reasonable? Insane? Speak up.

Comments

  • Have you figured how much time is expended in the process, by all the charities and by your team, and divided those hours into the total grant money? How time-effective and efficient is the process considered in toto? What is the opportunity cost of the time invested by these many charities? I will be interested as you go along to see if you come up with ways to streamline the processes, or have suggestions on how the field might do that.

  • Thanks for the comment, Phil. We talk and think about the issue of this “overhead” a ton. Elie and I have actually been arguing a little about how to answer you, since we have a lot to say. Here are a few thoughts.

    On the question of how big our “overhead” is: we’ve spoken with every applicant over the phone to be clear that we expect them to make reasonable judgments about what is and isn’t worth their time (and that we won’t be offended if they need to pull out or decline to send certain information, because of the time/money tradeoff). One thing we’ve discovered is that we work almost exclusively with development officers – i.e., people whose job it is to apply for grants and raise money. I think the amount of time we’re “taking away from program” is minuscule.

    The amount of money, on the other hand, is substantial. For our startup year, our operating costs are almost equal to what we’re giving away. (All of our donors are clear on this.) That raises the question: would you rather donate $X with this level of thought behind it, or twice as much the traditional way, i.e., basically random?

    This is impossible to answer quantitatively (at least right now, though we’re thinking of ways to do it later). But to me, the gap is conceptually/intuitively so huge that I don’t have much need to quantify it. This is not true of the gap between, say, afterschool tutoring and charter schools – it isn’t clear to me at all which one would work better, or indeed whether either works at all.

    Which brings me to what’s really the most important point here, going back to the post I made a few days ago: we aren’t, by default, convinced that any of these programs is helping people at all. The set of factors – physical, structural, cultural – holding back a disadvantaged NYC student is so large and complex that the effectiveness of any program is far from a given. Maybe it simply can’t be done, at least not on a small scale. Maybe an intense charter school can compensate for the other factors. Maybe all you need is a 2-hour-a-week program that makes school fun for kids and shifts their priorities. Well? Which is it?

    Don’t you think there’s more to be gained from examining that question than from saving an extra couple hundred grand so we can put an extra 50 kids through that charter school or afterschool program?

    One more thing: though it seems clear to me that the overhead of evaluation is worth it, I agree with you that we should be thinking about how to make that overhead as low and efficient as possible. The most obvious step that jumps to mind is reducing the amount of time people spend repeating the evaluation. Thousands of foundations all over the country are working as hard and demanding as much info as we are … we’re just the only ones being public about it. For crying out loud, if Robin Hood didn’t keep all its data locked in a vault, this whole project would cost nothing (no time, no money). Once we publish what we find, I’m hoping that everyone – the donors looking for a place to donate, the grantmakers looking for more detailed info, and the grant writers wondering how decisions are made and how they should be crafting their apps – will save a whole lot of everything (time, money, and especially children).

    What do you think?
