The GiveWell Blog

Our advice re: donations for Pakistan flood

We’ve been researching the cause of disaster relief, with the goal of doing a better job than we have in the past serving the donors who come to us for help in the wake of a crisis. At this point our research is still in progress, but we can offer some basic advice to donors interested in helping as effectively as possible:

  1. Give money; don’t give anything else. This has been one of the strongest and most agreed-upon recommendations of the “smart giving” community in general, and we join the broad consensus. Money enables organizations to provide what’s most needed. By contrast, “in-kind donations” need to be transported overseas; then agencies need to sort what’s useful from what isn’t; finally, they need to deal with non-useful supplies. This can worsen already-formidable logistical challenges, and in the end the costs of transportation and allocation can be greater than the value of the goods themselves.

    For more, see our argument against in-kind donations from earlier this year (including a citation of USAID’s statement that in-kind donations are “most often inappropriate”), Alanna Shaikh’s discussion of in-kind donations on Aid Watch, and Saundra Schimmelpfenig’s 32 posts on the topic.

  2. Don’t give to an organization you’ve never heard of or an organization that calls you on the phone. This is common sense, a matter of being proactive with your giving (seeking to do as much good as possible) rather than reactive (giving to whoever approaches you and thus making yourself an easy potential victim for scams). We think it is especially risky to give over the phone, or in direct response to a mailing.
  3. Consider the following key issues for an organization you’re donating to:
     • Transparency and accountability – giving details on how much they seek, how much they’ve raised, how much they’ve spent, plans for any excess funds, and as much detail as possible on how they’ve spent funds and what they’ve done.
     • Response capacity – having significant staff on the ground in relevant areas prior to the disaster striking.
     • Quality of response – doing competent work that is well-matched to local needs.
     • Quality of everyday activities – since your donation may effectively fund non-disaster-relief efforts, we think it’s important that an organization disclose information about what its other activities are and how they are evaluated.
  4. Consider that disaster relief may not be the best use of your donation. We have argued before that disaster relief may be less cost-effective than everyday international aid, especially when the disaster in question is a heavily publicized one (and thus one that may have money pouring in past the point of diminishing returns). Preliminarily, it appears that the Pakistan effort has been much less well-funded than the Haiti effort, but it’s worth keeping an eye on the numbers, and it’s always worth considering giving to an outstanding organization that is helping people in need on a day-to-day basis, without the media coverage that comes with a disaster.

Our recommended organizations

Our key questions for organizations are listed above. Generally, we’ve found that most large, reputable organizations score fairly well on two of our criteria: they are fairly strong on the transparency/accountability front, and they often have existing field presences in at-risk regions. The level of disclosure about non-disaster-relief activities varies widely but is often weak; we have not yet found a good way of determining the quality of aid. With that in mind, the organizations that have stood out to us so far (very much subject to change) are:

  • Population Services International (PSI). PSI is one of our top charities for its everyday work; its level of transparency about its activities and the evaluation of them is outstanding. (See our review for details.) It has been in Pakistan for over 20 years (source).
  • Médecins Sans Frontières (MSF). We have been impressed with MSF’s past transparency about its limited need for funds, something we haven’t seen in any other organization. Its activity reports give a fairly clear picture of its activities around the world, and we are impressed with its public site publishing field research, something we’ve seen from few other large/diverse international aid organizations (PSI and CARE are others). We find its field news to be more detailed and specific than the press releases of most other organizations (a notable exception is the Red Cross, discussed immediately below).
  • Red Cross. The International Federation of Red Cross and Red Crescent Societies seems to freely provide the most specifics on exactly how much money it has sought and spent and exactly what it has done. See its country list for links to all of its many past reports. Donating to the Red Cross (whether the American Red Cross or the Red Cross in another country) may be an “obvious” choice, but we think it is also a very defensible one; the Red Cross probably receives more scrutiny, and pressure to be clear about what it is doing, than anyone else, and because of its size and name recognition it may also be particularly well-positioned to carry out a lot of relief while staying coordinated with the government.

These are only preliminary impressions – much more is coming on the topic, and we may change our conclusions about which organizations are best to give to – but as there is a disaster unfolding now, we thought we’d share what we’re thinking.

High-quality study of Head Start early childhood care program

Early this year, the U.S. Department of Health and Human Services released by far the highest-quality study to date of the Head Start childhood care program. I’ve had a chance to review this study, and I find the results very interesting.

  • The study’s quality is outstanding, in terms of design and analysis (as well as scale). If I were trying to give an example of a good study that can be held up as a model, this would now be one of the first that would come to mind.
  • The impact observed is generally positive but small, and fades heavily over time.

The study’s quality is outstanding.

This study has almost all the qualities I look for in a meaningful study of a program’s impact:

  • Impact-isolating, selection-bias-avoiding design. Many impact studies fall prey to selection bias, and may end up saying less about the program’s effects than about pre-existing differences between participants and non-participants. This study uses randomization (see pages 2-3) to separate a “treatment group” and “control group” that are essentially equivalent in all measured respects to begin with (see page 2-12), and follows both over time to determine the effects of Head Start itself.
  • Large sample size; long-term followup. The study is an ambitious attempt to get truly representative, long-term data on impact. “The nationally representative study sample, spread over 23 different states, consisted of a total of 84 randomly selected grantees/delegate agencies, 383 randomly selected Head Start centers, and a total of 4667 newly entering children: 2559 3-year-olds and 2108 4-year-olds” (xviii). Children were followed from entry into Head Start at ages 3 and 4 through the end of first grade, a total of 3-4 years (xix). Follow-up will continue through the third grade (xxxviii).
  • Meaningful and clearly described measures. Researchers used a variety of different measures to determine the impact of Head Start on children’s cognitive abilities, social/emotional development, health status, and treatment by parents. These measures are clearly described starting on page 2-15. The vast majority were designed around existing tools that seem (to me) to be focused on collecting factual, reliable information. For example, the “Social skills and positive approaches to learning” dimension assessed children by asking parents whether their child “Makes friends easily,” “Comforts or helps others,” “Accepts friends’ ideas in sharing and playing,” “Enjoys learning,” “Likes to try new things,” and “Shows imagination in work and play” (2-32). While subjective, such a tool seems much more reliable (and less loaded) to me than a less specified question like “Have your child’s social skills improved?”
  • Attempts to avoid and address “publication bias.” We have written before about “publication bias,” the concern that bad news is systematically suppressed in favor of good news. This study contains common-sense measures to reduce such a risk:
    • Public disclosure of many study details before impact-related data was collected. We have known this study was ongoing for a long time; baseline data was released in 2005, giving a good idea of the measures and design being used and making it harder for researchers to “fit the data to the hoped-for conclusions” after collection.
    • Explicit analysis of whether results are reliable in aggregate. This study examined a very large number of measures; it was very likely to find “statistically significant” effects on some purely by chance, just because so many were collected. However, unlike in many other studies we’ve seen, the authors address this issue explicitly, and (in the main body of the paper, not the executive summary) clearly mark the difference between effects that may be an artifact of chance (effects that, even though “statistically significant,” were quite likely to appear given the large number of measures examined) and effects that are much less likely to be an artifact of chance. (See page 2-52.) A toy simulation after this list illustrates the issue.

  • Explicit distinction between “confirmatory” analysis (looking at the whole sample; testing the original hypotheses) and “exploratory” analysis (looking at effects on subgroups; looking to generate new hypotheses). Many studies present the apparent impact of a program on “subgroups” of the population (for example, effects on African-Americans or effects on higher-risk families); without hypotheses laid out in advance, it is often unclear just how the different subgroups are defined and to what extent subgroup analysis reflects publication bias rather than real impacts. This paper is explicit that the only effects that should be taken as a true test of the program are the ones applying to the full population; while subgroup analysis is presented, it is explicitly in the interest of generating new ideas to be tested in the future. (See page xvi.)
  • Charts. Showing charts over time often elucidates the shape and nature of effects in a way that raw numbers cannot. See page 4-16 for an example (discussed more below).
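
To make the multiple-comparisons concern above concrete, here is a toy simulation (ours, not the study’s; the number of measures and the sample sizes are invented purely for illustration). It tests a program with zero true effect on many outcome measures and counts how many come up “statistically significant” by chance alone:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n_measures = 40     # hypothetical: a study examining many outcome measures
n_per_group = 1000  # hypothetical sample size per group
alpha = 0.10        # the loosest threshold marked "*" in the report

significant = 0
for _ in range(n_measures):
    # A program with NO true effect: treatment and control are
    # drawn from the same distribution.
    treatment = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
    control = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
    _, p_value = stats.ttest_ind(treatment, control)
    if p_value < alpha:
        significant += 1

# With 40 measures tested at alpha = 0.10, roughly 4 spurious
# "significant" effects are expected even with zero true effect.
print(f"{significant} of {n_measures} measures spuriously significant")
```

This is why the report’s practice of separating effects that survive an aggregate-reliability check from merely “starred” effects matters.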

The least encouraging aspect of the study’s quality is response rates, which are in the 70%-90% range (2-19).

In my experience, it’s very rare for an evaluation of a social program – coming from academia or the nonprofit sector – to have even a few of the above positive qualities.

Some of these qualities can only be achieved for certain kinds of studies (for example, randomization is not always feasible), and/or can only be achieved with massive funding (a sample this large and diverse is out of reach for most). However, for many of the qualities above (particularly those related to publication bias), it seems to me that they could be present in almost any impact study, yet rarely are.

I find it interesting that this exemplary study comes not from a major foundation or nonprofit, but from the U.S. government. Years ago, I speculated that government work is superior in some respects to private philanthropic work; if true, I believe this is largely an indictment of the state of philanthropy.

The impact observed is positive, but small and fading heavily over time.

First off, the study appears meaningful in terms of assessing the effects of Head Start and quality child care. It largely succeeded in separating initially similar (see page 2-12) children such that the “treatment” group had significantly more participation in Head Start (and out-of-home child care overall) than the “control” group (see chart on page xx). The authors write that the “treatment” group ended up with meaningfully better child care, measured in terms of teacher qualifications, teacher-child ratios, and other measures of the care environment (page xxi). (Note that the study only examined the effects of one year of Head Start: as page xx shows, “treatment” 3-year-olds had much more Head Start participation than “control” 3-year-olds, but the next year the two groups had similar participation.)

The impacts themselves are best summarized by the tables on pages 4-10, 4-21, 5-4, 5-8, 6-3, and 6-6. Unlike in the executive summary, these tables make clear which impacts are clearly distinguished from randomness (these are the ones in bold) and which are technically “statistically significant” but could just be an artifact of the fact that so many different measures were examined (“*” means “statistically significant at p=0.1”; “**” means “statistically significant at p=0.05”; “***” means “statistically significant at p=0.01”; all “***” effects also appear to be in bold).

The basic picture that emerges from these tables is that

  • Impact appeared encouraging at the end of the first year, i.e., immediately after participation in Head Start. Both 4-year-olds and 3-year-olds saw “bold” impact on many different measures of cognitive skills, as well as on the likelihood of receiving dental care.
  • That said, even at this point, effects on other measures of child health, social/emotional development, and parent behavior were more iffy. And all effects appear small in the context of later child development – for example, see the charts on page 4-16 (similar charts follow each table of impacts).
  • Impact appeared to fade out sharply after a year, and stay “faded out” through the first grade. Very few statistically significant effects of any kind, and fewer “bold” ones, can be seen at any point after the first year in the program. The charts following each table, tracking overall progress over time, make impact appear essentially invisible in context.
  • I don’t think it would be fair to claim that impact “faded out entirely” or that Head Start had “no effects.” Positive impacts far outnumber negative ones, even if these impacts are small and rarely statistically significant. It should also be kept in mind that many of the families who had been lotteried out of Head Start itself had found other sources of early child care (xv); because the study was comparing Head Start to alternative (though apparently inferior, as noted above) care, rather than to no care at all, effects should not necessarily be expected to be huge.

Takeaways

The impact of Head Start shown here is highly disappointing compared to many of its advocates’ hopes and promises. It is much weaker than the impact of projects like the Perry Preschool program and the Carolina Abecedarian program, which have been used in the past to estimate the social returns to early childhood care. It is much weaker than the impact that has been imputed from past lower-quality studies of Head Start. It provides strong evidence for the importance of high-quality studies and the Stainless Steel Law of Evaluation, as well as for “fading impacts” as a potential problem.

I don’t believe any of this makes it appropriate to call Head Start a “failure,” or even to reduce its government funding. As noted above, the small impacts noted were consistently more positive than negative, even several years after the program; it seems clear that Head Start is resulting in improved early childhood care and is accomplishing something positive for children.

I largely feel that anyone disappointed by this study must have an unrealistic picture of just how much a single year in a federal social program is likely to change a person. The U.S. achievement gap is complex and not well understood. From a government funding perspective, I’m happy to see a program at this level of effectiveness continued. When it comes to my giving, I continue to personally prefer developing-world aid, where a single intervention really can make huge, demonstrable, lasting differences in people’s lives (such as literally saving them) for not much money.

Needed from major funders: More great organizations

In the wake of the recent Giving Pledges, we’ve been discussing what advice we’d give a major philanthropist (aside from our usual plea to conduct evaluations and share them publicly).

For the most part, our recommendations and criteria are aimed at individual donors, not major philanthropists. We stress the value of giving to proven, cost-effective, scalable organizations rather than funding experiments, but we don’t feel that this advice applies to major philanthropists – taking risks with small, untested organizations and approaches makes a great deal of sense when you have the time and funds to follow their work closely, hold them accountable, and perform the evaluation that will hopefully show you (and possibly/eventually the world) how things are going. However, we do have some thoughts on the kind of risk that’s worth taking.

One of our biggest frustrations in trying to help individual donors has been the difficulty of finding organizations, as opposed to programs or projects, we can be confident in. As we have discussed in our series on room for more funding, we feel that donors can’t take “restricted gifts” at face value, and that they must ultimately either find an organization they can be confident in as a whole or one with a clear and publicly disclosed agenda for what it would do with more funding. Such organizations have proven very difficult to find.

  • In the area of developing-world aid, we’ve found many organizations with activities so diverse that it’s impossible for us, or for them, to provide any kind of bird’s-eye view of their activities.
  • Meanwhile, we’ve also seen very promising intervention categories that we can’t support simply because we can’t match them to strong, focused organizations. See our past discussion of community-led total sanitation; we have similar issues with salt iodization.
  • In more informal investigations into other causes, we’ve found a multitude of organizations that seem to act as “umbrellas” for a cause, seemingly doing “many things related to the cause” rather than pursuing narrower, targeted agendas. For an example, see our discussion of anti-cancer organizations.
  • For another example, see the organizations listed at Philanthropedia’s report on global warming, which are mostly not focused solely on specific anti-global-warming strategies, but rather extremely broad environmental organizations simultaneously carrying out all manner of global-warming-related activities (forest conservation, political advocacy, research into new energy sources and more), as well as non-global-warming-related activities such as endangered species protection.

Of course, it could make sense for an organization to have varied activities, if there are synergies between them and a clear strategy underlying them. But in all the cases discussed above, that doesn’t appear to be what’s happening. In fact, my impression from the conversations I’ve had with major funders is that most large organizations are essentially loose coalitions of separate offices and projects, some excellent, some poor. Two major funders have stated to me, off the record, that one major international nonprofit does great work in some areas but that they would never endorse a contribution to it. One has stated to me that (paraphrasing) “I don’t think about what organization to fund – it all comes down to which people are good, and people move around a lot.” From scrutinizing nearly any major funder’s list of grants, or from examining the work of the Center for High-Impact Philanthropy at the University of Pennsylvania (which aims to advise larger donors), it seems clear that the typical approach of a major funder is to evaluate projects and people, not organizations.

Unfortunately, this attitude is somewhat self-fulfilling. As long as major funders treat organizations as contractors to carry out their projects of choice, organizations will remain loose coalitions; successful projects will be isolated events. We’ll see none of the gains that come with organization-level culture, knowledge and training built around core competencies. And people giving smaller amounts will have no way to know what they’re really giving to.

We’ve argued before that great organizations are born, not made. Rather than trying to wrench existing organizations into their preferred projects, we’d like to see more major funders trying to “birth” great organizations, so that there’s something left over when they move on.

Philanthropy vouchers

We focus on finding charities that are doing demonstrably good work already, rather than on proposals for new sorts of projects. This post is an exception: we’ve been tossing around an idea for “philanthropy vouchers” that we think could be worth trying in a broad variety of contexts, and we’re interested in others’ thoughts.

The idea is a variation of the “development vouchers” idea put forth by William Easterly in The White Man’s Burden (see page 330). Prof. Easterly proposes that official aid agencies co-create an independent “voucher fund,” and issue vouchers to people in developing countries that can be redeemed for money from the fund. The basic appeal of the idea is that, like cash handouts, it may shift the power and choice to the hands of the people we’re trying to help, rather than the hands of well-meaning outsiders at charities; but the two major concerns with cash handouts (fraud/manipulation by less poor locals and poor/irresponsible use of the money) could be mitigated by some basic regulations on what sorts of services the vouchers can be spent on.

While Prof. Easterly proposes a coordinated effort by major aid agencies, our proposal can be carried out at very small scale, unilaterally, by a single funder. The funder would simply issue a set amount in vouchers, set its own rules for how they could be redeemed, and set aside the necessary funds.

Specifically, to carry out a philanthropy vouchers program, a funder would do the following:

  1. Determine how much “money” it wanted to inject into a community in the form of vouchers.
  2. Form a definition of a “philanthropic organization,” i.e., an organization that would be eligible for collecting these vouchers from people and trading them to the funder for cash. This classification could be formed in a variety of ways: the funder might lay out a set of general criteria for “philanthropic” organizations and take applications for formal designation as “philanthropic,” with approved organizations getting the right to trade vouchers to the funder for cash; or it might do something as simple as accepting vouchers from any organization classified as a charity in its country of origin.
  3. Print vouchers and distribute them to the people in an area (trying to target those in need, but the targeting wouldn’t be as high-stakes as it is with cash).
  4. From there, any organization classified as “philanthropic” could offer its goods and services, and all such organizations would effectively be competing for the funds embedded in the vouchers.
  5. The funder would still be well advised to do its own monitoring and evaluation of how the program is going – in particular, spot-interviewing participants to ensure that vouchers were obtained through transparent and mutually consensual transactions. (A rough sketch of the bookkeeping these steps imply follows below.)
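
To make the mechanics above concrete, here is a minimal sketch of the bookkeeping that steps 2-5 imply. All names, types, and rules here are hypothetical illustrations, not part of Easterly’s proposal or a worked-out design:

```python
from dataclasses import dataclass, field

@dataclass
class Voucher:
    voucher_id: str
    face_value: float              # e.g., $50
    redeemed_by: str | None = None
    service_description: str = ""  # what the org provided, for spot interviews

@dataclass
class VoucherFund:
    approved_orgs: set = field(default_factory=set)
    vouchers: dict = field(default_factory=dict)

    def approve_org(self, org: str) -> None:
        # Step 2: the funder designates an organization as "philanthropic."
        self.approved_orgs.add(org)

    def issue(self, voucher_id: str, face_value: float) -> Voucher:
        # Step 3: print a voucher for distribution to a recipient.
        voucher = Voucher(voucher_id, face_value)
        self.vouchers[voucher_id] = voucher
        return voucher

    def redeem(self, voucher_id: str, org: str, description: str) -> float:
        # Step 4: an approved org trades a collected voucher for cash,
        # attaching a description of what it provided (step 5's spot
        # interviews would check these descriptions against recipients).
        voucher = self.vouchers[voucher_id]
        if org not in self.approved_orgs:
            raise ValueError(f"{org} is not an approved philanthropic organization")
        if voucher.redeemed_by is not None:
            raise ValueError("voucher already redeemed")
        voucher.redeemed_by = org
        voucher.service_description = description
        return voucher.face_value

# Example: a (hypothetical) water charity redeems one $50 voucher.
fund = VoucherFund()
fund.approve_org("CleanWaterOrg")
fund.issue("V-0001", 50.0)
payout = fund.redeem("V-0001", "CleanWaterOrg", "household water filter")
```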

For a hypothetical example, consider an “alternative Millennium Villages” powered by philanthropy vouchers.

  • The funder would create a definition of “philanthropic organization” as any US-registered public charity, or local government agency, whose activities in the village consisted of providing or “selling” the following: vitamin and mineral supplements, health services, water, primary education, food meeting basic nutrition standards, or electricity. Organizations would apply to the funder for recognition as such an organization, a process that need not be nearly as involved as applying for direct funding. Organizations with other ideas for helping people, such as providing cellphones, could apply as well, and their status would be at the funder’s discretion.
  • The funder would print 5,000 vouchers for $50 each, and distribute them throughout a village of 5,000 with a rough goal of allocating one voucher per person (or N vouchers per family of N). (Assuming $50,000 in funder overhead, this would be equivalent in cost to Millennium Villages). Alternatively, the funder might allocate some of the vouchers to a “common fund” allocated through a voting procedure among villagers, in order to encourage the purchase of “public goods” such as well construction (though of course the villagers could also arrange such a “common fund” themselves, or simply choose to “pay” à la carte for water).
  • Nonprofits and government agencies would then hopefully offer services in attempts to win clients’ vouchers. If a nonprofit perceived that others were focusing excessively on farmer training as opposed to water, it could invest in providing water and hope to take in more revenue in vouchers than its costs.
  • With each voucher submitted to the funder, an organization might submit a brief description of what was provided in exchange for the voucher, and to whom; the funder would then perform “spot interviews” to see if these descriptions were confirmed by villagers.

Though the example given is for the developing world, I think the concept could as easily be used in poor communities in the U.S.

There would be many challenges involved in such a program. Tensions could arise between different “competing” organizations, and they may resort to misleading advertising or even coercion in order to win more vouchers. Vouchers wouldn’t be distributed perfectly fairly or evenly among participants. However, these issues could be monitored to some degree using spot interviews, and the concerns would be smaller than with a cash handout program. On the flip side, voucher revenues would provide strong indicators of which services people valued most and how that changed over time, and the actual services provided could adjust in real-time to these indicators. Incentives and possibilities for innovation and adaptation would likely be much greater than for a centrally planned project.

All in all, it seems to us like a project along these lines would be worth trying, hopefully accompanied (as with any pilot) by strong monitoring and evaluation. What do you think?

Invest in Kids

As part of our research into United States causes, we’ve been looking at Invest in Kids, an organization focused on implementing evidence-based programs in Colorado, and we recently had the chance to speak with Lisa Merlino, Invest in Kids’ Executive Director (edited transcript of our conversation (DOC)).

While our research is still in progress, we want to highlight some of the things we really like about Invest in Kids:

  • Founding story. Invest in Kids was started in the late 1990s by a group of mostly lawyers in Colorado who wanted to start an organization to help children in need. They considered their options and spoke with experts to identify programs with strong track records. Ultimately, they were convinced by the Nurse-Family Partnership’s strong evidence of effectiveness and decided to start an organization focused on implementing the evidence-based program. At that time, David Olds, NFP’s founder, was conducting the third randomized controlled trial of NFP’s model, and the NFP National Service Office (the NFP charity that GiveWell recommends) did not yet exist.
  • Ongoing program selection. After implementing NFP, Invest in Kids began looking for other evidence-based programs to implement. In 2003, they settled on the Incredible Years, another program that has been subject to rigorous evaluation. More recently, they participated in a clinical trial of the Good Behavior Game. According to Ms. Merlino, “This research trial was completed and although changes in child behavior trended in a positive direction, the preliminary data shows outcomes were not statistically significant for the children who received the intervention. Therefore, Invest in Kids has decided not to replicate the program at this time. However, anecdotally we heard powerful stories of improvement in teachers and children so we remain hopeful about the positive outcomes that may be seen from this intervention. We continue to await additional results from this and other trials around the country.”
  • Monitoring and evaluation. Ms. Merlino told us that they have ongoing monitoring of the programs they implement to assess whether the outcomes their programs achieve are in line with their expectations based on the research. Note that IIK has sent us these reports, but we haven’t yet had a chance to review them.

While our analysis of Invest in Kids is ongoing, we’re excited about them. Their general approach of looking to scale up what works should, in our view, serve as a model for other non-profit organizations. We’re looking forward to learning more about them over the next few months.

The Money for Good study

The Money for Good study’s headline finding is that “few donors do research before they give, and those that do look to the nonprofit itself to provide simple information about efficiency and effectiveness.”

That conclusion syncs up with our own experience talking to donors, but we aren’t discouraged by the results. That’s because, where the Money for Good study answered the question “how do most donors behave?”, we’re interested in answering a different one: is there a market for giving based on evidence of impact, and how big is that market?

Hope Consulting shared their raw survey data with us, and we’ve done a rough estimate of the size of the potential “GiveWell market” by extrapolating the percentages in the survey to the size of the overall giving market. (A sketch of this arithmetic follows the list below.) We estimate:

  • $4.1 billion from donors who report having done research to compare and evaluate multiple organizations (as opposed to researching a single organization or researching how much to give).
  • $3.8 billion if we further narrow the above set by looking at what factors are important to them, and eliminate any donors who rank what we consider “factors irrelevant to impact” (e.g., “ability to get involved with the organization” or “public recognition of my donation”) higher than what we consider “factors relevant to impact” (e.g., “organizational effectiveness”).
  • $554 million from donors who both did research to compare organizations (i.e., fit in the first group above) and reported that the “amount of good the organization is accomplishing” was the most important piece of information sought in their research.
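
For readers who want the shape of the calculation, here is a minimal sketch. The total-market figure and the survey shares below are illustrative placeholders (chosen so the output lands near the estimates above), not Hope Consulting’s actual figures, which we are not reproducing here:

```python
# Market-sizing sketch: extrapolate survey shares to the overall
# giving market. All numbers are hypothetical placeholders.

total_giving = 220e9  # placeholder: total US individual giving, in dollars

share_compared_orgs = 0.019       # placeholder: compared multiple organizations
share_impact_focused = 0.017      # placeholder: also ranked impact factors highest
share_good_accomplished = 0.0025  # placeholder: "amount of good" most important

print(f"Compared orgs:       ${total_giving * share_compared_orgs / 1e9:.1f} billion")
print(f"Impact-focused:      ${total_giving * share_impact_focused / 1e9:.1f} billion")
print(f"'Good accomplished': ${total_giving * share_good_accomplished / 1e6:.0f} million")
```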

We still don’t have a great sense of the potential market size for GiveWell-style research, but it certainly hasn’t been established that the market is small.

Ultimately, we think it’s important to take the study’s conclusions with a grain of salt. If you polled all TV watchers on what they want, you’d conclude that only a very small percentage want something like The Wire, yet that show wasn’t exactly a failure. In fact, for most successful businesses I can think of, it’s still the case that most people aren’t their customers.

Our goal isn’t to create a product that the majority of people like; it’s to create a product that some minority market loves. From what we’re seeing now, it’s still possible that the minority of donors interested in impact-focused research is quite large.