The GiveWell Blog

Meta-research

[Added August 27, 2014: GiveWell Labs is now known as the Open Philanthropy Project.]

We previously laid out our working set of focus areas for GiveWell Labs. This post further elaborates on the cause of “meta-research” and explains why meta-research is currently a very high priority for us – it is our second-highest-priority focus area, after global health and nutrition.

Meta-research refers to improving the incentives in the academic world, to bring them more in line with producing work of maximal benefit to society. Below, we discuss:

  • Problems and potential solutions we perceive for (the incentives within) development economics, the area of academia we’re currently most familiar with.
  • Some preliminary thoughts on the potential of meta-research interventions in other fields, particularly medicine.
  • Why we find meta-research so promising and high-priority as a cause.
  • Our plans at the moment for investigating meta-research further.

Meta-research issues for development economics

Through our work in trying to find top charities, we’ve examined a fair amount of the literature on how Western aid might contribute to reducing poverty, which we broadly refer to in this post as “development economics.” In doing so, we’ve noticed – and discussed – multiple ways in which development economics appears to be falling short of its full potential to generate useful knowledge:

Lack of adequate measures against publication bias. We have written extensively about publication bias, which refers broadly to the tendency of studies to be biased toward drawing the “right” conclusions (the conclusions the author would like to believe in, the conclusions the overall peer community would like to believe in, etc.). Publication bias can come both from “data mining” (an author interprets the data in many different ways and publishes/highlights the ways that point to the “right” conclusions) and the “file drawer problem” (studies that do not find the “right” conclusions have more difficulty getting published).

Conceptually, publication bias seems to us like one of the most fundamental threats to academia’s producing useful knowledge – it is a force that pushes research to “find” what is already believed (or what people want to believe), rather than what is true, in a way that is difficult for the users of research to detect. The existing studies on publication bias suggest that it is a major problem. There are potential solutions to publication bias – particularly preregistration – that appear underutilized (we have seen next to no use of preregistration in development economics).

A funder recently forwarded us the following referee comment on a paper under review at a journal, which illustrates this problem:

Overall, I think the paper addresses very important research questions. The authors did well in trying to address issues of causality. But the lack of results has weakened the scope and the relevance of the paper. Unless the authors considerably generate new and positive results by looking say at more heterogeneous treatment effects, the paper cannot, in my view, be published in an academic journal such as the [journal in question].

Lack of open data and code, by which we mean the fact that academic authors rarely share the full details behind their calculations and claims. David Roodman wrote in 2010:

Not only do authors often keep their data and computer programs secret, but journals, whose job it is to assure quality, let them get away with it. For example, it took two relatively gargantuan efforts—Jonathan Morduch’s in the late 1990s, and mine (joining Jonathan) more recently—just to check the math in the Pitt and Khandker paper claiming that microcredit reduced poverty in Bangladesh. And it’s pretty clear now that the math was wrong.

The case he discusses turned out, in our opinion, to be an excellent illustration of the problems that can arise when authors do not share the full details of their calculations: a study was cited for years as some of the best available evidence regarding the impact of microfinance, but it ultimately turned out to be badly flawed, and later more rigorous studies contradicted its conclusions. (See our 2011 discussion of this case.)

Another example of the importance of open data was our 2011 uncovering of errors in a prominent cost-effectiveness estimate for deworming. This estimate had been public and cited since 2006, and it took us months of back-and-forth to obtain the full details behind it; at that point it turned out to contain multiple basic errors that caused it to be off by a factor of ~100.

The lack of open data is significant for reasons beyond the difficulty of understanding and examining prominent findings. It is also significant because open data could be a public good for researchers: one data set could be used by many different researchers to generate multiple valuable findings. Currently, incentives to create such public goods seem weak.

Inadequate critical discussion and examination of prominent research results. The above two examples, in addition to illustrating open-data-related problems, illustrate another issue: it appears that there are few incentives within academia to critically examine and challenge others’ findings. And when critical examinations and challenges do occur, they can be difficult to find. Note that Roodman and Morduch’s critique (from the example above) was rejected by the journal that had published the study they were critiquing (the sole reviewer was the author of the critiqued study); as for the deworming cost-effectiveness estimate (from the DCP2 report), the critique came from GiveWell and has been published only on our blog (five years after the publication of the estimate).

Overall, our impression is that there is little incentive for academics to actively investigate and question each other’s findings, and that doing so is difficult due to the lack of open data (mentioned above).

Lack of replication. In addition to questioning the analysis of prominent studies, it would also be useful to replicate them: to try carrying out similar interventions, in similar contexts, and seeing whether similar results hold.

In the field of medicine, it is common for an intervention to be carried out in many different rigorous studies (for example, the literature on the effects of distributing insecticide-treated nets includes 22 different randomized controlled trials, and the programs executed are broadly similar though there are some differences). But in development economics, this practice is relatively rare.

For more, see a recent post by Berk Ozler.

General disconnect between “incentive to publish” and “incentive to contribute maximally to the stock of useful knowledge.” This point is vaguer, but we have heard it raised in multiple conversations with academics. In general, it seems that academics are encouraged to do a certain kind of work: work that results in frequent insights that can lead to publications. Other kinds of useful work may be under-rewarded:

  • Creating public goods for other researchers, such as public data sets (as discussed above)
  • Work whose main payoff is far in the future (for example, studies that take 20 years to generate the most important findings)
  • Studies that challenge widely held, fundamental assumptions in the field (and thus may have difficulty being published and cited despite having high value)
  • Studies whose findings are important from a policymaking or funding perspective, but not interesting (and thus difficult to publish) in terms of delivering surprising or generalizable new insights. For example, we have only been able to identify one randomized controlled trial of a program for improving rural point-of-source water quality, despite the popularity and importance of this type of intervention.

Potential interventions to address these issues

We’ve had several conversations with academics and funders who work on development economics about how the above issues might be addressed. Most of the ideas below are directed at the specific problems we’ve listed above, though some fall more generally into the category of “creating public goods for the research community as a whole.” Some of the more interesting ideas we’ve come across:

  • Funding efforts to promote the use of preregistration and data/code sharing – for example, advocating that journals require these practices as a condition of publication, or that funders require them of all studies they fund.
  • Creating a “journal of good questions” – a journal that makes publication decisions on the basis of preregistered study plans rather than on the basis of results. The idea is to reward (with publication) good choices of topics and hypotheses and plans for investigating them, regardless of whether the results themselves turn out to be “interesting.” (We have previously discussed this idea.)
  • Funding a journal, or special issue of a journal, devoted to open-access data sets. Each data set would be accompanied by an explanation of its value and published as a “publication,” to be cited by any future publication drawing on that data set. This may improve incentives to create and publish useful open-access data sets, since scholars who did so could end up publishing the data sets as papers and having them cited.
  • Funding the creation of large-scale, general-purpose open-access data sets. Currently, researchers generally collect data for the purpose of conducting a particular study; an effort that aimed specifically to create a public good might be better suited to maximizing the general usefulness of the collected data, and might be able to collect data at greater scale than would be realistic for a study aiming to answer a particular question. For example, one might fund a long-term effort to track a representative population in a particular developing country, randomly separating the population into a large “control group” and a set of “treatment groups” that could be treated with different interventions of general interest (cash transfers, scholarships, nutrition programs, etc.).
  • Funding a journal, or special issue of a journal, devoted to discussion, critiques, re-analyses, etc. of existing studies, in order to put more emphasis on – and give more reward to – this activity.
  • Funding awards for excellent public data sets and for excellent replications, reanalyses, and other work that either confirms or prompts re-examination of earlier studies’ findings.
  • Creating a group that specializes in high-quality systematic reviews that summarize the evidence on a particular question, giving heavier weight to more credible studies (similar to the work of the Cochrane Collaboration, which we discuss more below). These reviews might make it easier for funders, policymakers, etc. to make sense of research, and would also provide incentives for researchers to conduct their studies in more credible ways (employing preregistration, data/code sharing, etc.).
  • Creating a web application for sharing, discussing, and rating papers (discussed previously).
  • Funding awards for the most useful and important research from a policymaker’s or funder’s perspective (these could take practices like data sharing and registration into account as inputs into the credibility of the research).
  • Promoting an “alternative/supplemental reputation system” for papers (and potentially academics) directly based on the value of research from a funder’s or policymaker’s perspective, taking practices like data sharing and registration into account as inputs into the credibility of the research.
  • Creating an organization dedicated to taking quick action to take advantage of “shocks” (natural disasters, policy changes, etc.) that may provide opportunities to test hypotheses. When a “shock” occurred, the organization could poll relevant academics on what the important questions are and what data should be collected, record the academics’ predictions, and fund the collection of relevant data.

Meta-research for other fields

We aren’t as familiar with most fields of research as we are with development economics. However, we have some preliminary reason to think that many fields in academia have a similar story to development economics: multiple issues that keep them short of reaching their full potential to generate useful knowledge, and substantial room for interventions that may improve matters.

  • We recently met with representatives of the Cochrane Collaboration, a group that does systematic reviews of medical literature. We have found Cochrane’s work to be valuable and high-quality, and we were surprised to be told that the U.S. Cochrane Center raises very little in the way of unrestricted funding. After talking to more people in the field, we have formed a preliminary impression that there is little funding available for medical initiatives that cut across biological categories, including the sort of work that Cochrane does (which we would characterize as “meta-research” in the sense that it works toward improved incentives and higher value-added for research in general). We will be further investigating the Cochrane Collaboration’s funding situation and writing more about it in the future.
  • Informal conversations have given me the impression that many of the problems described above – particularly lack of adequate measures against publication bias, lack of preregistration, lack of data/code sharing, and general misalignment between what academics have incentives to study and what would be most valuable – apply to many other fields within the natural and social sciences.
  • I’ve also heard of other problems and ideas that are specific to particular fields. For example, a friend of mine in the field of computer science told me that:
    • There are too few literature reviews in computer science summarizing what is known and what remains to be determined within a particular subfield. The reviews that do exist quickly become out of date. More current reviews would make it easier for people to contribute to a subfield without having to be at the right school (and thus in the right social network).
    • There are some sub-fields in computer science that require testing different algorithms on data sets, such that the number of appropriate available data sets is highly limited. (For example, testing an algorithm for analyzing online social networks against a data set based on an actual online social network.) In practice, academics often design algorithms that are “over-fitted” to the data sets in use, such that their predictive power over new data sets is questionable. He proposed a set of centralized “canonical” data sets, each split into an “exploration” half and a “confirmation” half; while the “exploration” half would be open access, the “confirmation” half would be controlled by a central authority, and academics would be able to test their algorithms on it only in a limited, controlled way (for example, perhaps each academic would be given 5 test runs per month). These data sets would constitute a public good making it easier to compare different academics’ algorithms in a meaningful way, both by reducing the risk of over-fitting and by bringing more standardization to the tests. (A minimal sketch of how such controlled access might work appears after this list.)
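
To make that second idea concrete, here is a minimal sketch in Python. Everything in it – the class name, the 50/50 split, the quota interface – is a hypothetical illustration of the proposal as it was described to me, not a description of any existing system:

```python
import random
from collections import defaultdict

class CanonicalDataset:
    """A canonical data set split into an open "exploration" half and a
    quota-limited "confirmation" half held by a central authority."""

    def __init__(self, records, runs_per_month=5, seed=0):
        shuffled = list(records)
        random.Random(seed).shuffle(shuffled)
        half = len(shuffled) // 2
        self.exploration = shuffled[:half]    # open access: anyone may download
        self._confirmation = shuffled[half:]  # held back by the central authority
        self.runs_per_month = runs_per_month
        self._runs_used = defaultdict(int)    # (researcher, month) -> runs so far

    def confirm(self, researcher, month, algorithm, metric):
        """Score `algorithm` on the held-out half, enforcing the monthly quota.
        Only the aggregate score is returned, never the raw records."""
        key = (researcher, month)
        if self._runs_used[key] >= self.runs_per_month:
            raise RuntimeError("monthly confirmation quota exhausted")
        self._runs_used[key] += 1
        predictions = [algorithm(record) for record in self._confirmation]
        return metric(predictions, self._confirmation)
```

The point of the quota is that an algorithm tuned freely on the exploration half can be scored only a handful of times on the confirmation half, which limits how much anyone can over-fit to the held-out data.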

Overall, the conversations I’ve had about meta-research – even with people who aren’t carefully selected, such as personal friends – have resulted in an unusually high density of strong opinions and novel (to me) ideas for bringing about positive change.

Why we find meta-research promising as a cause

High potential impact. As we wrote previously, it seems to us that many of philanthropy’s most impressive success stories come from funding scientific research, and that meta-research could have a leveraged impact in the world of scientific research.

Seeming neglect by other funders. We see multiple preliminary signs that this area is neglected by other funders:

  • In examining what foundations work on today, we haven’t seen anyone who appears to have a focus on meta-research. We recently attended a funders’ meeting on promoting preregistration and got the same impression from that meeting.
  • As mentioned above, informal conversations seem to lead more quickly to “ideas for projects that could be worked on but aren’t currently being worked on” than do conversations in other domains.
  • As mentioned above, we are surprised by the U.S. Cochrane Center’s apparent low level of funding and need for more funds, and feel that this may point to meta-research as a neglected area.

Good learning opportunities. We have identified funding scientific research as an important area for further investigation. We believe it is one of the most promising areas in philanthropy and also one of the areas that we know the least about. We believe that investigating the question, “In what ways does the world of academic research function suboptimally?” will lead naturally to a better understanding of how that world operates and where within it we are most likely to find overlooked giving opportunities.

Our plan for further investigation of meta-research as an issue area

We are pursuing the following paths of further investigation:

  • Further investigation of the Cochrane Collaboration, starting with conversations with potential funders about why it is having trouble attracting funding. We believe that the Cochrane Collaboration may turn out to be an excellent giving opportunity, and if it does, that this will provide further evidence that meta-research is a promising and under-invested-in cause; on the other hand, if we discover reasons to doubt Cochrane’s effectiveness or need for more funds, this will likely be highly educational in thinking about meta-research in general.
  • Conversations with academics about meta-research-related issues. Some of the key questions we have been asking and will continue to ask:
    • Are there any ways in which the academic system is falling short of its full potential to generate useful knowledge? What are they?
    • What could be done about them?
    • Who is working on the solutions to these problems? Who would be the logical people for a funder to work with on them?
    • Is there any research that you wish you could do but can’t get funded to do? Is there any research that you generally feel ought to be taking place and isn’t? If so, why is this happening?
    • Are there areas of research that you think are overdone or overinvested in? Why do you think this is?
    • What do you think of the ideas we’ve accumulated so far? To the extent that you find one or more to be good ideas, whom would you recommend working with to move forward on or further investigate them?
    • Whom else would you recommend speaking with?
  • Trying to get a bird’s-eye view of the world of academic research, i.e., a view of what the various fields are, how large they are (in terms of people and funding), and where the funding for them comes from. We hope that this bird’s-eye view will help us be more strategic about which fields best combine “high potential” with “major room for interventions to improve their value-added,” and thus to pick fields to focus on for meta-research in a more systematic manner than we’ve done so far.

Giving cash versus giving bednets

We recently published a new review of GiveDirectly, a “standout” charity that gives cash directly to poor people in Kenya. As we were going through the process of discussing and vetting the new review, I found myself wondering how I would defend my preference to donate to distribute insecticide-treated bednets (ITNs) against a serious advocate for cash transfers. We’ve written before about the theoretical appeal of giving out cash, and the fact that there is a promising charity doing so renews the question of whether we should.

I continue to worry about the potential “paternalism” of giving bednets rather than cash (i.e., the implication that donors are making decisions on behalf of recipients). I believe that by default, we should assume that recipients are best positioned to make their own decisions. However, I see a few reasons to think bednets can overcome this presumption:

  • The positive externalities of ITNs
  • The fact that bednets protect children rather than adults
  • The fact that ITNs may be unavailable in local markets or that people may reasonably expect to be given them for free

I address each of these reasons in more depth below. Note, however, that this discussion is meant to be primarily about the theoretical question of giving cash versus giving bednets; a more practical discussion of giving to the Against Malaria Foundation versus giving to GiveDirectly would focus on the specifics of the two organizations.

The positive externalities of ITNs

In our review of the evidence for ITNs, we discussed the evidence that ITNs have benefits for community members other than those using them. After speaking with several malaria scholars and reviewing the literature, we concluded:

  • The evidence for the efficacy of ITNs is based on studies of universal coverage programs, not targeted programs. In particular, all five studies relevant to the impact of ITNs on mortality involved distribution of ITNs to the community at large, not targeted coverage… Thus, there is little basis available for determining how the impact of ITNs divides between individual-level effects (protection of the person sleeping under the net, due to blockage of mosquitoes) and community-level effects (protection of everyone in communities where ITN coverage is high, due to reduction in the number of infected mosquitoes, caused either by mosquitoes’ being killed by insecticide or by mosquitoes’ becoming exhausted when they have trouble finding a host).
  • The people we spoke to all believe that the community-level effect of ITNs is likely to be a significant component of their effect, though none believe that this effect has been conclusively demonstrated or well quantified.
  • There is some empirical evidence suggesting that the community-level impact of ITNs is significant.

In our main model of the cost-effectiveness of distributing ITNs (XLS), we assumed that 50% of the benefits of ITNs come from the total community coverage of ITNs.

To the extent that ITNs have positive externalities, private actors may underinvest in them, meaning that it may be a good idea to distribute them freely even if individuals would choose not to purchase them at the available price. More generally, since we care about helping whole populations and not any particular individual, providing “public goods” of this sort amplifies our impact relative to giving the same amount of money to individuals.

Although it is conceptually possible that giving a large number of individuals small cash grants also has positive externalities, e.g. by boosting the local economy, we haven’t seen any evidence of this, and we doubt that the magnitude of the externality would be as large.

Bednets protect children rather than adults

One of the central reasons that I appreciate cash transfers is that they avoid paternalism. But sometimes, especially with regard to children, paternalism seems morally justifiable. I believe this is one of those cases.

Although AMF distributes ITNs universally, not just to children, the main benefits of ITNs—averting mortality—accrue to children under the age of 5. Children under the age of 5 lack bargaining power, income, and access to credit, not to mention the cognitive faculties to make decisions about their own long-term welfare. Accordingly, purchasing something that is reasonably likely to keep young children alive, even if they don’t or can’t decide to purchase it for themselves, seems to be a justifiable form of paternalism. In general, paternalism towards such young children is unobjectionable.

By distributing bednets, we might be spending money to benefit kids in a way that their parents wouldn’t spend it if we gave it to them instead. Given the magnitude of the benefits to the children, this seems to be justified.

People may not purchase ITNs because they are unavailable in local markets or because they expect to be given them for free

This point is more anecdotal, but Natalie, Holden and I remember being told while we were in Malawi that long-lasting insecticide-treated bednets, of the sort that AMF distributes, are essentially not available for purchase in local markets. Unfortunately, this is not in our published notes (DOC) from the conversation in which we recall being told this.

In another case, an RCT in Kenya in which researchers experimentally subsidized the cost of bednets (PDF) found that even very small increases in price led to substantial reductions in bednet purchases by mothers (e.g. charging $0.60 led to a 60% reduction in take-up). Two different people told us in off-the-record conversations that they thought this occurred because the mothers offered subsidized bednets believed that they would be able to acquire free nets at some other point. There have been periodic free ITN distributions in many sub-Saharan African countries over the last decade, and the international consensus seems to be that governments should distribute ITNs free of charge in malaria-endemic areas. Accordingly, it should not be especially surprising that citizens may expect bednets to be provided free of charge, and may not move to purchase them even if they are available at subsidized prices in the marketplace. If we reasonably expect to be given something for free in a relatively short time window, why buy it now?

This wouldn’t necessarily have been the case if philanthropy had never funded bednets, but having started down this path, I think it provides another consideration in favor of continuing. If we could credibly and cheaply communicate that no more bednets would be forthcoming, this consideration wouldn’t matter, but there is no obvious way to do so.

This is something to keep in mind in the future: philanthropic funding decisions may create an unanticipated form of “lock-in,” in which future philanthropists become effectively committed to continued funding, even if it would not have been necessary in a counterfactual world of no philanthropic support. Although unlikely to be crucial, this consideration may counsel against undertaking certain marginal philanthropic activities.

Conclusion

I think that in order to avoid paternalism, philanthropists working to improve the lives of the global poor should have a fairly strong presumption in favor of cash transfers, and that those who advocate other strategies should have a convincing story to tell about why they beat cash. Above, I’ve tried to justify my view that bednet distribution is one of those philanthropic strategies that may beat cash. In searching for future top charities, I’d like to see a similarly strong case.

Update on Against Malaria Foundation’s costs

New cost estimates for AMF’s 2012 distributions

In a blog post in February, we noted that our estimates had missed some costs incurred by AMF’s distribution partner, Concern Universal. We undertook an assessment of these costs through discussion with Concern Universal.

In the course of our assessment, we revisited our estimates of all other distribution costs as well, and decided that the most informative cost estimates for donors are the projected costs of 2012 distributions. The reason is that, as of November 2011, AMF had shifted to larger-scale distributions, which it will continue in 2012; these distributions are more cost-effective than previous ones.

We have now calculated the 2012 projected costs. The total cost per net is lower than our previous estimate, even including the extra distribution partner costs mentioned above. We estimate a total cost of $5.54 per net for 2012 distributions, compared to a previous estimated total cost of $6.31 per net. This figure includes estimates of all costs incurred by all organizations participating in the distribution, including AMF, AMF’s distribution partners and local actors that work with AMF’s distribution partners.

The bulk of the change is due to the fact that AMF expects to distribute a million nets in 2012 – over twice the number it distributed in any previous year – while its organizational costs are likely to remain stable. Another contributor to the lower cost is that the equivalent cost of the donated services that AMF receives has decreased (both in the past year and projected for 2012). See our updated AMF review for full details.

We also calculated the marginal cost per net, which is projected to be $5.15 per net for 2012. The marginal cost excludes AMF organizational costs, because we believe that these are unlikely to rise as additional nets are distributed (details in our updated AMF review). The marginal cost per net is slightly higher than our previous estimate (which was about $5 per net), since it includes an extra $0.15 in costs incurred by the distribution partners (for details on these costs, see below).
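
To make the relationship between the two figures concrete, here is a back-of-the-envelope sketch in Python. It assumes – purely for illustration – that the gap between the total and marginal figures consists entirely of AMF’s organizational costs spread over the roughly one million nets expected in 2012:

```python
# Back-of-the-envelope: how total and marginal cost per net relate.
# Assumes (for illustration only) that the gap between the two figures is
# entirely AMF organizational costs spread over the 2012 distribution volume.
total_cost_per_net = 5.54     # includes AMF organizational costs
marginal_cost_per_net = 5.15  # excludes AMF organizational costs
nets_in_2012 = 1_000_000      # AMF expects to distribute a million nets in 2012

implied_org_costs = (total_cost_per_net - marginal_cost_per_net) * nets_in_2012
print(f"implied organizational costs: ${implied_org_costs:,.0f}")  # ~$390,000

# Distributing more nets at fixed organizational costs shrinks the gap, which
# is why the shift to larger-scale distributions lowers total cost per net.
```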

Updated cost per life saved

Using the 2012 projected costs per LLIN, we estimate the cost per child life saved through an AMF LLIN distribution at about $1,600 using the marginal cost ($5.15 per LLIN) and about $1,700 using the total cost ($5.54 per LLIN).

See our spreadsheet analysis for details of our cost per life saved estimate.
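
As a rough consistency check on these headline figures (not a description of the model in the spreadsheet), they imply a simple nets-per-life ratio:

```python
# Implied nets per child life saved, from the headline figures above.
# This is a consistency check on the published numbers, not the actual model.
print(1600 / 5.15)  # ~311 nets per life, using the marginal cost per net
print(1700 / 5.54)  # ~307 nets per life, using the total cost per net

# Both work out to roughly 310 nets per child life saved; the two estimates
# differ only in which costs are attributed to each net.
```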

Missing distribution partner costs

We have now gathered information on the missing costs from AMF’s distribution partner, Concern Universal. These missing costs have added an additional $0.15 per net. They consist of costs for salaries and office overhead that were incurred by both Concern Universal and by the Malawi government (which pays the salaries of health workers who assisted in the net distribution). Concern Universal did not initially tell us about these costs because they were costs that it incurred regardless of whether the distribution took place. However, we prefer to include all costs incurred to carry out a project, because we believe that this gives the best view of what it costs to achieve a particular impact (such as saving a life), and also avoids the lack of clarity and complications of leverage in charity.

Full details on these costs are available in our costs spreadsheet and our updated AMF review.

Millennium Villages Project

Several people have emailed us in the past few days asking about the new evaluation of the Millennium Villages Project (MVP), published in The Lancet last week. It has received significant attention in the development blogosphere (see, e.g., here, here, here, and here).

The evaluation argues that the MVP was responsible for a substantial drop in child mortality. However, we see a number of problems.

Summary

  • Even if the evaluation’s conclusions are taken at face value, insecticide-treated net distribution alone appears to account for 42% of the total effect on child mortality (though there is high uncertainty).
  • The MVP is much more expensive than insecticide-treated net distribution – around 45x on a per-person-per-year basis. Therefore, we believe that in order to make an argument that the MVP is the best available use of dollars, one must demonstrate effects far greater than those attained through distributing bednets. We believe the evaluation falls short on this front, and that the mortality averted by the MVP could have been averted at about 1/35th of the cost by simply distributing bednets. Note that the evaluation does not claim statistically significant impacts beyond health; all five of the reported statistically significant impacts are fairly closely connected to childhood mortality reduction.
  • There are a number of other issues with the evaluation, such that we believe the child mortality effect should not be taken at face value. We have substantial concerns about both selection bias and publication bias. In addition, a mathematical error, discovered by the World Bank’s Gabriel Demombynes and Espen Beer Prydz, overstates the reduction in child mortality; the corrected effect appears similar to the overall reduction in child mortality in the countries the MVP works in (though still greater than the reduction in mortality for the villages the MVP chose as comparisons for the evaluation). The MVP published a partial retraction with respect to this error (PDF) today.

We would guess that the MVP has some positive effects in the villages it works in – but for a project that costs as much per person as the MVP, that isn’t enough. We don’t believe the MVP has demonstrated cost-effective or sustainable benefits. We also don’t believe it has lived up (so far) to its hopes of being a “proof of concept” that can shed new light on debates over poverty.

Also see coverage of the Millennium Villages Project by David Barry, Michael Clemens, Lee Crawfurd, and Gabriel Demombynes and Espen Beer Prydz, much of which we’ve found helpful in thinking about the MVP and some of which we cite in this post.

Background

The Millennium Villages Project attempts to make significant progress towards achieving the Millennium Development Goals through a package of intensive interventions in 13 clusters of villages in rural Africa. It further aims to serve as a demonstration of the potential of integrated development efforts to cost-effectively improve lives in rural Africa. In its own words, the MVP states, “Millennium Villages are designed to demonstrate how the Millennium Development Goals can be met in rural Africa over 10 years through integrated, community-led development at very low cost.”

The drop in child mortality, and the comparison to insecticide-treated nets

The new evaluation concludes:

“Baseline levels of MDG-related spending averaged $27 per head, increasing to $116 by year 3 of which $25 was spent on health. After 3 years, reductions in poverty, food insecurity, stunting, and malaria parasitaemia were reported across nine Millennium Village sites. Access to improved water and sanitation increased, along with coverage for many maternal-child health interventions. Mortality rates in children younger than 5 years of age decreased by 22% in Millennium Village sites relative to baseline (absolute decrease 25 deaths per 1000 livebirths, p=0.015) and 32% relative to matched comparison sites (30 deaths per 1000 livebirths, p=0.033). The average annual rate of reduction of mortality in children younger than 5 years of age was three-times faster in Millennium Village sites than in the most recent 10-year national rural trends (7.8% vs 2.6%).”

In a later section, we question the size and robustness of this conclusion; here we argue that even taken at face value, it does not imply good cost-effectiveness for the MVP compared to insecticide-treated net distribution alone.

The MVP’s own accounting puts the cost per person served in the third year of treatment, including only field costs, at $116 (see the quote, above). Assuming linear ramp-up of the program, we take the average of baseline ($27/person) and third-year ($116/person) spending and estimate that MVP spent roughly $72 per person per year during the first three years of the project. Michael Clemens has argued that their spending amounts to “roughly 100% of local income per capita.”

We should expect that amount of spending to make a difference in the short term, especially since some of it is going to cheap, proven interventions, like distributing bednets. In fact, it appears that the biggest and most robust of the 18 reported impacts was increased bednet usage.

The proportion of under-5 children sleeping under bednets in the MVP villages in year 3 was 36.7 percentage points higher than the proportion in the comparison villages. The Cochrane Review on bednet distribution estimates that “5.53 deaths [are] averted per 1000 children protected per year.” (See note.) If we assume that 80% of bednets distributed are used, the additional bednet usage rate (36.7 percentage points) found in MVP’s survey indicates that MVP’s program led to 46 percentage points (36.7 / 80%) more villagers receiving bednets than in the control villages. (Note that using a figure lower than 80% for usage would imply a higher impact of bednets because of the way the estimate works.) Therefore, we’d estimate that for every 1000 children living in an MVP village, the bednet portion of MVP’s program alone would be expected to save 2.54 lives per year ((5.53 lives saved per year / 1000 children who receive a bednet) * 0.46 additional children receiving a bednet per child in an MVP village). Said another way, the bednet effect of the MVP program would be expected to reduce a child’s chances of dying by his or her fifth birthday by roughly 1.27 percentage points (0.254% reduction in mortality per year over 5 years). The total reduction in under-five mortality observed in the evaluation was 3.05 percentage points (30.5 per 1000 live births). Thus the expected effect of increasing bednet usage in the villages accounts for 42% of the observed decrease in under-5 mortality, and is within the 95% confidence interval for the total under-5 mortality reduction. (We can’t say with 95% confidence that the true total effect of the MVP on child mortality is larger than just its effect due to increased bednet distribution.)
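
The arithmetic in the preceding paragraph is compressed; the short script below reproduces it step by step, using only figures stated in this post:

```python
# Reproduces the bednet share of the MVP's observed mortality effect,
# using only the figures stated above.
deaths_averted = 5.53 / 1000  # Cochrane: deaths averted per child protected per year
usage_rate = 0.80             # assumed fraction of distributed nets actually used
extra_usage = 0.367           # extra under-5 bednet usage in MVP villages (36.7 pp)

extra_coverage = extra_usage / usage_rate          # ~0.46 extra children covered
annual_effect = deaths_averted * extra_coverage    # ~2.54 lives per 1000 children/year
under5_effect = annual_effect * 5                  # ~1.27 pp by the fifth birthday

observed = 0.0305  # observed reduction: 30.5 per 1000 live births vs. comparison
print(f"bednet-attributable share: {under5_effect / observed:.0%}")  # ~42%
```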

Insecticide-treated nets cost roughly $6.31 each to distribute (including all costs), cover an average of 1.8 people, and last 2.22 years (according to our best estimates). That works out to about $1.58 per person per year. At $72 per person per year, the MVP costs about 45 times as much (on a per-person-per-year basis) as net distribution. Although we would expect bednets to achieve a smaller effect on mortality than the MVP on a per-person-per-year basis, we estimate that the MVP could have attained the same mortality reduction at ~1/35 of the cost by simply distributing bednets (see our spreadsheet for details of the calculation).
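
Spelled out, the cost comparison looks like this (the further ~1/35 figure also depends on the relative mortality effects, which are in our spreadsheet rather than reproduced here):

```python
# Per-person-per-year cost of bednet distribution vs. the MVP,
# using the figures stated above.
cost_per_net = 6.31        # all-in cost to distribute one net
people_per_net = 1.8       # average number of people covered per net
net_lifespan_years = 2.22  # average useful life of a net

net_cost = cost_per_net / people_per_net / net_lifespan_years
print(f"nets: ${net_cost:.2f} per person per year")        # ~$1.58

mvp_cost = 72.0  # MVP spending per person per year (average of years 1-3)
print(f"MVP/nets cost ratio: {mvp_cost / net_cost:.1f}x")  # ~45x
```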

If the MVP evaluation had shown other impressive impacts, then perhaps the higher costs would be well justified, but 3 of the 5 statistically significant results from the study are on bednet usage, malaria prevalence, and child mortality. (The other two are access to improved sanitation and skilled birth attendance, both of which would also be expected to manifest benefits in terms of reductions in under-5 mortality.) There were no statistically significant benefits in terms of poverty or education.

Other issues with the MVP’s evaluation

Lack of randomization in selecting treatment vs. comparison villages. The evaluation uses a comparison group of villages that were selected non-randomly at the time of follow-up, so many of the main conclusions of the evaluation are drawn simply by comparing the status of the treated and non-treated villages in year 3 of the intervention, without controlling for potential initial differences between the two groups. If the control villages started at a lower baseline level and improved over time at exactly the same rate as the treatment villages, then the treatment would appear to have an impact equal to the initial difference, before the intervention began, between the treatment and control groups, even though it actually had none. Even in cases in which baseline data is available from the control groups, it is possible that the group of villages selected as controls could improve more slowly than the treatment group for reasons having nothing to do with the treatment. Accordingly, there are strong structural reasons to regard the evaluation’s claims with skepticism.

Michael Clemens has written more about this issue here and here. We agree with his argument that the MVP could and seemingly should have randomized its selection of treatment vs. control villages instead, especially given its goal of serving as a proof of concept.

Publication bias concerns. The authors report 18 outcomes from the evaluation; results on 13 of them are statistically insignificant at the standard 95% confidence level (including all of the measures of poverty and education). Even if results were entirely random, we’d expect roughly one statistically significant result out of 18 comparisons. The authors find five statistically significant results, which implies that the results are unlikely to be due to chance alone, but they could have explicitly addressed the fact that they checked a number of hypotheses and performed statistical adjustments accordingly, which would have increased our confidence in their results. The authors did register the study with ClinicalTrials.gov, but the protocol was first submitted in May 2010, long after the data had been collected for this study.
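
To make the multiple-comparisons arithmetic explicit, here is a short sketch. It treats the 18 outcomes as independent, which they surely are not, so the probability it prints is only indicative:

```python
from math import comb

n_outcomes, alpha = 18, 0.05

# Expected number of "significant" results if all true effects were zero:
print(n_outcomes * alpha)  # 0.9 -- roughly one false positive by chance alone

# Probability of seeing 5 or more false positives, assuming independence:
p_five_or_more = 1 - sum(
    comb(n_outcomes, k) * alpha**k * (1 - alpha)**(n_outcomes - k)
    for k in range(5)
)
print(f"{p_five_or_more:.4f}")  # ~0.0016 -- five significant results are
                                # very unlikely to be chance alone
```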

We also note that the registration lists 22 outcomes, but the authors only report results for 18 in the paper. They explain the discrepancy as follows: “The outcome of antimalarial treatment for children younger than 5 years of age was excluded because new WHO guidelines for rapid testing and treatment at the household level invalidate questions used to construct this indicator. Questions on exclusive breast-feeding, the introduction of complementary feeding, and appropriate pneumonia treatment were not captured in our year 3 assessments.” But this only accounts for three of the four missing outcomes. This does not explain why the authors do not report results for mid-upper arm circumference (a measure of malnutrition), which the ClinicalTrials.gov protocol said they would collect.

Mathematical error in estimating the magnitude of the child-mortality drop.

Note: the MVP published a partial retraction with respect to this error (PDF) today.

At the World Bank’s Development Impact Blog, Gabriel Demombynes and Espen Beer Prydz point out a mathematical error in the evaluation’s claim that “The average annual rate of reduction of mortality in children younger than 5 years of age was three-times faster in Millennium Village sites than in the most recent 10-year national rural trends (7.8% vs 2.6%).”

Essentially, the authors used the wrong time frame in calculating the decline in the Millennium Villages: to estimate the per-year decline in childhood mortality, they divided the difference between average childhood mortality during the 3-year treatment period and average childhood mortality during the previous 5-year baseline period by three. As Demombynes and Prydz point out, however, this mistakenly assumes that the time difference between the midpoints of the 3-year and 5-year averaging windows is 3 years, when it is in fact 4 years:

[When we originally published this post in 2012, we included a link here to an image stored on a World Bank web server. In 2020, we learned that this image link was broken and were unable to successfully replace it. We apologize for the omission of this image.]

This shifts the annual decline in child mortality from 7.8% to 5.9% (though see David Barry’s and Michael Clemens’s comments here for more discussion of the assumptions behind these calculations).
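
In effect, the correction annualizes the same total decline over four years instead of three. A sketch of that computation (the published figures involve additional rounding, so the output matches only approximately):

```python
# Annualizing the same total mortality decline over 3 vs. 4 years.
# A 7.8%/year decline sustained for 3 years implies this total decline:
total_decline = 1 - (1 - 0.078) ** 3  # ~21.6%

# Spread over the correct 4-year gap between the two averaging windows,
# the same total decline implies a smaller annual rate of reduction:
corrected = 1 - (1 - total_decline) ** (1 / 4)
print(f"{corrected:.1%} per year")  # ~5.9%, down from the reported 7.8%
```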

The adjusted figure for child mortality improvement is no better for the MVP villages than for national trends. Demombynes and Prydz go on to argue that using a more appropriate and up-to-date data set for the national trends in childhood mortality yields an average trend of -6.4% a year, better than in the Millennium Villages, and that the average reductions in rural areas are even higher.

Note, however, that this argument is saying that the comparison group in the study is not representative of the broader trend, not that the Millennium Villages did not improve relative to the comparison group.

Conclusion

The Millennium Villages Project is a large, multi-sectoral, long-term set of interventions. The new evaluation suggests, though it does not prove, that the MVP is making progress in reducing childhood mortality, but at great cost. It does not provide any evidence that the MVP is reducing poverty or improving education, its other main goals. These results from the first three years of implementation, if taken seriously, are discouraging. The primary benefits of the intervention so far – reductions in childhood mortality – could have been achieved at much lower cost by simply distributing bednets.

Note: the Cochrane estimate of 5.53 deaths averted per 1,000 children protected per year does not assume perfect usage. Our examination of the studies that went into the Cochrane estimate found that most report usage rates in the range of 60-80%, though some report 90%+ usage.

GiveWell Labs update and priority causes

[Added August 27, 2014: GiveWell Labs is now known as the Open Philanthropy Project.]

Over the past few months, the main focus of GiveWell Labs has been strategic cause selection. Before diving into a particular cause, we want to make sure we’ve done a reasonable amount of work looking at all our options and picking our causes strategically.

We’ve published our take on what information we can find on philanthropy’s past successes and our observations on what foundations work on today (both with spreadsheets so others can examine our data), and we’ve published our framework for identifying a good cause. With these in mind, this post lists causes we’re planning to focus on over the short term.

We are not at all confident that these causes represent the most promising ones; we see our list of priority causes as a starting point for learning. By publishing our reasoning, along with all data we’ve used, we hope to elicit feedback at this early stage; in the course of investigating our priority causes, we expect to learn more about these causes and about the best way to choose causes in general. And we have prioritized our causes partly based on the potential for learning, not just based on how promising we would guess that they are. Also note that these causes do not represent restrictions – we will consider outstanding giving opportunities in any category – but rather areas of focus for investigation.

We currently believe that no established philanthropist engages in strategic cause selection – the practice of listing all the causes one might work on, and choosing them based on a combination of “potential impact” and “underinvestment by other philanthropists.” (This is not to say that no established philanthropist picks good causes – we believe many have picked excellent causes, perhaps through more implicit “strategy” – it is just to say that we know of no established philanthropist applying the sort of explicit strategic selection we envision.) So we believe we are in uncharted territory; thus, we expect to hit a fair number of dead ends and to do a lot of revision and learning, but we also hope that strategic cause selection will eventually become a valuable tool for having maximal impact with one’s giving.

Summary of our priority causes (details follow):

  • Global health and nutrition is an area we know well and believe has many good giving opportunities. It is our current top priority. We seek to find more opportunities for donors along the lines of our top charities; we also seek to learn from existing foundations about the best higher-risk projects they are unable to fund.
  • Funding scientific research is a good conceptual fit for philanthropy, accounts for many of philanthropy’s most impressive success stories, and may provide bang-for-the-buck as good as or better than global health and nutrition.
  • Meta-research is our term for trying to improve the systematic incentives that academic researchers face, to bring them more in line with producing maximally useful work. We believe there is substantial room for improvement in this alignment, and that this cause is therefore promising as a high-leverage way to get the benefits of funding research; current philanthropic attention to this cause appears very low.
  • Averting and preparing for global catastrophic risks (GCRs) including climate change is a good conceptual fit for philanthropy and may provide bang-for-the-buck as good as or better than global health and nutrition. Today’s philanthropy appears to invest moderately in climate change, but very little in other GCRs.

We also briefly discuss popular causes that we aren’t currently prioritizing.

Top-priority causes

Global health and nutrition
Based on our past work seeking outstanding charities, we feel that global health and nutrition is the strongest area within the category of “directly helping the disadvantaged.” It’s also an area that we know fairly well (again, because of our past work), so we expect to be able to find strong giving opportunities more quickly here than in areas we’re less familiar with. Because of this, global health and nutrition is our top priority for GiveWell Labs.

Our plans:

  • As discussed in our 2011 research outline, we are investigating the idea of restricted funding to large organizations in order to fund proven, cost-effective interventions that we can’t fund otherwise. Our goal here would be to, in a sense, “create new top charities” – create funding vehicles that allow individual donors to deliver proven, cost-effective health and nutrition interventions. (One could think of this project as trying to create an “AMF for vaccines,” an “AMF for nutrition,” or the like.)
  • We are also interested in higher-risk, higher-upside projects within this area. We are aware of some major foundations that pursue these sorts of opportunities and have more investigative capacity and relevant background than we do. So our ideal would be to leverage these foundations’ investigative work, by working with them to identify the best giving opportunities that they have sourced but cannot fully fund. We are currently looking into the possibility of doing this. If it proves unworkable, we may seek other ways to investigate high-risk, high-upside opportunities in this area.

Funding scientific research
As discussed previously, we believe many of the most impressive “success stories” in the history of philanthropy are in the category of funding research, particularly biomedical research. We also find research funding to be a good conceptual fit for philanthropy, as well as something that could plausibly get better “bang for the buck” than global health and nutrition interventions (since it involves creating global public goods – once developed, a new insight can be applied on a global scale and potentially for a long time).

In philanthropy currently, it appears that biomedical research is a moderately popular area, while the natural sciences are less popular but still have some philanthropic presence. Of course, much of the funding for (early-stage) research comes via government and/or university money, but we hypothesize that philanthropy may be able to play a special role in supplementing these systems, by specifically aiming to support the kind of work that the traditional academic system and government funders cannot or will not support. (We believe that there may be ways in which the traditional system falls short of maximum value-added, as discussed in the next section.) When we look at the activities of current philanthropic players (see our notes on the biomedical research activities of the top 100 foundations), it seems possible to us that relatively few of these players are specifically looking to supplement or improve on the government and university systems (by contrast, we believe that many efforts within U.S. education and global health seek to improve on and contrast with government programs in these areas).

So we see funding research as a potentially high-impact area, and we’re especially interested in the possibility of opportunities that the government/university systems systematically underfund. In addition, funding research is fundamentally different from the sort of direct-aid-oriented work we’ve focused on in the past, and we feel that investigating it will be an important learning experience.

Our next steps will be to

  • Seek out conversations with the major foundations that fund scientific research
  • Ask researchers about under-invested-in opportunities, while conducting “meta-research” conversations (see next section)

Meta-research
In the course of our research on outstanding charities, we’ve come to the working conclusion that academic research – at least on topics relevant to us – is falling far short of its maximum value-added to society, largely due to problematic incentives. We laid out some of our views last year in Suggestions for the Social Sciences; we also think that GiveWell Board member Tim Ogden’s recent SSIR piece is worth reading on this topic.

In brief, we believe that (a) academic incentives do not appear fully aligned with what would be most useful (for example, replicating studies is highly useful but does not appear to be popular in academia); (b) academics rarely engage in practices – such as preregistration and sharing of data and code – that could make their research easier for outsiders to evaluate and use in decisionmaking; (c) too much academic research is restricted to pay-access journals, rather than being in a format and place that would allow maximum accessibility. Based on informal conversations, we believe these issues are present across academia generally, not just in the areas we’ve examined, though we intend to investigate more.

We have seen some philanthropy focused on (c). Two of the 82 foundations we’ve examined have program areas that we’ve categorized as “scholarship and open access”; the Wellcome Trust in the UK is also pushing for open access. However, we’re not aware of any foundation making a concerted push to improve (a) and (b), aligning academic incentives with what would be most useful to society.

As discussed in the previous section, we think of research as a highly promising and important area for philanthropy, based both on history and on the conceptual possibility of impact-per-dollar-spent. If problematic incentives are causing academic research to systematically fall short of its maximum potential value-added to society, investments in meta-research could have highly leveraged impact. That’s sufficient to think that this cause has some potential; the fact that it appears to be largely absent from today’s large-scale philanthropy increases its appeal.

We will write more in the future about our plans for investigating meta-research, which overlap strongly with our plans for investigating direct funding of research (the previous section). We are aiming to speak to a broad range of academics about whether, and how, the work being done in their fields – and the general practices of their field – diverge from what would add maximum value to society.

Global catastrophic risks (GCRs), including climate change
Foundations work to address a variety of threats – such as climate change, nuclear weapons proliferation, and bioterrorism – that could conceivably lead to major global catastrophes.

We see this work as an excellent conceptual fit for philanthropy, because the potential catastrophes are so far-reaching that it is hard to articulate any other actor that has good incentives to invest sufficiently in preparing for and averting them. (Governments do have some incentives to avert catastrophic risks, but catastrophic risk preparation has no natural “interest groups” to lobby for it, and it is easy to imagine that governments may not invest sufficiently or efficiently.) As with research, we find it plausible that opportunities in this area could have good “bang for the buck” relative to international aid, simply because they seek to avert such large catastrophes.

In philanthropy currently, working on climate change is moderately popular, but work on other risks is extremely rare. Out of 82 foundations we examined, two work on nuclear non-proliferation and one works on biological threats; none work on other potential threats.

One concern about this area is that gauging the success or failure of projects seems extremely difficult to do, even in a proximate way, because projects are so focused on low-probability events.

We are currently reviewing the literature on climate change and will be posting more in the future. We are also advising Nick Beckstead and a few volunteers from Giving What We Can as they collect information on the organizations working on GCRs other than climate change.

A note on policy advocacy
A long-term goal of ours is to learn more about policy advocacy, which is a general philanthropic tactic (an option for funding in almost any cause) that we know very little about. For the near future, we do not plan on recommending any policy advocacy funding; we plan on allocating small amounts of time to conversations with people in the space to learn more about how it works in general.

Popular causes we don’t plan to prioritize
Our survey of the current state of philanthropy highlighted the following as particularly popular causes that aren’t listed above. We will be writing more about them; for now, we provide very brief thoughts and relevant links to some work we’ve done in the past.

  • U.S./developed-world education: we perceive this as perhaps the most popular cause in philanthropy today. Many major foundations and philanthropists are working on it, and have worked on it in the past, yet progress on developing – and rolling out – evidence-backed ways to improve educational outcomes seems slow. For more, see our report on U.S. charities.
  • U.S. poverty alleviation (including health care): we see a lot of philanthropy focused on these areas today, yet we believe the bang-for-the-buck is poor relative to international aid. For more, see our report on U.S. charities, Your Dollar Goes Further Overseas, Poor in the U.S. = rich, and Hunger Here vs. Hunger There.
  • Arts and culture. We don’t see GiveWell as having much potential value-added in this area. (We’ll be elaborating in a future post.)
  • Animal welfare; environmental conservation (not including climate change-related work). Current GiveWell staff are primarily interested in humanitarian giving, and we don’t see these areas as being directly enough connected to humanitarian values to merit a high priority. At one point we advised a volunteer who did some work investigating animal welfare charities, and we may later discuss this work.
  • Funding social entrepreneurs and social enterprise. We do not find this area promising; we will be writing about it more in the future. Also see Acumen Fund and Social Enterprise Investment and When Donations and Profits Meet, Beware.
  • Developing-world aid outside of health and nutrition. From what we’ve seen so far, health and nutrition are the most promising areas within developing-world aid. However, we remain open on this point, and are certainly more interested in this area than in the other areas listed in this section. We’re particularly interested in learning more about the “transparency/accountability/democracy” sector, which is moderately popular among today’s foundations and which we currently know very little about. Also see our writeups on microfinance, developing-world economic empowerment, disaster relief, agriculture, and education (as well as our summary of why we prefer global health and nutrition).

What large-scale philanthropy focuses on today

[Added August 27, 2014: GiveWell Labs is now known as the Open Philanthropy Project.]

We think there are two key questions for someone trying to do strategic cause selection: (1) What is the history of philanthropy – what’s worked and what hasn’t? (2) What is the current state of philanthropy – what are philanthropists focused on and what might they be overlooking?

We started to answer (1) in our discussion of foundation “success stories.” This post addresses (2). We first discuss the data sets we have used, which we are making publicly available and linking from this post. We then make some observations from these data sets.

The data sets we’ve used

  • Dollar allocation data. The Foundation Center maintains a database of grant amounts, dates, descriptions and more for over 100,000 foundations (over 2.4 million grants). It also tags these grants by category in ways that we’ve found helpful. The Foundation Center provided us with a breakdown by category of 2009-2010 grants that it had selected as an efficient representative sample, totaling about $20 billion (equivalent to about half of 2010 foundation giving, according to the Foundation Center). We went through the 923 categories provided by the Foundation Center and applied our own tags to these categories, resulting in a breakdown of spending by 33 “GiveWell categories” (106 total subcategories). When we were unclear on the nature of a Foundation Center category (or simply found one interesting), we pulled the top 100 grants for that category using our paid subscription to Foundation Directory Online.

    “GiveWell categories” simply refers to a set of tags we created, because we found them helpful in thinking about the breakdown of giving from our perspective. When we discuss dollar allocations to different categories in this post, we are referring to “GiveWell categories” and not to the categories maintained by the Foundation Center. There may be cases in which GiveWell defines a term differently from the Foundation Center, meaning that our figure for that term will differ from what the Foundation Center publishes (for example, we break out “museums” as a separate category from “arts and culture,” so the figure we would give for foundation spending on “arts and culture” is different from the figure the Foundation Center would give). This does not mean that there is actually a contradiction between our data and the Foundation Center’s; we are using the Foundation Center’s data and consider their reported funding allocations to be correct according to their term definitions.

    We provide a spreadsheet that includes both the data provided directly to us by Foundation Center (“FDO categories”) and the breakdown according to our own category definitions (“GiveWell categories”). It also makes it possible to see exactly how we defined “GiveWell categories” and thus how these might be different from “FDO categories.”

    Dollar allocation data (XLS)
  • Data from the top 100 foundations’ websites, compiled by Victoria Dimond (GiveWell volunteer) and Good Ventures, which has been working closely with GiveWell on GiveWell Labs. Victoria and Good Ventures visited the websites of the top 100 independent foundations in the U.S. (we generated this list using Foundation Center data; we found sufficiently informative websites for 82 of the 100) and created a spreadsheet with the names and descriptions of their program areas and sub-program areas. We then created summary sheets that rank program area types by how many foundations work on them, and rank foundations by their “unusualness” (the extent to which they work on program areas that few other foundations work on); a minimal code sketch of both kinds of computation appears after this list.
    Program Areas for Top 100 U.S. Foundations (XLS)
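To make these two computations concrete, below is a minimal Python sketch of (1) rolling grant dollars up from Foundation Center (“FDO”) categories into “GiveWell categories” and (2) scoring foundations by “unusualness.” The category mapping entries are illustrative placeholders, and the unusualness formula (the average rarity of a foundation’s program areas, where an area’s rarity is 1 divided by the number of foundations listing it) is one plausible formalization; this post does not pin down the exact method behind the spreadsheets, so treat the sketch as a reading of the idea rather than as our actual procedure.

    from collections import defaultdict

    # Illustrative placeholder mapping; the real spreadsheet maps 923 FDO
    # categories onto 33 "GiveWell categories" (106 total subcategories).
    FDO_TO_GIVEWELL = {
        "Elementary & secondary education": "U.S. education (K12/preschool)",
        "Museums": "Museums",
        "Performing arts": "Arts and culture",
    }

    def dollars_by_givewell_category(grants):
        """Aggregate grant dollars by GiveWell category.

        `grants` is an iterable of (fdo_category, dollars) pairs.
        """
        totals = defaultdict(float)
        for fdo_category, dollars in grants:
            givewell_category = FDO_TO_GIVEWELL.get(fdo_category, "Uncategorized")
            totals[givewell_category] += dollars
        return dict(totals)

    def rank_by_unusualness(foundation_areas):
        """Rank foundations by the rarity of their program areas.

        `foundation_areas` maps foundation name -> set of program areas.
        A foundation's score is the average of 1/n over its areas, where
        n is the number of foundations listing that area (an assumed
        formalization of "unusualness").
        """
        area_counts = defaultdict(int)
        for areas in foundation_areas.values():
            for area in areas:
                area_counts[area] += 1
        scores = {
            name: sum(1.0 / area_counts[area] for area in areas) / len(areas)
            for name, areas in foundation_areas.items()
            if areas
        }
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)

Under this definition, a foundation working only on areas no one else lists scores 1.0, while one working only on areas all 82 foundations list scores 1/82; the raw data linked above would let a reader swap in a different definition.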

In categorizing giving for both of these, we deliberately used categories tailored to our own interests (rather than trying to come up with a universally useful taxonomy). For example, since we have pretty well-defined views on the best ways to help the disadvantaged, we tended to lump many different things together under headings such as “Helping the disadvantaged” or “U.S. poverty” (this includes human services, youth development services, and more). By contrast, we tended to separate out any kind of work we found particularly interesting. So if you are seeking a picture of how foundations give for your own purposes, you may want to go back to the raw data (which we provide in the files linked above) and create your own categories.

Our observations
Popular areas (according to GiveWell’s taxonomy)

Highly popular areas include:

  • U.S. education (K12/preschool) – 46 of 82 foundations in the “top 100 foundations” set list this as a program area; it accounts for over 7% of giving (in dollar terms) according to dollar allocation data.
  • U.S. higher education (scholarships, increasing access to higher education, or general/capital support) – 25 of 82 foundations, around 8% of giving according to dollar allocation data (the latter is harder to interpret on this point since it may include other activities within higher education).
  • U.S. poverty alleviation – 42 of 82 foundations; ~5% of giving according to dollar allocation data (this figure was obtained by adding human services and youth development, both of which appear primarily focused on the U.S.; other areas should also be partially counted, but they are a mix of international and U.S. giving).
  • Arts & culture: 30 of 82 foundations; ~5% of total giving according to dollar allocation data.
  • Environment (conservation): 25 of 82 foundations, ~4% of total giving according to dollar allocation data.
  • Health care and biomedical research funding (including support of hospitals): 17 of 82 foundations work on health care delivery and 14 of 82 work on biomedical research. This category (in which research and delivery can be difficult to separate) accounts for ~20% of total giving according to dollar allocation data.
  • Climate change and/or energy: 14 of 82 foundations work in these areas, though they account for only ~1% of total giving according to dollar allocation data.

This set of areas accounts for about half of all giving in the dollar allocation data (the shares listed above sum to roughly 50%: ~7% + ~8% + ~5% + ~5% + ~4% + ~20% + ~1%), and much of what remains is difficult to categorize. It includes every area that is listed by 9 or more of the 82 foundations we examined.

International causes

Causes focused on helping other countries – or international relations – appear less common than the above causes, but are still fairly common. Each of the following is included in the work of 8-9 of the 82 foundations we examined:

  • Developing-world poverty
  • Developing-world health
  • Developing-world transparency/accountability/democracy
  • Foreign policy analysis

Total “international affairs” tagged giving is around 3% of all giving (in dollar terms) according to dollar allocation data, though many international-aid grants fall outside this figure because they may be tagged instead as university support (for relevant research), health, agriculture, etc.

While we’ve done substantial investigation into the first two causes listed above, the latter two have largely not been on our radar. Some of the largest foundations emphasize their work in these areas.

Less popular causes (according to GiveWell’s taxonomy)

Among the causes that are less popular, we find the following particularly interesting (not necessarily promising, but worth noting for later discussion). Here we focus on the “top 100 foundations” set, since less-popular causes like these are difficult to isolate in the dollar allocation data.

  • Natural sciences and mathematics, excluding biomedical sciences – 7 of 82 foundations list program areas in this category.
  • Immigration (advocacy and integration) – 4 foundations.
  • Promoting specific topics in higher education – 4 foundations. (We note that many of philanthropy’s putative success stories are in this category.)
  • Developing-world education – 3 foundations.
  • Reproductive health/rights – 3 foundations.
  • Social entrepreneurship – 3 foundations.
  • Mitigation/prevention of global catastrophic risks other than climate change – 2 foundations focus on nuclear nonproliferation and one on biological threats; total giving for this category is 0.1% of all giving dollars according to dollar allocation data.
  • Scholarship and open access – 2 foundations.
  • Education and technology – 2 foundations.
  • Information access (cellphones, Internet) – 2 foundations.
  • Social sciences – 2 foundations.
  • Disease surveillance – 1 foundation.