The GiveWell Blog

New Cochrane review of the effectiveness of deworming

Update 07/20/12: Miguel and Kremer (and others) have responded to the characterization of their 2004 study by the updated Cochrane review here. We find many of their responses to the Cochrane authors’ objections (which are distinct from our reservations) persuasive, especially regarding attrition and sample selection in the haemoglobin data and baseline school attendance data. As we wrote last week, the Baird et al. 2011 follow-up to Miguel and Kremer 2004 remains especially important to our view on deworming; neither the updated Cochrane review nor the authors’ response has changed that.

On Wednesday, the Cochrane Collaboration published a new systematic review of the effectiveness of deworming drugs in improving nutritional status, school performance, and cognitive test scores.

The new Cochrane review of deworming to kill soil-transmitted intestinal worms (STHs) finds almost no evidence of benefits on nutrition, cognitive development, or school performance in mass deworming studies, and small benefits on nutrition in small, screened studies; this is largely the same conclusion as the older Cochrane review, though the new one is updated with more studies and a persuasive response to criticisms. It excludes studies that treat both STHs and schistosomiasis, which is what the Schistosomiasis Control Initiative does, so it does not directly affect our assessment of them. However, the new review reinforces our skepticism about the quality of much of the evidence supporting deworming, and strengthens our view that the evidence in favor of distributing bednets is stronger. Accordingly, SCI continues to hold our #2 rating. We plan to continue to investigate the papers that are most crucial to our assessment of the benefits of deworming.

In a nutshell, the new Cochrane review does not directly challenge the case for SCI as our #2 charity, though we have somewhat less confidence than we did.

In the remainder of this post, we summarize the new review’s findings, explain how it differs from the previous Cochrane review, discuss what it implies about the overall quality of deworming research, and revisit the case for the Schistosomiasis Control Initiative.

The new Cochrane review on STH deworming

In the new Cochrane review on STH deworming, Taylor-Robinson et al. examine randomized controlled trials (RCTs) of deworming to address soil-transmitted intestinal worms (STHs), looking at impacts on nutrition, cognitive skills, and educational outcomes. Excluding studies that treated both STHs and schistosomiasis, they find surprisingly limited evidence of nutritional benefits, and very little support for cognitive or educational benefits.

In particular, they find that:

  • in mass deworming programs that treated everyone without testing them first, there is no consistent evidence for any effect on nutrition, cognitive performance, or school performance (more);
  • in small pilot programs that screened for the presence of worms prior to treatment, treatment was associated with increased weight and haemoglobin, which implies a reduction in anemia (more).

The previous Cochrane review of STH deworming, also by Taylor-Robinson et al., reached many similar conclusions, but we believe the new one to be more robust (more). The older review did not separate out studies that screened for worm infection and analyze their effects on their own, as the new review does. Doing so sharpens our take on the evidence, without fundamentally changing the picture.

We write more below about how this affects our take on SCI, but it is worth noting that the new systematic review might affect our likelihood of recommending Deworm the World, another deworming charity that we have been investigating. Unlike SCI, which conducts combination deworming, we believe that Deworm the World does some STH-only deworming.

Changes since the last Cochrane review and response to critics

The new review differs from the previous Cochrane review of STH deworming in several ways. Most importantly, from our perspective:

  • it incorporates many additional studies, including more studies focused on haemoglobin/anemia and Miguel and Kremer 2004, which was previously excluded;
  • it stratifies mass deworming studies by the prevalence of infections, so it can determine whether effects are consistently larger in higher-prevalence studies; and
  • it distinguishes between mass and screened deworming programs.

The new review also differs from several systematic reviews (Hall 2008, Albonico 2008, and Gulani 2007) that have been published since the last major update to the Cochrane review, all of which found statistically significant benefits to deworming.

Some of the changes since the last review were undertaken in order to respond to criticisms from deworming scholars. Taylor-Robinson et al. write:

Critics of a previous version of this review (Dickson 2000a) stated that the impact must be considered stratified by the intensity of the infection (Cooper 2000; Savioli 2000). We have done this comprehensively in this edition and no clear pattern of effect has emerged….

Other advocates of deworming, such as Bundy 2009, have argued that many of the underlying trials of deworming suffer from three critical methodological problems: treatment externalities in dynamic infection systems, inadequate measurement of cognitive outcomes and school attendance, and sample attrition. We agree with these points. However, externalities will be detected by large cluster-RCTs with a year or more follow up, and there are now five trials such as this included in this review.

We find these responses from Taylor-Robinson et al. compelling and we believe the new review to be a significant improvement over the older Cochrane review of deworming.

The new review’s take on mass-deworming programs

Unlike screened programs, mass deworming programs treat everyone with deworming drugs without testing whether they have a worm infection first (because doing so is costly relative to the price of the deworming drugs). The new Cochrane review finds that there is little evidence from studies of mass deworming programs to show that they improve nutrition, cognitive performance, or school outcomes.

Two studies in one location in Kenya with extremely high worm prevalence found that a single deworming treatment caused weight gain, but seven more studies in different areas found no effect; larger studies with multiple doses yielded mixed results: two found large, statistically significant effects, while ten others found small, statistically insignificant ones (pgs 19-21). There is essentially no evidence from studies of mass STH deworming to show that it improves haemoglobin status, height, cognitive test scores, or school performance; the evidence for an improvement in school attendance comes solely from the Miguel and Kremer 2004 study, with the other unscreened RCT finding no improvement in attendance. (See our update about this study above.)

The older Cochrane review on STH deworming, which we wrote about in our intervention report on deworming, did not distinguish as sharply between mass and screened programs. Though a sensitivity analysis in the old review that focused on mass studies found no significant effect on weight, the main analysis found a small statistically significant benefit by combining screened and mass studies. The new review continues to find that mass deworming has no statistically significant benefit on weight, but it differs from the older review in that it foregrounds this result.

The new Cochrane review also includes haemoglobin status as a main outcome for the first time. It is the first systematic review we’ve seen that distinguishes between the haemoglobin outcomes of mass and screened deworming, finding no statistically significant effect of mass STH deworming.

The new review’s take on smaller programs that screened for worm infection

Despite finding little evidence from mass deworming studies to support deworming, the new Cochrane review does find some evidence from randomized controlled trials to indicate that STH deworming improves nutrition in programs that screen for worm infections (i.e. only give deworming drugs to infected people).

In three small RCTs with a total of 149 participants who were screened for STH infections prior to participation, deworming pills caused a statistically significant increase in weight of about 0.6 kilograms. In a few other small screened RCTs, deworming statistically significantly improved mid-upper arm circumference and skin-fold thickness; similar studies found no effect on height, body mass index, or school attendance. Two screened RCTs with a total of 108 participants found that treating STH infections causes a statistically significant increase in haemoglobin of 3.7 g/L (which implies a reduction in anemia).

What does it mean if smaller programs with screened participants show effects, while larger programs of mass deworming do not? One possibility is that STH deworming does have some impact on nutrition in infected individuals, but that the effect is too small to pick up in unscreened population studies. Another possibility is that the effects seen in smaller programs are spurious. The Cochrane review highlights the latter possibility, stating that “the data on targeted deworming is limited (three small trials, n = 149); the quality of the evidence is ‘moderate’ for weight and ‘low’ for haemoglobin.” (The Cochrane review also points to a third possibility: “the intervention itself is different … having been screened, and then told they have worms, children are more likely to comply with treatment, and alter their behaviour.” We find this possibility least likely.)

The overall quality of deworming research: publication bias, data-mining, and representativeness

One of our big take-aways from the Taylor-Robinson et al. review is that we should be really worried about publication bias, data-mining, and the representativeness of the research we rely on.

Publication bias

The best example of publication bias comes from the DEVTA study of deworming and Vitamin A supplementation, conducted on a population of more than a million children in Lucknow, India from 1999 to 2004, which remains unpublished to this day. We had already been aware of DEVTA from our research on Vitamin A supplementation, but the particulars of Taylor-Robinson et al.’s correspondence with the authors are new to us:

DEVTA: the world’s largest ever RCT, which includes over a million children randomized in a cluster design with mortality as the primary outcome, remains unpublished six years after completion. We have corresponded with the senior author on several occasions. We also wrote a letter to the Lancet in June 2011, asking for publication of this important study. When this letter was accepted, the authors submitted the manuscript to the Lancet within a week, and we withdrew our letter. However, at the time of writing (June 2012) the paper remains unpublished.

Results presented at a conference in 2007 (PPT) indicate that compliance was high but that the treatment did not cause a statistically significant reduction in mortality. Combining these results with other studies of Vitamin A, there still appears to be an effect on mortality, but the lack of formal publication means that the international consensus continues to overestimate the impact of Vitamin A on mortality.

We don’t think that STH deworming prevents a significant number of deaths, so whatever the impact of the deworming branch of the treatment in DEVTA on child mortality turns out to be is unlikely to affect our assessment of deworming. However, the fact that such a large and important study remains unpublished eight years after the trial was completed and five years after a conference presentation conveying the key results speaks to the power of publication bias.

Data mining

More generally, Taylor-Robinson et al. make it clear that studies have looked for potential impacts of deworming on a large number of different outcomes. (I count more than ten—weight, height, mid-upper arm circumference, skin-fold thickness, body mass index, measures of physical exertion like the Harvard Step Test, haemoglobin status, school attendance, school persistence, school exam performance, and cognitive test scores—with many potential sub-categories and measures each.) With so many different outcomes measured and little theoretical basis for determining which results are genuine, the potential for spurious results seems large, especially for outcomes which have been measured in only a few studies. (This would be a form of data-mining, and seems to have played a role in the previous systematic reviews that did find significant results.)
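The scale of this worry is easy to quantify with a back-of-the-envelope calculation. Under the simplifying assumption that the outcomes are tested independently (real trial outcomes are correlated, so treat this as an intuition-builder rather than a precise estimate), the chance of at least one spurious “significant” result when testing k true-null outcomes at the 5% level is 1 − 0.95^k:

```python
# Back-of-the-envelope illustration of the multiple-comparisons concern:
# if deworming truly had no effect on any of k independently tested
# outcomes, each tested at the 5% significance level, the probability of
# at least one spurious "significant" finding is 1 - 0.95**k.
for k in (1, 5, 10, 15):
    p_spurious = 1 - 0.95 ** k
    print(f"{k:2d} outcomes tested -> P(>=1 false positive) = {p_spurious:.2f}")
```

With the ten-plus outcomes counted above, this works out to roughly a 40% chance of at least one false positive even if deworming had no real effect on anything; correlation among outcomes changes the exact figure, but not the qualitative worry.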

Representativeness

Taylor-Robinson et al. point to an additional concern about representativeness, which, while not really fitting the rubric of data-mining and publication bias, raises the specter of a set of rigorous research results that nonetheless don’t translate into practice. They write:

Evidence of benefit of deworming on nutrition appears to depend on three studies, all conducted more than 15 years ago, with two from the same area of Kenya where nearly all children were infected with worms and worm burdens were high. Later and much larger studies have failed to demonstrate the same effects. It may be that over time the intensity of infection has declined, and that the results from these few trials are simply not applicable to contemporary populations with lighter worm burdens.

This worry comports with our own reservations about the evidence from the Miguel and Kremer 2004 experiment, which was conducted during a period of abnormally elevated worm prevalence due to flooding caused by El Niño.

Together, these examples heighten our concern about the potential for bias and unrepresentativeness in the key studies we rely on in our assessment of the evidence for deworming.

 

The evidence in favor of the Schistosomiasis Control Initiative

Our intervention report on combination deworming, of the kind conducted by the Schistosomiasis Control Initiative, focuses on three kinds of benefits:

  • Subtle general health impacts, especially on haemoglobin. We drew our conclusions on haemoglobin effects from Smith and Brooker 2010’s analysis of studies on combination deworming; since the new review examines STH-only deworming and not combination deworming, it does not address these studies.
  • Prevention of potentially severe effects, such as intestinal obstruction. These effects are rare and play a relatively small role in our position on deworming. The Cochrane review does not address these effects for the most part. (As stated above, it does discuss one study, with unavailable results, that examined mortality, but we believe mortality from STHs is rare enough that we wouldn’t expect it to show up in such a study.)
  • Developmental impacts, particularly on income later in life. The new review does not directly address the studies we used here. Bleakley 2004 is outside of the scope of the Cochrane review because it is not an experimental analysis, and Baird et al. 2011 is not mentioned, presumably because it has not yet been published. However, Taylor-Robinson et al. do discuss Miguel and Kremer 2004, which underlies the Baird et al. 2011 follow-up; in their assessment of the risk of bias in included studies, Miguel and Kremer 2004 does poorly (it appears to be the worst-graded of the 42 included trials; Figure 3). (See our update about this study above.) Presumably, the follow-up is subject to most, if not all, of the same worries that characterize the initial study, since it relies on the same underlying experiment. We have written before about our reservations about these studies, and the new Taylor-Robinson et al. review reinforces those reservations without adding substantial new information. We plan to continue to research the details of these papers, which are crucial to our assessment of deworming.

Conclusion

The new Cochrane review does not directly challenge the findings that are core to our view on combination deworming. That said, it does highlight general issues with research on deworming (e.g., potential publication bias and a case for benefit that is generally weaker than what many relevant academics and advocates seem to have believed). We therefore continue to recommend the Schistosomiasis Control Initiative as our #2 charity, though we have somewhat less confidence than we previously did.

Update on GiveWell’s web traffic / money moved: Q2 2012

In addition to evaluations of other charities, GiveWell publishes substantial evaluations of itself, covering everything from the quality of its research to its impact on donations. We publish quarterly updates on two key metrics: (a) donations to top charities and (b) web traffic.

The charts below present basic information about our growth in money moved and web traffic thus far in 2012.

Website traffic tends to peak in December of each year (circled in the chart below). Growth in web traffic has generally remained strong in 2012, though it has slowed somewhat in May and June.

Growth in money moved has remained strong as well. The majority of the funds GiveWell moves comes from a relatively small number of donors giving larger gifts. These larger donors tend to give in December, and we have found that growth in donations from smaller donors throughout the year tends to provide a reasonable estimate of the growth from the larger donors by the end of the year.

Below, we show two charts illustrating growth among smaller donors.

Thus far in 2012, GiveWell has directed $404,775 to our top charities from donors giving less than $10,000. This is approximately 2.5x the amount we had directed at this point last year.
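As a minimal sketch of the projection heuristic described above (the $404,775 and 2.5x figures are from this post; the 2011 year-end total below is a made-up placeholder, not a real GiveWell number):

```python
# Sketch of the heuristic: growth in small-donor giving during the year
# serves as a proxy for the growth multiple of total money moved at year end.
small_donor_2012_ytd = 404_775   # directed from donors giving < $10,000 (from the post)
growth_multiple = 2.5            # stated ratio vs. the same point in 2011

# Implied small-donor total at this point in 2011.
small_donor_2011_ytd = small_donor_2012_ytd / growth_multiple

# Hypothetical 2011 year-end money moved (placeholder, NOT a real figure).
year_end_2011_total = 1_000_000

# Heuristic projection: assume year-end totals (dominated by large
# December gifts) grow by roughly the same multiple.
projected_2012_total = year_end_2011_total * growth_multiple
print(round(small_donor_2011_ytd), round(projected_2012_total))
```

This is only an estimation device; the actual year-end figure depends on when and how the larger December donors give.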

Most donors give less than $1,000; the chart below shows the growth in the number of smaller donors giving to our top charities.

Overall, 1,247 donors have given to GiveWell’s top charities this year (compared to 479 donors at this point last year).

In total, GiveWell donors have directed $964,250 to our top charities this year, compared with $568,250 at this point in 2011. For the reason described above, we don’t find this number to be particularly meaningful at this time of year. One major difference between 2011 and 2012 is that in 2011, Ken Jennings allocated the $150,000 he won participating in a Jeopardy! contest against IBM’s Watson to VillageReach.

GiveWell and Good Ventures

Last year, we met Cari Tuna and Dustin Moskovitz of Good Ventures, a new foundation that plans eventually to give away substantial amounts (Dustin and Cari aim to give the majority of their net worth within their lifetimes; Dustin is the co-founder of Facebook and, more recently, Asana). We immediately established that Good Ventures and GiveWell share some core values that relatively few others seem to share:

  • Both Good Ventures and GiveWell are aiming to do as much good as possible, from a global-humanitarian perspective.
  • Both are willing to consider any group and any cause in order to accomplish this goal.
  • Both are highly interested in increasing the level of transparency, accountability, and critical discussion and reflection within the world of giving.

Over time, GiveWell and Good Ventures have worked increasingly closely together. In April of last year, Cari joined our Board of Directors; in December of last year, Cari announced substantial grants to our top-rated charities from Good Ventures. In the meantime, Cari was exploring the rest of the world of philanthropy, speaking with a large number of major philanthropists, nonprofit representatives, philanthropic advisors, etc. After a year of exploration, Cari told us that while many of the people she had spoken to had been helpful, GiveWell seemed most aligned with Good Ventures’ values, had given the most helpful support in pursuing those values, and had research that appeared to her to be at least as high-quality as any foundation research she had seen. Now, GiveWell and Good Ventures plan to “act as a single team” as we source and vet funding opportunities in areas in which our interests overlap.

This is a partnership, not a merger; we remain separate legal entities. Cari is President of Good Ventures, while Elie and I are Co-Executive Directors of GiveWell; our authorities differ accordingly. If Good Ventures is interested in an area or activity that we aren’t interested in, it will use its resources to pursue this area or activity; likewise, if we are interested in an area or activity that Good Ventures isn’t interested in, we will use GiveWell’s resources to pursue this area or activity.

However, “acting as a single team” does mean that

  • There are substantial areas of overlap between our interests – investigations and activities that rank high on both of our priority lists. The agenda we laid out recently is a close match to current points of intersection.
  • Within these areas, we maintain a common priority list and divide up labor so that we don’t duplicate any work. Division of labor is done by consensus, and if there are unresolvable disagreements, each organization makes its own choices about its own resources (this has not happened so far).
  • Within these areas, funding requests and ideas will go through a common process. I.e., if someone brings an idea or request to Cari and we have agreed that it fits within an area that is being primarily managed by GiveWell, she will refer the request or idea to GiveWell rather than evaluating it herself.
  • When given confidential materials that are “for our eyes only,” we will attempt to share these with each other (though of course this will require permission from those providing the materials).
  • We are currently experimenting with close coordination on screening and training new hires. We look for similar qualities in new hires, so people who are interested in a job with one organization or the other may be interviewed by both simultaneously.
  • Overall, the above items require close coordination. For this and other reasons, the GiveWell team is currently planning to move to the Bay Area (more on this in a future post).

It seems to me that this is a relatively unusual arrangement. Formally, each organization has full authority over its own resources and none over the other’s, and this fact underlies all procedures for resolving disagreements if and when we cannot reach consensus. In practice, however, such cases have recently been rare, and it has often felt as though we’re a single team with a single agenda.

Why does this situation seem unusual? One possibility is that it isn’t a good idea and that the problems with it will become apparent in the future; this possibility is why we have been clear about procedures for resolving disagreements. But there is another possible explanation. In my view, nonprofit work is naturally suited to this sort of “teamwork without a single authority” arrangement, in a way that for-profit work is not. Both GiveWell and Good Ventures are mission-driven: there are no financial returns to divide up, just a vision for the world on which we are closely aligned.

I believe that nonprofits sometimes mimic for-profits in ways that don’t make sense given their missions. They raise money beyond what they need for their core work. They keep information confidential rather than publishing it as a public good. And they exaggerate successes and downplay shortcomings, while being more honest would help the rest of the world learn and thus ultimately promote their mission (if not their organization). If I’m right, the relative unusualness of “teamwork without mergers” could be another way in which nonprofits are missing opportunities to be effective that aren’t available for for-profits. I think it’s possible that the sort of collaboration GiveWell and Good Ventures have today will be far more common in the future.

Objections and concerns about our new direction

GiveWell has recently been taking on activities that may seem to represent a pretty substantial change of direction, especially for those who think of us as a “charity evaluator focused on saving the most lives per dollar spent.”

  • Within global health and nutrition, we’re considering restricted funding for specific projects, not just recommendations of particular charities.
  • We’re also exploring other causes that are extremely different from global health and may be far less amenable to measurement and “cost per life saved” type calculations, such as meta-research.

When discussing these activities, we’ve lately been encountering a few objections and concerns; this post discusses those objections and our responses. In a nutshell:

  • Some are concerned that we’ll lose our objectivity if we get involved in providing restricted funding: we’ll be tempted to rank the groups following our plans ahead of the groups following their own plans, and we’ll thus lose the quality of being a disinterested third-party evaluator. We believe we can draw a meaningful line between “charities we recommend for unrestricted funding” and “plans we have designed,” leaving individual donors to decide whether they’d rather take our recommendation unconditionally or only follow our advice in the areas where we’re disinterested; we also believe that being open to providing restricted funding is necessary and important, and justifies the resources we’ll be investing. More
  • Some are concerned that by going into new causes, we’ll be spreading ourselves too thin. Understanding global health is already an ambitious and difficult goal; it’s been suggested that we should “stick to our knitting.” We feel that sticking to global health, when we see other causes as potentially more promising, would be out of line with our fundamental mission and value-added as an organization that seeks to help people do as much good as possible. More
  • Some are concerned specifically about new causes that don’t lend themselves to measurement and cost-effectiveness calculations (such as meta-research). It may be difficult to remain systematic and transparent about how we make decisions in these more speculative areas. We recognize this concern, but feel that we can remain systematic and transparent even where measurement is difficult or impossible; furthermore, we feel that we must find a way to do this if we are to have a strong case that philanthropy as a whole (not just sub-sectors of it) should be more systematic and transparent. More

Despite the concerns and risks above, we feel that the benefits of our new direction outweigh them. A major input into this view is the feeling that sticking to our old process would be extremely unlikely to result in finding more outstanding giving opportunities within a reasonable period of time; this is something we will be writing more about.

That said, we do recognize the concerns and risks, and we are interested in others’ thoughts on them.

The risk of losing our objectivity

To date, all of GiveWell’s recommendations have involved unrestricted support to existing organizations. Because of this, we can be pointed to as a “neutral third party” that recommends organizations based exclusively on impact-related criteria. But we’re now contemplating doing what a lot of major funders do and helping to set the agenda for a funded organization, through the mechanism of restricted funding. If we did this, we might have difficulty being neutral between (a) projects that we help design and (b) charities that are simply asking for unrestricted funds, not contracting with us. In fact, we might be tempted to eschew (b) entirely and focus exclusively on designing – rather than finding – giving opportunities.

One important principle here is that we will draw a clear line between organizations we recommend for unrestricted funding and projects designed by GiveWell. We don’t know exactly how the visual presentation will work yet, but we have agreed on the principle that there will be a clear distinction – including on our higher-level and frequently-accessed pages – between GiveWell-designed projects and recommended charities.

Of course, there is still a risk that recommendations for unrestricted funding will have “soft conditions” (i.e., that it will be clear to charities what activities they have to carry out in order to earn or maintain recommendations); this is something that has always been true, though I think the situation is somewhat mitigated by the nature of the room for more funding analysis we perform. (Our analysis asks for predicted charity activities based on total unrestricted funding, not based on GiveWell-specific funding. The expectation is that if GiveWell-directed funding falls short of expectations and the gap is made up by other funding, the activities will still be as outlined; this hopefully provides charities an incentive to project the activities they would most like to carry out, rather than projecting the activities they hope will most appeal to GiveWell specifically.)

Even with a clear distinction, there could still be a reasonable concern that GiveWell will over-allocate resources (in terms of investigative capacity) to designing its own projects, as opposed to finding great organizations. We recognize this concern, but wish to note that – philosophically – we greatly prefer unrestricted to restricted funding, and greatly prefer a “hands-off” to a “hands-on” approach. We don’t have the capacity to actively manage projects ourselves, and we believe projects are likely to work out better when they are run by people who fully buy into them (as opposed to people who are fulfilling the requirements of restricted funding).

It’s partly because of this philosophy that we’ve stayed away from restricted funding to date, and we remain highly cautious about it. We would prefer to stick to unrestricted funding and may never in fact deal in restricted funding.

Yet it is worth noting why we are considering restricted funding now in a way that we haven’t before. Our impression is that major funders frequently make extensive use of restricted funding; as a result, the existing landscape consists of many charities whose agendas are set partly or fully by external funders.

  • We’ve been surprised by the disconnect we’ve observed in which there is a large number of promising interventions but few charities that focus on these interventions (in a way such that additional dollars will mean additional execution).
  • More generally, we’ve been surprised that in the majority of conversations in which we ask an organization what it would do with more unrestricted funding, it has no clear answer, and prefers instead to tailor its answer to our priorities.

Practically speaking, charities have to focus on what they can fund; and in today’s world, it seems possible that agendas are largely set by funders. Our ideal role would be to “free” great organizations from restricted funding, allowing them to carry out promising projects that they can’t fund otherwise. However, it seems possible that there are too few charities for whom funding would make this sort of a difference, and there is thus some argument for our taking the sort of active role that other funders do.

Finally, by being open to restricted funding, we’ve come across some opportunities that are similar to “unrestricted funding” in most relevant ways, but that structurally involve restrictions and that we couldn’t have come across using our former approach. For example, we’re currently considering the idea of funding particular parts of UNICEF that work on particular interventions that we’re interested in. This wouldn’t involve laying out our own plan, and it would involve getting money to a specific team and leaving the use of the funds at their discretion; however, we could not find this sort of giving opportunity by talking to general UNICEF representatives and asking what they would do with more unrestricted funding. In some sense it may be appropriate to think of UNICEF (and other organizations like it) as a coalition of teams with their own priorities rather than as a single team with a single set of priorities; so in this case a gift that is formally restricted may have many of the desirable qualities of an unrestricted gift. To avoid confusion, we will still distinguish any recommendations along these lines from purely unrestricted gifts, as laid out above.

The risk of spreading ourselves too thin

We still have a lot to learn about global health and nutrition (as indicated by, among other things, our continued learning from VillageReach’s progress). It has been suggested that we should “stick to our knitting,” focusing on the areas of giving in which (a) we’ve built up our brand and (b) data and feedback loops tend to be unusually good by the standards of the nonprofit sector, facilitating learning.

In response, I’d observe:

  • GiveWell is still a young organization. I believe we have attracted attention more for “bringing a different perspective and approach to giving” than for “being experts in global health” (the latter certainly does not describe us). We recognize that we’re taking some level of risk in moving into new areas, but we also believe that taking risks and staying open to new approaches is a major part of what makes GiveWell what it is and that part of “sticking to our knitting” is retaining this quality. We believe that GiveWell and the donors who use our research will be best served by our continuing to do whatever we believe will lead to the best giving opportunities, continuing to change course as much as necessary to facilitate this, and continuing to bring a different perspective and approach to giving – not continuing to focus on global health.
  • While we currently believe that global health is the most promising cause given the information available, we are not confident in this conclusion. We believe that other causes are potentially promising as well, and if we never investigate them, we will be failing in our mission of finding the best giving opportunities possible.
  • We are currently expanding our staff; we expect that we will invest at least as much time in global health over the next few years as over the last few (while also investing time in other causes).

The risk of losing transparency and systematicity as we move away from highly measurable interventions

We have written before that the cause of “global health and nutrition” seems unusually well-suited to meaningful measurement and metrics (by the standards of the nonprofit sector). When working within this cause, we have been able to be relatively clear about our process and about what distinguishes a recommended from a non-recommended charity. There is some risk that as we tackle other causes, such as meta-research, we will have less of an evidence base to draw on, our goals will be longer-term, and we will have to rely more on intuition; we may therefore become less systematic and transparent.

We believe this is a real risk. However, we also believe that (a) the best opportunities for good giving don’t necessarily lie in the domains with the highest measurability (though there is something to be said for measurability, all else equal); (b) we have reached the point where we feel we can explore causes such as meta-research in a way that – while not as systematic as our work on global health – will still include a great deal of public discussion of how we’re thinking, why we recommend what we do, what the key assumptions are in our thinking and recommendations, and how our projects progress over time.

We have long advocated that philanthropists should be more systematic and transparent in their work. If our own systematicity and transparency apply only to the cause where measurement is easiest, we won’t have a very strong case; if, however, we can consistently bring an unusual level of systematicity and transparency to every cause we examine (even those that are less amenable to measurement), we will have much more potential to change philanthropy broadly rather than just a single sector of it.

The benefits of our new direction

The above discussion addresses potential concerns over our new direction. We have previously discussed the substantial benefits: finding the best giving opportunities possible and reaching the largest donors possible, both of which are core to our mission. Dealing with the above issues – keeping a focus on recommending unrestricted funding when possible, covering new causes without overly detracting from continued progress on the causes we know well, and remaining systematic and transparent – will be a challenge, but we feel that it is well worth it, especially because we feel we are reaching the limits (for the moment) of our old approach. (We went through a large number of charities in 2011 and are skeptical that we will find new contenders for our top charities, using that basic methodology, anytime in the near future.)

We welcome further comments and criticisms regarding our new approach.

Meta-research

[Added August 27, 2014: GiveWell Labs is now known as the Open Philanthropy Project.]

We previously laid out our working set of focus areas for GiveWell Labs. This post further elaborates on the cause of “meta-research” and explains why meta-research is currently a very high priority for us – it is our #2 highest-priority focus area, after global health and nutrition.

Meta-research refers to improving the incentives in the academic world, to bring them more in line with producing work of maximal benefit to society. Below, we discuss

  • Problems and potential solutions we perceive for (the incentives within) development economics, the area of academia we’re currently most familiar with.
  • Some preliminary thoughts on the potential of meta-research interventions in other fields, particularly medicine.
  • Why we find meta-research so promising and high-priority as a cause.
  • Our plans at the moment for investigating meta-research further.

Meta-research issues for development economics

Through our work in trying to find top charities, we’ve examined a fair amount of the literature on how Western aid might contribute to reducing poverty, which we broadly refer to in this post as “development economics.” In doing so, we’ve noticed – and discussed – multiple ways in which development economics appears to be falling short of its full potential to generate useful knowledge:

Lack of adequate measures against publication bias. We have written extensively about publication bias, which refers broadly to the tendency of studies to be biased toward drawing the “right” conclusions (the conclusions the author would like to believe in, the conclusions the overall peer community would like to believe in, etc.) Publication bias can come both from “data mining” (an author interprets the data in many different ways and publishes/highlights the ways that point to the “right” conclusions) and the “file drawer problem” (studies that do not find the “right” conclusions have more difficulty getting published).

Conceptually, publication bias seems to us like one of the most fundamental threats to academia’s producing useful knowledge – it is a force that pushes research to “find” what is already believed (or what people want to believe), rather than what is true, in a way that is difficult for the users of research to detect. The existing studies on publication bias suggest that it is a major problem. There are potential solutions to publication bias – particularly preregistration – that appear underutilized (we have seen next to no use of preregistration in development economics).

A funder recently forwarded us the following comment on a paper under review from a journal, which illustrates this problem:

Overall, I think the paper addresses very important research questions. The authors did well in trying to address issues of causality. But the lack of results has weakened the scope and the relevance of the paper. Unless the authors considerably generate new and positive results by looking say at more heterogeneous treatment effects, the paper cannot, in my view, be published in an academic journal such as the [journal in question].

Lack of open data and code, by which we mean the fact that academic authors rarely share the full details behind their calculations and claims. David Roodman wrote in 2010:

Not only do authors often keep their data and computer programs secret, but journals, whose job it is to assure quality, let them get away with it. For example, it took two relatively gargantuan efforts—Jonathan Morduch’s in the late 1990s, and mine (joining Jonathan) more recently—just to check the math in the Pitt and Khandker paper claiming that microcredit reduced poverty in Bangladesh. And it’s pretty clear now that the math was wrong.

The case he discusses turned out, in our opinion, to be an excellent illustration of the problems that can arise when authors do not share the full details of their calculations: a study was cited for years as some of the best available evidence regarding the impact of microfinance, but it ultimately turned out to be badly flawed, and later more rigorous studies contradicted its conclusions. (See our 2011 discussion of this case.)

Another example of the importance of open data was our 2011 uncovering of errors in a prominent cost-effectiveness estimate for deworming. This estimate had been public and cited since 2006, and it took us months of back-and-forth to obtain the full details behind it; at that point it turned out to contain multiple basic errors that caused it to be off by a factor of ~100.

The lack of open data is significant for reasons other than the difficulty of understanding and examining prominent findings. It also is significant because open data could be a public good for researchers; one data set could be used by many different researchers to generate multiple valuable findings. Currently, incentives to create such public goods seem weak.

Inadequate critical discussion and examination of prominent research results. The above two examples, in addition to illustrating open-data-related problems, illustrate another issue: it appears that there are few incentives within academia to critically examine and challenge others’ findings. And when critical examinations and challenges do occur, they can be difficult to find. Note that Roodman and Morduch’s critique (from the example above) was rejected by the journal that had published the study they were critiquing (the sole reviewer was the author of the critiqued study); as for the case of the DCP2 estimate, the critique came from GiveWell and has been published only on our blog (five years after the publication of the estimate).

Overall, our impression is that there is little incentive for academics to actively investigate and question each other’s findings, and that doing so is difficult due to the lack of open data (mentioned above).

Lack of replication. In addition to questioning the analysis of prominent studies, it would also be useful to replicate them: to try carrying out similar interventions, in similar contexts, and seeing whether similar results hold.

In the field of medicine, it is common for an intervention to be carried out in many different rigorous studies (for example, the literature on the effects of distributing insecticide-treated nets includes 22 different randomized controlled trials, and the programs executed are broadly similar though there are some differences). But in development economics, this practice is relatively rare.

More at a recent post by Berk Ozler.

General disconnect between “incentive to publish” and “incentive to contribute maximally to the stock of useful knowledge.” This point is vaguer, but we have heard it raised in multiple conversations with academics. In general, it seems that academics are encouraged to do a certain kind of work: work that results in frequent insights that can lead to publications. Other kinds of useful work may be under-rewarded:

  • Creating public goods for other researchers, such as public data sets (as discussed above)
  • Work whose main payoff is far in the future (for example, studies that take 20 years to generate the most important findings)
  • Studies that challenge widely held, fundamental assumptions in the field (and thus may have difficulty being published and cited despite having high value)
  • Studies whose findings are important from a policymaking or funding perspective, but not interesting (and thus difficult to publish) in terms of delivering surprising or generalizable new insights. For example, we have only been able to identify one randomized controlled trial of a program for improving rural point-of-source water quality, despite the popularity and importance of this type of intervention.

Potential interventions to address these issues

We’ve had several conversations with academics and funders who work on development economics about how the above issues might be addressed. Most are directed at the specific problems we’ve listed above, though some are more generally in the category of “creating public goods for the research community as a whole.” Some of the more interesting ideas we’ve come across:

  • Funding efforts to promote the use of preregistration and data/code sharing, such as advocating that journals require these things of their publications (a journal might require preregistration and data/code sharing as a condition of publication) or that funders require these things of their grantees (a funder might require preregistration and data/code sharing from all funded studies).
  • Creating a “journal of good questions” – a journal that makes publication decisions on the basis of preregistered study plans rather than on the basis of results. The idea is to reward (with publication) good choices of topics and hypotheses and plans for investigating them, regardless of whether the results themselves turn out to be “interesting.” (We have previously discussed this idea.)
  • Funding a journal, or special issue of a journal, devoted to open-access data sets. Each data set would be accompanied by an explanation of its value and published as a “publication,” to be cited by any future publication drawing on that data set. This may improve incentives to create and publish useful open-access data sets, since scholars who did so could end up publishing the data sets as papers and having them cited.
  • Funding the creation of large-scale, general-purpose open-access data sets. Currently, researchers generally collect data for the purpose of conducting a particular study; an effort that aimed specifically to create a public good might be better suited to maximizing the general usefulness of the collected data, and may be able to do so at greater scale than would be realistic for a data set aiming to answer a particular question. For example, one might fund a long-term effort to track a representative population in a particular developing country, randomly separating the population into a large “control group” and a set of “treatment groups” that could be treated with different interventions of general interest (cash transfers, scholarships, nutrition programs, etc.)
  • Funding a journal, or special issue of a journal, devoted to discussion, critiques, re-analyses, etc. of existing studies, in order to put more emphasis on – and give more reward to – this activity.
  • Funding awards for excellent public data sets and for excellent replicative studies, reanalysis, and other work that causes either confirmation or re-examination of earlier studies’ findings.
  • Creating a group that specializes in high-quality systematic reviews that summarize the evidence on a particular question, giving heavier weight to more credible studies (similar to the work of the Cochrane Collaboration, which we discuss more below). These reviews might make it easier for funders, policymakers, etc. to make sense of research, and would also provide incentives to researchers to conduct their studies in more credible ways (employing preregistration, data/code sharing, etc.)
  • Creating a web application for sharing, discussing, and rating papers (discussed previously).
  • Funding awards for the most useful and important research from a policymaker’s or funder’s perspective (these could take practices like data sharing and registration into account as inputs into the credibility of the research).
  • Promoting an “alternative/supplemental reputation system” for papers (and potentially academics) directly based on the value of research from a funder’s or policymaker’s perspective, taking practices like data sharing and registration into account as inputs into the credibility of the research.
  • Creating an organization dedicated to taking quick action to take advantage of “shocks” (natural disasters, policy changes, etc.) that may provide opportunities to test hypotheses. When a “shock” occurred, the organization could poll relevant academics on what the important questions are and what data should be collected, record the academics’ predictions, and fund the collection of relevant data.

Meta-research for other fields

We aren’t as familiar with most fields of research as we are with development economics. However, we have some preliminary reason to think that many fields in academia have a similar story to development economics: multiple issues that keep them short of reaching their full potential to generate useful knowledge, and substantial room for interventions that may improve matters.

  • We recently met with representatives of the Cochrane Collaboration, a group that does systematic reviews of medical literature. We have found Cochrane’s work to be valuable and high-quality, and we were surprised to be told that the U.S. Cochrane Center raises very little in the way of unrestricted funding. After talking to more people in the field, we have formed a preliminary impression that there is little funding available for medical initiatives that cut across biological categories, including the sort of work that Cochrane does (which we would characterize as “meta-research” in the sense that it works toward improved incentives and higher value-added for research in general). We will be further investigating the Cochrane Collaboration’s funding situation and writing more about it in the future.
  • Informal conversations have given me the impression that many of the problems described above – particularly lack of adequate measures against publication bias, lack of preregistration, lack of data/code sharing, and general misalignment between what academics have incentives to study and what would be most valuable – apply to many other fields within the natural and social sciences.
  • I’ve also heard of other problems and ideas that are specific to other fields. For example, a friend of mine who works in computer science told me that
    • There are too few literature reviews in the field of computer science, summarizing what is known and what remains to be determined within a particular field. The literature reviews that do exist quickly become out of date. More up-to-date literature reviews would make it easier for people to contribute to fields without having to be at the right school (and thus in the right social network) for these fields.
    • There are some sub-fields in computer science that require testing different algorithms on data sets, such that the number of appropriate available data sets is highly limited. (For example, testing an algorithm for analyzing online social networks against a data set based on an actual online social network.) In practice, academics often design algorithms that are “over-fitted” to the data sets in use, such that their predictive power over new data sets is questionable. He proposed a set of centralized “canonical” data sets, each split into an “exploration” half and a “confirmation” half; while the “exploration” half would be open access, the “confirmation” half would be controlled by a central authority and academics would be able to test their algorithms on it only in a limited, controlled way (for example, perhaps each academic would be given 5 test runs per month). These data sets would constitute a public good making it easier to compare different academics’ algorithms in a meaningful way, both by reducing the risk of over-fitting and by bringing more standardization to the tests.

Overall, the conversations I’ve had about meta-research – even with people who aren’t carefully selected, such as personal friends – have resulted in an unusually high density of strong opinions and novel (to me) ideas for bringing about positive change.

Why we find meta-research promising as a cause

High potential impact. As we wrote previously, it seems to us that many of philanthropy’s most impressive success stories come from funding scientific research, and that meta-research could have a leveraged impact in the world of scientific research.

Seeming neglect by other funders. We see multiple preliminary signs that this area is neglected by other funders:

  • In examining what foundations work on today, we haven’t seen anyone who appears to have a focus on meta-research. We recently attended a funders’ meeting on promoting preregistration and got the same impression from that meeting.
  • As mentioned above, informal conversations seem to lead more quickly to “ideas for projects that could be worked on but aren’t currently being worked on” than conversations in other domains.
  • As mentioned above, we are surprised by the U.S. Cochrane Center’s apparent low level of funding and need for more funds, and feel that this may point to meta-research as a neglected area.

Good learning opportunities. We have identified funding scientific research as an important area for further investigation. We believe it is one of the most promising areas in philanthropy and also one of the areas that we know the least about. We believe that investigating the question, “In what ways does the world of academic research function suboptimally?” will lead naturally to a better understanding of how that world operates and where within it we are most likely to find overlooked giving opportunities.

Our plan for further investigation of meta-research as an issue area

We are pursuing the following paths of further investigation:

  • Further investigation of the Cochrane Collaboration, starting with conversations with potential funders about why it is having trouble attracting funding. We believe that the Cochrane Collaboration may turn out to be an excellent giving opportunity, and if it does, that this will provide further evidence that meta-research is a promising and under-invested-in cause; on the other hand, if we discover reasons to doubt Cochrane’s effectiveness or need for more funds, this will likely be highly educational in thinking about meta-research in general.
  • Conversations with academics about meta-research-related issues. Some of the key questions we have been asking and will continue to ask:
    • Are there any ways in which the academic system is falling short of its full potential to generate useful knowledge? What are they?
    • What could be done about them?
    • Who is working on the solutions to these problems? Who would be the logical people for a funder to work with on them?
    • Is there any research that you wish you could do but can’t get funded to do? Is there any research that you generally feel ought to be taking place and isn’t? If so, why is this happening?
    • Are there areas of research that you think are overdone or overinvested in? Why do you think this is?
    • What do you think of the ideas we’ve accumulated so far? To the extent that you find one or more to be good ideas, whom would you recommend working with to move forward on or further investigate them?
    • Whom else would you recommend speaking with?
  • Trying to get a bird’s-eye view of the world of academic research, i.e., a view of what the various fields are, how large they are (in terms of people and funding), and where the funding for them comes from. We hope that this bird’s-eye view will help us be more strategic about which fields best combine “high potential” with “major room for interventions to improve their value-added,” and thus to pick fields to focus on for meta-research in a more systematic manner than we’ve done so far.

Giving cash versus giving bednets

We recently published a new review of GiveDirectly, a “standout” charity that gives cash directly to poor people in Kenya. As we were going through the process of discussing and vetting the new review, I found myself wondering how I would defend my preference to donate to distribute insecticide-treated bednets (ITNs) against a serious advocate for cash transfers. We’ve written before about the theoretical appeal of giving out cash, and the fact that there is a promising charity doing so renews the question of whether we should.

I continue to worry about the potential “paternalism” of giving bednets rather than cash (i.e., the implication that donors are making decisions on behalf of recipients). I believe that by default, we should assume that recipients are best positioned to make their own decisions. However, I see a few reasons to think bednets can overcome this presumption:

  • The positive externalities of ITNs
  • The fact that bednets protect children rather than adults
  • The fact that ITNs may be unavailable in local markets or that people may reasonably expect to be given them for free.

I address each of these reasons in more depth below. Note, however, that this discussion is meant to be primarily about the theoretical question of giving cash versus giving bednets; a more practical discussion of giving to the Against Malaria Foundation versus giving to GiveDirectly would focus on the specifics of the two organizations.

The positive externalities of ITNs

We discussed the evidence that ITNs have benefits for community members other than those using the ITNs in our review of the evidence for ITNs. After speaking with several malaria scholars and reviewing the literature, we concluded:

  • The evidence for the efficacy of ITNs is based on studies of universal coverage programs, not targeted programs. In particular, all five studies relevant to the impact of ITNs on mortality involved distribution of ITNs to the community at large, not targeted coverage… Thus, there is little basis available for determining how the impact of ITNs divides between individual-level effects (protection of the person sleeping under the net, due to blockage of mosquitoes) and community-level effects (protection of everyone in communities where ITN coverage is high, due to reduction in the number of infected mosquitoes, caused either by mosquitoes’ being killed by insecticide or by mosquitoes’ becoming exhausted when they have trouble finding a host).
  • The people we spoke to all believe that the community-level effect of ITNs is likely to be a significant component of their effect, though none believe that this effect has been conclusively demonstrated or well quantified.
  • There is some empirical evidence suggesting that the community-level impact of ITNs is significant.

In our main model of the cost-effectiveness of distributing ITNs (XLS), we assumed that 50% of the benefits of ITNs come from the total community coverage of ITNs.

To the extent that ITNs have positive externalities, private actors may underinvest in them, meaning that it may be a good idea to distribute them freely even if individuals would choose not to purchase them at the available price. More generally, since we care about helping whole populations rather than any particular individual, providing “public goods” of this sort amplifies our impact relative to giving the same amount of money to individuals.
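As a toy illustration of the arithmetic behind such a split, the sketch below models per-net benefit as an individual-level component plus a community-level component that scales with local coverage. The function and all numbers here are hypothetical placeholders for illustration only, not inputs or outputs of our actual cost-effectiveness model.

```python
# Toy sketch: splitting the benefit of one ITN into an individual-level
# component and a community-level component. Hypothetical numbers only.

def value_per_net(total_benefit, community_share, coverage_rate):
    """Illustrative benefit attributable to one distributed net.

    The individual-level share protects the net's user regardless of
    neighbors; the community-level share scales with local coverage,
    since it works by suppressing the infected mosquito population.
    """
    individual = total_benefit * (1 - community_share)
    community = total_benefit * community_share * coverage_rate
    return individual + community

# With a 50/50 split (as in the assumption described above), a net in a
# high-coverage area produces more benefit than one in a low-coverage area:
high = value_per_net(total_benefit=1.0, community_share=0.5, coverage_rate=0.9)  # 0.95
low = value_per_net(total_benefit=1.0, community_share=0.5, coverage_rate=0.1)   # 0.55
```

Under this framing, the community-level share is exactly the portion of benefit that an individual purchaser cannot capture, which is why externalities push toward free mass distribution.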

Although it is conceptually possible that giving a large number of individuals small cash grants also has positive externalities, e.g. by boosting the local economy, we haven’t seen any evidence of this, and we doubt that the magnitude of the externality would be as large.

Bednets protect children rather than adults

One of the central reasons that I appreciate cash transfers is that they avoid paternalism. But sometimes, especially with regard to children, paternalism seems morally justifiable. I believe this is one of those cases.

Although AMF distributes ITNs universally, not just to children, the main benefits of ITNs—averting mortality—accrue to children under the age of 5. Children under the age of 5 lack bargaining power, income, and access to credit, not to mention the cognitive faculties to make decisions about their own long-term welfare. Accordingly, purchasing something that is reasonably likely to keep young children alive, even if they don’t or can’t decide to purchase it for themselves, seems to be a justifiable form of paternalism. In general, paternalism towards such young children is unobjectionable.

By distributing bednets, we might be spending money to benefit kids in a way that their parents wouldn’t spend it if we gave it to them instead. Given the magnitude of the benefits to the children, this seems to be justified.

People may not purchase ITNs because they are unavailable in local markets or because they expect to be given them for free

This point is more anecdotal, but Natalie, Holden and I remember being told while we were in Malawi that long-lasting insecticide-treated bednets, of the sort that AMF distributes, are essentially unavailable for purchase in local markets. Unfortunately, this point does not appear in our published notes (DOC) from the conversation in which we recall hearing it.

In another case, an RCT in Kenya in which researchers experimentally subsidized the cost of bednets (PDF) found that even very small increases in price led to substantial reductions in bednet purchases by mothers (e.g. charging $0.60 led to a 60% reduction in take-up). Two different people told us in off-the-record conversations that they thought that this occurred because the mothers offered subsidized bednets believed that they would be able to acquire free nets at some other point. There have been periodic free ITN distributions in many sub-Saharan African countries over the last decade, and the international consensus seems to be that governments should distribute ITNs free of charge in malaria-endemic areas. Accordingly, it should not be especially surprising that citizens may expect bednets to be provided free of charge, and may not move to purchase them even if they are available at subsidized prices in the marketplace. If we reasonably expect to be given something for free in a relatively short time window, why buy it now?

This wouldn’t necessarily have been the case if philanthropy had never funded bednets, but having started down this path, I think it provides another consideration in favor of continuing. If we could credibly and cheaply communicate that no more bednets would be forthcoming, this consideration wouldn’t matter, but there is no obvious way to do so.

This is something to keep in mind in the future: philanthropic funding decisions may create an unanticipated form of “lock-in,” in which future philanthropists become effectively committed to continued funding, even if it would not have been necessary in a counterfactual world of no philanthropic support. Although unlikely to be crucial, this consideration may counsel against undertaking some marginal philanthropic activities.

Conclusion

I think that in order to avoid paternalism, philanthropists working to improve the lives of the global poor should have a fairly strong presumption in favor of cash transfers, and that those who advocate other strategies should have a convincing story to tell about why they beat cash. Above, I’ve tried to justify my view that bednet distribution is one of those philanthropic strategies that may beat cash. In searching for future top charities, I’d like to see a similarly strong case.