The GiveWell Blog

The case for cash

Our choice to name GiveDirectly as our #2 charity has drawn some surprise and criticism. GiveDirectly seeks to deliver 90c directly into the hands of the very poor (no strings attached) for every $1 of total organizational expenses. There are many people who consider this intervention “unproven” (since there is no research linking cash transfers directly to the sort of health impacts associated with other top charities’ programs) and even dangerous (with the idea being that the people receiving the transfers are unlikely to spend them well).

We believe that cash transfers face a lower burden of proof than other charitable interventions, yet have been studied more than any other non-health intervention we’re aware of, with results supporting the idea that they have net positive impact; that GiveDirectly fits the description of a “charity with documented positive impact”; that the magnitude of this impact is certainly open to debate, but appears reasonably high and could be competitive with the most cost-effective interventions; and that much of the intuitive resistance we see to the idea of unconditional cash transfers may be driven by misleading analogies between the developing world and the developed world.

This post will

  • Lay out the basic case for the appeal of cash transfers.
  • Address what we see as misleading analogies between the developing world and the developed world, analogies that lead to what we see as excessive pessimism about the impact of cash transfers.

The basic case for cash transfers
Improving consumption, in the short and long run

For several years, our definition of “evidence of impact” has included evidence that “wealth is being transferred to low-income people.” Documentation of positive impact is usually (necessarily) documentation of proxies for improved quality of life, and of such proxies, we have long found “increased wealth/income/consumption” to be one of the stronger ones. (Other strong proxies include improved nutritional status as indicated by height/weight-related measures, and reduced incidence/prevalence of symptomatic diseases such as malaria and diarrhea. A more direct measure of long-term impact is reduced mortality, but we do not wish to consider interventions based solely on their impacts on mortality, as different people have very different intuitions about how this impact should be valued relative to life improvement.) Improvements in wealth/income/consumption are a common goal of health programs, job training programs, and other programs; we would consider such improvements to be evidence of positive impact in evaluations of such programs, and it seems consistent and appropriate to consider them evidence of positive impact in evaluations of cash transfer programs as well.

It may seem near-tautological to say that cash transfers improve recipients’ wealth/income/consumption. We recognize this, and this is why we sometimes describe cash transfers as carrying a “lower burden of proof” than other interventions. If a charity can establish that it is placing significant wealth in the hands of low-income people (not necessarily an easy thing to establish), we believe this is essentially tantamount to evidence of positive impact. (That isn’t to say that it proves the charity’s impact is highly positive, exclusively positive, or even net-positive; but we believe that it establishes a case for such impact comparable to the best cases we’ve seen for other interventions.)

However, it isn’t the case that this is the only argument for cash transfers. Cash transfers also happen to be the most extensively studied non-health intervention we know of. In a large number of high-quality studies, researchers have looked to see whether cash transfers have indeed increased consumption, what sorts of consumption they’ve increased, and whether common concerns about them are supported by evidence. The consistent picture that emerges from these studies is that cash transfers generally do increase consumption, particularly on food, and that evidence to support common concerns has not emerged despite being looked for. (More at our writeup on cash transfers.)

As discussed previously, there is a smaller set of studies implying that people get significant return on investment from cash transfers, even over the long run; the case for longer-term impacts of cash transfers is broadly comparable to the case for longer-term life improvement impacts of our other top charities’ health interventions, and the cost-effectiveness according to our best estimates is in the same ballpark as well.

Leveraged impacts

One common objection to cash transfers is along the lines of “They probably do help some, but can’t we do better?” We think the answer is that we likely can do better, but it isn’t obvious. For one thing, it’s important to keep in mind the limitations of giving as a casual donor. While a major philanthropist may be able to take big risks with big upside, a donor without special expertise – looking to translate dollars directly into improved lives – has limited options. (For what it’s worth, we feel similarly about LLIN distribution and deworming: we think it’s likely that a major philanthropist could do better than paying directly for delivery of these interventions.)

More importantly, cash transfers can have leveraged and transformative impacts. The most direct evidence of this is a pair of long-term studies finding annual rates of return on invested funds in the range of 35-75%, over ~5-year time frames, which we have stated constitutes evidence that is roughly as good as the evidence for long-term impact for deworming.

Many people see claimed returns in this range and find them implausible, as though earning such returns would require great ingenuity and/or risk-taking. For relatively wealthy people, this is an accurate perception; but for people with extraordinarily low levels of wealth, the returns to a little bit more – especially delivered as the sort of lump-sum payment that helps them get around the challenges of volatile and uncertain incomes described in Portfolios of the Poor – could be significant. The right analogy is not to an investment vehicle that can provide returns on arbitrary amounts of savings; rather, it’s to the kinds of returns we can realize every day by being able to spend money up front rather than piecemeal. (For example, if I buy a $5 water bottle that allows me to save $1 per week on bottled water, that’s an annual return of over 1000%. I have enough flexibility in my finances that I would never think twice about an expenditure like this, but for someone on an extremely low income, the situation is different.)
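The water-bottle arithmetic can be checked with a minimal sketch. It assumes a simple (non-compounded) annual return and a 52-week year; the function name is illustrative and does not come from the original post.

```python
def simple_annual_return(upfront_cost, weekly_saving, weeks_per_year=52):
    """Simple (non-compounded) annual return on an up-front purchase.

    Assumptions: the saving recurs every week of the year, and the item
    lasts at least one year.
    """
    annual_saving = weekly_saving * weeks_per_year
    return annual_saving / upfront_cost


# A $5 water bottle that saves $1 per week on bottled water.
rate = simple_annual_return(upfront_cost=5.0, weekly_saving=1.0)
print(f"{rate:.0%}")  # prints 1040%
```

Under these assumptions the return is $52 of savings on a $5 outlay, or 1040% per year, consistent with the "over 1000%" figure above.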

More specifically,

  • One common reported use of GiveDirectly’s transfers is purchasing livestock – something that is often portrayed as having lasting, leveraged, transformative impacts. (Though the purchases of livestock by cash transfer recipients have important differences with the livestock distributed by livestock-specific programs – namely that recipients choose which livestock to purchase, and can purchase something else instead if they’d prefer.)
  • Another common reported use of GiveDirectly’s transfers is purchasing a metal roof, to replace a roof made of mud and thatch. We asked about this sort of purchase on our site visit; we were told that mud/thatch roofs take repeated applications of money and time to repair, and can leak to the point of compelling people to move their families and belongings into others’ homes when it rains heavily (or when their roofs are not in good shape). It’s not hard to imagine that a metal roof could make a major and lasting difference to a family (and GiveDirectly has stated to us – though we have not yet examined the details – that the effective “rate of return” on a metal roof is about 17% annually, which would be very high). Unlike with livestock, I’ve never heard of a charity giving out metal roofs; this is arguably an illustration of the idea that recipients have ideas for improving their own lives that haven’t occurred (or can’t be sold) to donors.
  • There are many more possibilities for how recipients could leverage cash transfers into lasting, high-return impacts. One person on our site visit stated that he had used the money to purchase a motorcycle instead of having to rent it by the day for his job; this sort of story is common among microfinance recipients and in Portfolios of the Poor (the most credible attempt we’ve seen to understand how very low-income people in the developing world manage their finances). Indeed, the high repayment rates and high interest rates associated with microfinance give some indication that it is common (though not universal) for low-income people in the developing world to have fairly simple opportunities to earn high returns on cash, and the evidence we have seen on long-term returns from cash provides further evidence.

Two more perspectives from which cash transfers look attractive:

  • One way to think of cash transfers is as “giving low-income people their choice of intervention [via purchase] to improve their own lives.” There are many plausible reasons to think that their choices may be inferior to the choices of more educated aid professionals, and not all such reasons rely on the idea that aid professionals are better informed. (Alexander covered several of these in a post earlier this year, defending his preference for bednet distribution over cash transfers.) But there are also plausible reasons to think that recipients’ choices may be superior to those of aid professionals. For our part, we’d guess that cash transfers are more beneficial than many, but not all, charitable interventions, and our charity rankings and recommended allocations are consistent with this position.
  • Another argument for cash transfers is that they are the best intervention to support given “maximum skepticism.” If I put no credence whatsoever in expert analysis and academic studies, unconditional cash transfers would be my intervention of choice. Since I do put some credence in such analysis and studies (particularly the strongest ones), I don’t go all the way to this position; however, I think that such analysis and studies have more weaknesses than most people recognize, and I don’t find the position of extreme skepticism absurd or indefensible.

Important differences between the developed- and developing-world poor
Some people object to GiveDirectly based on the idea that “cash transfers have been tried and haven’t worked” in the U.S. They cite examples such as government welfare programs and studies of lottery winners.

We believe that it is very misleading to analogize the poor in the developing world to the poor in the U.S., for the following reasons:

  • Because the U.S. poor generally have access to basic necessities, there is generally little low-hanging fruit in terms of easily purchased items that can materially improve one’s standing. This is (we believe) precisely why there are such better giving opportunities in the developing world.
  • When we think of the U.S. poor, we generally think of people who have access to basic necessities but may face more daunting challenges, such as living in high-crime neighborhoods, being unable to hold what we consider desirable jobs, having substance abuse or other mental health issues, etc. Overcoming these obstacles and becoming “middle-class” would take either a great deal of money (reasonably well spent) or fundamental changes in educational status, behavior, environment, etc. By contrast, the developing-world poor generally lack the ability to afford very basic things; purchasing these things could make them much better off, while still at a lower standard of living than the U.S. poor.
  • We also think the nature of “temptation to spend money unproductively” is very different in the two settings. While the developing-world poor certainly have opportunities to spend money on gambling, alcohol, cigarettes, etc., we haven’t (on our site visits) observed the same level of opportunities to waste large amounts at once that we see here. Furthermore, it’s quite plausible to us that the greatest temptation for a clinically malnourished individual would be to legitimately improve his or her diet (e.g., by eating more protein) or address some other pressing basic need (such as a leaky roof).

The last few of the above points are speculative, and shouldn’t be taken as an attempt on our part to demonstrate that developing-world cash transfers are well spent. Rather, they should be taken as illustrations of why it isn’t safe to extrapolate from the U.S. to the developing world. (Instead, one should focus on the studies that have been done of cash transfer programs.)

Another point worth noting is that even if cash transfers have had disappointing results in the U.S., it isn’t clear that any other anti-poverty intervention has had better results. (More at our discussion of U.S. equality of opportunity.) One way of putting our view is that (a) in the U.S., poverty is often too complex to be solved by money, and therefore donors have trouble helping significantly whether they are making cash transfers or funding other interventions; (b) in the developing world, poverty often includes a lack of very basic, very helpful necessities that can be easily purchased, and therefore donations can do a great deal of good in the form of cash transfers or other interventions.

Bottom line
We don’t take the position that cash transfers are the best intervention out there. But we think they are a highly promising intervention, and that many of the concerns we see raised about them are unwarranted and/or exaggerated. In terms of both the intuitive case and the evidential case, we think cash transfers are on the very short list of the most promising interventions we’ve seen for individual donors.

More on the ranking of our top charities

We previously wrote that we think cash transfers are likely to be significantly less cost-effective (in terms of “good accomplished per dollar donated”) than deworming; yet we rank GiveDirectly higher than Schistosomiasis Control Initiative. We gave some basic indications of our reasoning in the strengths/weaknesses table of our announcement post. Since then, we’ve had further conversations and sought to better express and formalize our thinking, and we’ve realized that there is a potential source of major confusion here:

  • This year, we selected our top charities based on the criteria we’ve used for years, but we ranked them based on where we would personally give in order to maximize our impact.
  • In past years, the questions of “Where would we give in order to maximize our impact?” and “Which charities meet our criteria?” have been essentially identical, but this year, they have started to diverge. Among other things, the goal of giving to learn has come to carry more and more weight for us.
  • Staff are divided on whether SCI or GiveDirectly better exemplifies our formal “proven, cost-effective, scalable” criteria, but we are in agreement that GiveDirectly is stronger on other dimensions including the opportunity to “give to learn,” and as such, we are unanimous in preferring to support GiveDirectly.
  • We also feel that “giving to learn” is the primary impact-based justification we’re aware of for supporting more than one of our top charities. For donors who don’t put weight on this benefit, we feel the way to maximize impact is to support AMF exclusively.

Below, we elaborate on these points. The upshot of them is that

  • Donors who place high value on helping GiveWell by giving should either give to all three charities (with our recommended allocation as a reference point) or should make a gift to GiveWell for regranting at our discretion.
  • Donors who are seeking simply to do the most direct good – excluding benefits to GiveWell – should exclusively support Against Malaria Foundation, the charity that performs best by our criteria and the charity that we would support if we could only support one. Of course, donors who disagree with our recommendation of AMF may wish to consider both GiveDirectly and SCI as alternatives.
  • While we don’t feel our communications have been ideal around these issues, we plan to keep our top charities page as is. For donors seeking to support one charity, our recommendation is to support AMF; for donors looking to support multiple charities, but looking to do so based on our recommendation rather than their own review of our work, our recommendation is to use our target allocation; for donors looking to use more of their personal judgment, we make information available for doing so. We are interested in others’ perspectives on whether our top-level communications are appropriate.

The easiest way to address the difference (and where it came from) between our current criteria for identifying top charities and how we answer the question of “Where do I give?” is to briefly recount the evolution of GiveWell’s approach over time. We do so, then outline how we selected and ranked our top charities (and how the two differ), then discuss the implications for donors.

The history of our criteria

GiveWell was founded in order to answer the question, “Where should I give?” (See our story.) To the extent that we’ve formalized and publicized criteria for charities, these have come from formalizing what we were looking for in a giving opportunity. The fact that we start with “Where should I give?” and derive and adapt criteria from there – rather than starting with a set of criteria and applying them formulaically – has long been a distinguishing feature that has led GiveWell to investigate different questions (and generally investigate charities in more depth) compared to other charity evaluators.

When we first started, we recognized that we were extremely new to the world of giving. Accordingly, we wanted to start with the “easiest” giving opportunities for non-experts to assess: opportunities that involved directly paying for delivery of an already-proven intervention with at least somewhat quantifiable positive impacts. We’ve long recognized that there may be better giving opportunities that take a different form (higher risk, higher potential reward), but we haven’t felt that we have the expertise and context needed to assess these opportunities. So we’ve sought the easiest charities to be confident in.

Over time, we’ve gained experience and seen interest from larger donors, leading us to want to broaden our criteria generally. At the same time, we’ve become very interested in the idea of “giving to learn” – gaining information from an organization that is much easier to obtain as a “supporter” (someone who has helped get funding to an organization in the past) than simply as an evaluator (someone who might help get funding to an organization in the future).

  • This idea first started to appeal to us in 2010, when we felt that the significant funds we had directed to VillageReach improved our access (both in terms of VillageReach’s interest in investing time in GiveWell and in terms of GiveWell’s comfort with asking for such investment). We conducted a multiple day field visit and published intensive updates on VillageReach’s work, leading to substantial revisions in our views.
  • Many of the major funders we’ve interacted with have stressed the value of “giving to learn,” outlining a similar dynamic to the one described above: by supporting an organization, one gains the ability to investigate it more deeply.

At this point, it seems to us that one of the best uses of funds might be to support organizations from which – for whatever reason – we believe we can learn a lot, even if we’re not highly confident in such organizations and don’t see the dollars given to them as directly accomplishing much good. We still see enormous room for improvement in our knowledge base, and we anticipate substantial future growth in money moved; thus, we believe that most of the direct impact of the gifts we recommend over GiveWell’s lifetime is likely to be concentrated in the future, and knowledge gained now could have big returns if it improves our future recommendations.

However, as we’ve laid out some of the shifts we’ve been going through this year, many in our audience have said that they want us to continue to focus on charities that meet our traditional criteria – the criteria that have become fairly strongly identified with us by this point.

Therefore, for giving season 2012, we sought to ensure that we were doing our best to highlight all the charities we could find that meet the criteria many have come to associate with us. Our top charities are the current set of charities (a) on which we have conducted extensive due diligence, about which we have thoroughly pursued a large number of critical questions, and in which we remain highly confident; and (b) for which we believe that more dollars will likely lead to more delivery of programs that are highly cost-effective and backed by strong evidence. (Note that, as has been true every year, there are charities we consider promising and worth investigating, that we simply have not yet done enough due diligence on to place in this category.)

However, when it came time to rank these charities, we reverted to the question of “Where would we like to see funds go in order to accomplish as much good – all things considered – as possible?”

GiveDirectly vs. SCI

Against Malaria Foundation performs very strongly on all of our criteria – more strongly than any other charity we’ve found – and it is the single charity we would most like to see funds go to in order to accomplish good more broadly (including both direct impacts and learning opportunities). This is important because, as we wrote a few weeks ago, we see a fairly strong case for giving exclusively to one organization in order to maximize impact; to the extent that one gives to multiple organizations, we feel that this should be justified by room for more funding considerations or by the goal of “giving to learn.”

Nonetheless, many donors have asked us about the comparison between GiveDirectly and SCI, which is less straightforward because they have quite different strengths and weaknesses.

SCI is, we believe, working on an intervention with greater direct cost-effectiveness and roughly comparable evidence of effectiveness to GiveDirectly. However, GiveDirectly presents a much clearer picture when it comes to room for more funding. Our understanding of SCI is that marginal dollars are used to (a) fill gaps in programs funded by larger donors (such as the UK government’s Department for International Development) and (b) attempt to catalyze the creation of new programs; while SCI has a track record of implementing large programs fully funded by major donors like the US government or the Gates Foundation, we see little to go on in assessing its ability to carry out these activities effectively.

Looking more holistically at the question of where we’d give, we see three more advantages to GiveDirectly. One is our higher subjective confidence in it as an organization, which has implications for how much good we expect it to accomplish if unexpected situations arise, if there is something fundamentally off about our understanding of its activities, etc. The second issue is learning: we believe that with GiveDirectly, we have a clear sense of what we expect additional dollars to lead to and a strong expectation that we’ll be able to meaningfully compare our expectations with what actually happens in the future – that we’ll be able to assess the outcome of our recommendation via charity updates. We don’t believe the same to be true of SCI, based on the updates we’ve done over the past year (which have included some difficulties in communicating). Finally, we see more “upside” for GiveDirectly because we see it as experimenting with an intervention that is (wrongly, in our view) unusual in the aid world.

Staff members do not all agree on how important each of these individual factors is, but when considering all of them together, we are broadly in agreement that dollars given to GiveDirectly will accomplish more good.

Implications

Donors who place high value on helping GiveWell by giving should either give to all three charities (with our recommended allocation as a reference point) or should make a gift to GiveWell for regranting at our discretion. Our recommended allocation is provided so that donors looking to support multiple charities, and looking to us for guidance on amounts, will give in the proportions that we feel will accomplish the most total good, including contributing to GiveWell’s ability to learn. (We’re planning to write more in the future about how we might better convey our recommended allocations. We suspect that, especially in the case of the #2 and #3 charities, our recommendations might be better expressed in terms of absolute dollars than in proportions of total money moved.)

Donors who are seeking simply to do the most direct good – excluding benefits to GiveWell – should focus their support on the Against Malaria Foundation, the charity that performs best by our criteria and the charity that we would support if we could only support one. We see no benefit to spreading donations out over multiple organizations, except for the potential benefits of “giving to learn” or in response to room for more funding issues (which we do not foresee for our recommended charities at expected funding levels). Donors who disagree with our recommendation of AMF may wish to consider both GiveDirectly and SCI as alternatives.

Cost-effectiveness of nets vs. deworming vs. cash transfers

Update 12/5/2014: we update our cost-effectiveness models annually. The most up-to-date versions can be found here

This post discusses how we see the relative “bang-for-the-buck” – good accomplished per dollar spent – of three interventions: distributing long-lasting insecticide-treated nets (LLINs), deworming, and cash transfers.

We discuss:

  • Our general philosophy of cost-effectiveness. We find the exercise of cost-effectiveness useful, and we care greatly about large robust differences, but we also believe that cost-effectiveness estimates involve substantial judgment calls and shouldn’t be taken literally.
  • Simple “cost per person per year” measures. Deworming is the cheapest of the three interventions in terms of “total costs to treat one person for one year,” so other interventions must be more beneficial (on a per-person-per-year basis) in order to be an equally good buy from a donor’s perspective. Deworming and nets are in the same ballpark on this basis, and cash transfers that are spent on metal roofs could be seen as being in the same ballpark as well.
  • Cost per “equivalent life saved.” We can try to convert all the benefits of the different interventions into “equivalent lives saved” (or, alternatively but similarly, into DALYs). We have not attempted this for cash transfers; when doing so for deworming, we run into many judgment calls and an extremely wide range of values, from $31 to $25,000 per “equivalent life saved.” Estimating “lives saved” for bednets is more straightforward, though it involves judgment calls as well. Different people have significantly different intuitions about the right inputs into these figures, even within GiveWell staff. We provide estimates based on the guesses of the three staff members who have worked most intensively on cost-effectiveness, and may be providing other staff members’ estimates in the future. These estimates diverge widely from each other, though in all cases, nets and deworming are estimated to be in the same ballpark, with a slight edge for nets.
  • Financial returns. The primary benefit we’ve seen evidence for, for deworming, is improved earnings for people dewormed in childhood. We can estimate the net present value of these benefits to arrive at a figure along the lines of “For each $1 spent on deworming, recipients receive total benefits equivalent to $X.” Following this approach allows a comparison to cash transfers. Again, the outcome of the comparison depends on many significant judgment calls, and different staff members produce very different figures. The general picture here is that deworming is between 2 and 5 times as cost-effective as cash transfers.
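To make the “For each $1 spent, recipients receive total benefits equivalent to $X” framing concrete, here is a minimal sketch of a net-present-value calculation. It is not our actual model: the function names are illustrative, and the inputs (benefit stream, time horizon, discount rate) are hypothetical placeholders of exactly the kind that different staff members would set differently.

```python
def net_present_value(annual_benefits, discount_rate):
    """Discount a stream of yearly benefits back to today.

    Assumption: annual_benefits[0] arrives one year from now,
    annual_benefits[1] two years from now, and so on.
    """
    return sum(
        benefit / (1.0 + discount_rate) ** year
        for year, benefit in enumerate(annual_benefits, start=1)
    )


def benefit_per_dollar(cost, annual_benefits, discount_rate):
    """Total discounted benefits received per $1 of program cost."""
    return net_present_value(annual_benefits, discount_rate) / cost


# HYPOTHETICAL inputs for illustration only (not GiveWell estimates):
# $1 spent today, $0.50 of extra earnings per year for 10 years,
# discounted at 5% per year.
ratio = benefit_per_dollar(
    cost=1.0, annual_benefits=[0.50] * 10, discount_rate=0.05
)
print(f"For each $1 spent, recipients receive about ${ratio:.2f}")
```

Changing the discount rate or the assumed duration of the earnings gain moves the result substantially, which is one reason the comparison depends so heavily on judgment calls.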

We encourage readers who find formal cost-effectiveness analysis important to examine the details of our calculations and assumptions, and to try putting in their own. To the extent that we have intuitive preferences and biases, these could easily be creeping into the assumption- and judgment-call-laden work we’ve done in generating our cost-effectiveness figures, and we’re not entirely confident that the figures themselves are adding substantial information beyond the intuitions we have from examining the details of them.

We have put a great deal of time into our formal cost-effectiveness analysis, and we think our results simultaneously illustrate (a) the value of cost-effectiveness analysis, in terms of highlighting and causing us to examine key assumptions and judgment calls (as well as potentially clarifying what our intuitions imply about which interventions do the most good); (b) the limitations of cost-effectiveness analysis, in that it cannot practically be made robust for many comparisons (including the comparisons in this post), and is only as good as the intuitions and judgment calls that go into it.

Our general philosophy of cost-effectiveness

  • The driving aim of our charity recommendations is to help donors accomplish as much good as possible, on a per-dollar basis.
  • Defining “good” necessarily involves judgment calls, and measuring/estimating it involves more judgment calls. We generally look at cost-effectiveness from multiple angles and publish multiple versions of our estimates, and we encourage donors to do their own critical thinking.
  • We think an appropriate role for our analysis is to help clarify the decisions donors are making, without necessarily quantifying every aspect of the decision. An analogy we sometimes use is that of deciding whether to buy an apple or an orange: ultimately the decision is subjective, but having certain facts – namely, the price of each – can clarify the decision and make it more informed. In this case providing a “cost per apple” and/or “cost per orange” would be helpful, while providing a “cost per unit of food-enjoyment” (with the calculation of “food-enjoyment per fruit” based on guesswork) would likely be less helpful.
  • The less robust a cost-effectiveness estimate, the less weight we should place on the estimate in giving decisions. The conceptual goal of accomplishing the most good per dollar spent does not necessarily entail giving to the charity with the highest explicitly estimated “good accomplished per dollar spent”; there may be factors that are best dealt with outside of the “explicit expected value” framework. (More at our post on the subject and the comments that followed, particularly this comment).
  • We see two potential sources of value in doing explicit cost-effectiveness analysis: (a) finding relatively large and robust differences between different charitable interventions; (b) using cost-effectiveness analysis as a way to ensure that we have carefully thought through the relevant issues.

    Simple cost-effectiveness measures
    One way to start thinking about the relative cost-effectiveness of the three interventions is to think about the “cost per person-year of coverage,” i.e., the cost of serving one person with the intervention for one year.

    • For LLINs, we estimate a cost of $1.36 per “person protected by an LLIN per year.” (This is based on $5.15 per LLIN distributed, with each LLIN covering an average of 1.8 people for an average of 2.22 years; details at the spreadsheet linked from our cost-effectiveness analysis of LLINs, column H).
    • For deworming, we estimate a cost of $0.51 per person treated (details). This $0.51 figure incorporates costs that SCI, our top-rated deworming charity, reported to us. We have not fully investigated all costs associated with deworming programs (as we have for bednets and cash transfers) and believe the cost per person treated could be ~10-25% higher, leading to a cost per person treated of $0.51-$0.64. In high-prevalence areas, deworming is annual, and all of the information we have about the impacts of deworming is based on high-prevalence areas. There is an additional question of what percentage of people treated are children, since the main case for deworming (in our view) is its developmental impacts when applied to children; we have little ability to estimate this figure and generally assume that 50% of those treated are children, leading to a figure of $1.02-$1.28 per child treated.
    • Cash transfers are much more difficult to model on a “per person per year” basis, since they are structured as one-time “wealth transfers.” One way of coming up with a partially informative “per person per year” figure is to look at the cost of a metal roof, which is a commonly reported use of cash transfers. GiveDirectly estimates that such a roof costs about $250 for a household and lasts about 20 years. During our visit to GiveDirectly’s operations in Kenya, recipients reported spending closer to $500 for a roof. At an average household size of 4.7 (from the “Household size analysis” document in our GiveDirectly review) and 20 years per roof, that implies about $2.66-$5.32 in recipient expenditures per “person-year of roof coverage”; at a rate of 90c transferred for each $1 in expenses, that implies about $3-$6 in donor expenses per “person-year of roof coverage.”
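The arithmetic behind the deworming and roof figures above can be sketched in a few lines of Python. This is only an illustration using the numbers quoted in the text; the function and variable names are hypothetical and do not come from the linked spreadsheets. The LLIN figure is not recomputed here, since its details live in the spreadsheet.

```python
# Illustrative only: reproduces the per-person-year arithmetic quoted above.
# All inputs are figures stated in the text; all names are hypothetical.

def cost_per_person_year(cost_per_unit, people_per_unit, years_per_unit):
    """Cost of covering one person for one year with a shared, durable item."""
    return cost_per_unit / (people_per_unit * years_per_unit)

# Deworming: $0.51 per person treated, possibly up to 25% higher.
deworm_low = 0.51
deworm_high = 0.51 * 1.25
child_share = 0.50  # assumed share of those treated who are children
deworm_child_low = deworm_low / child_share
deworm_child_high = deworm_high / child_share

# Metal roof: $250 (GiveDirectly estimate) to $500 (recipient reports),
# shared by a household of 4.7 people and lasting 20 years.
household_size = 4.7
roof_years = 20
roof_low = cost_per_person_year(250, household_size, roof_years)
roof_high = cost_per_person_year(500, household_size, roof_years)

# Donor cost: 90c reaches recipients for each $1 of expenses.
transfer_rate = 0.90
donor_low = roof_low / transfer_rate
donor_high = roof_high / transfer_rate

print(f"Deworming, per child treated: ${deworm_child_low:.2f} to ${deworm_child_high:.2f}")
print(f"Roof, recipient spending per person-year: ${roof_low:.2f} to ${roof_high:.2f}")
print(f"Roof, donor cost per person-year: ${donor_low:.2f} to ${donor_high:.2f}")
```

The donor-cost range comes out just under $3 and just under $6, which is why the text rounds it to "about $3-$6."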

    The obvious problem with these figures is that we don’t know how beneficial a year of LLIN coverage is relative to a year of deworming or a year of coverage by a roof. However, estimating such benefits involves a lot more judgment calls than estimating the costs. We think the figures listed above – while inconclusive and unsatisfying – are much less likely to be wildly inaccurate on their own terms than the figures that follow, so we think it’s appropriate to keep them in mind.

    Humanitarian outcomes
    A much more satisfying comparison, but one that is much more difficult to do with precision, is to estimate the “humanitarian value per dollar” created by each intervention. In order to do this, we need a unit of “humanitarian value” that we can standardize on. One such unit is the disability-adjusted life-year (DALY). A similar unit, which we find easier to think about but which is fundamentally similar to the DALY (and relatively easily convertible to it), is the “life saved equivalent.” For some interventions (those with direct impacts on mortality) we can estimate a “cost per life saved”; we can then try to estimate non-mortality-related benefits of interventions, decide how much we value these benefits relative to “lives saved,” and then convert them into “lives saved equivalent.”

    The cost per life saved for LLIN distribution

    Our current best estimate of the “cost per life saved” for LLIN distribution is about $2300 (details at the spreadsheet linked from our cost-effectiveness analysis of LLINs, column H). Some notes on this figure:

    • This figure counts only lives saved directly among children under five.
      • It is also possible that LLINs save adult lives, but figures on adult malaria deaths are disputed and we have not seen studies addressing whether LLINs save adult lives.
      • As discussed previously, we believe the case that LLINs have non-mortality-related “developmental benefits” is somewhat comparable to the case that deworming has such benefits.
      • Finally, LLIN distribution may reduce the burden of malaria, on LLIN users and on the health system, in ways not captured in the above considerations.
    • Our spreadsheet also estimates the cost per life saved under different assumptions about questions like “How long do LLINs last in the field?” and its overall range comes out at ~$1700-$5500.
    • There are several potential major sources of uncertainty in our estimate that are not captured in our spreadsheet, including the possibility of insecticide resistance, the possibility that today’s conditions differ in other ways from the conditions under which the original studies of mortality effects were done, and more. These are listed in our discussion of LLIN cost-effectiveness.

    The cost per “equivalent life saved” for deworming

    We see a lot of uncertainty around the benefits of mass deworming. We have not seen evidence that it directly saves lives, and any such lives saved are likely to be quite rare (not competitive with how often LLINs save lives). However, there is some evidence suggesting that deworming has developmental impacts: that deworming someone in childhood can cause them to earn more money later in life (among other benefits).

    When attempting to estimate a “cost per life saved equivalent” for deworming, one must take a view on multiple difficult-to-quantify factors, all of which are listed in the “assumptions” sheet of our spreadsheet on “cost per life saved equivalent” comparisons:

    • The relative value of saving a life vs. realizing the developmental benefits associated with deworming. The value assigned by the corrected Disease Control Priorities Report to averting “cognitive impairment” is 2.4% of the value it would assign to averting death, but many might consider improving a life to be more than 2.4% as valuable as saving a life. (Assumption 9 on “Assumptions” sheet)
    • The proportion of people dewormed by SCI who are children. (Assumption 2)
    • Whether the benefits of deworming are likely to scale linearly with repeated treatments. The Kenya study on developmental benefits looks at the benefits of receiving ~2.5 years of additional treatment; it’s possible that people treated by SCI receive fewer or more years of treatment. (SCI generally aims for repeated treatment throughout childhood.) (See Assumption 3; we formalized this issue via a multiplier that captures “how helpful years of treatment in SCI’s program are, relative to the years of treatment in the key study.”)
    • Whether to make a “replicability adjustment” due to the fact that developmental benefits for deworming are not as robustly established as mortality effects for LLINs. (We have discussed this issue previously.) We have sought a reference point for how likely it is that a given study result will hold up under replication, and have seen some potentially relevant figures in John Ioannidis’s analysis of biomedical literature (here and here), though the analogy between biomedical studies and the studies in question has substantial limitations. We have formalized this adjustment as Assumption 4 in the sheet.
    • Whether to make an “external validity” adjustment, to account for the fact that infection prevalence rates were unusually high due to El Nino in the Kenya study on developmental benefits (the study that provides the most helpful information for quantifying the benefits of deworming). We have formalized this adjustment as Assumption 1 in the sheet.
    • How to account for the shorter-term health benefits of deworming, including both rare severe health effects (such as intestinal obstruction due to ascariasis) and subtle general health effects. Given recent developments, we think it is reasonable to consider the subtle general health benefits negligible, but we also provide the option to use our previous estimate of such benefits, which is based on the Disease Control Priorities Report.

    Our spreadsheet on “cost per life saved equivalent” comparisons shows the “cost per life saved equivalent” under different assumptions. We have provided many possible assumptions in the “assumptions” sheet and calculated the “cost per life saved equivalent” for deworming under each possible combination of inputs; it ranges from $31 in the most optimistic scenario to over $2 million in the most pessimistic. It also includes scenarios that represent different GiveWell staff members’ best guesses, discussed further below.

    Comparing LLIN distribution and deworming

    Recall that LLIN distribution may also have developmental benefits. Generally, the more optimistic one is about the case that health in childhood affects quality of life in adulthood, the more optimistic one ought to be about both LLIN distribution and deworming. Personally, I would guess that LLIN distribution has stronger developmental benefits than deworming on a “per-person-per-year” basis (for reasons outlined previously) though one could easily argue either side of this question.

    If one assumes that LLIN distribution and deworming have equal benefits on a “per-person-per-year” basis, and that the proportion of people treated who are children is similar for the two, then this implies that when considering developmental benefits alone, deworming accomplishes about 2.5x as much good per dollar as LLIN distribution (based simply on the “cost per person treated per year” from the previous section).
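As a rough check on the "about 2.5x" figure, one can divide the LLIN cost per person-year by the deworming cost per person treated from the previous section. The sketch below assumes, as the paragraph above does, equal per-person-per-year benefits and the same share of children in both programs. The variable names are illustrative only.

```python
# Illustrative only: ratio of per-person-year costs from the previous section.
llin_cost = 1.36  # $ per person protected by an LLIN per year

# Deworming cost per person treated: current estimate, and 10% / 25% higher.
deworm_costs = {
    "current estimate": 0.51,
    "10% higher": 0.51 * 1.10,
    "25% higher": 0.51 * 1.25,
}

# With equal benefits per person-year, good per dollar scales inversely with cost.
ratios = {label: llin_cost / cost for label, cost in deworm_costs.items()}

for label, ratio in ratios.items():
    print(f"{label}: deworming does about {ratio:.1f}x as much good per dollar")
```

Across the cost range quoted earlier, the ratio stays between roughly 2x and 2.7x, consistent with the "about 2.5x" summary.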

    One may also try to combine the mortality benefits and the developmental benefits of LLIN distribution under these assumptions, as we do in our spreadsheet on “cost per life saved equivalent” comparisons. Below, three of us – myself, Elie, and Alexander – explain our assumptions and give the resulting cost-effectiveness comparison. We also encourage readers to use the spreadsheet to enter their own assumptions. You can do so by going to the “Master” sheet in our spreadsheet on “cost per life saved equivalent” comparisons, manually editing columns M through V for any row, and watching the output in columns X and Y.

    Elie’s assumptions: $0.56 per person dewormed (assumes that our $0.51 estimate is too low by 10%, based on an underestimate we previously made of AMF’s costs before analyzing more deeply); 50% of deworming goes to children; child-years of deworming by SCI are on average 1/3 as effective as child-years in the key study; developmental benefits should be valued 20% as much as lives saved; the external validity of the key deworming study is 30.25% (based on the change in moderate-heavy infections experienced during the course of the study); 40% chance that the study would hold up under replication; standard (Disease Control Priorities Report-based) minor health benefits of deworming. $1,668 per life saved equivalent for deworming (83% from developmental benefits); $1,564 for nets.

    Holden’s assumptions: $0.51 per person dewormed (uses our current estimate); 50% of deworming goes to children; child-years of deworming by SCI are on average (2.41/10) as effective as child-years in the key study (this is equivalent to assuming that SCI is treating all children throughout childhood, and that benefits do not increase beyond the 2.41 years point); the external validity of the key deworming study is 30.25%; 10 people with developmental benefits are morally equivalent to 1 life saved (this figure comes from regressing my own intuitions about what’s valuable – which put very low value on saving lives relative to improving them – somewhat toward “normality,” since I’m not confident that my intuitions are appropriate on this point); 30% chance that the study would hold up under replication; standard (Disease Control Priorities Report-based) minor health benefits of deworming. $3,813 per life saved equivalent for deworming (56% from developmental benefits); $2,004 per life saved equivalent for nets.

    Alexander’s assumptions: $0.51 per person dewormed (uses our current estimate); 50% of deworming goes to children; child-years of deworming by SCI are on average exactly as effective as child-years in the key study (this is based on ignorance about where in the distribution of child-years of deworming additional funds might be spent, and takes into account the upside potential of later years conceivably helping to eliminate schistosomiasis from an area); the external validity of the key deworming study is 30.25%; a 2.4% quasi-disability weight for developmental effects (following the Disease Control Priorities Report, not due to confidence that it is the correct estimate but because it roughly maps to the intuition that a permanent 25% increase in income [derived from Baird et al 2012] for ~40 people starting in adulthood would be roughly as valuable as saving a life of a young person); 50% chance that study would hold up under replication; standard (Disease Control Priorities Report 3% discount rate DALY-based) minor health benefits of deworming (with no minor health benefits for bednets). $2,981 per life saved equivalent for deworming (73% from developmental benefits); $2,027 per life saved equivalent for nets. (Alexander comes up with broadly similar figures because he’s more confident in the representativeness and quality of the deworming studies than Elie or I, but places much less weight on developmental effects than either of us.)
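The three sets of assumptions above weight developmental benefits quite differently relative to lives saved. One way to compare them is to read each weight as a simple ratio: a weight of w means that roughly 1/w people with developmental benefits count as much as one life saved. That reading is a simplification made for this sketch, not a description of how the spreadsheet combines its inputs.

```python
# Illustrative only: converts the stated weights into "people per life saved".
# Weights are taken from the three sets of assumptions above.
weights = {
    "Elie": 0.20,        # developmental benefits valued 20% as much as a life saved
    "Holden": 1 / 10,    # 10 people with benefits equivalent to 1 life saved
    "Alexander": 0.024,  # 2.4% quasi-disability weight
}

people_per_life = {name: 1 / w for name, w in weights.items()}

for name, n in people_per_life.items():
    print(f"{name}: weight {weights[name]:.1%}, about {n:.0f} people per life saved")
```

The 2.4% weight works out to a little over 40 people per life saved, in line with the "~40 people" intuition described in Alexander's assumptions.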

    All of these figures can be converted straightforwardly to DALYs if one prefers these units. The DALYs per “life saved equivalent,” for three different versions of DALYs, are given in column AC of our spreadsheet on “cost per life saved equivalent” comparisons.

    Cost per “equivalent life saved” for cash?

    We have not attempted to use the “equivalent life saved” framework to quantify the benefits of cash transfers, since we know so little about how they are spent and what the likely results are. Instead, we compare cash transfers to the other interventions using a different framework, discussed immediately below.

    Financial returns
    The primary benefit we’ve seen evidence for, regarding deworming, is improved earnings for people dewormed in childhood. We can estimate the net present value of these benefits to arrive at a figure along the lines of “For each $1 spent on deworming, recipients receive total benefits equivalent to $X.” We can also do a modified version of this: estimating $X as a percentage of recipients’ annual income (in the case of deworming, we would guess that the percentage increase in income is a more robust figure across different settings than the dollar increase).

    We can estimate a similar figure for cash transfers, using what we know about the longer-term returns to cash transfers, and compare the two. Finally, we can do a similar calculation for LLIN distribution, assuming that LLINs have similar developmental benefits to deworming (on a per-person-per-year basis), though this calculation excludes other benefits of LLINs such as direct mortality benefits.

    Note: we have updated this section (and the linked spreadsheets) since initially posting to incorporate the short-term health benefits of deworming; across all scenarios we assume that each dewormed person experiences current health benefits of value equivalent to a 0.51% increase in income. This value was chosen because it results in 71% of the total benefits of deworming in our preferred scenarios (and 60% in the full set of considered scenarios) arising from developmental effects, the same as in the “lives saved equivalent” framework discussed above (assuming the short term health effects of deworming from the DCP2).

    Our spreadsheet on financial returns performs this comparison. As in the previous section, many assumptions and judgment calls must be made. Regarding deworming, we have the same questions as in the previous section regarding the reliability and external validity of the studies on developmental benefits. Regarding cash transfers, there are additional questions: how much of the transfers are likely to be invested, and what rate of return are they likely to earn? What discount rate should be used to capture the fact that (a) recipients prefer present consumption to future consumption; (b) donors can invest their money and give later rather than giving today and letting recipients invest it?

    Our spreadsheet on financial returns (XLS) addresses these questions.

    Again, because this calculation is extremely sensitive to small changes in inputs, we provide the inputs for myself, Elie and Alexander. We also encourage readers to use the spreadsheet to enter their own assumptions. You can do so by going to the “Overview” sheet in our spreadsheet on financial returns, manually editing columns J through Q for any row, and watching the output in columns S through Z.

    Elie’s assumptions: 50% of deworming goes to children; child-years of deworming by SCI are on average 1/3 as effective as child-years in the key study; 30.25% adjustment for external validity concerns; cash transfers are 75% invested, returning 10% a year with a 5% discount rate; 40% replicability adjustment for deworming and a 95% replicability adjustment for cash. (Elie assigns a much lower likely rate of return than seen in the studies on cash transfers, and as such does not do another large “replicability adjustment” for this.) This results in deworming being 2.3x as cost-effective as cash, with each dollar spent leading to total benefits equal to 1.25% of annual income for deworming, vs. 0.55% for cash. (In a family with the sort of income reported for GiveDirectly’s clients, this would translate to the equivalent of $2.98 in benefits for every dollar donated for deworming; $1.30 for cash.)

    Holden’s assumptions: I did two different estimates, one representing low skepticism about results found in studies and another representing high skepticism. In both, I assume 50% of deworming goes to children; that child-years of deworming by SCI are on average 50% as effective as child-years in the key study; a 30.25% adjustment for external validity concerns (based on before-and-after rates of heavy-to-moderate worm infections in the key study); and a 75% investment rate (i.e., cash transfers are 75% invested vs. 25% consumed) with a 5% discount rate. However, in one case I give both the cash and deworming studies a 50% chance of holding up under replication, and use a rate of return (25%) that is triangulated from studies; in the other case I use a 5% rate of return on investment (which effectively assumes simply keeping up with the discount rate, i.e., not earning any substantive return on investment) while applying a 30% “replicability adjustment” to the deworming study. These assumptions result in deworming being ~2.9-4.2x as cost-effective as cash. In the less skeptical version, each dollar spent on deworming leads to total benefits equal to 1.93% of annual income, as opposed to 0.66% for cash; in the more skeptical version, each dollar spent on deworming leads to total benefits equal to 1.36% of annual income, as opposed to 0.32% for cash. (In a family with the sort of income reported for GiveDirectly’s clients, this would translate to the equivalent of $4.58 (optimistic)/$3.22 (skeptical) in benefits for every dollar donated for deworming; $1.57 (optimistic)/$0.76 (skeptical) for cash.)
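The remark that a 5% return "effectively assumes simply keeping up with the discount rate" can be illustrated with a simple present-value calculation. The sketch below is not the model used in the spreadsheet. It assumes, purely for illustration, that an invested amount pays a fixed annual return for a hypothetical number of years, after which the principal is recovered.

```python
# Illustrative only: a simplified present-value model, not the spreadsheet's.

def present_value(principal, annual_return, discount_rate, years):
    """Discounted value of yearly returns plus the principal recovered at the end."""
    pv = 0.0
    for t in range(1, years + 1):
        pv += principal * annual_return / (1 + discount_rate) ** t
    pv += principal / (1 + discount_rate) ** years
    return pv

HORIZON_YEARS = 20  # hypothetical horizon, chosen only for this example

# Return equal to the discount rate: investing $1 is worth exactly $1 today.
break_even = present_value(1.0, 0.05, 0.05, HORIZON_YEARS)

# Return above the discount rate: investing $1 is worth more than $1 today.
higher = present_value(1.0, 0.10, 0.05, HORIZON_YEARS)

print(f"5% return, 5% discount rate: ${break_even:.2f}")
print(f"10% return, 5% discount rate: ${higher:.2f}")
```

When the rate of return equals the discount rate, the present value equals the amount invested regardless of the horizon chosen, so in this simplified model investing adds nothing beyond the value of the cash itself.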

    Alexander’s assumptions are similar to those in my “optimistic” scenario except that he (as described above) assumes that 100% of the child-years of deworming are as effective as the child-years from the key study and takes the average of the reported ranges for cash transfer ROI across the two longer-term studies (54%). These go in opposite directions, leading to the conclusion that each dollar spent on deworming translates to total benefits worth 3.36% of annual income ($7.97 in a family with the sort of income reported for GiveDirectly’s clients) while each dollar spent on cash transfers translates to total benefits worth 1.21% of annual income ($2.87), which implies that deworming is ~2.8x as cost-effective as cash.

    Relative to the ranges of assumptions we consider, our individual estimates come out more positive for cash transfers. The mean across the 6,075 scenarios representing each combination of our assumption set is that each dollar spent on deworming increases annual consumption by 3.37% and each dollar spent on cash transfers increases annual consumption by 0.65%, implying that deworming is ~5x as cost-effective as cash transfers.
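The ratios quoted in this section follow directly from the percentage figures, and the dollar figures can be used to back out the annual income they imply. The sketch below does both, using only numbers stated above. The implied income is back-calculated here for illustration and is not a figure stated in the post; the names are hypothetical.

```python
# Illustrative only. Each entry: (deworming % of income per $1, cash % per $1,
# deworming $ benefit per $1, cash $ benefit per $1), as quoted in the text.
scenarios = {
    "Elie": (1.25, 0.55, 2.98, 1.30),
    "Holden (less skeptical)": (1.93, 0.66, 4.58, 1.57),
    "Holden (more skeptical)": (1.36, 0.32, 3.22, 0.76),
    "Alexander": (3.36, 1.21, 7.97, 2.87),
}

ratios = {}   # deworming cost-effectiveness relative to cash
incomes = []  # annual income implied by each (dollar, percent) pair

for name, (dw_pct, cash_pct, dw_usd, cash_usd) in scenarios.items():
    ratios[name] = dw_pct / cash_pct
    incomes.append(dw_usd / (dw_pct / 100))
    incomes.append(cash_usd / (cash_pct / 100))
    print(f"{name}: deworming is about {ratios[name]:.2f}x as cost-effective as cash")

# Mean across all 6,075 scenarios, as reported in the text.
mean_ratio = 3.37 / 0.65
print(f"Mean across scenarios: about {mean_ratio:.1f}x")
print(f"Implied annual income: ${min(incomes):.0f} to ${max(incomes):.0f}")
```

The implied incomes cluster tightly, which suggests the dollar and percentage figures were derived from a single income assumption.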

    Conclusion
    Our best guess is that bednet distribution is somewhere between 1 and 2 times as cost-effective as deworming, while deworming is between 2 and 5 times as cost-effective as cash transfers. However, we believe that there is ample room for disagreement around these figures, and we plan to write more about the way these figures should be used to guide giving decisions.

Evidence of impact for long-term benefits

We’ve recently published our updated review on the evidence on cash transfers. It elaborates on a claim we’ve made previously – that there is evidence for long-term benefits from cash transfers at high average rates of return.

Some people have expressed skepticism of this evidence, pointing to several limitations: there are not many studies, some of the key data comes from people’s reports of their own spending, and programs studied may not be representative of GiveDirectly’s program.

We think that some of these limitations are less concerning than they may appear at first glance. More importantly, though, the same limitations broadly apply to the evidence of long-term impact (aside from bednets’ impact on mortality) for our recommended health interventions. The situations are not exactly analogous, but the question of which interventions have stronger vs. weaker evidence for long-term impact (aside from bednets’ impact on mortality) does not have an obvious answer.

We speculate that individual donors instinctively imagine that the evidence around many programs is more robust than it is. (We know that we did so when we first started GiveWell.) If this is the case, we’re glad that our recommendation of a cash transfer program – which many people find intuitively unappealing – has prompted some of our followers to take a closer, more skeptical look at this evidence.

Determining that a given intervention – whether health, cash transfers, or anything else – has long-lasting impacts on quality of life is extremely challenging for a multitude of reasons. It requires researchers to track the same people for a long period of time, and to collect accurate and relevant data from these people that can shed light on their quality of life. It requires both funders and researchers to have substantial patience and foresight, making plans 5-10 years in advance (and a lot can change in 5-10 years). As a result, informative data about long-term impacts can be hard to come by, and interpreting such data requires substantial judgment calls.

(An easier form of long-term impact to assess is the impact on mortality. Since most people who make it past their fifth birthday live past age 60, we believe it is relatively safe to equate averting a child’s death to long-term impact. However, different people have different intuitions on how to value averting the death of a child under five vs. improving someone’s long-term quality of life. In addition, while there is strong evidence that bednets avert mortality, the case for deworming rests on life improvement. In this post, we focus on the evidence for long-term life improvement rather than on the evidence for mortality reduction, which is quite robust for bednets.)

Learning about the limitations described in this post has made us (a) more confident that using rigid criteria and definitions of “evidence-based” is the wrong path; (b) more favorably inclined toward interventions that seem to require unusually low burdens of proof (a description that we believe all of our top charities currently fit).

The rest of this post goes into more detail on the limitations to the evidence of long-term impact for cash transfers, and how these compare to the limitations to the evidence for bednet distribution and deworming.

We will discuss the relative cost-effectiveness of cash transfers next week.

Limitation 1: not many studies
The case for the long-term benefits of cash transfers rests largely on one high-quality (randomized) long-term study of conditional cash transfers as well as one high-quality long-term study of unconditional grants to microenterprises. Neither of the programs studied is exactly like GiveDirectly’s program, and both could have taken place in a substantially different context; this issue is discussed in the following section. This section addresses the simple fact that there are not many studies on the topic.

We have long argued that no one study should be considered a “final word” on the effectiveness of a program, even if the program studied was exactly the same as the program of interest; there are many reasons that a single study might be unreliable, and that its results might fail to hold up upon replication. (For more on this topic, see our discussion of meta-research as well as John Ioannidis’s work on replicability in biomedical research). Thus, having a small number of relevant studies is a significant concern.

However, similar concerns apply in other cases.

The case for the long-term benefits of deworming rests on two studies. One is a high-quality randomized study. The other is a retrospective (non-randomized) examination of a hookworm eradication program in the American South in the early 20th century.

The case for the long-term developmental impacts of insecticide-treated nets includes very little in the way of direct studies: just one retrospective (non-randomized) analysis similar to the second study of deworming mentioned above. However, there are other reasons to be optimistic about the long-term impacts of bednets. One is that bednets have been shown to avert deaths, an impact that can be measured over the short run but has clear long-term significance (though how one ought to value this impact, relative to something like “improved income in adulthood,” is an open question). Another reason is that multiple studies have found substantial short-term health impacts for children under five, and there are studies in a variety of other areas making a case for the connection between under-five health and later-in-life developmental benefits. (We have not written extensively about the latter, though we will do so eventually.)

Even putting aside lives saved, if I had to bet on one intervention to have long-term impacts, I’d bet on nets – though if one rigidly requires top-quality randomized studies of the exact intervention itself, the case for nets is weakest. Regardless, the case for all three interventions is quite limited.

Limitation 2: limited representativeness
The case for the long-term benefits of cash transfers rests on one study of conditional cash transfers (which were made with certain requirements, and which were structured as small recurring transfers while GiveDirectly’s grants are structured as larger one-time transfers) and on one study of grants to microenterprises (which, while made with no strings attached and made one time only, were targeted specifically at people running microenterprises and were also smaller than GiveDirectly’s transfers).

We don’t believe conditionality is a major issue. In examining the impacts of cash transfers, we have focused on impacts that we feel aren’t plausibly related to the conditions. Data about how people spend their money, and what returns they earn on it, seem unlikely to be driven by the sorts of conditions imposed in these programs, which generally pertain to sending children to school, bringing them in for health checkups, etc. (In fact, we would guess that following conditions would be likely to reduce rather than increase consumption and investment returns for adults, by reducing child labor and/or reallocating time and other resources toward children.) The size and structure of cash transfers may be a major issue, though we would guess (as reasoned in our writeup) that GiveDirectly’s version would be more conducive to higher rates of investment and thus greater long-term returns.

Again, there are similar issues with the evidence for nets and deworming.

  • The key study of deworming was of an annual deworming program in an area with extremely high rates of infection (particularly of schistosomiasis – see our recent post on the matter). Because of the proximity to Lake Victoria and the role El Nino played in the study, we’d guess that most of SCI’s work takes place in areas with much lower rates of infection. Much of SCI’s work involves deworming people less frequently than they were dewormed in the studies (details).
  • The high-quality studies of nets involved unusually intensive programs, with constant replacement and checking up on nets. In some cases, programs were structured quite differently – involving treatment of existing nets (rather than distribution of long-lasting insecticide-treated nets), social marketing of nets (rather than free distribution), etc. As far as we can tell, the conditions of AMF’s distributions are likely to resemble those in the studies in relevant ways (particularly usage of nets), but this isn’t something we have definitively established.

In addition to these differences, there is a broader difference that we feel is quite important: studies took place at very different places and times, and with different populations, which could be particularly significant for economic impacts – the main things we are taking as evidence of long-term impact. This applies to all the studies in question.

Limitation 3: reliance on self-reported data
The studies of cash transfers rely on recipients’ reports of their earnings and/or consumption.

Of the limitations discussed in this post, this is the one we’re least concerned about. We do believe that self-reported data is likely to be highly misleading in certain contexts, such as when it is clear to the person being surveyed what sort of answer the surveyor is hoping for (or when some answers are more socially acceptable than others). We feel this is a valid reason to be skeptical of e.g. GiveDirectly’s data on how people spent their transfers (and GiveDirectly concedes as much). However, the studies of cash transfers take a quite different approach: they randomly assign people to treatment and control groups and perform highly extensive surveys of people in both groups, attempting to quantify consumption and other factors. (To get a sense for how extensive such surveys can be, see GiveDirectly’s survey instrument for its ongoing study; the survey instrument we examined for our reanalysis of the evidence for deworming was similarly extensive.) It generally seems to us to be a consensus among scholars that more complex surveys of this type are more reliable, because it becomes easier to answer straightforwardly than to intuit what sorts of answers are being sought (for example, see our notes on speaking with Richard Cibulskis of the World Malaria Report (DOC)).

The long-term followup on deworming we recently discussed relies on similar survey data (i.e., participants’ self-reports of their earnings). And we also rely on such survey data in estimating the rate at which insecticide-treated nets are used.

Two more considerations regarding the relative strength of the evidence for cash transfers vs. deworming

  • We have recently completed a thorough reanalysis of the main study on deworming, and have not done a similar analysis of the studies on returns to cash transfers.

Revisiting the case for developmental effects of deworming

This post discusses our detailed examination (including, with help from the authors, reanalyzing raw data) of the Miguel and Kremer 2004 study on deworming (treating people for parasite infections) as a way to raise school attendance, and a followup study (Baird et al. 2012) on the later-in-life impacts.

Our current #3 charity, SCI, focuses on deworming. Deworming is quite cheap (we estimate that SCI spends ~$0.50 per person treated, including all costs) but the benefits are not as obvious and tangible as those of many other health programs because the parasites treated cause few deaths and their effects may be subtle. (So much so that a recent Cochrane review fails to find statistically significant impacts of population deworming on the outcomes that have been most studied). In our view, the case that deworming is a “good buy” depends heavily on the idea of developmental effects: the possibility that deworming children has a subtle, lasting impact on their development, and thus on their ability to be productive and successful throughout life.

The main evidence for this idea comes from (a) Bleakley 2004, a study of the Rockefeller Sanitary Commission’s campaign to eradicate hookworm in the American South in the early 20th century; (b) a series of studies in Kenya, in which school deworming was rolled out on a purposefully arbitrary (randomization-like) basis, and children who received more years of deworming were compared to children who had received fewer. This post focuses on the latter.

We have long had questions about these studies, relating both to the possibility of publication bias and to possible ways in which the setting of the studies was unrepresentative. This year, we decided to ask the study authors for the data and code behind their studies, so that we could run additional analyses to gain more information about the seriousness of our concerns. The authors graciously shared their data and code and helped us to interpret it.

The full details of our reanalysis are available here. A big-picture summary of our concerns and findings follows.

Is data-mining a concern?
The concern

Data-mining is a form of publication bias, in which researchers look at many possible analyses of the data they collect, and present only the analyses most favorable to the conclusions they’re hoping to find. We were concerned about this issue for the series of deworming studies because

  • The first study (Miguel and Kremer 2004) drew a great deal of attention for its positive result (regarding the positive impact of deworming on school attendance), raising the possibility that researchers had the incentive to declare positive results from subsequent studies as well.
  • Different studies used different definitions of “treatment group,” and emphasized different outcomes (for example, the initial study emphasized school attendance; the second emphasized height; the third emphasized earnings).

What we found

After performing our own analysis, we are less concerned about this issue than we were previously. Changing our definition of the “treatment group” didn’t have much impact on the findings, and while the authors did share some outcomes with us that had not been reported in the paper, there were not a lot of these and they didn’t significantly change the picture.

That said, there do remain some reasons to be concerned about this issue. The authors of the most recent follow-up study (the one emphasizing earnings) shared not only data but also their funding proposal and survey questionnaire with us, and we note that

  • The survey was extensive, and included a lot of information that wasn’t included in the data that researchers analyzed. We weren’t surprised to see that this was the case (the follow-up study aimed to collect a rich data set, not just data customized for the questions it was asking), but it still raises the possibility that bias could have crept into the process of transforming raw survey answers into analyzable data.
  • The funding proposal expressed an interest in a wide variety of outcomes, including educational attainment, labor market outcomes (measurable in multiple ways, though the authors stated to us that one of the statistically significant positive effects described in the paper, labor market earnings, is the “canonical” measure for the field of labor economics), cognitive performance, happiness, health measures, and more; it did not clearly declare that any particular one was the primary outcome of interest.

We attempted some rudimentary analysis of whether this fact could have facilitated spurious findings, and didn’t find strong reason to think that the findings were spurious. However, we struggled with the question of how a study’s findings should be adjusted for the fact that it had a larger universe of multiple outcomes that it could have chosen to emphasize; normal statistical adjustments for multiple comparisons do not perform well in cases with large numbers of ambiguous outcomes.
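To illustrate why a large universe of candidate outcomes matters, here is a minimal sketch of the standard arithmetic. It assumes that the outcomes are statistically independent and that each is tested at a 5% significance level; the outcome counts are hypothetical and are not taken from the study, and the function names are illustrative. It shows the chance of at least one spurious "significant" result, alongside the stricter per-test threshold that a Bonferroni correction would impose.

```python
def familywise_error_rate(num_outcomes: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** num_outcomes


def bonferroni_threshold(num_outcomes: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / num_outcomes


if __name__ == "__main__":
    # Hypothetical numbers of candidate outcomes (assumed, not from the study).
    for k in (1, 5, 20):
        fwer = familywise_error_rate(k)
        threshold = bonferroni_threshold(k)
        print(f"{k:>2} outcomes: P(at least one false positive) = {fwer:.3f}, "
              f"Bonferroni per-test threshold = {threshold:.4f}")
```

Under these assumptions, 20 independent outcomes would give roughly a 64% chance of at least one spurious finding. Real survey outcomes (such as earnings and hours worked) are correlated with one another and cannot be cleanly enumerable in advance, which is part of why such adjustments are hard to apply in this case.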

We feel that the issues above would have been much easier to resolve if the authors of the studies had preregistered their studies, declaring in advance what their primary and secondary outcomes and analyses of interest were. This is not to say that the authors have done anything unusual or wrong; our understanding is that preregistration was extremely rare (perhaps nonexistent) in economics at the time the study was done. In fact, Ted Miguel, one of the authors of the study, has gone on to co-author one of the first studies we’ve seen in this field that does utilize (and discuss) preregistration. Our point here is just that preregistration substantially increases the credibility of findings for us as consumers of research, and would have affected our qualitative assessment of these findings.

How well would the studies generalize to other settings?
The area in which the deworming experiment was conducted had unusually high infection rates; in fact, infection rates rose substantially over the course of the study, as shown in Tables II and V of Miguel and Kremer 2004:

Measure | Year 1 prevalence | Year 2 prevalence
Moderate to heavy schistosomiasis infection | 7% | 18%
Moderate to heavy hookworm infection | 15% | 22%
Moderate to heavy roundworm infection | 16% | 24%
Moderate to heavy whipworm infection | 10% | 17%

The above table likely substantially understates the degree of change, because the second-year figure includes the benefits of treatment externalities experienced by the control group (discussed above). A calculation sent to us by the authors implied that the 18% prevalence of moderate to heavy schistosomiasis infection in the control group in year 2 shown above should be augmented by the 22 percentage point externality effect of treatment to get a genuine counterfactual infection rate of 40% – despite the fact that the initial prevalence was (as shown above) only 7%. This implies that without the program, the area would have seen an extreme rise in prevalence of moderate-to-heavy schistosomiasis infections. A footnote in Miguel and Kremer 2004 attributes this phenomenon to “the extraordinary flooding in 1998 associated with the El Niño weather system, which increased exposure to infected fresh water (note the especially large increases in moderate-to-heavy schistosomiasis infections), created moist conditions favorable for geohelminth larvae, and led to the overflow of latrines, incidentally also creating a major outbreak of fecal-borne cholera” (Pg 174).
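The arithmetic behind the counterfactual figure can be laid out explicitly. The sketch below uses only the schistosomiasis numbers quoted above, expressed in whole percentage points; the variable names are illustrative.

```python
# All values are in percentage points and refer to
# moderate-to-heavy schistosomiasis infection.
year1_prevalence = 7            # initial prevalence (year 1)
year2_control_prevalence = 18   # control group prevalence in year 2
externality_effect = 22         # reduction the control group received via externalities

# Prevalence the control group would have had with no program nearby.
counterfactual_year2 = year2_control_prevalence + externality_effect

# Rise in prevalence as observed, and as implied by the counterfactual.
observed_rise = year2_control_prevalence - year1_prevalence
counterfactual_rise = counterfactual_year2 - year1_prevalence

print(f"Counterfactual year 2 prevalence: {counterfactual_year2}%")
print(f"Observed rise: {observed_rise} percentage points")
print(f"Counterfactual rise: {counterfactual_rise} percentage points")
```

In other words, the table shows an 11 percentage point rise, while the counterfactual implies a 33 percentage point rise from the same 7% starting point.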

Because of this unusual situation, we worry that the results of studies from this place and time may not generalize well to other circumstances in which rates are at lower, more typical levels.

What we found

We did a couple of analyses to see whether the headline effects were sensitive to the prevalence of moderate-to-heavy infections, particularly schistosomiasis infections. As we expected, we did see some reason to believe that deworming had had larger impacts in higher- than in lower-prevalence areas, and that it had had larger impacts for schools with substantial schistosomiasis prevalence. That said, what we found was far from sufficient to completely explain away the studies’ findings. In particular, dividing the schools into those that did and didn’t have substantial schistosomiasis prevalence left us working with fairly small sample sizes, from which it is difficult to conclude anything.

Overall, we are moderately less concerned about this issue than we were before.

Other alternative hypotheses
The findings of the studies discussed here – particularly the follow-up showing substantially improved earnings, resulting from just a couple of rounds of additional deworming treatment in childhood – have always struck us as quite counterintuitive and surprising. As such, we’ve been particularly attentive to alternative explanations of the findings.

One possible alternative explanation is that parents/students may have sought to switch into the schools that “won” early deworming treatments, which could cause the treatment and control groups to differ in ways not picked up by the baseline data measured by the studies. Further discussion with the authors made this concern appear far less likely to us (our understanding is that the deworming program was announced very close to the time when students were registered as “treatment” or “control,” and that school transfers after the start of the program were rare and relatively symmetric between schools).

Another possibility that has become more salient to us in the course of analyzing these studies is that efforts to encourage students to attend school in order to receive treatment might have bled over to later days, increasing attendance in treatment schools over the following years. The particular piece of data that led us to examine this possibility is that within schools, there is no statistically significant difference in attendance rates for treated and untreated students (the effects only appear across schools). (The authors assume that this phenomenon occurred due to the presence of within-school externalities.) In the course of analyzing the studies more closely, we learned that treatment dates were announced at the school in advance in an attempt to boost take-up, and that some efforts were undertaken to boost attendance on drug distribution days.

Finally, we wondered whether the results of these studies might be driven by a few “outlier” schools, but after the analysis we’ve done of the raw data, we are now convinced that this is not an issue.

Bottom line
By sharing their data, code, and other materials with us, the authors of these studies helped us to perform analyses that ultimately gave us more confidence in their results. Several of the aspects of these studies that most worried us (particularly regarding publication bias) turned out not to be important to the conclusions.

We still have substantial reservations about the studies. Preregistration would have been an additional measure that could have increased our confidence and lowered our concerns. In fact, this is the single case we’ve seen in which preregistration would have had the most influence on our conclusions. Had the authors preregistered hours worked or income amongst those employed (the key metrics showing improvement in Baird et al. 2012) as their main outcome of interest prior to collecting follow-up data, we would have far more confidence in the validity of the findings.

Going forward, field replications (carrying out similar deworming programs, and similar analysis to see whether similar results are obtained) would – in our view – greatly improve the robustness of the evidence.

In our view, the vast majority of aid interventions have almost no rigorous evidence behind them. A very small set of interventions – including LLIN distribution – have a broad, impressive evidence base. Deworming is somewhere in between. The studies discussed here are rigorous, have highly encouraging findings, and held up to the best scrutiny we could bring to them. At the same time, many questions remain unanswered. This is one of the areas in which an additional long-term study would have the most effect on our views.

Conference call discussing our top charities, Thu. Dec 6th, 7pm Eastern

We put a lot of effort into making our research process and reasoning transparent so that anyone can understand and vet the thinking behind our charity recommendations.

Consistent with this, we will be holding a conference call on Thursday, December 6th at 7pm Eastern to discuss our recently updated recommendations. The call is open to anyone who registers via our online form. Staff will take questions by email and answer them over the conference line. If you can’t make this date but would be interested in joining another call at a later date, you can indicate this on the registration form.

If you’re thinking of giving to one of our top charities this year, or you’re just curious about our thinking, we welcome you to join.

We’ve recently held similar discussions with smaller groups of GiveWell supporters. Audio and transcripts from these are available on the conference call page on our website.

Register for the December 6th GiveWell Conference Call