The GiveWell Blog

Some considerations against more investment in cost-effectiveness estimates

When we started GiveWell, we were very interested in cost-effectiveness estimates: calculations aiming to determine, for example, the “cost per life saved” or “cost per DALY saved” of a charity or program. Over time, we’ve found ourselves putting less weight on these calculations, because we’ve been finding that these estimates tend to be extremely rough (and in some cases badly flawed).

One can react to what we’ve been finding in different ways: one can take it as a sign that we need to invest more in cost-effectiveness estimation (in order to make it more accurate and robust), or one can take it as a sign that we need to invest less in cost-effectiveness estimation (if one believes that estimates are unlikely to become robust enough to take literally and that their limited usefulness can be achieved with less investment). At this point we are tentatively leaning more toward the latter view; this post lays out our thinking on why.

This post does not argue against the conceptual goal of maximizing cost-effectiveness, i.e., achieving the maximal amount of good per dollar donated. We strongly support this conceptual goal; rather, we are arguing that focusing on directly estimating cost-effectiveness is not the best way to maximize cost-effectiveness. We believe there are alternative ways of maximizing cost-effectiveness – in particular, making limited use of cost-effectiveness estimates while focusing on finding high-quality evidence (an approach we have argued for previously and will likely flesh out further in a future post).

In a nutshell, we argue that the best currently available cost-effectiveness estimates – despite having extremely strong teams and funding behind them – have the problematic combination of being extremely simplified (ignoring important but difficult-to-quantify factors), extremely sensitive (small changes in assumptions can lead to huge changes in the figures), and not reality-checked (large flaws can persist unchecked – and unnoticed – for years). We believe it is conceptually difficult to improve on all three of these at once: improving on the first two is likely to require substantially greater complexity, which in turn will worsen the ability of outsiders to understand and reality-check estimates. Given the level of resources that have been invested in creating the problematic estimates we see now, we’re not sure that really reliable estimates can be created using reasonable resources – or, perhaps, at all.

We expand on these points using the case study of deworming, the only DCP2 estimate that we have enough detail on to be able to fully understand and reconstruct.

Simplicity of the estimate
The estimate is extremely simplified. It consists of:

  • Costs: two possible figures for “cost per child treated,” one for generic drugs and one for name-brand drugs. These figures are drawn from a single paper (a literature review published 3 years prior to the publication of the estimate); costs are assumed to scale linearly with the number of children treated, and to be constant regardless of the region.
  • Drug effectiveness: for each infection, a single “effectiveness” figure is used, i.e., treatment is assumed to reduce disease burden by a set percentage for a given disease. For each infection, a single paper is used as the source of this “effectiveness” figure.
  • Symptoms averted: the prevalence of different symptoms is assumed to be different by region, but the regions are broad (there are 6 total regions). Prevalence figures are taken from a single paper. The severity of each symptom is assumed to be constant regardless of context, using standard disability weights. Effective treatment is presumed to prevent symptoms for exactly one year, with no accounting for externalities, side effects, or long-term effects (in fact, in the original calculation even deaths are assumed to be averted for only one year).
  • Putting it all together: the estimate calculates the benefits of deworming by estimating the number of children cured of each symptom for a single year (based on the six regional figures on how common symptoms are), converting to DALYs using its single set of figures on how severe each symptom is, and multiplying by the single drug-effectiveness figure. It then divides the costs, which are again computed from a single per-child figure, by these DALY-denominated benefits.

No sensitivity analysis is included to examine how cost-effectiveness would vary if certain figures or assumptions turned out to be off. No adjustments are made to address issues such as (a) the high uncertainty of many of the figures (which has implications for overall cost-effectiveness) or (b) the fact that the figures are taken from a relatively small number of studies and are thus likely to be based on unusually well-observed programs.
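The structure described above fits in a few lines, which also makes the missing sensitivity analysis easy to picture. What follows is a minimal sketch of that structure under stated assumptions, not the DCP2 spreadsheet: the class and function names are our own, and every input in the usage lines is hypothetical, chosen only to show how the output responds when a single input changes.

```python
from dataclasses import dataclass


@dataclass
class Symptom:
    prevalence: float         # share of treated children who have the symptom
    disability_weight: float  # 0 = full health, 1 = equivalent to death
    effectiveness: float      # share of cases the drug is assumed to clear


def cost_per_daly(children: int, cost_per_child: float, symptoms: list[Symptom]) -> float:
    """Cost per DALY averted, assuming each cure lasts exactly one year."""
    years_averted = 1.0
    dalys = sum(
        children * s.prevalence * s.effectiveness * s.disability_weight * years_averted
        for s in symptoms
    )
    # Costs are a single per-child figure, scaling linearly with children treated.
    total_cost = children * cost_per_child
    return total_cost / dalys


# Hypothetical inputs, used only to show how the output responds to one input.
base = cost_per_daly(100_000, 0.50, [Symptom(0.20, 0.50, 0.90)])
halved_weight = cost_per_daly(100_000, 0.50, [Symptom(0.20, 0.25, 0.90)])
print(f"base: ${base:.2f}/DALY, with half the disability weight: ${halved_weight:.2f}/DALY")
```

Because `children` appears in both the numerator and the denominator, it cancels: the result is just the per-child cost divided by the summed prevalence, effectiveness, and weight terms. Every input enters multiplicatively, so when one symptom dominates the total, an error of 20 times in that symptom's disability weight moves the final figure by roughly 20 times as well.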

In our view, any estimate this simple and broad has very limited application when examining a specific charity operating in a specific context.

Sensitivity of the estimate
The estimate is extremely sensitive to changes in inputs. In the course of examining it and trying different approaches to estimating the cost-effectiveness of deworming, we arrived at each of the following figures at one point or another (in every row but the last, "+" marks a change added to the calculation in the row above and "-" marks a change removed):

Cost per DALY for STH treatment | Key assumptions behind this cost
$3.41 | original DCP2 calculation
$23.92 | +corrected disability weight of ascariasis symptoms
$256 | -corrected disability weight of ascariasis symptoms; +corrected prevalence interpretation for all STHs and symptoms; +corrected disability weight of trichuriasis symptoms
$529 | +corrected disability weight of ascariasis symptoms
$385 | +incorrectly accounting for long-term effects
$326 | -incorrectly accounting for long-term effects; +corrected duration of trichuriasis symptoms
$138 | +correctly accounting for long-term effects
$82.54 | Jonah’s independent estimate, implicitly accounting for long-term effects and using lower drug costs

Our final corrected version of the DCP2’s estimate also varies heavily across regions:

Cost per DALY for STH treatment | Region
$77.39 | East Asia & Pacific
$83.16 | Latin America & Caribbean
$412.22 | Middle East & North Africa
$202.69 | South Asia
$259.57 | Sub-Saharan Africa
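To put a number on "extremely sensitive" and "varies heavily," the snippet below takes the ratio of the largest to the smallest figure in each of the two tables above. It is plain arithmetic on the published figures, with variable names of our own choosing. Some rows in the first table reflect intermediate or incorrect steps, so its spread describes how far the answer moved during our investigation rather than a formal uncertainty range.

```python
# Cost-per-DALY figures (US dollars) copied from the two tables above.
SENSITIVITY_RUNS = [3.41, 23.92, 256, 529, 385, 326, 138, 82.54]
# One figure per region, in the order the regional table lists them.
REGIONAL = [77.39, 83.16, 412.22, 202.69, 259.57]


def spread(values: list[float]) -> float:
    """Ratio of the largest figure to the smallest."""
    return max(values) / min(values)


print(f"Across modeling choices: {spread(SENSITIVITY_RUNS):.0f}x")
print(f"Across regions: {spread(REGIONAL):.1f}x")
```

On these figures the first ratio comes out at about 155 and the second at about 5.3.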

Lack of reality-checks
As we wrote previously, we believe that a helminth expert reviewing this calculation would have noticed the errors that we pointed to. This is because when one examines the details of the (uncorrected) estimate, it becomes clear that nearly all of the benefits of deworming are projected to come from a single symptom of a single disease – a symptom which is, in fact, only believed to be about 1/20 as severe as the calculation implies, and only about 1/100 as common.

So why wasn’t the error caught between its 2006 publication (and numerous citations) and our 2011 investigation? We can’t be sure, but we can speculate that:

  • The DALY metric – while it has the advantage of putting all health benefits in the same units – is unintuitive. We don’t believe it is generally possible to look at a cost-per-DALY figure and compare it with one’s informal knowledge of an intervention’s costs and benefits (though it is more doable when the benefits are concentrated in preventing mortality, which eliminates one of the major issues with interpreting DALYs).
  • That means that in order to reality-check an estimate, one needs to look at the details of how it was calculated.
  • But looking at the details of how an estimate is calculated is generally a significant undertaking – even for an estimate as simple as this one. It requires a familiarity with the DALY framework and with the computational tools being used (in this case Excel) that a subject matter expert – the sort of person who would be best positioned to catch major problems – wouldn’t necessarily have. And it may require more time than such a subject matter expert will realistically have available.

In most domains, a badly flawed calculation – when used – will eventually produce strange results and be noticed. In aid, by contrast, one can use a completely wrong figure indefinitely without ever finding out. The only mechanism for catching problems is to have a figure that is sufficiently easy to understand that outsiders (i.e., those who didn’t create the calculation) can independently notice what’s off. It appears that the DCP2 estimates do not pass this test.

Our point here isn’t about the apparent lack of formal double-check in the DCP2’s process (though this does affect our view of the DCP2) but about the lack of reality-check in the 5 years since publication – the fact that at no point did anyone notice that the figure seemed off, and investigate its origin.

And the problem pertains to more than “catching errors”; it also pertains to being able to notice when the calculation becomes out of line with (for example) new technologies, new information about the diseases and interventions in question, or local conditions in a specific case. An estimate that can’t be – or simply isn’t – continually re-examined for its overall and local relevance may be “correct,” but its real-world usefulness seems severely limited.

The dilemma: the less simplified and sensitive, the more esoteric
It currently appears to us that the general structure of these estimates is too simplified and sensitive to be reliable without relatively constant reality-checks from outsiders (particularly subject matter experts), but so complex and esoteric that these reality-checks haven’t been taking place.

Improving the robustness and precision of the estimates would likely have to mean making them far more complex, which in turn could make it far more difficult for outsiders (including subject matter experts) to make sense of them, adapt them to new information and local conditions, and give helpful feedback.

The resources that have already been invested in these cost-effectiveness estimates are significant. Yet in our view, the estimates are still far too simplified, sensitive, and esoteric to be relied upon. If such a high level of financial and (especially) human-capital investment leaves us this far from having reliable estimates, it may be time to rethink the goal.

All that said – if this sort of analysis were the only way to figure out how to allocate resources for maximal impact, we’d be advocating for more investment in cost-effectiveness analysis and we’d be determined to “get it right.” But in our view, there are other ways of maximizing cost-effectiveness that can work better in this domain – in particular, making limited use of cost-effectiveness estimates while focusing on finding high-quality evidence (an approach we have argued for previously and will likely flesh out further in a future post).

GiveWell is aiming to have a new #1 charity by December

Our current top-rated charity is VillageReach. In 2010, we directed over $1.1 million to it, which met its short-term funding needs (i.e., its needs for the next year or so).

VillageReach still has longer-term needs, and in the absence of other giving opportunities that we consider comparable, we’ve continued to feature it as #1 on our website. However, we’ve also been focusing most of our effort this year on identifying and investigating other potential top-rated charities, with the hope that we can refocus attention on an organization with shorter-term needs this December. (In general, the vast bulk of our impact on donations comes in December.) We believe that we will be able to do so. We don’t believe we’ll be able to recommend a giving opportunity as good as giving to VillageReach was last year, but given VillageReach’s lack of short-term (1-year) room for more funding, we do expect to have a different top recommendation by this December.

We haven’t been updating our rankings continuously; we prefer to do very deep investigations of top contenders, and aim for an all-at-once refresh in time for December. This is largely because we’ve continued to raise the bar for what it takes to become a top charity. For example, since we’ve found field visits to be useful, we now have a strong preference to avoid naming a charity “top-rated” before we’ve seen its work on the ground (for this reason, staff is currently split up between Malawi and India, visiting contender charities; we will post notes and pictures after we return and get the content approved by charities we’ve visited). More generally, we are looking to examine a charity from many different angles and have a high level of confidence before we start directing significant funds to it.

Bottom line – by December, we will have a new “top-rated” charity. This is not a “demotion” of VillageReach; rather, it reflects our success in directing enough funding to it to close its short-term gap.

What it takes to evaluate impact

When someone asks me what makes GiveWell different from other third-party charity evaluators, I often answer by listing all the things we’ve done in order to investigate our current top-rated charity, VillageReach.

All in all, we’ve spent hundreds of hours examining VillageReach – yet we still feel very far from being “settled” on the question of how promising its activities are. Like any outstanding opportunity to do good, VillageReach’s work involves large and complex challenges. We’ll never have 100% of the relevant information or 100% certainty on its merits, but because we’ve recommended VillageReach so highly and moved over $1 million to it, it’s important to us that we do the best we can.

It isn’t realistic to do this kind of in-depth investigation for thousands (or even hundreds) of charities. We have to save our resources for the most promising charities if we want to have a reasonable level of confidence in our top recommendations. That means we take shortcuts on less promising charities, and we don’t put in the work it would take to distinguish between “worst,” “bad,” “mediocre” and “decent” groups – we’re laser-focused on the ones that we consider “best.”

Other independent charity evaluators tend to measure themselves by how many charities they rate. They exist largely for donors who already know where they want to give, and want a basic legitimacy check before they finalize the donation. To accommodate this goal, these other evaluators need to be far less thorough and more simplified than we are. That means – in our view – that they have no realistic chance of ever meaningfully rating impact, i.e., the degree to which a charity is succeeding at its mission.

GiveWell isn’t for everyone. Donors looking to check the charity they already want to give to are better off with other resources. But for donors who don’t already have a charity in mind and are looking to maximize their impact, we don’t know of any other group that provides a comparable product.

GiveWell Labs: Our criteria for giving opportunities

[Added August 27, 2014: GiveWell Labs is now known as the Open Philanthropy Project.]

We’re starting a new initiative, GiveWell Labs, an arm of our research process that will be open to any giving opportunity, no matter its form or sector.

This post lays out, very broadly, what qualities we are looking for in giving opportunities. Future posts will elaborate on each of these criteria, and we will also discuss how we think these criteria apply to specific areas of philanthropy. Readers will hopefully be left with a strong sense of our beliefs and biases and what we’re looking for.

The main things we’re looking for in a giving opportunity are:

  1. Upside: we’d prefer to fund projects that have the potential to go extremely well. Projects aiming to demonstrate a model that can be scaled up, generate new scientific knowledge that can be used by many others, or put a program in place that eventually becomes self-sustaining independent of philanthropic support all have “upside.” Simply aiming to deliver insecticide-treated nets using established delivery methods does not have much “upside” (though it may score well on many of these other criteria).
  2. High likelihood of success: we’d prefer to fund projects that are very likely to do a respectable amount of good per dollar. The “evidence base” of a project – i.e., the set of past well-understood events that can be used to put its likelihood of success in context – is key here. Obviously this criterion will often be in tension with the “upside” criterion; the ideal for us is a project that has both, i.e., a project that’s both very likely to do some good and has some possibility of doing enormous amounts of good (we think that giving to VillageReach in 2010 fit into this category).
  3. Accountability. We’re OK with funding a project that might fail, but it’s very important to us that we be able to recognize, document, publicly discuss, and learn from such a failure if it happens. We thus have a strong preference to fund projects with specific and meaningful deliverables that will give us a strong sense of whether things are going as hoped (as well as permission to publish updates on these deliverables). We are relatively new to giving and plan to be doing a lot more of it in the future, so making sure that early projects are learning opportunities is crucial.
  4. People we’re confident in. We prefer to fund projects where we are impressed by and confident in the people involved. However, our take on how to evaluate people seems to be different from that of some other funders; we’ll elaborate in a future post.
  5. Room for more funding. We prefer to fund projects that would not happen without our funding. This means that we aren’t actually looking for the “best ways to spend philanthropic funds”; we’re looking for the “best ways to spend philanthropic funds that aren’t already on the agendas of other funders.”

We don’t have an explicit formula for weighing the above criteria against each other. Broadly speaking, we’d prefer to fund an opportunity that is strong on all of the following: (a) at least one of #1 and #2; (b) at least one of #3 and #4; (c) #5. (Note that we do not feel the approach of estimating ‘expected good accomplished’ for each project, and simply ranking by this metric, is a good way to maximize actual expected good accomplished; for more, see the body and comments of a recent post on expected-value calculations.)
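The broad preference above has a simple logical shape, which the sketch below spells out as a yes/no screen. It is our illustrative reading, not a formula GiveWell uses (the paragraph says plainly that there is none), and it deliberately returns yes or no rather than a score, in keeping with the point that ranking projects by a single expected-good number is not a good approach. The function name and the dictionary keys are hypothetical; the keys simply follow the numbering of the criteria list.

```python
def strong_on_all_groups(strong: dict[int, bool]) -> bool:
    """`strong` maps criterion numbers 1-5 (as listed above) to a judgment call."""
    upside_or_likelihood = strong[1] or strong[2]      # (a) at least one of #1 and #2
    accountability_or_people = strong[3] or strong[4]  # (b) at least one of #3 and #4
    room_for_more_funding = strong[5]                  # (c) #5
    return upside_or_likelihood and accountability_or_people and room_for_more_funding


# A hypothetical project: high upside, people we trust, a clear funding gap.
print(strong_on_all_groups({1: True, 2: False, 3: False, 4: True, 5: True}))
```

Leverage, discussed next, is intentionally left out of the screen, since it is described as far less important than these criteria.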

One more consideration is leverage: we prefer projects where our funding mobilizes more funding from other givers as well, thus multiplying the impact of our funds in some sense. However, we think this is far less important than the criteria listed above. We’d rather fund a great project all on our own, and leave other funders to spend on their own projects, than get a 5:1 or 100:1 funding match from others on a project that is weak on the above criteria.

If you think we’re missing any important impact-related criteria, please let us know.

Update on GiveWell’s web traffic / money moved: Q3 2011

In addition to evaluations of other charities, GiveWell publishes substantial evaluation of itself, from the quality of its research to its impact on donations. This year, we have added quarterly updates regarding two key metrics: (a) donations to top charities made directly through our website and (b) web traffic.

Money moved

By “money moved” we mean donations to our top charities that we can confidently identify as being made on the strength of our recommendation. This update focuses only on “money moved” that comes through GiveWell’s website; we’ll report on all donations due to GiveWell’s research at the end of the year (when the majority of large gifts occur).

While money moved through the website is only a fraction of overall money moved (and is also far greater in December than in other months), we believe this is a meaningful metric for tracking our progress/growth (as opposed to overall influence).

The charts below show dollars donated and the number of donations by month. Overall, growth in 2011 has been strong.


We report money moved to each of our recommended charities annually, but we don’t plan to include this information in quarterly reports because (a) some donations have been made that we can’t yet attribute to an organization; and (b) overall, we don’t feel these figures are very meaningful or good predictors of what the year-end allocation will be.

Web traffic

The table below shows quarterly web traffic to GiveWell’s website.

Quarter | Visitors | Y/Y growth
Q1 2009 | 20,681 | n/a
Q2 2009 | 14,974 | n/a
Q3 2009 | 18,418 | n/a
Q4 2009 | 45,956 | n/a
Q1 2010 | 48,027 | 132%
Q2 2010 | 33,173 | 122%
Q3 2010 | 27,729 | 51%
Q4 2010 | 68,870 | 50%
Q1 2011 | 89,588 | 87%
Q2 2011 | 102,506 | 209%
Q3 2011 | 115,482 | 316%
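The "Y/Y growth" column compares each quarter with the same quarter one year earlier. The sketch below recomputes it from the visitor counts in the table as a quick consistency check; the names and the rounding to whole percentages are our own choices.

```python
# (quarter, visitors), in chronological order, copied from the table above.
QUARTERLY_VISITORS = [
    ("Q1 2009", 20_681), ("Q2 2009", 14_974), ("Q3 2009", 18_418), ("Q4 2009", 45_956),
    ("Q1 2010", 48_027), ("Q2 2010", 33_173), ("Q3 2010", 27_729), ("Q4 2010", 68_870),
    ("Q1 2011", 89_588), ("Q2 2011", 102_506), ("Q3 2011", 115_482),
]


def year_over_year(rows):
    """Yield (quarter, growth in whole percent) versus the same quarter a year earlier."""
    for i in range(4, len(rows)):  # the first four quarters have no year-earlier figure
        quarter, visitors = rows[i]
        _, year_earlier = rows[i - 4]
        yield quarter, round((visitors / year_earlier - 1) * 100)


for quarter, growth in year_over_year(QUARTERLY_VISITORS):
    print(f"{quarter}: {growth}%")
```

Each recomputed value matches the table.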

The charts below show our web traffic over time, including the latest quarter.


Errors in DCP2 cost-effectiveness estimate for deworming

Two notes on this post:

  • This post discusses flaws in a particular published cost-effectiveness estimate for deworming. It should not be taken as a general argument against deworming as a promising intervention, and it does not address various other publications on deworming including the 2003 paper by Edward Miguel and Michael Kremer.
  • Prior to publication, we sent a draft of this post to several relevant scholars including the authors of the estimate. They have reviewed our work and confirmed the major errors we point out.

Over the past few months, GiveWell has undertaken an in-depth investigation of the cost-effectiveness of deworming, a treatment for parasitic worms that are very common in some parts of the developing world. While our investigation is ongoing, we now believe that one of the key cost-effectiveness estimates for deworming is flawed, and contains several errors that overstate the cost-effectiveness of deworming by a factor of about 100. This finding has implications not just for deworming, but for cost-effectiveness analysis in general: we are now rethinking how we use published cost-effectiveness estimates for which the full calculations and methods are not public.

The cost-effectiveness estimate in question comes from the Disease Control Priorities in Developing Countries (DCP2), a major report funded by the Gates Foundation. This report provides an estimate of $3.41 per disability-adjusted life-year (DALY) for the cost-effectiveness of soil-transmitted-helminth (STH) treatment, implying that STH treatment is one of the most cost-effective interventions for global health. In investigating this figure, we have corresponded, over a period of months, with six scholars who had been directly or indirectly involved in the production of the estimate. Eventually, we were able to obtain the spreadsheet that was used to generate the $3.41/DALY estimate. That spreadsheet contains five separate errors that, when corrected, shift the estimated cost effectiveness of deworming from $3.41 to $326.43. We came to this conclusion a year after learning that the DCP2’s published cost-effectiveness estimate for schistosomiasis treatment – another kind of deworming – contained a crucial typo: the published figure was $3.36-$6.92 per DALY, but the correct figure is $336-$692 per DALY. (This figure appears, correctly, on page 46 of the DCP2.)
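The size of both errors can be checked directly from the figures in this paragraph. The snippet below is simple arithmetic on those published and corrected numbers (variable names are ours): the STH correction works out to a factor of roughly 96, in line with the "factor of about 100" mentioned above, and the schistosomiasis typo to a factor of exactly 100 at both ends of the range.

```python
# Published vs. corrected cost per DALY (US dollars), from the paragraph above.
sth_published, sth_corrected = 3.41, 326.43
schisto_published = (3.36, 6.92)  # low and high ends of the published range
schisto_corrected = (336, 692)

sth_factor = sth_corrected / sth_published
schisto_factors = [c / p for p, c in zip(schisto_published, schisto_corrected)]

print(f"STH: corrected figure is {sth_factor:.1f} times the published one")
print("Schistosomiasis:", [f"{f:.0f} times" for f in schisto_factors])
```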

We do believe that the corrected DCP2 calculations are too harsh on deworming; our best estimate of the cost-effectiveness of deworming is in between the corrected and uncorrected DCP2 figures, at $30-$80 per DALY. In addition, there are strong arguments for deworming as an excellent intervention that do not depend on these figures. Overall we consider deworming a highly promising (though not the single most promising) intervention; we will be discussing our thoughts on this intervention further in the future. This post focuses not on deworming in general, but on the DCP2 figures and what lessons we should take from the flaws in them.

  • The estimates on deworming are the only DCP2 figures we’ve gotten enough information on to examine in-depth. Getting to this point took a lot of work and communication with a number of different scholars, so we aren’t sure of the extent to which other estimates might also turn out to be flawed if examined closely.
  • We believe that the errors we’ve found in the estimate would have been caught by a helminth expert independently examining the estimate. Therefore, the presence of these errors implies to us that there has been no such examination. If this is the case, it would argue against the reliability of the DCP2’s estimates in general.
  • We’ve previously argued for a limited role for cost-effectiveness estimates; we now think that the appropriate role may be even more limited, at least for opaque estimates (e.g., estimates published without the details necessary for others to independently examine them) like the DCP2’s.
  • More generally, we see this case as a general argument for expecting transparency, rather than taking recommendations on trust – no matter how pedigreed the people making the recommendations. Note that the DCP2 was published by the Disease Control Priorities Project, a joint enterprise of The World Bank, the National Institutes of Health, the World Health Organization, and the Population Reference Bureau, which was funded primarily by a $3.5 million grant from the Gates Foundation. The DCP2 chapter on helminth infections, which contains the $3.41/DALY estimate, has 18 authors, including many of the world’s foremost experts on soil-transmitted helminths.
  • It is possible that we have made errors in our corrections to the calculation. One of the reasons we go to great lengths to be transparent is because we want our errors to be caught as quickly as possible.

Outline for the remainder of this post:

About the DCP2’s estimate

The DCP2 was published by the Disease Control Priorities Project, a joint enterprise of The World Bank, the National Institutes of Health, the World Health Organization, and the Population Reference Bureau, which was funded primarily by a $3.5 million grant from the Gates Foundation.

The Gates Foundation also appears to have invested substantially in the dissemination of the DCP2’s findings, including a $4.4 million grant to the Population Reference Bureau to “disseminate key messages from [the DCP2].”

The DCP2 aims to estimate the cost-effectiveness of different health interventions, in terms of dollars per disability-adjusted life-year (DALY) saved, in order to prioritize the most cost-effective interventions – the ones that will have the largest effects in reducing mortality and morbidity for a given amount of funding. The DCP2’s published estimates imply that soil-transmitted helminth (STH) treatment is one of the cheapest ways to improve health: the same “amount of health” could be provided by spending $1 on STH deworming or roughly $34 on family planning programs or more than $90 on treating drug-resistant tuberculosis. In fact, it appears that the DCP2 rates STH treatment as the second most cost-effective health intervention of all, behind only hygiene promotion (p. 41).

The DCP2’s cost-effectiveness estimates for deworming have been cited widely to advocate a greater focus on treating STH infections, including in:

  • an article (PDF) in The Lancet
  • a report (PDF) by REACH, a consortium of large international NGOs and other organizations working to end child hunger, which labeled deworming one of 11 “promoted interventions”
  • the most-cited paper (PDF) published in the journal International Health
  • an editorial by Peter Hotez, a co-founder of the Global Network for Neglected Tropical Diseases, which has received more than $40 million in funding from the Gates Foundation
  • work by charity evaluators, such as GiveWell, Giving What We Can, and the University of Pennsylvania’s Center for High Impact Philanthropy.

Why we decided to look into the DCP2’s deworming estimates

We undertook this research because:

  • We wanted to do a case study of a cost-effectiveness estimate from the DCP2, understanding the full details of what goes into it and where the room for error is.
  • We were particularly curious about the estimate for treatment of soil-transmitted helminths since the published $3.41 per DALY averted figure didn’t seem to sync with what we knew about the costs and effectiveness of STH treatment (or the independent estimate of $280/DALY given by another study, as we’ve mentioned previously).
  • We also wanted to focus on STH treatment since the DCP2 rates it as the second most cost-effective health intervention of all, behind only hygiene promotion.
  • Finally, we wanted to learn more about deworming after Elie visited the Schistosomiasis Control Initiative in London and we became more optimistic about this organization than we had been.

Our process for investigating the estimate

GiveWell took the following steps to investigate the DCP2’s estimate for the cost effectiveness of STH deworming:

  • We initially contacted Peter Hotez, the lead author of the DCP2 chapter on intestinal nematode infections; he sent us several papers on the costs and effectiveness of deworming and referred us to another scholar to explain the calculation that the DCP2 had published.
  • This scholar, in turn, referred us to two more, who sent us further references in response to our questions.
  • At this point we had an extended back-and-forth trying to understand the details of the calculation that had been done, and since we weren’t sure we would reach a conclusion on this, we asked volunteer Jonah Sinick to use all the references we’d been sent to create his own best-guess estimate of the cost-effectiveness of deworming. This estimate implied a significantly higher cost per DALY than the published figure, which seemed strange since we were now using the references and inputs suggested to us by the chapter authors.
  • The scholars we had been corresponding with sent us a spreadsheet with the full details of the calculation, as well as an accompanying table, which we will call Table 9, that had been used to input some of the figures in the spreadsheet. Here is the PDF of Table 9 that we were sent.
  • However, the interpretation of the numbers from Table 9 was still unclear to us. Table 9 is not clearly labeled; the scholars involved in the calculation appeared to have conflicting interpretations of what the numbers meant, and both meanings were highly counterintuitive to us (details below).
  • So we contacted another scholar who had worked on Table 9 to get her help in interpreting it. She sent us the full paper from which Table 9 was taken, Intestinal Nematode Infections, and this paper appeared to have a different interpretation of Table 9 than the spreadsheet’s. We confirmed this with her.
  • We also found the disability weights being used counterintuitive, and after some investigation we received confirmation that they were erroneous (details below).
  • All in all, we found five errors in the estimate, not all of which were attributable to the creator of the spreadsheet.

Problems with the official estimate of the cost-effectiveness of deworming

The basic approach of the estimate is to:

  • Calculate the benefits of deworming by
    • Starting from a population of schoolchildren being dewormed;
    • Estimating the percentage of these children suffering from different symptoms of infection;
    • Using the above, estimating the number of children cured of these symptoms (the estimate assumes that they are cured for exactly one year, since reinfection can occur after deworming);
    • Incorporating the severity of symptoms to arrive at DALYs saved by the deworming
  • Separately calculate the costs of deworming this population of schoolchildren, and divide costs by DALYs to obtain the cost per DALY.
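The steps above can be written out as a short function, which also makes it easy to see how much of the total benefit each symptom contributes, the question the next paragraph turns to. This is a hedged sketch of the described structure, not the DCP2 spreadsheet itself. The function names are our own; the first symptom uses the CIDTA inputs quoted below (45,060 of 208,530 children, disability weight 0.463); and the second symptom, the drug-effectiveness figure, and the per-child cost are placeholders we made up purely for illustration.

```python
def dalys_by_symptom(children, symptoms, effectiveness, years_cured=1.0):
    """Children treated -> cases -> cases cured -> DALYs, for each symptom.

    `symptoms` maps a name to (prevalence, disability_weight). A single
    `effectiveness` figure is applied to every symptom of the infection.
    """
    dalys = {}
    for name, (prevalence, weight) in symptoms.items():
        cases = children * prevalence
        cured = cases * effectiveness  # assumed cured for exactly `years_cured` years
        dalys[name] = cured * weight * years_cured
    return dalys


def cost_per_daly(children, cost_per_child, dalys):
    """Costs are computed separately, then divided by the total DALYs."""
    return children * cost_per_child / sum(dalys.values())


CHILDREN = 208_530
SYMPTOMS = {
    "CIDTA": (45_060 / CHILDREN, 0.463),           # inputs as quoted from the spreadsheet
    "other symptom (placeholder)": (0.05, 0.024),  # made-up illustrative values
}
dalys = dalys_by_symptom(CHILDREN, SYMPTOMS, effectiveness=0.9)  # 0.9 is a placeholder
total = sum(dalys.values())
for name, value in dalys.items():
    print(f"{name}: {value / total:.1%} of DALYs")
# $0.50 per child is a placeholder cost.
print(f"cost per DALY: ${cost_per_daly(CHILDREN, 0.50, dalys):.2f}")
```

With these inputs the CIDTA term supplies nearly all of the DALYs; different placeholder values would change the exact share, but not the fact that a high prevalence paired with a high disability weight dominates the sum.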

When we examined the details of the official estimate, it struck us that nearly all of the DALYs saved (i.e., nearly all of the benefit) were coming from the reduction of a single symptom of a single worm infection: cognitive impairment due to ascariasis (we abbreviate this as CIDTA). Specifically, the figures going into the estimate implied that:

  • In a hypothetical population of 208,530 children (aged 5-14, in Latin America) treated, 45,060 suffer from CIDTA (cells C44 and L44 in the “ascariasis” sheet). That’s about 22%.
  • The disability weight of CIDTA is 0.463 (cell E8). Disability weights are difficult to interpret, but this figure implies that having CIDTA is about half as bad as being dead (disability weight 1.0), and only slightly less debilitating than being blind (disability weight 0.6). (See the official list of disability weights published alongside the DCP2.) These figures implied (to us) that CIDTA was not a matter of subtle cognitive impairment, but of mental handicap so severe as to truly prevent normal functioning.
  • The intervention in question – a single dose of albendazole – could completely restore normal mental functioning (i.e., completely eliminate disability associated with CIDTA) for one year.

These implications were inconsistent with the information we had from other sources, such as the Global Burden of Disease (GBD) report published alongside the DCP2.

  • If ascariasis caused this sort of symptom, we’d expect to see much more focus on ascariasis (relative to other helminth infections) in the global health and deworming communities.
  • In addition (as we observed when trying to reconcile the official estimate with our own estimate), if 22% of the 110 million 5-14 year olds in Latin America (GBD, 198-199) had a disability with weight 0.463, then this – alone – would result in 11.2 million DALYs lost to ascariasis per year in this region (22% * 110 million * 0.463). However, the official DALY burden for ascariasis (all symptoms) among this population is only 31,000 (GBD, 198-199) – in fact, the worldwide DALY burden for ascariasis is only 915,000 (GBD, 180-181).
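The consistency check in the last bullet is simple arithmetic; the sketch below reproduces it from the figures quoted in the post. It assumes, as the official estimate does, that each case lasts one year.

```python
# Figures quoted in the post (GBD, pp. 180-181 and 198-199)
implied_prevalence = 0.22      # share of 5-14 year olds with CIDTA implied by the spreadsheet
population_5_14 = 110_000_000  # 5-14 year olds in Latin America
cidta_weight = 0.463           # disability weight used in the spreadsheet

# One year of disability per case, as in the official estimate
implied_dalys = implied_prevalence * population_5_14 * cidta_weight

gbd_regional_burden = 31_000   # GBD: ascariasis, all symptoms, this population
gbd_world_burden = 915_000     # GBD: ascariasis, worldwide

print(f"Implied DALYs lost per year: {implied_dalys:,.0f}")
print(f"Multiple of GBD regional burden: {implied_dalys / gbd_regional_burden:,.0f}x")
print(f"Multiple of GBD worldwide burden: {implied_dalys / gbd_world_burden:,.1f}x")
```

A single symptom in a single region would, on these inputs, exceed the GBD's worldwide ascariasis burden more than twelve times over, which is what prompted the closer look at CIDTA.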

We therefore did further investigation on the CIDTA symptom – both how prevalent it is and how severe it is. It turns out that the official calculation significantly overstates both. For example, among 5-14 year olds in Latin America, CIDTA affects about 0.23% of the population – not about 22% as the official calculation suggests – and its correct disability weight is 0.024 (the same severity as anemia), not 0.463.

Specifics of these errors:

  • Prevalence of CIDTA. The official calculation starts from a hypothetical population of 1 million people of all ages, then calculates the number of 5-14 year olds (per million people) using demographic data, then takes the number of CIDTA cases directly from Table 9 (this figure is multiplied by 10 before being put in the official spreadsheet). For example, for 5-14 year olds in Latin America, Table 9’s “A/B” column has the figure, “4506”; the official calculation records “45060” for the number of CIDTA cases among 5-14 year olds.

    The labeling of Table 9 is ambiguous and doesn’t make it clear whether this is the intended meaning of the figures. We contacted one of the original authors who wrote the paper from which Table 9 is taken, received a copy of the (unpublished) paper from her, discussed it with her, and found that this figure’s intended interpretation is different from the official calculations, in two ways:

    • The figure in the “A/B” column refers to the number of people at risk for a given symptom, not the number of people suffering from that symptom. These are equivalent for Type A and Type C symptoms, but not for Type B symptoms, including CIDTA. Intestinal Nematode Infections (PDF), the working paper that contains Table 9, says that “in any annual cohort of heavily infected children some 5% suffer [Type B symptoms, which are the only symptoms that have life-long effects]” (p. 26). Using the figures as the official calculation did would therefore lead to a 20x overstatement in the prevalence of CIDTA.

      This mistake applies not just to cognitive impairment due to ascariasis, but also to cognitive impairment due to trichuriasis and hookworms, similarly leading to a 20x overstatement of the prevalence of cognitive impairment due to those infections as well.

    • The figures in Table 9 refer to the number of children at risk, per 100,000 children of the age group indicated in the row. For 5-14 year olds in Latin America, the figure (for symptoms “A/B”) is “4506”; this means that 4506 out of 100,000 5-14 year olds are at risk for CIDTA. This in turn means that 45060 of every million 5-14 year olds are at risk. However, the official calculation assumes 45060 cases not for one million 5-14 year olds, but for only 208,530 5-14 year olds (which is the number of 5-14 year olds one would expect in a population of 1 million people across the three age groups). Thus, this difference results in overstating the prevalence of CIDTA by about 5x.

      This mistake applies to each of the symptoms of all three soil-transmitted helminths (STHs), not just to CIDTA, and therefore leads to an overstatement of the prevalence of every symptom of STHs by about 5x.

    Bottom line – the correct interpretation of Table 9 (for 5-14 year olds in Latin America) is that 45060 out of every million 5-14 year olds are at risk for CIDTA, and 5% of these actually have it – so 2253 out of every million 5-14 year olds have CIDTA. The official calculation assumes that in a population of 208,530 5-14 year olds, 45060 have CIDTA. The same types of errors apply to the other regions and conditions as well.

  • Severity of CIDTA. The disability weight of 0.463 is correctly transcribed from the Global Burden of Disease official disability weights, which in turn takes the figure from the earlier 1996 edition (which we examined in a library). However, we still found this figure odd because of the contrast with the other two kinds of helminth infections:
    | Helminth type | Symptom | Disability weight | Description |
    |---|---|---|---|
    | Ascariasis | A | 0.006 | Reduction in cognitive ability in school-age children, which occurs only while infection persists |
    | Ascariasis | B | 0.463 | Delayed psychomotor development and impaired performance in language skills, motor skills, and coordination equivalent to a 5- to 10-point deficit in IQ |
    | Ascariasis | C | 0.024 | Blockage of the intestines due to worm mass |
    | Trichuriasis | A | 0.006 | Reduction in cognitive ability in school-age children, which occurs only while infection persists |
    | Trichuriasis | B | 0.024 | Delayed psychomotor development and impaired performance in language skills, motor skills, and coordination equivalent to a 5- to 10-point deficit in IQ |
    | Trichuriasis | C | 0.114-0.138 | Rectal prolapse and/or tenesmus and/or bloody mucoid stools due to carpeting of intestinal mucosa by worms |
    | Hookworm | A | NA | NA |
    | Hookworm | B | 0.024 | Delayed psychomotor development and impaired performance in language skills, motor skills, and coordination equivalent to a 5- to 10-point deficit in IQ |
    | Hookworm | C | 0.024 | Anemia due to hookworm infection |

    It looked to us as though the weights may have been switched, in the case of ascariasis, for symptoms B and C. We contacted Colin Mathers, the second-listed author on the Global Burden of Disease publication, and he confirmed to us that the weights are in fact switched, stating, “We also noticed this and corrected it in the spreadsheets for WHO estimates, but possibly it has remained uncorrected in some of the summary tables of weights.” Thus, CIDTA’s correct disability weight is 0.024, but the published disability weight in both editions of the GBD – and the weight used in the official cost-effectiveness calculation – is 0.463.
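The two prevalence errors and the swapped weight compound. The sketch below works through the Latin America, 5-14 year old case using only figures given above; the final "burden overstated" number is our back-of-envelope multiplication for the CIDTA term alone, not a figure from the spreadsheet, and it does not describe the whole estimate (which also involves other symptoms and the two further errors in Footnote 1).

```python
# Table 9, 5-14 year olds in Latin America, "A/B" column:
# children AT RISK per 100,000 children in that age group.
at_risk_per_100k = 4506
share_at_risk_with_type_b = 0.05  # "some 5% suffer" Type B symptoms (p. 26)

# Official spreadsheet: 45,060 cases among 208,530 treated 5-14 year olds
official_cases = at_risk_per_100k * 10
official_population = 208_530
official_prevalence = official_cases / official_population

# Corrected reading: 45,060 at risk per 1,000,000 children, 5% of whom have CIDTA
corrected_cases_per_million = at_risk_per_100k * 10 * share_at_risk_with_type_b
corrected_prevalence = corrected_cases_per_million / 1_000_000

prevalence_factor = official_prevalence / corrected_prevalence  # ~20x times ~4.8x
weight_factor = 0.463 / 0.024  # published (swapped) weight vs. corrected weight

print(f"official prevalence:  {official_prevalence:.1%}")
print(f"corrected prevalence: {corrected_prevalence:.2%}")
print(f"prevalence overstated about {prevalence_factor:.0f}x")
print(f"disability weight overstated about {weight_factor:.0f}x")
print(f"CIDTA burden overstated about {prevalence_factor * weight_factor:,.0f}x")
```

The prevalence factor is the product of the two interpretation errors: 20x from treating "at risk" as "suffering," and about 4.8x (1,000,000 / 208,530) from applying a per-million figure to a population of 208,530.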

We created a version of the official calculation that corrected for the above errors, as well as two other errors that we found in the process of checking the calculation as thoroughly as we could. (See Footnote 1 below.) Our version is here (XLS).

This calculation leads to a revised cost-effectiveness estimate of $326.43 per DALY, rather than the $3.41 per DALY in the original.

The DCP cost-effectiveness estimates only took into account short-term effects of the three diseases, even though they have some long-term effects. This seems to have been an intentional decision rather than an error, but our feeling is that a best estimate of the true cost-effectiveness of deworming would likely take these long-term effects into account. We therefore created another version of the estimate that does so, as best as we can. (See Footnote 2 below.) Taking these long-term effects into account, our cost-effectiveness estimate for STH treatment moves to $138.28 per DALY.

These corrections also have implications for the cost-effectiveness estimate for combination deworming (simultaneously addressing both STH and schistosomiasis, another type of infection). The DCP2 reports a cost-effectiveness estimate of $8-$19/DALY averted for combined treatment, depending on whether generic or brand-name drugs are used for schistosomiasis treatment. Using our overall best guess for the revised DCP2 estimate for STH of $138.28/DALY and the DCP2’s estimate for generic schistosomiasis drugs of $336/DALY (note that this is incorrectly presented as “$3.36/DALY” on page 476, but the correct figure – without the erroneous decimal point – appears on page 46), we estimate the cost-effectiveness of a combined program, according to the DCP2, as $177/DALY. Ignoring the long-term effects of STH treatment, as the DCP2 does, changes that figure to $272/DALY.

In our first email to the author of the spreadsheet, we had only caught the first four of the five errors mentioned above, and made substantial mistakes in our attempts to take long-term effects into account. It was only when we checked the figures later that we noticed both of these mistakes. Mistakes are easy to make in this type of situation (for an interesting study on spreadsheet mistakes, see here). Transparency is the best way we can think of to avoid such mistakes. Now that we’ve published the spreadsheets, we look forward to hearing about any other mistakes you find – in the original or ours.

Our independent estimate of the cost-effectiveness of STH treatment

At the same time we were working through the DCP cost-effectiveness estimate for STH deworming, Jonah Sinick, a GiveWell volunteer, was working on an independent set of cost-effectiveness estimates for deworming, covering both STH and a second type of worm-based disease, schistosomiasis. His report on the results is now available here. His bottom-line best guess for the cost-effectiveness of STH deworming is $82.54/DALY. Jonah’s calculation implicitly takes long-term effects into account, as we do in our more optimistic version of the calculation (the one that comes to $138.28 per DALY). Most of the discrepancy between Jonah’s $82.54/DALY figure and our $138.28 figure can be explained by the DCP’s use of a much higher cost per child treated ($0.225 vs. $0.085), though Jonah also finds different levels of disease burden and treatment effectiveness. (See footnote 3 below.)
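One way to see how much the cost difference matters is to rescale the DCP-based figure to Jonah's cost per child. This sketch assumes cost per DALY scales linearly with cost per child while the DCP's burden and effectiveness assumptions are held fixed; it is an illustration, not part of either calculation.

```python
dcp_sth_cost_per_daly = 138.28  # corrected DCP2 estimate, including long-term effects
dcp_cost_per_child = 0.225
jonah_cost_per_child = 0.085
jonah_sth_cost_per_daly = 82.54

# Swap in Jonah's cost per child, holding everything else in the DCP estimate fixed
cost_only = dcp_sth_cost_per_daly * jonah_cost_per_child / dcp_cost_per_child
print(f"DCP-based figure at Jonah's cost per child: ${cost_only:.2f}/DALY")

# Whatever remains reflects different burden and effectiveness assumptions
print(f"Remaining factor: {jonah_sth_cost_per_daly / cost_only:.2f}x")
```

On this simple rescaling, the cost difference alone would put the DCP-based figure below Jonah's; his burden and effectiveness findings then push the figure back up by roughly 1.6x.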

Jonah also found more promising results for schistosomiasis treatment, another form of deworming that (as mentioned above) can be combined with STH treatment. His estimate ranges from $28.19-$70.48/DALY for schistosomiasis deworming. This is much more optimistic than the DCP’s estimate of $336-$692/DALY because Jonah finds, following the current consensus in the literature, a much higher disability weight for schistosomiasis than the DCP used (0.02-0.05 vs. 0.005-0.006). The DCP’s higher figure ($692/DALY) also assumes much more expensive brand-name drugs, while its lower figure ($336/DALY), like Jonah’s estimate, assumes generics.

Conservatively combining Jonah’s estimates for the cost-effectiveness of schistosomiasis and STH deworming (by assuming that no delivery costs are saved), we reach an estimate of $32-72/DALY, depending on the disability weight of schistosomiasis. More liberally assuming that a combined program would eliminate delivery costs equal to half the per-child cost of STH treatment, Jonah’s estimate of the cost-effectiveness of a combined program ranges from $29/DALY to $66/DALY, depending on the disability weight of schistosomiasis.
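One plausible way to combine two cost-per-DALY figures is to add the per-child costs and the per-child DALYs averted, then divide, optionally subtracting any shared delivery savings from the cost. The post does not spell out exactly how the combination was computed, and it does not give the per-child cost of schistosomiasis treatment, so `SCHISTO_COST_PER_CHILD` below is a hypothetical placeholder and the printed values should not be read as reproducing the figures above.

```python
def combined_cost_per_daly(
    sth_cost_per_child: float,
    sth_cost_per_daly: float,
    schisto_cost_per_child: float,
    schisto_cost_per_daly: float,
    shared_delivery_savings: float = 0.0,
) -> float:
    """Total cost per child divided by total DALYs averted per child."""
    sth_dalys = sth_cost_per_child / sth_cost_per_daly
    schisto_dalys = schisto_cost_per_child / schisto_cost_per_daly
    total_cost = sth_cost_per_child + schisto_cost_per_child - shared_delivery_savings
    return total_cost / (sth_dalys + schisto_dalys)


SCHISTO_COST_PER_CHILD = 0.40  # hypothetical; not stated in the post

for schisto_cpd in (28.19, 70.48):  # Jonah's schistosomiasis range, $/DALY
    conservative = combined_cost_per_daly(0.085, 82.54, SCHISTO_COST_PER_CHILD, schisto_cpd)
    liberal = combined_cost_per_daly(
        0.085, 82.54, SCHISTO_COST_PER_CHILD, schisto_cpd,
        shared_delivery_savings=0.085 / 2,  # half the STH per-child cost
    )
    print(f"schisto ${schisto_cpd}/DALY: ${conservative:.0f} (no savings), ${liberal:.0f} (with savings)")
```

With no savings, the combined figure is a cost-weighted harmonic mean of the two inputs, so it always falls between them and sits closer to the program that accounts for more of the spending.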

Implications for donors interested in deworming

These estimates are only a small part of the picture, in our view, regarding how promising deworming is as an intervention. We will be writing more about this in the future.

However, we think it is important to note that the DCP2’s original published figures implied that deworming is among the most cost-effective interventions listed in the publication; with errors corrected, it appears comparable to treating drug-resistant tuberculosis; taking into account long-term effects, it seems comparable to providing family planning services. Neither of those interventions is traditionally considered especially cost-effective. (Note that according to the DCP2’s original estimate, STH deworming is 30-100X more cost-effective than those interventions.)

Whether or not the long-term effects are taken into account, the corrected DCP2 estimate of STH treatment falls outside the range of under $100/DALY that the World Bank initially labeled as highly cost-effective (see page 36 of the DCP2). With the corrections, a variety of interventions, including vaccinations and insecticide-treated bednets, become substantially more cost-effective than deworming.

The more important takeaway, for us, concerns the DCP2’s cost-effectiveness estimates in general. We believe that the errors we’ve found in the estimate – described above – would have been caught by a helminth expert independently examining the estimate. Therefore, the presence of these errors implies to us that there has been no such examination. If this is the case, it would argue against the reliability of the DCP2’s estimates in general. We have not done similar investigations of other DCP2 estimates, and given the process it took to get the details of this one, we are not planning to do many more until and unless the details of estimates become available publicly.

Our takeaways

  • We’re now much more hesitant to place any weight on DCP2 cost-effectiveness figures except where we can fully understand and check the calculations.
  • More generally, we feel this case illustrates how opaque, formal calculations can obscure important information and be highly sensitive to minor errors. We see this as support for our position that formalized cost-effectiveness analysis can do more harm than good in trying to maximize actual cost-effectiveness.
  • Explicit cost-effectiveness estimates will continue to play a relatively small role in our decisions between top charities, though we will still use them in deciding which charities are potential top candidates.
  • We’re continuing to investigate deworming as a promising intervention, but one of the most encouraging figures widely cited in its favor appears deeply flawed.
  • Transparency is crucial. Had the scholars we discussed these issues with been less willing to engage with us, or had we been unable to find Intestinal Nematode Infections or the spreadsheet, these substantial errors would not have come to light.

Footnote 1: The other two problems we found in the calculation both have to do with the burden of trichuriasis:

  • The spreadsheet swaps the disability weights for Type B and C symptoms of trichuriasis. In the Global Burden of Disease and Risk Factors (GBD) 1990, which the spreadsheet cites, the Type B symptom of trichuriasis is cognitive impairment, which has a disability weight of 0.024, while the Type C symptom is massive dysentery syndrome, with disability weights ranging from 0.116 to 0.138. In the ‘trichuriasis’ sheet of the spreadsheet, Type B morbidity has disability weights ranging from 0.116 to 0.138 while Type C morbidity has the lower disability weight of 0.024. In the original calculation, this leads to an overestimate of the burden of trichuriasis by nearly 4x, but once the main errors described above are corrected, correcting this error actually makes STH treatment appear more cost-effective.
  • The spreadsheet uses a duration of 0.05 years for trichuriasis symptom Type C, while Intestinal Nematode Infections suggests that the duration for trichuriasis symptom Type C should be 12 months (pg. 24). This mistake likely occurred because the duration for ascariasis symptom Type C is 0.05 years.
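The size of the duration error follows directly from the two figures above. This sketch assumes, as in the one-year model of the official estimate, that DALYs from a symptom scale linearly with its duration (no discounting).

```python
spreadsheet_duration_years = 0.05  # value used for trichuriasis Type C
suggested_duration_years = 12 / 12  # 12 months, per Intestinal Nematode Infections (p. 24)

# DALYs = cases x disability weight x duration, so the Type C trichuriasis
# DALYs in the spreadsheet are understated by the ratio of the durations.
understatement = suggested_duration_years / spreadsheet_duration_years
print(f"Type C trichuriasis DALYs understated {understatement:.0f}x")
```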

In the corrected spreadsheet, sheets ‘a.3’, ‘t.5’, and ‘h.3’ contain our corrections to all five of the issues we have identified (for ascariasis, trichuriasis, and hookworm respectively). Most of the corrections should be fairly self-explanatory, but please don’t hesitate to email us or comment here if you have questions. We corrected the second main error above by changing the population of 5-14 year olds treated to 1,000,000 (see, e.g., sheet ‘a.3’ cell C23).

Footnote 2: The Type B symptom of all three diseases treated by STH deworming is called “cognitive impairment,” has a disability weight of 0.024, and lasts a lifetime once it develops. Intestinal Nematode Infections implies that 3% of the population at risk for symptom B (that is, 3% of the population listed in the A/B columns in Table 9) newly acquires a lifelong disability each year (pg. 26). We therefore altered the calculation to reflect lifelong (not just 1-year) benefits for these 3% (replacing the 5% figure from Intestinal Nematode Infections used in the main text, because that 5% is the total proportion with the symptom during a given year, not the proportion who newly develop it). At the same time, we also changed DALYs saved due to prevented mortality to compound to the end of life, rather than just counting the one year of life saved during the treatment. (This, arguably, is an actual error in the DCP2 process, not just a disagreement about how to take long-term effects into account. When an intervention prevents someone from dying, it does not seem reasonable to count just one extra year of life saved.)
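The effect of moving from one-year to lifelong benefits can be sketched as follows. The footnote does not say what remaining life expectancy or discounting (if any) our revised spreadsheet uses, so `YEARS_REMAINING = 60` and the 3% discount rate below are hypothetical illustrative choices (the 3% discount rate is unrelated to the 3% incidence figure). The discounted branch uses the standard continuous-discounting formula, without age weighting.

```python
import math


def lifelong_dalys(weight: float, years_remaining: float, discount_rate: float = 0.0) -> float:
    """DALYs for one case of a constant-weight disability lasting years_remaining.

    Undiscounted: weight * years_remaining.
    Discounted (continuous rate r): weight * (1 - exp(-r * L)) / r.
    """
    if discount_rate == 0:
        return weight * years_remaining
    return weight * (1 - math.exp(-discount_rate * years_remaining)) / discount_rate


AT_RISK_PER_MILLION = 45_060  # Table 9, 5-14 year olds in Latin America
NEWLY_DISABLED_SHARE = 0.03   # 3% of those at risk per year (pg. 26)
COGNITIVE_WEIGHT = 0.024      # Type B "cognitive impairment"
YEARS_REMAINING = 60          # hypothetical remaining life expectancy

new_cases = AT_RISK_PER_MILLION * NEWLY_DISABLED_SHARE
one_year = new_cases * lifelong_dalys(COGNITIVE_WEIGHT, 1)
lifelong = new_cases * lifelong_dalys(COGNITIVE_WEIGHT, YEARS_REMAINING)
lifelong_3pct = new_cases * lifelong_dalys(COGNITIVE_WEIGHT, YEARS_REMAINING, 0.03)

print(f"one-year DALYs per million: {one_year:.1f}")
print(f"lifelong, undiscounted:     {lifelong:.1f}")
print(f"lifelong, 3% discounting:   {lifelong_3pct:.1f}")
```

Without discounting, counting the full lifetime multiplies the benefit per newly prevented case by the remaining years of life; discounting shrinks that multiplier substantially, so the choice matters for the final figure.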

Footnote 3: We also looked into the possibility that the disability weights for helminth infections are “too low,” as implied by a passage in the DCP2:

The Disease Control Priorities Project helminth working group has determined that the WHO global burden of disease estimates are low because they do not incorporate the full clinical spectrum of helminth-associated morbidity and chronic disability, including anemia, chronic pain, diarrhea, exercise intolerance, and undernutrition (King, Dickman, and Tisch 2005). (DCP2, pg. 471)

Based on our review of the literature and correspondence with relevant scholars, we believe this argument has never been raised specifically with respect to STHs; most of the papers about it are about schistosomiasis, another type of worm infection. There is one paper (Chan 1997) that appears to imply a higher disability burden for STHs than the standard burden, which gives rise to Jonah’s more optimistic STH cost-effectiveness estimate of $11.25/DALY. We think the data from that paper is no longer credible: it appears to have been based on a lower worm threshold for experiencing morbidity than further research has found appropriate (Brooker 2010). Furthermore, the cited source of the relevant data is a working paper, the published version of which does not contain the data cited.