When we started GiveWell, we were very interested in cost-effectiveness estimates: calculations aiming to determine, for example, the “cost per life saved” or “cost per DALY saved” of a charity or program. Over time, we’ve found ourselves putting less weight on these calculations, because we’ve been finding that these estimates tend to be extremely rough (and in some cases badly flawed).
One can react to what we’ve been finding in different ways: one can take it as a sign that we need to invest more in cost-effectiveness estimation (in order to make it more accurate and robust), or one can take it as a sign that we need to invest less in cost-effectiveness estimation (if one believes that estimates are unlikely to become robust enough to take literally and that their limited usefulness can be achieved with less investment). At this point we are tentatively leaning more toward the latter view; this post lays out our thinking on why.
This post does not argue against the conceptual goal of maximizing cost-effectiveness, i.e., achieving the maximal amount of good per dollar donated. We strongly support this conceptual goal; rather, we are arguing that focusing on directly estimating cost-effectiveness is not the best way to maximize cost-effectiveness. We believe there are alternative ways of maximizing cost-effectiveness – in particular, making limited use of cost-effectiveness estimates while focusing on finding high-quality evidence (an approach we have argued for previously and will likely flesh out further in a future post).
In a nutshell, we argue that the best currently available cost-effectiveness estimates – despite having extremely strong teams and funding behind them – have the problematic combination of being extremely simplified (ignoring important but difficult-to-quantify factors), extremely sensitive (small changes in assumptions can lead to huge changes in the figures), and not reality-checked (large flaws can persist unchecked – and unnoticed – for years). We believe it is conceptually difficult to improve on all three of these at once: improving on the first two is likely to require substantially greater complexity, which in turn will worsen the ability of outsiders to understand and reality-check estimates. Given the level of resources that have been invested in creating the problematic estimates we see now, we’re not sure that really reliable estimates can be created using reasonable resources – or, perhaps, at all.
We expand on these points using the case study of deworming, the only DCP2 estimate that we have enough detail on to be able to fully understand and reconstruct.
- Costs: two possible figures for “cost per child treated,” one for generic drugs and one for name-brand drugs. These figures are drawn from a single paper (a literature review published 3 years prior to the publication of the estimate); costs are assumed to scale linearly with the number of children treated, and to be constant regardless of the region.
- Drug effectiveness: for each infection, a single “effectiveness” figure is used, i.e., treatment is assumed to reduce disease burden by a set percentage for a given disease. For each infection, a single paper is used as the source of this “effectiveness” figure.
- Symptoms averted: the prevalence of different symptoms is assumed to be different by region, but the regions are broad (there are 6 total regions). Prevalence figures are taken from a single paper. The severity of each symptom is assumed to be constant regardless of context, using standard disability weights. Effective treatment is presumed to prevent symptoms for exactly one year, with no accounting for externalities, side effects, or long-term effects (in fact, in the original calculation even deaths are assumed to be averted for only one year).
- Putting it all together: the estimate calculates benefits of deworming by estimating the number of children cured of each symptom for a single year (based on the six regional figures re: how common symptoms are), converting to DALYs using its single set of figures on how severe each symptom is, and multiplying by the single drug effectiveness figure. It divides these DALY-denominated benefits into the costs, which are again done using a single per-child figure.
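The structure described above can be sketched in a few lines of code. Every figure below is an illustrative placeholder, not a number from the DCP2; the point is only to show how few inputs the calculation rests on:

```python
# Sketch of a DCP2-style cost-per-DALY calculation for deworming.
# All figures are illustrative placeholders, not the DCP2's actual values.

# Symptom prevalence among treated children (one of the six broad regions).
symptom_prevalence = {"anemia": 0.10, "cognitive_impairment": 0.05}

# A single disability weight per symptom, assumed constant across contexts.
disability_weight = {"anemia": 0.02, "cognitive_impairment": 0.01}

drug_effectiveness = 0.8    # single figure: fraction of disease burden averted
cost_per_child = 0.50       # single per-child cost, assumed to scale linearly
children_treated = 1_000_000
symptom_duration_years = 1  # symptoms assumed averted for exactly one year

# DALYs averted = sum over symptoms of
#   (children treated x prevalence x disability weight x effectiveness x duration)
dalys_averted = sum(
    children_treated * prev * disability_weight[s] * drug_effectiveness
    * symptom_duration_years
    for s, prev in symptom_prevalence.items()
)
cost_per_daly = (cost_per_child * children_treated) / dalys_averted
print(round(cost_per_daly, 2))
```

Note how every input appears exactly once: there is no distribution, no range, and no adjustment term anywhere a single point estimate could be off.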
No sensitivity analysis is included to examine how cost-effectiveness would vary if certain figures or assumptions turned out to be off. No adjustments are made to address issues such as (a) the high uncertainty of many of the figures (which has implications for overall cost-effectiveness); (b) the fact that figures are taken from a relatively small number of studies, and are thus likely to be based on unusually well-observed programs.
In our view, any estimate this simple and broad has very limited application when examining a specific charity operating in a specific context.
| Cost per DALY for STH treatment | Key assumptions behind this cost |
|---|---|
| $3.41 | original DCP2 calculation |
| $23.92 | + corrected disability weight of ascariasis symptoms |
| $256 | - corrected disability weight of ascariasis symptoms<br>+ corrected prevalence interpretation for all STHs and symptoms and disability weight of trichuriasis symptoms |
| $529 | + corrected disability weight of ascariasis symptoms |
| $385 | + incorrectly accounting for long-term effects |
| $326 | - incorrectly accounting for long-term effects<br>+ corrected duration of trichuriasis symptoms |
| $138 | + correctly accounting for long-term effects |
| $82.54 | Jonah’s independent estimate, implicitly accounting for long-term effects and using lower drug costs |

(Each row modifies the row above: “+” applies a correction or change; “-” reverses one previously applied.)
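The table above shows how strongly the bottom line responds to individual inputs. A toy calculation makes the mechanism clear: when benefits are dominated by one symptom, a single disability weight drives the entire cost-per-DALY figure. The numbers here are illustrative, not drawn from the DCP2:

```python
# Illustration of sensitivity: one disability weight drives the whole estimate.
# All figures are illustrative placeholders, not the DCP2's values.

def cost_per_daly(disability_weight, prevalence=0.10, effectiveness=0.8,
                  cost_per_child=0.50):
    """Cost per DALY when one symptom accounts for all of the benefit."""
    dalys_per_child = prevalence * disability_weight * effectiveness
    return cost_per_child / dalys_per_child

print(round(cost_per_daly(0.02), 2))   # weight as originally entered
print(round(cost_per_daly(0.002), 2))  # one decimal place off: 10x the cost
```

A single misplaced decimal in one input scales the final figure by the same factor, with nothing downstream in the calculation to flag it.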
Our final corrected version of the DCP2’s estimate also varies heavily across regions:
| Cost per DALY for STH treatment | Region |
|---|---|
| $77.39 | East Asia & Pacific |
| $83.16 | Latin America & Caribbean |
| $412.22 | Middle East & North Africa |
| $202.69 | South Asia |
So why wasn’t the error caught between its 2006 publication (and numerous citations) and our 2011 investigation? We can’t be sure, but we can speculate:
- The DALY metric – while it has the advantage of putting all health benefits in the same units – is unintuitive. We don’t believe it is generally possible to look at a cost-per-DALY figure and compare it with one’s informal knowledge of an intervention’s costs and benefits (though it is more doable when the benefits are concentrated in preventing mortality, which eliminates one of the major issues with interpreting DALYs).
- That means that in order to reality-check an estimate, one needs to look at the details of how it was calculated.
- But looking at the details of how an estimate is calculated is generally a significant undertaking – even for an estimate as simple as this one. It requires a familiarity with the DALY framework and with the computational tools being used (in this case Excel) that a subject matter expert – the sort of person who would be best positioned to catch major problems – wouldn’t necessarily have. And it may require more time than such a subject matter expert will realistically have available.
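For reference, the standard decomposition of the DALY (a property of the metric itself, not of any particular estimate) is:

```latex
\text{DALY} = \text{YLL} + \text{YLD}, \qquad
\text{YLL} = N \times L, \qquad
\text{YLD} = I \times DW \times D
```

where $N$ is the number of deaths, $L$ the standard life expectancy at age of death, $I$ the number of incident cases, $DW$ the disability weight, and $D$ the average duration of disability. Folding mortality and morbidity into one number via disability weights is exactly what makes a bare cost-per-DALY figure hard to sanity-check against informal knowledge of an intervention.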
In most domains, a badly flawed calculation – when used – will eventually produce strange results and be noticed. In aid, by contrast, one can use a completely wrong figure indefinitely without ever finding out. The only mechanism for catching problems is to have a figure that is sufficiently easy to understand that outsiders (i.e., those who didn’t create the calculation) can independently notice what’s off. It appears that the DCP2 estimates do not pass this test.
Our point here isn’t about the apparent lack of formal double-check in the DCP2’s process (though this does affect our view of the DCP2) but about the lack of reality-check in the 5 years since publication – the fact that at no point did anyone notice that the figure seemed off, and investigate its origin.
And the problem pertains to more than “catching errors”; it also pertains to being able to notice when the calculation becomes out of line with (for example) new technologies, new information about the diseases and interventions in question, or local conditions in a specific case. An estimate that can’t be – or simply isn’t – continually re-examined for its overall and local relevance may be “correct,” but its real-world usefulness seems severely limited.
Improving the robustness and precision of the estimates would likely have to mean making them far more complex, which in turn could make it far more difficult for outsiders (including subject matter experts) to make sense of them, adapt them to new information and local conditions, and give helpful feedback.
The resources that have already been invested in these cost-effectiveness estimates are significant. Yet in our view, the estimates are still far too simplified, sensitive, and esoteric to be relied upon. If such a high level of financial and (especially) human-capital investment leaves us this far from having reliable estimates, it may be time to rethink the goal.
All that said – if this sort of analysis were the only way to figure out how to allocate resources for maximal impact, we’d be advocating for more investment in cost-effectiveness analysis and we’d be determined to “get it right.” But in our view, there are other ways of maximizing cost-effectiveness that can work better in this domain – in particular, making limited use of cost-effectiveness estimates while focusing on finding high-quality evidence (an approach we have argued for previously and will likely flesh out further in a future post).
If researchers who did a cost-benefit calculation wanted to make their analysis as transparent as possible, so that it would be accessible to as many people as possible, how transparent could they make it? And how would they do it?
One thought is that they could make a web page which sketches out how they did the calculation. The main page would go through the calculation step by step, stating all of their assumptions (both the structural assumptions about how to do the calculation and the quantitative assumptions of all the numbers involved) as concisely and clearly as possible, and giving intermediate calculations/conclusions along the way. Each assumption would be linked to a discussion page where the researchers could say more about it (citing sources and so on), and anyone could comment on that page with questions, criticisms of the assumption, comments about its uncertainty and its impact on the calculation, suggested alternatives, and so on. Each of the quantitative assumptions would be modifiable by the person browsing the web page, so that they could put in their own numbers and the page would automatically recalculate to produce their own estimate.
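A minimal sketch of that idea: every assumption becomes a named, documented parameter that a reader can override, and the estimate is recomputed from whatever set of assumptions they supply. All names and default figures here are hypothetical:

```python
# Sketch of a transparent, recomputable estimate: every assumption is a named,
# documented parameter, and a reader can override any subset of them.
# All default figures are hypothetical placeholders.

DEFAULTS = {
    "cost_per_child": 0.50,     # USD; assumed constant across regions
    "drug_effectiveness": 0.8,  # fraction of disease burden averted
    "prevalence": 0.10,         # fraction of children with the symptom
    "disability_weight": 0.02,  # symptom severity (0 = healthy, 1 = death)
    "duration_years": 1.0,      # years of symptoms averted per treatment
}

def cost_per_daly(**overrides):
    """Recompute the estimate with any subset of assumptions replaced."""
    a = {**DEFAULTS, **overrides}
    dalys_per_child = (a["prevalence"] * a["disability_weight"]
                       * a["drug_effectiveness"] * a["duration_years"])
    return a["cost_per_child"] / dalys_per_child

print(round(cost_per_daly(), 2))                        # baseline estimate
print(round(cost_per_daly(disability_weight=0.01), 2))  # a reader's alternative
```

The design choice that matters is that the assumptions, not just the final figure, are the published artifact: a skeptical reader argues with a specific named input rather than with an opaque bottom line.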
Would something like this be feasible for researchers to make? Would it allow subject matter experts (who lack experience with DALY calculations) to provide reality checks?
I do think the sort of thing you’re describing would be helpful. Whether it would lead to enough reality-checks is an open question.
I recently ran across what strikes me as an example of putting an admirable amount of care into making one’s quantitative work susceptible to reality-checks. The example is a paper on projecting the impact of insecticide-treated nets based on mathematical modeling of malaria transmission. Key quote:
I found the linked spreadsheet to be well-labeled, readable, and usable, even though the model itself is relatively complex (much more complex than the cost-per-DALY estimate discussed in our blog post). I’m also glad to see a willingness to make some sacrifices in technical precision in order to improve accessibility and thus susceptibility to reality-checks.
$3.5 MM seems like a tiny, rather than an excessive, expenditure on cost-effectiveness research in the context of many billions in aid spending. Applying a similar sum to backing institutionally independent “red teams” to critique existing estimates would probably improve data quality greatly, judging by past successes. That money could go further by starting with critiques of currently top-rated interventions and working down the list.
Likewise, Cochrane Review-style assessments of data quality could be funded, and the resulting data-quality scores combined with cost-effectiveness point estimates or confidence intervals.
A few points:
1. These don’t strike me as good arguments against doing CE estimates where estimates haven’t been done yet, such as funding medical research, lobbying, and x-risk. CE estimates have been useful, if imperfect, guides to GiveWell and others, and even lower quality estimates could be useful for making first cuts.
2. These don’t seem to be good arguments against GiveWell using its available information to make its own explicit estimates.
a. GW can make their estimates transparent
b. GW can use high-quality evidence to inform its estimates
c. If you don’t make an explicit estimate and instead rely on gut feelings after looking at data, your estimate is even less transparent
d. GW can fund interventions with potential reality checks
3. It is unclear what proportion of the research behind the DCP2 went into the CE estimates. They wrote long chapters – I’d bet far more work went into the chapters than into the spreadsheets (at least in the case of deworming). If a relatively small share of the resources went into the actual calculations, then we might be able to get a big gain by funding more CE estimates.
4. This may be a good argument that CE estimates are a lower priority than the kind of work GW does, but given the huge amounts of resources we throw at poverty, the money we spend on CE estimates is quite small. It seems like we could get a lot further if we tried to improve CE estimates.
Note that we are currently putting a good deal of effort into cost-effectiveness estimation as an important input into our upcoming refreshed charity rankings. Because we’re evaluating specific charities, we’re able to make the estimates a bit more context-specific than what the DCP2 provides (though the estimates are still extremely rough).
I wasn’t arguing that we should stop doing any cost-effectiveness analysis, but rather that it may be more promising to continue with back-of-the-envelope estimates – and give them a limited role commensurate with their status as back-of-the-envelope estimates – than to try to get cost-effectiveness analysis “right” and use estimates as the primary factor in decision making.
Carl, I think the measures you propose would largely eliminate the sorts of errors we’ve written about recently, but I don’t see how they would address the dilemma discussed in this post: either estimates stay very simple (and thus highly sensitive), or they become opaque to subject-matter experts. Either way, substantial uncertainty would remain over the applicability of a generalized cost-effectiveness estimate to a specific context.
STH = soil-transmitted helminth, right?
Vollmer: Yes, STHs are soil-transmitted helminths.