Cost-effectiveness estimates: inside the sausage factory

We’ve long had mixed feelings about cost-effectiveness estimates of charitable programs, i.e., attempts to figure out “how much good is accomplished per dollar donated.”

The advantages of these estimates are obvious. If you can calculate that program A can help many more people – with the same funds, and in the same terms – than program B, that creates a strong case (arguably even a moral imperative) for funding program A over program B. The problem is that by the time you get the impact of two different programs into comparable “per-dollar” terms, you’ve often made so many approximations, simplifications and assumptions that a comparison isn’t much more meaningful than a roll of the dice. In such cases, we believe there are almost always better ways to decide between charities.

This post focuses on the drawbacks of cost-effectiveness estimates. I’m going to go through the details of what we know about one of the best-known, most often-cited cost-effectiveness figures there is: the cost per disability-adjusted life-year (DALY) for deworming schoolchildren. The DALY is probably the single most widely cited and accepted “standardized” measure of social impact within the unusually quantifiable area of health.

Note that various versions of this figure:

  • Occupy the “top spot” in the Disease Control Priorities Report’s chart of “Cost-effectiveness of Interventions Related to Low-Burden Diseases” (see page 42 of the full report). (I’ll refer to this report as “DCP” for the rest of this post.)
  • Are featured in a policy briefcase by the Poverty Action Lab (which we are fans of), calling deworming a “best buy for education and health.”
  • Appear to be the primary factor in the decision by Giving What We Can (a group that promotes both more generous and more intelligent giving) to designate deworming-related interventions as its top priority (see the conclusion of its report on neglected tropical diseases), and charities focused on these interventions as its two top-tier charities.

I don’t feel that all the above uses of this figure are necessarily inappropriate (details in the conclusion of this post). But I do feel that they make this figure worth inspecting closely, and that it is important to be aware of the following issues.

  1. The estimate is likely based on successful, thoroughly observed programs and may not be representative of what one would expect from an “average” deworming program.
  2. The estimate appears to rely on an assumption of continued successful treatment over time, an assumption which could easily be problematic in certain cases.
  3. A major input into the estimate is the prevalence of worm infections. In general, prevalence data is itself the product of yet more estimations and approximations.
  4. Many factors in cost-effectiveness, positive and negative, appear to be ignored in the estimate simply because they cannot be quantified.
  5. Different estimates of the same program’s cost-effectiveness appear to strongly contradict each other.

Details follow.

Issue 1: the estimate is likely based on successful, thoroughly observed programs.

The Poverty Action Lab estimate of $5 per DALY is based on a 2003 study by Miguel and Kremer of a randomized controlled trial in Kenya. As the subject of an unusually rigorous evaluation, this program likely had an unusual amount of scrutiny throughout (and may also have been picked in the first place partly for its likelihood of succeeding). In addition, this program was carried out by a partnership between the Kenyan government and a nonprofit, ICS (pg 165), that has figured prominently in numerous past evaluations (for example, see this 2003 review of rigorous studies on education interventions).

In this sense, it seems reasonable to view its results as “high-end/optimistic” rather than “representative of what one would expect on average from a large-scale government rollout.”

Note also that the program included a significant educational component (pg 169). The quality of hygiene education, in particular, might be much higher in a closely supervised experiment than in a large-scale rollout.

It is less clear whether the same issue applies to the DCP estimate, because the details and sources for the estimate are not disclosed (see box on page 476). However,

  • The other studies referenced throughout the chapter appear to be additional “micro-level” evaluations – i.e., carefully controlled and studied programs – as opposed to large-scale government-operated programs.
  • The DCP’s cost-effectiveness estimate for combination deworming (the program most closely resembling the program discussed in Miguel & Kremer) is very close to the Miguel & Kremer estimate of $5 per DALY. (There is some ambiguity on this point – more on this under Issue 5 below.)

Issue 2: the estimate appears to rely on an assumption of continued successful treatment over time, an assumption which could easily be problematic in certain cases.

Miguel & Kremer states:

single-dose oral therapies can kill the worms, reducing … infections by 99 percent … Reinfection is rapid, however, with worm burden often returning to eighty percent or more of its original level within a year … and hence geohelminth drugs must be taken every six months and schistosomiasis drugs must be taken annually. (pg 161)

Miguel & Kremer emphasizes the importance of externalities (i.e., the fact that eliminating some infections slows the overall transmission rate) in cost-effectiveness (pg 204), and it therefore seems important to ask whether the “$5 per DALY” estimate is made (a) assuming that periodic treatment will be sustained over time, or (b) assuming that it won’t be.

Miguel & Kremer doesn’t explicitly spell out the answer, but it seems fairly clear that (a) is in fact the assumption. The study states that the program averted 649 DALYs (pg 204) over two years (pg 165), of which 99% could be attributed to aversion of moderate-to-heavy schistosomiasis infections (pg 204). Such infections have a disability weight of 0.006 per year, so this is presumably equivalent to averting over 100,000 years ((649*99%)/0.006) of schistosomiasis infection – even though well under 10,000 children were even loosely included in the project (including control groups and including pupils near but not included in the program – see pg 167). Even if a higher-than-standard disability weight was used, it seems fairly clear that many years of “averted infection” were assumed per child.
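The back-of-the-envelope arithmetic above can be reproduced with a short sketch. This is only a rough check of the figures quoted in this post, not a reconstruction of Miguel & Kremer’s own calculation; the variable names are mine, and the 10,000-child figure is an assumed generous upper bound based on the “well under 10,000” statement above.

```python
# Rough check of the implied "years of infection averted" figure.
dalys_averted = 649        # total DALYs averted by the program (pg 204)
schisto_share = 0.99       # share attributed to schistosomiasis (pg 204)
disability_weight = 0.006  # DALYs lost per year of moderate-to-heavy infection

# Years of infection that would have to be averted to produce these DALYs
infection_years_averted = dalys_averted * schisto_share / disability_weight

# Assumption: 10,000 is a generous upper bound on the children involved,
# so the per-child figure below is a lower bound.
children_upper_bound = 10_000
min_years_per_child = infection_years_averted / children_upper_bound

print(f"Implied infection-years averted: {infection_years_averted:,.0f}")
print(f"Implied years averted per child (at least): {min_years_per_child:.1f}")
```

Because fewer than 10,000 children were involved, the per-child figure is a floor: the estimate implies more than ten averted infection-years per child from a two-year program.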

In my view, this is the right assumption to make in creating the cost-effectiveness estimate … as long as the estimate is used appropriately, i.e., as an estimate of how cost-effective a deworming program would be if carried out in a near-ideal way, including a sustained commitment over time.

However, it must be noted that sustaining a program over time is far from a given, especially for organizations hoping for substantial and increasing government buy-in over time. As we will discuss in a future post, one of the major deworming organizations appears to have aimed to pass its activities to the government, with unclear/possibly mixed results. And as we have discussed before, there are vivid examples of excellent, demonstrably effective projects failing to achieve sustainability in the past.

Does the DCP’s version of the estimate make a similar assumption? Again, we do not have the details of the estimate, but the DCP chapter – like the Miguel & Kremer paper – stresses the importance of “Regular chemotherapy at regular intervals” (pg 472).

One more concern along these lines: even if a program is sustained over time, there may be “diminished efficacy with frequent and repeated use … possibly because of anthelmintic resistance” (pg 472).

Extrapolation from a short-term trial to long-term effects is probably necessary to produce an estimate, but it further increases the uncertainty.

Issue 3: cost-effectiveness appears to rely on disease incidence/prevalence data that itself is the product of yet more estimations and approximations.

The Miguel & Kremer study took place in an area with extremely high rates of infection: 80% prevalence of schistosomiasis (where schistosomiasis treatment was applied), and 40-80% prevalence of three other infections (see pg 168). The DCP emphasizes the importance of carrying out the intervention in high-prevalence areas (for example, see the box on page 476). Presumably, the program should be carried out in the highest-prevalence areas possible for maximum cost-effectiveness.

The problem is that prevalence data may not be easy to come by. The Global Burden of Disease report describes using a variety of elaborate methods to estimate prevalence, including “environmental data derived from satellite remote sensing” as well as mathematical modeling (see pg 80). Though I don’t have a source for this statement, I recall either a conversation or a paper making a fairly strong case that data on neglected tropical diseases is particularly spotty and unreliable, likely because it is harder to measure morbidity than mortality (the latter can be collected from death records; the former requires more involved examinations and/or judgment calls and/or estimates).

Issue 4: many factors in cost-effectiveness appear to be ignored in the estimate simply because they cannot be quantified.

Both positive and negative factors have likely been ignored in the estimate, including:

  • Possible negative health effects of the deworming drugs themselves (DCP pg 479). (Negative impact on cost-effectiveness)
  • Possible development of resistance to the drugs, and thus diminishing efficacy, over time (mentioned above). (Negative impact on cost-effectiveness)
  • Possible interactions between worm infections and other diseases including HIV/AIDS (DCP pg 479), which may increase the cost-effectiveness of deworming. (Positive impact on cost-effectiveness)
  • The question of whether improving some people’s health leads them to contribute back to their families, communities, etc. and improve others’ lives. This question applies to any health intervention, but not necessarily to the same degree, since different programs affect different types of people. From what I’ve seen, there is very little available basis for making any sorts of estimates of such differences.

Issue 5: different estimates of the same program’s cost-effectiveness appear to strongly contradict each other.

The DCP’s summary of cost-effectiveness alone (box on pg 476) raises considerable confusion:

the cost per DALY averted is estimated at US $3.41 for STH infections [the type of infection treated with albendazole] … The estimate of cost per DALY is higher for schistosomiasis relative to STH infections because of higher drug costs and lower disability weights … the cost per DALY averted ranges from US$3.36 to US$6.92. However, in combination, treatment with both albendazole and PZQ proves to be extremely cost-effective, in the range of US$8 to US$19 per DALY averted.

The language seems to strongly imply that the combination program is more effective than treating schistosomiasis alone, but the numbers given imply the opposite. Our guess is actually that the numbers are inadvertently switched. To one taking the numbers too literally, the expected “cost-effectiveness” of a donation could be off by a factor of 2-5 depending on this question of copy editing.
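To make the “factor of 2-5” concrete, here is a minimal sketch comparing the two ranges quoted from the DCP box. Which endpoints to pair with which is my assumption; the DCP does not say how the ranges should be compared.

```python
# Cost per DALY ranges as printed in the DCP box (pg 476)
schisto_low, schisto_high = 3.36, 6.92  # schistosomiasis treatment alone
combo_low, combo_high = 8.0, 19.0       # albendazole + PZQ combination

# Assumed pairings: like-for-like endpoints, then the widest possible gap
low_vs_low = combo_low / schisto_low
high_vs_high = combo_high / schisto_high
widest_gap = combo_high / schisto_low

print(f"low vs low:   {low_vs_low:.1f}x")
print(f"high vs high: {high_vs_high:.1f}x")
print(f"widest gap:   {widest_gap:.1f}x")
```

Under these pairings the two sets of numbers differ by a factor of roughly 2 to 6, depending on which label belongs to which range.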

Comparing this statement with the Miguel & Kremer study adds more confusion. The DCP estimates albendazole-only treatment at $3.41 per DALY, which appears to be better than (or at least at the better end of the range for) the combination program. However, Miguel & Kremer estimates that albendazole-only treatment is far less effective than the combination program, at $280 per DALY (pg 204).
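The size of this discrepancy is easy to quantify. The sketch below simply divides the two published albendazole-only figures and is not an attempt to reconcile their underlying methods; the variable names are mine.

```python
import math

dcp_cost_per_daly = 3.41   # DCP estimate, albendazole-only (box on pg 476)
mk_cost_per_daly = 280.0   # Miguel & Kremer estimate, albendazole-only (pg 204)

ratio = mk_cost_per_daly / dcp_cost_per_daly
orders_of_magnitude = math.log10(ratio)

print(f"Miguel & Kremer figure is {ratio:.0f}x the DCP figure")
print(f"That is about {orders_of_magnitude:.1f} orders of magnitude")
```

The two estimates of what is nominally the same intervention differ by a factor of about 80, or nearly two orders of magnitude.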

Perhaps the DCP envisions albendazole treatment carried out in a different way or in a different type of environment. But given that the Miguel & Kremer study appears to be examining a fairly suitable environment for albendazole-only treatment (see above comments about high infection prevalence and strong program execution), this would indicate that cost-effectiveness is extremely sensitive to subtle changes in the environment or execution.

Bottom line

There is a lot of uncertainty in this estimate, and this uncertainty isn’t necessarily “symmetrical.” Estimates of different programs’ cost-effectiveness, in fact, could be colored by very different degrees of optimistic assumptions.

Despite all of the above issues, I don’t find the cost-effectiveness estimate discussed here to be meaningless or useless.

Researchers’ best guesses put the cost-effectiveness of deworming in the same ballpark as that of other high-priority interventions such as vaccines, tuberculosis treatment, etc. (I do note that many of these appear to have more robust evidence bases behind their cost-effectiveness – for example, estimated effects of large-scale government programs are sometimes available, giving an extra degree of context.)

I think it is appropriate to say that available evidence suggests that deworming can be as cost-effective as any other health intervention.

I think it is appropriate to call deworming a “best buy,” as the Poverty Action Lab does.

I do not think it is appropriate to conclude that deworming is more cost-effective than vaccinations, tuberculosis treatment, etc. I think it is especially inappropriate to conclude that deworming is several times more cost-effective than vaccinations, tuberculosis treatment, etc.

Most of all, I do not think it is appropriate to expect results in line with this estimate just because you donate to a deworming charity. I believe cost-effectiveness estimates usually represent “what you can achieve if the program goes well” more than they represent “what a program will achieve on average.”

In my view, the greatest factor behind the realized cost-effectiveness of a program is the specifics of who carries it out and how.

Comments

  1. It sounds like what you’re saying is that the cost-effectiveness of a treatment can be tied down to a confidence interval, but not to a single number. And such intervals can overlap with other treatments, indicating that it’s not clear which is more cost-effective. Seems pretty reasonable to me.

  2. Ian, what I am saying is stronger than that. To me, speaking of a “confidence interval” means speaking of a formal range with an implied midpoint. Instead, the argument I am making is that when two cost-effectiveness estimates are in the same ballpark, a donor should use different criteria entirely to choose between charities. I will probably elaborate in a future post on what I mean by “same ballpark,” because I think it turns out to be a bit more of a complex concept than “factor of X.”

    By the way, I do not believe the confidence intervals given in the DCP attempt to incorporate issues like those I listed in this post. For example, the ranges given for this intervention by the DCP span a factor of 2 or 3, but the M&K estimate for albendazole-only treatment is two orders of magnitude different from the DCP’s, and the “sustainability” assumption alone could easily be worth an order of magnitude. So these ranges should be taken as “ranges under certain assumptions, many of them optimistic,” not “the full range of possibilities.”

  3. I think your cautions are well-taken and I think people who know this literature would probably agree with these cautions.

    I haven’t even looked at the DCP stuff lately but the cost-effectiveness estimates I’m familiar with don’t give confidence intervals in the sense you are using them. What they do is sensitivity analyses, to see how much changing the assumptions of a base-case analysis changes the cost-effectiveness estimates. The base-case would not necessarily be a mean or median of the highest and lowest estimates.

    I’m bothered by your title “Inside the Sausage Factory.” It sounds very rash. What you actually write is much more reasonable. But it makes me wonder if you have such a strong emotional aversion to cost-effectiveness analysis that you are committed to never taking it seriously.

  4. Ron, we definitely take cost-effectiveness analysis seriously. We put a lot of time into understanding and adapting cost-effectiveness estimates; we eliminate charities from consideration at early stages on the basis of cost-effectiveness; we have a preference for international aid largely on the basis of cost-effectiveness.

    The only intent of the title is to imply that many people would be more cautious in how they use cost-effectiveness estimates if they knew how those estimates were created.

  5. Hi,

    Three general points:

    First, there are two ways in which one could use the cost-effectiveness
    estimates. First, one could use them in order to rank different
    charities on a cardinal scale: so that you know that deworming is more
    cost-effective than DOTS, which is more cost-effective than
    antiretrovirals, and that the difference between deworming and DOTS is
    much smaller than the difference between DOTS and antiretrovirals.
    For this, you don’t need to know the absolute levels of
    cost-effectiveness; you only need to know relative cost-effectiveness.
    Second, one could treat them as giving the absolute levels of
    cost-effectiveness: if you put $3.4 into deworming, you get out one
    DALY.

    The two uses can differ, for example, if there is a systematic bias in
    the DCP2’s reports. Suppose that it ranks every intervention as twice
    as cost-effective as it really is. In which case, one would be wrong
    to say that, if you put $3.4 into deworming, you get out one DALY.
    But all the claims about the relative cost-effectiveness of
    antiretrovirals, DOTS and deworming, would be just as true as before.
    For the purposes of charity evaluation, the first use is much more
    important than the latter use (we primarily need to know which
    charities are better than which other charities, rather than how good
    they are in absolute terms). For the purposes of explaining the idea
    and importance of cost-effectiveness to the general public, the latter
    is more important (it gets the idea across better). It seems like you
    take more objection to the latter than the former? If so, that’s fair
    enough, up to a point – in statements like ‘X is 100 times more
    cost-effective than Y’, there is no mention of uncertainty.

    The second general point is that uncertainty counts against an
    intervention only if the uncertainty is either symmetrical (and one is
    risk-averse), or if the uncertainty is asymmetrical, biased towards
    lower-cost-effectiveness. For deworming, in contrast, the uncertainty
    seems to be biased towards higher cost-effectiveness: the DCP2 claim,
    for example, that, given more accurate disability-weights, the
    cost-effectiveness for deworming would be something like 20x higher
    (!) than the stated estimate. And note that the potential upside is not symmetrical with respect to the chance of doing 1/20th as much: if we suppose that there is a 10% chance that deworming is 1/20th as effective as the stated estimate, and only a 1% chance that it is 20x as effective as the stated estimate, the expected value of deworming would still be twice the stated estimate.

    Third, the DCP2 acknowledge the uncertainty of their estimates. They
    state that you should pay more attention to the order of magnitude
    than to the precise figure. Because cost-effectiveness varies so
    greatly, that’s enough to differentiate between many interventions
    (e.g. between deworming and condom distribution).
    You say to focus on charities, not causes. But the cost-effectiveness
    of causes can vary by several orders of magnitude, whereas I’d be surprised if the
    efficiency of charities implementing a specific cause varied by more
    than one order of magnitude. Perhaps the cost-effectiveness estimates
    shouldn’t be used to distinguish between deworming ($3.4/DALY) and
    DOTS ($5-10/DALY in sub-Saharan Africa, if you read the DCP2 closely
    enough). But it’s certainly wrong to ignore any differences better
    than $1000/significant life change – that’s missing out on more than
    an order of magnitude of detail.

    Some responses to some of your specific points:

    Suggesting that it’s little better than a roll of a die is a bit
    hyperbolic. The cost-effectiveness estimates have been made in order
    to advise developing world governments to better use their resources.
    If they were really just random figures, they would be no help at all,
    and the WHO et al wouldn’t have wasted their money on them.

    (1)- As I said above, it’s the relative cost-effectiveness that is
    most important for our purposes. So if there’s a systematic bias
    (e.g. optimism), that’s not that important, unless it varies
    significantly between interventions.
    Moreover, it’s not clear that one should take DCP2 to be more likely to be an
    overestimate than an underestimate. DCP2 is attempting, at least, to give the most accurate judgments that they can. And there are reasons why DCP2 could underestimate cost-effectiveness: e.g.
    economies of scale (DtW spend only $0.18 (economic cost) or $0.12
    (financial cost) per treatment in India, rather than $0.50, which is
    what the DCP2 used).

    (4)- We’ll always have to ignore some details, because they can’t be
    quantified. To take a flippant example: given the butterfly effect,
    deworming schoolchildren in Africa has some chance of causing a
    hurricane in China. We are happy to ignore that possibility.
    Obviously, there are more salient factors (e.g. the difficulty of
    quantifying educational benefits). But surely we should start where
    we do have quantifiable knowledge (e.g. within health), and then move
    on to the areas that can’t be quantified so easily.
    Moreover, in terms of the additional benefits and costs of a health
    intervention, we can make educated guesses about whether those
    benefits will exceed the costs. In the case of deworming, this seems
    to be the case: where additional costs/benefits have been studied,
    they seem to almost entirely be positive (e.g. the educational
    benefits, the reduction in transmission of HIV/AIDS). So the fact
    that the additional benefits can’t be quantified so easily seems to
    count against deworming, rather than for it.

    (5) – Yes, the schistosomiasis figures are wrong (we contacted the
    World Bank: ignore the decimal point). This hardly constitutes an
    argument against cost-effectiveness, though: everything admits of
    typos, and of course using the wrong figures will give the wrong
    cost-effectiveness estimates.
    The Miguel and Kremer paper is more worrying, and I thoroughly agree
    it is something that should be taken into account: how it should weigh
    against the DCP2 figures; what, if anything, explains the discrepancy, etc.

    As for deworming vs DOTS, vaccinations etc – answering that would take
    more time than a blog post allows.

    A trap that much discussion of climate change falls into is the
    paralysis of analysis: that there seem to be so many different
    considerations, and a great deal of uncertainty, so it’s impossible to
    believe anything. A similar trap here is to ignore the more uncertain
    but relevant factors (cost-effectiveness) in favour of less uncertain
    but less relevant factors (like whether the charity replied to your emails, or what its website looks like).

  6. Will, thanks for the thoughtful comments.

    I think we have three main points of disagreement.

    Can symmetric uncertainty matter for a risk-neutral donor?

    You say “uncertainty counts against an intervention only if the uncertainty is either symmetrical (and one is risk-averse), or if the uncertainty is asymmetrical, biased towards lower-cost-effectiveness.” I disagree. I haven’t fully explored the concepts here, but I feel fairly strongly that there are other ways in which uncertainty can matter.

    First, I feel one should have a “prior” that different charitable programs are equally cost-effective, especially when much about the interventions is similar (rolling out proven medical interventions with relatively low drug costs relative to staff costs, via a top-down bureaucracy, to extremely disadvantaged populations) and what is different does not clearly, conceptually point in one direction (for example, applying cheap drugs to everyone once a year for relatively minor conditions vs. applying a more expensive regimen directly to people afflicted with an often-fatal disease). A cost-effectiveness estimate should shift one’s prior, not simply be used as one’s midpoint. The weaker and more uncertain the estimate is, the more it should be taken as “limited evidence that A > B” as opposed to “reason to assume that A = 5B until proven otherwise.”

    This is what I mean when I say cost-effectiveness estimates should not be taken literally. If I’m watching two people fish, and person A catches 10 fish in an hour while person B catches one, I now have some evidence that person A is better at fishing, but I don’t actually believe as a midpoint estimate that s/he is 10x as good in terms of future expected fish caught per hour. This is true even though I know nothing about fishing and what sort of luck-based and skill-based variation is normal.

    More concretely, purely from a math perspective, the midpoint of a noisy estimate cannot be used equivalently to the midpoint of a precise estimate. Consider the following hypothetical: (a) Men’s height has a normal distribution with mean 69″ and standard deviation 3″. (b) We create very noisy estimates (how noisy we aren’t sure – we just know we’re getting a pretty unreliable number) of the height of two men, Alan and Bob. Our noisy estimates tell us that Alan is 100″ tall and Bob is 20″ tall. (c) We try to guess, having only our estimates, which number will be higher: 50% of Alan’s (true) height or 100% of Bob’s. If symmetric uncertainty could simply be ignored, the smart bet would be on 50% of Alan’s height, yet mine would be on 100% of Bob’s.
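    The height hypothetical above can be sketched with the standard formula for combining a normal prior with one normally distributed noisy measurement. The noise level is deliberately left unknown in the hypothetical, so the 30″ standard deviation below is purely an illustrative assumption, as are the function and variable names.

```python
def posterior_mean(estimate, prior_mean, prior_sd, noise_sd):
    """Best guess of a true value from one noisy estimate,
    assuming a normal prior and independent normal noise."""
    # Weight on the estimate shrinks toward 0 as the noise grows
    weight = prior_sd**2 / (prior_sd**2 + noise_sd**2)
    return prior_mean + weight * (estimate - prior_mean)

PRIOR_MEAN, PRIOR_SD = 69.0, 3.0  # men's height, from the hypothetical
NOISE_SD = 30.0                   # assumption: "very noisy" estimate

alan = posterior_mean(100.0, PRIOR_MEAN, PRIOR_SD, NOISE_SD)
bob = posterior_mean(20.0, PRIOR_MEAN, PRIOR_SD, NOISE_SD)

print(f"Best guess for Alan: {alan:.1f} in, half of that: {0.5 * alan:.1f} in")
print(f"Best guess for Bob:  {bob:.1f} in")

# With a much more precise estimate the raw numbers dominate instead
alan_precise = posterior_mean(100.0, PRIOR_MEAN, PRIOR_SD, 2.0)
bob_precise = posterior_mean(20.0, PRIOR_MEAN, PRIOR_SD, 2.0)
```

    With this much noise, both best guesses land within an inch of the 69″ average, so 100% of Bob’s height comfortably beats 50% of Alan’s. In this simple model, comparing best guesses, the bet flips toward Alan only if the noise standard deviation is below about 2.8″, i.e. smaller than the real variation in men’s heights.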

    I feel this hypothetical is analogous (though not necessarily equivalent) to the situation we’re looking at. If Charity A runs a program estimated to be 5x as cost-effective as Charity B’s (by a very noisy estimate), yet wastes half its money while Charity B wastes none of its money, Charity B may be the better bet.

    This of course depends on the relative variance of (a) actual effectiveness vs. (b) the noise in the estimate. My preliminary estimates suggest that this issue is very relevant in this case. At some point I hope to do a more careful analysis (and I have asked Toby Ord for some information that would help with it). However, this ought to illustrate the concept that the sheer amount of noisiness matters, even if symmetrical.

    This is why I have emphasized the typo, the unknown effects, etc. These issues should increase your view of the noisiness of the estimates relative to the variance in actual effectiveness of programs. That in turn should decrease your weight on the relative magnitudes of the midpoints of the estimates, relative to your weight on the relative apparent effectiveness of the organizations.

    Is there an asymmetry to the estimates’ errors that is relevant here?

    You seem to think that the estimates are systematically biased in the pessimistic direction. I think they are systematically biased in the optimistic direction because they assume competent, successful, most likely sustained execution of a program such that its real-world execution corresponds to the models used to estimate its cost-effectiveness, which themselves are based on unusually carefully observed (and probably unusually successful) projects.

    This again relates directly to the competence of the organizations themselves. It seems to me that incompetent, or simply not sustained, execution can easily make the impact of the program fall to zero or below. So, when Charity A is running a program that would be 5x as cost-effective as Charity B’s if it were executed very well, but we have strong evidence of Charity B’s effectiveness and no evidence of Charity A’s, we could easily be in fact looking at a situation where Charity B is accomplishing 100x+ as much per dollar.

    Is GiveWell collecting meaningful information on charities’ relative effectiveness?

    Both of the points above underscore the importance of organizational effectiveness. You seem to think that we have very little meaningful information on this point, characterizing what we have as “whether the charity replied to your emails, or what its website looks like.” If our information on organizational effectiveness were in fact meaningless, I’d stick by cost-effectiveness estimates, but I don’t think it is. A couple of points:

    • I think “whether the charity replied to your emails, or what its website looks like” is a misleading depiction of our criteria. We are not concerned with what a website “looks like” but rather with whether substantive information is made available on it; and most of our emails are replied to, but the replies vary enormously, from substantive and helpful information to “We won’t share anything.” I prefer this framing: “Whether the charity voluntarily – or upon request – discloses substantive information about its effectiveness or does not disclose any such information.” This is a heuristic we have discussed and defended at length.
    • When some charities choose to share substantive information about their impact and others choose to share little or nothing, what is the appropriate response in terms of estimating the effectiveness of each? The answer doesn’t seem obvious to me at all, and seems to be heavily dependent on one’s priors (and clearly we seem to have different priors from yours, as you’ve noted elsewhere). However, I note that the effectiveness of charities can vary as widely as the cost-effectiveness of programs, since the aggregate impact of a poorly run organization is likely to be zero or negative.

    One more observation: as you note, the DCP authors themselves do not seem to wish that these estimates be taken literally. “Readers are encouraged to pay attention to the order of magnitude of each estimate rather than the specific number presented” (pg 40). To me this indicates that they see things more as I do than as you do.

  7. Hi,

    There are a lot of points being raised now. Given that few people will read long posts on an old thread, perhaps it’s better to continue discussion of more technical matters over the phone. Here I’ll just state the points of agreement and disagreement, so we can at least get clear on where our differences lie. For others reading, it should be borne in mind that any disagreement is against a background of mutual agreement concerning the purposes of charitable giving, and how to assess charities.

    Points where we agree:
    1) Cost-effectiveness point estimates embody a lot of uncertainty (we both know and take seriously the DCP2 comment that the point estimates are orders of magnitude estimates). Therefore, the more info the better. In particular, on the issue of deworming, it would be very useful to have info on the following.
    a. Areas which might plausibly lead us to reduce our cost-effectiveness estimates (where we’ve previously had optimistic assumptions): What exactly do DtW (and SCI) do, in terms of technical assistance and scaling up? What explains the cost-effectiveness discrepancy between PAL’s Schisto and STH estimates, and DCP2’s estimate?
    b. Areas which might plausibly lead us to increase our cost-effectiveness estimates (where we’ve previously had pessimistic assumptions): What is the true DALY weighting for STHs and Schisto?; How effective is the advocacy work that DtW and SCI perform?; How effective have their efforts to hand over programs onto developing world governments been?
    2) When using the DCP2’s estimates, one should take into account regression towards the mean: the highest estimates will likely be over-estimates; the lowest estimates will likely be underestimates. One should also take into account qualitative factors: does it make sense that this intervention is incredibly cost-effective (e.g. for deworming: very low drug costs, no need for trained medical staff; for TB: curing TB also prevents large numbers of new infections)?
    3) Combinations of factors – for example, regression towards the mean, and charities spending only some percentage on the most cost-effective activities, and the charity being inefficient at implementing the intervention – can multiply up, so it shouldn’t be assumed that charities working on more cost-effective causes are therefore more cost-effective themselves.
    4) For this reason, it’s a good thing to get to know the inner workings of charities. (I didn’t mean the “whether the charity responds to emails” comment to be offensive, nor to imply that GiveWell is collecting only meaningless information, but only to say that one shouldn’t disregard a charity on that basis alone (see point (2) below)).
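
    The compounding described in agreed point 3 can be made concrete with a small calculation. This is only a sketch: the headline figure and every discount factor below are hypothetical placeholders chosen for illustration, not estimates taken from DCP2 or from any charity.

```python
def adjusted_cost_effectiveness(headline, discounts):
    """Multiply a headline cost-effectiveness figure by each discount factor."""
    result = headline
    for factor in discounts.values():
        result *= factor
    return result


# Hypothetical discounts (assumptions for illustration only).
hypothetical_discounts = {
    "regression_to_mean": 0.5,               # top-ranked estimates tend to be too high
    "share_of_budget_on_intervention": 0.5,  # only part of spending goes to the intervention
    "implementation_efficiency": 0.5,        # the charity executes less well than the studied program
}

headline = 100.0  # arbitrary units of benefit per dollar, hypothetical
adjusted = adjusted_cost_effectiveness(headline, hypothetical_discounts)
print(adjusted)
```

    Three separate factor-of-two discounts compound to a factor of eight, which is close to a full order of magnitude even though no single discount looks dramatic.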

    Points where we disagree:
    1) We think, as a prior, that the distribution of cost-effectiveness among causes is much greater than the distribution of cost-effectiveness among charities. We think that the former distribution is log normal (which is backed up by the DCP2’s findings, if these are plotted graphically). We think that the cost effectiveness of causes among developing world health interventions varies by 4 orders of magnitude, whereas the cost-effectiveness of charities implementing the same intervention varies by one order of magnitude. In contrast, you think ‘the effectiveness of charities can vary as widely as the cost-effectiveness of programs’. (In which case, how many orders of magnitude do you think is typical of both?)
    2) For reason (1), we think that it can be best to give to a charity even though in-depth information about the charity is unavailable; whereas you will only recommend a charity if in-depth information is available.
    3) For reason (1), we think that, in general, additional research into cost-effectiveness of causes is more valuable than additional research into cost-effectiveness of charities. Not sure what you think on this issue: I’m guessing you think that both are of roughly equal value.
    4) We think, in the case of deworming, that the uncertainty favours higher expected cost-effectiveness (especially because of the DALY weights issue). You think that the uncertainty favours lower expected cost-effectiveness.

    Is this an accurate assessment, do you think?

  8. Hi Will,

    Your statement of points where we agree and disagree is accurate, but not, I think, complete (particularly for the latter). Some additions:

    • It is not only necessary to consider the distributions of (a) cause cost-effectiveness and (b) charity cost-effectiveness. It is also necessary to consider the distribution of (c) “estimate error,” i.e., discrepancies between the estimate of cost-effectiveness and the real cost-effectiveness, caused by the fact that the estimates involve extrapolation and assumption (not just real-world variation). If (c) is large relative to (a), then even relatively small variations in organization-specific cost-effectiveness can be decisive. (I am not sure the term “regression to the mean” fully captures what I am talking about here; I might instead term it “regression to the prior” since there is a significant source of error that we can’t get information about just from observing the distribution of our observations.)
    • Another reason we feel it is important to support only demonstrably effective organizations relates to the incentives we create for organizations. I feel that if our approach became highly influential, charities would have incentive to do and share a lot of evaluation; I feel that if your approach became highly influential, charities would have incentive to formally classify themselves as working on the most cost-effective interventions (which they might do by providing “technical assistance” of dubious quality, for example) but not to do evaluation and share substantive information.
    • Finally, I don’t dispute that the cost-effectiveness estimates are “conservative” in the sense of ignoring many more difficult-to-quantify benefits. However, I feel that this applies to most cost-effectiveness estimates, and that the cost-effectiveness estimates are “optimistic” in other ways such as being based on the most closely observed projects.

    Detail question – regarding DALY weights, the passage I read on this (DCP pgs 470-471) seemed to be pointing to DALY weights both as a potential source of underestimation of cost-effectiveness for deworming and as a source of high symmetric uncertainty, but I don’t recall anything specifying that appropriate DALY weights would lead to 20x the estimated cost-effectiveness. What passage are you referring to there?

    Responses to the questions you asked:

    How many orders of magnitude do we think that different causes differ by, vs. different charities within a cause?

    I think an organization providing not-very-helpful technical assistance, or not-very-effective advocacy (or advocacy for programs that the specific governments they’re working with are not good at executing), or funding developing-world governments without holding them accountable, can have zero or below-zero cost-effectiveness, i.e., do more harm than good. So I think the “range” of cost-effectiveness for charities within a cause (the min-max range, which I think is the figure you gave) is undefined/infinite when speaking geometrically. Estimating something more meaningful like the standard deviation would be more difficult, but I would guess that differences of more than an order of magnitude are common (whereas you seem to think an order of magnitude is the “max” difference).

    I’d guess that “estimate error” accounts for roughly as much variation in estimated cost-effectiveness as actual variation does. My first rough cut at this implied that if this is true, a factor-of-two difference in the “charity effectiveness multiplier” can outweigh a factor-of-five difference in estimated intervention cost-effectiveness.
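
    One way a claim of this kind can arise is sketched below. It is not the rough cut referred to above, whose details are not given. It assumes that true cost-effectiveness and estimate error are both normally distributed in log space, and that the interventions being compared share the same prior and the same error variance. The function name and the variance values are hypothetical.

```python
import math


def shrunk_ratio(estimated_ratio, true_var, error_var):
    """Shrink an estimated cost-effectiveness ratio toward 1.

    Under a normal-normal model in log space, the posterior mean of the
    log ratio is the estimated log ratio times true_var / (true_var + error_var).
    """
    weight = true_var / (true_var + error_var)
    return math.exp(weight * math.log(estimated_ratio))


# Assumption: estimate error accounts for as much variance as real variation.
ratio = shrunk_ratio(5.0, true_var=1.0, error_var=1.0)
print(round(ratio, 2))  # about 2.24, i.e. the square root of 5
```

    Under these assumptions a factor-of-five gap in estimates shrinks to a little over a factor of two, so a factor-of-two difference in charity quality comes close to offsetting it, and outweighs it if estimate error is somewhat larger than real variation.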

    Would we rather see further research into the cost-effectiveness of different interventions or the cost-effectiveness of different charities?

    I think the latter is more needed as there is practically no information on it. For the former there have already been significant and well-resourced efforts, and we have something to go on; I am skeptical that more research would itself be cost-effective in terms of improving our understanding. This is especially the case for neglected tropical disease (NTD) control, for which cost-effectiveness estimates depend so heavily on prevalence/incidence data that is so difficult to get.

  9. Some comments regarding:

    Both positive and negative factors have likely been ignored in the estimate, including:

    • Possible negative health effects of the deworming drugs themselves (DCP pg 479). (Negative impact on cost-effectiveness)

    According to

    Evaluation of Coverage of Deworming Interventions in Vietnam

    by A. Ehrhardt, et al., which discusses treatment of soil-transmitted helminths by mebendazole,

    Approximately 0.4% of those interviewed reported side-effects after deworming, while the occurrence of side effects through routine reports was 0.15%. In both cases, the side effects were described as mild forms of nausea, abdominal pain, or headaches that did not require pharmacological treatment.

    The other drug that’s used to treat soil-transmitted helminths is albendazole. I’ve been unable to collect information about the side effects of this drug.

    According to

    Efficacy and side effects of praziquantel treatment in a highly endemic Schistosoma mansoni focus at Lake Albert, Uganda

    by N.B. Kabatereine, et al.

    Side effects of the treatment were studied in a separate cohort of 346 people.

    [...]

    Most of the side effects developed between 30 min and 6 h after treatment but had subsided within 24 h. None of the side effects were so severe, that they warranted steroid treatment.

    [...]

    The symptoms questionnaire showed increased rates of abdominal pain, diarrhoea, vomiting, nausea, dizziness, body rash, and fatigue following treatment. Only 20.5% of the treated individuals did not report any side effects. Most of the side effects became manifest soon after drug therapy, and most were mild and short-lived.

    Whether or not side effects play a significant role in determining the cost-effectiveness of deworming depends on the (still nebulous) upside of deworming. Furthermore, the studies above list only short-term side effects and it’s conceivable that the drugs carry long-term side effects as well.

    However, based on a preliminary look at the numbers and on the fact that the short-term side effects seem somewhat mild, I would guess that negative side effects of the drugs have a relatively insignificant role in determining the cost-effectiveness of deworming efforts.