# Should we expect an ongoing study to meet its “goal”?

One of our newly “standout” charities, Development Media International (DMI), is in the midst of a randomized controlled trial. So far, all we have from the trial is information about self-reported behavior change, and we’ve tried to use that information to estimate how many lives the program will likely save (for purposes of our cost-effectiveness analysis). We estimate that the measured behavior changes should equate to about a 3.5% reduction in child mortality. However, DMI is hoping for a 19% reduction, and by our estimate, if it falls short of 10-14%, it will likely fail to find a statistically significant impact. What should we put more credence in – GiveWell’s projection based on available data about behavior change, or DMI’s projection?

Ordinarily, I’d be happy to consider the GiveWell estimate a best guess. I’m used to charities’ estimates turning out to be optimistic, and DMI’s estimate is based on a general model rather than on the actual data we have about its impact on behavior.

However, I find myself very uncomfortable predicting a figure of 3.5% when the people carrying out a study – and paying the considerable expenses associated with it – are expecting 10-20%. I’m uncomfortable with this discrepancy for two reasons:

• It’s a little hard to imagine that an organization would go to this level of expense – and reputational risk – if they weren’t fairly confident of achieving strong results. Most predictions and projections charities put out are, in a sense, “cheap talk,” by which I mean it costs a charity little to make strong claims. However, in this case DMI is conducting a study costing millions of dollars*, and by being public about the study, they face a significant public relations risk if the results are disappointing (as our projection implies they will be).
• I also struggle to think of examples of studies like this one – large, expensive, publicized studies focused on developing-world health or economic empowerment – that have turned out to be “disappointing” from the perspective of people carrying out (and/or paying for) the study. Though I do know of a fair number of studies showing “no impact” for an intervention, I believe they’ve generally been academic studies looking at very common/popular interventions (e.g. improved cookstoves, microlending). These “no impact” results were noteworthy in themselves, and didn’t necessarily reflect poorly on the people conducting or paying for the studies. I have a much harder time thinking of cases in which a major developing-world study found results that I’d consider disappointing or embarrassing for those carrying out or funding the study. The only one that comes to mind is the DEVTA trial on vitamin A and deworming.

I haven’t taken the time to systematically examine the intuition that “developing-world studies rarely find results that are disappointing/embarrassing for those carrying out the study.” It’s possible that the intuition is false; it’s also possible that it’s an artifact of the sort of publication bias that won’t affect DMI’s study, since the DMI study’s existence and hypothesis are already public. Finally, it seems worth noting that I don’t have the same intuition about clinical trials: indeed, failed clinical trials are frequent (especially in the relatively expensive Phase II).

With that said, if my intuition is correct, there are a couple of distinct possible explanations:

1. Perhaps, in developing-world settings, it is often possible to have a good sense for whether an intervention will work before deciding to run a formal study on it. Accordingly, perhaps expensive studies rarely occur unless people have a fairly good sense for what they’re going to find.
2. Perhaps publication-bias-type issues remain important in developing-world randomized studies. In other fields, I’ve seen worrying suggestive evidence that researchers “find what they want to find” even in the presence of seemingly strong safeguards against publication bias. (Example.) Even with a study’s hypothesis publicly declared, we believe there will still be some flexibility in terms of precisely how the researchers define outcomes and conduct their analysis. This idea is something that continues to worry me when it comes to relying too heavily on randomized studies; I am not convinced that the ecosystem and anti-publication-bias measures around these studies are enough to make them truly reliable indicators of a program’s impact.

Even with #2 noted as a concern, the bottom line is that I see a strong probability that DMI’s results will be closer to what it is projecting than to what we are projecting, and conditional on this, I see a relatively strong probability that this result will reflect legitimate impact as opposed to publication bias. Overall, I’d estimate a 50% chance that DMI’s measured impact on mortality falls in the range of 10-20%; if I imagine a 50% chance of a 15% measured impact and a 50% chance of a 3.5% measured impact (the latter is what we are currently projecting), that comes out to about a 9% expected measured impact, or ~2.5x what we’re currently projecting.
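The expected-value arithmetic above can be checked with a quick sketch. The 50/50 split between the two projections, and the use of 15% as a stand-in for DMI's 10-20% range, are the assumptions stated in the post:

```python
# Expected measured impact under a 50/50 split between the two projections.
p_dmi = 0.5          # assumed probability the measured impact is ~15%
dmi_impact = 15.0    # midpoint of DMI's 10-20% range, in percent
gw_impact = 3.5      # GiveWell's current projection, in percent

expected = p_dmi * dmi_impact + (1 - p_dmi) * gw_impact
print(expected)              # 9.25 -> "about a 9% expected measured impact"
print(expected / gw_impact)  # ~2.64 -> "~2.5x what we're currently projecting"
```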

In either case, I’ll want our cost-effectiveness estimate to include a “replicability adjustment” assigning only a 30-50% probability that the result would hold up upon further scrutiny and replication (this adjustment would account for my reservations about randomized studies in general, noted under #2 above). Our current cost-effectiveness estimate assigns a 50% probability. Overall, then, it could be argued that DMI’s estimated cost-effectiveness with the information we have today should – based on my expectations – be 1.5-2.5x what our review projects. That implies a “cost per life saved” of ~$2000-$3300, or about 1-1.7x as strong as what we estimate for AMF. It is important to note that this estimate would be introducing parameters with a particular sort of speculativeness and uncertainty, relative to most of the parameters in our cost-effectiveness calculations, so it’s highly debatable how this “cost per life saved” figure should be interpreted alongside our other published estimates.

DMI has far less of a track record than our top charities this year. In my view, slightly better estimated cost-effectiveness – using extremely speculative reasoning (so much so that we decided not to include it in our official cost-effectiveness estimate for DMI) – is not enough to make up for that. Furthermore, we should know fairly soon (hopefully by late 2015) what the study’s actual results are; given that situation, I think it makes sense to wait rather than give now based on speculation about what the study will find. But I do have mixed feelings on the matter. People who are particularly intent on cost-effectiveness estimates, and agree with my basic reasoning about what we should expect from prominent randomized studies, should consider supporting DMI this year.

*The link provided discusses DMI’s overall expenses. Its main activity over the time period discussed at the link has been carrying out this study.

• This is marginal to the topic of the post, but the linked article on DEVTA goes to some lengths to laud its methodology. The link might not be meant as a full endorsement of the article, but I hadn’t heard of DEVTA before, so I found it surprising (and somewhat unnerving) that a study of such purported high quality did not seem to have had a stronger effect on the more-or-less consensus that deworming and vitamin A supplementation are strong interventions. How should I reconcile these findings?

The GiveWell review of deworming points to some methodological flaws in a footnote and references Sommer et al., who are very critical of DEVTA’s methodology. Cursory skimming of some responses and reviews also indicates that DEVTA contradicts dozens (is that the correct range?) of studies that did find significant positive effects of the interventions and were not of lesser quality. Is that correct? Evidence Action implies that the DEVTA results (for deworming) only hold for regions with low incidence of worms. Is that correct too?

I’d investigate this further myself, but I seem to have hit a point where the time I’d have to invest to gain further clarity starts to increase rather steeply. Sorry. Plus, it’s bed time.

• Ben Kuhn on December 23, 2014 at 12:10 am said:

What’s the source of your estimate that DMI will find only a 3.5% reduction in all-cause mortality? Is it only due to the discounting factors that you list in your cost-effectiveness analysis? Because at least the discounts for external validity (i.e. single-country and part of single-study discounts) shouldn’t apply to your model of the expected results of *this particular RCT*, only your model of the expected results of a similar RCT run in another country.

• Ryan carey on December 23, 2014 at 9:10 am said:

Interesting post.
“I also struggle to think of examples of studies like this one – large, expensive, publicized studies focused on developing-world health or economic empowerment – that have turned out to be “disappointing” from the perspective of people carrying out (and/or paying for) the study… Perhaps publication-bias-type issues remain important in developing-world randomized studies.”

Yes – outcome reporting bias and multiple hypothesis testing are research-quality issues quite distinct from publication bias, yet they make studies more likely to show a positive result for exactly the reason you describe: researchers and participants don’t want the experiment to end up as a disappointment.

I haven’t assessed if these or other biases apply to DMI.

• Alexander on December 24, 2014 at 4:29 pm said:

Thanks for the questions and comments!

Telofy – Sorry we haven’t made it easier for you to find what you’re looking for! We’re planning to publish an intervention report on Vitamin A soon, which will discuss DEVTA and the other evidence around Vitamin A in depth, and should address your questions. In the meantime, you can read our notes from conversations with some of the DEVTA researchers and one of the authors of the Cochrane Review on Vitamin A supplementation if you want to learn more. In general, we didn’t find the portion of the DEVTA study focusing on deworming surprising, because we wouldn’t expect deworming to have effects on the outcomes it measured (e.g. mortality).

Ben – the only adjustment to the 3.5% figure is to assume a 25% reduction in effect sizes because of self-reported results. In our overall cost-per-life saved estimate for DMI, we also include a 50% replicability adjustment, but that isn’t included in the calculation for the 3.5% figure. You can see these adjustments in cells G39-42 of our DMI cost-effectiveness spreadsheet (though note that column G defaults to hidden, so you’ll need to unhide it).
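The adjustment Alexander describes can be sketched as follows. The raw behavior-change-implied reduction used here is hypothetical (the actual inputs live in cells G39-42 of the spreadsheet); only the 25% self-report discount and the separate 50% replicability adjustment come from the comment:

```python
# Sketch of the adjustments described above. The raw figure below is
# hypothetical; see cells G39-42 of GiveWell's DMI spreadsheet for the
# actual inputs.
self_report_discount = 0.25  # 25% reduction for self-reported results
replicability = 0.50         # applied in the overall cost-per-life-saved
                             # estimate, but NOT in the 3.5% figure

raw_implied_reduction = 4.67  # hypothetical behavior-change-implied %
projected = raw_implied_reduction * (1 - self_report_discount)
print(round(projected, 1))    # 3.5 -> the figure discussed in the post
```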

• Thanks, Alex! I’m looking forward to that report on vitamin A. I’ve long been meaning to learn more about that intervention. Thanks also for the links.

That deworming doesn’t have a noticeable effect on mortality does seem straightforward. I’ll have to skim the study, though, to see if the other claims I’ve found about it hold up.