Development Media International (DMI), an organization that “runs radio and TV campaigns to change behaviours and save lives in developing countries,” recently released preliminary midline results from a randomized controlled trial. These results imply that exposure to DMI’s content caused people to take up a variety of behaviors (including early initiation of breastfeeding and various measures for controlling diarrhea) at increased rates. We have not yet seen the full technical report and we have substantial questions about these results, but believe that the basic design of the study is a major improvement on previous evidence regarding similar programs.
If the results – and DMI’s cost-effectiveness calculations – held up, DMI could become a GiveWell top charity with substantially better estimated “cost per (equivalent) life saved” than our current top charities. Because of the nature of DMI’s intervention – attempting to change behavior through mass media – it is quite plausible to us that its cost per person reached is extremely small, and thus that the cost-effectiveness is extremely high. However, there are many questions we would need to look into before determining whether this is in fact the case.
As such, we plan to investigate DMI thoroughly this year. There is some possibility that we will decide to recommend it at the end of this calendar year, and some possibility that we will instead choose to hold off on deciding whether to recommend it until next year, when the study’s endline results (which we expect to give a more reliable picture of its impact) are released. Note that we recommend only charities that we have thoroughly vetted and spent substantial time considering alongside other contenders, and we have designed our process to be most up-to-date when it is most relevant (in December); so regardless of what we find, we are extremely unlikely to change any official recommendations ahead of that timeline. Our recommendations do not reflect real-time “best guesses,” but rather reflect the results of thorough investigation and deliberation.
However, as we state in our current writeup on the subject, we have previously found research on such programs to have important limitations and to be generally prone to selection bias. For that reason, we have been keeping tabs on DMI’s ongoing study, which – unlike previous work – uses randomization to isolate the impact of its programming.
A preliminary public discussion of DMI’s midline results became available recently, and we spoke with DMI representatives about them (notes from this conversation are forthcoming). We note that:
- The preliminary results seem to indicate that DMI’s intervention had substantial impacts on behavior change (the easiest way to see this is to scan the rightmost column, “differences in differences (unadjusted),” in DMI’s table – see the formula sketch after this list – though there is some other evidence available as well). We do not yet have complete information about the statistical significance of these findings (the public report gives p-values for four of ten behaviors).
- It appears that the treatment and control groups differed substantially at baseline (in terms of their behaviors and other indicators), making the comparison less than ideal. In addition, the control group (which did not receive DMI’s intervention) saw substantial improvement on the key indicators, which we cannot explain at this time. With that said, skimming the summary table provided by DMI, it appears that in many cases the treatment group started with a lower rate of a given behavior and then caught up with or surpassed the control group, which (viewed in the context of a randomized selection process) does seem to us to indicate a positive effect, though the magnitude may prove hard to estimate reliably.
- DMI provides an estimate that its intervention costs ~$60-300 per life saved, which would make it around an order of magnitude stronger than the strongest of our top charities. DMI also stated to us in conversation that the midline results are reasonably consistent with the assumptions that went into the estimate, though it appears to us that interpreting the results in light of the estimate will involve substantial judgment calls. We have not yet seen the technical details of DMI’s estimate. We expect our final estimate of DMI’s “cost per life saved” to be substantially less optimistic than DMI’s current estimate, but at this time it is hard to say by how much.
- DMI’s midline results come entirely from self-reported data on behavior, which is potentially a highly problematic measure for assessing the impact of a mass media campaign (since the media campaign may impact people’s perceptions of how they should respond to surveys more than it affects their actual behavior). It appears to us that DMI is aware of this issue, and states that its endline results (slated for 2015) will measure mortality.
- Based on DMI’s website and our discussion, our preliminary impression is that DMI is unusually thoughtful and analytical in evaluating its own work.
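For clarity, the unadjusted difference-in-differences figure for each behavior appears to be computed as follows (writing p for the proportion of survey respondents reporting the behavior; we have not yet seen the technical report, so this is our reading of the table):

```latex
\mathrm{DiD} = \left( p^{\text{treat}}_{\text{mid}} - p^{\text{treat}}_{\text{base}} \right) - \left( p^{\text{ctrl}}_{\text{mid}} - p^{\text{ctrl}}_{\text{base}} \right)
```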
Our next steps in investigating DMI:
- Obtain the technical report behind the midline results and examine it closely. DMI has offered to share this with us in the next few weeks.
- Examine the technical details behind DMI’s “cost per life saved” estimate and assess them with our analysis of the midline results in mind. DMI has shared its model with us, though we have not yet examined it.
- Get more information about DMI’s plans and room for more funding and consider how our cost-effectiveness estimate might change in light of these.
- If warranted, further examine DMI via a site visit and create a full charity review.
Because of the strong study design, the degree of thoughtfulness and analysis we’ve seen so far from DMI, and the strong claimed cost-effectiveness and basic plausibility of that claim, we believe there is a substantial chance that DMI will eventually become a recommended charity, and a (smaller) chance that we will end up estimating substantially higher “bang for the buck” for DMI than for our current top charities. If that turns out to be the case, we hope to help close any funding gap of DMI’s.
Comments
This seems really exciting!
Re the issue that the control group and the intervention group have such different baselines for a variety of indicators: I wonder if a better protocol would have been to constrain the sample space for the randomization to enforce some bounds on how much the baselines can differ. The authors do attempt to correct for the different baselines statistically, but presumably at some cost of statistical power (and possibly with some additional modeling assumptions).
“It appears that the treatment and control groups differed substantially at baseline”
Just on this point and in response to Colin –
* The control group had an average baseline metric of 50%, while the treatment group had 42% (from http://www.developmentmedia.net/burkina-faso-rct-preliminary-midline-results).
* However, within each group there is no relationship between the baseline metric and the percentage improvement in the trial so far.
Overall, an exciting result and I look forward to learning more.
Robert Wiblin writes:
Why do you say that? For instance, have a look at the odds ratio for receiving ORT for diarrhoea (on p.2 of this doc). They quote an adjusted odds ratio of 1.83. But note in this case the intervention group increased by 25 percentage points whereas the control group increased by just 2 points, so you would naively expect a double-digit odds ratio (crunching the numbers I get an unadjusted odds ratio of 14.6). My point here is just that the adjustments (in this case, for baseline and household level proximity to a health care center) do make a big difference.
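[Editor’s note: for readers who want to follow the arithmetic, here is a minimal sketch of one natural reading of “unadjusted odds ratio” here – the ratio of the two groups’ before/after odds ratios. The proportions below are hypothetical placeholders, not the figures from DMI’s document, so the output will not reproduce the 14.6.]

```python
def odds(p):
    """Convert a proportion to odds."""
    return p / (1 - p)

# Hypothetical proportions (not DMI's actual figures): the treatment
# group rises 25 percentage points, the control group 2 points.
treat_base, treat_mid = 0.10, 0.35
ctrl_base, ctrl_mid = 0.30, 0.32

or_treat = odds(treat_mid) / odds(treat_base)  # within-group change, odds scale
or_ctrl = odds(ctrl_mid) / odds(ctrl_base)

print(or_treat / or_ctrl)  # unadjusted ratio of odds ratios
```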
The control group is overall much better served for health care than the intervention group. In addition to the point Robert noted, the authors remark that mortality is lower in the control group (they don’t specify how much) and, perhaps most remarkably, “there are also twice as many health centres in the control zones as in the intervention zones” (I’m not sure if that’s per capita or what, though presumably both groups are of roughly equal population).
My original point was about the study design, specifically the randomization. Out of 14 zones, they had to choose 7 intervention zones; there are 14-choose-7 or 3432 ways to do this. I don’t know exactly what their process was: did they randomly choose among all of those, or choose from some more limited set? Of course, there’s a fair amount of discreteness with just 14 zones, but still I’d think you could add some constraints on the difference in various baselines, the difference in total household count, etc., and thereby reject most of the 3432 possibilities before making a random selection. You could even formulate it as an optimization problem (I could spec this out in more detail if anyone cares) and narrow it down to a single coin toss to pick which half of a partition is the intervention half.
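[Editor’s note: to make this concrete, here is a minimal sketch of the constrained selection described above. The zone labels and baseline rates are made up for illustration, and further constraints (household counts, etc.) could be added to the acceptance test.]

```python
import itertools
import random

# Hypothetical baseline rates for the 14 zones (the real values
# would come from the baseline survey).
rates = [0.42, 0.51, 0.38, 0.55, 0.47, 0.44, 0.50,
         0.39, 0.53, 0.46, 0.41, 0.49, 0.52, 0.45]
baseline = {f"zone_{i}": r for i, r in enumerate(rates)}
zones = list(baseline)

def balanced(intervention, tolerance=0.02):
    """Accept a split only if the groups' mean baselines are close."""
    control = [z for z in zones if z not in intervention]
    mean = lambda zs: sum(baseline[z] for z in zs) / len(zs)
    return abs(mean(intervention) - mean(control)) < tolerance

# Enumerate all C(14, 7) = 3432 candidate intervention groups, keep
# the acceptably balanced ones, then randomize among those. (Each
# partition appears twice, once per half; that doesn't matter here.)
candidates = [c for c in itertools.combinations(zones, 7) if balanced(c)]
intervention = random.choice(candidates)
```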
None of this is to deny that you can statistically correct for the differences in baselines in the final analysis. This is especially true if you can correct the bias with household-level data (e.g., if correcting for distance to the nearest health center turns out to substantially eliminate the various biases). I would just expect there is some loss of statistical power that may have been avoidable at the design stage. Also, I don’t know the literature on randomized controlled trials, so I’m somewhat shooting from the hip; what I’m advocating may be a bad idea for some reason, or may be missing something important.
Colin and Rob – thanks for the thoughts.
The kind of constrained randomization that Colin describes is common in the statistical literature and seems like it would have been helpful in this case. There’s a literature on “rerandomization,” and more generally the keywords to search for seem to be “balanced randomization” or “stratified randomization.” Relative to post-hoc regression analysis, which is what DMI is doing, some mechanism for incorporating covariates into the initial randomization process avoids bias. That said, the regression they’re doing in the case of ORT for diarrhea actually makes the treatment appear less effective than the unadjusted figures do, so we are somewhat less concerned than we would be if the adjustment went in the opposite direction. It would be helpful if DMI pre-specified in detail, before the collection of the final mortality data, which covariates they plan to include in that analysis.
As an aside, I don’t think these kinds of post-hoc adjustments typically reduce power (though they might relative to some ideal initial balanced randomization procedure). Because they add some explanatory power, they reduce the remaining variance for the experimental variable to explain, giving the comparison more power in some sense (even though there may be a sense in which they shouldn’t).
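[Editor’s note: to illustrate the point about adjustment, here is a minimal, self-contained sketch using synthetic data and hypothetical variable names – this is not DMI’s model or data, just the general shape of a covariate-adjusted comparison.]

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic household-level records standing in for survey data.
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "treated": rng.integers(0, 2, n),
    "dist_health_centre": rng.exponential(5.0, n),  # km, made up
})
logit_p = -1.0 + 0.6 * df["treated"] - 0.1 * df["dist_health_centre"]
df["ort"] = (rng.random(n) < 1 / (1 + np.exp(-logit_p))).astype(int)

adjusted = smf.logit("ort ~ treated + dist_health_centre", data=df).fit()
unadjusted = smf.logit("ort ~ treated", data=df).fit()

# Compare the adjusted and unadjusted treatment estimates (as odds
# ratios) and their standard errors.
print("adjusted OR:  ", np.exp(adjusted.params["treated"]),
      " SE:", adjusted.bse["treated"])
print("unadjusted OR:", np.exp(unadjusted.params["treated"]),
      " SE:", unadjusted.bse["treated"])
```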
Hi Colin,
I think I agree with what you are saying. My point was simply that this small sample doesn’t show locations with the worst baseline results improving any more than those with the best baseline results. This is modest evidence that the difference in baselines between the control and treatment group isn’t going to bias the overall result.
Thanks, Alexander! That Morgan and Rubin paper you linked to is exactly the methodology I had in mind. (As you say, the literature uses the term “rerandomization,” but that’s equivalent to restricting the sample space: in the notation of the paper, they keep selecting a random sample W until they get one with phi(x,W)=1, i.e., they restrict the sample space to those randomizations that satisfy phi(x,W)=1.)
At least judging from the Morgan and Rubin paper, it looks like the literature is focused on the case where the sample space for the randomization is large. But in this case it is quite small – small enough, in fact, that you could easily analyze all possible randomizations (even in a spreadsheet) and choose phi so that only a single partition (two randomizations W) satisfies phi(x,W) = 1.
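[Editor’s note: in code, the rejection-sampling formulation and the restricted-sample-space view are the same thing. A minimal sketch, where phi stands for whatever balance test one chooses; with only 3432 possibilities, the exhaustive enumeration sketched earlier works just as well.]

```python
import random

def rerandomize(zones, phi):
    """Draw assignments at random until the balance criterion phi
    accepts one (Morgan & Rubin's acceptance step) -- equivalent to
    sampling uniformly from {W : phi(x, W) = 1}."""
    while True:
        W = set(random.sample(zones, len(zones) // 2))
        if phi(W):
            return W
```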
On the issue of apparently reducing power, I indeed meant compared to an ideal, more balanced randomization – sorry if I was unclear on that. I’m saying the study seems to have had a suboptimal design (so that confidence intervals will be wider than they could have been). I agree the post-hoc adjustments are important.
Is there any available information on DMI’s room for more funding or how underfunded DMI is relative to the arts in general?
Michael, we haven’t yet assessed DMI’s room for more funding but would do so as part of an investigation.