This post discusses our detailed examination (including, with help from the authors, reanalyzing raw data) of the Miguel and Kremer 2004 study on deworming (treating people for parasite infections) as a way to raise school attendance, and a follow-up study (Baird et al. 2012) on the later-in-life impacts.
Our current #3 charity, SCI, focuses on deworming. Deworming is quite cheap (we estimate that SCI spends ~$0.50 per person treated, including all costs) but the benefits are not as obvious and tangible as those of many other health programs, because the parasites treated cause few deaths and their effects may be subtle (so much so that a recent Cochrane review fails to find statistically significant impacts of population deworming on the outcomes that have been most studied). In our view, the case that deworming is a “good buy” depends heavily on the idea of developmental effects: the possibility that deworming children has a subtle, lasting impact on their development, and thus on their ability to be productive and successful throughout life.
The main evidence for this idea comes from (a) Bleakley 2004, a study of the Rockefeller Sanitary Commission’s campaign to eradicate hookworm in the American South in the early 20th century; (b) a series of studies in Kenya, in which school deworming was rolled out on a purposefully arbitrary (randomization-like) basis, and children who received more years of deworming were compared to children who had received fewer. This post focuses on the latter.
We have long had questions about these studies, relating both to the possibility of publication bias and to possible ways in which the setting of the studies was unrepresentative. This year, we decided to ask the study authors for the data and code behind their studies, so that we could run additional analyses to gain more information about the seriousness of our concerns. The authors graciously shared their data and code and helped us to interpret it.
The full details of our reanalysis are available here. A big-picture summary of our concerns and findings follows.
Data-mining is a form of publication bias, in which researchers look at many possible analyses of the data they collect, and present only the analyses most favorable to the conclusions they’re hoping to find. We were concerned about this issue for the series of deworming studies because:
- The first study (Miguel and Kremer 2004) drew a great deal of attention for its positive result (regarding the positive impact of deworming on school attendance), raising the possibility that researchers had the incentive to declare positive results from subsequent studies as well.
- Different studies used different definitions of “treatment group,” and emphasized different outcomes (for example, the initial study emphasized school attendance; the second emphasized height; the third emphasized earnings).
What we found
After performing our own analysis, we are less concerned about this issue than we were previously. Changing our definition of the “treatment group” didn’t have much impact on the findings, and while the authors did share some outcomes with us that had not been reported in the paper, there were not a lot of these and they didn’t significantly change the picture.
That said, there do remain some reasons to be concerned about this issue. The authors of the most recent follow-up study (the one emphasizing earnings) shared not only data but also their funding proposal and survey questionnaire with us, and we note that:
- The survey was extensive, and included a lot of information that wasn’t included in the data that researchers analyzed. We weren’t surprised to see that this was the case (the follow-up study aimed to collect a rich data set, not just data customized for the questions it was asking), but it still raises the possibility that bias could have crept into the process of transforming raw survey answers into analyzable data.
- The funding proposal expressed an interest in a wide variety of outcomes, including educational attainment, labor market outcomes (measurable in multiple ways, though the authors stated to us that one of the statistically significant positive effects described in the paper, labor market earnings, is the “canonical” measure for the field of labor economics), cognitive performance, happiness, health measures, and more; it did not clearly declare that any particular one was the primary outcome of interest.

We attempted some rudimentary analysis of whether this fact could have facilitated spurious findings, and didn’t find strong reason to think that the findings were spurious. However, we struggled with the question of how a study’s findings should be adjusted for the fact that it had a larger universe of multiple outcomes that it could have chosen to emphasize; normal statistical adjustments for multiple comparisons do not perform well in cases with large numbers of ambiguous outcomes.
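To make the multiple-comparisons concern concrete, here is a minimal sketch of how the chance of at least one spurious "significant" result grows with the number of outcomes examined, along with two standard corrections (Bonferroni and Holm). It is an illustration only, not the analysis we or the authors ran: it assumes independent tests, and the p-values in the usage lines are hypothetical, not figures from the studies.

```python
def familywise_error_rate(num_outcomes, alpha=0.05):
    """Chance of at least one false positive across independent tests,
    each run at significance level alpha (independence is an assumption)."""
    return 1 - (1 - alpha) ** num_outcomes


def bonferroni_threshold(num_outcomes, alpha=0.05):
    """Per-test threshold that keeps the family-wise rate at or below alpha."""
    return alpha / num_outcomes


def holm_rejections(p_values, alpha=0.05):
    """Holm step-down procedure. Returns a list of booleans, in the
    original order, marking which hypotheses are rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, idx in enumerate(order):
        # The smallest p-value is compared to alpha/m, the next to alpha/(m-1), ...
        if p_values[idx] <= alpha / (m - rank):
            rejected[idx] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return rejected


if __name__ == "__main__":
    # With 20 independent outcomes, a false positive somewhere is more likely than not.
    print(round(familywise_error_rate(20), 4))
    print(bonferroni_threshold(20))
    # Hypothetical p-values for four outcomes (not taken from the studies).
    print(holm_rejections([0.001, 0.04, 0.03, 0.5]))
```

Both corrections need a well-defined count of tests. When it is unclear how many outcomes "could have been" emphasized, that count is itself ambiguous, which is the difficulty described above.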
We feel that the issues above would have been much easier to resolve if the authors of the studies had preregistered their studies, declaring in advance what their primary and secondary outcomes and analyses of interest were. This is not to say that the authors have done anything unusual or wrong; our understanding is that preregistration was extremely rare (perhaps nonexistent) in economics at the time the study was done. In fact, Ted Miguel, one of the authors of the study, has gone on to co-author one of the first studies we’ve seen in this field that does utilize (and discuss) preregistration. Our point here is just that the practice of preregistration carries substantial credence benefits with us as consumers of research, and would affect our qualitative assessment of these findings.
| Measure | Year 1 Prevalence | Year 2 Prevalence |
|---|---|---|
| Moderate to heavy schistosomiasis infection | 7% | 18% |
| Moderate to heavy hookworm infection | 15% | 22% |
| Moderate to heavy roundworm infection | 16% | 24% |
| Moderate to heavy whipworm infection | 10% | 17% |
The above table likely substantially understates the degree of change, because the second-year figure includes the benefits of treatment externalities experienced by the control group (discussed above). A calculation sent to us by the authors implied that the 18% prevalence of moderate to heavy schistosomiasis infection in the control group in year 2 shown above should be augmented by the 22 percentage point externality effect of treatment to get a genuine counterfactual infection rate of 40% – despite the fact that the initial prevalence was (as shown above) only 7%. This implies that without the program, the area would have seen an extreme rise in prevalence of moderate-to-heavy schistosomiasis infections. A footnote in Miguel and Kremer 2004 attributes this phenomenon to “the extraordinary flooding in 1998 associated with the El Niño weather system, which increased exposure to infected fresh water (note the especially large increases in moderate-to-heavy schistosomiasis infections), created moist conditions favorable for geohelminth larvae, and led to the overflow of latrines, incidentally also creating a major outbreak of fecal-borne cholera” (Pg 174).
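The counterfactual arithmetic in the paragraph above can be laid out as a short sketch. The three input figures are the ones quoted in the text; the variable names are ours, and the simple additive adjustment is a simplification of whatever calculation the authors actually sent us.

```python
# Figures quoted in the text, in percentage points (integers avoid rounding noise).
year1_prevalence = 7           # moderate-to-heavy schistosomiasis, year 1
year2_control_prevalence = 18  # observed in the control group, year 2
externality_effect = 22        # estimated reduction the control group received
                               # from nearby treatment (treatment externality)

# Add back the externality benefit to estimate prevalence with no program at all.
counterfactual_year2 = year2_control_prevalence + externality_effect

# Implied rise in prevalence that would have occurred without the program.
implied_rise = counterfactual_year2 - year1_prevalence

print(counterfactual_year2)  # 40
print(implied_rise)          # 33
```

On these figures, the no-program scenario implies a 33 percentage point rise from a 7% starting point, which is what makes the setting look so unusual.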
Because of this unusual situation, we worry that the results of studies from this place and time may not generalize well to other circumstances in which rates are at lower, more typical levels.
What we found
We did a couple of analyses to see whether the headline effects were sensitive to the prevalence of moderate-to-heavy infections, particularly schistosomiasis infections. As we expected, we did see some reason to believe that deworming had had larger impacts in higher- than in lower-prevalence areas, and that it had had larger impacts for schools with substantial schistosomiasis prevalence. That said, what we found was far from sufficient to completely explain away the studies’ findings. In particular, dividing the schools into those that did and didn’t have substantial schistosomiasis prevalence left us working with fairly small sample sizes, from which it is difficult to conclude anything.
Overall, we are moderately less concerned about this issue than we were before.
One possible alternative explanation is that parents/students may have sought to switch into the schools that “won” early deworming treatments, which could cause the treatment and control groups to differ in ways not picked up by the baseline data measured by the studies. Further discussion with the authors made this concern appear far less likely to us (our understanding is that the deworming program was announced very close to the time when students were registered as “treatment” or “control,” and that school transfers after the start of the program were rare and relatively symmetric between schools).
Another possibility that has become more salient to us in the course of analyzing these studies is that efforts to encourage students to attend school in order to receive treatment might have bled over to later days, increasing attendance in treatment schools over the following years. The particular piece of data that led us to examine this possibility is that within schools, there is no statistically significant difference in attendance rates for treated and untreated students (the effects only appear across schools). (The authors assume that this phenomenon occurred due to the presence of within-school externalities.) In the course of analyzing the studies more closely, we learned that treatment dates were announced at the school in advance in an attempt to boost take-up, and that some efforts were undertaken to boost attendance on drug distribution days.
Finally, we wondered whether the results of these studies might be driven by a few “outlier” schools, but after the analysis we’ve done of the raw data, we are now convinced that this is not an issue.
We still have substantial reservations about the studies. Preregistration would have been an additional measure that could have increased our confidence and lowered our concerns. In fact, this is the single case we’ve seen in which preregistration would have had the most influence on our conclusions. Had the authors preregistered hours worked or income amongst those employed (the key metrics showing improvement in Baird et al. 2012) as their main outcome of interest prior to collecting follow-up data, we would have far more confidence in the validity of the findings.
Going forward, field replications (carrying out similar deworming programs, and similar analysis to see whether similar results are obtained) would – in our view – greatly improve the robustness of the evidence.
In our view, the vast majority of aid interventions have almost no rigorous evidence behind them. A very small set of interventions – including LLIN distribution – have a broad, impressive evidence base. Deworming is somewhere in between. The studies discussed here are rigorous, have highly encouraging findings, and have held up to the best scrutiny we could bring to them. At the same time, many questions remain unanswered. This is one of the areas in which an additional long-term study would have the most effect on our views.