This post discusses our detailed examination (including, with help from the authors, reanalyzing raw data) of the Miguel and Kremer 2004 study on deworming (treating people for parasite infections) as a way to raise school attendance, and a followup study (Baird et al. 2012) on the later-in-life impacts.
Our current #3 charity, SCI, focuses on deworming. Deworming is quite cheap (we estimate that SCI spends ~$0.50 per person treated, including all costs), but its benefits are less obvious and tangible than those of many other health programs, because the parasites it treats cause few deaths and their effects may be subtle. (Indeed, a recent Cochrane review fails to find statistically significant impacts of population deworming on the outcomes that have been most studied.) In our view, the case that deworming is a “good buy” depends heavily on the idea of developmental effects: the possibility that deworming children has a subtle, lasting impact on their development, and thus on their ability to be productive and successful throughout life.
The main evidence for this idea comes from (a) Bleakley 2004, a study of the Rockefeller Sanitary Commission’s campaign to eradicate hookworm in the American South in the early 20th century; (b) a series of studies in Kenya, in which school deworming was rolled out on a purposefully arbitrary (randomization-like) basis, and children who received more years of deworming were compared to children who had received fewer. This post focuses on the latter.
We have long had questions about these studies, relating both to the possibility of publication bias and to possible ways in which the setting of the studies was unrepresentative. This year, we decided to ask the study authors for the data and code behind their studies, so that we could run additional analyses to gain more information about the seriousness of our concerns. The authors graciously shared their data and code and helped us to interpret it.
The full details of our reanalysis are available here. A big-picture summary of our concerns and findings follows.
Data-mining is a form of publication bias, in which researchers look at many possible analyses of the data they collect, and present only the analyses most favorable to the conclusions they’re hoping to find. We were concerned about this issue for the series of deworming studies because:
- The first study (Miguel and Kremer 2004) drew a great deal of attention for its positive result (regarding the positive impact of deworming on school attendance), raising the possibility that researchers had the incentive to declare positive results from subsequent studies as well.
- Different studies used different definitions of “treatment group,” and emphasized different outcomes (for example, the initial study emphasized school attendance; the second emphasized height; the third emphasized earnings).
What we found
After performing our own analysis, we are less concerned about this issue than we were previously. Changing our definition of the “treatment group” didn’t have much impact on the findings, and while the authors did share with us some outcomes that had not been reported in the paper, there were few of these and they didn’t materially change the picture.
That said, there do remain some reasons to be concerned about this issue. The authors of the most recent follow-up study (the one emphasizing earnings) shared with us not only data but also their funding proposal and survey questionnaire, and we note that:
- The survey was extensive, and included a lot of information that wasn’t included in the data that researchers analyzed. We weren’t surprised to see that this was the case (the follow-up study aimed to collect a rich data set, not just data customized for the questions it was asking), but it still raises the possibility that bias could have crept into the process of transforming raw survey answers into analyzable data.
- The funding proposal expressed interest in a wide variety of outcomes, including educational attainment, labor market outcomes (measurable in multiple ways, though the authors told us that labor market earnings, one of the statistically significant positive effects described in the paper, is the “canonical” measure in labor economics), cognitive performance, happiness, health measures, and more; it did not clearly declare any one of these to be the primary outcome of interest. We attempted some rudimentary analysis of whether this could have facilitated spurious findings, and didn’t find strong reason to think that the findings were spurious. However, we struggled with the question of how a study’s findings should be adjusted when it had a large universe of outcomes it could have chosen to emphasize; standard statistical adjustments for multiple comparisons do not perform well when the set of candidate outcomes is large and loosely defined.
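To make that difficulty concrete, here is a minimal Python sketch of one standard adjustment, Holm's step-down procedure. It is our illustrative choice, not necessarily an adjustment the authors or we applied, and every p-value in it is invented rather than taken from Baird et al. 2012. The point it shows is that the same p-value for the emphasized outcome passes or fails depending on how many outcomes one decides belong in the "family" being corrected for.

```python
# Hypothetical illustration of Holm's step-down multiple-comparison adjustment.
# The p-values below are invented; they are NOT results from Baird et al. 2012.

def holm_rejections(p_values, alpha=0.05):
    """Return one boolean per hypothesis: True if Holm's procedure rejects it."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    rejected = [False] * m
    for rank, i in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is compared to alpha / (m - rank).
        if p_values[i] <= alpha / (m - rank):
            rejected[i] = True
        else:
            break  # step-down: stop at the first hypothesis that is not rejected
    return rejected

focal_p = 0.01  # hypothetical p-value for the outcome a study chooses to emphasize

# Family A: the focal outcome plus two others declared in advance.
narrow_family = [focal_p, 0.20, 0.45]
# Family B: the same focal outcome among 20 loosely related outcomes.
broad_family = [focal_p, 0.20, 0.45] + [0.50] * 17

print(holm_rejections(narrow_family)[0])  # True:  0.01 <= 0.05 / 3
print(holm_rejections(broad_family)[0])   # False: 0.01 >  0.05 / 20
```

Neither family is obviously the "correct" one when a funding proposal lists many outcomes without ranking them, so the adjustment itself ends up depending on a judgment call.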
We feel that the issues above would have been much easier to resolve if the authors had preregistered their studies, declaring in advance their primary and secondary outcomes and analyses of interest. This is not to say that the authors did anything unusual or wrong; our understanding is that preregistration was extremely rare (perhaps nonexistent) in economics at the time the study was done. In fact, Ted Miguel, one of the study’s authors, has gone on to co-author one of the first studies we’ve seen in this field that uses (and discusses) preregistration. Our point is simply that preregistration substantially raises the credibility of research for us as consumers of it, and would affect our qualitative assessment of these findings.
Measure | Year 1 Prevalence | Year 2 Prevalence |
---|---|---|
Moderate to heavy schistosomiasis infection | 7% | 18% |
Moderate to heavy hookworm infection | 15% | 22% |
Moderate to heavy roundworm infection | 16% | 24% |
Moderate to heavy whipworm infection | 10% | 17% |
The above table likely substantially understates the degree of change, because the second-year figure includes the benefits of treatment externalities experienced by the control group (discussed above). A calculation sent to us by the authors implied that the 18% prevalence of moderate to heavy schistosomiasis infection in the control group in year 2 shown above should be augmented by the 22 percentage point externality effect of treatment to get a genuine counterfactual infection rate of 40%, despite the fact that the initial prevalence was (as shown above) only 7%. This implies that without the program, the area would have seen an extreme rise in prevalence of moderate-to-heavy schistosomiasis infections. A footnote in Miguel and Kremer 2004 attributes this phenomenon to “the extraordinary flooding in 1998 associated with the El Niño weather system, which increased exposure to infected fresh water (note the especially large increases in moderate-to-heavy schistosomiasis infections), created moist conditions favorable for geohelminth larvae, and led to the overflow of latrines, incidentally also creating a major outbreak of fecal-borne cholera” (p. 174).
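Written out, the authors' adjustment for moderate-to-heavy schistosomiasis is a simple sum, and the implied counterfactual prevalence is more than five times the year-1 figure:

```latex
\underbrace{18\%}_{\text{control group, year 2}}
+ \underbrace{22\ \text{pp}}_{\text{externality effect}}
= \underbrace{40\%}_{\text{counterfactual, year 2}}
\qquad\text{vs.}\qquad
\underbrace{7\%}_{\text{year 1}},
\qquad \frac{40\%}{7\%} \approx 5.7
```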
Because of this unusual situation, we worry that the results of studies from this place and time may not generalize well to settings where infection rates are lower and more typical.
What we found
We did a couple of analyses to see whether the headline effects were sensitive to the prevalence of moderate-to-heavy infections, particularly schistosomiasis infections. As we expected, we saw some reason to believe that deworming had larger impacts in higher- than in lower-prevalence areas, and in schools with substantial schistosomiasis prevalence. That said, what we found was far from sufficient to explain away the studies’ findings. Moreover, these subgroup results are weak evidence either way: dividing the schools into those that did and didn’t have substantial schistosomiasis prevalence left us with fairly small samples, from which it is difficult to conclude anything.
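Why do subgroup splits leave so little to conclude? A rough Python sketch, assuming a simple two-sample comparison with schools as the unit of analysis, equal-size arms, no within-school correlation, and made-up effect and standard-deviation values (none taken from the studies), shows the mechanics: halving the sample inflates the standard error by √2, and testing whether the effect differs between two halves doubles it relative to the full-sample estimate.

```python
import math

# Hypothetical illustration: all numbers are made up, not taken from the studies.
# Simplifying assumptions: schools are the unit of analysis, the two arms are equal
# in size, and within-school correlation is ignored.

def se_diff_in_means(sd, n_per_arm):
    """Standard error of a treatment-minus-control difference in means
    for two arms of n_per_arm units each, with common standard deviation sd."""
    return sd * math.sqrt(2.0 / n_per_arm)

sd = 0.30      # hypothetical school-level SD of the outcome
effect = 0.15  # hypothetical true effect

se_full = se_diff_in_means(sd, n_per_arm=36)  # whole sample
se_half = se_diff_in_means(sd, n_per_arm=18)  # one subgroup, e.g. high-schistosomiasis schools

# Comparing the effect in one subgroup with the effect in the other:
# the difference of two independent estimates has variance se_half**2 + se_half**2.
se_gap = math.sqrt(se_half**2 + se_half**2)

print(round(effect / se_full, 2))  # z of about 2.12 in the full sample
print(round(effect / se_half, 2))  # z of 1.5 within a single subgroup
print(round(se_gap / se_full, 2))  # 2.0: the subgroup comparison is twice as noisy
```

Under these assumptions, an effect that is clearly detectable in the pooled sample can look statistically insignificant within each subgroup, and a difference between subgroups has to be roughly twice as large as the pooled standard error allows before it stands out.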
Overall, we are moderately less concerned about this issue than we were before.
One possible alternative explanation is that parents/students may have sought to switch into the schools that “won” early deworming treatments, which could cause the treatment and control groups to differ in ways not picked up by the baseline data measured by the studies. Further discussion with the authors made this concern appear far less likely to us (our understanding is that the deworming program was announced very close to the time when students were registered as “treatment” or “control,” and that school transfers after the start of the program were rare and relatively symmetric between schools).
Another possibility that has become more salient to us in the course of analyzing these studies is that efforts to encourage students to attend school in order to receive treatment might have bled over to later days, increasing attendance in treatment schools over the following years. The particular piece of data that led us to examine this possibility is that within schools, there is no statistically significant difference in attendance rates between treated and untreated students; the effects appear only across schools. (The authors assume that this pattern reflects within-school externalities.) In the course of analyzing the studies more closely, we learned that treatment dates were announced at schools in advance in an attempt to boost take-up, and that some efforts were made to boost attendance on drug distribution days.
Finally, we wondered whether the results of these studies might be driven by a few “outlier” schools, but after the analysis we’ve done of the raw data, we are now convinced that this is not an issue.
We still have substantial reservations about the studies. Preregistration would have been an additional measure that could have increased our confidence and lowered our concerns. In fact, of all the cases we’ve seen, this is the one in which preregistration would have had the most influence on our conclusions. Had the authors preregistered hours worked or income among those employed (the key metrics showing improvement in Baird et al. 2012) as their main outcome of interest before collecting follow-up data, we would have far more confidence in the validity of the findings.
Going forward, field replications (carrying out similar deworming programs, and similar analysis to see whether similar results are obtained) would – in our view – greatly improve the robustness of the evidence.
In our view, the vast majority of aid interventions have almost no rigorous evidence behind them. A very small set of interventions (including distribution of long-lasting insecticide-treated nets, or LLINs) has a broad, impressive evidence base. Deworming is somewhere in between. The studies discussed here are rigorous, have highly encouraging findings, and held up under the best scrutiny we could bring to them. At the same time, many questions remain unanswered. This is one of the areas in which an additional long-term study would have the most effect on our views.
Comments
Thanks for this interesting review — I especially appreciate that the authors shared the material necessary for you to examine their results in more depth, and that you talk through your thought process.
However, one thing you highlighted in your post on the new Cochrane review (https://blog.givewell.org/2012/07/13/new-cochrane-review-of-the-effectiveness-of-deworming/) that isn’t mentioned here, and which I thought was much more important than the doubts about this Miguel and Kremer study, was that there have been so many other studies that did not find large effect on health outcomes! I’ve been meaning to write a long blog post about this when I really have time to dig into the references, but since I’m mid-thesis I’ll disclaim that this quick comment is based on recollection of the Cochrane review and your and IPA’s previous blog posts, so forgive me if I misremember something.
The Miguel and Kremer study gets a lot of attention in part because it had big effects, and in part because it measured outcomes that many (most?) other deworming studies hadn’t measured — but it’s not as if we believe these outcomes to be completely unrelated. This is a case where what we believe the underlying causal mechanism for the social effects to be is hugely important. For the epidemiologists reading, imagine this as a DAG (a directed acyclic graph) where the mechanism is “deworming -> better health -> better school attendance and cognitive function -> long-term social/economic outcomes.” That’s at least how I assume the mechanism is hypothesized. So while the other studies don’t measure the social outcomes, it’s harder for me to imagine how deworming could have a very large effect on school and social/economic outcomes without first having an effect on (some) health outcomes — since the social outcomes are ‘downstream’ from the health ones. Maybe different people are assuming that something else is going on — that the health and social outcomes are somehow independent, or that you just can’t measure the health outcomes as easily as the social ones, which seems backwards to me. (To me this was the missing gap in the IPA blog response to GiveWell’s criticism as well.)
So continuing to give so much attention to this study, even if it’s critical, misses what I took to be the biggest takeaway from that review: there have been a bunch of studies that showed only small effects or none at all. They were looking at health outcomes, yes, but those aren’t unrelated to the long-term development, social, and economic effects. You try to get at the external validity of this study by looking for different size effects in areas with different prevalence, which is good but limited. Ultimately, if you consider all of the studies that looked at various outcomes, I think the most plausible explanation for how you could get huge (social) effects in the Miguel Kremer study while seeing little to no (health) effects in the others is not that the other studies just didn’t measure the social effects, but that the Miguel Kremer study’s external validity is questionable because of its unique study population.
Agreed with the concerns. I understand that the Miguel data didn’t provide enough of a sample size of low prevalence schools to test the hypothesis, but I don’t quite understand how this leads to affirming or having greater confidence in the results. As Brett notes, given the other studies and the uniqueness of this study context it seems like the working assumption would lead to lowering our estimates of program impacts pretty significantly.
All these deep intellectual arguments are all very well, but the stark truth is that if your children came home harbouring worms and you could deworm them for less than $1, you would immediately do so. The poorest kids in Africa deserve the same access to better health.
Alan, much agreed that the poorest kids everywhere deserve the same access to better health. If I could snap my fingers and make it so, I would: if I could fix my children’s or really anyone’s health problems I’d pay a lot more than just $1 to do so! But I don’t have enough money to deworm every kid in the world, or give everyone bednets, or give everyone with cancer chemotherapy, and so forth. The need is much larger than my capacity to give, or yours, or even Bill Gates’! So I think it does make sense to figure out the BEST way to help and prioritize doing as much of that as possible. After all, if you spent $1 deworming a child when spending the same $1 to give them a bednet (or pick your alternate intervention) would have a much *bigger* impact, wouldn’t you want to do the other thing?
I haven’t looked closely enough at what’s included in all of the non-Miguel studies, but it would be interesting to see a table with the prevalence rates and the significance of the variable of interest to the study.
I’m no epidemiologist, but it seems like a pretty simple model could explain the variation in results.
And wouldn’t *THE* key question for NGOs in this area be what the prevalence is in their areas of operations? Again, I just don’t see how after reviewing the literature one would feel confident about a blanket endorsement without confirmation that such a key contextual variable is above a certain threshold. Maybe this exists and I missed it, in which case, mea culpa.
Interesting discussion regardless.
To Brett Keller: your comment is fair; it’s going to be an individual decision when it comes to making a choice. When I worked in Sudan I had the experience of seeing what a childhood infection with schistosomiasis does to people by the time they are 35 years old, and I just feel that I want to protect all children from these consequences.