We often use academic research to inform our work, but we try to do so with great caution rather than simply taking reported results at face value. We believe it is a mistake to trust academic research just because it is peer-reviewed, published, and/or reputable.
A good example of why we’re concerned comes from the recent back-and-forth between David Roodman and Mark Pitt, which continues a debate begun in 1999 over what used to be considered the single best study on the social impact of microfinance.
It appears that the leading interpretation of this study swung wildly back and forth over the course of a decade. These swings were driven not by major reinterpretations but by arguments over technical details, and those questioning the study were unable to view the full data and calculations behind the original. We feel that this illustrates the problems with taking academic research at face value, and that it supports many of the principles we apply when interpreting academic research. Details follow.
According to a 2005 white paper published by the Grameen Foundation, a 1998 book and accompanying paper released by Shahidur Khandker and Mark Pitt “were influential because they were the first serious attempt to use statistical methods to generate a truly accurate assessment of the impact of microfinance.”
Jonathan Morduch challenged these findings shortly after their publication, but a 2005 follow-up by Khandker appeared to answer the challenge and claimed that microlending had a very strong social impact:
each additional 100 taka of credit to women increased total annual household expenditures by more than 20 taka … microfinance accounted for 40 percent of the entire reduction of moderate poverty in rural Bangladesh.
As far as we can tell, this result stood for about four years as among the best available evidence that microlending helped bring people out of poverty. Our mid-2008 review of the evidence stated,
These studies rely heavily on statistical extrapolation about who would likely have participated in programs, and they are far from the strength and rigor of the Karlan and Zinman (2007) study listed above, but they provide somewhat encouraging support for the idea that the program studied had a widespread positive effect.
2009 response by Roodman and Morduch
A 2009 paper by David Roodman and Jonathan Morduch argued that
- The Khandker and Pitt studies were seriously flawed in their attempts to attribute impact. The reduction in poverty they observed could have been an artifact of wealth driving borrowing, rather than the other way around.
- The Khandker and Pitt studies could not be replicated: the full data and calculations they had used were not public, and Roodman and Morduch’s best attempts at a replication did not produce a remotely similar conclusion (their replication showed no positive social impact of microlending, and even a slight negative one).
This paper stood for the next two years as a prominent refutation of the Khandker and Pitt studies. Pitt writes that the work of Roodman and Morduch has become “well-known in academic circles” and “seems to have had a broad impact.” It was featured in a “new volume of the widely respected Handbook of Development Economics” as well as in congressional testimony.
Earlier this year,
- Mark Pitt published a response arguing that Roodman and Morduch’s failure to replicate his study was due to Roodman and Morduch’s errors.
- David Roodman replied, conceding an error in his original replication but defending his claim that the original study (by Khandker and Pitt) was not a valid demonstration of the impact of microlending.
- Mark Pitt responded again and argued that the study was a valid demonstration.
- David Roodman defended his statement that it was not and added, “this is the first time someone other than [Mark Pitt] has been able to run and scrutinize the headline regression in the much-discussed paper … If you anchor Pitt and Khandker’s regression properly in the half-acre rule … the bottom-line impact finding goes away.”
- We had hoped to see a further response from Mark Pitt before discussing this matter, but Roodman also wrote that Mark Pitt is now traveling and that “this could be the last chapter in the saga for a while.”
Bottom line: as far as we can tell, we still have one researcher claiming that the original study strongly demonstrates a positive social impact of microfinance, another researcher claiming it demonstrates no such thing, and no end in sight, 13 years after the original study was published.
Disagreements among researchers are common, but this one is particularly worrisome for a few reasons.
- Conflicting interpretations of the study have each stood for several years at a time. The original study stood as the leading evidence on microlending’s social impact from 2005 to 2009; the challenge by Roodman and Morduch was highly prominent from 2009 to 2011, and the original authors apparently did not comment on it at all during that period.
- Disagreements have been technical, many concerning details that few people understand and that still don’t seem to be resolved. David Roodman states that “the omission of the dummy for a household’s target status” is responsible for his estimated effect of microlending coming out negative instead of positive. Numerous other errors are alleged on both sides, and the remaining disagreements over causal inference are certainly beyond what I can easily follow (if a reader can explain them in clear terms, I encourage doing so in the comments).
- Resolution has been hampered by the fact that Roodman and Morduch could only guess at the calculations Pitt and Khandker performed. This is my biggest concern. Roodman writes that he was never able to obtain the original data set used in the paper; that the data set he did receive (upon request) was, in his view, confusingly labeled; and even that one of the original authors “fought our efforts to obtain the later round of survey data from the World Bank.” As a result, his attempt at replication was a “scientific whodunit,” and his April 2011 update represents “the first time someone other than [the original author] has been able to run and scrutinize the headline regression in the much-discussed paper.”

If I weren’t already somewhat familiar with this field, I would be shocked that it’s even possible to have a study accepted by any journal (let alone a prestigious one) without sharing the full details of the data and calculations and having the calculations replicated and checked. But in fact, disclosure of data, and replication or checking of calculations, appears to be the exception rather than the rule, and is certainly not a standard part of the publication and peer review process.
Bottom line: the leading interpretation of a reputable and important study swung wildly back and forth over the course of a decade, based not on revolutionary reinterpretations but on quibbles over technical details, while no one was able to view the full data and calculations of the original. For anyone who assumes that a prestigious journal’s review process, or even a paper’s reputation, is a sufficient stamp of reliability, this is a wake-up call.
Some principles we use in interpreting academic research
- Never put too much weight on a single study. If nothing else, the issue of publication bias makes this an important guideline. (On this point, the 2009 Roodman and Morduch paper was rejected for publication; its sole peer reviewer was an author of the original paper that Roodman and Morduch were questioning.)
- Strive to understand the details of a study before counting it as evidence. Many “headline claims” in studies rely on heavy doses of assumption and extrapolation. This is more true for some studies than for others.
- If a study’s assumptions, extrapolations and calculations are too complex to be easily understood, this is a strike against the study. Complexity leaves more room for errors and judgment calls, and means it’s less likely that meaningful critiques have had the chance to emerge. Note that before the 2009 response to the study discussed here was ever published, GiveWell took it with a grain of salt due to its complexity (see quote above). Randomized controlled trials tend to be relatively easy to understand; this is a point in their favor.
- If a study does not disclose the full details of its data and calculations, this is another strike against it, and this practice is more common than one might think.
- Context is key. We often see charities or their supporters citing a single study as “proof” of a strong statement (about, for example, the effectiveness of a program). We try not to do this. Instead, we generally create broad overviews of the evidence on a given topic and source our statements to these.
While a basic fact can be researched, verified and cited quickly, interpreting an impact study with appropriate care takes, in our view, concentrated time and effort and plenty of judgment calls. This is part of why we’re less optimistic than many about the potential for charity research based on (a) crowdsourcing or (b) objective formulas. Instead, our strategy revolves around transparency and external review.