Our suggestions should be taken in context, of course. On one hand, we do not have staff with backgrounds in academia; there’s a lot we don’t know about how the academic ecosystem works and what’s feasible. On the other hand, through our work researching charities, we do have an unusual amount of experience trying to use academic research to make concrete decisions. In a sense we’re a “customer” of academia; think of this as customer feedback, which has both strengths and weaknesses compared to internal suggestions.
Our single biggest concern when examining research is publication bias, broadly construed. We wonder (a) how many studies are done but never published, because people don’t find the results interesting or in line with what they had hoped; and (b) for a given paper, how many different interpretations of the data were assembled before picking the ones that made it into the final version.
The best antidote we can think of is pre-registration of studies along the lines of ClinicalTrials.gov, a service of the U.S. National Institutes of Health. On that site, medical researchers announce their questions, hypotheses, and plans for collecting and analyzing data, and these are published before the data is collected and analyzed. If the results come out differently from what the researchers hope for, there’s then no way to hide this from a motivated investigator.
In addition, in some ways we think that a debate over a study would be more honest before its results are known.
- When we see a charity evaluation, we generally assume that the researchers did what they could to portray the evaluation as positive for the charity; we therefore tend to demand that the study have a very high level of rigor (maximal attempt to eliminate selection bias, full and transparent presentation of methodology and results) before putting any credence in it. We want to feel that the analysis was done the way we would have done it, and not the way the researcher felt would produce the “right answer.”
- But if someone showed us an evaluation plan before either of us knew the results, we’d be much more inclined to accept that shortcuts and methodological weaknesses were there for the purpose of saving time, saving money, ethical considerations, etc. instead of the purpose of producing the “right answer.” If the plan sounded reasonable to us, and the study then produced a favorable outcome, we’d find it much less necessary to pick apart the study’s methodology.
Bottom line – we’d potentially lower our bar for study design quality in exchange for a chance to discuss and consider the design before anyone knows the results.
The basic principle of pre-registration is a broad one, and will apply differently in different scenarios, but here are a few additional thoughts on applications:
- What we’re describing wouldn’t be feasible with studies that retrospectively analyze old data sets – there’s no way to know whether someone has done the analysis before registering the design and hypothesis. What we’re describing would be feasible with field experiments, which seem to be growing in popularity thanks partly to the excellent work of Poverty Action Lab and Innovations for Poverty Action. It could also be feasible with non-experiments: researchers could pre-register their plans for analyzing any data sets that aren’t yet available.
- Poverty Action Lab (example) and Innovations for Poverty Action (example) do publicly publish brief descriptions of studies in progress. These are helpful in that they are a safeguard against burying/canceling studies that aren’t going as hoped, but they would be more helpful if they included information about data to be collected, planned methods of analysis, and hypotheses (for contrast, see this example from ClinicalTrials.gov).
- There can be legitimate reasons to change aspects of the study design and analysis methods as the study is ongoing. So failure to conform to the pre-registered version wouldn’t necessarily cause us to dismiss a study. But seeing the original intent, the final study, and a list of changes with reasoning for each change would greatly increase transparency; we would consider whether the reasoning sounded more motivated by the desire to get accurate results or a desire to get specific results.
- There can be legitimate reasons to carry out a study without a strong hypothesis (for example, trying a health intervention and examining its impact on a variety of quality-of-life measures without a strong prior view on which would be most likely to show results). However, we still advocate explicitly declaring the hypothesis or lack thereof beforehand. Just seeing that there was no initial strong hypothesis would help us understand a study. Studies with no strong hypothesis are arguably best viewed as “exploratory,” generating hypotheses for future “confirmatory” studies.
- As food for thought, imagine a journal that accepted only studies for which results were not yet known. Arguably this journal would be more credible as a source of “well-designed studies addressing worthwhile questions, regardless of their results” as opposed to “studies whose results make the journal editors happy.” For our part, we’d probably follow this sort of journal eagerly in our search for top charities.
We think the “randomista” movement has done a lot of good for the credibility and usefulness of research. We’d love to see a “preregistrista” movement.
Chris Blattman’s suggestions. We liked these two suggestions from Chris:
1. Journals should require submission of replication data and code files with final paper submissions, for posting on the journal site. (The Journal of Conflict Resolution is one of the few major political science or economics journals I know that does so faithfully.)
2. PhD field and method courses ought to encourage replication projects as term assignments. (Along with encouragements to diplomacy–something new scholars are slow to learn, to their detriment.)
Tougher norms regarding the use of footnotes to support strong claims. It is common for a paper to make a strong claim, with a footnote leading to a citation; it’s then up to us to find the cited paper (not always possible) and figure out in what way, and to what degree, it supports the claim. The cited paper is often not a literature review or other broad overview but a single study, which we don’t consider sufficient evidence for a strong claim (see our previous post).
On the GiveWell site we use a different convention, one that grew out of our own wish to facilitate easy checking and updating of footnotes by different staff members and volunteers. A GiveWell footnote will either (a) lead to another GiveWell page (these tend to be concise, up-to-date overviews of a topic); or (b) include enough quotation and/or analysis to make it clear what the heart of the support for the claim is, along with page numbers for any quotation. It’s therefore easy to quickly see the degree to which our footnotes support our claims (though one may still wish to vet the papers that we cite and see whether their analysis supports their own claims).
If space limitations are an issue, footnotes could be published online, as they are for some books.
Online tools for viewing the full debate/discussion around a paper at a glance. We’d like to see websites organizing the relationships between different papers and comments, such that we could look up a paper and see the whole debate/discussion around it at a glance – including
- Published responses and challenges
- Comments from other scholars, without those comments having to be published separately
- Literature reviews discussing the paper
- Related (but chronologically later) papers
- Perhaps a summary of the overall discussion, strengths, weaknesses, etc. (when someone wanted to submit one and it passed the website’s quality criteria)