Meta-research

[Added August 27, 2014: GiveWell Labs is now known as the Open Philanthropy Project.]

We previously laid out our working set of focus areas for GiveWell Labs. This post further elaborates on the cause of “meta-research” and explains why meta-research is currently a very high priority for us – it is our #2 highest-priority focus area, after global health and nutrition.

Meta-research refers to improving the incentives in the academic world, to bring them more in line with producing work of maximal benefit to society. Below, we discuss:

  • Problems and potential solutions we perceive for (the incentives within) development economics, the area of academia we’re currently most familiar with.
  • Some preliminary thoughts on the potential of meta-research interventions in other fields, particularly medicine.
  • Why we find meta-research so promising and high-priority as a cause.
  • Our plans at the moment for investigating meta-research further.

Meta-research issues for development economics

Through our work in trying to find top charities, we’ve examined a fair amount of the literature on how Western aid might contribute to reducing poverty, which we broadly refer to in this post as “development economics.” In doing so, we’ve noticed – and discussed – multiple ways in which development economics appears to be falling short of its full potential to generate useful knowledge:

Lack of adequate measures against publication bias. We have written extensively about publication bias, which refers broadly to the tendency of studies to be biased toward drawing the “right” conclusions (the conclusions the author would like to believe in, the conclusions the overall peer community would like to believe in, etc.). Publication bias can come both from “data mining” (an author interprets the data in many different ways and publishes/highlights the ways that point to the “right” conclusions) and the “file drawer problem” (studies that do not find the “right” conclusions have more difficulty getting published).
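To make the “file drawer problem” concrete, here is a minimal simulation sketch. Everything in it is an illustrative assumption rather than a model of any real literature: the function name `simulate_file_drawer`, the sample sizes, a true effect of exactly zero, a known standard deviation of 1, and a publication rule that accepts only significantly positive results.

```python
import random
import statistics


def simulate_file_drawer(num_studies=2000, n_per_arm=50, true_effect=0.0, seed=1):
    """Simulate many two-arm studies and 'publish' only significantly positive ones.

    Hypothetical setup: outcomes are normal with standard deviation 1, so the
    standard error of a difference in means is sqrt(2 / n_per_arm).
    """
    rng = random.Random(seed)
    std_error = (2 / n_per_arm) ** 0.5
    all_estimates, published = [], []
    for _ in range(num_studies):
        treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_arm)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
        estimate = statistics.fmean(treated) - statistics.fmean(control)
        all_estimates.append(estimate)
        # Assumed publication rule: only z > 1.96 leaves the file drawer.
        if estimate / std_error > 1.96:
            published.append(estimate)
    return all_estimates, published


all_estimates, published = simulate_file_drawer()
print(f"mean of all studies:       {statistics.fmean(all_estimates):+.3f}")
print(f"mean of published studies: {statistics.fmean(published):+.3f}")
print(f"share published:           {len(published) / len(all_estimates):.1%}")
```

Under these assumptions the full set of studies averages out near the true effect of zero, while every published estimate is positive by construction. A reader who sees only the published studies has no way to tell that the rest exist.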

Conceptually, publication bias seems to us like one of the most fundamental threats to academia’s producing useful knowledge – it is a force that pushes research to “find” what is already believed (or what people want to believe), rather than what is true, in a way that is difficult for the users of research to detect. The existing studies on publication bias suggest that it is a major problem. There are potential solutions to publication bias – particularly preregistration – that appear underutilized (we have seen next to no use of preregistration in development economics).

A funder recently forwarded us the following comment on a paper under review from a journal, which illustrates this problem:

Overall, I think the paper addresses very important research questions. The authors did well in trying to address issues of causality. But the lack of results has weakened the scope and the relevance of the paper. Unless the authors considerably generate new and positive results by looking say at more heterogeneous treatment effects, the paper cannot, in my view, be published in an academic journal such as the [journal in question].

Lack of open data and code. By this we mean that academic authors rarely share the full details behind their calculations and claims. David Roodman wrote in 2010:

Not only do authors often keep their data and computer programs secret, but journals, whose job it is to assure quality, let them get away with it. For example, it took two relatively gargantuan efforts—Jonathan Morduch’s in the late 1990s, and mine (joining Jonathan) more recently—just to check the math in the Pitt and Khandker paper claiming that microcredit reduced poverty in Bangladesh. And it’s pretty clear now that the math was wrong.

The case he discusses turned out, in our opinion, to be an excellent illustration of the problems that can arise when authors do not share the full details of their calculations: a study was cited for years as some of the best available evidence regarding the impact of microfinance, but it ultimately turned out to be badly flawed, and later more rigorous studies contradicted its conclusions. (See our 2011 discussion of this case.)

Another example of the importance of open data was our 2011 uncovering of errors in a prominent cost-effectiveness estimate for deworming. This estimate had been public and cited since 2006, and it took us months of back-and-forth to obtain the full details behind it; at that point it turned out to contain multiple basic errors that caused it to be off by a factor of ~100.

The lack of open data is significant for reasons other than the difficulty of understanding and examining prominent findings. It is also significant because open data could be a public good for researchers; one data set could be used by many different researchers to generate multiple valuable findings. Currently, incentives to create such public goods seem weak.

Inadequate critical discussion and examination of prominent research results. The above two examples, in addition to illustrating open-data-related problems, illustrate another issue: it appears that there are few incentives within academia to critically examine and challenge others’ findings. And when critical examinations and challenges do occur, they can be difficult to find. Note that Roodman and Morduch’s critique (from the example above) was rejected by the journal that had published the study they were critiquing (the sole reviewer was the author of the critiqued study); as for the deworming cost-effectiveness estimate discussed above (from Disease Control Priorities in Developing Countries, 2nd edition, or DCP2), the critique came from GiveWell and has been published only on our blog (five years after the publication of the estimate).

Overall, our impression is that there is little incentive for academics to actively investigate and question each other’s findings, and that doing so is difficult due to the lack of open data (mentioned above).

Lack of replication. In addition to questioning the analysis of prominent studies, it would also be useful to replicate them: to try carrying out similar interventions, in similar contexts, and see whether similar results hold.

In the field of medicine, it is common for an intervention to be carried out in many different rigorous studies (for example, the literature on the effects of distributing insecticide-treated nets includes 22 different randomized controlled trials, and the programs executed are broadly similar though there are some differences). But in development economics, this practice is relatively rare.

More at a recent post by Berk Ozler.

General disconnect between “incentive to publish” and “incentive to contribute maximally to the stock of useful knowledge.” This point is vaguer, but we have heard it raised in multiple conversations with academics. In general, it seems that academics are encouraged to do a certain kind of work: work that results in frequent insights that can lead to publications. Other kinds of useful work may be under-rewarded:

  • Creating public goods for other researchers, such as public data sets (as discussed above)
  • Work whose main payoff is far in the future (for example, studies that take 20 years to generate the most important findings)
  • Studies that challenge widely held, fundamental assumptions in the field (and thus may have difficulty being published and cited despite having high value)
  • Studies whose findings are important from a policymaking or funding perspective, but not interesting (and thus difficult to publish) in terms of delivering surprising or generalizable new insights. For example, we have only been able to identify one randomized controlled trial of a program for improving rural point-of-source water quality, despite the popularity and importance of this type of intervention.

Potential interventions to address these issues.

We’ve had several conversations with academics and funders who work on development economics about how the above issues might be addressed. Most are directed at the specific problems we’ve listed above, though some are more generally in the category of “creating public goods for the research community as a whole.” Some of the more interesting ideas we’ve come across:

  • Funding efforts to promote the use of preregistration and data/code sharing, such as advocating that journals require these things of their publications (a journal might require preregistration and data/code sharing as a condition of publication) or that funders require these things of their grantees (a funder might require preregistration and data/code sharing from all funded studies).
  • Creating a “journal of good questions” – a journal that makes publication decisions on the basis of preregistered study plans rather than on the basis of results. The idea is to reward (with publication) good choices of topics and hypotheses and plans for investigating them, regardless of whether the results themselves turn out to be “interesting.” (We have previously discussed this idea.)
  • Funding a journal, or special issue of a journal, devoted to open-access data sets. Each data set would be accompanied by an explanation of its value and published as a “publication,” to be cited by any future publication drawing on that data set. This may improve incentives to create and publish useful open-access data sets, since scholars who did this well could end up publishing the data sets as papers and having them cited.
  • Funding the creation of large-scale, general-purpose open-access data sets. Currently, researchers generally collect data for the purpose of conducting a particular study; an effort that aimed specifically to create a public good might be better suited to maximizing the general usefulness of the collected data, and may be able to do so at greater scale than would be realistic for a data set aiming to answer a particular question. For example, one might fund a long-term effort to track a representative population in a particular developing country, randomly separating the population into a large “control group” and a set of “treatment groups” that could be treated with different interventions of general interest (cash transfers, scholarships, nutrition programs, etc.).
  • Funding a journal, or special issue of a journal, devoted to discussion, critiques, re-analyses, etc. of existing studies, in order to put more emphasis on – and give more reward to – this activity.
  • Funding awards for excellent public data sets and for excellent replicative studies, reanalysis, and other work that causes either confirmation or re-examination of earlier studies’ findings.
  • Creating a group that specializes in high-quality systematic reviews that summarize the evidence on a particular question, giving heavier weight to more credible studies (similar to the work of the Cochrane Collaboration, which we discuss more below). These reviews might make it easier for funders, policymakers, etc. to make sense of research, and would also provide incentives to researchers to conduct their studies in more credible ways (employing preregistration, data/code sharing, etc.).
  • Creating a web application for sharing, discussing, and rating papers (discussed previously).
  • Awards for the most useful and important research from a policymaker’s or funder’s perspective (these could take practices like data sharing and registration into account as inputs into the credibility of the research).
  • Promoting an “alternative/supplemental reputation system” for papers (and potentially academics) directly based on the value of research from a funder’s or policymaker’s perspective, taking practices like data sharing and registration into account as inputs into the credibility of the research.
  • Creating an organization dedicated to taking quick action to take advantage of “shocks” (natural disasters, policy changes, etc.) that may provide opportunities to test hypotheses. When a “shock” occurred, the organization could poll relevant academics on what the important questions are and what data should be collected, record the academics’ predictions, and fund the collection of relevant data.

Meta-research for other fields

We aren’t as familiar with most fields of research as we are with development economics. However, we have some preliminary reason to think that many fields in academia have a similar story to development economics: multiple issues that keep them short of reaching their full potential to generate useful knowledge, and substantial room for interventions that may improve matters.

  • We recently met with representatives of the Cochrane Collaboration, a group that does systematic reviews of medical literature. We have found Cochrane’s work to be valuable and high-quality, and we were surprised to be told that the U.S. Cochrane Center raises very little in the way of unrestricted funding. After talking to more people in the field, we have formed a preliminary impression that there is little funding available for medical initiatives that cut across biological categories, including the sort of work that Cochrane does (which we would characterize as “meta-research” in the sense that it works toward improved incentives and higher value-added for research in general). We will be further investigating the Cochrane Collaboration’s funding situation and writing more about it in the future.
  • Informal conversations have given me the impression that many of the problems described above – particularly lack of adequate measures against publication bias, lack of preregistration, lack of data/code sharing, and general misalignment between what academics have incentives to study and what would be most valuable – apply to many other fields within the natural and social sciences.
  • I’ve also heard of other problems and ideas that are specific to other fields. For example, a friend of mine in the field of computer science stated to me that
    • There are too few literature reviews in the field of computer science, summarizing what is known and what remains to be determined within a particular field. The literature reviews that do exist quickly become out of date. More up-to-date literature reviews would make it easier for people to contribute to fields without having to be at the right school (and thus in the right social network) for these fields.
    • There are some sub-fields in computer science that require testing different algorithms on data sets, such that the number of appropriate available data sets is highly limited. (For example, testing an algorithm for analyzing online social networks against a data set based on an actual online social network.) In practice, academics often design algorithms that are “over-fitted” to the data sets in use, such that their predictive power over new data sets is questionable. He proposed a set of centralized “canonical” data sets, each split into an “exploration” half and a “confirmation” half; while the “exploration” half would be open access, the “confirmation” half would be controlled by a central authority and academics would be able to test their algorithms on it only in a limited, controlled way (for example, perhaps each academic would be given 5 test runs per month). These data sets would constitute a public good making it easier to compare different academics’ algorithms in a meaningful way, both by reducing the risk of over-fitting and by bringing more standardization to the tests.
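The “exploration”/“confirmation” idea in the last bullet can be sketched in a few lines. This is only an illustration of the proposal as described to us, not an existing system: the names `split_dataset` and `ConfirmationAuthority`, the accuracy metric, and the toy data are all hypothetical, and the quota of 5 runs per month is taken from the example above.

```python
import random
from collections import defaultdict


def split_dataset(records, seed=0):
    """Shuffle a copy of the records and return (exploration, confirmation) halves."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


class ConfirmationAuthority:
    """Keeps the confirmation half private and rations test runs per researcher."""

    def __init__(self, confirmation_records, runs_per_month=5):
        self._records = list(confirmation_records)
        self._runs_per_month = runs_per_month
        self._runs_used = defaultdict(int)  # (researcher, month) -> runs used

    def evaluate(self, researcher, month, predict):
        """Score a prediction function on the hidden data; returns accuracy only."""
        key = (researcher, month)
        if self._runs_used[key] >= self._runs_per_month:
            raise PermissionError(f"{researcher} has no test runs left for {month}")
        self._runs_used[key] += 1
        correct = sum(1 for features, label in self._records
                      if predict(features) == label)
        return correct / len(self._records)


# Toy data set of (features, label) pairs; the label is just the parity of the input.
records = [(x, x % 2) for x in range(100)]
exploration, confirmation = split_dataset(records)  # exploration half is open access
authority = ConfirmationAuthority(confirmation, runs_per_month=5)
score = authority.evaluate("alice", "2012-06", lambda x: x % 2)
print(f"confirmation accuracy: {score:.2f}")
```

The design choice that matters here is that researchers only ever receive a score back, never the confirmation records themselves, and that each call spends part of a fixed quota. That makes it costly to tune an algorithm against the held-out half, which is the over-fitting risk the proposal is meant to reduce.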

Overall, the conversations I’ve had about meta-research – even with people who aren’t carefully selected, such as personal friends – have resulted in an unusually high density of strong opinions and novel (to me) ideas for bringing about positive change.

Why we find meta-research promising as a cause

High potential impact. As we wrote previously, it seems to us that many of philanthropy’s most impressive success stories come from funding scientific research, and that meta-research could have a leveraged impact in the world of scientific research.

Seeming neglect by other funders. We see multiple preliminary signs that this area is neglected by other funders:

  • In examining what foundations work on today, we haven’t seen anyone who appears to have a focus on meta-research. We recently attended a funders’ meeting on promoting preregistration and got the same impression from that meeting.
  • As mentioned above, informal conversations seem to lead more quickly to “ideas for projects that could be worked on but aren’t currently being worked on” than conversations in other domains.
  • As mentioned above, we are surprised by the U.S. Cochrane Center’s apparent low level of funding and need for more funds, and feel that this may point to meta-research as a neglected area.

Good learning opportunities. We have identified funding scientific research as an important area for further investigation. We believe it is one of the most promising areas in philanthropy and also one of the areas that we know the least about. We believe that investigating the question, “In what ways does the world of academic research function suboptimally?” will lead naturally to a better understanding of how that world operates and where within it we are most likely to find overlooked giving opportunities.

Our plan for further investigation of meta-research as an issue area

We are pursuing the following paths of further investigation:

  • Further investigation of the Cochrane Collaboration, starting with conversations with potential funders about why it is having trouble attracting funding. We believe that the Cochrane Collaboration may turn out to be an excellent giving opportunity, and if it does, that this will provide further evidence that meta-research is a promising and under-invested-in cause; on the other hand, if we discover reasons to doubt Cochrane’s effectiveness or need for more funds, this will likely be highly educational in thinking about meta-research in general.
  • Conversations with academics about meta-research-related issues. Some of the key questions we have been asking and will continue to ask:
    • Are there any ways in which the academic system is falling short of its full potential to generate useful knowledge? What are they?
    • What could be done about them?
    • Who is working on the solutions to these problems? Who would be the logical people for a funder to work with on them?
    • Is there any research that you wish you could do but can’t get funded to do? Is there any research that you generally feel ought to be taking place and isn’t? If so, why is this happening?
    • Are there areas of research that you think are overdone or overinvested in? Why do you think this is?
    • What do you think of the ideas we’ve accumulated so far? To the extent that you find one or more to be good ideas, whom would you recommend working with to move forward on or further investigate them?
    • Whom else would you recommend speaking with?
  • Trying to get a bird’s-eye view of the world of academic research, i.e., a view of what the various fields are, how large they are (in terms of people and funding), and where the funding for them comes from. We hope that this bird’s-eye view will help us be more strategic about which fields best combine “high potential” with “major room for interventions to improve their value-added,” and thus to pick fields to focus on for meta-research in a more systematic manner than we’ve done so far.

Comments

  1. Good stuff. A few comments:

    - Normally the area of ‘meta-research’ would include doing systematic reviews (as Cochrane does), i.e., it’s not limited solely to changing the incentives on academics

    - Groups already exist which do systematic reviews in development economics, e.g., Copenhagen Consensus Centre and the Campbell Collaboration. DFID has a good intro paper. They’re rather harder in social sciences than medicine because of the external validity problem (drugs work basically the same in all bodies, but teachers don’t work basically the same in all countries) so it’s harder (often impossible) to aggregate the results of trials as the medics do.

    - There are already pre-registration systems in development economics, precisely to counter publication bias. 3ie has one, as do JPAL & IPA (I think). I’ve no clue whether any of them is complete

    - One of the reasons behind IPA’s formation is the problem of academics having little incentive to do replication studies. Once somebody’s found that Intervention X works amazingly in North Kenya, it’s not academically rewarding to investigate it in South Kenya or East Kenya let alone India or wherever – so IPA deliberately trains non-academics to do those studies, because the data are useful and non-academics can be incentivised differently. IPA’s current multi-country replication of an intervention found successful in East India is led by Annie Duflo, for instance.

    - We have our work cut out to build demand for this kind of data. The UK’s Guardian ran a piece recently about how we need LESS uniformity between measurement systems: systematic reviews and all the learning they produce are ONLY possible if we have a lot more. I suspect that this, like most things, would magically appear if donors demanded it.

  2. Thanks for the thoughts, Caroline. Some responses:

    • Re: systematic reviews. The Campbell Collaboration focuses on “education, crime and justice, and social welfare.” I’m not aware of any work it’s done within development economics. While there are others (including Copenhagen Consensus) that do literature reviews with development economics, I’m not aware of any systematic reviews that are comparable in quality (in terms of transparency of protocol, detailed discussion of the merits and drawbacks of each included study, etc.) to the Cochrane reviews I’ve examined; relatedly, I’m not aware of any group that is dedicated specifically to systematic reviews in this field.
    • I’m aware that there are pre-registration systems in progress within development economics, but I have never seen one in use: the only cases I’m aware of in which a development economics study pre-declared its analysis plan are for (a) studies that include health outcomes and are registered on clinicaltrials.gov; (b) GiveDirectly.
    • I agree that IPA emphasizes the importance of replication and is making some progress in making it happen, but we think progress could be faster and that there is room for a funder to help (and I would guess that IPA would agree with this). As of today I’m not aware of any development RCTs (except perhaps on cash transfers) that have what I consider enough replication behind them to be robust.
  3. Holden,

    This is a very exciting and promising area for GiveWell to explore, and I’m ecstatic that you guys have chosen to pursue it as a cause.

    It’s always seemed to me to be a “solvable problem”; compared to, say, averting genocide or eradicating AIDS, changing the culture and practices of the academic community has always seemed to me to be a lower hanging fruit.

    I heard a talk from a chemist at Harvard named George Whitesides last year about this topic. The take-home point for me was made in a single 2×2 table he drew: one axis was importance of the research, and the other was the likelihood of success of the research (“success” in this context simply referred to the production of a result that would lead to a publication, which of course is the currency for all academics.)

    He said that a graduate student/professor/researcher always has an incentive to gravitate towards work in the top right quadrant of the chart (unimportant work that will probably lead to a result and therefore a publication) rather than the bottom left of the chart (work that is very important, but has a low probability of producing publishable results anytime soon).

    I’ll be excited to hear what avenues you find for donors looking to fund meta-research.

  4. I think there are at least a couple other examples of pre-registration in development economics/political science:

    Of course, I agree with Holden’s point that talk about these proposals has outstripped implementation to date.