The GiveWell Blog

GAVI appears to be out of room for more funding (good news)

We’ve always been interested in GAVI, a large funding vehicle for immunizations (which we consider to be one of the best interventions out there for accomplishing good).

Until recently, GAVI projected a need for $3.7 billion for 2011-2015 (archived). Yesterday, however, GAVI announced that it had raised $4.3 billion, more than enough to cover this need (archived).

In the past, we’ve refrained from recommending GAVI because we have trouble fully understanding its activities, but we’ve been continuing to revisit it and think about how we might gain better understanding. Now we are fairly confident in not recommending GAVI because it appears to have all the funding it needs (which, given its area of focus, we consider very good news).

This situation illustrates one of the trickier challenges that room for more funding poses for individual giving. Even if a nonprofit has significant funding needs today, are there big donors right around the corner about to swoop in and render your donation irrelevant?

Before we recommend a charity, we seek as good an understanding as possible of its room for more funding, and this includes asking it about what revenue it expects in the future. This is the best way we know to avoid recommending a charity just before its funding gap is closed by megadonors, but we don’t think this approach is foolproof. We continue to find the issue of room for more funding – and in particular, the possibility of a GAVI-like situation – to be very difficult to deal with.

Why we should expect good giving to be hard

We’ve written before about a couple of consistent worldview differences we encounter.

When discussing any specific charity, I can usually think of specific reasons that the charity’s mission is difficult, and specific ways that it might be failing. Here I’m going to try to give a more general argument for why it’s hard to accomplish a lot of good with your donation. I’m not presenting this as a rigorous, evidence-backed argument; I’m just further clarifying my worldview and where I’m coming from.

When you want to help people as a donor, you have to get in line behind all of the groups below:

  • For-profit companies. I believe that most of the things you can do that make strangers’ lives better are things you can get paid for. Every day people help each other send packages, prepare food, recover from illness, etc. via market transactions. This may seem like a trivial and obvious point, but it’s the reason we are so focused on helping the very poor. When you’re trying to help people who aren’t poor, you’re competing with for-profit enterprises.

    And even the very poor get a lot of help from for-profit services. For example, when people started realizing that cellphones could be useful to the very poor, the result was expansion of for-profit cellphone service into the developing world. There were some nonprofit attempts to contribute to this dynamic, but we’re skeptical that they added much value on top of the profit-driven ones.

    I am certainly not saying that all profit-making enterprises are helpful, nor that all forms of help are profitable. But a lot of the easiest help to provide – even for the poorest – is already being provided by people who are doing it to make money.

  • Governments. When a market failure is clear and severe, the government often steps in. Many feel it does not step in enough or that it does more harm than good, but the fact remains that much of the “lowest-hanging fruit” for helping people where markets won’t is covered by governments. Low-income people in the U.S. get free education with high teacher attendance rates, free emergency medical care, and cash, among other things. People in the developing world get far less from their governments, but most governments still provide a good deal of free medical care.
  • Local philanthropy and community. When it comes to market failures that the government has failed to address, there are still often local nonprofits – and just local people – who are well placed to step in quickly and effectively. This is not an endorsement of small community-based organizations as giving opportunities for individual donors outside the community. If you’re outside the community you’re trying to help, you’re going to have trouble figuring out what the real problems are and who ought to be funded to address them; the people in the community will often be better placed to help, by donating and otherwise, than you are.
  • Big foundations. There are opportunities to help that are missed by for-profits, governments, and locals. There are many extremely well-funded and well-staffed foundations looking for just these opportunities.
  • Other donors. If you want your donation to have an impact, you need to find opportunities that have been missed not only by all the groups above, but by other individual donors. Our focus on room for more funding is an attempt to deal with this situation.

In my view, the wealthier the community, the more effective the first three items above (for-profits, government and locals) will be in addressing their problems. Therefore, if you want to find opportunities to provide help that isn’t already being given, you probably need to look at the world’s poorest communities – but doing that probably means helping people who are very far away and very culturally different from yourself, and you have to find opportunities that haven’t already been found by the big foundations or other donors.

When a donor says, “I have $1000 that I’d like to use to help someone,” it may not sound like they’re asking for much. But on reflection, I think they’re really saying, “I’m looking for someone who needs help that they can’t get from a company, their government, their community, or any other donor big or small – and I expect to provide this help just by sending a $1000 check, despite having very little experience or knowledge of the situation.”

Put this way, the donor’s request sounds somewhat exorbitant, and it seems that we shouldn’t expect them to be able to accomplish much with their $1000. Yet as it turns out, I believe that (if they take the rare opportunities that we highlight at GiveWell) they can often use that money to save a life. I think this is a somewhat shocking observation and that it reflects serious problems with the nonprofit ecosystem.

I also think we shouldn’t expect this to be the situation indefinitely. I hope that as the world gets better at providing help to those who need it, all the opportunities to save a life for $1000 will be snapped up more quickly. That will leave GiveWell customers – individual donors looking to help people they’ve never met and know little about – with much less exciting options, and that’s how it should be.

Profile of a GiveWell customer

The Money for Good study examined the size of the potential audience for work like GiveWell’s. What we’d like to see next would be a study on the nature of this audience: what sort of donor is open to giving based on third-party research? How do they think, what sorts of causes are they interested in, and where can they be found?

We don’t have the resources for a large-scale study on this topic, but in the absence of this, we thought it would be worthwhile to share our impressions from interacting with our own “customer donors,” i.e., people who use GiveWell to decide where to give. Because our money moved is somewhat “top heavy” (the bulk of the money comes from a relatively small number of relatively large donors), we have had in-depth conversations and formed informal relationships with people accounting for a large chunk of our influence. Some of the things we’ve found out about GiveWell customers have surprised us and led to changes in strategy; below we discuss the general impressions we’ve gotten.

  • Attitudes toward evidence seem less key than we would have guessed. When we started GiveWell, we and most of our supporters imagined that new customers could be found in certain industries where people are accustomed to using measurement to evaluate and learn from their decisions. We hoped these people would resonate with our desire to bring feedback loops into areas where feedback loops don’t naturally exist. But we’ve found that a lot of them don’t, largely because impact isn’t the main thing they’re aiming for when they give. People give for many reasons – to maintain friendships, to overcome guilt and cognitive dissonance, to achieve recognition – and a given donor is unlikely to be interested in GiveWell unless achieving impact is at the top of his/her list.
  • Moral and spiritual values seem more key than we would have guessed. Several of our major “customer donors” see their giving as fulfilling religious imperatives to help others. Many others have backgrounds in philosophy and deep interests in secular moral systems. What these two groups have in common, in my view, is serious investment in abstract principles of morality – principles that tell them it is as worthwhile to help people they’ll never meet as it is to help those in their community.
  • GiveWell customers tend to be cause-agnostic and/or aligned with our views on the most promising philanthropic causes. We discussed this phenomenon in early 2011; it’s been one of the major surprises for us. We initially anticipated that many donors would agree with our basic approach to research, but would have strong disagreements with us on which causes to support, and that we would thus have to research a lot of different causes to serve them. That hasn’t turned out to be the case: most people tend to be either fully aligned with our approach to giving (including issue-agnosticism and a preference to help the poorest people, even if they’re overseas) or not aligned at all.
  • GiveWell customers tend to be busy with non-philanthropic pursuits. People who spend all their time on philanthropy generally have their own relationships, convictions based on personal observation, etc. The people most likely to use our research are the ones who don’t have time to do their own. While not surprising, this phenomenon has made it somewhat difficult for us to get thoughtful, meaningful feedback on our research despite our efforts to make it transparent. That’s led us to experiment with more ways to solicit feedback, including formal evaluations, our public email group, summaries of our views on key topics posted to this blog, and (in progress) a possible in-person meeting to discuss our research with any “customer donors” who want to attend.
  • GiveWell customers don’t tend to be friends with each other; it’s been very hard to find “clusters” of them. Most of our customers have had little success interesting their friends in us. We’ve found very little in the way of particular industries or social groups filled with GiveWell customers; instead, it seems that for every twenty people we talk to in any setting, one or two will become a customer. It’s thus not surprising that one of our top sources of referrals is Google: people tend to find us more than we find them. The major exception is that we seem to get disproportionate interest from fans of Peter Singer.
  • Some of GiveWell’s donors are very young (considering how much they give) and very few are retired. This has implications for the future of GiveWell’s influence: the money we’re moving now is mostly coming from people whose peak giving years are yet to come.
  • GiveWell customers never seem interested in public recognition. In our first year, we considered posting acknowledgements to major supporters on our website, but there was no interest. Since then, we have had many customers who require anonymity (even when we ask them to take public credit for our sake) and no customers who’ve requested that we publicly thank them or otherwise help them get recognition.

In my view, the last point is most consistent and most interesting. GiveWell customers have many reasons for giving – some religious, some secular – but they all revolve around achieving an impact for others rather than around getting any direct personal benefit (whether social, reputational or emotional). GiveWell customers are the kind of people who give when no one is watching, and for whom giving is not about the giver. One of the highlights of running GiveWell has been coming into contact with the people who share this value with us.

In defense of the streetlight effect

In a recent guest post for Development Impact, Martin Ravallion writes:

The current fashion [for evaluating aid projects] is for randomized control trials (RCTs) and social experiments more generally … The problem is that the interventions for which currently favored methods are feasible constitute a non-random subset of the things that are done by donors and governments in the name of development. It is unlikely that we will be able to randomize road building to any reasonable scale, or dam construction, or poor-area development programs, or public-sector reforms, or trade and industrial policies—all of which are commonly found in development portfolios. One often hears these days that opportunities for evaluation are being turned down by analysts when they find that randomization is not feasible. Similarly, we appear to invest too little in evaluating projects that are likely to have longer-term impacts; standard methods are not well suited to such projects … (Emphasis mine)

He concludes with a call for “‘central planning’ in terms of what gets evaluated” to ensure that evaluation doesn’t become concentrated among the projects that are easy to evaluate. His post could be seen as a direct retort to the kind of work emphasized in the recent books Poor Economics and More Than Good Intentions (our review). These books present ideas and evidence drawn mostly from high-quality studies, and they have little to say on questions that high-quality studies cannot help answer.

My instinct is the opposite of Dr. Ravallion’s: I feel that the move toward high-quality evaluations is a good thing, even if it starts to bias what sorts of programs are evaluated – and carried out. What follows is an attempt to explain this feeling. It is a function of my worldview and biases, and this post should be taken less as a “rebuttal” than as an opportunity to explicate them.

My disagreement with Dr. Ravallion has to do with my experience as a “customer” of social science. The vast majority of studies I’ve come across have seemed so methodologically suspect to me that I’ve ended up not feeling they shed much light on anything at all; many (though not all) of the exceptions are studies that have come out of the “randomista” movement. (Another particularly helpful source of evidence has been Millions Saved, which focused on global health.) Given this situation, I’m not excited about using “central planning” to make sure that researchers continue to try answering questions that they simply don’t have the methods to answer well. I’d rather see them stick to areas where they can be helpful.

What does it look like when we build knowledge only where we’re best at building knowledge, rather than building knowledge on the “most important problems”? A few thoughts jump to mind:

  • Over the last several decades, I am not sure whether we’ve generated any useful and general knowledge about how to promote women’s empowerment and equality – from the outside – in developing-world countries. But we’ve generated a lot of knowledge about how to produce affordable, convenient birth control in a variety of forms. I would guess (though this is just a guess, as empowerment itself is so hard to measure) that the latter kind of knowledge generation has done much more for empowerment and equality than attempts to study empowerment/equality directly.
  • Similarly, what has done more for political engagement in the U.S.: studying how to improve political engagement, or studying the technology that led to the development of the Internet, the World Wide Web, and ultimately to sites like Change.org (as well as new campaign methods)?
  • More broadly, studying areas we’re good at studying and generating knowledge we’re good at generating has led to a lot of wealth generation and poverty reduction. I feel poverty reduction brings a lot of benefits that would be hard to bring about (or even fully understand) directly.

Bottom line – researching topics we’re good at researching can have a lot of benefits, some unexpected, some pertaining to problems we never expected such research to address. Researching topics we’re bad at researching doesn’t seem like a good idea no matter how important the topics are. Of course I’m in favor of thinking about how to develop new research methods to make research good at what it was formerly bad at, but I’m against applying current problematic research methods to current projects just because they’re the best methods available.

If we focus evaluations on what can be evaluated well, is there a risk that we’ll also focus on executing programs that can be evaluated well? Yes and no.

  • Some programs may be so obviously beneficial that they are good investments even without high-quality evaluations available; in these cases we should execute such programs and not evaluate them.
  • But when it comes to programs where evaluation seems both necessary and infeasible, I think it’s fair to simply de-emphasize these sorts of programs, even if they might be helpful and even if they address important problems. This reflects my basic attitude toward aid as “supplementing people’s efforts to address their own problems” rather than “taking responsibility for every problem clients face, whether or not such problems are tractable to outside donors.” I think there are some problems that outside donors can be very helpful on and others that they’re not well suited to helping on; thus, “helping with the most important problem” and “helping as much as possible” are not at all the same to me.

It’s common in our sphere to warn against the “streetlight effect,” i.e., “looking for your keys where there’s light, rather than where the keys are most likely to be.” In the context of aid, this means executing – and studying – the programs that are easiest to evaluate rather than the programs that are most likely to do good. (Chris Blattman uses this analogy in the context of Dr. Ravallion’s post.)

But for the aid world, the right analogy would acknowledge that there are a lot of keys to be found, and a lot of unexplored territory both in and outside the light. In that context, the “streetlight effect” seems like a good thing to me.

Suggestions for the social sciences

Chris Blattman cites our advice on using academic research and asks for “Other suggestions for the profession.” We have several and this seemed like a good time to share them.

Our suggestions should be taken in context, of course. On one hand, we do not have staff with backgrounds in academia; there’s a lot we don’t know about how the academic ecosystem works and what’s feasible. On the other hand, through our work researching charities, we do have an unusual amount of experience trying to use academic research to make concrete decisions. In a sense we’re a “customer” of academia; think of this as customer feedback, which has both strengths and weaknesses compared to internal suggestions.

The #1 change we’d like to see: pre-registration of studies
Our single biggest concern when examining research is publication bias, broadly construed. We wonder both (a) how many studies are done but never published because people don’t find the results interesting or in line with what they had hoped, and (b) for a given paper, how many different interpretations of the data were assembled before picking the ones that make it into the final version.
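
As a rough, hypothetical illustration of concern (b) – the code and numbers below are ours, not drawn from the post or from any study – the Python sketch simulates a program with zero true effect measured on many outcomes. A researcher who reports only whichever outcome happens to clear the usual significance bar will, more often than not, have something “significant” to report; pre-registering the outcomes and analysis plan is what closes off that option.

```python
# Hypothetical sketch (not from the post or any study): a "program" with zero
# true effect, measured on many independent outcomes. We count how often at
# least one outcome looks statistically significant purely by chance.
import random
import statistics

random.seed(0)

def one_null_study(n_outcomes=20, n_per_group=50):
    """Return how many of n_outcomes clear |t| > 1.96 when the true effect is zero."""
    lucky = 0
    for _ in range(n_outcomes):
        treatment = [random.gauss(0, 1) for _ in range(n_per_group)]
        control = [random.gauss(0, 1) for _ in range(n_per_group)]
        diff = statistics.mean(treatment) - statistics.mean(control)
        se = (statistics.variance(treatment) / n_per_group
              + statistics.variance(control) / n_per_group) ** 0.5
        if abs(diff / se) > 1.96:
            lucky += 1
    return lucky

studies = [one_null_study() for _ in range(500)]
share = sum(1 for s in studies if s > 0) / len(studies)
print(f"Null studies with at least one 'significant' outcome: {share:.0%}")
# Expected: roughly 1 - 0.95**20, i.e. about 64% of these null studies could
# still report an "effect" if the headline outcome is chosen after seeing the
# data; pre-registered outcomes and analysis plans guard against exactly this.
```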

The best antidote we can think of is pre-registration of studies along the lines of ClinicalTrials.gov, a service of the U.S. National Institutes of Health. On that site, medical researchers announce their questions, hypotheses, and plans for collecting and analyzing data, and these are published before the data is collected and analyzed. If the results come out differently from what the researchers hope for, there’s then no way to hide this from a motivated investigator.

In addition, we think that in some ways a debate over a study would be more honest before its results are known.

  • When we see a charity evaluation, we generally assume that the researchers did what they could to portray the evaluation as positive for the charity; we therefore tend to demand that the study has a very high level of rigor (maximal attempt to eliminate selection bias, full and transparent presentation of methodology and results) before putting any credence in it. We want to feel that the analysis was done the way we would have done it, and not the way the researcher felt would produce the “right answer.”
  • But if someone showed us an evaluation plan before either of us knew the results, we’d be much more inclined to accept that shortcuts and methodological weaknesses were there for the purpose of saving time, saving money, ethical considerations, etc. instead of the purpose of producing the “right answer.” If the plan sounded reasonable to us, and the study then produced a favorable outcome, we’d find it much less necessary to pick apart the study’s methodology.

Bottom line – we’d potentially lower our bar for study design quality in exchange for a chance to discuss and consider the design before anyone knows the results.

The basic principle of pre-registration is a broad one, and will apply differently in different scenarios, but here are a few additional thoughts on applications:

  • What we’re describing wouldn’t be feasible with studies that retrospectively analyze old data sets – there’s no way to know whether someone has done the analysis before registering the design and hypothesis. What we’re describing would be feasible with field experiments, which seem to be growing in popularity thanks partly to the excellent work of Poverty Action Lab and Innovations for Poverty Action. It could also be feasible with non-experiments: researchers could pre-register their plans for analyzing any data sets that aren’t yet available.
  • Poverty Action Lab (example) and Innovations for Poverty Action (example) do publicly publish brief descriptions of studies in progress. These are helpful in that they are a safeguard against burying/canceling studies that aren’t going as hoped, but they would be more helpful if they included information about data to be collected, planned methods of analysis, and hypotheses (for contrast, see this example from ClinicalTrials.gov).
  • There can be legitimate reasons to change aspects of the study design and analysis methods as the study is ongoing. So failure to conform to the pre-registered version wouldn’t necessarily cause us to dismiss a study. But seeing the original intent, the final study, and a list of changes with reasoning for each change would greatly increase transparency; we would consider whether the reasoning sounded more motivated by the desire to get accurate results or a desire to get specific results.
  • There can be legitimate reasons to carry out a study without a strong hypothesis (for example, trying a health intervention and examining its impact on a variety of quality-of-life measures without a strong prior view on which would be most likely to show results). However, we still advocate explicitly declaring the hypothesis or lack thereof beforehand. Just seeing that there was no initial strong hypothesis would help us understand a study. Studies with no strong hypothesis are arguably best viewed as “exploratory,” generating hypotheses for future “confirmatory” studies.
  • As food for thought, imagine a journal that accepted only studies for which results were not yet known. Arguably this journal would be more credible as a source of “well-designed studies addressing worthwhile questions, regardless of their results” as opposed to “studies whose results make the journal editors happy.” For our part, we’d probably follow this sort of journal eagerly while on the lookout for top charities.

We think the “randomista” movement has done a lot of good for the credibility and usefulness of research. We’d love to see a “preregistrista” movement.

Some other suggestions

Chris Blattman’s suggestions. We liked Chris’s suggestions:

1. Journals should require submission of replication data and code files with final paper submissions, for posting on the journal site. (The Journal of Conflict Resolution is one of the few major political science or economics journals I know that does so faithfully.)

2. PhD field and method courses ought to encourage replication projects as term assignments. (Along with encouragements to diplomacy–something new scholars are slow to learn, to their detriment.)

Tougher norms regarding the use of footnotes to support strong claims. It is common for a paper to make a strong claim, with a footnote leading to a citation; it’s then up to us to find the paper cited (not always possible) and figure out in what way and to what degree it supports the claim. The cited paper is often not a literature review or other broad overview but a single study, which we don’t consider sufficient evidence for a strong claim (see our previous post).

On the GiveWell site we use a different convention, one that grew out of our own wish to facilitate easy checking and updating of footnotes by different staff members and volunteers. A GiveWell footnote will either (a) lead to another GiveWell page (these tend to be concise, up-to-date overviews of a topic); or (b) include enough quotation and/or analysis to make it clear what the heart of the support for the claim is, along with page numbers for any quotation. It’s therefore easy to quickly see the degree to which our footnotes support our claims (though one may still wish to vet the papers that we cite and see whether their analysis supports their own claims).

If space limitations are an issue, footnotes could be published online, as they are for some books.

Online tools for viewing the full debate/discussion around a paper at a glance. We’d like to see websites organizing the relationships between different papers and comments, such that we could look up a paper and see the whole debate/discussion around it at a glance – including

  • Followups
  • Published responses and challenges
  • Comments from other scholars, without these comments’ having to get published separately
  • Literature reviews discussing the paper
  • Related (but chronologically later) papers
  • Perhaps a summary of the overall discussion, strengths, weaknesses, etc. (when someone wanted to submit one and it passed the website’s quality criteria)

Microlending debate: An example of why academic research should be used with caution

We often use academic research to inform our work, but we try to do so with great caution rather than simply taking reported results at face value. We believe it is a mistake to trust academic research just because it is peer-reviewed, published, and/or reputable.

A good example of why we’re concerned comes from the recent back-and-forth between David Roodman and Mark Pitt, which continues a debate begun in 1999 over what used to be considered the single best study on the social impact of microfinance.

It appears that the leading interpretation of this study swung wildly back and forth over the course of a decade, based not on major reinterpretations but on arguments over technical details, while those questioning the study were unable to view the full data and calculations of the original. We feel that this illustrates problems with taking academic research at face value and supports many of the principles we use in our approach to using academic research. Details follow.

Timeline/summary
1998-2005 studies by Khandker and Pitt

According to a 2005 white paper published by the Grameen Foundation (PDF), a 1998 book and accompanying paper released by Shahidur Khandker and Mark Pitt “were influential because they were the first serious attempt to use statistical methods to generate a truly accurate assessment of the impact of microfinance.”

Jonathan Morduch challenged these findings shortly after their publication, but a 2005 followup by Khandker appeared to answer the challenge and claim that microlending had a very strong social impact:

each additional 100 taka of credit to women increased total annual household expenditures by more than 20 taka … microfinance accounted for 40 percent of the entire reduction of moderate poverty in rural Bangladesh.

As far as we can tell, this result stood for about four years as among the best available evidence that microlending helped bring people out of poverty. Our mid-2008 review of the evidence stated,

These studies rely heavily on statistical extrapolation about who would likely have participated in programs, and they are far from the strength and rigor of the Karlan and Zinman (2007) study listed above, but they provide somewhat encouraging support for the idea that the program studied had a widespread positive effect.

2009 response by Roodman and Morduch

A 2009 paper by David Roodman and Jonathan Morduch argued that

  • The Khandker and Pitt studies were seriously flawed in their attempts to attribute impact. The reduction in poverty they observed could have been an artifact of wealth driving borrowing, rather than the other way around (see the illustrative sketch after this list).
  • The Khandker and Pitt studies could not be replicated: the full data and calculations they had used were not public, and Roodman and Morduch’s best attempts at a replication did not produce a remotely similar conclusion (their replication showed no positive social impact of microlending, and even a slightly negative one).
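
To make the reverse-causality concern in the first bullet concrete, here is a minimal, hypothetical Python sketch – our own illustration with made-up numbers, not the actual Bangladesh data or either side’s method. If wealth makes households both more likely to borrow and able to spend more, a naive comparison of borrowers with non-borrowers shows a positive “impact” of credit even when credit’s true effect is zero. (Pitt and Khandker’s estimation strategy was far more sophisticated than this naive comparison; the sketch only shows why the direction of causation is the crux of the dispute.)

```python
# Hypothetical sketch with made-up numbers: wealth drives both borrowing and
# spending, and the loan itself has zero effect, yet borrowers look better off.
import math
import random
import statistics

random.seed(0)

borrower_spending, non_borrower_spending = [], []
for _ in range(10_000):
    wealth = random.gauss(0, 1)
    # Wealthier households are more likely to take a loan (selection, not impact).
    borrows = random.random() < 1 / (1 + math.exp(-wealth))
    # Spending depends on wealth only; borrowing adds nothing in this simulation.
    spending = 100 + 20 * wealth + random.gauss(0, 5)
    (borrower_spending if borrows else non_borrower_spending).append(spending)

naive_gap = statistics.mean(borrower_spending) - statistics.mean(non_borrower_spending)
print(f"Naive borrower vs. non-borrower spending gap: {naive_gap:+.1f} "
      f"(true effect of credit: 0)")
```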

This paper stood for the next two years as a prominent refutation of the Khandker and Pitt studies. Pitt writes that the work of Roodman and Morduch has become “well-known in academic circles” and “seems to have had a broad impact.” It appeared in a “new volume of the widely respected Handbook of Development Economics” as well as in congressional testimony.

2011 back-and-forth

Earlier this year,

  • Mark Pitt published a response arguing that Roodman and Morduch’s failure to replicate his study was due to Roodman and Morduch’s errors.
  • David Roodman replied, conceding an error in his original replication but defending his claim that the original study (by Khandker and Pitt) was not a valid demonstration of the impact of microlending.
  • Mark Pitt responded again and argued that the study was a valid demonstration.
  • David Roodman defended his statement that it was not and added, “this is the first time someone other than [Mark Pitt] has been able to run and scrutinize the headline regression in the much-discussed paper … If you anchor Pitt and Khandker’s regression properly in the half-acre rule … the bottom-line impact finding goes away.”
  • We had hoped to see a further response from Mark Pitt before discussing this matter, but Roodman also wrote that Mark Pitt is now traveling and that “this could be the last chapter in the saga for a while.”

Bottom line: as far as we can tell, we still have one researcher claiming that the original study strongly demonstrates a positive social impact of microfinance; another researcher claiming it demonstrates no such thing; and no end in sight, 13 years after the publication of the original study.

Disagreements among researchers are common, but this one is particularly worrisome for a few reasons.

Major concerns highlighted by this case

  • Conflicting interpretations of the study have each stood for several years at a time. The original study stood as the leading evidence about microlending’s social impact from 2005 to 2009; the challenge by Roodman and Morduch was highly prominent, and apparently not commented on at all by the original authors, from 2009 to 2011.

  • Disagreements have been technical, many concerning details that few understand and that still don’t seem resolved. David Roodman states that “the omission of the dummy for a household’s target status” is responsible for his estimated effect of microlending coming out negative instead of positive. Numerous other errors on both sides are alleged, and the remaining disagreements over causal inference are certainly beyond what I can easily follow (if a reader can explain them in clear terms I encourage doing so in the comments).
  • Resolution has been hampered by the fact that Roodman and Morduch could only guess at the calculations Pitt and Khandker performed. This is the biggest concern to me. Roodman writes that he was never able to obtain the original data set used in the paper; that the data set he did receive (upon request) was (in his view) confusingly labeled; and even that one of the original authors “fought our efforts to obtain the later round of survey data from the World Bank.” As a result, his attempt at replication was a “scientific whodunit,” and his April 2011 update represents “the first time someone other than [the original author] has been able to run and scrutinize the headline regression in the much-discussed paper.”

    If I weren’t already somewhat familiar with this field, I would be shocked that it’s even possible to have a study accepted to any journal (let alone a prestigious one) without sharing the full details of the data and calculations, and having the calculations replicated and checked. But in fact, disclosure of data – and replication/checking of calculations – appears to be the exception, not the rule, and is certainly not a standard part of the publication/peer review process.

Bottom line – the leading interpretation of a reputable and important study swung wildly back and forth over the course of a decade, based not on revolutionary reinterpretations but on quibbles over technical details, while no one was able to view the full data and calculations of the original. For anyone assuming that a prestigious journal’s review process – or even a paper’s reputation – is a sufficient stamp of reliability on a paper, this is a wake-up call.

Some principles we use in interpreting academic research

  • Never put too much weight on a single study. If nothing else, the issue of publication bias makes this an important guideline. (On this note, the 2009 Roodman and Morduch paper was rejected for publication; its sole peer reviewer was an author of the original paper that Roodman and Morduch were questioning.)
  • Strive to understand the details of a study before counting it as evidence. Many “headline claims” in studies rely on heavy doses of assumption and extrapolation. This is more true for some studies than for others.
  • If a study’s assumptions, extrapolations and calculations are too complex to be easily understood, this is a strike against the study. Complexity leaves more room for errors and judgment calls, and means it’s less likely that meaningful critiques have had the chance to emerge. Note that before the 2009 response to the study discussed here was ever published, GiveWell took it with a grain of salt due to its complexity (see quote above). Randomized controlled trials tend to be relatively easy to understand; this is a point in their favor.
  • If a study does not disclose the full details of its data and calculations, this is another strike against it – and this phenomenon is more common than one might think.
  • Context is key. We often see charities or their supporters citing a single study as “proof” of a strong statement (about, for example, the effectiveness of a program). We try not to do this – we generally create broad overviews of the evidence on a given topic and source our statements to these.

While a basic fact can be researched, verified and cited quickly, interpreting an impact study with appropriate care takes – in our view – concentrated time and effort and plenty of judgment calls. This is part of why we’re less optimistic than many about the potential for charity research based on (a) crowdsourcing or (b) objective formulas. Instead, our strategy revolves around transparency and external review.