The GiveWell Blog

US Cochrane Center (USCC) gets our first “quick grant” recommendation

Recently, we did something that may strike many GiveWell followers as out of character. We recommended a $100,000 grant to the US Cochrane Center, despite the fact that we have done relatively little investigation of it so far (compared with our investigations of current top charities), and have many unanswered questions. Good Ventures, which helped with our investigation and therefore followed it closely, was part of the conversation in which we came to the conclusion that this grant represented a good giving opportunity, and it committed the funds shortly afterward (before we had finalized our writeup; we considered this appropriate since, as we discuss below, speed was desirable in this situation).*

This post covers two topics:

  • Why we believe it is important to be able to make quick grants (i.e., grants with far less than our usual level of investigation) when warranted, and the principles we are developing for doing so.
  • Why we believe that the grant discussed in this post meets our working criteria for a quick grant.

In brief:

  • We believe that in certain cases, speed is valuable in grantmaking: sometimes because circumstances demand action by a certain date (for example, some projects involve close coordination between multiple entities, including governments, and may need to happen on timelines that work for these entities), and sometimes because speed in giving can help organizations make better planning decisions.
  • We have developed working principles for when a “quick grant” is called for, which are discussed in this post. We intend to experiment with “quick grants” while ensuring that they represent only a small portion of our money moved in the near future.
  • We see the USCC as an ideal recipient for our first “quick grant” partly because we are continuing to investigate the USCC as a potential top charity (in which case we would recommend it to individual donors). The grant discussed in this post meets our working criteria for “quick grants,” and in addition, we are confident that we will learn more over time about the extent to which this “quick grant” was warranted.

Note that the bulk of this post was written in July, following the grant recommendation and grant. After drafting the post, we delayed publishing it for a variety of reasons, including (a) the desire to get and respond to feedback from all parties discussed in this post (as we generally do); (b) the desire to first publish more about our evolution as an organization, in order to give more context on how this recommendation fits in.

The importance of making “quick grant” recommendations

To date, our usual approach to making recommendations has been:

  • Survey an entire field, looking for the organizations that perform best according to various heuristics.
  • Deeply investigate the top-contender organizations, making a major effort to answer all major questions to the degree that we reasonably can.
  • Set a deadline (usually giving season) by which we must make or refresh our recommendations; this ensures that we avoid perfectionism and make the best recommendations we can with the information we have.

We’ve put an increasing amount of effort into investigating contenders for our #1 ranking; we’ve found it very important that we be able to fully stand behind any such recommendation, with reasonable answers to any question that one might raise.

There are a lot of positive things about this approach, and we intend to retain it for the recommendations that comprise the bulk of our money moved. We believe that being systematic, deliberate and thorough is likely to lead to much better giving, in general, than relying on happenstance and intuition alone; at the same time, having regular deadlines leads to regular giving (which we favor).

That said, we also believe that in certain cases, speed is valuable. Sometimes speed is valuable because circumstances demand action by a certain date (for example, some projects involve close coordination between multiple entities, including governments, and may need to happen on timelines that work for these entities). Sometimes the argument for speed is more subtle – for example, giving quickly can help organizations make better decisions about how to plan their budgets as well as how much time to invest in fundraising from different sources.

We’ve long been intrigued by the ideas of people like Bill Somerville, and have wondered whether taking calculated risks after a shorter period of research than has been our norm sometimes has something to recommend it. Since connecting with Good Ventures, this issue has become more salient to us for a couple of reasons:

  • We were the recipient of one such “quick grant.” After our first interaction with Cari and Dustin (a 90-minute meeting between them and me in February 2011), they expressed an intention to contribute $100,000 over two years in operating support to GiveWell. At that time, we were facing a substantial projected deficit and were investing more time than usual in fundraising. This “quick grant” was extremely helpful for our planning and efficiency, in a way that it couldn’t have been if it had required the kind of investigation that GiveWell normally puts in. Noticing this further highlighted the advantages of “quick grants.”
  • In addition, our relationship with Good Ventures means that we are able to make “quick grant” recommendations that can be taken up quickly; while we have some other connections who provide similar opportunities, the bulk of our other money moved tends to be concentrated in the month of December.

On May 8, two GiveWell representatives (Stephanie Wykstra and I) visited the US Cochrane Center (the USCC) (note that we have posted notes from the visit). For reasons discussed below, we came away with the impression that (a) the USCC plays an important role in meta-research; (b) the USCC faced a drastic shortage of operating funding, even to the point where it might not be able to continue uninterrupted, minimal staff support. Over the next six weeks (as discussed below), we tried to find persuasive counterarguments to this viewpoint and failed to do so. While I still had many unanswered questions, it struck me that a grant to the USCC, made sooner rather than later, could potentially do an enormous amount of good in terms of helping the USCC plan intelligently; the mere fact that I wasn’t confident in the value of such a grant didn’t seem to change the fact that the expected value of making such a grant quickly seemed high.

While principles such as thoroughness and transparency are important, the principle of doing as much good as possible with one’s giving is more core to GiveWell than any other. We strongly believe that we should never find ourselves passing up what we see as an opportunity to do maximal good, just because the opportunity is an awkward fit with our existing habits and processes. We are constantly watching the actions of other funders and asking whether they are finding great opportunities to do good, in a way that our existing approach would fail to (and if so, what we can do about it).

Thus, once we saw the potential value of a “quick grant” to the USCC, we started actively considering the idea of modifying our approach to facilitate such a grant (rather than passing on the grant because it didn’t fit well with our existing approach). At the same time, we wanted to make sure that we were adopting and discussing a principled modification to our general approach, not simply recommending a “quick grant” on a whim. As such, we worked to develop the best set of general principles we could for when a “quick grant” is called for. These principles were inspired by the case of the USCC, but we took our best shot at making them reasonable for general application.

Principles and process for making “quick grant” recommendations

This section discusses (a) the key questions we feel are appropriate for a potential “quick grant”; (b) the process we intend to follow for answering these questions quickly and efficiently and making “quick grants.”

Key questions for a potential “quick grant”

  1. Is there a reason that speed, in and of itself, is valuable for this grant? Some possible reasons that speed may be valuable:
    • There may be a specific reason that funds are needed by a particular date in order to go forward with a particular project.
    • There is always a potential argument that (as outlined above) an early commitment can help an organization plan better and use its time more efficiently. This argument will generally be stronger when (a) a relatively small grant can make a big difference to an organization’s planning (usually because the organization has little in the way of unrestricted support), and (b) we have a positive view of the organization and its people overall (as opposed to only a small subset of the work it does).
    • An early grant can also be helpful for learning purposes. If an organization is highly promising from our perspective, but hesitant to engage in our process because of how it perceives the costs and benefits of doing so, an early grant may be valuable as a way to improve our access and ability to learn.
  2. Is the organization/project in question focused on work that seems valuable, reasonably cost-effective, suited to philanthropy (as opposed to other approaches) and thus “worth doing” overall? For the purposes of a “quick grant,” the approach to this question will be highly intuitive and perhaps unsatisfying compared to the cost-effectiveness analysis we often do. Explicit cost-effectiveness analysis is very difficult to do with reasonable speed without sacrificing robustness, and when comparing radically different approaches to doing good it can be nearly impossible to say with much confidence how they compare in terms of “bang for the buck.” This question acts only as a basic screen: is the organization/project in question filling a valuable role in an important ecosystem, leveraging the work of others, and overall taking an approach to helping people that is at least defensible from a cost-effectiveness standpoint?
  3. Do we see a convincing reason that this organization would not be able to raise the funding it needs in the relevant time frame, even if it made a good case for such funding? As discussed below, one of the things that excites us about the USCC as a funding opportunity is the sense that the USCC is “structurally underfunded”: it struggles to raise funds not because major funders have thoughtfully considered and rejected its case, but because major funders largely have program areas and issue focuses that don’t leave room for the kind of work the USCC does.

    This is not often the case. We often find ourselves saying, “It seems that if this work were as valuable as the organization claims, funder X would support it.” When such a statement can be convincingly refuted, the case for a “quick grant” (as for a recommendation in general) becomes much stronger.

  4. Are the people involved impressive, competent, and capable of making good on-the-fly decisions, such that we’re comfortable recommending grants even when we have fairly little visibility into the specific intended use of funds? We’ll be writing more in the future about how we evaluate people. This question is important for all recommendations but is particularly important for “quick grants,” since it generally takes us a long time to feel confident about the specifics of how funds will be used; we are much more likely to recommend “quick grants” when these specifics aren’t necessary in order to feel the funds will be used well.
  5. How much will we learn in the future about the organization/project and the extent to which the “quick grant” was a good idea? Since “quick grants” are likely to be smaller than the money we move to our top charities, it isn’t a given that we will engage in the same sort of followup on them that we do for our top charities. In addition, we’ve found that it’s much easier to learn about an organization when we invest up front in defining goals, metrics, etc., which is more difficult to do in the case of a “quick grant.” So by default, there is a risk that we won’t learn much from or about a “quick grant”; we need to be attentive to this and have a strong preference for grants with more learning potential.

    We expect that there will often be cases in which we consider “quick grants” to organizations that we also plan to evaluate more thoroughly (with the possibility of moving significantly more money to them). In such cases, we are much better positioned to learn whether our grant was a good one and what ultimately came of it. The USCC is one of these cases.

Our process for making “quick grants”

We’ve provisionally agreed to the following process for making quick grants:

  • The process starts when we have some unusually strong signs that the answers to the above key questions are positive. We do not need a definitive case; the case may be largely intuitive and suggestive, but it should be unusually strong in the scope of opportunities we come across.
  • We then pick any “low-hanging fruit” in terms of further investigating the answers to our questions – any investigative work that can be done quickly and is likely to lead to a substantially better understanding of the situation.

    While all of the above key questions are important, we are likely to have a fairly quick and intuitive read on questions #2, #4, and #5; the questions that we are most likely to focus on investigating are #1 (is there a reason that speed is likely to be helpful?) and particularly #3 (are there other funders who are a logical fit to fund this organization/project, or is it underfunded for structural reasons?).

  • We actively seek out counterarguments to our views on key questions, as effectively and efficiently as we can. The main approach we used for the USCC – an approach we are likely to use for future “quick grants” as well – is to seek out conversations with funders who seem like the closest logical fit for the funding opportunity, and try to understand whether they are (a) planning to fund the project/organization; (b) planning not to fund the project/organization, for reasons we find compelling; (c) planning not to fund the project/organization, for reasons we don’t find compelling. (A “quick grant” should be made when (c) holds, not when (a) or (b) holds.)
  • Before any “quick grant,” we hold a meeting or conference call that combines (a) all staff who have been highly involved with the investigation; (b) some staff who haven’t; (c) funders who are particularly likely to follow the recommendation for a “quick grant.” The staff recommending the “quick grant” summarize the answers to key questions as well as what investigations have been done to learn more about these questions and to identify counterarguments. Others on the call focus on (a) evaluating the strength of the arguments (answers to key questions) given the information already available; (b) determining whether there is other information that would be likely to quickly and substantially shift answers to the key questions (i.e., whether there is “low-hanging fruit” in investigative terms).
  • If the basic case for a “quick grant” is accepted, the next step is to determine the size of the grant recommendation (i.e., the dollar amount past which we would stop recommending a “quick grant”). When the “quick grant” is to meet a specific need or fund a specific project, we should understand the nature of the time sensitivity and the size of the need, and the current funding status, before recommending the award. When the “quick grant” is more along the lines of general support for purposes of helping the organization plan and/or improving our access to the organization, we should pay more attention to the size and variance of the organization’s budget (as well as any available room for more funding analysis) and try to aim for something that provides substantial benefit (in terms of planning and/or access) but doesn’t come close to meeting all the organization’s needs.
  • When we do recommend a “quick grant,” we publicly write up the recommended grant amount, recipient, and a summary of our process and reasoning in making the recommendation.

How we decided to recommend a “quick grant” to the USCC

We’ve long been familiar with the work of the Cochrane Collaboration, having used it in our research. We’ve noted before that

We have found that its reports generally review a large number of studies and are very clear about the findings, strengths and weaknesses of these studies. For health programs, when there are often many high-quality studies available, we therefore use Cochrane as our main source of information on “micro” evidence when possible.

In April of this year, I attended a meeting on preregistration in development economics and encountered Kay Dickersin of the USCC, who stated to me that (a) the USCC is struggling to attract unrestricted support; (b) if the USCC had sufficient funds, it would provide general support to US-based Cochrane entities, including direct financial grants in cases where these entities appeared underfunded. We scheduled a full-day visit to the USCC in Baltimore to learn more, because (a) we have long respected the Cochrane Collaboration’s work and were surprised to hear that the USCC was struggling to attract unrestricted funding; (b) this was the first concrete giving opportunity we’d encountered in the area of meta-research, which is a new high-priority focus area for us and which we’re seeking to learn more about; (c) the USCC showed a high level of interest in engaging with us and offered to put together a full-day meeting with multiple representatives, which both raised our expectations about how much we could learn and served as an additional signal that the USCC was struggling to attract sufficient operating funding.

Our notes from the full-day meeting are published online (DOC). We came away from the meeting feeling there was a strong preliminary case for the USCC based on the five key questions above (though we had not yet formalized these questions as the key ones for “quick grants”):

  • Is there a reason that speed, in and of itself, is valuable for this grant? The USCC appeared to have a concrete and time-sensitive need for unrestricted funding, including the immediate need for funds to continue uninterrupted, minimal, staff support. In our view, this situation is notable not just because of the specific consequences that a grant might have (allowing the USCC to retain core staff), but also because it more broadly illustrates that the USCC does not have a stable situation in terms of unrestricted funds, and thus that support could help it to plan and set priorities more effectively.
  • Is the organization/project in question focused on work that seems valuable, reasonably cost-effective, suited to philanthropy and thus “worth doing” overall? We are positive on the quality of the Cochrane Collaboration’s work, as discussed above; at the meeting we also came away with preliminary reasons to believe the work is influential (though we plan to investigate this more). As for the USCC’s role in the Cochrane Collaboration, we saw fairly strong arguments on this point. The Cochrane Collaboration relies on training and supporting volunteers, many of whom are academics. The U.S. has many potential volunteers, including those based within the country’s large university system. But in the U.S. there is far less funding for Cochrane infrastructure (i.e., to train and support volunteers) than in other English-speaking countries such as the UK, Canada and Australia. Jeremy Grimshaw, co-chair of the Cochrane Collaboration’s International Steering Group and Director of Canada’s Cochrane Center, was present by phone at the meeting and supported the message that the USCC is a point of particularly high leverage and importance for the Cochrane Collaboration as a whole. For the reasons discussed above, we don’t feel that formal cost-effectiveness analysis is likely to be helpful in this case.
  • Do we see a convincing reason that this organization would not be able to raise the funding it needs in the relevant time frame, even if it made a good case for such funding? We questioned the USCC about many potential sources of funding and were told that the lack of funding was largely for structural, not substantive reasons. That is, the funding needed would support the infrastructure required for the Cochrane Collaboration’s work (such as staff for training and methodological support for reviews) rather than hypothesis-testing research. As such, the Collaboration’s needs do not fit into the pre-defined categories and issue areas of major funders. Thus, potential funders have considered and declined Cochrane requests and applications for general operating support based on fit rather than on the overall quality or importance of the work. This was the point we felt we most needed to examine further after the meeting, and we did so, as discussed below.
  • Are the people involved impressive, competent, and capable of making good on-the-fly decisions, such that we’re comfortable recommending grants even when we have fairly little visibility into the specific intended use of funds? Multiple representatives were present, and overall we felt they answered our questions reasonably clearly and well; the USCC also appears comfortable with transparency, having signed off quickly and permissively on our notes from the meeting. Our general positive impression of the Cochrane Collaboration’s work is also relevant here. We currently have moderate confidence on this point; we anticipate learning more as we investigate Cochrane further.
  • How much will we learn in the future about the organization/project and the extent to which the “quick grant” was a good idea? We are currently performing an in-depth investigation of the USCC, considering recommending it for more funding than the initial “quick grant,” so we believe that we will learn a great deal about the extent to which this “quick grant” was warranted.

After the meeting, we agreed that the USCC was a promising organization, and our top priority became looking efficiently for counterarguments to the case for funding it. With Good Ventures’s help, we sought out conversations with the major funders that seemed to us like potential fits for the USCC, based both on our prior knowledge and on conversations with the USCC, hoping that we would gain more context on (a) whether it’s true that the USCC doesn’t fit into the issue areas of existing major funders; (b) whether there were general counterarguments to our preliminary views on the USCC’s value and need for more funds.

Feedback was solicited from:

  • Representatives of the Gates and Hewlett Foundations (we had no reason to believe they were a fit for the USCC, but thought they might know who would be, and see them as relatively impact-oriented funders in general; we are not cleared to share notes about these interactions).
  • A representative from the Wellcome Trust, a large medical research funder (notes available as DOC)
  • Representatives from U.S. government agencies: the National Institute of Child Health and Human Development (which contracts with one of the U.S.-based Cochrane review groups but does not provide unrestricted support to the USCC), the NIH Office of Medical Applications of Research in the Office of the Director (which we were pointed to in order to explore whether the USCC might be a fit for funding from the Office of the Director), and the Agency for Healthcare Research and Quality (which has funded the USCC in the past and, like the Cochrane Collaboration, commissions systematic reviews). We are not cleared to share our notes from these interactions.
  • A representative from the HIV/AIDS department of the World Health Organization. We are not cleared to share our notes from this interaction.

In these interactions, we asked for general impressions of the Cochrane Collaboration, thoughts on what sorts of funders might be structurally able to support the USCC, thoughts on what other groups do the sort of work that the Cochrane Collaboration does, and (when relevant) reasoning behind an entity’s support (or lack thereof) for USCC. We came away with the impression that the Cochrane Collaboration’s work is widely respected and seen as high-quality and important, that we can’t easily identify any major funders that are a structural fit for the USCC, and that the main other group focused on systematic reviews is AHRQ, which we plan to investigate further. (AHRQ’s role relative to Cochrane’s is discussed in the notes from our visit to the USCC; our takeaways from other conversations were broadly consistent with these notes.)

We also spoke to

  • Jeremy Grimshaw, co-chair of the Cochrane Collaboration’s International Steering Group and Director of Canada’s Cochrane Center. We sought to press the question of whether the USCC, specifically, is the best entity to fund in order to support the overall mission of the Cochrane Collaboration. Dr. Grimshaw conferred with other international Cochrane Collaboration representatives, including the other Steering Group co-chair, the interim Executive Director and the Editor-in-Chief of the Cochrane Library, and informed us that they endorsed supporting the USCC and would seek the formal endorsement of The Cochrane Collaboration Steering Group. Since then (following the grant), we have further pressed the issue of whether there might be other opportunities to support the Cochrane Collaboration that are higher-leverage than the USCC, and we are continuing to speak with Dr. Grimshaw about how best to work with international representatives to investigate this question. We intend to investigate other Cochrane entities as well, and believe that doing so will take a significant amount of time. Given that, at the time of the grant recommendation, the USCC had been endorsed as a strong opportunity in terms of potential leverage for a donation (though no single opportunity within the Cochrane Collaboration was put forth as the “best”), we feel it was the right decision to move forward.
  • John Ioannidis, whom we see as a leading figure in the field of meta-research generally (and meta-research for medicine in particular). We have published extensive notes from this conversation in transcript form (DOC); Dr. Ioannidis has been involved in the Cochrane Collaboration in the past and believes the USCC to be a strong funding opportunity.
  • Professor Steven Goodman of the Stanford School of Medicine, a referral from one of the funders we spoke with (summary forthcoming).

Finally, we obtained detailed room for more funding analysis from the USCC; this analysis is now available online (DOCX).

Having done the above investigations, we felt that

  • We had strong – though far from conclusive – reasons to believe that the USCC has strong answers to our key questions. In particular, we believed that it had an urgent need for more unrestricted funding to assist with its planning; that its struggle to attract unrestricted funding could largely be attributed to structural issues (the fact that major funders often focus on particular diseases and do not prioritize meta-research) rather than to substantive objections to the USCC’s work; and that it was a strong candidate for “best leverage point for supporting the overall mission of the Cochrane Collaboration.”
  • We had many remaining questions about the USCC, and planned a thorough investigation to answer them. However, we didn’t see any “low-hanging fruit” remaining on the investigative end, and believed it would take a lot of work to obtain more satisfying answers to our key questions.

We discussed a $100,000 grant – enough to replace a specific source of support the USCC expects to lose, and enough to ensure that it can continue uninterrupted, minimal staff support. We came to the conclusion that $100,000 was enough to make a significant difference to the USCC without coming anywhere near meeting its funding needs (as expressed in the “room for more funding” analysis, linked above), and thus that such a grant could be justified not only based on its specific effect (allowing the USCC to retain uninterrupted, minimal staff support) but also on the more general principle (discussed above) of helping a highly underfunded organization with its ability to plan and prioritize. We checked in one more time with the USCC to make sure its funding situation had not changed materially, and recommended the grant.

*There is usually a substantial lag between our coming to a conclusion about a giving opportunity and our writing up & publishing our reasoning. In this case, for reasons discussed below, we did not want to accept the lag of writing up & publishing our reasoning before driving donations, so we made our recommendation via a discussion with Good Ventures and are publishing our reasoning now. In the future, we will generally publish our reasoning publicly before making a recommendation to any particular donor in cases where the funding gap is large and we are seeking to drive donations from many donors, and/or where a lag between the recommendation and the funding commitment is acceptable, but we may act as we have in this case when these conditions do not hold.

Recent board meeting on GiveWell’s evolution

[Added August 27, 2014: GiveWell Labs is now known as the Open Philanthropy Project.]

This year, GiveWell has been evolving in a couple of significant ways:

  • We’ve been exploring giving opportunities that may involve restricted/project-specific funding (as opposed to unrestricted support of charities), as well as giving opportunities that could be relatively speculative, hard to evaluate and high-risk (in contrast with our previous focus on “proven cost-effective” charities). (Previous discussion)
  • We’ve been working closely with Good Ventures, a major funder (previous discussion). We’ve also been reflecting on whether we ought to be focusing our outreach efforts more on major funders (relative to our current target audience of people giving $250,000 or less per year).

We recently held a Board meeting to discuss these shifts, and some of the potential challenges and decisions that may come up as a result. We have now published audio from this meeting, as well as the attachment featured in it that summarizes the issues we see ourselves as facing. This post gives a high-level overview of the issues we discussed and what we’ve concluded for the time being.

Summary:

  • GiveWell continues to prioritize research aimed at finding outstanding giving opportunities for individual donors. GiveWell continues to place high importance on providing enough of these opportunities to keep up with demand, i.e., the amount of money we expect to move from individual donors to our top charities.
  • GiveWell’s research process is evolving in ways that we feel are necessary in order to find the best giving opportunities possible for all donors, both small and large. Since GiveWell’s staff capacity is increasing, it is able to increase its work on “proven cost-effective” interventions while also exploring other areas.
  • GiveWell will continue to work closely with Good Ventures, and may prioritize outreach to other potential major-donor partners. However, it does not plan to become a consultant to Good Ventures or any other “major donor.” The purpose of GiveWell’s working with Good Ventures, and of outreach to potential major donors, is to find people who share GiveWell’s core values and seek to support its mission – not to customize or alter its work to suit major donors. And transparency remains a core value of GiveWell’s; it continues to seek to publish as much as possible of what goes into its reasoning and recommendations.

Evolution of our research process
As discussed previously, we feel that we’ve hit diminishing returns to our approach of focusing on no-strings-attached donations to organizations focused on proven cost-effective interventions. We’ve begun broadening the universe of giving opportunities we will consider.

We previously aimed to draw a bright line between our “traditional” research and GiveWell Labs, which is open to any giving opportunity regardless of form or sector. However, because our traditional approach has hit diminishing returns, we now are focusing the bulk of our research capacity on investigations that are “experimental” in some sense – either because they may involve project-specific funding or because they are in sectors outside of “direct aid.” Accordingly, we no longer find it helpful to draw a bright line between “GiveWell traditional” and “GiveWell Labs” – instead, we have laid out a research agenda and focus area for GiveWell as a whole.

That said, we are still committed to

  • Finding the most proven, cost-effective giving opportunities for individual donors to support. We believe that finding more of these opportunities (beyond our current top charities) requires being open to project-specific funding, as discussed previously.
  • Continuing to provide regular updates on previously recommended giving opportunities, including both good news and bad.
  • Continuing to maintain, assess, and update our top charities list, and clearly communicating the difference between this list (which focuses on proven cost-effective charities) and any giving opportunities we may recommend that fall into other categories.
  • Doing everything we can to provide enough “proven cost-effective” giving opportunities to meet the demand for them (i.e., the amount of money we expect to move to them) from our audience.

Since GiveWell’s staff capacity is increasing, it is able to increase its work on “proven cost-effective” interventions while also exploring other areas.

Relationship with Good Ventures
As we wrote previously, we have been working closely with Good Ventures in multiple ways. We find the relationship to be highly mutually beneficial; at the same time, it is important to us that

  • We retain our independence: the ability to prioritize giving opportunities based on what we find most promising, and to allocate our resources in line with our own prioritization.
  • We retain our transparency, continuing to publicly publish the full details of our analysis and other items of interest to donors.
  • We are not perceived as being unduly influenced – in our research direction, our use of resources, or otherwise – by Good Ventures.
  • We continue to serve other donors and to bring them enough outstanding giving opportunities to meet demand (i.e., the amount of money we expect to move from them).
  • We remain open to working closely with other major funders, as we are with Good Ventures.

In order to accomplish the above goals, we are planning to develop and publish some general guidelines regarding how we work with major donors, including policies for ensuring that we retain our independence and for ensuring that the role of any major donor in our research process is made transparent.

In addition, as discussed previously, we are thinking of putting more of our outreach efforts into reaching major funders (relative to our current target audience of people giving $250,000 or less per year). However, this concerns only our outreach efforts, not our research efforts or our commitment to transparency.

Seeking your feedback
If you’re a user of GiveWell’s research, we’d like to hear your thoughts on the above. We’d particularly like to hear from you if you have any concerns or see any risks to GiveWell’s value for you as a source of independent, in-depth research on how to accomplish the most good possible with your giving.

The ideal form of feedback (from our perspective) would be comments on this blog post, since that allows anyone to see the exchange, but we are also happy to be contacted privately.

Updated thoughts on our key criteria

For years, the 3 key things we’ve looked for in a charity have been (a) evidence of effectiveness; (b) cost-effectiveness; (c) room for more funding. Over time, however, our attitude toward all three of these things – and the weight that we should put on our analysis of each – has changed. This post discusses why:

  • On the evidence of effectiveness front, we used to look for charities that collected their own data that could make a compelling case for impact. We no longer expect to see this in the near future. We believe that the best evidence for effectiveness is likely to come from independent literature (such as academic studies). We believe that if a program does not have a strong independent case, there is unlikely to be a charity that can demonstrate impact with such a program.
  • We have continually lowered our expectations for how much role cost-effectiveness analysis will play in our decisions. We still believe that doing such analysis is worthwhile when possible – partly because of the questions it raises – but we believe the cases where it can meaningfully distinguish between two interventions are limited.
  • We have continually raised our expectations for how much role room for more funding analysis will play in our decisions. Questions around “room for more funding” are now frequently the first – and most core – questions we ask about a giving opportunity.

Evidence for effectiveness
In our 2007-2008 search for outstanding charities, we took applications and asked charities to make their own case for impact. In 2009, we identified evidence-backed “priority programs” using independent literature, but still actively looked for charities (even outside these programs) with their own evidence of effectiveness. In 2011, we continued this hybrid approach.

In all of these searches, we’ve found very little in the way of “charities demonstrating effectiveness using their own data.”

We believe the underlying dynamic is that

  • Evidence on these sorts of interventions is very difficult and expensive to collect.
  • It’s particularly difficult to collect such evidence in a way that addresses various concerns that we believe to be very common and important in the context of evaluating charitable programs.
  • Studies that can adequately address these issues are generally “gold-standard” studies, and are therefore of general interest (and can be found by searching independent/academic literature).

Accordingly, our interest in “program evaluation” – the work that charities do to systematically and empirically evaluate their own programs – has greatly diminished. We are skeptical of the value of studies that fall below the “gold standard” bar that usually accompanies high-reputation independent literature.

This shift in our thinking has greatly influenced how our process works and what we expect it to find. Rather than putting a lot of time into scanning charities’ websites for empirical evidence, as we did previously, we now are focused on identifying the evidence-backed interventions, then finding the vehicles by which donors can fund these interventions.

Cost-effectiveness
The ultimate goal of a GiveWell recommendation is to help a donor accomplish as much good as possible, per dollar spent. Accordingly, we have long been interested in trying to estimate how much good is accomplished per dollar spent, in terms such as lives saved per dollar or DALYs averted per dollar.

Over the years, we’ve put a lot of effort into this sort of analysis, and learned a lot about it. In particular:

  • In sectors outside of global health and nutrition, it is generally impractical to connect measurable outcomes to meaningful outcomes (for example, we may observe that an education program raises test scores, but it is very difficult to connect this to something directly related to improvements in quality of life). Not surprisingly, the vast majority of attempts to do cost-effectiveness analysis (including both GiveWell’s attempts and others’ attempts) have been in the field of global health and nutrition.
  • Within global health and nutrition, even the most prominent, best-resourced attempts at cost-effectiveness analysis have had questionable quality and usefulness.
  • Our own attempts at cost-effectiveness analysis have produced estimates that turn out to be very sensitive to small variations in basic assumptions. Such sensitivity is directly relevant to how much weight we should put on such estimates in decision-making. (A toy illustration follows this list.)
  • That said, we continue to find cost-effectiveness analysis to be very useful when feasible, partly because it is a way of disciplining ourselves to make sure we’ve addressed every input and question that matters on the causal chain between interventions (e.g., nets) and morally relevant outcomes (e.g., lives saved). In addition, cost-effectiveness analysis can be useful for extreme comparisons, identifying interventions that are extremely unlikely to have competitive cost-effectiveness (for example, see our comparison of U.S. and international aid).
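
To make the sensitivity point concrete, here is a toy “cost per life saved” calculation in Python. All of the inputs are invented for illustration (they are not GiveWell’s figures, and the model is far simpler than anything we would actually use), but they show how individually modest changes to a few uncertain inputs compound into an order-of-magnitude swing in the bottom line:

```python
# Toy cost-effectiveness model. All numbers are hypothetical illustrations,
# not GiveWell's estimates.

def cost_per_life_saved(cost_per_net, people_per_net, mortality_rate, mortality_reduction):
    """Dollars to avert one death, given per-net cost and epidemiological assumptions."""
    deaths_averted_per_net = people_per_net * mortality_rate * mortality_reduction
    return cost_per_net / deaths_averted_per_net

# Baseline assumptions (illustrative only).
baseline = cost_per_life_saved(
    cost_per_net=5.0,         # dollars per net delivered
    people_per_net=1.8,       # people protected per net
    mortality_rate=0.005,     # annual malaria mortality among those protected
    mortality_reduction=0.5,  # fraction of those deaths a net prevents
)

# Each input nudged by an amount well within its plausible range.
pessimistic = cost_per_life_saved(7.0, 1.5, 0.003, 0.3)
optimistic = cost_per_life_saved(4.0, 2.0, 0.007, 0.6)

print(f"baseline:    ${baseline:,.0f} per life saved")     # ~$1,100
print(f"pessimistic: ${pessimistic:,.0f} per life saved")  # ~$5,200
print(f"optimistic:  ${optimistic:,.0f} per life saved")   # ~$480
```

The three scenarios span roughly a tenfold range even though no single input moved dramatically, which is why we treat such estimates as one input into a decision rather than as a precise ranking device.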

While we still intend to work hard on cost-effectiveness analysis, and we still see value in it, we do not see it as holding out much promise for helping to resolve difficult decisions between one giving opportunity and another. We find other criteria to be easier to make distinctions on – criteria such as strength of evidence (discussed above) and room for more funding (discussed below).

Room for more funding
For the first few years of our history, we knew that the issue of room for more funding was important, but we made little headway on figuring out how to assess it. We tried asking charities directly how additional dollars would be used, but didn’t receive very helpful answers (see applications received for our 2007-2008 process).

In 2010, as a result of substantial conversations with VillageReach, we developed the basic approach of scenario analysis, and since then we’ve used this approach to reach some surprising conclusions, such as the lack of short-term room for more funding for the Nurse-Family Partnership and recommending KIPP Houston rather than the KIPP Foundation due to “room for more funding” issues.

By now, room for more funding is in some ways the “primary” criterion we look at, in the sense that it’s often the first thing we ask for and sits at the core of our view on an organization. This is because

  • Asking “what activities additional dollars would allow” determines what activities we focus on evaluating.
  • Many of the charities and programs that may seem to have the most “slam-dunk” case for impact also seem – not surprisingly – to have their funding needs already met by others. We’ve found it relatively challenging to find activities that are both highly appealing and truly underfunded.
  • In the absence of reliable explicit cost-effectiveness analysis, an alternative way of maximizing impact is to look for the most appealing activities that have funding gaps. The analytical, “sector-agnostic” approach we bring to giving seems well-suited to doing so in a way that other funders can’t or won’t.

Many people – including us early in our history – may be inclined to think that maximizing impact consists of laying out all the options, estimating their quantified impact-per-dollar, and ranking them. We’ve seen major limitations to this approach (though we still utilize it). We’ve also, however, come across another way of thinking about maximizing impact: finding where one can fit into the philanthropic ecosystem such that one is funding the best work that others won’t.

Surveying the research on a topic

We’ve previously discussed how we evaluate a single study. For the questions we try to answer, though, it’s rarely sufficient to consult a single study; studies are specific to a particular time, place, and context, and to get a robust answer to a question like “Do insecticide-treated nets reduce child mortality?” one should conduct – or ideally, find – a thorough and unbiased survey of the available research. Doing so is important: we feel it is easy (and common) to form an inaccurate view based on a problematic survey of research.

This post discusses what we feel makes for a good literature review: a report that surveys the available research on a particular question. Our preferred way to answer a research question is to find an existing literature review with strong answers to these questions; when necessary, we conduct our own literature review with the same questions in mind.

Our key questions for a literature review

  • What are the motivations of the literature reviewer? A biased survey of research can easily lead to a biased conclusion, if the reviewer is selective about which studies to include and which to focus on. We are generally highly wary of literature reviews commissioned by charities (for example, a 2005 survey of studies on microfinance commissioned by the Grameen Foundation) or advocacy groups. We prefer reviews that are done by parties with no obvious stake in coming to one recommendation or another, and with a stake in maintaining a reputation for neutrality (these can, in appropriate cases, include government agencies as well as independent groups such as the Cochrane Collaboration).
  • How did the literature reviewer choose which studies to include? Since one of the ways a literature review can be distorted is through selective inclusion of studies, we take interest in the question of whether it has included all (and only) sufficiently high-quality studies that bear on the question of interest. In some cases, there are only a few high-quality studies available on the question of interest, such that the reviewer can discuss each study individually, and the reader can hold the reviewer accountable if s/he knows of another high-quality study that has been left out. However, for a topic like the impact of insecticide-treated nets on malaria, there may be many high-quality studies available. In these cases, we prefer literature reviews in which the reviewer is clear about his/her search protocol, ideally such that the search could be replicated by a reader.
  • How thoroughly and consistently does the literature review discuss the strengths and weaknesses of each study? As we wrote previously, studies can vary a great deal in quality and importance. When we see a literature review simply asserting that a particular study supports a particular claim – without discussing the strengths and weaknesses of this study – we consider it a low-quality literature review and do not put weight on it. In our view, a good literature review is one that provides a maximally thorough, consistent, understandable summary of the strengths and weaknesses of each study it includes.
  • Does the literature review include meta-analysis, attempting to quantitatively combine the results of several studies? In some cases it is possible to perform meta-analysis: combining the results from multiple studies to get a single “pooled” quantitative result. In other cases a literature review limits itself to summarizing the strengths and weaknesses of each study reviewed and giving a qualitative conclusion.
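
For readers unfamiliar with the quantitative step, here is a minimal sketch of the most common pooling technique, fixed-effect inverse-variance meta-analysis. The study names, effect estimates, and standard errors below are invented for illustration:

```python
import math

# Hypothetical per-study effect estimates and their standard errors.
studies = [
    ("Study A", 0.30, 0.10),
    ("Study B", 0.15, 0.08),
    ("Study C", 0.45, 0.20),
]

# Fixed-effect inverse-variance pooling: each study is weighted by 1/se^2,
# so more precise studies count for more in the combined result.
weights = [1.0 / se**2 for _, _, se in studies]
pooled = sum(w * est for (_, est, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1.0 / sum(weights))

print(f"pooled estimate: {pooled:.3f}")
print(f"95% CI: [{pooled - 1.96 * pooled_se:.3f}, {pooled + 1.96 * pooled_se:.3f}]")
```

The pooled estimate lands closest to the most precise study, and its standard error is smaller than any single study’s – the main statistical payoff of combining studies. (Real reviews must also choose between fixed- and random-effects models and check for heterogeneity across studies, which this sketch omits.)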

Strong and weak literature reviews
In general, we feel that the Cochrane Collaboration performs strong literature reviews by the criteria above. Examples of its reviews include a review we discussed previously on deworming and a review on insecticide-treated nets to protect against malaria.

  • The Cochrane Collaboration is an independent group that aims to base its brand on unbiased research, and does not take commercial funding.
  • Cochrane reviews generally explicitly lay out their search strategy and selection criteria in their summaries.
  • Cochrane reviews generally list all of the studies considered along with relatively in-depth discussions of their methodology, strengths and weaknesses (full text is required to see these).
  • Cochrane reviews generally perform quantitative meta-analysis and include the conclusions of such analysis in their summaries.

An example of a more problematic literature review is King, Dickman and Tisch 2005, cited in our report on deworming. This review does well on some of our criteria: it is clear about its search and inclusion criteria (see Figure 1 on page 1562), and it performs quantified meta-analysis (see Table 1 on page 1565). However,

  • It provides a list of all studies included, but unlike the Cochrane reviews we’ve seen, it does not provide any information for these studies (methodology, sample size, etc.) other than the reference.
  • It does not discuss individual studies’ strengths and weaknesses at all.
  • It does not make it possible for the reader to connect the study’s conclusions (in Table 5) to specific studies. (Figures 2-4 break down a few, but not all, of the study’s conclusions with lists of individual studies.) Since over 100 studies were included, we do not see a practical way for a reader to vet the literature review’s conclusions.
  • There is also ambiguity in what the reported conclusions mean: for example, Table 5 does not specify whether it is examining the impact of deworming on the level or change of each listed outcome (i.e., impact on weight vs. impact on change in weight over time).

We have at times seen advocacy groups and/or foundations put out literature reviews that are far more flawed than the study discussed above. Though we generally don’t keep track of these, we provide one example, a paper entitled “What can we learn from playing interactive games?” A representative quote from this paper:

There is also evidence that game playing can improve cognitive processing skills such as visual discernment, which involves the ability to divide visual attention and allocate it to two or more simultaneous events (Greenfield et al., 1994b); parallel processing, the ability to engage in multiple cognitive tasks simultaneously (Gunter, 1998); and other forms of visual discrimination including the ability to process cluttered visual scenes and rapid sequences of images (Riesenhuber, 2004). Experiments have also found improvements in eye-hand coordination after playing video games (Rosenberg et al., 2005).

The paper does not discuss selection, inclusion, strengths, or weaknesses of studies, or even their basic design and the nature/magnitude of their findings (for example, how is “parallel processing” measured?).

All else equal, we would prefer a world in which all literature reviews were more like Cochrane reviews than like the more problematic reviews discussed above. However, it’s worth noting that Cochrane reviews appear to be quite expensive, upwards of $100,000 each. Conducting a truly thorough and unbiased literature review is not necessarily easy or cheap, but we feel it is often necessary to get an accurate picture of what the research says on a given question.

How we evaluate a study

We previously wrote about our general principles for assessing evidence, where “evidence” is construed broadly (it may include awards/recognition/reputation, testimony, and broad trends in data as well as formal studies). Here we discuss our approach to a particular kind of evidence, what we call “micro data”: formal studies of the impact of a program on a particular population at a particular time, using quantitative data analysis.

We list several principles that are important to us in deciding how much weight to put on a study’s claims. A future post will discuss the application of these principles to some example studies.

Causal attribution
A study of a charity’s impact will generally highlight a particular positive change in the data – for example, improved school attendance or fewer health problems among children who were dewormed. One of the major challenges of a study is to argue that such a change was caused by the program being studied, as opposed to other factors. Many studies make simple before-and-after comparisons, which can conflate program effects with unrelated changes over time (for example, generally improving wealth/education/sanitation/etc.). Many studies make simple participant-to-non-participant comparisons, which can face a significant problem of selection bias: the people who are chosen to participate in a program, or who choose to participate in a program, may be different from non-participants in many ways, so differences may emerge that can’t be attributed to the program.

One way to deal with the problem of causal attribution is via randomization. A randomized controlled trial (in this context) is a study in which a set of people is identified as potential program participants, and then randomly divided into one or more “treatment group(s)” (group(s) participating in the program in question) and a “control group” (a group that experiences no intervention). When this is done, it is generally presumed that any sufficiently large differences that emerge between the treatment and control groups were caused by the program.
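
A minimal sketch of this logic, using an invented outcome model (this illustrates the statistical idea, not any particular study we have reviewed):

```python
import random
import statistics

random.seed(0)

# Hypothetical trial: 200 potential participants, randomly split into a
# treatment group (receives the program) and a control group (does not).
n = 200
assignments = [True] * (n // 2) + [False] * (n // 2)
random.shuffle(assignments)  # random assignment is what licenses the causal claim

# Invented outcome model: noisy baseline school attendance (in days), plus
# a true program effect of 5 days for treated individuals.
TRUE_EFFECT = 5.0
outcomes = [150 + random.gauss(0, 10) + (TRUE_EFFECT if treated else 0)
            for treated in assignments]

treat = [y for y, t in zip(outcomes, assignments) if t]
control = [y for y, t in zip(outcomes, assignments) if not t]

# Because assignment was random, the two groups differ only by chance at
# baseline, so the difference in means estimates the program's causal effect.
effect = statistics.mean(treat) - statistics.mean(control)
print(f"estimated effect: {effect:.1f} days (true effect: {TRUE_EFFECT})")
```

With only 200 participants the estimate is still noisy; randomization removes systematic differences between the groups, not random ones, which is why sample size and statistical power still matter.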

Many, including us, consider the randomized controlled trial to be the “gold standard” in terms of causal attribution. However, there are often cases in which randomized controlled trials are politically, financially or practically non-feasible, and there are a variety of other techniques for attributing causality, including:

  • Instrumental variables. An “instrumental variable” is a variable that affects the outcome of interest (for example, income) only through its impact on the intervention/program of interest (for example, access to schooling). An example of such an approach is Duflo 2001, which examines a large-scale government school construction program; it reasons that people who lived in districts that the program reached earlier got better access to education, through a “luck of the draw” that could be thought of as similar to randomization, so any other differences between people who lived in such districts and other people could fairly be attributed to differences in access to education, rather than other differences. We are open to the possibility of a compelling instrumental-variables study, but in practice, we see very few instrumental variables that plausibly meet this criterion, and many that seem very questionable. For example, a paper by McCord, Conley and Sachs uses malaria ecology as an instrument for mortality, implying that the only way malaria ecology could affect the outcome of interest (fertility) is through its impact on mortality. However, Sachs has elsewhere argued that malaria ecology affects people in many ways other than through mortality, and we believe this to be the case.
  • Regression discontinuity. Sometimes there is a relatively arbitrary “cutoff point” for participation in a program, and a study may therefore compare people who “barely qualify” with people who “barely fail to qualify,” along the lines of this study on giving children vouchers to purchase computers. We believe this method to be a relatively strong method of causal attribution, but (a) there tend to be major issues with external validity, since comparing “people who barely qualified with people who barely failed to qualify” may not give results that are representative of the whole population being served; (b) this methodology appears relatively rare when it comes to the topics we focus on.
  • Using a regression to “control for” potential confounding variables. We often see studies that attempt to list possible “confounders” that could serve as alternative explanations for an observed effect, and “control” for each confounder using a regression. For example, a study might look at the relationship between education and later-life income, recognize that this relationship might be misleading because people with more education may have more income to begin with, and therefore examine the relationship between education and income while “controlling for” initial income. We believe that this approach is very rarely successful in creating a plausible case for causality. It is difficult to name all possible confounders and more difficult to measure them; in addition, the idea that such confounders are appropriately “controlled for” usually depends on subtle (and generally unjustified) assumptions about the “shape” of relationships between different variables. (A simulation sketch after this list illustrates the measurement problem.) Details of our view are beyond the scope of this post, but we recommend Macro Aid Effectiveness Research: A Guide for the Perplexed (authored by David Roodman, whom we have written about before) as a good introduction to the common shortcomings of this sort of analysis.
  • Visual and informal reasoning. Researchers sometimes make informal arguments about the causal relationship between two variables, e.g., using visual illustrations. An example of this: the case for VillageReach includes a chart showing that stock-outs of vaccines fell dramatically during the course of VillageReach’s program. Though no formal techniques were used to isolate the causal impact of VillageReach’s program, we felt at the time of our VillageReach evaluation that there was a relatively strong case in the combination of (a) the highly direct relationship between the “stock-outs” measure and the nature of VillageReach’s intervention and (b) the extent and timing of the drop in stock-outs, juxtaposed with the timing of VillageReach’s program. (We have since tempered this conclusion.) We sometimes find this sort of reasoning compelling, and suspect that it may be an under-utilized method of making compelling causal inferences.
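To make the contrast between the “instrumental variables” and “controlling for confounders” approaches concrete, here is a minimal simulation sketch (our own illustration with invented numbers; it is not drawn from any of the studies cited above). An unmeasured confounder – call it “ability” – drives both schooling and income, so a simple regression of income on schooling is biased and there is nothing observable to control for; a valid instrument that shifts schooling for reasons unrelated to ability still recovers the true effect:

```python
# A toy illustration of confounding and instrumental variables.
# All numbers are invented; "ability" stands in for any unmeasured confounder.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
true_effect = 2.0

ability = rng.normal(size=n)    # unmeasured confounder
z = rng.normal(size=n)          # instrument: affects income only via schooling
schooling = 0.5 * z + ability + rng.normal(size=n)
income = true_effect * schooling + 3.0 * ability + rng.normal(size=n)

# Naive regression of income on schooling is biased: "ability" is
# unobserved, so no regression "control" can remove its influence.
naive = np.cov(schooling, income)[0, 1] / np.var(schooling)

# Two-stage least squares: predict schooling from the instrument alone,
# then regress income on that prediction.
first_stage = np.cov(z, schooling)[0, 1] / np.var(z)
schooling_hat = first_stage * z
iv = np.cov(schooling_hat, income)[0, 1] / np.var(schooling_hat)

print(f"true effect: {true_effect:.2f}")
print(f"naive OLS:   {naive:.2f}")   # ~3.3, biased upward by the confounder
print(f"IV (2SLS):   {iv:.2f}")      # ~2.0, close to the truth
```

Of course, the simulation assumes away the hard part: in real studies, the claim that the instrument affects the outcome only through the treatment – the claim at issue in the malaria-ecology example above – is precisely what cannot be verified from the data alone.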

Publication bias
We’ve written at length about publication bias, which we define as follows:

“Publication bias” is a broad term for factors that systematically bias final, published results in the direction that the researchers and publishers (consciously or unconsciously) wish them to point.

Interpreting and presenting data usually involves a substantial degree of judgment on the part of the researcher; consciously or unconsciously, a researcher may present data in the most favorable light for his/her point of view. In addition, studies whose final conclusions aren’t what the researcher (or the study funder) hoped for may be less likely to be made public.

Publication bias is a major concern of ours. As non-academics, we aren’t easily able to assess the magnitude and direction of this sort of bias, but we suspect that the risks are substantial any time there is a combination of (a) a researcher with an agenda or “preferred outcome” and (b) a lot of leeway for the researcher to make decisions that aren’t transparent to the reader. We’d guess that both (a) and (b) are very common.

When we evaluate a study, we consider the following factors, all of which bear on the question of how worried we should be that the paper reflects “the conclusions the researcher wanted to find” rather than “the conclusions that the data, impartially examined, points to”:

  • What are the likely motivations and hopes of the authors? If a study is commissioned/funded by a charity, the researcher is probably looking for an interpretation that reflects well on the charity. If a study is published in an academic journal, the researcher is likely looking for an interpretation that could be considered “interesting” – which usually means finding “some effect” rather than “no effect” for a given intervention, though there are potential exceptions (for example, it seems to us that the relatively recent studies of microfinance would have been considered “interesting” whether they found strong effects or no effects, since the impacts of microfinance are widely debated).
  • Is the paper written in a neutral tone? Do the authors note possible alternate interpretations of the data and possible objections to their conclusions? When we saw a white paper commissioned by the Grameen Foundation (at the time, the most comprehensive review of the literature on microfinance we could find) making statements like “Unfortunately, rather than ending the debate over the effectiveness of microfinance, Pitt and Khandker’s paper merely fueled the fire” and “The previous section leaves little doubt that microfinance can be an effective tool to reduce poverty” (a statement that didn’t seem true to us), we questioned the intentions of the author, and were more inclined to be pessimistic where details were scarce. In general, we expect a high-quality paper to proactively identify counterarguments and limitations to its findings.
  • Is the study preregistered? Does it provide a link to the full details of its analysis, including raw data and code? As we have previously written, preregistration and data/code sharing are two important tools that can alleviate concerns around publication bias (by making it harder for questionable analysis decisions to go unnoticed). It seems to us that these practices are relatively rare in economics and somewhat more common in medicine.
  • How many outcomes does the study examine, and which outcomes does it emphasize in its summary? We often see studies that look for an intervention’s effect on a wide range of outcomes, find significant effects on only a few, and emphasize those few without acknowledging (or quantitatively analyzing) the fact that focusing on the “biggest measured effect size” is likely to overstate the true effect size. (The toy simulation after this list shows how large this overstatement can be.) Preregistration (see above) would alleviate this issue by allowing researchers to credibly claim that the outcome they emphasize is the one they had intended to emphasize all along (or, if it wasn’t, to acknowledge as much). However, even a study that isn’t preregistered can acknowledge the issue and attempt to adjust for it quantitatively; studies frequently do not.
  • Is the study expensive? Were its data collected to answer a particular question? If a lot of money and attention is put into a study, it may be harder for the study to fall prey to one form of publication bias: the file drawer problem. Most of the field studies we come across involve collecting data on developing-world populations over a period of years, which is fairly expensive, for the purpose of answering a particular question; by contrast, studies that consist simply of analyzing already-publicly-available data, or of experiments that can be conducted in the course of a day (as with many psychology studies), seem to us to be more susceptible to the file-drawer problem.
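To put rough numbers on the “many outcomes” concern above, here is a toy simulation (our own illustration; the setup is invented). Each hypothetical study measures 20 outcomes that the intervention truly does not affect at all, and then emphasizes the largest effect it happens to find:

```python
# A toy illustration of how emphasizing the largest of many measured
# outcomes overstates effects. All numbers are invented.
import numpy as np

rng = np.random.default_rng(2)
n_studies, n_outcomes = 10_000, 20

# The true effect on every outcome is zero, so each measured z-score
# is pure noise.
z = rng.normal(size=(n_studies, n_outcomes))
largest = z.max(axis=1)

print(f"mean effect across all outcomes:     {z.mean():+.2f}")        # ~0
print(f"mean of each study's largest effect: {largest.mean():+.2f}")  # ~+1.9
print(f"studies whose largest effect clears z = 1.96: "
      f"{(largest > 1.96).mean():.0%}")                               # ~40%
```

In this setup the emphasized effect averages nearly two standard errors, and roughly 40% of the simulated studies turn up at least one conventionally “significant” outcome, despite there being nothing to find.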

Other considerations

  • Effect size and p-values. A study will usually report the “effect size” – the size of the effect it attributes to the program/treatment – in some form, along with a p-value that expresses, roughly speaking, how likely it is that an effect at least as big as the reported one would have been observed, by chance, if the treatment fundamentally had no effect. We find the effect size useful for obvious reasons – it tells us how much difference the program is reported to have made, and we can then put this in context with what we’ve seen of similar programs to gauge plausibility. We find the p-value (and, relatedly, reports of which effects are “statistically significant” – which, in the social sciences, generally means a p-value under 5%) useful for a couple of reasons (a short sketch after this list shows how these quantities interact with sample size):
    • Even a very large observed effect (if observed in a relatively small sample) could simply be random variation. We generally emphasize effects with p-values under 5%, which is a rough and common proxy for “unlikely to be random variation.”
    • The p-value tends to be considered important within academia: researchers generally emphasize the findings with p-values under a certain threshold (which varies by field). We would guess that most researchers, in designing their studies, seek a sample size large enough that they’ll get a sufficiently low p-value if they observe an effect as large as they hope/expect. Therefore, asking “is the p-value under the commonly accepted threshold?” can be considered a rough way of asking “did the study find an effect as large as what the researcher hoped/expected to find?”

  • Sample size and attrition. “Sample size” refers to the number of observations in the study, both in terms of how many individuals were involved and how many “clusters” (villages, schools, etc.) were involved. “Attrition” refers to how many of the people originally included in the study were successfully tracked for reporting final outcomes. In general, we put more weight on a study with a larger sample size and less attrition. In theory, the reported “confidence interval” around an effect size should capture what’s important about the sample size (larger samples generally lead to narrower confidence intervals, i.e., more precise estimates of effect size). But (a) we aren’t always confident that confidence intervals are calculated appropriately, especially in nonrandomized studies; (b) a large sample size can be taken as a sign that a study was relatively expensive and prominent, which bears on “publication bias” as discussed above; and (c) more generally, we intuitively see a big difference between a statistically significant impact from a study that randomized treatment among 72 clusters comprising a total of ~1 million individuals (as an unpublished study on the impact of vitamin A did) and a statistically significant impact from a study that included only 111 children (as the Abecedarian Project did), or a study that compared two villages receiving a nutrition intervention to two villages that did not (as with a frequently cited study on the long-term impact of childhood nutrition).
  • Effects of being studied? We think it’s possible that in some cases, the mere knowledge that one has been put into the “treatment group,” receiving a treatment that is supposed to improve one’s life, could be partly or fully responsible for an observed effect. One mechanism for this is the well-known “placebo effect.” Another is the possibility that people might actively try to get themselves included in the treatment group, leading to a dynamic in which the most motivated or connected people become overrepresented in it. The ideal study is “double-blind”: neither the experimenters nor the subjects know which people are being treated and which aren’t. “Double-blind” studies aren’t always possible; when a study isn’t blinded, we note this and ask how intuitively plausible it is that the observed outcomes could have been due to the lack of blinding.
  • External validity. Most of the points above concern “internal validity”: the validity of the study’s claim that a certain effect occurred in the particular time and place the study was carried out. However, even if the study’s claims about what happened are fully valid, there is the additional question: “how will the effects seen in this study translate to other settings and larger-scale programs?” We’d guess that the programs examined in studies are often unusually high-quality in terms of personnel, execution, etc. (For example, see our discussion of studies on insecticide-treated nets: the formal studies of net distribution programs involved a level of promoting usage that large-scale campaigns do not and cannot include.) In addition, we often note something about a study indicating that it took place under unusual conditions (for example, a prominent study of deworming took place while El Niño was bringing worm infections to unusually high levels).
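As a concrete illustration of the points above about effect sizes, p-values, and sample size, here is a short sketch of our own, with invented numbers. The same modest true effect goes from statistically undetectable to clearly “significant” as the sample grows, and the confidence interval narrows roughly as one over the square root of the sample size:

```python
# A toy illustration of effect size, p-values, and sample size.
# All numbers are invented.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_effect = 0.2   # a modest true effect, in standard-deviation units

for n in (50, 200, 2000):   # participants per arm
    treatment = rng.normal(true_effect, 1.0, size=n)
    control = rng.normal(0.0, 1.0, size=n)

    effect = treatment.mean() - control.mean()
    t_stat, p_value = stats.ttest_ind(treatment, control)

    # 95% confidence interval for the difference in means; its width
    # shrinks roughly in proportion to 1/sqrt(n).
    se = np.sqrt(treatment.var(ddof=1) / n + control.var(ddof=1) / n)
    print(f"n={n:4d}  effect={effect:+.3f}  p={p_value:.3f}  "
          f"95% CI=({effect - 1.96 * se:+.3f}, {effect + 1.96 * se:+.3f})")
```

A real cluster-randomized trial would also need to account for correlation within clusters (villages, schools, etc.), which widens these intervals; this is one reason we don’t take reported confidence intervals entirely on faith, especially in nonrandomized studies.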

A note on randomized controlled trials (RCTs)
The merits of randomized controlled trials (RCTs) have been debated, and in particular the question has arisen of whether the RCT should be considered the “gold standard.”

We believe that RCTs have multiple qualities that make them – all else equal – more credible than other studies. In addition to their advantages for causal attribution, RCTs tend to be relatively expensive and to be clearly aimed at answering a particular question, which has advantages regarding publication bias. In today’s social-science environment – in which preregistration is rare – we think that being an RCT is probably the single most encouraging (easily observed) property a study can have. This has a practical implication: we often begin surveys of research by looking for RCTs (while also trying to include the strongest and most prominent non-RCTs).

That said, the above discussion hopefully makes it clear that we ask a lot of questions about a study besides whether it is an RCT. There are nonrandomized studies we find compelling as well as randomized studies we don’t find compelling. And we think it’s possible that if preregistration were more common, we’d consider preregistration to be a more important and encouraging property of a study than randomization.

Our principles for assessing evidence

For several years now we’ve been writing up our thoughts on the evidence behind particular charities and programs, but we haven’t written a great deal about the general principles we follow in distinguishing between strong and weak evidence. This post will

  • Lay out the general properties that we think make for strong evidence: relevant reported effects, attribution, representativeness, and consonance with other observations. (More)
  • Discuss how these properties apply to several common kinds of evidence: anecdotes, awards/recognition/reputation, “micro” data and “macro” data. (More)

This post focuses on broad principles that we apply to all kinds of “evidence,” not just studies. A future post will go into more detail on “micro” evidence (i.e., studies of particular programs in particular contexts), since this is the type of evidence that has generally been most prominent in our discussions.

General properties that we think make for strong evidence
We look for outstanding opportunities to accomplish good, and accordingly, we generally end up evaluating charities that make (or imply) relatively strong claims about the impact of their activities on the world. We think it’s appropriate to approach such claims with a skeptical prior and thus to require evidence in order to put weight on them. By “evidence,” we generally mean observations that are more easily reconciled with the charity’s claims about the world and its impact than with our skeptical default/”prior” assumption.
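As a stylized illustration of this “skeptical prior” framing, here is a short Bayes-rule sketch of our own; the probabilities are invented for the example and are not figures we actually use:

```python
# A stylized Bayes-rule sketch of what it takes for an observation to
# count as "evidence." All probabilities are invented.
prior = 0.1           # skeptical prior that the charity's claim is true
p_obs_if_true = 0.8   # chance of the observation if the claim is true
p_obs_if_false = 0.2  # chance of seeing it anyway (cherry-picking, bias, ...)

posterior = (p_obs_if_true * prior) / (
    p_obs_if_true * prior + p_obs_if_false * (1 - prior)
)
print(f"belief after the observation: {posterior:.0%}")  # ~31%
```

An observation that is nearly as likely under the skeptical default as under the charity’s claims (a polished success story, say) barely moves the posterior at all; the properties below are what make the likelihood ratio favorable.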

To us, the crucial properties of such evidence are:

  • Relevant reported effects. Reported effects should be plausible as outcomes of the charity’s activities and consistent with the theory of change the charity is presenting; they should also ideally get to the heart of the charity’s case for impact (for example, a charity focused on economic empowerment should show that it is raising incomes and/or living standards, not just e.g. that it is carrying out agricultural training).
  • Attribution. Broadly speaking, the observations submitted as evidence should be easier to reconcile with the charity’s claims about the world than with other possible explanations. If a charity simply reports that its clients have higher incomes/living standards than non-participants, this could be attributed to selection bias (perhaps higher incomes cause people to be more likely to participate in the charity’s program, rather than the charity’s program causing higher incomes), to data collection issues (perhaps clients are telling surveyors what they believe the surveyors want to hear), or to a variety of other factors. The randomized controlled trial is seen by many – including us – as a leading method (though not the only one) for establishing strong attribution. By randomly dividing a group of people into “treatment” (people who participate in a program) and “control” (people who don’t), a researcher can make a strong claim that any differences that emerge between the two groups can be attributed to the program. (A toy simulation after this list contrasts a naive participant/non-participant comparison with a randomized one.)
  • Representativeness. We ask, “Would we expect the activities enabled by additional donations to have similar results to the activities that the evidence in question applies to?” In order to answer this well, it’s important to have a sense of a charity’s room for more funding; it’s also important to be cognizant of issues like publication bias and ask whether the cases we’re reviewing are likely to be “cherry-picked.”
  • Consonance with other observations. We don’t take studies in isolation: we ask about the extent to which their results are credible in light of everything else we know. This includes asking questions like “Why isn’t this intervention better known if its effects are as good as claimed?”
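To make the selection-bias concern in the “attribution” item above concrete, here is a minimal simulation sketch (our own illustration; all numbers are invented). Participants self-select into a hypothetical income-generating program, so a naive comparison of participants with non-participants badly overstates the program’s effect, while random assignment recovers it:

```python
# A toy illustration of selection bias vs. randomized assignment.
# All numbers are invented.
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
program_effect = 0.1   # the program truly raises income by 0.1 units

baseline = rng.lognormal(mean=0.0, sigma=0.5, size=n)

# Self-selection: people with higher baseline incomes are more likely
# to participate, so participants differ from non-participants to start.
joins = rng.random(n) < np.clip(baseline / 3.0, 0.05, 0.95)
income = baseline + program_effect * joins
naive = income[joins].mean() - income[~joins].mean()

# Randomized assignment: the two groups differ only by chance.
assigned = rng.random(n) < 0.5
income_rct = baseline + program_effect * assigned
rct = income_rct[assigned].mean() - income_rct[~assigned].mean()

print(f"true effect:           {program_effect:+.3f}")
print(f"naive comparison:      {naive:+.3f}")   # several times too large
print(f"randomized comparison: {rct:+.3f}")     # close to the truth
```

Here the naive comparison overstates the true effect several times over even though the program genuinely works, which is why we treat “our clients do better than non-participants” as weak evidence on its own.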

Common kinds of evidence

  • Anecdotes and stories – often of individuals directly affected by charities’ activities – are the most common kind of evidence provided by charities we examine. We put essentially no weight on these, because (a) we believe the individuals’ stories could be exaggerated and misrepresented (either by the individuals, seeking to tell charity representatives what they want to hear and print, or by the charity representatives responsible for editing and translating individuals’ stories); (b) we believe the stories are likely “cherry-picked” by charity representatives and thus not representative. Note that we have written in the past that we would be open to taking individual stories as evidence, if our “representativeness” concerns were addressed more effectively.
  • Awards, recognition, reputation. We feel that one should be cautious and highly context-sensitive in deciding how much weight to place on a charity’s awards, endorsements, reputation, etc. We have long been concerned that the nonprofit world rewards good stories, charismatic leaders, and strong performance at raising money (all of which are relatively easy to assess) rather than rewarding positive impact on the world (which is much harder to assess). We also suspect that in many cases, a small number of endorsements can quickly snowball into a large number, because many in the nonprofit world (having little else with which to assess a charity’s impact) base their own endorsements more or less exclusively on others’ endorsements. Because of these issues, we think this sort of evidence is often relatively weak on the criteria of “relevant reported effects” and “attribution.” We certainly feel that a strong reputation or referral is a good sign, and provides reason to prioritize investigating a charity; furthermore, there are particular contexts in which a strong reputation can be highly meaningful (for example, a hospital that is commonly visited by health professionals and has a strong reputation probably provides quality care, since it would be hard to maintain such a reputation if it did not). That said, we think it is often very important to try to uncover the basis for a charity’s reputation, and not simply rely on the reputation itself.
  • Testimony. We see value in interviewing people who are well-placed to understand how a particular change took place, and we have been making this sort of evidence a larger part of our process (for example, see our reassessment of VillageReach’s pilot project). When assessing this sort of evidence, we feel it is important to assess what the person in question is and isn’t well-positioned to know, and whether they have incentive to paint one sort of picture or another. How the person was chosen is another factor: we generally place more weight on the testimony of people we’ve sought out (using our own search process) than on the testimony of people we’ve been connected to by a charity looking to paint a particular picture.
  • “Micro” data. We often come across studies that attempt to use systematically collected data to argue that, e.g., a particular program improved people’s lives in a particular case. The strength of this sort of evidence is that researchers often put great care into the question of “attribution,” trying to establish that the observed effects are due to the program in question and not to something else. (“Attribution” is a frequent weakness of the other kinds of evidence listed here.) The strength of the case for attribution varies significantly, and we’ll discuss this in a future post. When examining “micro” data, we often have concerns around representativeness (is the case examined in a particular study representative of a charity’s future activities?) and around relevant reported outcomes (these studies often need to quantify things that are difficult to quantify, such as standard of living, and as a result they often use data that may not capture the full reality of what happened).
  • “Macro” data. Some of the evidence we find most impressive is empirical analysis of broad (e.g., country-level) trends. While this sort of evidence is often weaker on the “attribution” front than “micro” data, it is often stronger on the “representativeness” front. (More.)

In general, we think the strongest cases use multiple forms of evidence, some addressing the weaknesses of others. For example, immunization campaigns are associated with both strong “micro” evidence (which shows that intensive, well-executed immunization programs can save lives) and “macro” evidence (which shows, less rigorously, that real-world immunization programs have led to drops in infant mortality and the elimination of various diseases).