The GiveWell Blog

Why we should expect good giving to be hard

June 11, 2011 (updated on: December 8, 2011) | by Holden

We’ve written before about a couple of consistent worldview differences we encounter:

Our default assumption is that a charity isn’t succeeding; most people’s default assumption is that a charity is succeeding.
We think that helping people is hard; most people seem to think it’s easy.

When discussing any specific charity, I can usually think of specific reasons that the charity’s mission is difficult, and specific ways that it might be failing. Here I’m going to try to give a more general argument for why it’s hard to accomplish a lot of good with your donation. I’m not presenting this as a rigorous, evidence-backed argument; I’m just further clarifying my worldview and where I’m coming from.

When you want to help people as a donor, you have to get in line behind all of the groups below:

For-profit companies. I believe that most of the things you can do that make strangers’ lives better are things you can get paid for. Every day people help each other send packages, prepare food, recover from illness, etc. via market transactions. This may seem like a trivial and obvious point, but it’s the reason we are so focused on helping the very poor. When you’re trying to help people who aren’t poor, you’re competing with for-profit enterprises.
And even the very poor get a lot of help from for-profit services. For example, when people started realizing that cellphones could be useful to the very poor, the result was expansion of for-profit cellphone service into the developing world. There were some nonprofit attempts to contribute to this dynamic, but we’re skeptical that they added much value on top of the profit-driven ones.

I am certainly not saying that all profit-making enterprises are helpful, nor that all forms of help are profitable. But a lot of the easiest help to provide – even for the poorest – is already being provided by people who are doing it to make money.
Governments. When a market failure is clear and severe, the government often steps in. Many feel it does not step in enough or that it does more harm than good, but the fact remains that much of the “lowest-hanging fruit” for helping people where markets won’t is covered by governments. Low-income people in the U.S. get free education with high teacher attendance rates, free emergency medical care, and cash, among other things. People in the developing world get far less from their governments, but most governments still provide a good deal of free medical care.
Local philanthropy and community. When it comes to market failures that the government has failed to address, there are still often local nonprofits – and just local people – who are well placed to step in quickly and effectively. This is not an endorsement of small community-based organizations as giving opportunities for individual donors outside the community. If you’re outside the community you’re trying to help, you’re going to have trouble figuring out what the real problems are and who ought to be funded to address them; the people in the community will often be better placed to help, by donating and otherwise, than you are.
Big foundations. There are opportunities to help that are missed by for-profits, governments, and locals. There are many extremely well-funded and well-staffed foundations looking for just these opportunities.
Other donors. If you want your donation to have an impact, you need to find opportunities that have been missed not only by all the groups above, but by other individual donors. Our focus on room for more funding is an attempt to deal with this situation.

In my view, the wealthier the community, the more effective the first three items above (for-profits, government and locals) will be in addressing their problems. Therefore, if you want to find opportunities to provide help that isn’t already being given, you probably need to look at the world’s poorest communities – but doing that probably means helping people who are very far away and very culturally different from yourself, and you have to find opportunities that haven’t already been found by the big foundations or other donors.

When a donor says, “I have $1000 that I’d like to use to help someone,” it may not sound like they’re asking for much. But on reflection, I think they’re really saying, “I’m looking for someone who needs help that they can’t get from a company, their government, their community, or any other donor big or small – and I expect to provide this help just by sending a $1000 check, despite having very little experience or knowledge of the situation.”

Put this way, the donor’s request sounds somewhat exorbitant, and it seems that we shouldn’t expect them to be able to accomplish much with their $1000. Yet as it turns out, I believe that (if they take the rare opportunities that we highlight at GiveWell) they can often use that money to save a life. I think this is a somewhat shocking observation and that it reflects serious problems with the nonprofit ecosystem.

I also think we shouldn’t expect this to be the situation indefinitely. I hope that as the world gets better at providing help to those who need it, all the opportunities to save a life for $1000 will be snapped up more quickly. That will leave GiveWell customers – individual donors looking to help people they’ve never met and know little about – with much less exciting options, and that’s how it should be.

Profile of a GiveWell customer

June 2, 2011 (updated on: July 25, 2016) | by Holden

The Money for Good study examined the size of the potential audience for work like GiveWell’s. What we’d like to see next would be a study on the nature of this audience: what sort of donor is open to giving based on third-party research? How do they think, what sorts of causes are they interested in, and where can they be found?

We don’t have the resources for a large-scale study on this topic, but in the absence of this, we thought it would be worthwhile to share our impressions from interacting with our own “customer donors,” i.e., people who use GiveWell to decide where to give. Because our money moved is somewhat “top heavy” (the bulk of the money comes from a relatively small number of relatively large donors), we have had in-depth conversations and formed informal relationships with people accounting for a large chunk of our influence. Some of the things we’ve found out about GiveWell customers have surprised us and led to changes in strategy; below we discuss the general impressions we’ve gotten.

Attitudes toward evidence seem less key than we would have guessed. When we started GiveWell, we and most of our supporters imagined that new customers could be found in certain industries where people are accustomed to using measurement to evaluate and learn from their decisions. We hoped these people would resonate with our desire to bring feedback loops into areas where feedback loops don’t naturally exist. But we’ve found that a lot of them don’t, largely because impact isn’t the main thing they’re aiming for when they give. People give for many reasons – to maintain friendships, to overcome guilt and cognitive dissonance, to achieve recognition – and a given donor is unlikely to be interested in GiveWell unless achieving impact is at the top of his/her list.
Moral and spiritual values seem more key than we would have guessed. Several of our major “customer donors” see their giving as fulfilling religious imperatives to help others. Many others have backgrounds in philosophy and deep interests in secular moral systems. What these two groups have in common, in my view, is serious investment in abstract principles of morality – principles that tell them it is as worthwhile to help people they’ll never meet as it is to help those in their community.
GiveWell customers tend to be cause-agnostic and/or aligned with our views on the most promising philanthropic causes. We discussed this phenomenon in early 2011; it’s been one of the major surprises for us. We initially anticipated that many donors would agree with our basic approach to research, but would have strong disagreements with us on which causes to support, and that we would thus have to research a lot of different causes to serve them. That hasn’t turned out to be the case: most people tend to be either fully aligned with our approach to giving (including issue-agnosticism and a preference to help the poorest people, even if they’re overseas) or not aligned at all.
GiveWell customers tend to be busy with non-philanthropic pursuits. People who spend all their time on philanthropy generally have their own relationships, convictions based on personal observation, etc. The people most likely to use our research are the ones who don’t have time to do their own. While not surprising, this phenomenon has made it somewhat difficult for us to get thoughtful, meaningful feedback on our research despite our efforts to make it transparent. That’s led us to experiment with more ways to solicit feedback, including formal evaluations, our public email group, summaries of our views on key topics posted to this blog, and (in progress) a possible in-person meeting to discuss our research with any “customer donors” who want to attend.
GiveWell customers don’t tend to be friends with each other; it’s been very hard to find “clusters” of them. Most of our customers have had little success interesting their friends in us. We’ve found very little in the way of particular industries or social groups filled with GiveWell customers; instead, it seems that for every twenty people we talk to in any setting, one or two will become a customer. It’s thus not surprising that one of our top sources of referrals is Google: people tend to find us more than we find them. The major exception is that we seem to get disproportionate interest from fans of Peter Singer.
Some of GiveWell’s donors are very young (considering how much they give) and very few are retired. This has implications for the future of GiveWell’s influence: the money we’re moving now is mostly coming from people whose peak giving years are yet to come.
GiveWell customers never seem interested in public recognition. In our first year, we considered posting acknowledgements to major supporters on our website, but there was no interest. Since then, we have had many customers who require anonymity (even when we ask them to take public credit for our sake) and no customers who’ve requested that we publicly thank them or otherwise help them get recognition.

In my view, the last point is most consistent and most interesting. GiveWell customers have many reasons for giving – some religious, some secular – but they all revolve around achieving an impact for others rather than around getting any direct personal benefit (whether social, reputational or emotional). GiveWell customers are the kind of people who give when no one is watching, and for whom giving is not about the giver. One of the highlights of running GiveWell has been coming into contact with the people who share this value with us.

In defense of the streetlight effect

May 27, 2011 (updated on: July 25, 2016) | by Holden

In a recent guest post for Development Impact, Martin Ravallion writes:

The current fashion [for evaluating aid projects] is for randomized control trials (RCTs) and social experiments more generally … The problem is that the interventions for which currently favored methods are feasible constitute a non-random subset of the things that are done by donors and governments in the name of development. It is unlikely that we will be able to randomize road building to any reasonable scale, or dam construction, or poor-area development programs, or public-sector reforms, or trade and industrial policies—all of which are commonly found in development portfolios. One often hears these days that opportunities for evaluation are being turned down by analysts when they find that randomization is not feasible. Similarly, we appear to invest too little in evaluating projects that are likely to have longer-term impacts; standard methods are not well suited to such projects … (Emphasis mine)

He concludes with a call for “‘central planning’ in terms of what gets evaluated” to ensure that evaluation doesn’t become concentrated among the projects that are easy to evaluate. His post could be seen as a direct retort to the kind of work emphasized in the recent books Poor Economics and More Than Good Intentions (our review). These books present ideas and evidence that are mostly drawn from high-quality studies, and have little to say on questions that high-quality studies cannot help answer.

My instinct is the opposite of Dr. Ravallion’s: I feel that the move toward high-quality evaluations is a good thing, even if it starts to cause bias in what sorts of programs are evaluated – and carried out. What follows is an attempt to explain my feeling on this. My feeling is a function of my worldview and biases, and this post should be taken less as a “rebuttal” than as an opportunity to explicate my worldview and biases.

My disagreement with Dr. Ravallion has to do with my experience as a “customer” of social science. The vast majority of studies I’ve come across have seemed so methodologically suspect to me that I’ve ended up not feeling they shed much light on anything at all; and many (not all) exceptions are studies that have come out of the “randomista” movement. (Another particularly helpful source of evidence has been Millions Saved, which focused on global health.) Given this situation, I’m not excited about using “central planning” to make sure that researchers continue to try answering questions that they simply don’t have the methods to answer well. I’d rather see them stick to areas where they can be helpful.

What does it look like when we build knowledge only where we’re best at building knowledge, rather than building knowledge on the “most important problems?” A few thoughts jump to mind:

Over the last several decades, I am not sure whether we’ve generated any useful and general knowledge about how to promote women’s empowerment and equality – from the outside – in developing-world countries. But we’ve generated a lot of knowledge about how to produce affordable, convenient birth control in a variety of forms. I would guess (though this is just a guess, as empowerment itself is so hard to measure) that the latter kind of knowledge generation has done much more for empowerment and equality than attempts to study empowerment/equality directly.
Similarly, what has done more for political engagement in the U.S.: studying how to improve political engagement, or studying the technology that led to the development of the Internet, the World Wide Web, and ultimately to sites like Change.org (as well as new campaign methods)?
More broadly, studying areas we’re good at studying and generating knowledge we’re good at generating has led to a lot of wealth generation and poverty reduction. I feel poverty reduction brings a lot of benefits that would be hard to bring about (or even fully understand) directly.

Bottom line – researching topics we’re good at researching can have a lot of benefits, some unexpected, some pertaining to problems we never expected such research to address. Researching topics we’re bad at researching doesn’t seem like a good idea no matter how important the topics are. Of course I’m in favor of thinking about how to develop new research methods to make research good at what it was formerly bad at, but I’m against applying current problematic research methods to current projects just because they’re the best methods available.

If we focus evaluations on what can be evaluated well, is there a risk that we’ll also focus on executing programs that can be evaluated well? Yes and no.

Some programs may be so obviously beneficial that they are good investments even without high-quality evaluations available; in these cases we should execute such programs and not evaluate them.
But when it comes to programs where evaluation seems both necessary and infeasible, I think it’s fair to simply de-emphasize these sorts of programs, even if they might be helpful and even if they address important problems. This reflects my basic attitude toward aid as “supplementing people’s efforts to address their own problems” rather than “taking responsibility for every problem clients face, whether or not such problems are tractable to outside donors.” I think there are some problems that outside donors can be very helpful on and others that they’re not well suited to helping on; thus, “helping with the most important problem” and “helping as much as possible” are not at all the same to me.

It’s common in our sphere to warn against the “streetlight effect,” i.e., “looking for your keys where there’s light, rather than where the keys are most likely to be.” In the context of aid, this means executing – and studying – the programs that are easiest to evaluate rather than the programs that are most likely to do good. (Chris Blattman uses this analogy in the context of Dr. Ravallion’s post.)

But for the aid world, the right analogy would acknowledge that there are a lot of keys to be found, and a lot of unexplored territory both in and outside the light. In that context, the “streetlight effect” seems like a good thing to me.

Suggestions for the social sciences

May 19, 2011 (updated on: July 25, 2016) | by Holden

Chris Blattman cites our advice on using academic research and asks for “Other suggestions for the profession.” We have several and this seemed like a good time to share them.

Our suggestions should be taken in context, of course. On one hand, we do not have staff with backgrounds in academia; there’s a lot we don’t know about how the academic ecosystem works and what’s feasible. On the other hand, through our work researching charities, we do have an unusual amount of experience trying to use academic research to make concrete decisions. In a sense we’re a “customer” of academia; think of this as customer feedback, which has both strengths and weaknesses compared to internal suggestions.

The #1 change we’d like to see: pre-registration of studies

Our single biggest concern when examining research is publication bias, broadly construed. We wonder both (a) how many studies are done but never published because people don’t find the results interesting or in line with what they had hoped; and (b) for a given paper, how many different interpretations of the data were assembled before picking the ones that made it into the final version.
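To make concern (a) concrete, here is a minimal simulation sketch. It is not drawn from any real study: the number of studies, the sample sizes, and the "publish only if z > 1.96" filter are all illustrative assumptions. It simulates many small studies of an intervention whose true effect is zero and compares the average estimated effect across all studies with the average among the "published" ones.

```python
import random
import statistics

def simulate_studies(n_studies=2000, n_per_arm=30, true_effect=0.0, seed=0):
    """Simulate small two-arm studies; return (estimated effect, z-score) pairs."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_studies):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
        treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_arm)]
        diff = statistics.mean(treated) - statistics.mean(control)
        # Standard error of a difference between two sample means
        se = (statistics.variance(treated) / n_per_arm
              + statistics.variance(control) / n_per_arm) ** 0.5
        results.append((diff, diff / se))
    return results

results = simulate_studies()

# Every study, whether or not anyone would find it "interesting"
all_effects = [diff for diff, _ in results]

# Hypothetical publication filter (an assumption for illustration):
# only clearly positive results make it into print.
published = [diff for diff, z in results if z > 1.96]

mean_all = statistics.mean(all_effects)
mean_published = statistics.mean(published)

print(f"studies run: {len(all_effects)}, published: {len(published)}")
print(f"mean estimated effect, all studies: {mean_all:.3f}")
print(f"mean estimated effect, published only: {mean_published:.3f}")
```

Under these assumptions the full set of studies averages out close to the true effect of zero, while the published subset shows a sizable positive effect. A reader who sees only the published subset has no way to tell the difference, which is the gap a public registry is meant to close.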

The best antidote we can think of is pre-registration of studies along the lines of ClinicalTrials.gov, a service of the U.S. National Institutes of Health. On that site, medical researchers announce their questions, hypotheses, and plans for collecting and analyzing data, and these are published before the data is collected and analyzed. If the results come out differently from what the researchers hope for, there’s then no way to hide this from a motivated investigator.

In addition, in some ways we think that a debate over a study would be more honest before its results are known.

When we see a charity evaluation, we generally assume that the researchers did what they could to portray the evaluation as positive for the charity; we therefore tend to demand that the study has a very high level of rigor (maximal attempt to eliminate selection bias, full and transparent presentation of methodology and results) before putting any credence in it. We want to feel that the analysis was done the way we would have done it, and not the way the researcher felt would produce the “right answer.”
But if someone showed us an evaluation plan before either of us knew the results, we’d be much more inclined to accept that shortcuts and methodological weaknesses were there for the purpose of saving time, saving money, ethical considerations, etc. instead of the purpose of producing the “right answer.” If the plan sounded reasonable to us, and the study then produced a favorable outcome, we’d find it much less necessary to pick apart the study’s methodology.

Bottom line – we’d potentially lower our bar for study design quality in exchange for a chance to discuss and consider the design before anyone knows the results.

The basic principle of pre-registration is a broad one, and will apply differently in different scenarios, but here are a few additional thoughts on applications:

What we’re describing wouldn’t be feasible with studies that retrospectively analyze old data sets – there’s no way to know whether someone has done the analysis before registering the design and hypothesis. What we’re describing would be feasible with field experiments, which seem to be growing in popularity thanks partly to the excellent work of Poverty Action Lab and Innovations for Poverty Action. It could also be feasible with non-experiments: researchers could pre-register their plans for analyzing any data sets that aren’t yet available.
Poverty Action Lab (example) and Innovations for Poverty Action (example) do publicly publish brief descriptions of studies in progress. These are helpful in that they are a safeguard against burying/canceling studies that aren’t going as hoped, but they would be more helpful if they included information about data to be collected, planned methods of analysis, and hypotheses (for contrast, see this example from ClinicalTrials.gov).
There can be legitimate reasons to change aspects of the study design and analysis methods as the study is ongoing. So failure to conform to the pre-registered version wouldn’t necessarily cause us to dismiss a study. But seeing the original intent, the final study, and a list of changes with reasoning for each change would greatly increase transparency; we would consider whether the reasoning sounded more motivated by the desire to get accurate results or a desire to get specific results.
There can be legitimate reasons to carry out a study without a strong hypothesis (for example, trying a health intervention and examining its impact on a variety of quality-of-life measures without a strong prior view on which would be most likely to show results). However, we still advocate explicitly declaring the hypothesis or lack thereof beforehand. Just seeing that there was no initial strong hypothesis would help us understand a study. Studies with no strong hypothesis are arguably best viewed as “exploratory,” generating hypotheses for future “confirmatory” studies.
As food for thought, imagine a journal that accepted only studies for which results were not yet known. Arguably this journal would be more credible as a source of “well-designed studies addressing worthwhile questions, regardless of their results” as opposed to “studies whose results make the journal editors happy.” For our part, we’d probably follow this sort of journal eagerly while on the lookout for top charities.

We think the “randomista” movement has done a lot of good for the credibility and usefulness of research. We’d love to see a “preregistrista” movement.

Some other suggestions
Chris Blattman’s suggestions. We liked Chris’s suggestions:

1. Journals should require submission of replication data and code files with final paper submissions, for posting on the journal site. (The Journal of Conflict Resolution is one of the few major political science or economics journals I know that does so faithfully.)

2. PhD field and method courses ought to encourage replication projects as term assignments. (Along with encouragements to diplomacy–something new scholars are slow to learn, to their detriment.)

Tougher norms regarding the use of footnotes to support strong claims. It is frequent for a paper to make a strong claim, with a footnote leading to a citation; it’s then up to us to find the paper cited (not always possible) and figure out in what way and to what degree it supports the claim. The paper may not be a literature review or other broad overview; it’s often a single study, which we don’t consider sufficient evidence for a strong claim (see our previous post).

On the GiveWell site we use a different convention, one that grew out of our own wish to facilitate easy checking and updating of footnotes by different staff members and volunteers. A GiveWell footnote will either (a) lead to another GiveWell page (these tend to be concise, up-to-date overviews of a topic); or (b) include enough quotation and/or analysis to make it clear what the heart of the support for the claim is, along with page numbers for any quotation. It’s therefore easy to quickly see the degree to which our footnotes support our claims (though one may still wish to vet the papers that we cite and see whether their analysis supports their own claims).

If space limitations are an issue, footnotes could be published online, as they are for some books.

Online tools for viewing the full debate/discussion around a paper at a glance. We’d like to see websites organizing the relationships between different papers and comments, such that we could look up a paper and see the whole debate/discussion around it at a glance – including:

Followups
Published responses and challenges
Comments from other scholars, without these comments’ having to get published separately
Literature reviews discussing the paper
Related (but chronologically later) papers
Perhaps a summary of the overall discussion, strengths, weaknesses, etc. (when someone wanted to submit one and it passed the website’s quality criteria)

Microlending debate: An example of why academic research should be used with caution

May 13, 2011 (updated on: July 26, 2016) | by Holden

We often use academic research to inform our work, but we try to do so with great caution, rather than simply taking reported results at face value. We believe that if you trust academic research just because it is peer-reviewed, published, and/or reputable, this is a mistake.

A good example of why we’re concerned comes from the recent back-and-forth between David Roodman and Mark Pitt, which continues a debate begun in 1999 over what used to be considered the single best study on the social impact of microfinance.

It appears that the leading interpretation of this study swung wildly back and forth over the course of a decade, based not on major reinterpretations but on arguments over technical details, while those questioning the study were unable to view the full data and calculations of the original. We feel that this illustrates problems with taking academic research at face value and supports many of the principles we use in our approach to using academic research. Details follow.

Timeline/summary

1998-2005 studies by Khandker and Pitt

According to a 2005 white paper published by the Grameen Foundation (PDF), a 1998 book and accompanying paper released by Shahidur Khandker and Mark Pitt “were influential because they were the first serious attempt to use statistical methods to generate a truly accurate assessment of the impact of microfinance.”

Jonathan Morduch challenged these findings shortly after their publication, but a 2005 followup by Khandker appeared to answer the challenge and claim that microlending had a very strong social impact:

each additional 100 taka of credit to women increased total annual household expenditures by more than 20 taka … microfinance accounted for 40 percent of the entire reduction of moderate poverty in rural Bangladesh.

As far as we can tell, this result stood for about four years as among the best available evidence that microlending helped bring people out of poverty. Our mid-2008 review of the evidence stated,

These studies rely heavily on statistical extrapolation about who would likely have participated in programs, and they are far from the strength and rigor of the Karlan and Zinman (2007) study listed above, but they provide somewhat encouraging support for the idea that the program studied had a widespread positive effect.

2009 response by Roodman and Morduch

A 2009 paper by David Roodman and Jonathan Morduch argued that

The Khandker and Pitt studies were seriously flawed in their attempts to attribute impact. The reduction in poverty they observed could have been an artifact of wealth driving borrowing, rather than the other way around.
The Khandker and Pitt studies could not be replicated: the full data and calculations they had used were not public, and Roodman and Morduch’s best attempts at a replication did not produce a remotely similar conclusion (they demonstrated no positive social impact of microlending, even a slight negative one).
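The first concern above (wealth driving borrowing, rather than the other way around) can be illustrated with a toy simulation. This is only a sketch under made-up assumptions, not a reconstruction of either paper's data or methods: in this invented world, borrowing has zero effect on spending, yet the two are still clearly correlated because wealth drives both.

```python
import random
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def residuals(ys, xs):
    """Residuals of ys after a simple linear regression on xs."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]

rng = random.Random(1)
n_households = 5000

# All coefficients below are illustrative assumptions.
wealth = [rng.gauss(0.0, 1.0) for _ in range(n_households)]
# Wealthier households borrow more.
borrowing = [0.8 * w + rng.gauss(0.0, 1.0) for w in wealth]
# Spending depends on wealth only; the true effect of borrowing is zero.
spending = [1.0 * w + rng.gauss(0.0, 1.0) for w in wealth]

# Naive comparison: borrowing looks strongly related to spending.
naive_corr = pearson(borrowing, spending)

# Holding wealth fixed, the apparent relationship largely disappears.
adjusted_corr = pearson(residuals(borrowing, wealth),
                        residuals(spending, wealth))

print(f"naive correlation: {naive_corr:.3f}")
print(f"correlation after adjusting for wealth: {adjusted_corr:.3f}")
```

In real survey data the confounding factor is rarely measured this cleanly, which is part of why disputes like this one turn on technical details of how impact is attributed.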

This paper stood for the next two years as a prominent refutation of the Khandker and Pitt studies. Pitt writes that the work of Roodman and Morduch has become “well-known in academic circles” and “seems to have had a broad impact.” It appeared in a “new volume of the widely respected Handbook of Development Economics” as well as in congressional testimony.

2011 back-and-forth

Earlier this year,

Mark Pitt published a response arguing that Roodman and Morduch’s failure to replicate his study was due to Roodman and Morduch’s errors.
David Roodman replied, conceding an error in his original replication but defending his claim that the original study (by Khandker and Pitt) was not a valid demonstration of the impact of microlending.
Mark Pitt responded again and argued that the study was a valid demonstration.
David Roodman defended his statement that it was not and added, “this is the first time someone other than [Mark Pitt] has been able to run and scrutinize the headline regression in the much-discussed paper … If you anchor Pitt and Khandker’s regression properly in the half-acre rule … the bottom-line impact finding goes away.”
We had hoped to see a further response from Mark Pitt before discussing this matter, but Roodman also wrote that Mark Pitt is now traveling and that “this could be the last chapter in the saga for a while.”

Bottom line: as far as we can tell, we still have one researcher claiming that the original study strongly demonstrates a positive social impact of microfinance; another researcher claiming it demonstrates no such thing; and no end in sight, 13 years after the publication of the original study.

Disagreements among researchers are common, but this one is particularly worrisome for a few reasons.

Major concerns highlighted by this case

Conflicting interpretations of the study have each stood for several years at a time. The original study stood as the leading evidence about microlending’s social impact from 2005 to 2009; the challenge by Roodman and Morduch was highly prominent, and apparently not commented on at all by the original authors, from 2009 to 2011.

Disagreements have been technical, many concerning details that few understand and that still don’t seem resolved. David Roodman states that “the omission of the dummy for a household’s target status” is responsible for his estimated effect of microlending coming out negative instead of positive. Numerous other errors on both sides are alleged, and the remaining disagreements over causal inference are certainly beyond what I can easily follow (if a reader can explain them in clear terms I encourage doing so in the comments).
Resolution has been hampered by the fact that Roodman and Morduch could only guess at the calculations Pitt and Khandker performed. This is the biggest concern to me. Roodman writes that he was never able to obtain the original data set used in the paper; that the data set he did receive (upon request) was (in his view) confusingly labeled; and even that one of the original authors “fought our efforts to obtain the later round of survey data from the World Bank.” As a result, his attempt at replication was a “scientific whodunit,” and his April 2011 update represents “the first time someone other than [the original author] has been able to run and scrutinize the headline regression in the much-discussed paper.”
If I weren’t already somewhat familiar with this field, I would be shocked that it’s even possible to have a study accepted to any journal (let alone a prestigious one) without sharing the full details of the data and calculations, and having the calculations replicated and checked. But in fact, disclosure of data – and replication/checking of calculations – appears to be the exception, not the rule, and is certainly not a standard part of the publication/peer review process.
Bottom line – the leading interpretation of a reputable and important study swung wildly back and forth over the course of a decade, based not on revolutionary reinterpretations but on quibbles over technical details, while no one was able to view the full data and calculations of the original. For anyone assuming that a prestigious journal’s review process – or even a paper’s reputation – is a sufficient stamp of reliability on a paper, this is a wake-up call.

Some principles we use in interpreting academic research
- Never put too much weight on a single study. If nothing else, the issue of publication bias makes this an important guideline. (Relatedly, note that the 2009 Roodman and Morduch paper was rejected for publication; its sole peer-reviewer was an author of the original paper that Roodman and Morduch were questioning.)
- Strive to understand the details of a study before counting it as evidence. Many “headline claims” in studies rely on heavy doses of assumption and extrapolation. This is more true for some studies than for others.
- If a study’s assumptions, extrapolations and calculations are too complex to be easily understood, this is a strike against the study. Complexity leaves more room for errors and judgment calls, and means it’s less likely that meaningful critiques have had the chance to emerge. Note that before the 2009 response to the study discussed here was ever published, GiveWell took it with a grain of salt due to its complexity (see quote above). Randomized controlled trials tend to be relatively easy to understand; this is a point in their favor.
- If a study does not disclose the full details of its data and calculations, this is another strike against it – and this phenomenon is more common than one might think.
- Context is key. We often see charities or their supporters citing a single study as “proof” of a strong statement (about, for example, the effectiveness of a program). We try not to do this – we generally create broad overviews of the evidence on a given topic and source our statements to these.
While a basic fact can be researched, verified and cited quickly, interpreting an impact study with appropriate care takes – in our view – concentrated time and effort and plenty of judgment calls. This is part of why we’re less optimistic than many about the potential for charity research based on (a) crowdsourcing; (b) objective formulas. Instead, our strategy revolves around transparency and external review.

Evaluating local charities in India

May 4, 2011 (updated on: December 8, 2011) | by Holden

A donor of ours earmarked $10,000 for regranting to a local charity in India, and in deciding how to give this away (and for general learning) we conducted 20+ site visits to small NGOs during our travels. In a sense, this was a chance for us to try out a more traditional method of giving: heavily based on referrals, site visits, and informal impressions rather than desk research.

This post, a followup to posts on our general thoughts on India (from myself, Elie and Natalie), summarizes our thoughts from these site visits and this decision. Note that we have posted detailed notes (and in some cases pictures) from the site visits at the official page of GiveWell’s trip to India.

Criteria

Our normal criteria weren’t a good fit with the organizations we visited, most of which were tiny and without the capacity for extensive monitoring, evaluation and documentation. So we followed the same basic principles that led us to our original criteria: look around, ask questions, and try to articulate our reactions in ways that lead to consistent principles.

The questions that ended up mattering most to us were:

Is the organization serving a population that is clearly in need? Is the organization run by people who seem thoughtful, competent and well-intentioned? These are the questions that seem most amenable to being answered by site visits, and need little explanation.

To what extent do the organization’s activities flow from clients’ needs, as opposed to donors’ strategies? It’s easiest to articulate the categories we mentally placed organizations in through a few examples:

Room to Read, a large international NGO, seemed focused, to its core, on supporting and promoting libraries. I’d refer to this as a “strategic” organization, consistently pursuing a particular theory of change.
Helping Hands, a small community NGO, seemed much more “improvisational.” When we asked the woman running it how she chose whom and what to fund, she told stories of individuals and one-off events (girls who needed money, schools that needed new supplies, etc.).
Seva Mandir, an NGO working in many areas around the city of Udaipur, had a set of relatively large-scale programs, each of which seemed (from conversations) to have sprung from specific, consistent, repeated needs and requests of clients. Its described process for deciding which programs to execute revolved around formal village meetings. I’d term this a “systematically bottom-up” organization.

The approach we felt most comfortable with was the “systematically bottom-up” approach. “Improvisational” organizations seemed to rely too heavily on specific personalities and relationships, while “strategic” organizations left us concerned about the extent to which they were run for clients as opposed to donors, and the appropriateness of their programs for the people with whom they were working. These concerns mean not that we would never recommend such organizations, but that the need for strong monitoring and evaluation (which our normal research process emphasizes and our site visits did not) becomes more central for them.

How insulated is the organization’s management from potential problems? Some organizations are structured for infrequent, limited interactions with clients, while others are built on constant and close contact with clients. Some organizations try to make very long-term or difficult-to-observe differences in people’s lives (examples: sports for character development, advocacy for changing sexual and other private practices), while others aim for more tangible, short-term help.

Both of these distinctions relate to what we came to refer to as the “insulation level” of an organization. Some organizations seem, by their basic structure, likely to become aware of any failure to help clients; for organizations without this feature, monitoring and evaluation become more important to us.

Related to this, we often (when we felt it was appropriate) asked staff to tell us about specific clients, and preferred organizations where they seemed to know a lot about the lives, histories and particular needs of individuals.

How do staff prioritize clients vs. funders?

At one orphanage we visited, we were told that the children were sad today because they couldn’t go outside; when we asked why they couldn’t go outside, we were told that they were staying in to meet us. On another visit, we joined a village meeting and were told that the people had been waiting for us for over an hour. We were embarrassed and unnerved by these situations.

We feel strongly that clients ought to come first, which for small organizations (often without specialized fundraising staff) may mean that funders and other visitors have to wait their turn. The orphanage we felt best about was the one where the children assembled for a brief greeting and then dispersed to play computer games and otherwise enjoy themselves. The director of the organization we ultimately awarded the $10,000 to was clear and unapologetic about the fact that her work with clients was more important than her time with us (more below).

Thoughts on different types of organizations

The organizations we found ourselves most interested in were the ones that focused on orphans and/or street children. It seemed to us that these were the populations with the greatest risks and needs, and that the help they needed had a “low-insulation” quality: organizations working with them aim to ensure that they are having basic needs met, are being raised in healthy environments closer to “normal” than what they would have otherwise, and end up healthy and educated.

There are concerns about orphanages and the extent to which they may be taking children away from their families. We definitely felt more comfortable with some orphanages than with others (examples above), and as mentioned below, we ultimately felt most comfortable with a center that provided services to children (including food and shelter) while not taking complete control of their lives. One thing that does appeal to me about orphanages, though, is that the children in them seem generally to have excellent English – something that I believe is a very real and strong advantage in the Mumbai economy.

We were also very interested in organizations focused on keeping children out of the sex trade, simply because the need is so extreme and the goal seems so valuable to us. However, the specific organizations we visited in this category didn’t leave us feeling confident, partly because of the greater challenges one faces in knowing whether their activities are working.

Given the striking and visible needs we saw all around us, we were less interested in organizations focused on more intangible, theoretical benefits. For example, Magic Bus facilitates soccer programs for children; it’s hard for me to justify this focus when those same children are underfed and under-sheltered. I have heard the argument that sports have a character-building quality that ultimately results in more good being done, but this sort of reasoning strikes me as theoretical and romanticized, and I wouldn’t accept this hypothesis without strong evidence (which I do not believe exists).

We have mixed feelings about education-focused programs.

The evidence that education is causally helpful for later success in life is surprisingly weak – there isn’t strong evidence either for or against the idea. Our intuition is that education is indeed very important; based on conversations we’ve had, it seems a strong prerequisite to many of the more desirable career tracks. More at our writeup on developing-world education.
We’ve seen no organizations that can make a compelling case, using academic performance data, for their positive impact on education. The best we’ve seen in this area is Pratham, which we visited twice while in Mumbai.
Our site visits to education charities left us with little sense of the quality of the programs; we weren’t even sure what to look for. See our notes on Pratham for more on this.

Salaam Baalak Trust
Salaam Baalak Trust was the organization we felt performed best on the metrics above:

It is a drop-in center for children living on the street. Unlike orphanages, it encourages children to live primarily with their existing families, but it provides shelter, meals, counseling and other services for children to use as they wish.
This basic program is strong on the insulation- and strategy-related criteria above, and the organization is small enough that the two site visits we made gave us a look at a good proportion of what it does.
Its director, Dinaz Stafford, left a good impression on us.
- We had a definite sense that she was putting clients before visitors: she interrupted a meeting to speak to a child who had come in, was 45 minutes late to another meeting (explaining she had been dealing with a child who was having issues) and told us we had to leave when it was time for her to attend a meeting about children who were struggling.
- She also invited us to examine her case files (on condition that we not record or disseminate names), which gave basic information on each child’s family situation and health over time, and when we asked her questions about specific children, she was easily able to rattle off specifics about them.
