The GiveWell Blog

Measurement is not as common as it should be. Why?

The idea that there should be more measurement appears to be one of the points of widest agreement in the literature on aid. But we believe that agreement in principle is unlikely to mean much until donors (both large and small) act on it. It isn’t enough to request better information; we need to reserve our funds for those who produce it.

This post has two sections. First we give a sample of quotes from a broad set of people and institutions, showing how widespread the call for better measurement is. We then discuss why agreement isn’t enough.

Widespread calls for more and better measurement

From what we’ve seen, the wish for more and better measurement is a near-universal theme in discussions of how to improve international aid. Below is a sample of relevant quotes.

Abhijit Banerjee, Director of the Poverty Action Lab (which used to employ one of our Board members), puts better evaluation at the heart of his 2007 book, Making Aid Work:

The reason [past success stories such as the eradication of smallpox] succeeded, I suspect, is that they started with a project that was narrowly defined and well founded. They were convinced it worked, they could convince others, and they could demonstrate and measure success. Contrast this with the current practice in development aid; as we have seen, what goes for best practice is often not particularly well founded. (pages 22-23)

William Easterly argues for the importance and centrality of evaluation in The White Man’s Burden: Why the West’s Efforts to Aid the Rest Have Done So Much Ill and So Little Good:

[S]ome equally plausible interventions work and others don’t. Aid agencies must be constantly experimenting and searching for interventions that work, verifying what works with scientific evaluation. For learning to take place, there must be information. The aid agencies must carefully track the impact of their projects on poor people using the best scientific tools available, and using outside evaluators to avoid the self-interest of project managers. (page 374)

Think of the great potential for good if aid agencies probed and experimented their way toward effective interventions … Think of the positive feedback loop that could get started as success was rewarded with more resources and expanded further. (page 383)

Jeffrey Sachs, former director of the United Nations Millennium Project (and known for disagreeing with Easterly on many issues – a partial set of debates is available here), calls for more evaluation in The End of Poverty: Economic Possibilities for Our Time:

Much clearer targets of what is to be achieved must accompany a major increase of spending. Every [Millennium Development Goal]-based poverty reduction strategy should be supported by quantitative benchmarks tailored to national conditions, needs, and data availability … Right from the start, the … poverty reduction strategy should prepare to have the investments monitored and evaluated. Budgets and mechanisms for monitoring and evaluation should be essential parts of the strategies. (pages 278-9)

The Center for Global Development created a working group (with support from major foundations) specifically to examine why “very few programs benefit from studies that could determine whether or not they actually made a difference.” Its report provides further argument for why more evaluation is necessary:

Rigorous studies of conditional cash transfer programs, job training, and nutrition interventions in a few countries have guided policymakers to adopt more effective approaches, encouraged the introduction of such programs to other places, and protected large-scale programs from unjustified cuts. By contrast, a dearth of rigorous studies on teacher training, student retention, health financing approaches, methods for effectively conveying public health messages, microfinance programs, and many other important programs leave decisionmakers with good intentions and ideas, but little real evidence of how to effectively spend resources to reach worthy goals. (page 2)

The concern is not limited to researchers and think tanks: it’s one of the primary elements of the Paris Declaration on Aid Effectiveness, which emerged from a major meeting of “development officials and ministers from ninety one countries, twenty six donor organizations and partner countries, representatives of civil society organizations and the private sector.” The declaration states that aid recipients should “Endeavour to establish results-oriented reporting and assessment frameworks that monitor progress against key dimensions of the national and sector development strategies” (page 8), and that donor governments should “Link country programming and resources to results” (page 8). One of its 12 key indicators of progress is an increase in the “Number of countries with transparent and monitorable performance assessment frameworks” (page 10).

For our part, we strongly believe in the importance of measurement, particularly for public charities soliciting individual donations. Some of our reasoning echoes the arguments given above – that helping people is hard, that intuition is a poor guide to what works, and that measurement is necessary for improvement. We also feel the argument is even stronger for the particular area we’re focused on: helping people who have a few thousand dollars and a few hours, but ultimately know very little about the organizations they’re funding and the people they’re trying to help. Formal measurement is necessary for individual donors to hold charities accountable.

Why agreement doesn’t translate to action

The quotes above share a commitment not just to the general usefulness of measurement, but to the idea that there should be more of it than there is. This raises the question taken up by Lant Pritchett in “It pays to be ignorant: a simple political economy of rigorous program evaluation” (PDF): why is quality evaluation so rare, or as he puts it, “How can [the] combination of brilliant well-meaning people and ignorant organization be a stable equilibrium?”

Pritchett argues (pages 33-34) that the scarcity of evaluation can’t be explained by factors such as expense, practical difficulty, or ethical concerns. He conjectures, instead, that the lack of evaluation is due to strategic behavior by those closest to the programs. These “advocates” tend to be strongly committed to the programs they work on, to the point where their behavior is guided more by trying to get more funding for these programs than by trying to get new information on whether they work. According to Pritchett’s model, when donors do not demand rigorous evaluation, “advocates may choose ignorance over public knowledge of true program efficacy … even if it means they too must operate somewhat in the dark” (page 7).

In our view, the most compelling support for Pritchett’s model is his claim that

a huge number of evaluations are started and very few are finished, written up, and publicized. This evaluation attrition is too large to be consistently “bad planning” and is more likely strategic behavior.

This claim matches extremely well with our own observations of charities’ materials. The grant applications we’ve received (available online) frequently give reasonable-sounding plans for future evaluation, but rarely have results from past evaluations available.

We believe that foundations tend to focus on innovation as opposed to past results, while individual donors currently don’t have the information/ability/interest to hold agencies accountable in any way. We know less about the incentives and practices of large aid agencies (governments, the World Bank, etc.) but find it possible that they are driven more by politics and appearances than by humanitarian results.

In other words, funders are not forcing evaluation to happen, and until they do, there’s little reason to expect improved/increased measurement. Agreement isn’t enough – if we want better information on what works, we need to commit with our dollars, not just our beliefs. As an individual donor, you have a key role – perhaps the key role – to play.


  • Carl Shulman on March 12, 2009 at 8:47 pm said:

    “a huge number of evaluations are started and very few are finished, written up, and publicized. This evaluation attrition is too large to be consistently “bad planning” and is more likely strategic behavior.”

    It matters a lot whether the evaluations are abandoned when no one’s looking, or finished and buried. For the second problem one solution may be an evaluation registry like medical clinical trial registries:

  • George on March 13, 2009 at 10:44 am said:

    You make some good points here about evaluation, but I question your assertion that individual donors, rather than foundations, are better suited to demand improved measurement. You are certainly right that foundations tend to chase innovation rather than results. But aren’t foundation program officers supposed to be evaluating grant results? Aren’t they the seasoned and experienced professionals that best know how to interpret this data?

    NPOs use anecdotal and personal stories with individual donors because they work. This is certainly strategic behavior, but it is not necessarily driven by any underlying desire to obfuscate.

    Which do you think is more likely to solicit a donation from Joe Public: a heartfelt personal story with accompanying glossy photo, or a detailed statistical report that references double-blind studies with p values?

  • Holden on March 16, 2009 at 11:56 am said:

    George, foundations may be better positioned to demand improved measurement, but there’s little that we at GiveWell (or our target audience) can do to change this part of the picture. We also can’t get all individual donors to demand measurement, but we may be able to get some to do so, and that’s what we focus on.

  • Michele Rodriguez on March 20, 2009 at 2:56 am said:

    I think you underestimate the need for many donors to give based on their emotions rather than their reasoning. For example, good luck telling Mary the cancer survivor that the American Cancer Society has little data to back up the effectiveness of their work.

    You might be able to get some to do so but will you be able to prove that this “some” is enough to make your own efforts worthwhile?

    Speaking on behalf of a small nonprofit that would like very much to reach the highest of standards and be successful at what we do, it would be most helpful if you also provided some materials and resources on your site that helped us learn to successfully evaluate what we’re doing. In our org’s case we desperately want to but are not equipped to.

  • Holden on March 25, 2009 at 11:01 am said:

    Michele, your question about our impact is a valid one. My only answer right now is that we are trying to measure our impact on donations, and time will tell whether it justifies our project. Regarding your other point: there are many existing resources (books, consultants, websites) aiming to help nonprofits improve their efficacy; our goal is more to provide an incentive to do so, although we also try to be as clear as possible about what we look for (as well as posting the materials of the charities we consider strongest).

  • Michele Rodriguez on March 28, 2009 at 12:55 pm said:

    That is ironic. You do not recommend other charities because they cannot meet your requirements but GiveWell doesn’t meet these requirements to begin with. I do hope that time proves you right. I do not believe you are.

    In regards to resources, I felt that at least if you provided resources and suggestions based upon the data you were collecting and scrutinizing, you would provide opportunity for the advancement of ideas and possibly improve organizations in that way.
