The GiveWell Blog

Our principles for assessing evidence

For several years now we’ve been writing up our thoughts on the evidence behind particular charities and programs, but we haven’t written a great deal about the general principles we follow in distinguishing between strong and weak evidence. This post will:

  • Lay out the general properties that we think make for strong evidence: relevant reported effects, attribution, representativeness, and consonance with other observations.
  • Discuss how these properties apply to several common kinds of evidence: anecdotes, awards/recognition/reputation, testimony, “micro” data, and “macro” data.

This post focuses on broad principles that we apply to all kinds of “evidence,” not just studies. A future post will go into more detail on “micro” evidence (i.e., studies of particular programs in particular contexts), since this is the type of evidence that has generally been most prominent in our discussions.

General properties that we think make for strong evidence
We look for outstanding opportunities to accomplish good, and accordingly, we generally end up evaluating charities that make (or imply) relatively strong claims about the impact of their activities on the world. We think it’s appropriate to approach such claims with a skeptical prior and thus to require evidence in order to put weight on them. By “evidence,” we generally mean observations that are more easily reconciled with the charity’s claims about the world and its impact than with our skeptical default (“prior”) assumption.

To us, the crucial properties of such evidence are:

  • Relevant reported effects. Reported effects should be plausible as outcomes of the charity’s activities and consistent with the theory of change the charity is presenting; they should also ideally get to the heart of the charity’s case for impact (for example, a charity focused on economic empowerment should show that it is raising incomes and/or living standards, not just e.g. that it is carrying out agricultural training).
  • Attribution. Broadly speaking, the observations submitted as evidence should be easier to reconcile with the charity’s claims about the world than with other possible explanations. If a charity simply reports that its clients have higher incomes/living standards than non-participants, this could be attributed to selection bias (perhaps higher incomes cause people to be more likely to participate in the charity’s program, rather than the charity’s program causing higher incomes), or to data collection issues (perhaps clients are telling surveyors what they believe the surveyors want to hear), or to a variety of other factors.

    The randomized controlled trial is seen by many – including us – as a leading method (though not the only one) for establishing strong attribution. By randomly dividing a group of people into “treatment” (people who participate in a program) and “control” (people who don’t), a researcher can make a strong claim that any differences that emerge between the two groups can be attributed to the program.
  • Representativeness. We ask, “Would we expect the activities enabled by additional donations to have similar results to the activities that the evidence in question applies to?” In order to answer this well, it’s important to have a sense of a charity’s room for more funding; it’s also important to be cognizant of issues like publication bias and ask whether the cases we’re reviewing are likely to be “cherry-picked.”
  • Consonance with other observations. We don’t take studies in isolation: we ask about the extent to which their results are credible in light of everything else we know. This includes asking questions like “Why isn’t this intervention better known if its effects are as good as claimed?”
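To illustrate the selection-bias point under “Attribution,” here is a minimal Python simulation built on invented assumptions. A hypothetical program raises incomes by 5 units. Better-off people are assumed to enroll with probability 0.8 and everyone else with probability 0.2. None of these numbers come from any charity’s data. The sketch compares a naive participant-vs-non-participant gap with a gap produced by random assignment:

```python
import random
import statistics


def mean_difference(outcomes, in_group):
    """Mean outcome of the group minus the mean outcome of everyone else."""
    group = [y for y, g in zip(outcomes, in_group) if g]
    rest = [y for y, g in zip(outcomes, in_group) if not g]
    return statistics.fmean(group) - statistics.fmean(rest)


def simulate(n=100_000, true_effect=5.0, seed=0):
    rng = random.Random(seed)
    # Hypothetical baseline incomes (arbitrary units).
    baseline = [rng.gauss(100, 20) for _ in range(n)]

    # Scenario 1: self-selection. Better-off people are assumed
    # more likely to join (0.8 vs 0.2 probability).
    joined = [rng.random() < (0.8 if b > 100 else 0.2) for b in baseline]
    outcome_self = [b + true_effect * j + rng.gauss(0, 5)
                    for b, j in zip(baseline, joined)]
    naive = mean_difference(outcome_self, joined)

    # Scenario 2: a coin flip decides who is in "treatment".
    treated = [rng.random() < 0.5 for _ in range(n)]
    outcome_rct = [b + true_effect * t + rng.gauss(0, 5)
                   for b, t in zip(baseline, treated)]
    randomized = mean_difference(outcome_rct, treated)
    return naive, randomized


naive, randomized = simulate()
print(f"participants vs non-participants: {naive:.1f}")
print(f"randomized treatment vs control:  {randomized:.1f}")
```

With these assumptions, the naive comparison should come out near 24, because most of the gap reflects who chose to join. The randomized comparison should land near the true effect of 5.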

Common kinds of evidence

  • Anecdotes and stories – often of individuals directly affected by charities’ activities – are the most common kind of evidence provided by charities we examine. We put essentially no weight on these, because (a) we believe the individuals’ stories could be exaggerated or misrepresented (either by the individuals, seeking to tell charity representatives what they want to hear and print, or by the charity representatives responsible for editing and translating individuals’ stories); and (b) we believe the stories are likely “cherry-picked” by charity representatives and thus not representative. Note that we have written in the past that we would be open to taking individual stories as evidence if our “representativeness” concerns were addressed more effectively.
  • Awards, recognition, reputation. We feel that one should be cautious and highly context-sensitive in deciding how much weight to place on a charity’s awards, endorsements, reputation, etc. We have long been concerned that the nonprofit world rewards good stories, charismatic leaders, and strong performance on raising money (all of which are relatively easy to assess) rather than rewarding positive impact on the world (which is much harder to assess). We also suspect that in many cases, a small number of endorsements can quickly snowball into a large number, because many in the nonprofit world (having little else with which to assess a charity’s impact) decide their own endorsements more or less exclusively on the basis of others’ endorsements. Because of these issues, we think this sort of evidence is often relatively weak on the criteria of “relevant reported effects” and “attribution.”

    We certainly feel that a strong reputation or referral is a good sign, and provides reason to prioritize investigating a charity; furthermore, there are particular contexts in which a strong reputation can be highly meaningful (for example, a hospital that is commonly visited by health professionals and has a strong reputation probably provides quality care, since it would be hard to maintain such a reputation if it did not). That said, we think it is often very important to try to uncover the basis for a charity’s reputation, and not simply rely on the reputation itself.
  • Testimony. We see value in interviewing people who are well-placed to understand how a particular change took place, and we have been making this sort of evidence a larger part of our process (for example, see our reassessment of VillageReach’s pilot project). When assessing this sort of evidence, we feel it is important to assess what the person in question is and isn’t well-positioned to know, and whether they have incentive to paint one sort of picture or another. How the person was chosen is another factor: we generally place more weight on the testimony of people we’ve sought out (using our own search process) than on the testimony of people we’ve been connected to by a charity looking to paint a particular picture.
  • “Micro” data. We often come across studies that attempt to use systematically collected data to argue that, e.g., a particular program improved people’s lives in a particular case. The strength of this sort of evidence is that researchers often put great care into the question of “attribution,” trying to establish that the observed effects are due to the program in question and not to something else. (“Attribution” is a frequent weakness of the other kinds of evidence listed here.) The strength of the case for attribution varies significantly, and we’ll discuss this in a future post.

    When examining “micro” data, we often have concerns around representativeness (is the case examined in a particular study representative of a charity’s future activities?) and around the question of relevant reported effects (these sorts of studies often need to quantify things that are difficult to quantify, such as standard of living, and as a result they often use data that may not capture the full reality of what happened).
  • “Macro” data. Some of the evidence we find most impressive is empirical analysis of broad (e.g., country-level) trends. While this sort of evidence is often weaker on the “attribution” front than “micro” data, it is often stronger on the “representativeness” front.

In general, we think the strongest cases use multiple forms of evidence, some addressing the weaknesses of others. For example, immunization campaigns are associated with both strong “micro” evidence (which shows that intensive, well-executed immunization programs can save lives) and “macro” evidence (which shows, less rigorously, that real-world immunization programs have led to drops in infant mortality and the elimination of various diseases).

Quick update: New way to follow GiveWell’s research progress

There are two types of materials we publish periodically throughout the year:

  • We frequently speak with charity representatives or other subject matter experts. We ask permission to take notes during these conversations so that we can publish them to our conversations page.
  • We publish new charity review or intervention report pages.

We’ve set up a Google Group so that anyone who wants to can be notified when we publish new material.

You can subscribe to it via this RSS feed, or sign up to receive updates by email at the group’s home page.

Revisiting the 2011 Japan disaster relief effort

Last year, Japan was hit by a severe earthquake and tsunami, and we recommended giving to Doctors Without Borders specifically because it was not soliciting funds for Japan. We reasoned that the relief effort did not appear to have room for more funding – i.e., we believed that additional funding would not lead to a better emergency relief effort. We made our case based on factors including the lack of an official appeal on ReliefWeb, reports from the U.N. Office for the Coordination of Humanitarian Affairs, statements by the Japanese Red Cross, the behavior of major funders including the U.S. government, and the language used by charities in describing their activities. We acknowledged that donations may have beneficial humanitarian impact in Japan, as donations could have beneficial humanitarian impact anywhere, but felt that the nature of the impact was likely to fall under what we characterized as “restitution” and “everyday aid” activities, as opposed to “relief” or “recovery” activities.

Now that more than a year has passed since the disaster, we made an effort to find one-year reports from relevant organizations and get a sense of how donations have been spent.

We have published a detailed and sourced set of notes on what reports we could find and what they revealed about activities. Our takeaways:

  • Very little information on expenditures was provided. Out of 11 organizations we examined, there were only 6 that prominently reported (such that we could find it) the total amount raised or spent for Japan disaster relief. Out of these, only Save the Children, the American Red Cross, and the Japanese Red Cross provided any breakdown of spending by category. Breakdowns provided by Save the Children and American Red Cross were very broad, with 5 and 3 categories respectively; the Japanese Red Cross provided more detail.
  • The Japanese Red Cross spent most of the funds it received on two categories of expense: (1) cash transfers and (2) electrical appliances for those affected. It reports the equivalent of ~$4.2 billion in cash grants. Out of its ~$676 million budget for recovery activities, 49% was spent specifically on “sets of six electronic household appliances … distributed to 18,840 households in Iwate, 48,638 in Miyagi, 61,464 in Fukushima and 1,820 in other prefectures.” (This quote was the extent of the information provided on this activity.) The Japanese Red Cross also spent significant funds on reconstruction/rehabilitation of health centers and services and pneumonia vaccinations for the elderly.

    A relatively small amount of funding (the equivalent of ~$5.6 million) is reported for activities that the Japanese Red Cross puts under its “emergency” categories in its budget. (These include distribution of supplies, medical services, and psychosocial counseling.) It is possible that there was a separate budget for emergency relief that is not included in the report.

    Note that the Japanese Red Cross raised and spent substantially more money than the other nonprofits we’ve listed, and also gave substantially more detail on its activities and expenses.

  • Other nonprofits reported a mix of traditional “relief” activities, cash-transfer-type activities, and entertainment/recreation-related activities. Of the groups that provide some concrete description of their activities (not all did), all reported engaging in distribution of basic supplies and/or provision of psychosocial counseling. Most reported some cash-transfer-type activities: cash-for-work; scholarships; grants to community organizations; support for fisheries, including re-branding efforts and provision of fishing vessels. And most reported some entertainment/recreation activities: festivals, performing arts groups, community-building activities, sporting equipment and sports programs for youth, weekend and summer camps. None reported only traditional “relief” activities. (We concede that all of these activities may have had substantial humanitarian impact, and that some may have been complementary to more traditional “relief” activities; however, we think it is important to note these distinctions, for reasons discussed below.)
  • Currently, Oxfam’s page on the Japan disaster states, “Oxfam has been ready to assist further but is not launching a major humanitarian response at this time. We usually focus our resources on communities where governments have been unable – or, in some cases, unwilling – to provide for their people. But the Japanese government has a tremendous capacity for responding in crises, and a clear commitment to using its resources to the fullest.” Note that on the day of the disaster, Oxfam featured a solicitation for this disaster on its front page.
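As a rough cross-check of the Japanese Red Cross appliance figures quoted above, the sketch below applies the reported 49% share to the ~$676 million recovery budget. It then spreads the result over the households listed in the quote. This is our own back-of-the-envelope arithmetic on approximate inputs, not a figure the Japanese Red Cross reported:

```python
# Households receiving appliance sets, by prefecture, as quoted in the
# Japanese Red Cross report.
households = {"Iwate": 18_840, "Miyagi": 48_638, "Fukushima": 61_464, "Other": 1_820}

recovery_budget_usd = 676e6   # ~$676 million recovery budget (approximate)
appliance_share = 0.49        # share reported as spent on appliance sets

total_households = sum(households.values())
appliance_spend = recovery_budget_usd * appliance_share
per_household = appliance_spend / total_households

print(f"households: {total_households:,}")
print(f"appliance spending: ~${appliance_spend / 1e6:.0f} million")
print(f"per household: ~${per_household:,.0f}")
```

This gives 130,762 households, roughly $331 million in appliance spending, and about $2,500 per set of six appliances. Because both inputs are rounded, the per-household figure should be read as an approximation.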

Based on our earlier conclusion that the relief effort did not have “room for more funding,” we expected to find (a) reports of the sort of activities that nonprofits could spend money on in non-disaster-relief settings (including cash-transfer-type programs, giving out either cash itself or items that could easily be resold, which could likely be carried out in any setting without objections); and (b) reports that were relatively light on details and financial breakdowns. We observed both of these things in the reports discussed above, in nearly every case. In isolation, nothing about the above-described activities rules out the idea that nonprofits were carrying out important, beneficial activities that were core to recovery; but when combined with our earlier evidence of no “room for more funding,” we feel that the overall picture is consistent with that conclusion.

We were somewhat surprised to see the degree to which many nonprofits funded entertainment/recreation activities; these sorts of activities aren’t what we think of as the core competency of international NGOs working mostly in the developing world, and we continue to feel that in a situation such as Japan’s, direct unconditional cash transfers make more sense than activities such as these. (This is a point in favor of the Japanese Red Cross, which – unlike other nonprofits – reported significant spending on cash transfers.)

We therefore stand by the conclusions we reached last year: that the relief and recovery effort did not have room for more funding, that those interested in emergency relief should have donated to Doctors Without Borders, and that those determined to help Japan specifically should have donated to the Japanese Red Cross.

Recent conversation with Bill Easterly

We recently sat down for a conversation with Bill Easterly, on the subject of how to improve the value-added of academic research. Prof. Easterly posted highlights from our public notes from the conversation; we thought we’d share our thoughts on his views.

Points of agreement: we believe we agree with Prof. Easterly on many core points.

  • We are generally highly skeptical of “top-down” interventions. We believe such interventions have many more ways to fail than to succeed, and we generally find “evidence of effectiveness” to have more holes in it and to be less convincing than others find it to be.
  • We agree that, all else equal, “Markets and democracy are better feedback mechanisms than RCTs [randomized controlled trials].” We believe there are cases where markets and democracy fail and aid can provide help that they can’t, and would guess that Prof. Easterly agrees on this as well.
  • We agree that what Prof. Easterly calls “dissidents” play a positive and valuable role.

Points of possible disagreement.

  • We don’t believe in a “first, do no harm” rule for aid. Instead, we try to maximize “expected good accomplished.” It is easy to overestimate benefits and underestimate possible harms, and we try to be highly attentive to this issue, but we believe that it isn’t practical to eliminate all risks of doing harm, and putting too high a priority on “avoiding harm” would cause aid to do less good overall.
  • Prof. Easterly observes, “a lot of things that people think will benefit poor people… {are things} that poor people are unwilling to buy for even a few pennies … The philosophy behind this is that poor people are irrational. That could be the right answer, but I think that we should do more research on the topic.” We have some sympathy with this view and agree that more evidence would be welcome, but we are probably less hesitant than Prof. Easterly is to conclude that people simply undervalue things like insecticide-treated nets.

    Brett Keller observes that irrationality about one’s health is common in the developed world. In the developing world, there are substantial additional obstacles to properly valuing medical interventions, such as lacking the education and access necessary even to review the evidence. The effects of something like bednets (estimated at one child death averted for every ~200 children protected) aren’t necessarily easy for recipients to notice or quantify.

    We’ve previously published some additional reasons to provide proven health interventions rather than taking households’ choices as the final word on what’s best for them.
  • We believe that empowering locals to choose their own aid is much harder in practice than it may sound – and that the best way to achieve the underlying goal may well be to deliver proven health interventions. We’ve argued this point previously.
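The bednet figure cited above suggests why the benefit is hard for any one family to see. Here is a minimal sketch that assumes, purely for illustration, that each protected child independently has a 1-in-200 chance of having a death averted. The function name is hypothetical:

```python
CHILDREN_PER_DEATH_AVERTED = 200  # rough figure cited in the post


def chance_any_death_averted(children_protected: int) -> float:
    """Probability that at least one death is averted in a household,
    assuming (hypothetically) an independent 1/200 chance per child."""
    p = 1 / CHILDREN_PER_DEATH_AVERTED
    return 1 - (1 - p) ** children_protected


for n in (1, 3, 5):
    print(f"{n} children protected: {chance_any_death_averted(n):.1%}")
```

Under this assumption, a household protecting three children has only about a 1.5% chance that its nets avert a death.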

Bottom line: much of the difference in our viewpoints may come down to differences in how we see our roles. Prof. Easterly appears to see himself as a “dissident”; his role is to challenge the way things are done without recommending a particular course of action. We see ourselves as advisors to donors, helping them to give as well as possible today. So while we share many of Prof. Easterly’s concerns – and would be highly open to new approaches to addressing these concerns – we’re also in the mindset of moving forward based on the best evidence and arguments available at the moment. In our view, this currently means recommending our top charities. However, someone who puts more weight on Prof. Easterly’s concerns may consider donating to GiveDirectly instead, which aims to avoid prescriptive aid by giving cash.

GiveWell’s issues log: VillageReach analysis

Recently, we’ve been reflecting on and evaluating our past analysis of VillageReach. We’ve undertaken this analysis and published what we’ve learned because we feel that our process performed suboptimally, and careful consideration of what caused this may lead to improvement on our part.

Broadly, we categorize the problems below as “questions we could have asked to dig even deeper into VillageReach and its program.” The root cause of our failure to ask these questions was that we had less context on international aid, and a less thorough process, than we have now. At the time we conducted most of our VillageReach analysis (2009 and 2010), we felt that our due diligence was sufficient, especially since many others (funders and charities) told us that we were already digging deep enough and that our process was more intense than others they had seen. Today, we feel that a more thorough process is important, and we believe our research process has since advanced to the point where we would deal effectively with each of the issues below.

We were not sufficiently sensitive to the possibility that non-VillageReach factors might have led to the rise in immunization rates in Cabo Delgado; this caused us to overestimate the strength of the evidence for VillageReach’s impact

This issue is the main subject of a blog post we recently published, which describes what occurred in greater detail.

A key part of this issue was our analysis of the chart below, which compares changes in immunization rates in Niassa (where VillageReach did not work) to Cabo Delgado (where it did).

VillageReach’s evaluation presents the larger rise in Cabo Delgado relative to Niassa as suggestive evidence of VillageReach’s impact. We felt that the comparison provided limited evidence of impact. However, we did not ask VillageReach (or, if we did, we have no record of the question or of VillageReach’s response) why Niassa experienced a large rise in immunization rates during the period of VillageReach’s pilot project. VillageReach was not active in Niassa at the time. The fact that Niassa experienced a large increase in immunization coverage should have caused us to question whether VillageReach’s program, as opposed to other factors, caused the increase in Cabo Delgado.

Over the last couple of years, we have had multiple experiences (some on the record, some off) with what we now call the “government interest confounder”: a quick and encouraging improvement on some metric coincides with a nonprofit’s entry into an area, but further analysis reveals that both could easily have been a product of the government’s increased interest in the issue the nonprofit works on. We are now keenly aware of this issue and always seek to understand what activities the government was undertaking at the time in question (something we previously were unable to do due to our greater difficulty getting access to the right people).

Note that we are not saying that the improvement in immunization coverage was due to government activities; we still find it possible that VillageReach was primarily responsible for the improvements. But we do find the case for the latter to be less conclusive than we thought it was previously.

We did not ask VillageReach for the raw data associated with the stockouts chart.

In our July 2009 VillageReach review, we copied a chart showing a fall in stockout rates from VillageReach’s evaluation of its pilot project into our review. (See chart here.)

In September 2011, we asked VillageReach for the raw data that they used to create the chart to further vet the accuracy of the data. Using the raw data, we recreated their chart, which matched the copied chart reasonably well. (See chart here.)

In our review of the raw data, we noticed that, in addition to data on stockouts, there was also data for “clinics with missing data.” Because missing data plausibly reflect “clinics with stockouts” (more discussion of this issue here), we created a second chart (which follows) that showed both stockouts and missing data.

This chart presents a more complete picture of VillageReach’s success in reducing stockout levels of vaccines at the clinics it served. During 2006, the year in which VillageReach reduced stockouts to near-zero levels, nearly half the year had significant levels of missing data. Had we obtained and reviewed all the data in 2009, we might have asked additional questions such as: “Given that there’s evidence that VillageReach only succeeded in reducing stockouts to extremely low levels for a total of 6 months, how likely is it that it will be able to successfully scale its model to (a) new provinces while (b) using a less hands-on approach to implementing its program?”
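The missing-data point can be made concrete with a small sketch. The monthly figures below are invented for illustration and are not VillageReach’s data. `MonthReport` and `rates` are hypothetical names. For each month, the sketch computes two numbers. The first is the stockout rate among clinics that reported. The second is an upper bound that treats every clinic with missing data as a possible stockout:

```python
from dataclasses import dataclass


@dataclass
class MonthReport:
    month: str
    clinics: int    # clinics served that month
    stockouts: int  # clinics reporting a stockout
    missing: int    # clinics with no data that month


def rates(r: MonthReport) -> tuple[float, float]:
    """Return (rate among reporting clinics, upper bound if missing = stockout)."""
    reporting = r.clinics - r.missing
    observed = r.stockouts / reporting if reporting else float("nan")
    upper = (r.stockouts + r.missing) / r.clinics
    return observed, upper


# Hypothetical numbers for illustration only.
reports = [
    MonthReport("Jan", 80, 1, 0),
    MonthReport("Feb", 80, 0, 30),
    MonthReport("Mar", 80, 0, 45),
]

for r in reports:
    observed, upper = rates(r)
    print(f"{r.month}: {observed:.0%} among reporting clinics, "
          f"up to {upper:.0%} if missing data means a stockout")
```

A month in which many clinics are missing data can show a 0% observed rate while the upper bound is substantial. For that reason, a near-zero line on a chart that omits missing data can overstate how reliably stockouts were controlled.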

We didn’t previously have a habit of asking for raw data behind key charts, but we have learned to do so after incidents such as our uncovering of major errors in an official cost-effectiveness estimate for deworming.

Ultimately, we felt that this particular chart held up fairly well under the raw-data-based examination. We still think it provides good evidence that VillageReach made a difference in this case. But it is a less strong case than we previously perceived it to be, and if we had been in the habit of asking for raw data we would have seen this earlier.

We misinterpreted data on immunization rates in Cabo Delgado following the end of VillageReach’s pilot project.

VillageReach’s baseline coverage study for Cabo Delgado stated, “There has been a reduction in vaccine coverage from 2008 to 2010 (children below 12 months of age) of nearly 18 percentage points” (VillageReach, “Vaccination Coverage Baseline Survey for Cabo Delgado Province,” Pg 31). We echoed this claim in March 2011, as part of our first VillageReach update (we wrote, “Overall immunization has fallen only slightly since the 2008 conclusion of VillageReach’s work in this province, but it has fallen significantly for children under the age of 12 months.”) Since then, we have concluded that we misinterpreted this data: while the percentage of children who were “fully immunized” fell between 2008 and 2010, other indicators of vaccine coverage (e.g., “fully vaccinated” and “DTP3” coverage) did not similarly fall.

We realized our error in early 2012 as we were working on further VillageReach updates (and we published the fact that we had erred in our latest update). This error occurred because we relied on the quote in VillageReach’s report (above) without fully tracing back the source data and recognizing the importance of the different vaccine indicators. On the other hand, other data has since become available that is consistent with our original reading (details in our previous post on the subject).

In a December 2009 blog post, we wrote that immunization rates had fallen after VillageReach’s project ended; instead, we should have written that stockout rates rose after VillageReach’s project ended.

In a blog post published on December 30, 2009, we wrote,

The fact that vaccination rates have since fallen is further evidence that VillageReach made a difference while they were there, but obviously discouraging relative to what they had hoped for.

This case was simply an error. Both Holden and I review each post before we publish it. In this case, Holden wrote it; I approved it; and the error got through.

I believe we knew at the time that we had no information about changes in immunization rates, only data on changes in stockout rates. Thus, I think this quote represents a small “communication error” rather than a large “error in understanding.”

Rethinking VillageReach’s pilot project

Background

Over the past 3 years, VillageReach has received over $2 million as a direct result of our recommendation. VillageReach put these funds towards its work to scale up its health logistics program, which it implemented as a pilot project in one province in Mozambique between 2002 and 2007, to the rest of the country. A core part of our research process is following the progress of charities to which GiveWell directs a significant amount of funding, and we’ve been following and reporting on VillageReach’s progress.

In addition to following VillageReach’s progress scaling up its program, we’ve recently been reassessing the evidence of the effectiveness of VillageReach’s pilot project. We’ve done this for two reasons. First, when we first encountered VillageReach, in 2009, GiveWell was a younger organization with a less developed research process. We believe that our approach for evaluating organizations has significantly improved, and we wanted to see how VillageReach’s evidence would stack up given our current approach. Second, new data relevant to our assessment of whether the pilot project was successful has become available through VillageReach’s scale-up work. GiveWell is also now better than it was at understanding the extent to which it, or anyone, can draw conclusions from these sorts of impact evaluations.

In the case of the VillageReach evaluation, while we do not have facts to demonstrate otherwise, we now understand it is possible that factors other than VillageReach’s program might have contributed to the increase in coverage rates. As a result, we have reduced the confidence we previously had in the extent to which VillageReach’s program was responsible for the increase in coverage rates.

This has major implications for our view of VillageReach, as compared to our current top charities (AMF and SCI), as a giving opportunity. We feel that within the framework of “delivering proven, cost-effective interventions to improve health,” AMF and SCI are solidly better giving opportunities than VillageReach (both now and at the time when we recommended VillageReach). Given the information we have, we see less room for doubt in the cases for AMF’s and SCI’s impact than in the case for VillageReach’s.

That said, we continue to view VillageReach as a highly transparent “learning organization” (capable of conducting its activities in a way that can lead to learning). Over the past few years, VillageReach has provided us with the source data behind its evaluations, enabling us to do our own in-depth analysis and draw our own conclusions. That work has contributed to our own growing ability to evaluate impact evaluations and determine the level of reliance that can be placed on them. We will be talking with VillageReach about how more funding could contribute to more experimentation and learning, and we will likely be interested in recommending such funding – to encourage such outstanding transparency and accountability, and learn more in the future.

VillageReach’s pilot project’s impact

In our July 2009 review of VillageReach, we attributed an increase in vaccination rates in Cabo Delgado to VillageReach’s program.

Two factors carried substantial weight in our view: (a) drops in the “stockout rate” of vaccines (i.e., the percentage of clinics that did not have all basic vaccine types available, see chart below) and (b) VillageReach’s report that other NGOs were unlikely to have contributed to the increase because they were not very involved with immunization activities during the 2002-2007 period.

In March 2012, we published a re-analysis that somewhat changed the picture presented by these charts. In it, the change in immunization coverage appears more similar between Cabo Delgado and Niassa (though quite different from the other provinces in Mozambique); in addition, some of the “low stockouts” period in the first chart turns out to be a period in which there was substantial missing data (it still appears that stockouts were, in fact, low during this period, so this is something of a minor change, but it still presents a different picture from how we interpreted the data previously).

Since we first reviewed VillageReach in 2009, our understanding of international aid generally has improved, and we now have more context for alternative, non-VillageReach factors that could have led to the increase in immunization. For example, in other charity examinations, there have been cases in which we noted that the charity’s entry into an area appeared to coincide with a generally higher level of interest in the charity’s sector on the part of the local government. We sought to understand the extent to which there may be an alternate explanation for the improvements that were concurrent with VillageReach’s activities.

We have not found any evidence that activities by other NGOs (i.e., non-governmental organizations) contributed to the increase in coverage rates, but reflecting on that question led us to focus on whether activities by governmental aid organizations (multilaterals and bilaterals) could have contributed to the increase in coverage rates. To answer this question we contacted and spoke with groups familiar with Mozambique’s immunization program during the 2002-2007 period. We spoke with Karin Turner, Deputy Director, Health Systems for USAID Mozambique (as well as other staff in that office) and Dr. Manuel Novela, a WHO EPI (Expanded Program on Immunization) Specialist for Mozambique.

Our understanding from these conversations is that:

  • As an alternative to earlier, separate donor-direct funding mechanisms, major international donors started contributing to “common funds” around the year 2000. Common funds aimed to provide general operating support (and greater decision-making autonomy) to developing countries’ ministries of health. When we spoke with USAID’s Mozambique office in April, representatives told us they recalled that Cabo Delgado and Niassa, the two provinces in Mozambique that experienced the largest increases in immunization rates between 2003 and 2008, spent a larger proportion of their common funds on immunization-related activities than other provinces did. We recently reached out to USAID in Mozambique to confirm this but have not yet received an answer. Unfortunately, we have also not been able to track down data on how common funds were spent. [UPDATE September 10, 2012: Our current understanding is that USAID believes Niassa and Cabo Delgado had the ability to focus common funds on vaccination, but that it does not know whether this was done.]
  • In the early 2000s, other funders became interested in supporting Northern Mozambique specifically (Niassa and Cabo Delgado are both part of this region). According to USAID, Irish Aid and the World Bank provided increased support for immunization activities to Niassa during the 2000s. We have no evidence, however, of additional funders for immunization activities in Cabo Delgado.

At this point we feel that the fall in stockouts and rise in immunization rates observed in Cabo Delgado could be attributed to VillageReach’s activities (and the improvement in Niassa to the Irish Aid and World Bank activities discussed above), but it is also possible that the improvements in both provinces were driven by another factor (perhaps the allocation of common funds) that we do not fully understand. From our perspective, the fact that Niassa, a neighboring province, experienced a large rise in immunization rates over the same period (although not to the 90%+ range seen in Cabo Delgado; see chart above) raises the possibility that non-VillageReach factors contributed to the rise in Cabo Delgado. It is also possible, however, that Irish Aid/World Bank funds spent in Niassa increased coverage there while the VillageReach program was responsible for the increase in Cabo Delgado. We have not looked into immunization funding in other provinces (i.e., provinces other than Niassa and Cabo Delgado) over this period. Were we to find evidence of increased immunization funding in those provinces without commensurate increases in coverage, it would reduce our assessment of the probability that government funds were responsible for the increase in Cabo Delgado.

VillageReach’s perspective

We asked Leah Hasselback, VillageReach’s Mozambique Country Director, about possible additional factors. She told us that in completing its evaluation of the pilot project, VillageReach had spoken with WHO as well as with bilateral donors, and that no one had mentioned Cabo Delgado’s using common funds for immunization or additional immunization-specific funding for Cabo Delgado. Note that VillageReach’s assessment of other factors was completed after the fact.

Additional Data

VillageReach exited Cabo Delgado in 2007. Recently, two different data sets have become available on immunization coverage in the province in 2010-2011. The first is a survey conducted by VillageReach, and the second is the DHS report for 2011. The key question we asked when examining these was whether they demonstrate a worsening of immunization coverage relative to 2008; if immunization coverage had worsened in the years since VillageReach exited (during which time its distribution system was discontinued), this would provide some suggestive evidence for the importance of the VillageReach model.

The two data sets present different pictures. The VillageReach survey data shows different trends for different indicators, but overall we feel it does not show a worsening of immunization coverage. On the other hand, the DHS report does show signs of worsening in coverage. (Details in the footnote at the end of this post.)

VillageReach’s perspective

Leah Hasselback, VillageReach’s Country Director for Mozambique, notes several other factors, occurring in Cabo Delgado between 2007 and 2010, that may have kept immunization rates high:

  • Mozambique introduced the pentavalent vaccine in November 2009. This vaccine, which combines 5 needed vaccines in one, was accompanied by significant vaccine-related promotion, which also should have improved immunization rates.
  • Cabo Delgado added 20 health centers between the end of VillageReach’s pilot project and the beginning of its scale-up work. During the entire period of the pilot project, Cabo Delgado added only 1 health center.
  • There were immunization campaigns in 2008 that focused specifically on measles and polio.
  • FDC, the local NGO with which VillageReach partnered during the pilot project, ran a social mobilization campaign in 2008-09 in a single district of Cabo Delgado.

Our current take on VillageReach

Though its pilot project evaluation is the single best evaluation we have ever seen from a nonprofit evaluating its own programs (as opposed to academics running randomized controlled trials of aspects of an organization’s activities), and the evaluation is both thoughtful and self-critical, we still feel that there are too many unanswered (and perhaps unanswerable) questions about VillageReach’s impact to have strong confidence that it caused an increase in immunization rates.

This conclusion has major implications for how we see VillageReach as a giving opportunity, compared to our current top charities (AMF and SCI). We feel that within the framework of “delivering proven, cost-effective interventions to improve health,” AMF and SCI are solidly better giving opportunities than VillageReach (both now and at the time we recommended it). Given the information we have, we see less room for doubt in the cases for AMF’s and SCI’s impact than in the case for VillageReach’s.

On the other hand, we wish to emphasize another sense in which VillageReach was – and is – an outstanding giving opportunity. VillageReach is experimenting with a novel approach to health, collecting meaningful data that can lead to learning, and sharing what it finds – both the good and the bad – in a way that is likely to improve the knowledge and efficiency of aid as a whole. In this respect we see it as very unusual: most of the charities we’ve encountered seem to collect little meaningful data, are reluctant to share what they do have, and are especially reluctant to share anything that may call their impact into question.

Groups like VillageReach are creating a new dialogue around charitable giving, and it’s important to us that this type of behavior is supported. We want to encourage VillageReach and other groups to share information about how their programs are going, and we want to continue to see more experimentation and learning. So, we are seriously considering recommending donations to VillageReach, not despite the struggles it’s had but because it’s had these struggles and is being honest about them.

VillageReach has sent us a funding update, which we plan to review and share soon. We will also be writing more, in future posts, about what we’ve learned overall from the experience of working with VillageReach, and what we feel it says about our research process.


Footnote:

VillageReach 2010 survey in Cabo Delgado

For the below analysis we relied on two studies conducted by VillageReach or contractors hired by VillageReach:

  • A July 2008 survey of two groups of children in Cabo Delgado: children aged 12-23 months (likely vaccinated at the end of or after the VillageReach project, which ended in Feb-Apr 2007) and children aged 24-35 months (likely vaccinated during the project).
  • An April 2010 survey of children 12-23 months of age. None of these children would have been vaccinated during the VillageReach pilot project.

There are three main indicators that VillageReach uses as numerators for the “vaccination coverage rate”:

  1. Fully vaccinated: child has received each of 8 vaccinations by the time of the survey (BCG, 3 x DTP, 3 x Polio, Measles). A vaccination is counted if either it is recorded on the child’s vaccination card (which is kept by the parents) or a caregiver states that the child received the vaccination.
  2. Fully immunized (either by time of survey or before 12 months of age): This is a stricter measure than “fully vaccinated.” In addition to having all the vaccinations, there are additional conditions which must be met:
    • All vaccinations and timings must be verified on the child’s vaccination card (verbal confirmation by a caregiver is not valid).
    • All 3 polio vaccinations must be received at least 28 days apart. The same applies to the 3 DTP vaccinations.
    • Measles vaccination must be given after 9 months of age.
  3. DTP3: Received all 3 diphtheria, pertussis, and tetanus vaccinations. Verification with the vaccination card is not needed.
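To make the difference between these indicators concrete, here is a minimal Python sketch of how a single child’s survey record might be scored against each definition. The record layout (Dose, Child), the function names and the month arithmetic are our own assumptions for illustration, not VillageReach’s survey instrument or analysis code. As a simplification, the sketch checks only the first recorded dose(s) of each vaccine series.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

# Doses each child needs: 1 BCG, 3 DTP, 3 polio, 1 measles (8 in total).
REQUIRED = {"BCG": 1, "DTP": 3, "Polio": 3, "Measles": 1}
MIN_GAP_DAYS = 28  # minimum spacing between successive DTP or polio doses

@dataclass
class Dose:  # hypothetical record layout
    vaccine: str                      # one of the keys in REQUIRED
    on_card: bool                     # recorded on the vaccination card?
    given_on: Optional[date] = None   # card date; None if only caregiver-reported

@dataclass
class Child:  # hypothetical record layout
    birth_date: date
    doses: List[Dose] = field(default_factory=list)

def months_old(birth: date, when: date) -> int:
    """Completed months of age on a given date."""
    months = (when.year - birth.year) * 12 + (when.month - birth.month)
    return months - 1 if when.day < birth.day else months

def count(child: Child, vaccine: str) -> int:
    """Doses of a vaccine, whether from the card or a caregiver's report."""
    return sum(1 for d in child.doses if d.vaccine == vaccine)

def fully_vaccinated(child: Child) -> bool:
    return all(count(child, v) >= n for v, n in REQUIRED.items())

def dtp3(child: Child) -> bool:
    return count(child, "DTP") >= 3

def card_dates(child: Child, vaccine: str) -> List[date]:
    """Sorted dates of doses verified on the card."""
    return sorted(d.given_on for d in child.doses
                  if d.vaccine == vaccine and d.on_card and d.given_on is not None)

def fully_immunized(child: Child, by_12_months: bool = False) -> bool:
    dates = {v: card_dates(child, v) for v in REQUIRED}
    # Every required dose must be verified, with a date, on the card.
    if any(len(dates[v]) < n for v, n in REQUIRED.items()):
        return False
    # The first three DTP doses, and the first three polio doses, must be >= 28 days apart.
    for v in ("DTP", "Polio"):
        first3 = dates[v][:3]
        if any((b - a).days < MIN_GAP_DAYS for a, b in zip(first3, first3[1:])):
            return False
    # The (first) measles dose must be given at 9 months of age or later.
    if months_old(child.birth_date, dates["Measles"][0]) < 9:
        return False
    if by_12_months:
        # Stricter variant: every counted dose given before the first birthday.
        counted = [ds[:REQUIRED[v]] for v, ds in dates.items()]
        return all(months_old(child.birth_date, d) < 12 for ds in counted for d in ds)
    return True
```

For example, a child whose measles dose was recorded on the card at 8 months of age would still count as fully vaccinated (and, with all three DTP doses, as DTP3), but not as fully immunized.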

In Cabo Delgado, rates of “fully vaccinated” and DTP3 remained more or less constant between the 2008 and 2010 surveys:

  • Fully vaccinated:
    • 2008: 92.8% for 24-35 month olds and 87.8% for 12-23 month olds
    • 2010: 89.1% (12-23 month olds)
  • DTP3:
    • 2008: 95.4% for 24-35 month olds and 92.8% for 12-23 month olds
    • 2010: 91.9% (12-23 month olds)

The “fully immunized” figures are harder to interpret. They did fall between 2008 and 2010:

  • Fully immunized at the time of the survey:
    • 2008: 72.2% for 24-35 month olds and 73.0% for 12-23 month olds
    • 2010: 57.9% or 48.8% (both numbers are given in the report; 48.8% is the one that is repeated in summary reports VillageReach has published)
  • Fully immunized by 12 months of age:
    • 2008: 54.9% for 24-35 month olds and 61.2% for 12-23 month olds
    • 2010: 40.8%
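The following sketch simply recomputes the percentage-point changes implied by the figures above, comparing 2010 both with the older 2008 cohort (24-35 months) and with the 2008 cohort of the same age (12-23 months). The choice of comparisons and the names used are ours; the numbers are as reported in the surveys, with both of the 2010 “fully immunized at the time of the survey” figures kept.

```python
# (indicator, 2008 age 24-35 months, 2008 age 12-23 months, 2010 age 12-23 months), in %
ROWS = [
    ("Fully vaccinated", 92.8, 87.8, 89.1),
    ("DTP3", 95.4, 92.8, 91.9),
    ("Fully immunized at survey (57.9% figure)", 72.2, 73.0, 57.9),
    ("Fully immunized at survey (48.8% figure)", 72.2, 73.0, 48.8),
    ("Fully immunized by 12 months", 54.9, 61.2, 40.8),
]

def changes(rows):
    """Percentage-point change from each 2008 cohort to the 2010 survey."""
    out = {}
    for name, older_2008, same_age_2008, y2010 in rows:
        out[name] = (round(y2010 - older_2008, 1), round(y2010 - same_age_2008, 1))
    return out

for name, (vs_24_35, vs_12_23) in changes(ROWS).items():
    print(f"{name}: {vs_24_35:+.1f} pp vs 2008 24-35 mo, {vs_12_23:+.1f} pp vs 2008 12-23 mo")
```

Against the same-age 2008 cohort, “fully vaccinated” and DTP3 move by about one percentage point, while the “fully immunized” measures fall by roughly 15 to 24 points, depending on which 2010 figure is used.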

The primary reasons that children failed to qualify as fully immunized in the 2010 survey do not appear to be issues that better vaccine logistics (the problem VillageReach’s program addressed) would likely have solved (these categories can overlap):

  • 27% of the whole sample (i.e., at least half of those who didn’t qualify as fully immunized) received their measles vaccine before 9 months of age, up from 8% in the 2008 survey
  • 19% of the sample got polio or DTP shots within 28 days of each other, up from 2% in the 2008 survey
  • Only 11.5% of the sample got a vaccination after 12 months of age
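The “at least half” claim in the first bullet can be checked against both of the 2010 “fully immunized” figures. This is a minimal sketch, assuming that every child who received the measles vaccine before 9 months of age was counted as not fully immunized; the function and constant names are ours.

```python
# 2010 survey figures quoted above (percent of the whole sample).
MEASLES_BEFORE_9_MONTHS = 27.0
FULLY_IMMUNIZED_FIGURES = (57.9, 48.8)  # both appear in the 2010 report

def share_of_non_immunized(pct_with_issue: float, pct_fully_immunized: float) -> float:
    """Fraction of not-fully-immunized children who had a given issue.

    Assumes every child with the issue was counted as not fully immunized.
    """
    pct_not_immunized = 100.0 - pct_fully_immunized
    return pct_with_issue / pct_not_immunized

for fi in FULLY_IMMUNIZED_FIGURES:
    share = share_of_non_immunized(MEASLES_BEFORE_9_MONTHS, fi)
    print(f"fully immunized = {fi}%: early measles doses account for "
          f"{share:.0%} of children who were not fully immunized")
```

Under either figure the share is above one half (about 64% and 53%). Because the categories overlap, the 27% and 19% figures cannot simply be added to get a combined share.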

Demographic and Health Survey (DHS) of Mozambique from 2011 (preliminary report)

In this spreadsheet, we have compiled vaccination rate data from four national, high-quality surveys: 3 DHS surveys in 1997, 2003, and 2011, and a Multiple Indicator Cluster Survey (MICS) from 2008. Note that only a subset of the children included in the 2008 survey were born early enough to potentially benefit directly from VillageReach’s pilot project. With that caveat in mind, a few observations:

  • DPT3 vaccination and fully vaccinated rates in Cabo Delgado were substantially lower in 2011 than in 2008, while rates were found to have risen over that period in nearby provinces, including Niassa, the comparison province from VillageReach’s project evaluation.
  • Rates for vaccines earlier in the vaccination series (such as DPT1, DPT2, and BCG) stayed about the same or decreased only slightly between the 2008 and 2011 surveys.

Sources: