The GiveWell Blog

GiveWell and Good Ventures

Last year, we met Cari Tuna and Dustin Moskovitz of Good Ventures, a new foundation that plans eventually to give away substantial amounts (Dustin and Cari aim to give away the majority of their net worth within their lifetimes; Dustin is the co-founder of Facebook and, more recently, Asana). We immediately established that Good Ventures and GiveWell share some core values that relatively few others seem to share:

  • Both Good Ventures and GiveWell are aiming to do as much good as possible, from a global-humanitarian perspective.
  • Both are willing to consider any group and any cause in order to accomplish this goal.
  • Both are highly interested in increasing the level of transparency, accountability, and critical discussion and reflection within the world of giving.

Over time, GiveWell and Good Ventures have worked increasingly closely together. In April of last year, Cari joined our Board of Directors; in December of last year, Cari announced substantial grants to our top-rated charities from Good Ventures. In the meantime, Cari was exploring the rest of the world of philanthropy, speaking with a large number of major philanthropists, nonprofit representatives, philanthropic advisors, etc. After a year of exploration, Cari stated to us that while many of the people she had spoken to had been helpful, GiveWell seemed to be most in alignment with the values of Good Ventures and had given the most helpful support in pursuing these values, and that GiveWell’s research appears to her to be at least as high-quality as any foundation research she’s seen. Now, GiveWell and Good Ventures plan to “act as a single team” as we source and vet funding opportunities in areas in which our interests overlap.

This is a partnership, not a merger; we remain separate legal entities. Cari is President of Good Ventures, while Elie and I are Co-Executive Directors of GiveWell; our authorities differ accordingly. If Good Ventures is interested in an area or activity that we aren’t interested in, it will use its resources to pursue this area or activity; likewise, if we are interested in an area or activity that Good Ventures isn’t interested in, we will use GiveWell’s resources to pursue this area or activity.

However, “acting as a single team” does mean that

  • There are substantial areas of overlap between our interests – investigations and activities that rank high on both of our priority lists. The agenda we laid out recently is a close match to current points of intersection.
  • Within these areas, we maintain a common priority list and divide up labor so that we don’t duplicate any work. Division of labor is done by consensus, and if there are unresolvable disagreements each organization makes its own choices about its own resources (this has not happened so far).
  • Within these areas, funding requests and ideas will go through a common process. I.e., if someone brings an idea or request to Cari and we have agreed that it fits within an area that is being primarily managed by GiveWell, she will refer the request or idea to GiveWell rather than evaluating it herself.
  • When given confidential materials that are “for our eyes only,” we will attempt to share these with each other (though of course this will require permission from those providing the materials).
  • We are currently experimenting with close coordination on screening and training new hires. We look for similar qualities in new hires, so people who are interested in a job with one organization or the other may be interviewed by both simultaneously.
  • Overall, the above items require close coordination. For this and other reasons, the GiveWell team is currently planning to move to the Bay Area (more on this in a future post).

It seems to me that this is a relatively unusual arrangement. Formally, each organization has full authority over its own resources and none over the other’s, and this fact underlies all procedures for resolving disagreements if and when we cannot reach consensus. In practice, however, such cases have recently been rare, and it has often felt as though we’re a single team with a single agenda.

Why does this situation seem unusual? One possibility is that it isn’t a good idea and that the problems with it will become apparent in the future; this possibility is why we have been clear about procedures for resolving disagreements. But there is another possible explanation. In my view, nonprofit work is naturally suited to this sort of “teamwork without a single authority” arrangement, in a way that for-profit work is not. Both GiveWell and Good Ventures are mission-driven: there are no financial returns to divide up, just a vision for the world on which we are closely aligned.

I believe that nonprofits sometimes mimic for-profits in ways that don’t make sense given their missions. They raise money beyond what they need for their core work. They keep information confidential rather than publishing it as a public good. And they exaggerate successes and downplay shortcomings, while being more honest would help the rest of the world learn and thus ultimately promote their mission (if not their organization). If I’m right, the relative unusualness of “teamwork without mergers” could be another way in which nonprofits are missing opportunities to be effective that aren’t available for for-profits. I think it’s possible that the sort of collaboration GiveWell and Good Ventures have today will be far more common in the future.

Objections and concerns about our new direction

GiveWell has recently been taking on activities that may seem to represent a pretty substantial change of direction, especially for those who think of us as a “charity evaluator focused on saving the most lives per dollar spent.”

  • Within global health and nutrition, we’re considering restricted funding for specific projects, not just recommendations of particular charities.
  • We’re also exploring other causes that are extremely different from global health and may be far less amenable to measurement and “cost per life saved” type calculations, such as meta-research.

When discussing these activities, we’ve lately been encountering a couple of different objections and concerns; this post discusses the objections and our responses. In a nutshell:

  • Some are concerned that we’ll lose our objectivity if we get involved in providing restricted funding: we’ll be tempted to rank the groups following our plans ahead of the groups following their own plans, and we’ll thus lose the quality of being a disinterested third-party evaluator. We believe we can draw a meaningful line between “charities we recommend for unrestricted funding” and “plans we have designed,” leaving individual donors to decide whether they’d rather take our recommendation unconditionally or only follow our advice in the areas where we’re disinterested; we also believe that being open to providing restricted funding is necessary and important, and justifies the resources we’ll be investing. More
  • Some are concerned that by going into new causes, we’ll be spreading ourselves too thin. Understanding global health is already an ambitious and difficult goal; it’s been suggested that we should “stick to our knitting.” We feel that sticking to global health, when we see other causes as potentially more promising, would be out of line with our fundamental mission and value-added as an organization that seeks to help people do as much good as possible. More
  • Some are concerned specifically about new causes that don’t lend themselves to measurement and cost-effectiveness calculations (such as meta-research). It may be difficult to remain systematic and transparent about how we make decisions in these more speculative areas. We recognize this concern, but feel that we can remain systematic and transparent even where measurement is difficult or impossible; furthermore, we feel that we must find a way to do this if we are to have a strong case that philanthropy as a whole (not just sub-sectors of it) should be more systematic and transparent. More

Despite the concerns and risks above, we feel that the benefits of our new direction outweigh them. A major input into this view is the feeling that sticking to our old process would be extremely unlikely to result in finding more outstanding giving opportunities within a reasonable period of time; this is something we will be writing more about.

That said, we do recognize the concerns and risks, and we are interested in others’ thoughts on them.

The risk of losing our objectivity

To date, all of GiveWell’s recommendations have involved unrestricted support to existing organizations. Because of this, we can be pointed to as a “neutral third party” that recommends organizations based exclusively on impact-related criteria. But we’re now contemplating doing what a lot of major funders do and helping to set the agenda for a funded organization, through the mechanism of restricted funding. If we did this, we might have difficulty being neutral between (a) projects that we help design and (b) charities that are simply asking for unrestricted funds, not contracting with us. In fact, we might be tempted to eschew (b) entirely and focus exclusively on designing – rather than finding – giving opportunities.

One important principle here is that we will draw a clear line between organizations we recommend for unrestricted funding and projects designed by GiveWell. We don’t know exactly how the visual presentation will work yet, but we have agreed on the principle that there will be a clear distinction – including on our higher-level and frequently-accessed pages – between GiveWell-designed projects and recommended charities.

Of course, there is still a risk that recommendations for unrestricted funding will have “soft conditions” (i.e., that it will be clear to charities what activities they have to carry out in order to earn or maintain recommendations); this is something that has always been true, though I think the situation is somewhat mitigated by the nature of the “room for more funding” analysis we perform. (Our analysis asks for predicted charity activities based on total unrestricted funding, not based on GiveWell-specific funding. The expectation is that if GiveWell-directed funding falls short of expectations and the gap is made up by other funding, the activities will still be as outlined; this hopefully provides charities an incentive to project the activities they would most like to carry out, rather than projecting the activities they hope will most appeal to GiveWell specifically.)

Even with a clear distinction, there could still be a reasonable concern that GiveWell will over-allocate resources (in terms of investigative capacity) to designing its own projects, as opposed to finding great organizations. We recognize this concern, but wish to note that – philosophically – we greatly prefer unrestricted to restricted funding, and greatly prefer a “hands-off” to a “hands-on” approach. We don’t have the capacity to actively manage projects ourselves, and we believe projects are likely to work out better when they are run by people who fully buy into them (as opposed to people who are fulfilling the requirements of restricted funding).

It’s partly because of this philosophy that we’ve stayed away from restricted funding to date, and we remain highly cautious about it. We would prefer to stick to unrestricted funding and may never in fact deal in restricted funding.

Yet it is worth noting why we are considering restricted funding now in a way that we haven’t before. Our impression is that major funders frequently make extensive use of restricted funding; as a result, the existing landscape consists of many charities whose agendas are set partly or fully by external funders.

  • We’ve been surprised by the disconnect we’ve observed in which there is a large number of promising interventions but few charities that focus on these interventions (in a way such that additional dollars will mean additional execution).
  • More generally, we’ve been surprised that in the majority of conversations in which we ask an organization what it would do with more unrestricted funding, it has no clear answer, and prefers instead to tailor its answer to our priorities.

Practically speaking, charities have to focus on what they can fund; and in today’s world, it seems possible that agendas are largely set by funders. Our ideal role would be to “free” great organizations from restricted funding, allowing them to carry out promising projects that they can’t fund otherwise. However, it seems possible that there are too few charities for whom funding would make this sort of a difference, and there is thus some argument for our taking the sort of active role that other funders do.

Finally, by being open to restricted funding, we’ve come across some opportunities that are similar to “unrestricted funding” in most relevant ways, but that structurally involve restrictions and that we couldn’t have come across using our former approach. For example, we’re currently considering the idea of funding particular parts of UNICEF that work on particular interventions that we’re interested in. This wouldn’t involve laying out our own plan, and it would involve getting money to a specific team and leaving the use of the funds at their discretion; however, we could not find this sort of giving opportunity by talking to general UNICEF representatives and asking what they would do with more unrestricted funding. In some sense it may be appropriate to think of UNICEF (and other organizations like it) as a coalition of teams with their own priorities rather than as a single team with a single set of priorities; so in this case a gift that is formally restricted may have many of the desirable qualities of an unrestricted gift. To avoid confusion, we will still distinguish any recommendations along these lines from purely unrestricted gifts, as laid out above.

The risk of spreading ourselves too thin

We still have a lot to learn about global health and nutrition (as indicated by, among other things, our continued learning from VillageReach’s progress). It has been suggested that we should “stick to our knitting,” focusing on the areas of giving in which (a) we’ve built up the brand we have; (b) data and feedback loops tend to be unusually good by the standards of the nonprofit world, facilitating learning.

In response, I’d observe:

  • GiveWell is still a young organization. I believe we have attracted attention more for “bringing a different perspective and approach to giving” than for “being experts in global health” (the latter certainly does not describe us). We recognize that we’re taking some level of risk in moving into new areas, but we also believe that taking risks and staying open to new approaches is a major part of what makes GiveWell what it is and that part of “sticking to our knitting” is retaining this quality. We believe that GiveWell and the donors who use our research will be best served by our continuing to do whatever we believe will lead to the best giving opportunities, continuing to change course as much as necessary to facilitate this, and continuing to bring a different perspective and approach to giving – not continuing to focus on global health.
  • While we currently believe that global health is the most promising cause given the information available, we are not confident in this conclusion. We believe that other causes are potentially promising as well, and if we never investigate them, we will be failing in our mission of finding the best giving opportunities possible.
  • We are currently expanding our staff; we expect that we will invest at least as much time in global health over the next few years as over the last few (while also investing time in other causes).

The risk of losing transparency and systematicity as we move away from highly measurable interventions

We have written before that the cause of “global health and nutrition” seems unusually well-suited to meaningful measurement and metrics (by the standards of the nonprofit sector). When working within this cause, we have been able to be relatively clear about our process and about what distinguishes a recommended from a non-recommended charity. There is some risk that as we tackle other causes, such as meta-research, we will have less of an evidence base to go off of; our goals will be further out; we will have to use more intuition and may therefore become less systematic and transparent.

We believe this is a real risk. However, we also believe that (a) the best opportunities for good giving don’t necessarily lie in the domains with the highest measurability (though there is something to be said for measurability, all else equal); (b) we have reached the point where we feel we can explore causes such as meta-research in a way that – while not as systematic as our work on global health – will still include a great deal of public discussion of how we’re thinking, why we recommend what we do, what the key assumptions are in our thinking and recommendations, and how our projects progress over time.

We have long advocated that philanthropists should be more systematic and transparent in their work. If our own systematicity and transparency applies only to the cause where measurement is easiest, we won’t have a very strong case; if, however, we can consistently bring an unusually high level of systematicity and transparency to every cause we examine (even those that are less prone to measurement), we will have much more potential to change philanthropy broadly rather than just a single sector of it.

The benefits of our new direction

The above discussion addresses potential concerns over our new direction. We have previously discussed the substantial benefits: finding the best giving opportunities possible and reaching the largest donors possible, both of which are core to our mission. Dealing with the above issues – keeping a focus on recommending unrestricted funding when possible, covering new causes without overly detracting from continued progress on the causes we know well, and remaining systematic and transparent – will be a challenge, but we feel that it is well worth it, especially because we feel we are reaching the limits (for the moment) of our old approach. (We went through a large number of charities in 2011 and are skeptical that we will find new contenders for our top charities, using that basic methodology, anytime in the near future.)

We welcome further comments and criticisms regarding our new approach.

Meta-research

[Added August 27, 2014: GiveWell Labs is now known as the Open Philanthropy Project.]

We previously laid out our working set of focus areas for GiveWell Labs. This post further elaborates on the cause of “meta-research” and explains why meta-research is currently a very high priority for us – it is our #2 highest-priority focus area, after global health and nutrition.

Meta-research refers to improving the incentives in the academic world, to bring them more in line with producing work of maximal benefit to society. Below, we discuss

  • Problems and potential solutions we perceive for (the incentives within) development economics, the area of academia we’re currently most familiar with.
  • Some preliminary thoughts on the potential of meta-research interventions in other fields, particularly medicine.
  • Why we find meta-research so promising and high-priority as a cause.
  • Our plans at the moment for investigating meta-research further.

Meta-research issues for development economics

Through our work in trying to find top charities, we’ve examined a fair amount of the literature on how Western aid might contribute to reducing poverty, which we broadly refer to in this post as “development economics.” In doing so, we’ve noticed – and discussed – multiple ways in which development economics appears to be falling short of its full potential to generate useful knowledge:

Lack of adequate measures against publication bias. We have written extensively about publication bias, which refers broadly to the tendency of studies to be biased toward drawing the “right” conclusions (the conclusions the author would like to believe in, the conclusions the overall peer community would like to believe in, etc.). Publication bias can come both from “data mining” (an author interprets the data in many different ways and publishes/highlights the ways that point to the “right” conclusions) and the “file drawer problem” (studies that do not find the “right” conclusions have more difficulty getting published).

Conceptually, publication bias seems to us like one of the most fundamental threats to academia’s producing useful knowledge – it is a force that pushes research to “find” what is already believed (or what people want to believe), rather than what is true, in a way that is difficult for the users of research to detect. The existing studies on publication bias suggest that it is a major problem. There are potential solutions to publication bias – particularly preregistration – that appear underutilized (we have seen next to no use of preregistration in development economics).
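The mechanics of the “file drawer problem” can be illustrated with a toy simulation (this is an illustrative sketch, not any analysis from GiveWell or the studies cited): even when the true effect of an intervention is exactly zero, if only studies with statistically significant positive results get published, the average published effect will look substantially positive.

```python
import math
import random
import statistics

def simulate_file_drawer(n_studies=2000, n_per_study=50, seed=1):
    """Toy simulation of the file drawer problem.

    The true effect is zero, but a study is 'published' only if its
    estimated effect is significantly positive (roughly one-sided p < 0.05).
    Returns (mean effect across ALL studies, mean effect across PUBLISHED
    studies) -- the gap between the two is the publication bias.
    """
    rng = random.Random(seed)
    all_effects, published_effects = [], []
    for _ in range(n_studies):
        # Each study samples n_per_study observations; true mean is 0.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n_per_study)]
        mean = statistics.fmean(sample)
        std_err = statistics.stdev(sample) / math.sqrt(n_per_study)
        all_effects.append(mean)
        if mean / std_err > 1.96:  # only 'significant positive' results published
            published_effects.append(mean)
    return statistics.fmean(all_effects), statistics.fmean(published_effects)

overall, published = simulate_file_drawer()
# 'overall' is close to the true effect of zero, while 'published' is
# noticeably positive, despite no real effect existing at all.
```

Preregistration attacks exactly this mechanism: if the study plan is recorded (and, in the stronger proposals above, publication is promised) before results exist, the null results stay in the record instead of the file drawer.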

A funder recently forwarded us the following comment on a paper under review from a journal, which illustrates this problem:

Overall, I think the paper addresses very important research questions. The authors did well in trying to address issues of causality. But the lack of results has weakened the scope and the relevance of the paper. Unless the authors considerably generate new and positive results by looking say at more heterogeneous treatment effects, the paper cannot, in my view, be published in an academic journal such as the [journal in question].

Lack of open data and code, by which we mean the fact that academic authors rarely share the full details behind their calculations and claims. David Roodman wrote in 2010:

Not only do authors often keep their data and computer programs secret, but journals, whose job it is to assure quality, let them get away with it. For example, it took two relatively gargantuan efforts—Jonathan Morduch’s in the late 1990s, and mine (joining Jonathan) more recently—just to check the math in the Pitt and Khandker paper claiming that microcredit reduced poverty in Bangladesh. And it’s pretty clear now that the math was wrong.

The case he discusses turned out, in our opinion, to be an excellent illustration of the problems that can arise when authors do not share the full details of their calculations: a study was cited for years as some of the best available evidence regarding the impact of microfinance, but it ultimately turned out to be badly flawed, and later more rigorous studies contradicted its conclusions. (See our 2011 discussion of this case.)

Another example of the importance of open data was our 2011 uncovering of errors in a prominent cost-effectiveness estimate for deworming. This estimate had been public and cited since 2006, and it took us months of back-and-forth to obtain the full details behind it; at that point it turned out to contain multiple basic errors that caused it to be off by a factor of ~100.

The lack of open data is significant for reasons other than the difficulty of understanding and examining prominent findings. It also is significant because open data could be a public good for researchers; one data set could be used by many different researchers to generate multiple valuable findings. Currently, incentives to create such public goods seem weak.

Inadequate critical discussion and examination of prominent research results. The above two examples, in addition to illustrating open-data-related problems, illustrate another issue: it appears that there are few incentives within academia to critically examine and challenge others’ findings. And when critical examinations and challenges do occur, they can be difficult to find. Note that Roodman and Morduch’s critique (from the example above) was rejected by the journal that had published the study they were critiquing (the sole reviewer was the author of the critiqued study); as for the case of the DCP2 estimate, the critique came from GiveWell and has been published only on our blog (five years after the publication of the estimate).

Overall, our impression is that there is little incentive for academics to actively investigate and question each other’s findings, and that doing so is difficult due to the lack of open data (mentioned above).

Lack of replication. In addition to questioning the analysis of prominent studies, it would also be useful to replicate them: to try carrying out similar interventions, in similar contexts, and seeing whether similar results hold.

In the field of medicine, it is common for an intervention to be carried out in many different rigorous studies (for example, the literature on the effects of distributing insecticide-treated nets includes 22 different randomized controlled trials, and the programs executed are broadly similar though there are some differences). But in development economics, this practice is relatively rare.

More at a recent post by Berk Ozler.

General disconnect between “incentive to publish” and “incentive to contribute maximally to the stock of useful knowledge.” This point is vaguer, but we have heard it raised in multiple conversations with academics. In general, it seems that academics are encouraged to do a certain kind of work: work that results in frequent insights that can lead to publications. Other kinds of useful work may be under-rewarded:

  • Creating public goods for other researchers, such as public data sets (as discussed above)
  • Work whose main payoff is far in the future (for example, studies that take 20 years to generate the most important findings)
  • Studies that challenge widely held, fundamental assumptions in the field (and thus may have difficulty being published and cited despite having high value)
  • Studies whose findings are important from a policymaking or funding perspective, but not interesting (and thus difficult to publish) in terms of delivering surprising or generalizable new insights. For example, we have only been able to identify one randomized controlled trial of a program for improving rural point-of-source water quality, despite the popularity and importance of this type of intervention.

Potential interventions to address these issues.

We’ve had several conversations with academics and funders who work on development economics about how the above issues might be addressed. Most are directed at the specific problems we’ve listed above, though some are more generally in the category of “creating public goods for the research community as a whole.” Some of the more interesting ideas we’ve come across:

  • Funding efforts to promote the use of preregistration and data/code sharing, such as advocating that journals require these things of their publications (a journal might require preregistration and data/code sharing as a condition of publication) or that funders require these things of their grantees (a funder might require preregistration and data/code sharing from all funded studies).
  • Creating a “journal of good questions” – a journal that makes publication decisions on the basis of preregistered study plans rather than on the basis of results. The idea is to reward (with publication) good choices of topics and hypotheses and plans for investigating them, regardless of whether the results themselves turn out to be “interesting.” (We have previously discussed this idea.)
  • Funding a journal, or special issue of a journal, devoted to open-access data sets. Each data set would be accompanied by an explanation of its value and published as a “publication,” to be cited by any future publication drawing on that data set. This may improve incentives to create and publish useful open-access data sets, since scholars who did so could end up publishing the data sets as papers and having them cited.
  • Funding the creation of large-scale, general-purpose open-access data sets. Currently, researchers generally collect data for the purpose of conducting a particular study; an effort that aimed specifically to create a public good might be better suited to maximizing the general usefulness of the collected data, and may be able to do so at greater scale than would be realistic for a data set aiming to answer a particular question. For example, one might fund a long-term effort to track a representative population in a particular developing country, randomly separating the population into a large “control group” and a set of “treatment groups” that could be treated with different interventions of general interest (cash transfers, scholarships, nutrition programs, etc.)
  • Funding a journal, or special issue of a journal, devoted to discussion, critiques, re-analyses, etc. of existing studies, in order to put more emphasis on – and give more reward to – this activity.
  • Funding awards for excellent public data sets and for excellent replicative studies, reanalysis, and other work that causes either confirmation or re-examination of earlier studies’ findings.
  • Creating a group that specializes in high-quality systematic reviews that summarize the evidence on a particular question, giving heavier weight to more credible studies (similar to the work of the Cochrane Collaboration, which we discuss more below). These reviews might make it easier for funders, policymakers, etc. to make sense of research, and would also provide incentives to researchers to conduct their studies in more credible ways (employing preregistration, data/code sharing, etc.)
  • Creating a web application for sharing, discussing, and rating papers (discussed previously).
  • Awards for the most useful and important research from a policymaker’s or funder’s perspective (these could take practices like data sharing and registration into account as inputs into the credibility of the research).
  • Promoting an “alternative/supplemental reputation system” for papers (and potentially academics) directly based on the value of research from a funder’s or policymaker’s perspective, taking practices like data sharing and registration into account as inputs into the credibility of the research.
  • Creating an organization dedicated to taking quick action to take advantage of “shocks” (natural disasters, policy changes, etc.) that may provide opportunities to test hypotheses. When a “shock” occurred, the organization could poll relevant academics on what the important questions are and what data should be collected, record the academics’ predictions, and fund the collection of relevant data.

Meta-research for other fields

We aren’t as familiar with most fields of research as we are with development economics. However, we have some preliminary reason to think that many fields in academia have a similar story to development economics: multiple issues that keep them short of reaching their full potential to generate useful knowledge, and substantial room for interventions that may improve matters.

  • We recently met with representatives of the Cochrane Collaboration, a group that does systematic reviews of medical literature. We have found Cochrane’s work to be valuable and high-quality, and we were surprised to be told that the U.S. Cochrane Center raises very little in the way of unrestricted funding. After talking to more people in the field, we have formed a preliminary impression that there is little funding available for medical initiatives that cut across biological categories, including the sort of work that Cochrane does (which we would characterize as “meta-research” in the sense that it works toward improved incentives and higher value-added for research in general). We will be further investigating the Cochrane Collaboration’s funding situation and writing more about it in the future.
  • Informal conversations have given me the impression that many of the problems described above – particularly lack of adequate measures against publication bias, lack of preregistration, lack of data/code sharing, and general misalignment between what academics have incentives to study and what would be most valuable – apply to many other fields within the natural and social sciences.
  • I’ve also heard of other problems and ideas that are specific to other fields. For example, a friend of mine in the field of computer science stated to me that
    • There are too few literature reviews in computer science summarizing what is known and what remains to be determined within a particular sub-field, and the reviews that do exist quickly become out of date. More current literature reviews would make it easier for people to contribute to a sub-field without having to be at the right school (and thus in the right social network) for it.
    • There are some sub-fields in computer science that require testing different algorithms on data sets, such that the number of appropriate available data sets is highly limited. (For example, testing an algorithm for analyzing online social networks against a data set based on an actual online social network.) In practice, academics often design algorithms that are “over-fitted” to the data sets in use, such that their predictive power over new data sets is questionable. He proposed a set of centralized “canonical” data sets, each split into an “exploration” half and a “confirmation” half; while the “exploration” half would be open access, the “confirmation” half would be controlled by a central authority and academics would be able to test their algorithms on it only in a limited, controlled way (for example, perhaps each academic would be given 5 test runs per month). These data sets would constitute a public good making it easier to compare different academics’ algorithms in a meaningful way, both by reducing the risk of over-fitting and by bringing more standardization to the tests.
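The quota-limited “confirmation” authority he described can be sketched in a few lines. This is a hypothetical illustration only: the class, its names, and the 5-runs-per-month policy are inventions for the sketch, not an existing system.

```python
import time

class HoldoutServer:
    """Hypothetical sketch of the proposed 'confirmation' data set authority.

    The 'exploration' half of a canonical data set is public; the
    'confirmation' half is held by a central server that only returns an
    aggregate score and rations queries (e.g. 5 runs per researcher per
    month) to reduce the risk of over-fitting.
    """

    def __init__(self, confirmation_data, scorer, runs_per_month=5):
        self.data = confirmation_data   # never released directly
        self.scorer = scorer            # e.g. accuracy on held-out labels
        self.quota = runs_per_month
        self.usage = {}                 # (researcher, month) -> runs used

    def evaluate(self, researcher_id, algorithm):
        month = time.strftime("%Y-%m")
        key = (researcher_id, month)
        if self.usage.get(key, 0) >= self.quota:
            raise PermissionError("monthly confirmation-run quota exhausted")
        self.usage[key] = self.usage.get(key, 0) + 1
        # Only the aggregate score leaves the server, not the data itself.
        return self.scorer(algorithm, self.data)
```

Because every researcher’s algorithm is scored against the same controlled data under the same query limits, results become directly comparable across groups.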

Overall, the conversations I’ve had about meta-research – even with people who aren’t carefully selected, such as personal friends – have resulted in an unusually high density of strong opinions and novel (to me) ideas for bringing about positive change.

Why we find meta-research promising as a cause

High potential impact. As we wrote previously, it seems to us that many of philanthropy’s most impressive success stories come from funding scientific research, and that meta-research could have a leveraged impact in the world of scientific research.

Seeming neglect by other funders. We see multiple preliminary signs that this area is neglected by other funders:

  • In examining what foundations work on today, we haven’t seen anyone who appears to have a focus on meta-research. We recently attended a funders’ meeting on promoting preregistration and got the same impression from that meeting.
  • As mentioned above, informal conversations seem to lead more quickly to “ideas for projects that could be worked on but aren’t currently being worked on” than conversations in other domains.
  • As mentioned above, we are surprised by the U.S. Cochrane Center’s apparent low level of funding and need for more funds, and feel that this may point to meta-research as a neglected area.

Good learning opportunities. We have identified funding scientific research as an important area for further investigation. We believe it is one of the most promising areas in philanthropy and also one of the areas that we know the least about. We believe that investigating the question, “In what ways does the world of academic research function suboptimally?” will lead naturally to a better understanding of how that world operates and where within it we are most likely to find overlooked giving opportunities.

Our plan for further investigation of meta-research as an issue area

We are pursuing the following paths of further investigation:

  • Further investigation of the Cochrane Collaboration, starting with conversations with potential funders about why it is having trouble attracting funding. We believe that the Cochrane Collaboration may turn out to be an excellent giving opportunity, and if it does, that this will provide further evidence that meta-research is a promising and under-invested-in cause; on the other hand, if we discover reasons to doubt Cochrane’s effectiveness or need for more funds, this will likely be highly educational in thinking about meta-research in general.
  • Conversations with academics about meta-research-related issues. Some of the key questions we have been asking and will continue to ask:
    • Are there any ways in which the academic system is falling short of its full potential to generate useful knowledge? What are they?
    • What could be done about them?
    • Who is working on the solutions to these problems? Who would be the logical people for a funder to work with on them?
    • Is there any research that you wish you could do but can’t get funded to do? Is there any research that you generally feel ought to be taking place and isn’t? If so, why is this happening?
    • Are there areas of research that you think are overdone or overinvested in? Why do you think this is?
    • What do you think of the ideas we’ve accumulated so far? To the extent that you find one or more to be good ideas, whom would you recommend working with to move forward on or further investigate them?
    • Whom else would you recommend speaking with?
  • Trying to get a bird’s-eye view of the world of academic research, i.e., a view of what the various fields are, how large they are (in terms of people and funding), and where the funding for them comes from. We hope that this bird’s-eye view will help us be more strategic about which fields best combine “high potential” with “major room for interventions to improve their value-added,” and thus to pick fields to focus on for meta-research in a more systematic manner than we’ve done so far.

Giving cash versus giving bednets

We recently published a new review of GiveDirectly, a “standout” charity that gives cash directly to poor people in Kenya. As we were going through the process of discussing and vetting the new review, I found myself wondering how I would defend my preference to donate to distribute insecticide-treated bednets (ITNs) against a serious advocate for cash transfers. We’ve written before about the theoretical appeal of giving out cash, and the fact that there is a promising charity doing so renews the question of whether we should.

I continue to worry about the potential “paternalism” of giving bednets rather than cash (i.e., the implication that donors are making decisions on behalf of recipients). I believe that by default, we should assume that recipients are best positioned to make their own decisions. However, I see a few reasons to think bednets can overcome this presumption:

  • The positive externalities of ITNs
  • The fact that bednets protect children rather than adults
  • The fact that ITNs may be unavailable in local markets or that people may reasonably expect to be given them for free

I address each of these reasons in more depth below. Note, however, that this discussion is meant to be primarily about the theoretical question of giving cash versus giving bednets; a more practical discussion of giving to the Against Malaria Foundation versus giving to GiveDirectly would focus on the specifics of the two organizations.

The positive externalities of ITNs

We discussed the evidence that ITNs have benefits for community members other than those using the ITNs in our review of the evidence for ITNs. After speaking with several malaria scholars and reviewing the literature, we concluded:

  • The evidence for the efficacy of ITNs is based on studies of universal coverage programs, not targeted programs. In particular, all five studies relevant to the impact of ITNs on mortality involved distribution of ITNs to the community at large, not targeted coverage… Thus, there is little basis available for determining how the impact of ITNs divides between individual-level effects (protection of the person sleeping under the net, due to blockage of mosquitoes) and community-level effects (protection of everyone in communities where ITN coverage is high, due to reduction in the number of infected mosquitoes, caused either by mosquitoes’ being killed by insecticide or by mosquitoes’ becoming exhausted when they have trouble finding a host).
  • The people we spoke to all believe that the community-level effect of ITNs is likely to be a significant component of their effect, though none believe that this effect has been conclusively demonstrated or well quantified.
  • There is some empirical evidence suggesting that the community-level impact of ITNs is significant.

In our main model of the cost-effectiveness of distributing ITNs (XLS), we assumed that 50% of the benefits of ITNs come from the total community coverage of ITNs.

To the extent that ITNs have positive externalities, private actors may underinvest in them, meaning that it may be a good idea to distribute them freely even if individuals would choose not to purchase them at the available price. More generally, since we care about helping whole populations and not any particular specific individual, providing “public goods” of this sort amplifies our impact relative to giving the same amount of money to individuals.

Although it is conceptually possible that giving a large number of individuals small cash grants also has positive externalities, e.g. by boosting the local economy, we haven’t seen any evidence of this, and we doubt that the magnitude of the externality would be as large.

Bednets protect children rather than adults

One of the central reasons that I appreciate cash transfers is that they avoid paternalism. But sometimes, especially with regard to children, paternalism seems morally justifiable. I believe this is one of those cases.

Although AMF distributes ITNs universally, not just to children, the main benefits of ITNs—averting mortality—accrue to children under the age of 5. Children under the age of 5 lack bargaining power, income, and access to credit, not to mention the cognitive faculties to make decisions about their own long-term welfare. Accordingly, purchasing something that is reasonably likely to keep young children alive, even if they don’t or can’t decide to purchase it for themselves, seems to be a justifiable form of paternalism. In general, paternalism towards such young children is unobjectionable.

By distributing bednets, we might be spending money to benefit kids in a way that their parents wouldn’t spend it if we gave it to them instead. Given the magnitude of the benefits to the children, this seems to be justified.

People may not purchase ITNs because they are unavailable in local markets or because they expect to be given them for free

This point is more anecdotal, but Natalie, Holden and I remember being told while we were in Malawi that long-lasting insecticide-treated bednets, of the sort that AMF distributes, are essentially not available for purchase in local markets. Unfortunately, this point is not in our published notes (DOC) from the conversation in which we recall hearing it.

In another case, an RCT in Kenya in which researchers experimentally subsidized the cost of bednets (PDF) found that even very small increases in prices led to substantial reductions in bednet purchases by mothers (e.g. charging $0.60 led to a 60% reduction in take-up). Two different people told us in off-the-record conversations that they thought this occurred because the mothers offered subsidized bednets believed that they would be able to acquire free nets at some other point. There have been periodic free ITN distributions in many sub-Saharan African countries over the last decade, and the international consensus seems to be that governments should distribute ITNs free of charge in malaria-endemic areas. Accordingly, it should not be especially surprising that citizens may expect bednets to be provided free of charge, and may not move to purchase them even if they are available at subsidized prices in the marketplace. If we reasonably expect to be given something for free in a relatively short time window, why buy it now?

This wouldn’t necessarily have been the case if philanthropy had never funded bednets, but having started down this path, I think it provides another consideration in favor of continuing. If we could credibly and cheaply communicate that no more bednets would be forthcoming, this consideration wouldn’t matter, but there is no obvious way to do so.

This is something to keep in mind in the future: philanthropic funding decisions may create an unanticipated form of “lock-in,” in which future philanthropists become effectively committed to continued funding, even if it would not have been necessary in a counterfactual world of no philanthropic support. Although unlikely to be crucial, this consideration may counsel against undertaking certain marginal philanthropic activities.

Conclusion

I think that in order to avoid paternalism, philanthropists working to improve the lives of the global poor should have a fairly strong presumption in favor of cash transfers, and that those who advocate other strategies should have a convincing story to tell about why they beat cash. Above, I’ve tried to justify my view that bednet distributions are one of those philanthropic strategies that may beat cash. In searching for future top charities, I’d like to see a similarly strong case.

Update on Against Malaria Foundation’s costs

New cost estimates for AMF’s 2012 distributions

In a blog post in February, we noted that our estimates had missed some costs incurred by AMF’s distribution partner, Concern Universal. We undertook an assessment of these costs through discussion with Concern Universal.

In the course of our assessment, we revisited our estimates of all other distribution costs as well, and decided that the most informative cost estimates for donors are the projected costs of 2012 distributions. The reason is that as of November 2011, AMF shifted to larger-scale distributions, which it will continue in 2012; these distributions are more cost-effective than previous ones.

We have now calculated the 2012 projected costs. The total cost per net is lower than our previous estimate, even including the extra distribution partner costs mentioned above. We estimate a total cost of $5.54 per net for 2012 distributions, compared to a previous estimated total cost of $6.31 per net. This figure includes estimates of all costs incurred by all organizations participating in the distribution, including AMF, AMF’s distribution partners and local actors that work with AMF’s distribution partners.

The bulk of the change is due to the fact that AMF expects to distribute a million nets in 2012 – over twice the number it distributed in any previous year – while its organizational costs are likely to remain stable. Another contributor to the lower cost is that the equivalent cost of the donated services that AMF receives has decreased (both in the past year and projected for 2012). See our updated AMF review for full details.

We also calculated the marginal cost per net, which is projected to be $5.15 for 2012. The marginal cost excludes AMF organizational costs, because we believe these are unlikely to rise as additional nets are distributed (details in our updated AMF review). The marginal cost per net is slightly higher than our previous estimate (which was about $5 per net), since it includes an extra $0.15 in costs incurred by the distribution partners (for details on these costs, see below).
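The relationship between these figures can be checked with simple arithmetic. Note that the per-net organizational share is our inference from the reported totals, not a figure AMF publishes directly:

```python
# Reconstruction of the reported per-net cost figures.
marginal_cost_previous = 5.00  # prior marginal estimate (~$5/net)
partner_costs = 0.15           # newly identified distribution-partner costs
marginal_cost_2012 = marginal_cost_previous + partner_costs  # $5.15

total_cost_2012 = 5.54         # reported all-in cost per net for 2012
# AMF's own organizational costs, spread over ~1 million nets (inferred):
org_cost_per_net = total_cost_2012 - marginal_cost_2012      # ~$0.39
```

Because the ~$0.39 organizational share is roughly fixed in aggregate, doubling the number of nets distributed shrinks it on a per-net basis, which is the main driver of the drop from $6.31 to $5.54.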

Updated cost per life saved

Using the 2012 projected costs per LLIN, we estimate the cost per child life saved through an AMF LLIN distribution at about $1,600 using the marginal cost ($5.15 per LLIN) and about $1,700 using the total cost ($5.54 per LLIN).

See our spreadsheet analysis for details of our cost per life saved estimate.

Missing distribution partner costs

We have now gathered information on the missing costs from AMF’s distribution partner, Concern Universal. These missing costs have added an additional $0.15 per net. They consist of costs for salaries and office overhead that were incurred by both Concern Universal and by the Malawi government (which pays the salaries of health workers who assisted in the net distribution). Concern Universal did not initially tell us about these costs because they were costs that it incurred regardless of whether the distribution took place. However, we prefer to include all costs incurred to carry out a project, because we believe that this gives the best view of what it costs to achieve a particular impact (such as saving a life), and also avoids the lack of clarity and complications of leverage in charity.

Full details on these costs are available in our costs spreadsheet and our updated AMF review.

Millennium Villages Project

Several people have emailed us in the past few days asking about the new evaluation of the Millennium Villages Project (MVP), published in The Lancet last week. It has received significant attention in the development blogosphere (see, e.g., here, here, here, and here).

The evaluation argues that the MVP was responsible for a substantial drop in child mortality. However, we see a number of problems.

Summary

  • Even if the evaluation’s conclusions are taken at face value, insecticide-treated net distribution alone appears to account for 42% of the total effect on child mortality (though there is high uncertainty).
  • The MVP is much more expensive than insecticide-treated net distribution – around 45x on a per-person basis. Therefore, we believe that in order to make an argument that the MVP is the best available use of dollars, one must demonstrate effects far greater than those attained through distributing bednets. We believe the evaluation falls short on this front, and that the mortality averted by the MVP could have been averted at about 1/35th of the cost by simply distributing bednets. Note that the evaluation does not claim statistically significant impacts beyond health; all five of the reported statistically significant impacts are fairly closely connected to childhood mortality reduction.
  • There are a number of other issues with the evaluation, such that we believe the child mortality effect should not be taken at face value. We have substantial concerns about both selection bias and publication bias. In addition, a mathematical error, discovered by the World Bank’s Gabriel Demombynes and Espen Beer Prydz, overstates the reduction in child mortality, and the corrected effect appears similar to the reduction in child mortality for the countries as a whole that the MVP works in (though still greater than the reduction in mortality for the villages the MVP chose as comparisons for the evaluation). The MVP published a partial retraction with respect to this error (PDF) today.

We would guess that the MVP has some positive effects in the villages it works in – but for a project that costs as much per person as the MVP, that isn’t enough. We don’t believe the MVP has demonstrated cost-effective or sustainable benefits. We also don’t believe it has lived up (so far) to its hopes of being a “proof of concept” that can shed new light on debates over poverty.

Also see coverage of the Millennium Villages Project by David Barry, Michael Clemens, Lee Crawfurd, and Gabriel Demombynes and Espen Beer Prydz, much of which we’ve found helpful in thinking about the MVP and some of which we cite in this post.

Background

The Millennium Villages Project attempts to make significant progress towards achieving the Millennium Development Goals through a package of intensive interventions in 13 clusters of villages in rural Africa. It further aims to serve as a demonstration of the potential of integrated development efforts to cost-effectively improve lives in rural Africa. In its own words, the MVP states, “Millennium Villages are designed to demonstrate how the Millennium Development Goals can be met in rural Africa over 10 years through integrated, community-led development at very low cost.”

The drop in child mortality, and the comparison to insecticide-treated nets

The new evaluation concludes:

“Baseline levels of MDG-related spending averaged $27 per head, increasing to $116 by year 3 of which $25 was spent on health. After 3 years, reductions in poverty, food insecurity, stunting, and malaria parasitaemia were reported across nine Millennium Village sites. Access to improved water and sanitation increased, along with coverage for many maternal-child health interventions. Mortality rates in children younger than 5 years of age decreased by 22% in Millennium Village sites relative to baseline (absolute decrease 25 deaths per 1000 livebirths, p=0.015) and 32% relative to matched comparison sites (30 deaths per 1000 livebirths, p=0.033). The average annual rate of reduction of mortality in children younger than 5 years of age was three-times faster in Millennium Village sites than in the most recent 10-year national rural trends (7.8% vs 2.6%).”

In a later section, we question the size and robustness of this conclusion; here we argue that even taken at face value, it does not imply good cost-effectiveness for the MVP compared to insecticide-treated net distribution alone.

The MVP’s own accounting puts the cost per person served in the third year of treatment, including only field costs, at $116 (see the quote, above). Assuming linear ramp-up of the program, we take the average of baseline ($27/person) and third-year ($116/person) spending and estimate that the MVP spent roughly $72 per person per year during the first three years of the project. Michael Clemens has argued that their spending amounts to “roughly 100% of local income per capita.”

We should expect that amount of spending to make a difference in the short term, especially since some of it is going to cheap, proven interventions, like distributing bednets. In fact, it appears that the biggest and most robust impact of the 18 reported was increasing the usage of bednets.

The proportion of under-5 children sleeping under bednets in the MVP villages in year 3 was 36.7 percentage points higher than the proportion in the comparison villages. The Cochrane Review on bednet distribution estimates that “5.53 deaths [are] averted per 1000 children protected per year.” (See note.) If we assume that 80% of bednets distributed are used, the additional bednet usage rate (36.7 percentage points) found in MVP’s survey indicates that the MVP’s program led to 46 percentage points (36.7 / 80%) more children receiving bednets than in the comparison villages. (Note that using a figure lower than 80% for usage would imply a higher impact of bednets, because of the way the estimate works.) Therefore, we’d estimate that for every 1000 children living in an MVP village, the bednet portion of the MVP’s program alone would be expected to save 2.54 lives per year ((5.53 lives saved per year / 1000 children who receive a bednet) * 0.46 additional children receiving a bednet per child in an MVP village). Said another way, the bednet effect of the MVP program would be expected to reduce a child’s chance of dying by his or her fifth birthday by roughly 1.27 percentage points (a 0.254% reduction in mortality per year over 5 years). The total reduction in under-5 mortality observed in the evaluation was 3.05 percentage points (30.5 per 1000 live births). Thus the expected effect of increased bednet usage accounts for 42% of the observed decrease in under-5 mortality, and is within the 95% confidence interval for the total under-5 mortality reduction. (We can’t say with 95% confidence that the true total effect of the MVP on child mortality is larger than its effect due to increased bednet distribution alone.)
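The chain of arithmetic above can be reproduced directly. All inputs come from the text; the 80% usage rate is the stated assumption:

```python
# Reconstruction of the bednet-effect calculation.
deaths_averted_per_1000_protected = 5.53  # Cochrane estimate, per year
usage_rate = 0.80                         # assumed share of distributed nets actually used
extra_usage_pp = 36.7                     # pp increase in under-5s sleeping under nets

# Implied increase in the share of children *receiving* nets (~0.46):
extra_coverage = extra_usage_pp / 100 / usage_rate

# Expected lives saved per 1000 children per year in an MVP village (~2.54):
lives_saved_per_1000_children_per_year = (
    deaths_averted_per_1000_protected * extra_coverage
)

# Reduction in the chance of dying by age 5 (~1.27 percentage points):
mortality_reduction_pp = lives_saved_per_1000_children_per_year / 10 * 5

# Share of the observed 3.05 pp reduction explained by bednets (~42%):
observed_reduction_pp = 3.05
share_explained = mortality_reduction_pp / observed_reduction_pp
```

Note the lever mentioned in the text: lowering `usage_rate` raises `extra_coverage`, and therefore raises the share of the observed mortality drop attributable to bednets.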

Insecticide-treated nets cost roughly $6.31 (including all costs) to distribute and cover an average of 1.8 people and last 2.22 years (according to our best estimates). That works out to about $1.58 per person per year. At $72 per person per year, the MVP costs about 45 times as much (on a per-person-per-year basis) as net distribution. Although we would expect bednets to achieve a smaller effect on mortality than MVP on a per-person-per-year basis, we estimate that the MVP could have attained the same mortality reduction at ~1/35 of the cost by simply distributing bednets (see our spreadsheet for details of the calculation).
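As a check on the 45x figure (inputs from the text; rounding accounts for small discrepancies):

```python
# Cost comparison: ITN distribution vs. the MVP, per person per year.
cost_per_net = 6.31        # all-in cost per ITN
people_per_net = 1.8       # average people covered per net
net_lifespan_years = 2.22  # average net lifespan

itn_cost_per_person_year = cost_per_net / (people_per_net * net_lifespan_years)  # ~$1.58

mvp_cost_per_person_year = 72.0  # average of $27 baseline and $116 year-3 spending

ratio = mvp_cost_per_person_year / itn_cost_per_person_year  # ~45x
```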

If the MVP evaluation had shown other impressive impacts, then perhaps the higher costs would be well justified, but 3 of the 5 statistically significant results from the study are on bednet usage, malaria prevalence, and child mortality. (The other two are access to improved sanitation and skilled birth attendance, both of which would also be expected to manifest benefits in terms of reductions in under-5 mortality.) There were no statistically significant benefits in terms of poverty or education.

Other issues with the MVP’s evaluation

Lack of randomization in selecting treatment vs. comparison villages. The evaluation uses a comparison group of villages that were selected non-randomly at the time of follow-up, so many of the main conclusions of the evaluation are drawn based simply on comparing the status of the treated and non-treated villages in year 3 of the intervention, without controlling for potential initial differences between the two groups. If the control villages started at a lower baseline level and improved over time at exactly the same rate as the treatment villages, then the treatment would appear to have an impact equal to the initial difference, before the intervention began, between the treatment and control groups, even though it actually had none. Even in cases in which baseline data is available from the control groups, it is possible that the group of villages selected as controls could improve more slowly than the treatment group for reasons having nothing to do with the treatment. Accordingly, there are strong structural reasons to regard the evaluation’s claims with skepticism.

Michael Clemens has written more about this issue here and here. We agree with his argument that the MVP could and seemingly should have randomized its selection of treatment vs. control villages instead, especially given its goal of serving as a proof of concept.

Publication bias concerns. The authors report 18 outcomes from the evaluation; results on 13 of them are statistically insignificant at the standard 95% confidence level (including all of the measures of poverty and education). Even if results were entirely random, we’d expect roughly one statistically significant result out of 18 comparisons. The authors find five statistically significant results, which implies that the results are unlikely to be just due to chance, but they could have explicitly addressed the fact that they checked a number of hypotheses and performed statistical adjustments for this fact, which would have increased our confidence in their results. The authors did register the study with ClinicalTrials.gov, but the protocol was first submitted in May 2010, long after the data had been collected for this study.
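Under the (admittedly strong) assumption that the 18 tests are independent and every null hypothesis is true, both the “roughly one” expectation and the improbability of five significant results follow from the binomial distribution:

```python
from math import comb

# 18 outcomes tested at the standard 95% confidence level.
n, alpha = 18, 0.05

# Expected significant results under the null: ~0.9, i.e. "roughly one".
expected_significant = n * alpha

# Probability of 5 or more significant results by chance alone,
# assuming independent tests (the outcomes are likely correlated,
# so this is only an illustration).
p_five_or_more = sum(
    comb(n, k) * alpha**k * (1 - alpha) ** (n - k) for k in range(5, n + 1)
)
```

The probability comes out well under 1%, which is why the five significant results are unlikely to be pure chance; formal multiple-comparison adjustments would nonetheless have made the reported effects easier to trust.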

We also note that the registration lists 22 outcomes, but the authors only report results for 18 in the paper. They explain the discrepancy as follows: “The outcome of antimalarial treatment for children younger than 5 years of age was excluded because new WHO guidelines for rapid testing and treatment at the household level invalidate questions used to construct this indicator. Questions on exclusive breast-feeding, the introduction of complementary feeding, and appropriate pneumonia treatment were not captured in our year 3 assessments.” But this only accounts for three of the four missing outcomes. This does not explain why the authors do not report results for mid-upper arm circumference (a measure of malnutrition), which the ClinicalTrials.gov protocol said they would collect.

Mathematical error in estimating the magnitude of the child-mortality drop.

Note: the MVP published a partial retraction with respect to this error (PDF) today.

At the World Bank’s Development Impact Blog, Gabriel Demombynes and Espen Beer Prydz point out a mathematical error in the evaluation’s claim that “The average annual rate of reduction of mortality in children younger than 5 years of age was three-times faster in Millennium Village sites than in the most recent 10-year national rural trends (7.8% vs 2.6%).”

Essentially, the authors used the wrong time frame in calculating the decline in the Millennium Villages: to estimate the per-year decline in childhood mortality, they divided the difference between average childhood mortality during the treatment period (3 years long) and during the previous 5-year baseline period by three. As Demombynes and Prydz point out, however, this mistakenly assumes that the time difference between the 3-year average and the 5-year average is 3 years, when it is in fact 4 years:

[When we originally published this post in 2012, we included a link here to an image stored on a World Bank web server. In 2020, we learned that this image link was broken and were unable to successfully replace it. We apologize for the omission of this image.]

This shifts the annual decline in child mortality from 7.8% to 5.9% (though see David Barry and Michael Clemens’ comments here for more discussion of the assumptions behind these calculations).
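The correction can be illustrated using the reported 22% relative drop in under-5 mortality. This is only an approximation: the paper’s exact inputs differ slightly, so these rates roughly, rather than exactly, match the published 7.8% and 5.9% figures.

```python
# Illustrative reconstruction of the timing error.
relative_drop = 0.22                 # reported relative decline vs. baseline
survival_ratio = 1 - relative_drop   # end-period mortality / baseline mortality

# Annualized rate of decline, compounding over the assumed window:
annual_rate_over_3yr = 1 - survival_ratio ** (1 / 3)  # paper's (incorrect) 3-year window
annual_rate_over_4yr = 1 - survival_ratio ** (1 / 4)  # corrected 4-year gap between averages
```

Spreading the same total decline over four years instead of three necessarily yields a lower annual rate, which is the substance of the correction.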

The adjusted figure for child mortality improvement is no better for the MVP villages than for national trends. Demombynes and Prydz go on to argue that using a more appropriate and up-to-date data set for national trends in childhood mortality yields an average trend of -6.4% a year, better than in the Millennium Villages, and that the average reductions in rural areas are even higher.

Note, however, that this argument is saying that the comparison group in the study is not representative of the broader trend, not that the Millennium Villages did not improve relative to the comparison group.

Conclusion

The Millennium Villages Project is a large, multi-sectoral, long-term set of interventions. The new evaluation suggests, though it does not prove, that the MVP is making progress in reducing childhood mortality, but at great cost. It does not provide any evidence that the MVP is reducing poverty or improving education, its other main goals. These results from the first three years of implementation, if taken seriously, are discouraging. The primary benefits of the intervention so far–reductions in childhood mortality–could have been achieved at much lower costs by simply distributing bednets.

Note: the Cochrane estimate of 5.53 deaths averted per 1,000 children protected per year does not assume perfect usage. Our examination of the studies that went into the Cochrane estimate found that most studies report usage rates in the range of 60-80%, though some report 90%+ usage.