# Response to concerns about GiveWell’s spillovers analysis

Last week, we published an updated analysis on “spillover” effects of GiveDirectly’s cash transfer program: i.e., effects that cash transfers may have on people who don’t receive cash transfers but who live near those who do.[1] We concluded: “[O]ur best guess is that negative or positive spillover effects of cash are minimal on net.” (More)

Economist Berk Özler posted a series of tweets expressing concern over GiveWell’s research process for this report. We understood his major questions to be:

1. Why did GiveWell publish its analysis on spillover effects before a key study it relied on was public? Is this consistent with GiveWell’s commitment to transparency? Has GiveWell done this in other cases?
2. Why did GiveWell place little weight on some papers in its analysis of spillover effects?
3. Why did GiveWell’s analysis of spillovers focus on effects on consumption? Does this imply that GiveWell does not value effects on other outcomes?

These questions apply to GiveWell’s research process generally, not just our spillovers analysis, so the discussion below addresses topics such as:

• When do our recommendations rely on private information, and why?
• How do we decide on which evidence to review in our analyses of charities’ impact?
• How do we decide which outcomes to include in our cost-effectiveness analyses?

Finally, this feedback led us to realize a communication mistake we made: our initial report did not communicate as clearly as it should have that we were specifically estimating spillovers of GiveDirectly’s current program, not commenting on spillovers of cash transfers in general. We will now revise the report to clarify this.

Note: It may be difficult to follow some of the details of this post without having read our report on the spillover effects of GiveDirectly’s cash transfers.

## Summary

In brief, our responses to Özler’s questions are:

• Why did GiveWell publish its analysis on spillover effects before a key paper it relied on was public? One of our major goals is to allocate money to charities as effectively as possible. Sometimes, research we learn about cannot yet be made public but we believe it should affect our recommendations. In these cases, we incorporate the private information into our recommendations and we are explicit about how it is affecting our views. We expect that private results may be more likely to change but nonetheless believe that they contain useful information; we believe ignoring such results because they are private would lead us to reach less accurate conclusions. For another recent example of an important conclusion that relied on private results, see our update on the preliminary (private) results from a study on No Lean Season, which was key to the decision to remove No Lean Season as a top charity in 2018. We discuss other examples below.
• Why did GiveWell place little weight on some papers in its analysis of spillover effects? In general, our analyses aim to estimate the impact of programs as implemented by particular charities. The goal of our spillovers analysis is to make our best guess about the size of spillover effects caused by GiveDirectly’s programs in Kenya, Uganda, and Rwanda. We are not trying to communicate an opinion on the size of spillover effects of cash transfers in other countries or in development economics more broadly. Therefore, our analysis places substantially more weight on studies that are most similar to GiveDirectly’s program on basic characteristics such as geographic location and program type. Correspondingly, we place little weight on papers that do not meet these criteria. However, we’d welcome additional information that would help us improve our future decisionmaking about which papers to put the most weight on in our analyses.
• Why did GiveWell’s analysis of spillovers focus on effects on consumption? Our cost-effectiveness models focus on key outcomes that we expect to drive the bulk of the welfare effects of a program. In the case of our spillovers analysis, we believe the two most relevant outcomes for estimating spillover effects on welfare are consumption and subjective well-being. We chose to focus on consumption effects in large part because (a) this is consistent with how we model the impacts of other programs, such as deworming, and (b) distinguishing effects on subjective well-being from effects on consumption in a way that avoids double-counting benefits was too complex to do in the time we had available. It is possible that additional work on subjective well-being measures would meaningfully change how we assess benefits of programs (for this program and potentially others). This is a question we plan to return to in the future.

As noted above, our current best guess is that negative or positive spillover effects of GiveDirectly’s cash transfers are minimal on net. However, we emphasize that our conclusion at this point is very tentative, and we hope to update our views next year if there is more public discussion or research on the areas of uncertainty highlighted in our analysis and/or if public debate about the studies covered in our report raises major issues we had not previously considered.

Details follow.

### Why did GiveWell publish its analysis on spillover effects before a key paper it relied on was public?

In our analysis of the spillover effects of GiveDirectly’s cash transfer program, we place substantial weight on GiveDirectly’s “general equilibrium” (GE) study (as we noted we would do in May 2018,[2] prior to seeing the study’s results) because:

• it is the study with the largest sample size,
• its methodology was designed to estimate both across-village and within-village spillover effects, and
• it is a direct study of a version of GiveDirectly’s program.

The details of this study are currently private, though we were able to share the headline results and methodology when we published our report.

This represents one example of a general policy we follow, which is to be willing to compromise to some degree on transparency in order to use the best information available to us to improve the quality of our recommendations. More on the reasoning behind this policy:

• Since our recommendations affect the allocation of over $100 million each year, the value of improving our recommendations by factoring in the best information (even if private) can be high. Every November we publish updates to our recommended charities so that donors giving in December and January (when the bulk of charitable giving occurs) can act on the most up-to-date information.
• We have ongoing communications with charities and researchers to learn about new information that could affect our recommendations. Private information (both positive and negative) has been important to our views on a number of occasions. Beyond the example of our spillovers analysis, early private results were key to our views on topics including:
  • No Lean Season in 2018 (negative result)[3]
  • Deworming in 2017 (positive result)[4]
  • Insecticide resistance in 2016 (modeling study)[5]
  • Development Media International in 2015 (negative result)[6]
  • Living Goods in 2014 (positive result)[7]
• Note that in all of the above cases we worked with the relevant researchers to get permission to publicly share basic information about the results we were relying on, as we did in the case of the GE study.
• In all cases, we expected that full results would be made public in the future. Our understanding is that oftentimes early headline results from studies can be shared publicly while it may take substantially longer to publicly release full working papers because working papers are time-intensive to produce. We would be more hesitant to rely on a study that has been private for an unusually long period of time unless there were a good reason for it.
• However, relying on private studies conflicts to some extent with our goal to be transparent. In particular, we believe two major downsides of our policy with respect to private information are (a) early private results are more likely to contain errors, and (b) we are not able to benefit from public scrutiny and discussion of the research. We would have ideally seen a robust public discussion of the GE study before we released our recommendations in November, but the timeline for the public release of GE study results did not allow that. We look forward to closely following the public debate in the future and plan to update our views based on what we learn.
• Despite these limitations, we have generally found early, private results to be predictive of final, public results. This, combined with the fact that we believe private results have improved our recommendations on a number of occasions, leads us to believe that the benefits of our current policy on using private information outweigh the costs.

A few other notes:

• Although we provide a number of cases above in which we relied on private information, the vast majority of the key information we rely on for our charity recommendations is public.
• When private information is shared with us that implies a positive update about a charity’s program, we try to be especially attentive about potential conflicts of interest. In this case, there is potential for concern because the GE study was co-authored by Paul Niehaus, Chairman of GiveDirectly. We chose not to substantially limit the weight we place on the GE study because (a) a detailed pre-analysis plan was submitted for this study, and (b) three of the four co-authors (Ted Miguel, Johannes Haushofer, and Michael Walker) do not have an affiliation with GiveDirectly. We have no reason to believe that GiveDirectly’s involvement altered the analysis undertaken. In addition, the GE study team informed us that Paul Niehaus recused himself from final decisions about what the team communicated to GiveWell.
• When we published our report (about one week ago), we expected that some additional analysis from the GE study would be shared publicly soon (which we still expect). We do not yet have an exact date and do not know precisely what content will be shared (though we expect it to be similar to what was shared with us privately).

### Why did GiveWell place little weight on some papers in its analysis of spillover effects?

Some general context on GiveWell’s research that we think is useful for understanding our approach in this case is:

• We are typically estimating the impact of programs as implemented by particular charities, not aiming to publish formal meta-analyses about program areas as a whole. As noted above, we believe we should have communicated more clearly about this in our original report on spillovers and we will revise the report to clarify.
• We focus our limited time on the research that we think is most likely to affect our decisions, so our style of analysis is often different from what is typically seen in academia. (We think the differences in the kind of work we do are captured well by a relevant Rachel Glennerster blog post.)

Consistent with the above, the goal of our spillovers analysis was to make a best guess for the size of the spillover effect of GiveDirectly’s (GD’s) program in Kenya, Uganda, and Rwanda specifically.[8] We are not trying to communicate an opinion on the size of spillover effects of cash transfers in other countries or development economics more broadly. If we were trying to do the latter, we would have considered a much wider range of literature.

We expect that studies that are most similar to GD’s program on basic characteristics such as geographic location and program type will be most useful for predicting spillovers in the GD context. So, we prioritize looking at studies that 1) took place in sub-Saharan Africa, and 2) evaluate unconditional cash transfer programs (further explanation in footnote).[9] We would welcome additional engagement on this topic: that is, (a) to what extent should we believe that effects estimated in studies not meeting these criteria would apply to GD’s cash transfer programs, and (b) are there other criteria that we should have used?

A further factor that causes us to put more weight on the five studies we chose to review deeply is that they all study transfers distributed by GD, which we see as increasing their relevance to GD’s current work (though the specifics of the programs that were studied vary from GD’s current program). We believe that studies that do not meet the above criteria could affect our views on spillovers of GD’s program to some extent, but they would receive lower weight in our conclusions since they are less directly relevant to GD’s program.

We saw further review of studies that did not meet the above criteria as lower priority than a number of other analyses that we think would be more likely to shift our bottom line estimate of the spillovers of GD’s program. Even though we focused on the subset of studies most relevant to GD’s program, we were not able to combine their results to create a reasonable explicit model of spillover effects because we found that key questions were not answered by the available data (our attempt at an explicit model is in the following footnote).[10] One fundamental challenge is that we are trying to apply estimates of “within-village” spillover effects to predict across-village spillover effects.[11] Additional complications are described here.

More on why we placed little weight on particular studies that Özler highlighted in his comments:[12]

• We placed little weight on the following papers in our initial analysis for the reasons given in parentheses: Angelucci & DiGiorgi 2009 (conditional transfers, study took place in Mexico), Cunha et al. 2017 (study took place in Mexico), Filmer et al. 2018 (conditional transfers, study took place in the Philippines), and Baird, de Hoop, and Özler 2013 (mix of conditional and unconditional transfers).
• In addition, the estimates of mental health effects on teenage schoolgirls in Baird, de Hoop, and Özler 2013 seem like they would be relatively less useful for predicting the impacts of spillovers from cash transfers given to households, particularly in villages where almost all households receive transfers as is often the case in GD’s program.[13]
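
The two screening criteria described above (location in sub-Saharan Africa, unconditional transfers) can be illustrated with a minimal sketch. This is not GiveWell’s actual process or tooling: the `Study` class, its field names, and `reasons_for_low_weight` are hypothetical, and the study attributes encode only the reasons stated in this post (`None` means the post does not state that attribute).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Study:
    """Hypothetical record; fields mirror the two criteria named in the post."""
    name: str
    in_sub_saharan_africa: Optional[bool]  # None = not stated in this post
    transfer_type: Optional[str]  # "unconditional", "conditional", "mixed", or None


def reasons_for_low_weight(study: Study) -> List[str]:
    """Return which of the two criteria a study is known to miss."""
    reasons = []
    if study.in_sub_saharan_africa is False:
        reasons.append("took place outside sub-Saharan Africa")
    if study.transfer_type in ("conditional", "mixed"):
        reasons.append(f"{study.transfer_type} transfers")
    return reasons


# Attributes taken only from the parenthetical reasons given in the post.
STUDIES = [
    Study("Angelucci & DiGiorgi 2009", False, "conditional"),   # Mexico
    Study("Cunha et al. 2017", False, None),                    # Mexico
    Study("Filmer et al. 2018", False, "conditional"),          # Philippines
    Study("Baird, de Hoop, and Özler 2013", None, "mixed"),
]

for study in STUDIES:
    print(f"{study.name}: {'; '.join(reasons_for_low_weight(study))}")
```

A study flagged here is not discarded. As the terminology note explains, such studies are lightly reviewed and given lower weight, which a fuller model might represent with a numeric weight rather than a list of reasons.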

### Why did GiveWell’s analysis of spillovers focus on effects on consumption? Does this imply that GiveWell does not value effects on other outcomes?

Some general context on GiveWell’s research that we think is useful for understanding our approach in this case is:

• When modeling the cost-effectiveness of any program, there are typically a large number of outcomes that could be included in the model. In our analyses, we focus on the key outcomes that we expect to drive the bulk of the welfare effects of a program.
• For example, our core cost-effectiveness model primarily considers various programs’ effects on averting deaths and increasing consumption (either immediately or later in life). This means that, e.g., we do not include benefits of averting vision impairment in our cost-effectiveness model for vitamin A supplementation (in part because we expect those effects to be relatively small as a portion of the overall impact of the program).
• This does not mean that we think excluded outcomes are unimportant. We focus on the largest impacts of programs because (a) we think they are a good proxy for the overall impact of the relevant programs, and (b) having fewer outcomes simplifies our analysis, which leads to less potential for error, better comparability between programs, and a more manageable time investment in modeling.
• For a deeper assessment of which program impacts we include and exclude from our core cost-effectiveness model and why, see our model’s “Inclusion/exclusion” sheet.[14] We aim to include outcomes that can be justified by evidence, feasibly modeled, and are consistent with how we handle other program outcomes. We revisit our list of excluded outcomes periodically to assess whether such outcomes could lead to a major shift in our cost-effectiveness estimate for a particular program.

In our spillovers analysis, we applied the above principles to try to identify the key welfare effects. Among the five main studies we reviewed on spillovers, the two most relevant outcomes appear to be consumption and subjective well-being. We chose to focus on consumption for the following reasons:

• Assessing the effects of cash transfers on consumption (rather than subjective well-being) is consistent with how we model the welfare effects of other programs that we think increase consumption on expectation, such as deworming.
• Distinguishing effects on subjective well-being from effects on consumption in order to avoid double-counting benefits was too complex to do in the time we had available. It seems intuitively likely that standards of living (proxied by consumption) affect subjective well-being. In the Haushofer and Shapiro studies and in the GE study, the spillover effects act in the same direction for both consumption and subjective well-being. We do not think it would be appropriate to simply add subjective well-being effects into our model over and above effects on consumption since that risks double-counting benefits.
• We do not have a strong argument that consumption is a more robust proxy for “true well-being” than subjective well-being, but given that consumption effects can be more easily compared across our programs we have chosen it as the default option at this point.

We hope to broadly revisit in the future whether we should be placing more weight on measures of subjective well-being across programs. It is possible that additional work on subjective well-being measures would meaningfully change how we assess benefits of programs (for this program and potentially others).

Examples of our questions about how to interpret subjective well-being effects in the cash spillovers literature include:

• In the Haushofer and Shapiro studies, how should we interpret each of the underlying components of the subjective well-being indices? For example, how does self-reported life satisfaction map onto utility versus self-reported happiness?
• In Haushofer, Reisinger, & Shapiro 2015, there is a statistically significant negative spillover effect on life-satisfaction, but there are no statistically significant effects on happiness, depression, stress, cortisol levels or the overall subjective well-being index (column (4) of Table 1). How should we interpret these findings?

### Next steps

• We hope that there is more public discussion on some of the policy-relevant questions we highlighted in our report and on the other points of uncertainty highlighted throughout this post. Our conclusions on spillovers are very tentative and could be affected substantially by more analysis, so we would greatly appreciate any feedback or pointers to relevant work.[15]
• We are planning to follow up with Dr. Özler to better understand his views on spillover effects of cash transfers. We have appreciated his previous blog posts on this topic and want to ensure we are getting multiple perspectives on the relevant issues.

Notes

[1] For more context on this topic, see our May 2018 blog post.

[2] “We plan to reassess the cash transfer evidence base and provide our updated conclusions in the next several months (by November 2018 at the latest). One reason that we do not plan to provide a comprehensive update sooner is that we expect upcoming midline results from GiveDirectly’s ‘general equilibrium’ study, a large and high-quality study explicitly designed to estimate spillover effects, will play a major role in our conclusions. Results from this study are expected to be released in the next few months.” (More.)

[3] “In a preliminary analysis shared with GiveWell in September 2018, the researchers did not find evidence for a negative or positive impact on migration, and found no statistically significant impact on income and consumption.” (More.)

[4] “We have seen preliminary, confidential results from a 15-year follow-up to Miguel and Kremer 2004. We are not yet able to discuss the results in detail, but they are broadly consistent with the findings from the 10-year follow-up analyzed in Baird et al. 2016.” (More.)

[5] “We have seen two modeling studies which model clinical malaria outcomes in areas with ITN coverage for different levels of resistance based on experimental hut trial data. Of these two studies, the most recent study we have seen is unpublished (it was shared with us privately), but we prefer it because the insecticide resistance data it draws from is more recent and more comprehensive.” (More.)

[6] “The preliminary endline results did not find any effect of DMI’s program on child mortality (it was powered to detect a reduction of 15% or more), and it found substantially less effect on behavior change than was found at midline. We cannot publicly discuss the details of the endline results we have seen, because they are not yet finalised and because the finalised results will be embargoed prior to publication, but we have informally incorporated the results into our view of DMI’s program effectiveness.” (More.)

[7] “The researchers have published an abstract on the study, and shared a more in-depth report with us. The more in-depth report is not yet cleared for publication because the authors are seeking publication in an academic journal.” (More.)

[8] This program provides $1,000 unconditional transfers and treats almost all households within target villages in Kenya and Uganda (though still treats only eligible households in Rwanda).

[9] On (1): Our understanding is that the nature and size of spillover effects is likely to be highly dependent on the context studied, for example because the extent to which village economies are integrated might differ substantially across contexts (e.g. how close households are to larger markets outside of the village in which they live, how easily goods can be transported, etc.). On (2): We expect that providing cash transfers conditional on behavioral choices is a fairly different intervention from providing unconditional cash transfers, and so may have different spillover effects.

[10] We tried to create such an explicit model here (explanation here).

[11] GiveDirectly treats almost all households within target villages in Kenya and Uganda (though still treats only eligible households in Rwanda).

[12] Note on terminology: In our spillovers analysis report, we talk about studies in terms of “inclusion” and “exclusion.” We may use the term “exclude” differently than it is sometimes used in, e.g., academic meta-analyses. When we say that we have excluded studies, we have typically lightly reviewed their results and placed little weight on them in our conclusions. We did not ignore them entirely, as may happen for papers excluded from an academic meta-analysis. To try to clarify this, in this blog post we have used the term “place little weight.” We will try to be attentive to this in future research that we publish.

[13] We expect that local spillover effects via psychological mechanisms are less likely to occur with the current spatial distribution of GD’s program. In GD’s program in Kenya and Uganda, almost all households are treated within its target villages. In addition, the majority of villages within a region are treated in a block. Baird, de Hoop, and Özler 2013 estimate spillover effects within enumeration areas (groups of several villages), and the authors believe that the “detrimental effects on the mental well-being of those randomly excluded from the program in intervention areas is consistent with the idea that an individual’s utility depends on her relative consumption (or income or status) within her peer group”, p.372. The spatial distribution of GD’s program in Kenya and Uganda makes it more likely that the majority of one’s local peer group receives the same treatment assignment.

[14] We have not yet added it, but we plan to add “Subjective well-being” under the list of outcomes excluded in the “Cross-cutting / Structural” section of the sheet, since it may be relevant to all programs.

[15] If you are aware of relevant analyses or studies that we have not covered here, please let us know at info@givewell.org.

• Samuel Hilton on January 11, 2019 at 7:22 am said:

Thank you for this research. I found the discussion on consumption measures versus subjective well-being measures very interesting and I look forward to seeing GiveWell revisit subjective well-being measures in the future.

I think it is worth flagging that the people I have met who expressed concern about spillover effects of cash transfers have primarily pointed towards evidence on subjective well-being. I expect that analysis looking into subjective well-being (rather than consumption) might have been better for communicating the case for GiveDirectly to informed donors.

Keep up the great work. Sam

• I am coming to this discussion more from an ‘outside’ perspective. (I have interacted with some EA (‘effective altruism’) people locally at a ‘diversity and social inclusion’ discussion (I had a hard time getting in the door since I forgot the address), at a hike in the local park, and at a ‘veganism’ discussion at a trendy vegan coffee shop, right across the street from a trendy Mexican barbecue restaurant, and by phone with some people in California who said they might cover some costs for me to fly there for a 2-day conference. I told them I don’t want to fly 3000 miles across the country for just 2 days when I don’t even know the agenda. I was expected to make a proposal though I had already made it online; they wanted more detail. At present I am not going into more detail and I don’t have much more.)

‘General equilibrium’ models have a mixed or bad history in African development and elsewhere: the IMF and World Bank have many, on ‘structural adjustment’, NAFTA, CAFTA and so on. There are ‘spillover effects’ (‘externalities’) in those cases; that may be why there are ‘immigration crises’ (e.g. Europe, California and Texas…), a ‘rust belt’ in the mid-central USA along with an opiate and suicide epidemic, drug and gun cartels, and so on.

In the USA many use ‘consumption inequality’ as a metric, often ‘conservative’ or ‘libertarian’ groups like AEI, Cato, Heritage, etc. They point out everyone or many have cars, smart phones, large screen TVs, etc. (In my area, some people have these because they steal them.)

Subjective well-being is definitely very difficult to define or measure. In my area sometimes that means it goes up if you have access to cash to buy drugs, cigarettes, alcohol and/or sex, besides having a place to stay and food to eat, and maybe a pet dog and cat, and some sort of family.
GiveWell is mostly focused on people from the hedge fund industry (I briefly worked in one or something like that in SF; I lasted for 2 paychecks. They told me I was competent but I didn’t like the work much; I’m into music and sciences).

Subjective well-being, consumption in/equality, non/equilibrium models, and inter/national in my view are all related.