The GiveWell Blog

How we work, #3: Our analyses involve judgment calls

This post is the third in a multi-part series, covering how GiveWell works and what we fund. Through these posts, we hope to give a better understanding of our research and decision-making.

Our goal is to recommend funding to the programs we believe have the greatest impact per dollar donated. There’s no simple algorithm for answering this question; it necessarily involves making judgment calls. Our first post in this series discussed the importance of cost-effectiveness analyses and the many factors we consider; in this post, we’ll share:

  • How we make decisions with imperfect evidence
  • How we incorporate multiple perspectives into our analyses
  • Case studies that illustrate the judgment calls involved

Making decisions with imperfect evidence

Our work relies heavily on evidence, but the available evidence never answers a question with certainty.

Academic literature and its limitations
We don’t take the results of any given study at face value.[1] Instead, we often make adjustments along the way to come to a final estimate. As part of that process, we might consider:

  • The methodological limitations of the available studies
  • The likelihood of publication bias or spurious results
  • Whether the study results are likely to represent the impact of the specific program we’re considering funding, which requires looking at potential differences in contexts and in the programs being implemented
  • How plausible the results seem when considering other relevant information, including whether there’s a known mechanism by which a program might have a certain effect
  • The opinions of expert advisors
  • Other factors not listed here

Some questions can’t easily be addressed by studies but are still important for assessing the impact of a program. Those include topics like:

  • Will another funder support this program if we don’t?
  • Will this program be successfully transitioned to the government?
  • How likely is it that new research will provide information that changes our minds two or three years from now?
  • How bad is the experience of having disease A compared to the experience of having disease B?

Considering multiple perspectives
Some donors and other experts might reasonably disagree with our overall estimates; while we try to come to the best answer we can, our analyses necessarily involve subjective judgments.

In setting values for important parameters, we don’t just rely on a single GiveWell researcher’s opinion. In addition to reviewing scientific literature, we inform our views via:

  • Conversations with experts. In the course of a grant investigation, we often speak with experts to learn from their views. See this page for many examples.
  • Internal peer review. We subject our work to peer review. In a typical grant investigation, two or three GiveWell researchers who didn’t investigate the grant might carefully review the work and share their feedback on the strengths and weaknesses of the case for the grant. We might also hold workshops to discuss key questions. See our review of New Incentives’s program for some examples of peer feedback.
  • External expert review. Sometimes, we commission external consultants to review our work or particular scientific studies. For example, David Roodman has looked into tricky, complex questions like the effect of childhood deworming on later-in-life consumption; see his posts here and here. More recently, we commissioned a report from an iron and anemia expert to inform our view on concerns raised by an entry to our Change Our Mind Contest.

Case studies

Combining data and intuition: Estimating the effect of water chlorination on mortality
Mortality reduction is the single largest driver of the cost-effectiveness of water quality interventions in our models, but it’s not straightforward to estimate how many lives clean water can save. For several years, we didn’t recommend substantial funding to water quality programs; our best estimates placed their cost-effectiveness below that of our top charities. New results shared with us in 2020 led us to update our view: a meta-analysis by Michael Kremer and his team pooled mortality results from studies representing several types of water quality interventions in settings with unsafe water, providing stronger evidence for the effect of water quality on mortality as well as a higher point estimate for that effect than we’d previously used.

While this meta-analysis was a substantial update, we estimate the effect of water quality on mortality is smaller than the study results imply. At face value, the meta-analysis estimated a reduction in all-cause mortality among young children of about 30%—not only higher than could be explained by a reduction in diarrhea alone, but also higher than we’d expect when taking into account indirect effects (i.e., deaths that occur because waterborne diseases may make people more vulnerable to other causes of death).[2]

To come to our estimate of the effect size, we completed our own meta-analysis, and we commissioned two external experts (see here and here) to review the original meta-analysis so that we could consider their views. We pooled data from the five RCTs in the Kremer meta-analysis that focus on chlorination (as opposed to other water quality interventions) and have follow-up lengths of at least one year (to exclude studies in which we think publication bias is more likely).[3] We excluded one RCT that meets our other criteria because we think its results are implausibly high and don’t believe they represent the true effect of chlorination interventions (more in footnote).[4] It’s unorthodox to exclude studies for this reason when conducting a meta-analysis, but we chose to do so because we think it gives us an overall estimate that is more likely to represent the true effect size.

Our estimate suggests that chlorination reduces all-cause mortality among children under five by roughly 12%, as compared to 30% in the Kremer meta-analysis on water quality interventions.[5] (We then adjust this 12% figure for the particular context in which a program takes place, as described in our previous blog post.)
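
As a rough illustration of the mechanics (not our actual model), a meta-analysis like this pools each study’s estimated risk ratio on the log scale, giving more weight to more precise studies. The sketch below uses hypothetical trial names and numbers purely as placeholders; a real analysis would also model between-study heterogeneity and apply the adjustments described above.

    import math

    # Hypothetical placeholder inputs: (risk ratio, 95% CI lower, 95% CI upper)
    # for each chlorination RCT. These are NOT the actual trials or results.
    studies = {
        "Trial A": (0.85, 0.70, 1.03),
        "Trial B": (0.90, 0.75, 1.08),
        "Trial C": (0.82, 0.64, 1.05),
    }

    def pool_fixed_effect(studies):
        """Fixed-effect inverse-variance pooling on the log risk-ratio scale."""
        weighted_sum, total_weight = 0.0, 0.0
        for rr, lo, hi in studies.values():
            log_rr = math.log(rr)
            # Approximate the standard error from the width of the 95% CI.
            se = (math.log(hi) - math.log(lo)) / (2 * 1.96)
            weight = 1 / se ** 2  # more precise studies get more weight
            weighted_sum += weight * log_rr
            total_weight += weight
        return math.exp(weighted_sum / total_weight)

    pooled_rr = pool_fixed_effect(studies)
    print(f"Pooled risk ratio: {pooled_rr:.2f} "
          f"(~{(1 - pooled_rr) * 100:.0f}% reduction in all-cause mortality)")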

Our approach to calculating this effect has sparked thoughtful comments from others, such as those from Witold Więcek and from Matthew Romer and Paul Romer Present. We think reasonable researchers might make different choices about how to interpret the literature and how to come to an overall estimate, and we’re still considering other approaches. At the same time, we’re currently comfortable with the approach we’ve chosen as a way of reaching a sensible bottom line.

Valuing disparate outcomes: Comparing clubfoot treatment to life-saving programs
In early 2023, we made a grant to support MiracleFeet’s treatment of clubfoot, a painful congenital condition that limits mobility. To compare the cost-effectiveness of this program to other programs we fund, we need an estimate of how valuable it is to successfully treat a child’s case of clubfoot compared to how valuable it is to avert the death of a child.

This question—how bad is a case of clubfoot?—doesn’t have a “correct” numerical answer. The Institute for Health Metrics and Evaluation’s Global Burden of Disease (GBD) study is our usual source for estimates of how bad different diseases and disabilities are, but it doesn’t have an estimate specifically for clubfoot. We spoke with a disability expert and reviewed related literature to get a better sense of clubfoot’s impact on an affected person. We wanted to understand how painful clubfoot is, how much it impacts a person’s ability to do things, and how stigma related to clubfoot affects well-being and opportunities.

Ultimately, we chose to use the GBD weight for “disfigurement level 2 with pain and moderate motor impairment due to congenital limb deficiency” as a proxy for bilateral clubfoot (i.e., clubfoot affecting both legs). Our understanding is that this GBD weight accounts for a physical disability that is sore and itchy, where the person has some difficulty moving around, and where other people might stare and comment. We selected it based on our understanding of the experience of untreated clubfoot, informed by the literature we read and our conversations with a disability expert and MiracleFeet. We use 75% of that value as a proxy for unilateral clubfoot (i.e., clubfoot affecting just one leg).[6] With these values, preventing the suffering that would be caused over the course of someone’s lifetime by clubfoot is one-quarter as valuable as averting the death of a young child.[7] But we’re still very uncertain about the true value—our best estimate is that there’s a 50% chance that the “true” value is between one-eighth as good and one-third as good as averting a death.[8]

These estimates imply that MiracleFeet, which we believe averts a case of clubfoot for roughly $1,200,[9] is about as cost-effective as our top charities. That’s why we’ve funded it and are interested in learning more.
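
To show how these numbers fit together arithmetically, here’s a back-of-the-envelope sketch. Only the roughly $1,200 cost and the relative-value figures above come from our analysis; the structure is simplified, and our actual model includes many more adjustments.

    # Rough cost to avert a case of clubfoot (from the figure cited above).
    cost_per_case_treated = 1_200

    # Value of treating one case, as a fraction of the value of averting the
    # death of a young child: central estimate and rough 50% range from above.
    relative_value_central = 1 / 4
    relative_value_low, relative_value_high = 1 / 8, 1 / 3

    # Implied cost per "death-averted equivalent" under each assumption.
    # A lower relative value implies a higher cost per equivalent death averted.
    central = cost_per_case_treated / relative_value_central
    best_case = cost_per_case_treated / relative_value_high
    worst_case = cost_per_case_treated / relative_value_low

    print(f"Central estimate: ~${central:,.0f} per death-averted equivalent")
    print(f"Rough 50% range:  ~${best_case:,.0f} to ~${worst_case:,.0f}")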

Anticipating the likely decisions of other actors: Predicting the impact of technical assistance for syphilis screening and treatment
We recommended a grant to Evidence Action for syphilis screening and treatment during pregnancy, which reduces adverse outcomes including neonatal mortality and stillbirths. Rather than directly delivering tests and antibiotics, Evidence Action is working to support the governments in Cameroon and Zambia in providing the program. The goal is to support the Ministries of Health in switching from using HIV rapid tests to using HIV/syphilis dual rapid tests for pregnant people and providing penicillin to people who test positive for syphilis.

For policy-oriented work, we have to make predictions and think about counterfactuals: What will happen with Evidence Action’s support, and what would have happened without it? In assessing the expected cost-effectiveness of this grant, two key questions are:

  • Would Zambia or Cameroon scale up dual HIV/syphilis testing and treatment even in the absence of Evidence Action’s support?
  • Will the program be successfully transitioned to the government such that it ultimately continues without ongoing support from Evidence Action?

Focusing just on Zambia, we estimated a 50% chance that the government would have adopted the dual test without Evidence Action’s support, and that if it had, adoption would have come about a year later than with Evidence Action’s support. With Evidence Action’s support, we estimate an 80% chance of successful adoption and a 70% chance that the program successfully transitions to the government. Much of the projected impact comes from increases in treatment; for Zambia, we project that 70% of people who test positive will receive treatment with Evidence Action’s support, as compared to 45% (which we think is very roughly the current treatment rate) without it.

These figures are subjective; they reflect our judgment of factors like the apparent level of government interest in implementing dual testing, the simplicity of the intervention, and the fact that it leverages existing HIV testing infrastructure.
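
As a simplified illustration of how inputs like these can combine (not our actual model), the sketch below translates the probabilities and treatment rates above into an expected amount of additional treatment coverage over a hypothetical five-year horizon. A fuller analysis would also incorporate the 70% transition probability, the number of people reached, and the other adjustments we apply.

    # Inputs quoted above for Zambia; the five-year horizon and the overall
    # structure are hypothetical simplifications for this sketch.
    horizon_years = 5
    p_adopt_with_support = 0.80     # chance of successful adoption with Evidence Action's support
    p_adopt_without_support = 0.50  # chance the government adopts the dual test anyway
    delay_without_years = 1         # adoption without support would likely come ~1 year later
    treatment_rate_with = 0.70      # projected share of positives treated with support
    treatment_rate_without = 0.45   # very rough current treatment rate without it

    # Extra treatment coverage per year of operation, in percentage points.
    coverage_gain_pp = (treatment_rate_with - treatment_rate_without) * 100

    # Expected years of improved coverage within the horizon, with and without the grant.
    # (Assumes, purely for illustration, that government-led adoption would reach
    # the same treatment rate, just later and with lower probability.)
    expected_years_with = p_adopt_with_support * horizon_years
    expected_years_without = p_adopt_without_support * (horizon_years - delay_without_years)

    # Expected incremental coverage attributable to the grant, in percentage-point-years.
    incremental_pp_years = (expected_years_with - expected_years_without) * coverage_gain_pp
    print(f"Expected incremental treatment coverage: {incremental_pp_years:.0f} percentage-point-years")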

We made predictions about the outcome of this grant here and will be able to look back and compare our predictions to what actually happens, though of course we won’t be able to find out what would have happened if we hadn’t funded the program.

Conclusion

We aim to create estimates that represent our true beliefs. Our cost-effectiveness analyses are firmly rooted in evidence but also incorporate adjustments and intuitions that aren’t fully captured by scientific findings alone. (When there are gaps in the evidence base, we also fund additional research where feasible—more on that in a later post.)

Ultimately, we make decisions based on our best estimates even though some of the values are subjective and even though we aren’t sure we’re right. Our work requires taking action despite incomplete information. We make recommendations that reflect our true beliefs about where funding can do the most good, and we share our decision-making process transparently so that people can evaluate our conclusions.