The GiveWell Blog

June 2019 open thread

Our goal with hosting quarterly open threads is to give blog readers an opportunity to publicly raise comments or questions about GiveWell or related topics (in the comments section below). As always, you’re also welcome to email us at info@givewell.org or to request a call with GiveWell staff if you have feedback or questions you’d prefer to discuss privately. We’ll try to respond promptly to questions or comments.

You can view our March 2019 open thread here.

Comments

  • Colin Rust on June 19, 2019 at 9:34 am said:

    (also posted on March Open Thread, but I figured this will be more active now)

    In your cost effectiveness analysis (CEA), why does GiveWell consider estimated increases to consumption, but not also improvements in health?

    In brief, it seems to me that holding consumption constant, better health is a good thing and at least in principle ought to contribute to the CEA.

    In a little more detail, I think GW supported charities have, broadly, three kinds of benefits to human well-being:

    1. reduced mortality,
    2. improved health and
    3. increased consumption.

    All three I would argue are intrinsic goods. But only #1 and #3 are counted in the CEA, not #2.

    Now of course, there is some interaction between #2 and #3. Indeed, that’s the case you make for deworming: that, via improved health, it ultimately promotes consumption indirectly. On the other hand, GiveDirectly more directly targets #3, but it wouldn’t be surprising if it indirectly benefits #2 (because people who are less poor can afford better nutrition and medical care, are less stressed, etc.). But just because #2 and #3 interact causally doesn’t mean they are equivalent. My expectation is that by not counting #2, you tend to overrate, on a relative basis, charities that directly target #3 (GD) vs. charities that directly target #2 (e.g. deworming).

    My 2 cents. I’m sure folks at GW have thought about this much more carefully than I have. Maybe what’s going on here is that under reasonable moral views, in practice the increased consumption you can get per charitable dollar is just worth a lot more than the improved health. Or maybe I’m missing something more basic. Anyway, I’d be interested to better understand your thought process here.

  • Catherine (GiveWell) on June 19, 2019 at 3:57 pm said:

    Hi Jonas,

    We haven’t looked into charter cities in depth, although they are on our long list of potential areas we might explore in the future as we expand the scope of our research.

    We spoke with Mark Lutter of the Center for Innovative Governance Research about charter cities in June 2018. Notes from that conversation are here.

  • The Chetna on June 20, 2019 at 12:08 am said:

    The Chetna NGO works for the education of underprivileged children and on various missions to eradicate this problem from our society. https://bit.ly/2YYTQTz

  • James Snowden (GiveWell) on June 20, 2019 at 2:35 pm said:

    Hi Colin,

    Thanks for the question! It’s a good one – and a fair critique.

    I think there are two parts to this question:

    (1) Why don’t we include improved health (i.e. morbidity) in our cost-effectiveness analyses?
    (2) How do we think about double-counting when two terminal values (i.e. “intrinsic goods”) interact?

    I’ll answer each separately.

    (1) Why don’t we include improved health (i.e. morbidity) in our cost-effectiveness analyses?

    We take different approaches to estimating the benefits of preventing morbidity, depending on the particular program, in order to balance accuracy and practicality.

    When we believe morbidity makes up only a small proportion of the benefits of the program (or is particularly difficult to estimate) we typically exclude it from our main cost-effectiveness analysis, but factor it into our overall assessment separately. A list of excluded factors for each of our top charities is here. Rows 11, 24 and 28 are our estimates for how malaria and deworming morbidity would affect the analysis. These estimates are factored into what we call a “posterior cost-effectiveness analysis”, although we downweight them according to the criteria in columns C-F. You can see how we factor these estimates into our recommendations in this sheet.

    When we believe morbidity is the main benefit of a particularly promising program, we include the estimate explicitly in our cost-effectiveness analysis. See, for example, our preliminary cost-effectiveness analysis of fistula treatment. We generally use the DALY weightings from the Global Burden of Disease as a benchmark for how to weight these benefits, but may adjust them up or down based on our own research.
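
    As a rough illustration of how a GBD-style disability weight can translate a morbidity benefit into units that sit alongside mortality and consumption in a CEA, here is a minimal sketch in Python. All of the numbers are made up for the example; they are not GiveWell’s figures or its actual model.

```python
# Hypothetical illustration of valuing a morbidity benefit with a GBD-style
# disability weight. Every number here is an assumption for the example only.

cases_averted = 1_000        # cases of a condition the program averts
disability_weight = 0.3      # GBD-style weight: 0 = full health, 1 = equivalent to death
avg_duration_years = 2.0     # average years each case would have lasted

# Years lived with disability (YLDs) averted: cases x weight x duration.
ylds_averted = cases_averted * disability_weight * avg_duration_years
print(f"YLDs averted: {ylds_averted:.0f}")

# A moral weight can then convert YLDs into the same "units of value" used for
# consumption gains, so the morbidity term can enter the cost-effectiveness sum.
value_per_yld = 1.5          # hypothetical moral weight per YLD averted
print(f"Modeled value of the morbidity benefit: {ylds_averted * value_per_yld:.0f} units")
```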

    (2) How do we think about double-counting when two terminal values interact?

    I think the “first best” way to do this under perfect information would be to model the interactions as far as they go (within some “reasonable” scope):

    For example: decreased morbidity -> increased productivity -> more income -> increased food security -> decreased morbidity -> increased productivity etc.

    I would guess these virtuous cycles do exist, but the causal chain degrades at every step, meaning a new equilibrium would be reached, with improvements across all three terminal values (morbidity/mortality/income).
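
    To make the “degrading causal chain settles at a new equilibrium” point concrete, here is a minimal sketch of a dampened health-income feedback loop. The pass-through rates are illustrative assumptions, not estimates from any GiveWell model; the point is only that the loop sums to a finite total because each cycle is smaller than the last.

```python
# Toy model of a dampened feedback loop: a morbidity improvement raises income,
# part of which feeds back into a further (smaller) morbidity improvement, etc.
# Pass-through rates are arbitrary assumptions for illustration.

initial_health_gain = 1.0   # arbitrary units of reduced morbidity
health_to_income = 0.4      # fraction of a health gain that shows up as income
income_to_health = 0.2      # fraction of an income gain that feeds back into health

total_health = 0.0
total_income = 0.0
gain = initial_health_gain
for _ in range(50):         # terms shrink geometrically, so 50 rounds is plenty
    total_health += gain
    income_gain = gain * health_to_income
    total_income += income_gain
    gain = income_gain * income_to_health

print(f"Total health effect: {total_health:.3f}")  # ~1 / (1 - 0.4 * 0.2) = 1.087
print(f"Total income effect: {total_income:.3f}")  # finite: the cycle degrades each round
```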

    Given the difficulty of estimating this kind of model, I think it’s necessary to take a less principled approach, which is mostly captured within our moral weights. As different GiveWell staff take different approaches to their moral weights, it’s difficult to give a definitive GiveWell answer. I’ll just say how I deal with this, which is to go one step down the causal chain but not further:

    – I include an estimate of the effect of mortality on income (of family members) in my moral weights for death.
    – I include an estimate of the effect of income on mortality in my moral weights for income (see rows 3-6).

    As morbidity is generally a less important input into our models (for our top charities), I’ve put less thought into how to model interactions there.

    I hope this is helpful. In sum, this is something we’ve thought about quite a bit, but we don’t yet have a fully satisfying principled answer.

    All the best,

    James

  • Colin Rust on June 22, 2019 at 7:38 pm said:

    James,

    Thanks for the detailed and thoughtful reply. That was very helpful.

    Thinking about GW’s deworming CEA: One thing that stands out to me is that the estimated value of the effect on morbidity is much smaller than that of the effect on future income, even though the mechanism for increased income is via decreased morbidity. That seems implausible to me (that the effect would be so much bigger than the cause in this sense), though again you’ll have thought about it much more carefully. Do you have a way of thinking about this apparent tension?

  • James Snowden (GiveWell) on June 25, 2019 at 11:28 am said:

    Thanks Colin,

    My best guess is that, if deworming does have an effect on income, it’s most likely to be through subtle effects on the long run development of children’s cognitive ability (possibly mediated through education). That’s the kind of health effect which I’d guess has a greater effect on wellbeing through the ability to earn than it does directly through morbidity.

    The evidence for the effect of deworming on cognitive function generally shows no statistically significant results, but those results are not precisely estimated, so I don’t believe that evidence is sufficient to rule out improvements in cognitive development as a mechanism (given the direct evidence that exists for the effect on earnings).

    Some more information on the potential mechanisms for deworming is here; it is one of the documents linked to from our cost-effectiveness analysis (L6). That document is very rough, and it wasn’t the only data point we used to estimate our replicability adjustment for deworming.

  • Craig on June 29, 2019 at 1:46 pm said:

    re: recent post on sodium:
    https://groups.google.com/forum/#!msg/newly-published-givewell-materials/LG32JLFu0Ws/S1AnaJM4CwAJ

    I thought that a major reason for adding salt to food was to reduce spoilage/illness. In developed countries this leads to longer shelf life and greater safety when using older products. In countries like India, salt and spices may counteract shorter-term spoilage and potential illness. They may also mask some of the taste of food that is starting to spoil.

    Is this perception true? If so, then reducing sodium intake may be more complicated than simply reducing consumption through behavior change (one would also have to deal with shorter shelf life, packaging, handling, storage, refrigeration, spoilage, etc.).

  • Milan Griffes on July 5, 2019 at 2:33 pm said:

    Was the discussion of board votes removed from the public-facing recording of the April 2019 board meeting?

    ( https://www.givewell.org/about/official-records/board-meeting-41 )

    From the agenda, it looked like board votes were the first item to be discussed, but it looks like the recording starts off with discussion of the “2018 in review” topic.

    Agenda:
    https://files.givewell.org/files/ClearFund/Meeting_2019_04_30/Agenda_GiveWell_Board_Meeting_April_2019.pdf

  • Colin Rust on July 7, 2019 at 3:14 pm said:

    Thanks again James.

    Having a look at your replicability adjustment doc for deworming, two questions:

    1. Have any studies measured (daily) school attendance as opposed to (annual) enrollment and/or grade completion? An issue throughout appears to be statistical power, in that confidence intervals are really wide. I don’t know, but I would guess that at least in some cases schools keep good attendance records, so that you could compare attendance rates pre- and post-distribution, and that would have relatively tight error bars. If the education story is right, i.e. if there is an impact on education so large that it ultimately has a big effect on long-term consumption, I would expect to see some clear impact on attendance. I could be wrong in any number of ways, though. (One issue might be that there are causes of absences that are correlated across students, e.g. weather, which might lead you to underestimate error in a naive analysis. Though I guess you could try to correct for that by using some notion of the rate of idiosyncratic absences.)

    2. GiveWell evaluates a large number of possible interventions. Let’s model the estimated CE of each as drawn from some normal distribution centered at the true CE for that intervention. Then, as you well know, taking the “best” intervention — in the sense of maximizing expected CE — will be biased towards interventions with high variance (which we don’t want) as well as towards interventions with high true CE (which we do want). So to correct for this, ideally instead of maximizing the EV of CE, you want to maximize EV less some variance penalty which depends on the number of interventions considered. (Put differently, estimating the CE of a randomly drawn intervention is a different problem than estimating the CE of the intervention that was selected as maximizing estimated CE.) Previously, I had understood GW’s replicability adjustment (currently 0.11) as accounting for this. But looking at your doc, it looks like that’s not primarily what it’s about. You estimate a factor of 0.16 looking at issues that are purely endogenous to the deworming analysis (Bayesian analysis using priors to discount implausible results). Taking all this at face value (and I realize ultimately these are intuitive decisions, so this should be taken with a grain of salt), that would suggest a variance penalty of 0.11/0.16, or a little under 70% (i.e. knocking a little over 30% off the CE estimate), which intuitively seems very low to me. How do you think about this?
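
    A quick Monte Carlo sketch of the selection effect described above, with entirely hypothetical numbers: when you pick the intervention with the highest estimated CE, the winner’s estimate systematically overstates its true CE, and the winners skew towards the noisiest estimates.

```python
# Monte Carlo sketch of the optimizer's curse. All numbers are hypothetical.
# Each candidate has a true cost-effectiveness (CE) drawn from a common prior,
# observed with candidate-specific noise; we then select the highest estimate.
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_candidates = 10_000, 20

true_ce = rng.normal(loc=1.0, scale=0.5, size=(n_trials, n_candidates))
noise_sd = rng.uniform(0.1, 1.0, size=n_candidates)  # some estimates are much noisier
estimates = true_ce + rng.normal(scale=noise_sd, size=(n_trials, n_candidates))

winner = estimates.argmax(axis=1)                     # pick the best-looking intervention
rows = np.arange(n_trials)
print("mean estimated CE of winner:", round(estimates[rows, winner].mean(), 3))
print("mean true CE of winner:     ", round(true_ce[rows, winner].mean(), 3))
print("mean noise SD of winner:    ", round(noise_sd[winner].mean(), 3),
      "vs.", round(noise_sd.mean(), 3), "across all candidates")
```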

  • Colin Rust on July 7, 2019 at 3:31 pm said:

    To add to Craig’s comment re salt, two additional thoughts:

    1. Is it well established that reduced salt intake reduces the risk of heart disease? A recent (Dec 2018) meta-analysis published in JAMA found:

    Limited evidence of clinical improvement was available among outpatients who reduced dietary salt intake, and evidence was inconclusive for inpatients. Overall, a paucity of robust high-quality evidence to support or refute current guidance [to reduce salt intake] was available.

    2. An additional consideration might be iodine from (iodized) salt which prevents thyroid disease.

  • Erin Wolff (GiveWell) on July 8, 2019 at 5:47 pm said:

    Hi Craig and Colin,

    Thanks for your questions on salt. We have not yet deeply investigated the health effects of salt or effects on preserving food, so we don’t have useful analyses to share on those topics. We have only done some brief desk research and a couple of phone calls with relevant researchers and practitioners to get a rough sense of the potential importance, neglectedness, and tractability of working on sodium reduction. Based on that shallow work, we don’t expect to prioritize further research into sodium reduction in the near term. We hope to write more about how we thought about prioritizing among “public health regulation”-oriented policy causes soon.

  • Erin Wolff (GiveWell) on July 8, 2019 at 7:57 pm said:

    Hi Milan,

    The board meeting did not end up following the order laid out in the agenda. The minutes for this meeting show the correct order of topics as they occurred at the meeting. The discussion of administrative items, which included voting on the board, can be found around the 42nd minute of the recording. I hope this helps!

  • Milan Griffes on July 9, 2019 at 6:49 pm said:

    Got it, thank you!

  • Richard Crowder on July 11, 2019 at 10:15 am said:

    I see a page headed: Our updated top charities for giving season 2018. But giving season 2018 is over, right? When will we see an update for giving season 2019? Thanks.

  • Catherine (GiveWell) on July 11, 2019 at 11:17 am said:

    Hi Richard,

    We work on an annual cycle, and our goal is to have an updated list of top charity recommendations by Giving Tuesday, which is the Tuesday following U.S. Thanksgiving. Giving season lasts through December 31, and the majority of giving to GiveWell’s top charities occurs during this period of time.

    We continue to recommend our top charities after giving season. We believe our top charities have significant room for more funding and can use additional donations effectively throughout the year. We check in with our top charities throughout the year and share major updates on our website.

  • Aderonke on July 17, 2019 at 2:53 am said:

    While applying to positions at GiveWell last July, I delved into its work and had some contentions with GiveWell’s position on the root cause of poverty.

    So I wrote an email with my thoughts, with the subject title “Beyond Intervention,” on July 12, 2018. Although nobody addressed me on this directly, I am happy to learn that folks at GiveWell are taking the ideas I pointed out in that email seriously.

    https://innovativegovernance.org/2019/07/01/effective-altruism-blog/

    I am told organisations quite often mine communities for ideas without giving due credit. I hope GiveWell is different, and I wish you all the best.

    NB: I applied via the GiveWell portal on Friday, July 6, 2018. I got no acknowledgement, so I then wrote to jobs@givewell.org on Wednesday, July 11th, informing them of my application. I sent my “Beyond Intervention” email to the same address on the 12th. Tracy Williams replied to my July 11th application email on the 13th.

  • James Snowden (GiveWell) on July 18, 2019 at 12:20 pm said:

    Hi Colin,

    Sorry for the slow response — I’ve just returned from vacation.

    On (1), Miguel and Kremer 2004 collect data on school attendance (measured through unannounced school visits, which I’d expect are more reliable than attendance records [Pg 189]). They appear to find effects on attendance, although the effect size and significance varies depending on the particular subgroup and year [Pg 191]. Replications of Miguel and Kremer (including Aiken et al. 2014) called the size and significance of this result into question; we discuss that topic further in this blog post. We wrote that we believe those replications are “best read as finding significant evidence for a smaller attendance effect.” I don’t believe any other large cluster-randomized trials have collected data on school attendance.

    In sum, I believe the results reported in Miguel and Kremer are consistent with the hypothesis that school attendance may explain some of the large reported income effect.

    On (2), we don’t explicitly account for the optimizer’s curse (i.e. the phenomenon that taking the “best” interventions will be biased towards high-variance estimates). As you mention, our replicability adjustment is supposed to account for issues that are endogenous to the individual analysis (this is true of both the 11% and the 16% adjustments).
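
    For readers unfamiliar with that kind of endogenous adjustment, a minimal sketch of the underlying Bayesian idea (the prior and the study estimate below are hypothetical, not GiveWell’s actual numbers): an implausibly large estimated effect gets shrunk towards a sceptical prior in proportion to how noisy the estimate is.

```python
# Precision-weighted Bayesian shrinkage of a single effect estimate towards a
# sceptical prior (normal prior, normal likelihood). Numbers are hypothetical.

prior_mean, prior_sd = 0.05, 0.05      # sceptical prior on the true effect
estimate, estimate_se = 0.30, 0.15     # large but noisy study estimate

prior_precision = 1 / prior_sd**2
data_precision = 1 / estimate_se**2

posterior_mean = (prior_precision * prior_mean + data_precision * estimate) / (
    prior_precision + data_precision
)
print(f"Posterior mean: {posterior_mean:.3f}")
print(f"Implied adjustment factor: {posterior_mean / estimate:.2f}")  # < 1, i.e. a discount
```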

    We haven’t incorporated a formal (i.e. mathematically correct) adjustment for the optimizer’s curse, although there’s been a lot of internal discussion around it. We expect it would be a large project to create a satisfying model for this (e.g. defining an appropriate prior distribution; creating a Monte Carlo simulation of each of our cost-effectiveness estimates).

    Some informal things we do which I think help to mitigate this problem (but don’t address it fully) are:

    – Informally, when making allocation decisions, we place less weight on differences in cost-effectiveness when those models are particularly uncertain, or when the bias would be uncorrelated (e.g. we place more weight on comparisons of programs which target the same outcomes, as they don’t depend on the high-variance “moral weights” that play a significant role in our comparison of programs with different outcomes).
    – We winsorize our deworming “intensity adjustments” for different regional programs to limit the extent to which noisy data can drive very large differences in cost-effectiveness.
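
    As a small sketch of what that winsorizing step does (the adjustment values below are hypothetical, not GiveWell’s), extreme values are clipped to chosen percentiles rather than dropped, so one noisy regional estimate cannot dominate the comparison:

```python
# Winsorizing a set of regional "intensity adjustments" (hypothetical values):
# the outlier is clipped to the 90th percentile instead of being removed.
import numpy as np

adjustments = np.array([0.6, 0.8, 0.9, 1.0, 1.1, 1.2, 3.5])  # 3.5 looks like noise
low, high = np.percentile(adjustments, [10, 90])
winsorized = np.clip(adjustments, low, high)
print(winsorized)
```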

    There is some further discussion on the optimizer’s curse here.

  • Erin Wolff (GiveWell) on July 19, 2019 at 6:31 pm said:

    Hi Aderonke,

    In our traditional work, we’ve focused on finding effective organizations carrying out interventions that are relatively cost-effective and have strong evidence behind them. Generally, the evidence for interventions addressing “symptoms” is much stronger than for those addressing underlying causes. In addition, by improving people’s general health and wellbeing, we believe they will be better positioned to make long-term gains for themselves. In this way, treating health is itself arguably a way to address other problems. You can read more here.

    As GiveWell begins to evaluate more complex evidence bases, we foresee a future in which GiveWell better understands the philanthropic opportunities for encouraging growth and structural change in low- and middle-income countries, and has information to share about how they compare with our top charities. Although this area has been outside the scope of GiveWell’s past research, we hope it won’t always be too speculative for us to consider.

    Charter cities is one example that we may consider investigating under the umbrella of “increasing economic growth and redistribution.” We spoke with Mark Lutter of the Center for Innovative Governance Research about charter cities in June 2018. Notes from that conversation are here.

    The following blog posts may also provide some helpful context for how we’ve thought about the question of what to recommend in the past; while we are exploring a broader scope of research evaluations going forward, we think the core points these posts raise are worth keeping in mind:
    https://blog.givewell.org/2011/05/27/in-defense-of-the-streetlight-effect/
    https://blog.givewell.org/2012/04/12/how-not-to-be-a-white-in-shining-armor/

  • Colin Rust on July 20, 2019 at 3:04 pm said:

    Thanks as always James for the helpful explanations.

    It totally makes sense that constructing a formal quantitative model of the optimizer’s curse that is good enough to actually be helpful would be a significant and tricky (perhaps even intractable) project. As you say, modeling the prior distribution in particular seems like a really tough problem. It’s hard enough to model the univariate CE distributions, let alone how they interact in a multivariate context. And it may not be good enough to approximate distributions as normal, since what you really care about is the right tails. I’m not that familiar with it, but one thought that likely already came up in your discussions: Extreme Value Theory — which characterizes the possible limiting shapes of the tail of the maximum as the number of draws you maximize over grows — might (or might not) be useful.
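
    For what it’s worth, a toy illustration of that EVT intuition (the distribution and sample sizes are arbitrary assumptions): the maximum of many draws concentrates well above the typical draw and has its own, much tighter distribution, which is what the theory characterizes in the limit.

```python
# Toy illustration of the Extreme Value Theory point: the maximum of many i.i.d.
# draws has its own limiting distribution (Gumbel-type for normal-like tails).
# The choice of distribution and sample sizes here is arbitrary.
import numpy as np

rng = np.random.default_rng(1)
n_draws_per_max, n_maxima = 1_000, 10_000
maxima = rng.standard_normal((n_maxima, n_draws_per_max)).max(axis=1)

print("mean of maxima:", round(maxima.mean(), 2))  # far above 0, the mean of one draw
print("sd of maxima:  ", round(maxima.std(), 2))   # much tighter than one draw's sd of 1
```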

    And I agree that both measures you list should have the effect of reducing the optimizer’s curse. Winsorizing reduces variability. Focusing on comparing more similar interventions could help in two ways: by reducing the search space, and, as you say, via correlated errors so that the relative errors are smaller.

    But I do wonder if there’s more you can do with informal reasoning about the optimizer’s curse. We’re most interested in comparing interventions, so the absolute correction for an intervention is less important than the relative correction between interventions. Maybe something to consider would be a relative optimizer’s curse correction (ROCC), chosen, like many other parameters, by summarizing people’s intuitions; you could arbitrarily set the ROCC to 1 for the intervention you’re most confident about. Two considerations: 1. Interventions where there is more uncertainty in the impact should tend to have a lower ROCC, so e.g. this is an argument for a lower ROCC for deworming than for other top interventions; and 2. Interventions with indirect mechanisms for impact are implicitly drawn from a larger search space and so should tend to have a lower ROCC; this is a second reason to discount deworming vs. other top interventions.

  • James Snowden (GiveWell) on July 22, 2019 at 9:32 pm said:

    Colin,

    Thanks for the link. I found this post about why we’d expect cost-effectiveness to be log-normally distributed (rather than normally distributed) quite compelling.

    And thanks also for your thoughtful suggestions on ROCC. We’ll consider being more explicit about how to account for the optimizer’s curse in future!

  • Colin Rust on August 3, 2019 at 1:21 pm said:

    Thanks James.

    Jeff Kaufman (your link) gives a nice argument for why we should expect CE to be approximately log-normally distributed: if CE is given by a product of many factors (which, as he points out, it tends to be in GiveWell analyses!), then under mild assumptions this follows from the Central Limit Theorem (CLT) applied to the logs.

    But we have to be careful here. For the Optimizer’s Curse (OC) analysis, you’re interested in the (right hand) tail. Convergence for the CLT is much worse in the tails than in the bulk of the distribution. Let E be the typical number of effects per CE (i.e. such that each CE distribution is a product of ~E distributions) and N the number of interventions in the search space. It would be best to roll up our sleeves and do some math to get a more quantitative estimate of the tradeoff between N and E, but my guess is you need N >> E or at best N = O(E) to make the log-normal approximation reasonable for OC purposes. But in reality, I’d think we have E>>N.
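
    As a small check on that worry (E, N, and the factor distribution below are arbitrary), one can simulate a CE built as a product of E positive factors, fit a log-normal to it, and compare the fit against the empirical distribution at the upper quantile that a maximum over N interventions effectively probes:

```python
# Sketch of the "product of many factors is roughly log-normal" argument and the
# tail caveat. E (factors per estimate), N (interventions), and the factor
# distribution are arbitrary assumptions for illustration.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
E, N, n_samples = 12, 10, 100_000

factors = rng.uniform(0.5, 1.5, size=(n_samples, E))  # E positive factors per estimate
ce = factors.prod(axis=1)
log_ce = np.log(ce)

# Compare an upper quantile (roughly where the max over N draws sits) between the
# empirical distribution and the log-normal implied by the mean/sd of log(CE).
p = 1 - 1.0 / N
empirical_q = np.quantile(ce, p)
lognormal_q = np.exp(norm.ppf(p, loc=log_ce.mean(), scale=log_ce.std()))
print(f"{p:.2f} quantile  empirical: {empirical_q:.3f}  log-normal fit: {lognormal_q:.3f}")
```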

    I realize there isn’t an official GiveWell view on OC at this time, but I’d be curious if people tend to have the same intuition that I do that OC is significantly worse for deworming than other top interventions.

  • Prama Jyoti on September 10, 2019 at 2:07 am said:

    Pramajyoti is a child labour NGO in Delhi, working towards the eradication of child labour. We have built a mission to overcome child labour and abuse. Click here to know more: http://www.pramajyoti.org

Comments are closed.