The GiveWell Blog

Self-evaluation: GiveWell as a project

This is the third post (of five) we’re planning to make focused on our self-evaluation and future plans.

This post answers a set of critical questions for GiveWell stakeholders. The questions are the same as last year’s.

Is GiveWell’s research process “robust,” i.e., can it be continued and maintained without relying on the co-Founders?

Where we stood as of Feb 2012

We wrote:

We currently have 3 full-time analysts, and have made an offer to an analyst who will start in July, which would bring GiveWell to 4 full-time analysts. We continue to focus on recruiting and hope to reach 6 full-time analysts (8 total employees) summer 2012.

Analysts take the lead on most charity investigations; co-founders may provide basic guidance and sign off on work before it is published. GiveWell Labs, because of its experimental nature, will be led for the time being by co-founders.

Progress since Feb 2012

Following February 2012, we made two full-time hires and one part-time hire; one of the full-time hires departed GiveWell the same year. We also saw the departure of another analyst who had started in January of 2012 (and was included in the above quote). On net, therefore, the size of our staff rose by one part-timer. We also employed a summer intern and a trial hire, both of whom may become full-time employees this year.

Due to time sensitivity, the review of GiveDirectly – our new recommended charity in 2012 – was led by co-founders, rather than analysts. (See our shortcoming on this matter.) In addition, much of the work we put into deepening our research was led by co-founders. Analysts played valuable roles, and made far greater contributions than in previous years, but the share of work done by co-founders was higher than it would have been if we had not been dealing with this time sensitivity.

Two positive developments on this front in 2012:

  • Our capacity has improved significantly because of the maturation of existing employees. We now have several analysts who are able to add substantial value on a regular basis. Alexander Berger has been promoted to Senior Research Analyst, expanding our capacity for top-level investigations. Natalie Crispin has taken over primary management of GiveWell’s financials and donation processing (previously handled by co-founders) and is now Research Analyst and Financial Manager.
  • Our research process has become better systemized. 2012 was the first year in which our process for investigating a top charity remained substantially the same as in the previous year, and we feel that this bodes well for our ability to train analysts to take on more of this process in the future.

Our work on GiveWell Labs is still new and exploratory, and thus is led by senior staff.

Where we stand

We currently have three full-time and one part-time analyst, along with the two co-founders. We are re-thinking our hiring process and the roles and qualifications of people we wish to hire.

Although analysts have taken on more responsibility, we remain reliant on GiveWell’s co-founders for significant core research work. Elie Hassenfeld is heavily involved in managing and conducting individual charity/giving opportunity investigations and Holden Karnofsky is heavily involved in completing literature reviews for the evidence of effectiveness and cost-effectiveness analyses for interventions.

What we can do to improve

We intend to make hiring a priority over the coming year, but are not yet sure of exactly what path this will take. We have some ideas for finding new hires more effectively than previously, including (a) evaluating people via trial work rather than relying on interviews when possible; (b) considering more senior hires with experience that is directly relevant to the work our research analysts do. We don’t believe we have yet found a reliable formula for hiring people, though we believe we are improving on this dimension, both through trial and error in hiring and through getting a better sense over time (via repetition) of what work our employees need to do.

Does GiveWell present its research in a way that is likely to be persuasive and impactful (i.e., is GiveWell succeeding at “packaging” its research)?

Where we stood as of Feb 2012

We wrote:

As traffic to our website has increased over the past 12 months, we would guess that the importance of better packaging our research has risen. In particular, we feel our site is poorly suited to donors who want to spend more than a few minutes but less than an hour on our site. (We have designed the site to make quick action easy and to provide significant depth, but we have no “middle level” of depth for gaining some information relatively quickly.)

Progress since Feb 2012

None. This has continued to be a low priority over the past year.

Where we stand

We continue to believe that the lack of mid-level content is a shortcoming that likely prevents us from reaching some potential donors.

What we can do to improve

We have several ideas that we could execute in order to produce more “mid-level” content regarding our recommendations, but we do not plan to prioritize this work in the coming year.

Self-evaluation: GiveWell as a donor resource

This is the second post (of five) that we’re planning to make focused on our self-evaluation and future plans.

This post answers a set of critical questions about the state of GiveWell as a donor resource. The questions are the same as last year’s.

Does GiveWell provide quality research that highlights truly outstanding charities in the areas it has covered?

Where we stood as of Feb 2012

We felt that current research was high-quality and up-to-date. However:

  • We felt that there were multiple areas that could offer outstanding opportunities that we had not yet researched as thoroughly as we could have (particularly in the areas of nutrition, vaccinations, neglected tropical disease control, tuberculosis control, and research and development).
  • We were not satisfied with the degree to which our research was “vetted.” It still seemed to us that we could make a substantial mistake or error in judgment, with too high a probability that it would remain unnoticed.
  • We worried about our total “room for money moved,” which we estimated at $15-20 million in our top charities; it seemed possible to us that continued rapid growth could potentially lead us to “run out” of great giving opportunities.

Progress since Feb 2012

In 2012, we wrote that we wanted to:

  1. Revisit the goal of having our work subjected to formal, consistent, credible external review.
  2. Continue to look for more outstanding giving opportunities for individual donors, particularly in the areas we have identified as most promising (i.e. global health and nutrition).
  3. Begin to look for more outstanding giving opportunities for individual donors through GiveWell Labs.

In 2012, we made limited progress on #1, strong progress on #2, and less than anticipated progress on #3:

  1. We did not solicit any new external reviews of our work in 2012, and we did not formally revisit the goal of doing so. Rather than focusing on increasing formal expert review over the past year, we subjected our key pages to a higher level of pre-publication internal review, ensuring that pages and spreadsheets that play an important role in our final recommendations are thoroughly checked by at least one person who did not play a role in their production. We do not view this change as eliminating the eventual need for formal outside review, but we see it as adequate for our current needs. We also feel that the increased level of informal critical attention our research has received from the outside has lowered the need for formal external review (more on this in a future post).
  2. We added GiveDirectly to our list of top-rated charities in November 2012, after a thorough review that included a site visit and review of the evidence for unconditional cash transfers. We also conducted further investigations in the area of global health and nutrition.
  3. In the realm of GiveWell Labs, we made some initial progress.

    However, we have not been able to devote as much time to GiveWell Labs as we would have liked, and progress has accordingly been slower than anticipated. We have not yet identified any giving opportunities that we are ready to recommend (aside from the two grants mentioned above, both funded by Good Ventures).

Where we stand

We continue to feel our research has identified outstanding giving opportunities for individual donors, with adequate capacity (room for more funding in top charities) to absorb the level of funding that we expect in 2013, but we believe that room for improvement remains across the three broad areas we identified in 2012: continuing to find ways to subject our research to scrutiny and quality control, finding more outstanding giving opportunities according to our traditional criteria, and broadening our criteria via GiveWell Labs.

Of these three, we think the most urgent need is to make more progress on GiveWell Labs. Progress on that front in 2012 was much slower than hoped, due to a smaller allocation of staff time than intended. In order to make more progress on GiveWell Labs in the future, we may need to put less time (in the short term) into the other two goals, while hoping eventually to expand our staff capacity so that we can pursue all three effectively.

What we can do to improve

We plan to prioritize work on GiveWell Labs more highly in 2013, devoting more staff time to research on new causes than we did in 2012. We aren’t yet sure how we will be addressing the other areas of improvement discussed above; it depends heavily on how much capacity we are able to devote to GiveWell’s traditional work while making sure that we are moving forward significantly faster on GiveWell Labs. How to allocate capacity between these two arms of GiveWell is a major question for the coming year, to be discussed further in a future post.

GiveWell’s progress in 2012

This is the first post (of five) we’re planning to make focused on our self-evaluation and future plans.

As in past years, we’re going to be posting our annual self-evaluation and plan as a series of blog posts. This post summarizes what changed for GiveWell in 2012 and what it means for the future. Future posts will elaborate.

For us, the major developments of 2012 were:

  • We continued to strengthen our partnership with Good Ventures. In 2011, we made contact with Good Ventures, a new foundation, and they made grants totaling $750,000 to our top-rated charities. During 2012, we have been working more closely with Good Ventures, including playing a significant role in $1.1 million worth of grants made to organizations that are not top-rated GiveWell charities. Good Ventures also made $2 million in grants to GiveWell top charities in 2012.
  • We relocated to San Francisco. During 2012, we decided to move our office (and staff) from New York to San Francisco. As of February 2013, we have completed the move. We are sharing office space in San Francisco with Good Ventures.
  • Growth in money moved was strong, but appears to be slowing relative to the previous 100% per year pace. Our total money moved for 2012 was over $8.5 million (we haven’t yet finalized the tally), compared to a bit over $5 million for 2011; excluding two particularly large and unrepresentative donors, it was ~$5.5 million for 2012 as compared to ~$3.4 million for 2011.
  • Our staff capacity has grown slower than we had hoped. We currently have a staff of five full-time employees and one part-time employee, compared to five full-time employees at the end of 2011; in last year’s plan we expressed a hope that we would have eight full-time employees by this time.

    We continue to believe that we have more work than our current staff capacity can handle, and recruiting will be a major priority for 2013.

  • We conducted a thorough review of GiveDirectly and added it to our list of top charities, in addition to conducting deeper research in a number of areas of global health and nutrition that we had not previously investigated in depth. We anticipate continuing to look for more top charities in the global health and nutrition field in 2013, particularly organizations working on interventions we have not previously prioritized, but our work in 2012 has led us to suspect that room for more funding will be an ongoing issue as we consider new interventions.
  • Our process for conducting research on new charities has become more systematic and replicable. We now have a relatively stable pattern of initial phone calls and document requests to prioritize organizations, followed by a “deep dive” review of highly promising organizations (including a site visit), along with an intervention report and cost-effectiveness analysis for the intervention a potential top charity conducts. In addition, more thorough (pre-publication) internal and (post-publication) external reviews of our core research products have made us more confident in our recommendations than we have been previously.
  • We did not make as much progress as hoped on GiveWell Labs, our effort to conduct research on other causes. Upping our staff allocation to GiveWell Labs, and thus the progress we make on it, is a major priority for 2013.

Overall, in 2012 our research and our influence both improved significantly, but we see substantial room for more improvement, particularly with our work on GiveWell Labs and with the development of our organizational capacity. We continue to believe GiveWell has enough impact to justify its operating expenses, and hope to have much more impact in the future.

Of course, we also made plenty of mistakes in 2012, and we’ve recently updated our shortcomings log to reflect them. Perhaps most importantly, our process for considering GiveDirectly as a potential top charity started later in the year than it should have, leading us to miss our goal of having our “giving season” updates ready by November 1st; this also meant that our co-founders played a larger role in investigating GiveDirectly than they ideally would have (due to time sensitivity), which had negative consequences for the time we were able to put into GiveWell Labs and for the long-term development of our organizational capacity.

Guest post from David Barry about deworming cost-effectiveness

This is a guest post by David Barry, a GiveWell supporter. He emailed us at the end of December to point out some mistakes and issues in our cost-effectiveness calculations for deworming, and we asked him to write up his thoughts to share here. We made minor wording and organizational suggestions but have otherwise published as is; we have not vetted his sources or his modifications to our spreadsheet for comparing deworming and cash. Note that since receiving his initial email, we have discussed the possibility of paying him to do more work like this in the future.

Along with many others, I intuitively disagreed with GiveWell recommending GiveDirectly ahead of the Schistosomiasis Control Initiative. But I wasn’t as knowledgeable as I thought I was – not having paid close attention to every post and report, I thought that deworming helped fix minor short-term illnesses, but it was cost-effective because the deworming treatments are really cheap. The latter is correct, but I eventually realized that the reason GiveWell was recommending deworming at all was because of its developmental effects. Since these benefits should show up later in life in the form of increased incomes in adulthood, a comparison between deworming and cash transfers makes more sense: what generates more extra money, an investment of a cash transfer or a higher income thanks to deworming? Put like that, deworming’s not the obvious winner.

Since I still didn’t have a decent grip on what exactly went into GiveWell’s cost-effectiveness estimates, I worked through the various spreadsheets behind them. I didn’t check everything closely, but I worked through it closely enough to turn up a couple of spreadsheet errors (including another one from the DCP2 spreadsheet, though not anywhere near as dramatic as the errors described in 2011).

Since there are probably others out there who are as interested in the topic as me, and as ignorant of the details of the calculations as I was, I decided to write up a post on them. It’s part explanation, part criticism, part general commentary. GiveWell have in the past talked about the number of judgement calls that they have to make; they really are unavoidable, and no doubt my own biases have crept into my analysis. I hope I’ve been clear about them though.

Overall, I find a few points of serious disagreement with the GiveWell staff’s assumptions, but they effectively cancel out, so the net effect for deworming is fairly small. My final cost-effectiveness estimate is within the broad range of those from the GiveWell spreadsheet, though more favorable to cash transfers (relative to deworming) than any of the GiveWell staff scenarios. By the end, this exercise made me much more favorable to donating to GiveDirectly – the benefits of cash transfers are pretty big and may be comparable with many health interventions. Whereas before I was thinking of an 80/20 split between AMF and SCI, now I think I’ll go 80/10/10 (AMF/SCI/GD). I can now also see better where GiveWell are coming from when they’ve written in the past about not putting as much emphasis on these formal cost-effectiveness estimates – I could well be persuaded in the near future that my own estimates are off by a factor of 5, in either direction.

There are two main approaches to estimating the cost-effectiveness of deworming: one is based on the DCP2 method, and one is based heavily on Baird et al.’s working paper. I’ll take each of these in turn.

DCP2-style calculation
The DCP2 method for calculating the cost per DALY averted, which was (with some corrections and additions) how GiveWell calculated their cost-effectiveness in 2011, goes like this:

  1. Use models to estimate the prevalence in each region of the world for each of the various symptoms caused by the various worm infections.
    • Already there’s going to be some uncertainty introduced because of the modeling – I haven’t studied this step closely, but in the absence of good data collection, models of disease burdens can be quite substantially wrong. A prominent example here is the annual deaths caused by malaria – the WHO has a 95% confidence interval of [537,000, 907,000], whereas a study in the Lancet (Murray et al.) has [929,000, 1,685,000]. It’d be very optimistic to conclude that the true figure is probably close to 900,000, where the two intervals nearly meet. More likely, someone’s methodology is wrong.
  2. Estimate the fraction of the people with symptoms who would be cured with deworming treatment.
    • I expect that these values are pretty well established from medical studies.
  3. Assign a duration and disability weight to each symptom.
    • While people can always argue over disability weights, I’ll assume that there is relatively little debate over the short-term morbidity. More challenging is assigning a disability weight to life-long cognitive impairment. The standard value in the literature is 0.024; the reasoning seems to go roughly, “The disability is pretty small, and 0.024 is a pretty small number.” At least one author (King 2010 and elsewhere) argues that instead of the usual value of 0.005 for the total DALY burden due to a schistosomiasis case, we should use a value that may be ten or more times higher. Unfortunately for our confidence in this estimation method, preventing such cognitive or developmental impairment, even using the lower weights, accounts for over half of the estimated DALYs averted by deworming. This is why it’s understandable that for their 2012 analysis, GiveWell relied mostly on the Baird et al. working paper – it’s very problematic to work off just one study, but at least it’s an empirical data point.
  4. Take the estimated cost per treatment, and multiply through everything to get a cost per DALY averted.
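
The four-step chain above can be sketched in a few lines. This is only an illustration of the structure: every input value below is a placeholder of mine, and the real spreadsheet sums contributions over many symptoms, worm species, and regions, and applies discounting.

```python
# Hedged sketch of the DCP2-style chain above. All inputs are illustrative
# placeholders; the actual spreadsheet sums over many symptoms and regions.
def cost_per_daly_averted(prevalence, cure_fraction, duration_years,
                          disability_weight, cost_per_treatment):
    # DALYs averted per treatment: chance the person has the symptom (step 1),
    # times the chance treatment cures it (step 2), times the symptom's
    # duration-weighted disability burden (step 3).
    dalys_averted = prevalence * cure_fraction * duration_years * disability_weight
    # Step 4: divide the cost per treatment by the DALYs it averts.
    return cost_per_treatment / dalys_averted

# Example with made-up inputs for a single symptom category:
print(cost_per_daly_averted(prevalence=0.3, cure_fraction=0.9,
                            duration_years=1.0, disability_weight=0.024,
                            cost_per_treatment=0.50))
```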

The spreadsheet GiveWell used to perform these calculations is here. (Some of the inputs to that sheet are from the corrected (but not fully corrected) DCP2 spreadsheet here. I’m only linking to this spreadsheet for completeness – it’s very hard to navigate. While there’s still an error in the schistosomiasis sheet, it doesn’t affect GiveWell’s subsequent calculations, and I won’t talk about this spreadsheet again.)

The calculations in the spreadsheet for soil-transmitted helminths are essentially as I’ve described, with symptoms separated into “general”, “severe”, and “developmental”. But the calculations are different for schistosomiasis. Rather than starting with the prevalences of the various schistosomiasis symptoms, GiveWell instead simply use a given total DALY burden per schistosomiasis case, and spread it out over “general”, “severe”, and “developmental” in the same proportions as the STH’s. (If you just want to calculate the overall cost per DALY averted, there’s no need to spread out the schistosomiasis DALYs like that. But doing so means that you can later fiddle with the disability weight for the developmental impacts. In fact, the spreadsheet back-calculates all the prevalences and other intermediate quantities used in the calculation.)

Now some comments on the spreadsheet. All calculations are performed for a 3% discount rate (in the ‘DCP2’ sheet: columns E, G, I, …) and for a 0% discount rate (columns F, H, J, …), which makes the sheet look a lot more intimidating than it perhaps should. Furthermore, rather than calculating DALYs and converting to “life-saved equivalents” at the end, the conversion is done in the middle of the calculation; I find this unintuitive (perhaps it’s just a personal preference).

The spreading out of the schistosomiasis DALY burden across the three categories of symptoms is incorrect due to an Excel formula error (cells DCP2!W20:X22). Fixing this error and making no other changes affects some of the intermediate quantities substantially, but only makes a difference of about 10% to the overall cost effectiveness estimates.

Two different DALY burdens are used per schistosomiasis case: one for a 3% discount rate, and one for a 0% discount rate. Starting with the DALYs and back-calculating the earlier values (prevalences and so on) means that there are various inconsistencies: the spreadsheet has the schistosomiasis prevalences dependent on the discount rate. It is possible to fix this anomaly: pick either the given value of 0.0058 DALY(3,0) or 0.0097 DALY(0,0) and derive the other so that the prevalences are consistent. But while we can remove the absurd apparent dependence of the current prevalence on the future discount rate, it’s not clear that this actually improves the final estimates – who knows which of the 0.0058 and the 0.0097 is the right one to use, if either?

The conversion from DALYs averted to life-saved equivalents will always cause disagreement amongst different people, and here is my disagreement. GiveWell measure a life at 70 years, and say that the life-long developmental effects last for 70 years. But the life expectancy in many countries in sub-Saharan Africa is much lower than 70 years, and the children getting dewormed are, on average, aged about 10. So I would have the life-long effects lasting, say, 45 years rather than 70. Furthermore, since we’re making life-saved comparisons to bednets – where we’re primarily saving the lives of under-5 children in countries with similar life expectancies – I think that a “life” would be better defined as 50 years rather than 70. I don’t claim that everyone should follow my definitions for the DALY-to-lives conversion (people may want to weight deaths of young children below those of adults, or age-weight in some other way, and so on), but I think it is worth showing this as one way, amongst many others, in which differences of opinion can alter the cost-effectiveness estimates by 10% or so.
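
To make the size of this disagreement concrete, here is a minimal sketch. The 0.024 disability weight is the literature value mentioned earlier; the durations and years-per-life are the definitional choices at issue.

```python
# Minimal sketch of the DALY-to-lives conversion disagreement above.
# 0.024 is the standard literature disability weight for cognitive
# impairment; the durations and years-per-life are definitional choices.
DISABILITY_WEIGHT = 0.024

def lives_equivalent(effect_duration_years, years_per_life):
    dalys = effect_duration_years * DISABILITY_WEIGHT
    return dalys / years_per_life

givewell_style = lives_equivalent(70, 70)  # effects last 70y; a "life" is 70y
my_style = lives_equivalent(45, 50)        # effects last 45y; a "life" is 50y
print(my_style / givewell_style)           # ≈ 0.9, i.e. a ~10% difference
```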

I’ve put my own version of the spreadsheet, with the various changes described above, on Google Docs here.

Calculation based on Baird et al.
In light of the large uncertainty of the appropriate disability weight to use for life-long cognitive impairment and other possible developmental benefits of deworming treatment in childhood, GiveWell’s latest estimates rely heavily on the Baird et al. working paper, the latest version of which was released in August 2012. This study looks at (amongst other things) the working hours and earnings of a group of Kenyan adults aged 19 to 26, who as children had been in schools in the Busia district where a deworming program for both STH’s and schistosomiasis had been implemented. The program was rolled out in stages, with schools in “group 1” receiving treatment from 1998 onwards, schools from group 2 from 1999, and schools from group 3 from 2001. Groups 1 and 2 constitute the treatment group, and group 3 is the control group. The program was not quite as simple as that (there were experiments in cost-sharing tried for a little while), but all up, the students in the treatment group received, on average, about 2.4 more years of deworming than the students in the control.

The results of the study are very positive: wage earners in the treatment group averaged about 29% higher incomes than those in the control. The study’s regressions also found that self-employed non-agricultural workers reported 19% higher profits, though that result was not statistically significant at the usual levels; I’ll come back to this point later. As well as comparing these results to possible returns to investment from cash transfers, we’ll want to ask (and I will give my answer later) how much increased incomes are worth in DALY-equivalent terms – since deworming gives both short-term health benefits and long-term financial benefits, we’ll need some sort of conversion factor if we want an overall cost-effectiveness estimate.

GiveWell present two different calculations based on the Baird et al. results. One is a “lives saved framework”, and one is a “financial framework”. I’ll focus my discussion on the financial framework, which includes the assumptions behind the cash transfer benefits calculation. The basic idea for cash transfers is that some fraction of the transfer is invested and generates an income stream (assumed to last 40 years; I suspect that this is generous), while the rest is spent on short-term consumption. Add the latter to the (appropriately discounted) former, and you have the total monetary benefits to cash transfers. For deworming, the basic idea is that some recipients get short-term health benefits, represented as a fraction of average income, and some recipients get an increased income in adulthood. Appropriately discount the latter and add to the former, and you get the total monetary benefits of deworming.
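
The structure of the financial framework can be sketched as follows. The functions mirror the description above, but the parameter values in the example are assumptions of mine, not GiveWell’s inputs.

```python
# Hedged sketch of the "financial framework" structure described above.
# The formulas mirror the text; the example values are my own assumptions.
def npv(annual_amount, years, rate, start_year=1):
    # Present value of a stream of `annual_amount` per year, starting in
    # `start_year` and lasting `years` years, discounted at `rate`.
    return sum(annual_amount / (1 + rate) ** t
               for t in range(start_year, start_year + years))

def cash_transfer_benefit(transfer, invested_share, roi, years, rate):
    consumed = transfer * (1 - invested_share)   # immediate consumption
    income_stream = transfer * invested_share * roi
    return consumed + npv(income_stream, years, rate)

def deworming_benefit(short_term_health_value, extra_income, years, delay, rate):
    # Short-term health benefit now, plus higher adult income starting
    # only after `delay` years.
    return short_term_health_value + npv(extra_income, years, rate,
                                         start_year=delay + 1)

# Example: $1,000 transfer, 75% invested at a 20% ROI for 40 years, 5% discount.
print(cash_transfer_benefit(1000, 0.75, 0.20, 40, 0.05))
```

Even with these made-up inputs, the shape of the comparison is visible: the invested share’s return stream dominates the immediate consumption.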

The ‘Assumptions’ sheet lists the key inputs to the calculations. I’ll look at these out of order. There is a lot of room for disagreement on some quantities, and no real way to satisfactorily resolve those disagreements without much more research. So while at times I might argue that my assumptions are more appropriate than those given by GiveWell staff, at other times I am just offering a different opinion.

ROI of cash transfers

  • I don’t really have much confidence here, though I am inclined to agree with the GiveWell staff that returns can be high. The example often mentioned is that many of GiveDirectly’s cash transfer recipients spend about half their transfer on an iron roof. The thatch roofs often leak during heavy rainfall, which can happen several times a year; each leak requires a repair costing (say) $15 and means moving possessions under a more solid shelter. An iron roof would therefore start giving a return relative to baseline of over 10% as soon as it next rains heavily; GiveDirectly estimate this return to be 17%. In addition, there are studies from other cash transfer programs showing high rates of return. The GiveWell staff’s inputs on this range from 5% to 54%; I am happy to go with something in the middle of that.
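
A back-of-envelope version of the iron-roof example: the $15 repair cost and the half-of-transfer roof spend come from the description above, while the transfer size and repair frequency are assumptions of mine.

```python
# Back-of-envelope for the iron-roof ROI example above. The transfer size
# and repairs-per-year are assumed; the $15 repair cost and the
# half-of-transfer roof spend come from the text.
transfer = 1000.0            # assumed size of the cash transfer (USD)
roof_cost = transfer / 2     # "about half their transfer on an iron roof"
repair_cost = 15.0           # per-repair cost of a thatch roof
repairs_per_year = 4         # assumed reading of "several times a year"

annual_saving = repair_cost * repairs_per_year
roi = annual_saving / roof_cost
print(f"implied annual return: {roi:.0%}")  # → implied annual return: 12%
```

On these assumptions the return lands in the “over 10%” range; GiveDirectly’s 17% estimate presumably also counts benefits beyond avoided repairs.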

Rate of investment of cash transfers

  • This assumption tells us how much of the cash transfer gets invested; all the GiveWell staff assume 75% “based on GiveDirectly self-reported spending and economic theory”, and I have no reason to alter a figure based on self-reported spending.

Discount rate

  • All the GiveWell staff go with 5% here, which I think is much too high a rate. I think that this value should be the social discount rate, which essentially tells us how much we value the welfare of people today relative to the welfare of people in the future, though it can also be used to describe our uncertainty about the effects of policies today on the future – such an uncertainty will usually get larger the further into the future you estimate. There are good arguments that the social discount rate should be zero (i.e., that we should value future welfare equal to present welfare) or some small number to account for future uncertainty; more commonly used is 3% per annum; in my head I usually assume the rate to be some fuzzy number between about 1.5% and 2%.
  • Instead, the GiveWell assumption seems to be describing the discount rate that you’d use if you were making an investment: if you’re able to get a risk-free 5% annual return, then you discount your expected future revenue by 5% per annum compounding. Comparisons of returns might be relevant to the recipients of the transfers, who have to decide where to invest their new cash. But even if they’re able to get a risk-free 5% return, then that is a benefit of the cash transfer that should be discounted only at the social discount rate.
  • In his blog post, Holden also mentions that the discount rate of 5% incorporates the recipients’ preference for current consumption over future consumption. This is a more solid argument, but one that need not bind donors (and arguably should not bind donors) – I am happy to value the recipients’ future welfare more highly than they do themselves at present. Holden also uses the discount rate to account for the fact that donors are able to invest money today and donate later. There can be (and have been) long debates on the topic of donating now versus donating later; I figure that since I am donating now, the return I can get on my money is not relevant to an estimate of the benefits of that donation.
  • The very high discount rate of 9.85%, offered as a third option, is irrelevant: it is the real interest rate paid by the Kenyan government on its sovereign debt, and it should in no way inform the discount rate used in these cost-effectiveness estimates.
  • I’ve spent a bit of time on this assumption because it substantially changes the final results. Cash transfers start generating a return very quickly, but the increased incomes from deworming only start several years after the fact (ten years, if the deworming starts at age 5; five years, if we take the middle age group of students aged 5-14). And in a comparison to bednets, a larger discount rate will unfairly make both deworming and cash transfers look less effective.
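
The delay effect in that last bullet is easy to quantify: with the same income stream, delaying its start by ten years costs a factor of (1 + r)^-10 relative to an undelayed stream, so the penalty grows quickly with the discount rate. A sketch, with illustrative numbers that are not GiveWell’s:

```python
# Why the discount rate hits deworming harder than cash transfers:
# deworming's income gains start only after a delay. Illustrative numbers.
def pv_stream(amount, years, delay, rate):
    return sum(amount / (1 + rate) ** t
               for t in range(delay + 1, delay + years + 1))

for rate in (0.02, 0.05):
    immediate = pv_stream(100, 40, delay=0, rate=rate)   # cash-transfer-like
    delayed = pv_stream(100, 40, delay=10, rate=rate)    # deworming-like
    print(f"rate {rate:.0%}: delayed stream is worth {delayed / immediate:.0%} "
          f"of the immediate one")
```

At a 2% rate the ten-year delay costs deworming about 18% of its present value; at 5% it costs about 39%.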

Proportion of child-years that are as helpful (in terms of developmental effects) as the specific years in the study for deworming

  • The study has the treatment group getting 2.41 extra years of deworming relative to the control. The calculation assumes that students are dewormed every year for 10 years. If all of those years of deworming are as helpful as the 2.41 years in the Kenyan study, then this assumption should be 100%. I don’t have any clue what this value should be; perhaps some experts on worm infections would have more of an idea about whether infections at age 5 are more or less damaging than those at age 14, but I don’t know.

Proportion of deworming going to children

  • All the GiveWell staff go with 50%, based on discussions with SCI.

Proportion working for wages; treatment effect on ln(total labor earnings)

  • These are two separate assumptions in the spreadsheet, but they are related. Recall that Baird et al. find that wage earners in the treatment group saw their incomes increase by 29% relative to the control; about one sixth (16.6%) of their sample was working for wages. Another 10% was in non-agricultural self-employment, and this group did not have a statistically significant increase in profits relative to the control, though the point estimate was large, at 19%. GiveWell’s spreadsheet assumes that the self-employed non-agricultural workers receive no long-term benefit from deworming, and it is certainly acceptable (and often desirable) to ignore results that are not statistically significant. But I personally don’t put quite as large an emphasis on proven outcomes, and put more emphasis on expected values. And here I think the expectation should be that profits increased. In addition to their statistically significant increase in incomes, the wage earners show a statistically significant increase in weekly hours worked. The non-agricultural self-employed also show a statistically significant increase in hours worked, and a fairly large point estimate of an increase in profits.
  • So while I wouldn’t want to see policymakers picking and choosing non-significant but favorable results, in this case I think the lack of significance is a minor issue, at least for me personally. Including a non-significant result means that I should probably make the “replicability adjustment” (see below) stronger, but in my set of assumptions, I have the proportion of the population seeing increased earnings at 26%, and a treatment effect on ln(earnings) of ln(1 + (0.166*0.287 + 0.1*0.19)/(0.166 + 0.1)) = 0.22.
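
As a sketch, the pooled figures above can be reproduced directly (the shares and point estimates are the ones quoted from Baird et al.):

```python
import math

# Shares of the sample and point estimates quoted from Baird et al.
wage_share, wage_gain = 0.166, 0.287   # wage earners: +28.7% earnings
self_share, self_gain = 0.100, 0.190   # non-agric. self-employed: +19% profits

# Pool the two groups: share-weighted average gain among those who benefit.
proportion_benefiting = wage_share + self_share   # ~26% of the population
avg_gain = (wage_share * wage_gain + self_share * self_gain) / proportion_benefiting
effect_on_ln_earnings = math.log(1 + avg_gain)    # ~0.22
```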

Duration of benefits in years

  • In GiveWell’s spreadsheet, this assumption applies to both the deworming calculation and the cash transfer calculation. I think that the two should be split: while the average person who reaches the age of 15 may work for 40 years, I’m not convinced that investments will last for so long – the iron roof might need replacing after a couple of decades. (Of course, it’s possible that some investments may last even longer and be passed on to the next generation.) I don’t have much confidence in what the duration of the cash transfer benefits should be, but I would be more comfortable with, say, 25 years rather than 40.

Replicability adjustment

  • John Ioannidis estimates that about half of the results in the medical literature cannot be replicated, owing to publication bias, researchers hunting for techniques that yield statistically significant results, and so on. It’s therefore prudent to assume that experimental social science papers are subject to similar biases. And since the bulk of the deworming calculation is based on the results of one working paper, it seems appropriate to discount the overall results by some factor: 50% is roughly in the middle of Ioannidis’s estimates of replicability for medical research, and the GiveWell staff all give roughly comparable values (between 30% and 50%). There is, of course, no way of really knowing what this value should be.
  • Since I’m including the non-significant result on self-employed profits, and I might be wrong to do so, I think that I should reduce this value from 50% to 42%. That’s a ballpark guess, of course: say the replicability of the wage increases (statistically significant) is 50%, and the replicability of the profit increases (not statistically significant) is only 25%. The wage increases constitute about 70% of the total earnings increases (16.6% of the population with a 29% increase as opposed to 10% of the population with a 19% increase) in the Baird et al. paper, so a natural adjustment to use is 0.7*0.5 + 0.3*0.25 = 0.42. But in my discussion of external validity (below), I argue that when generalized across sub-Saharan Africa, the benefits to the self-employed non-agricultural workers will become more important (comparable to the benefits gained by wage earners), so instead I’ll use an adjustment here of 37%.
  • Of course, Baird et al. is not the only paper to have studied deworming benefits, although it provides more useful results for our purposes than the others. Last year, the Cochrane Collaboration released a review of randomized trials of STH deworming (for a round-up of sometimes strong reactions, see this Storify). The “Main results” section of the review on page 2 is probably the most persuasive case for cash transfers over deworming that I’ve seen – it is a long list of small or null results with weak evidence bases. It is certainly plausible that some of the defenses of mass deworming are correct: that the subtle health effects caused by a particular species won’t necessarily be picked up by a review of all STH deworming; that we have far stronger evidence of the health benefits of deworming on animals (I have not looked up this literature, but my prior is that it’s likely to be true and relevant to humans); that schistosomiasis, not covered in the Cochrane review, is more important. But I think that we have to give a good deal of weight to the Cochrane review. Of course people will disagree heavily over how much weight to give it; I will say that I’ll reduce my replicability adjustment from 37% down to 18%.
  • There’s also a replicability adjustment needed for cash transfers; perhaps we can think of it as an external validity adjustment (see below) as well – this latter correction wouldn’t necessarily be relevant for GiveDirectly at the moment, but perhaps it will be if they scale up their operations to more countries. Since we have evidence from various programs of high rates of return on investments, I’m satisfied that not much of a correction is needed for the 25% return that I think is reasonable to use for the estimate. On the other hand, I don’t know if a different set of cash transfer recipients would invest, on average, 75% of their transfers. It seems pretty high to me. I’ll put this adjustment at 66.67%.
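
The weighted replicability adjustment described above is simple to reproduce; the replicability figures are, as stated, ballpark guesses:

```python
# Ballpark replicability guesses from the discussion above.
repl_wage = 0.50   # statistically significant wage increases
repl_self = 0.25   # non-significant self-employment profit increases
wage_weight = 0.7  # wage gains are ~70% of total earnings gains in Baird et al.

# Weighted average of the two replicability guesses.
adjustment = wage_weight * repl_wage + (1 - wage_weight) * repl_self  # ~0.42
```

Reweighting the self-employed benefits upward for external validity nudges this down to the 37% used above, and folding in the Cochrane review reduces it further to 18%; both of those steps are judgment calls rather than arithmetic.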

External validity

  • There are two main reasons to think that the Baird et al. study of the Busia district in Kenya won’t generalize to the rest of sub-Saharan Africa. The first is that Kenya has an unusually large percentage of its adult population in wage employment. The graph on page 9 of this report from the International Labour Organization shows that about 30% of working-age people in Kenya are in wage employment, with the other low-income countries in the region between about 6 and 19% (the study sample in Baird et al. had about 17%; I’m guessing this is because the wage jobs are disproportionately in Nairobi). Botswana, Namibia, and South Africa – all countries with much higher average incomes – are the only three countries shown with higher percentages of wage employment. Gindling and Newhouse (Table 3, p16) give the overall percentage of wage earners in a sample of 21 sub-Saharan African countries as 13.4%.
  • If you treat increased wage earnings as the only source of life-long benefits to deworming, as the GiveWell spreadsheet does, and if SCI’s work is predominantly in rural areas like the Busia district (I don’t know if this is true), then the external validity adjustment from this factor alone should probably be about 50% (GiveWell’s spreadsheet does not consider this factor).
  • If you also include increased profits from the self-employed, then it’s a little trickier. Heintz (Table 10.1, p203) gives the self-employed non-agricultural share in Kenya as about 16%, as compared with about 20% for the same sample of sub-Saharan African countries used by Gindling and Newhouse. Again assuming that SCI works predominantly in areas like the Busia district, we can scale up the 10% of self-employed non-agricultural workers to about 12.5%, and scale down the 16.6% of wage earners to around 8%. My overall adjustment is therefore (0.08*0.29 + 0.123*0.19)/(0.166*0.29 + 0.1*0.19) = 70%.
  • The second reason for an external validity adjustment is the high prevalences of helminth infections in the region covered by the study. At baseline, the prevalence of hookworm infection was 77%, A. lumbricoides 42%, and T. trichiura 55% (Miguel and Kremer, Table II). By comparison, the sub-Saharan Africa prevalences are (roughly!) 23%, 23%, and 19% respectively (Bundy et al., where I’ve taken the infections for the 5-14 age groups in Tables 9.5b to 9.7b and normalized them by the SSA/Total infections ratio in Tables 9.5a to 9.7a).
  • Furthermore, heavy flooding in 1998 caused very high prevalences of moderate-heavy infection amongst students who hadn’t received deworming treatment when these values were measured in early 1999 (Miguel and Kremer, Table V). These students are from group 2 and group 3 schools – half of them ended up in Baird et al.’s treatment group, and half in the control. If they had all ended up in the control, then we would expect the results to substantially over-estimate the benefits of deworming (remembering that this is relative to an already high baseline prevalence); instead they likely still over-estimate them, but not by as much.
  • It is generally assumed that very low worm burdens don’t cause any problems – only people with worm burdens above some threshold are at risk. I haven’t tried to model the proportion of 5-14-year-olds across sub-Saharan Africa above the various thresholds used. The more thought I’ve put into this point, the more uncertain I’ve become about the appropriate external validity adjustment to use here.
  • I haven’t looked at schistosomiasis prevalences.
  • GiveWell’s staff all use a value of 30.25%, derived from an odds ratio of moderate-heavy infections between the 1999 treated and 1999 not-treated groups, with the latter figure adjusted upwards to the moderate-heavy prevalence that would have existed in the absence of any treatment (there are spillover benefits to deworming). I find this problematic on three levels. Firstly, using an odds ratio suggests to the casual reader that some respectable mathematical model implies that an odds ratio is the appropriate adjustment; perhaps such a model exists, but really this adjustment should be considered a guess. Secondly, I don’t think it’s appropriate to use estimated prevalences in the absence of any treatment: all the results from the studies are based on comparing a treatment to a control, so if an adjustment is to be made based on prevalence levels, then it should be made on the prevalence levels that were actually experienced. Thirdly, GiveWell’s external validity adjustment really only attempts to make the results external in time to the same district in Kenya. It would be better to incorporate the knowledge that the helminth infections were particularly high relative to the rest of sub-Saharan Africa, even at baseline. On the other hand, SCI to some extent targets regions where worm infections are high, which somewhat mitigates this last factor.
  • At the time of writing, I’m unable to come up with anything better than a wild guess for this factor. Moderate-heavy infection prevalences may be substantially higher in the Busia district than in other regions where there will be deworming programs, but they may not be (and I’ve also come across some much higher prevalence estimates than those I linked to above). I don’t know how long the flood-induced very high prevalences in the study persisted for. I don’t even know what effect to expect when the worm burdens change; I tried modeling that recently, but a) my conclusion then was that I could only guess, and b) further reflections since writing that post have led me to think that some of its assumptions are not valid anyway. So, in light of all that, I wouldn’t argue strongly against any vaguely reasonable adjustment factor; I’ve gone with 30%, but honestly I could be persuaded to much smaller or much larger values.
  • Combining the 30% just guessed at with the 70% from earlier (50% if you only consider wage earners) gives an overall external validity factor of 21% (15% if you only consider wage earners).
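
Collecting the external validity arithmetic from the bullets above in one place (all figures as quoted there; note the wage-only ratio comes out at ~48%, which I round to 50% in the text):

```python
# Treatment effects from Baird et al., as quoted above.
wage_gain, self_gain = 0.29, 0.19

# Shares of wage earners / non-agric. self-employed: Baird et al. sample
# vs my rescaled estimates for sub-Saharan Africa as a whole.
sample_wage, sample_self = 0.166, 0.10
ssa_wage, ssa_self = 0.08, 0.123

wage_only_factor = ssa_wage / sample_wage    # ~0.48, rounded to ~50% above
employment_factor = ((ssa_wage * wage_gain + ssa_self * self_gain)
                     / (sample_wage * wage_gain + sample_self * self_gain))  # ~0.70

prevalence_factor = 0.30                     # the "wild guess" for infection levels
overall = employment_factor * prevalence_factor   # ~0.21
```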

For the cash transfer calculation, there are also some inputs needed on the size of the transfers and the average income per person.

With all of those values decided on, it’s relatively straightforward to churn through the math to get the estimate of the total discounted benefits per dollar. The GiveWell spreadsheet assumes that the benefits are proportional to the log of the increases; that seems debatable but reasonable enough, and my quick check suggests that using proportions directly doesn’t make much of a difference.

The last component needed to complete the deworming calculation is the short-term health benefits. We can get the DALYs averted per person from the 2011 spreadsheet discussed in the “DCP2-style calculation” section above. With my preferred inputs to that calculation, the result is 0.0019 DALYs averted per person treated. Now the question is, how much should I value that quantity in dollar terms? This is again a question which will result in much reasonable disagreement between different people. The way I figure it, the value of a statistical life in high-income countries is measured in the millions of dollars; call it 90 times the average income. I will therefore value statistical lives in low-income countries at about 90 times the average income. An actual life saved by malaria bednets gives the child, on average, about another 50 years of life; 50 years discounted gives something like 30 years; a DALY is therefore worth something like 90/30 = 3 times an average income. Maybe since the marginal utility of extra consumption is higher for those on low incomes, I should reduce that 3 down to something lower. Keeping it at 3, though, the short-term health benefits equate to 3*0.0019 = 0.57%, similar to the GiveWell staff’s 0.51%.
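
The valuation just described, as arithmetic (all multipliers are the rough guesses from the paragraph above):

```python
# Rough dollar valuation of the short-term health benefits (guesses as above).
vsl_in_incomes = 90       # a statistical life valued at ~90x average annual income
discounted_years = 30     # ~50 extra life-years, discounted down to ~30

daly_value_in_incomes = vsl_in_incomes / discounted_years   # 1 DALY ~ 3x income
dalys_per_person = 0.0019  # from the DCP2-style calculation with my inputs

# Short-term health benefit per person treated, as a fraction of annual income.
short_term_benefit = daly_value_in_incomes * dalys_per_person  # 0.0057, i.e. 0.57%
```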

Putting it all together, inputting my best guesses (well, often they’re just guesses) into the spreadsheet, it tells me that a dollar spent on deworming leads to a 1.67% proportional increase in consumption, and that a dollar spent on cash transfers leads to a 0.96% proportional increase in consumption. Compared to the GiveWell staff, I’m roughly in the middle of their estimates for the benefits of deworming, and on the high end for cash transfers (only Alexander, who assumes a 54% ROI, has the benefits of cash higher than me), the latter largely due to me using a lower discount rate. My inputs make deworming only 1.7 times as cost-effective as cash – substantially closer to parity than the figures of 2.3 to 4.2 from the GiveWell staff.

My spreadsheet with my inputs is on Google Docs here.

The remaining question is a comparison to bednets. I’ll ignore what’s in the spreadsheet here and do a quick conversion: I said earlier that I value a statistical life at about 90 times income (those who would prefer a different multiplier here can adjust accordingly). Deworming with my inputs gives an increase of a factor of 0.0167 times income; that implies about $5400 for a benefit equivalent to a life saved, about twice what it costs to save a statistical life with bednets, and that ignores any (speculative) subtle developmental effects that bednets may have. For cash transfers with my inputs, it costs a bit under $10,000 to achieve benefits equivalent to a life saved.
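
That conversion, spelled out (the per-dollar gains are the outputs of my spreadsheet inputs; the 90x multiplier is my guess from above):

```python
vsl_in_incomes = 90                  # statistical life valued at ~90x income

# Proportional consumption increase per dollar donated, from my inputs.
deworming_gain_per_dollar = 0.0167
cash_gain_per_dollar = 0.0096

# Dollars needed for benefits equivalent to one statistical life.
cost_per_life_equiv_deworming = vsl_in_incomes / deworming_gain_per_dollar  # ~$5,400
cost_per_life_equiv_cash = vsl_in_incomes / cash_gain_per_dollar            # ~$9,400

# And the headline ratio from the previous section: deworming vs cash.
ratio = deworming_gain_per_dollar / cash_gain_per_dollar                    # ~1.7
```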

Conclusions

  • There really are lots of guesses involved with these sorts of “estimates”. GiveWell have written in the past about not putting as much emphasis on these estimates as some of us might want; having now worked through the deworming example, I can see where GiveWell are coming from.
  • While I have plenty of disagreements with the GiveWell staff’s assumptions, and some of those disagreements I would consider more than a mere difference of opinion, altogether I don’t think our biases are systematically different. My preferred inputs to the estimates give comparable results to theirs – the disagreements going in both directions happen to roughly cancel out in this case.
  • It’s not always going to be the case that I (or someone else) will find a series of disagreements that lead to only minor change overall. It’s quite plausible that on another calculation, I would end up estimating the cost-effectiveness to be off by a factor of 2 or more.
  • I could well be persuaded in the near future that my own estimates are off by a factor of 5, in either direction.
  • Because of all the guesses, I don’t think it’s particularly useful to come up with confidence intervals on the overall cost-effectiveness estimates – it would be a false precision about the uncertainty involved. Having said that, extrapolating deworming benefits from one study is going to give particularly uncertain results, and in other cases it may be of interest to generate formal-ish confidence intervals.
  • As a donor, this exercise has made me much more favorable to donating to GiveDirectly. The benefits of cash transfers are pretty big – in fact I think GiveWell have understated these benefits because they use too high a discount rate – and comparable with many health interventions. Whereas before I was thinking of an 80/20 split between AMF and SCI, now I think I’ll go 80/10/10.

In memory of Aaron Swartz

Note: this post is in memory of Aaron Swartz. Aaron was a friend of and volunteer for GiveWell, and his family has recommended GiveWell for donations in his memory. We are deeply grateful for the help and support that Aaron provided during his lifetime, as well as for the outpouring of generosity that has come in the wake of his tragic death. I wrote this post to honor Aaron’s memory and provide context on his connection to GiveWell.

We take pride in our work, and we draw much of that pride not only from how many fans and supporters we have, but from who those fans and supporters are – their thoughtfulness, their intelligence, their values and passion to pursue those values. Aaron was one of the supporters who made us proudest.

We cold-emailed Aaron in May of 2011 because Elie and I had been greatly enjoying his blog, had noticed his interest in Peter Singer’s work, and thought he might be interested in GiveWell. We talked on the phone and had a great conversation. Within a week of our first phone call (before any of us had met him in person), Aaron had notified us of his intention to leave money to GiveWell in his will. He explained that he wanted to use his money to accomplish as much good as possible, and that as long as he was alive this meant funding projects of his own, but that if something unexpected happened he wanted the money to go to the next-best option.

The first time I actually met Aaron in person was at a talk I gave with Peter Singer at Princeton in October 2011. Aaron heard about the talk on Twitter at 3pm, and immediately got on a train from Boston, where he lived, to New Jersey. He arrived just a couple of minutes after the talk began, with no plan for getting home or where he was going to stay; we ended up talking for several hours after everyone else left.

When Aaron moved to New York in 2012, he and I became friends. Our relationship was largely intellectual. He was interested in GiveWell, in monetary policy, in self-skepticism, in psychology (one of his last emails to me was a theory of why people procrastinate and how one might systematically help overcome it), and in a vast array of other topics. I found him brilliant and fascinating; he was astoundingly well-read and knowledgeable; he challenged many of my beliefs in compelling ways; he was simultaneously passionate and open-minded about his views. Whenever we met up – which usually consisted of no agenda other than walking and talking for several hours – I found myself racking my brain on how to make the best use of our time, because I felt I had so much to learn from him on so many topics. He was one of my favorite people to talk to.

I believe the root of Aaron’s breadth of interests – and the reason the two of us connected – was what I call “rational altruism.” When I talk to people about why they give where they give, the answer is usually that they’ve been touched by a particular cause, organization, story or person. By contrast, I believe in putting all the options on the table and deliberately, strategically, analytically narrowing them down on the basis of which will accomplish the most good – then continuing to constantly step back and reevaluate the choice. This was the approach Aaron favored for charitable giving; it was also the approach he favored for deciding how to invest his time and considerable talent. His interests naturally went wherever he saw opportunities to help the oppressed and disadvantaged. And he was constantly rethinking and revisiting his views, always ready to leave behind an area he had invested in and become known for in order to go after the new best opportunity.

That’s why he was interested in, and worked on, such a broad range of topics. Over the last year, Aaron was facing a court case that is now inspiring people to rally against restrictions on information access and overzealous prosecution. But Aaron himself, at that time, was less interested in these issues – which were dominating his life – than in, for example, monetary policy or climate change, which he intellectually believed to be among the most important topics.

I think Aaron would be honored to see people channeling their fondness for him into a movement to combat excessive prison sentences or to promote freedom of information. I think he would be honored by the outpouring of donations to GiveWell in his memory, which we greatly appreciate. But these aren’t the only actions that I think honor Aaron. I think anyone who is struggling to make the world a better place – in any area, at any place and time – is in some sense honoring his memory. Especially if they are doing it deliberately, strategically and reflectively for maximal impact.

Aaron’s passion was his relentless quest to fight oppression and suffering as effectively as possible, whatever it took. I believe that the world desperately needs more people with that goal. I’m devastated that we now have one fewer.


We’d also like to recognize the work Aaron did for GiveWell and the value he added directly:

  • When Good Ventures asked us for help evaluating the Drug Policy Alliance – an organization well outside any field we’ve researched – I referred Aaron. Aaron and his friend, Matt Stoller, surveyed the drug policy landscape and reported what they were seeing back to Good Ventures. Their work was intelligent, strategic and thought-provoking, and it helped both Good Ventures and GiveWell get an initial handle on how to start thinking about policy advocacy.
  • Aaron also helped us, on a volunteer basis, with technical issues – things like helping us operationalize recurring donations to top charities, and helping us deal with our accidental publication of non-approved information earlier this year. This wasn’t the sort of thing he found intrinsically exciting, but he had offered to help us in any way he could, and whenever we needed his help he stepped up no matter what the task looked like.
  • Aaron paid a great deal of attention to our research and often shared his opinions on it, which we greatly appreciated. For example, he emailed our public email group about a concern he had based on a footnote in an update on one of our top charities, sparking a discussion. He also attended multiple GiveWell discussion events and conference calls and was an important contributor to the discussions.

Our hearts go out to Aaron’s friends and family. We’ll miss him deeply.

Cash transfers vs. microloans

We’ve written that people in the developing world can get very high returns – in excess of 20% annually (and sometimes much more) – on cash transfers. We’ve previously argued that this is both plausible and empirically supported. However, it raises the question: “If people in the developing world can get such good returns on capital, why not support microlending rather than cash transfers?”

Our answer is twofold. First, the history of, and studies of, microlending are entirely consistent with – and in my view, provide some support for – the notion that people in the developing world can earn high rates of return on capital. Second, despite this, making loans has a couple of disadvantages that cash transfers don’t have: it may sometimes cause net harm by creating indebtedness (even as it sometimes does great good when people earn returns above the interest they owe), and running a microlending operation requires far more overhead than running a cash transfer operation.

Microfinance has advantages too; the conceptual points above do not, by themselves, make the case for cash transfers. However, given the landscape and evidence that we see today, we think there is a substantially stronger case for cash transfers as meeting our criteria.

What does the evidence say about microfinance?
We have de-emphasized microfinance as a charitable cause, because we don’t think that funding microfinance charities is among the most promising ways to achieve humanitarian good in the current funding landscape. But that doesn’t mean we think microfinance is a failure. In fact, we think the following two points are both fairly well-established and quite impressive when taken together:

1. Microlending institutions have repeatedly become self-sustaining and even profitable institutions. Compartamos in Mexico and SKS in India are the most vivid examples of this, as both are now public for-profit companies. Due Diligence, by David Roodman, further discusses the microfinance industry’s success in building sustainable institutions.

The basic model these institutions have followed is that of making small loans to very low-income people, and recouping the loans with substantial interest rates.

2. High-quality studies of microlending have not found systematic reductions in poverty, but neither have they found systematic increases in poverty, i.e., evidence of exploitation. Over the last few years, there has been a series of high-quality studies of microfinance, and most have found mixed results, with no robust impacts on consumption in either a positive or negative direction. (We have not done an up-to-date review of all such studies, and are currently getting our rough picture of the findings from summaries by David Roodman, available here and here. Note, in addition, that it’s possible that microfinance does have some negative or positive impacts on income and consumption that are simply too small and/or long-term for these studies to find.)

These studies are disappointing for one who expects that microfinance is systematically improving lives, but should be reassuring and encouraging for one who fears that microfinance is systematically damaging them by exploiting recipients. One possible interpretation of the findings is that microloans help some and hurt others, much like other sources of credit we’re more familiar with.

Taken together, #1 and #2 seem to me to provide suggestive (though not conclusive) evidence that very low-income people in the developing world – despite limits to their education, rationality, etc. – are regularly (though not universally) able to invest at high rates of return. This is consistent with relevant studies of cash transfers.

Microlending vs. cash transfers
As we see it, microlending and cash transfers have the following advantages relative to each other:

Advantages of microlending

  • Microlending implicitly targets people who expect to be able to pay back loans. It may therefore do a better job of selecting for recipients who can best invest capital.
  • Microlending holds out the potential of creating self-sustaining institutions that don’t need donations to survive (the goal we believe most microfinance charities are focused on) and of reaching more people with the same capital.

Advantages of cash transfers

  • Cash transfers don’t run the same risk microloans do of hurting recipients by encouraging indebtedness at high interest rates.
  • Cash transfers facilitate higher-risk investments than microloans. They thus give recipients more options and potentially open opportunities to earn greater returns, without the risk of having to default on loans.
  • Cash transfers don’t require nearly the same level of overhead that microloans do. A microlending operation must track and seek repayment from each recipient, something that can be costly both to the institution (hence the need for substantial interest rates despite high repayment rates) and to the borrowers (microloan recipients often have to attend frequent, time-consuming meetings).

Listing these advantages, by itself, does not answer the question of which is better to support, and indeed we think it is something of an open question.

When it comes to our recommendations, we strongly prefer cash transfers because:

  • The best available evidence seems to suggest that cash transfers are beneficial – increasing consumption over the short and long run – while the available evidence on microlending does not show clear positive (or negative) impact.
  • We are wary of ventures that combine for-profit and nonprofit motives and have not developed a good process for assessing them.
  • We also put some weight on the argument of Due Diligence (a book on microfinance by David Roodman) that there is already a great deal of money – including charitable money – in microlending, and more is unlikely to have high returns in terms of enabling the building of institutions that would be difficult to build otherwise.