Partnership with The Pew Charitable Trusts

Throughout the post, “we” refers to GiveWell and Good Ventures, who work as partners on GiveWell Labs.

We have agreed to a major partnership with The Pew Charitable Trusts as part of our work on criminal justice reform. Good Ventures will provide $3 million to support and expand the work of Pew’s public safety performance project (PSPP), which aims “to advance data-driven, fiscally sound policies and practices in the criminal and juvenile justice systems that protect public safety, hold offenders accountable, and control corrections costs” through technical assistance to states, research and public education, and promotion of nontraditional alliances and collaboration around smart criminal justice policies.

We came into contact with Pew through our investigation on criminal justice reform. Our impression is that PSPP has been intensively involved in the criminal justice reform packages that have passed in over two dozen states since 2007. PSPP now seeks more funding to work in additional states, help states to cement existing reforms, explore the potential for reform at the federal level, and continue pursuing research and public education and engaging with nontraditional allies of reform.

In discussions with Pew, we have been impressed with the knowledge and thoughtfulness both of the PSPP team and of The Pew Charitable Trusts as a whole. It appears to us that Pew has worked in a substantial number of policy areas, often with concrete goals and concrete stated results over several-year time frames, and that Pew has a good deal of general capacity for assessing the opportunities in a policy space and developing a relatively systematic strategy for working within it. (This does not mean that we see eye to eye with Pew on all matters. We believe it sets policy priorities using a different value system from ours; for example, we have stronger interest in foreign aid and other issues related to developing-world poverty reduction.) More information on Pew as a whole will be forthcoming, including notes from a day-long visit in November and a potential historical case study on its work in another area. Our current writeup includes an assessment of the track record of PSPP specifically.

We see this partnership as an important step on multiple fronts:

  • Criminal justice reform is a current focus area for us, and PSPP appears to be one of the most prominent and effective organizations working toward change on this front. Funding and following its work represents an opportunity for both impact and learning.
  • We are also interested in developing a relationship with Pew as a whole; we believe this relationship will be a valuable resource as we continue to explore policy-oriented philanthropy. Based on conversations with Pew representatives, we see supporting PSPP as one of the best ways to support Pew as a whole.
  • Finally, the process of establishing this partnership has itself been a valuable learning opportunity. With PSPP’s help, we have conducted a brief review of PSPP’s track record, which was our first attempt to assess the track record of a U.S.-policy-focused organization and taught us a fair amount about the criminal justice reform space. We have also dealt with new challenges around how to balance our goal of transparency with the goal of having maximal impact; when working on policy, there can be particular tension between these, and we have established an agreement regarding public discussion of PSPP that may serve as a guide to future grant agreements. Note that we have agreed to a review process for public updates that is likely to be time-consuming for both us and Pew, and accordingly we have agreed to limit the frequency with which we publish updates on the project.

Our full writeup has further discussion of PSPP, its track record, our cost-effectiveness estimate, and the case for (and details of) this collaboration.

Writeup on our partnership with PSPP
Note that we believe PSPP has room to productively use more than the $3 million Good Ventures will be providing. Donors interested in contributing to PSPP should contact us.

The moral value of the far future

A popular idea in the effective altruism community is that most of the people we can help (with our giving, our work, etc.) are people who haven’t been born yet. By working to lower global catastrophic risks, speed economic development and technological innovation, and generally improve people’s resources, capabilities, and values, we may have an impact that (even if small today) reverberates for generations to come, helping more people in the future than we can hope to help in the present.

This belief is sometimes coupled with a belief that the most important goal of an altruist should be to reduce “existential risk”: the risk of an extreme catastrophe that causes complete human extinction (as, for example, a sufficiently bad pandemic – or extreme unexpected developments related to climate change – could theoretically do), and thus curtails large numbers of future generations.

We are often asked about our views on these topics, and this post attempts to lay them out. There is not complete internal consensus on these matters, so I speak for myself, though most staff members would accept most of what I write here. In brief:

  • I broadly accept the idea that the bulk of our impact may come from effects on future generations, and this view causes me to be more interested in scientific research funding, global catastrophic risk mitigation, and other causes outside of aid to the developing-world poor. (If not for this view, I would likely favor the latter and would likely be far more interested in animal welfare as well.) However, I place only limited weight on the specific argument given by Nick Bostrom in Astronomical Waste – that the potential future population is so massive as to clearly (in a probabilistic framework) dwarf all present-day considerations.
  • I reject the idea that placing high value on the far future – no matter how high the value – makes it clear that one should focus on reducing the risks of catastrophes such as extreme climate change, pandemics, misuse of advanced artificial intelligence, etc. Even one who fully accepts the conclusions of “Astronomical Waste” has good reason to consider focusing on shorter-term, more tangible, higher-certainty opportunities to do good – including donating to GiveWell’s current top charities and reaping the associated flow-through effects.
  • I consider “global catastrophic risk reduction” to be a promising area for a philanthropist. As discussed previously, we are investigating this area actively.

Those interested in related materials may wish to look at two transcripts of recorded conversations I had on these topics: a conversation on flow-through effects with Carl Shulman, Robert Wiblin, Paul Christiano, and Nick Beckstead and a conversation on existential risk with Eliezer Yudkowsky and Luke Muehlhauser.

The importance of the far future

As discussed previously, I believe that the general state of the world has improved dramatically over the past several hundred years. It seems reasonable to state that the people who made contributions (large or small) to this improvement have made a major difference to the lives of people living today, and that when all future generations are taken into account, their impact on generations following them could easily dwarf their impact in their own time.

I believe it is reasonable to expect this basic dynamic to continue, and I believe that there remains huge room for further improvement (possibly dwarfing the improvements we’ve seen to date). I place some probability on global upside possibilities including breakthrough technology, space colonization, and widespread improvements in interconnectedness, empathy and altruism. Even if these don’t pan out, there remains a great deal of room for further reduction in poverty and in other causes of suffering.

In Astronomical Waste, Nick Bostrom makes a more extreme and more specific claim: that the number of human lives possible under space colonization is so great that the mere possibility of a hugely populated future, when considered in an “expected value” framework, dwarfs all other moral considerations. I see no obvious analytical flaw in this claim, and give it some weight. However, because the argument relies heavily on specific predictions about a distant future, seemingly (as far as I can tell) backed by little other than speculation, I do not consider it “robust,” and so I do not consider it rational to let it play an overwhelming role in my belief system and actions. (More on my epistemology and method for handling non-robust arguments containing massive quantities here.) In addition, if I did fully accept the reasoning of “Astronomical Waste” and evaluate all actions by their far future consequences, it isn’t clear what implications this would have. As discussed below, given our uncertainty about the specifics of the far future and our reasons to believe that doing good in the present day can have substantial impacts on the future as well, it seems possible that “seeing a large amount of value in future generations” and “seeing an overwhelming amount of value in future generations” lead to similar consequences for our actions.

Catastrophic risk reduction vs. doing tangible good
Many people have cited “Astronomical Waste” to me as evidence that the greatest opportunities for doing good are in the form of reducing the risks of catastrophes such as extreme climate change, pandemics, problematic developments related to artificial intelligence, etc. Indeed, “Astronomical Waste” seems to argue something like this:

For standard utilitarians, priority number one, two, three and four should consequently be to reduce existential risk. The utilitarian imperative “Maximize expected aggregate utility!” can be simplified to the maxim “Minimize existential risk!”.

I have always found this inference flawed, and in my recent discussion with Eliezer Yudkowsky and Luke Muehlhauser, it was argued to me that the “Astronomical Waste” essay never meant to make this inference in the first place. The author’s definition of existential risk includes anything that stops humanity far short of realizing its full potential – including, presumably, stagnation in economic and technological progress leading to a long-lived but limited civilization. Under that definition, “Minimize existential risk!” would seem to potentially include any contribution to general human empowerment.

I have often been challenged to explain how one could possibly reconcile (a) caring a great deal about the far future with (b) donating to one of GiveWell’s top charities. My general response is that in the face of sufficient uncertainty about one’s options, and lack of conviction that there are good (in the sense of high expected value) opportunities to make an enormous difference, it is rational to try to make a smaller but robustly positive difference, whether or not one can trace a specific causal pathway from doing this small amount of good to making a large impact on the far future. A few brief arguments in support of this position:

  • I believe that the track record of “taking robustly strong opportunities to do ‘something good’” is far better than the track record of “taking actions whose value is contingent on high-uncertainty arguments about where the highest utility lies, and/or arguments about what is likely to happen in the far future.” This is true even when one evaluates track record only in terms of seeming impact on the far future. The developments that seem most positive in retrospect – from large ones like the development of the steam engine to small ones like the many economic contributions that facilitated strong overall growth – seem to have been driven by the former approach, and I’m not aware of many examples in which the latter approach has yielded great benefits.
  • I see some sense in which the world’s overall civilizational ecosystem seems to have done a better job optimizing for the far future than any of the world’s individual minds. It’s often the case that people acting on relatively short-term, tangible considerations (especially when they did so with creativity, integrity, transparency, consensuality, and pursuit of gain via value creation rather than value transfer) have done good in ways they themselves wouldn’t have been able to foresee. If this is correct, it seems to imply that one should be focused on “playing one’s role as well as possible” – on finding opportunities to “beat the broad market” (to do more good than people with similar goals would be able to) rather than pouring one’s resources into the areas that non-robust estimates have indicated as most important to the far future.
  • The process of trying to accomplish tangible good can lead to a great deal of learning and unexpected positive developments, more so (in my view) than the process of putting resources into a low-feedback endeavor based on one’s current best-guess theory. In my conversation with Luke and Eliezer, the two of them hypothesized that the greatest positive benefit of supporting GiveWell’s top charities may have been to raise the profile, influence, and learning abilities of GiveWell. If this were true, I don’t believe it would be an inexplicable stroke of luck for donors to top charities; rather, it would be the sort of development (facilitating feedback loops that lead to learning, organizational development, growing influence, etc.) that is often associated with “doing something well” as opposed to “doing the most worthwhile thing poorly.”
  • I see multiple reasons to believe that contributing to general human empowerment mitigates global catastrophic risks. I laid some of these out in a blog post and discussed them further in my conversation with Luke and Eliezer.

For one who accepts these considerations, it seems to me that:

  • It is not clear whether placing enormous value on the far future ought to change one’s actions from what they would be if one simply placed large value on the far future. In both cases, attempts to reduce global catastrophic risks and otherwise plan for far-off events must be weighed against attempts to do tangible good, and the question of which has more potential to shape the far future will often be a difficult one to answer.
  • If one sees few robustly good opportunities to “make a huge difference to the far future,” the best approach to making a positive far-future difference may be “make a small but robustly positive difference to the present.”
  • One ought to be interested in “unusual, outstanding opportunities to do good” even if they don’t have a clear connection to improving the far future.

With that said:

  • This line of reasoning is not the only or overwhelming consideration in our current top charity recommendations. As discussed in the previous section, we place some weight on the importance of the far future but believe it would be irrational to let our beliefs about it take on excessive weight in our decision-making. The possibility that arguments about the importance of the far future are simply mistaken, and that the best way to do good is to focus on the present, carries weight.
  • I also do not claim that the above reasoning should push all those interested in the far future into nearer-term, higher-certainty actions. People who are well-positioned to take on low-probability, high-upside projects aiming to make a huge difference – especially when their projects are robustly worthwhile and especially when their projects represent promising novel ideas – should do so. People who have formed the deep understanding necessary to evaluate such projects well should not take us to be claiming that their convictions are irrational given what they know (though we do believe some people form irrationally confident convictions based on speculative arguments). As GiveWell has matured, we’ve become (in my view) much better-positioned to take on such low-probability, high-upside projects; hence our launch of GiveWell Labs and our current investigations on global catastrophic risks. The better-informed we become, the more willing we will be to go out on a limb.

Global catastrophic risk reduction as a promising area for philanthropy
I see global catastrophic risk reduction as a promising area for philanthropy, for many of the reasons laid out in a previous post:

  • It is a good conceptual fit for philanthropy, which is seemingly better suited than other approaches to working toward diffused benefits over long time horizons.
  • Many global catastrophic risks appear to get little attention from philanthropy.
  • I place some (though not overwhelming) weight on the argument that the implications of a catastrophe for the far future could be sufficiently severe and long-lasting that even a small mitigation could have huge value.

I believe that declaring global catastrophic risk reduction to be the clearly most important cause to work on, on the basis of what we know today, would not be warranted. A broad variety of other causes could be superior under reasonable assumptions. Scientific research funding may be far more important to the far future (especially if global catastrophic risks turn out to be relatively minor, or science turns out to be a key lever in mitigating them). Helping low-income people (including via our top charities) could be the better area to work in if our views regarding the far future are fundamentally flawed, or if opportunities to substantially mitigate global catastrophic risks turn out to be highly limited. Working toward better public policy could also have major implications for both the present and the future, and having knowledge of this area could be an important tool no matter what causes we end up working on. More generally, by exploring multiple promising areas, we create better opportunities for “unknown unknown” positive developments, and the discovery of outstanding giving opportunities that are difficult to imagine given our current knowledge. (We also will become more broadly informed, something we believe will be very helpful in pitching funders on the best giving opportunities we can find – whatever those turn out to be.)

Potential global catastrophic risk focus areas

Throughout the post, “we” refers to GiveWell and Good Ventures, who work as partners on GiveWell Labs. This post draws substantially on our recent updates on our investigation of policy-oriented philanthropy, including using much of the same language.

As part of our work on GiveWell Labs, we’ve been exploring the possibility of getting involved in efforts to ameliorate potential global catastrophic risks (GCRs), by which we mean risks that could be bad enough to change the very long-term trajectory of humanity in a less favorable direction (e.g. ranging from a dramatic slowdown in the improvement of global standards of living to the end of industrial civilization or human extinction). Examples of such risks could include a large asteroid striking earth, worse-than-expected consequences of climate change, or a threat from a novel technology, such as an engineered pathogen.

In our annual plan for 2014, we set a stretch goal of making substantial commitments to causes within global catastrophic risks by the end of this calendar year. We are still hoping to decide whether to make commitments in this area, and if so which causes to commit to, on that schedule. At this point, we’ve done at least some investigation of most of what we perceive as the best candidates for more philanthropic involvement in this category, and we think it is a good time to start laying out how we’re likely to choose between them (though we have a fair amount of investigative work still to do). This post lays out our current thinking on the GCRs we find most worth working on for GiveWell Labs.

Why global catastrophic risks?

We believe that there are a couple of features of global catastrophic risks that make them a conceptually good fit for a global humanitarian philanthropist to focus on. These map reasonably well to two of our criteria for choosing causes, though GCRs generally seem to perform relatively poorly on the third:

  • Importance. By definition, if a global catastrophe were to occur, the impact would be devastating. However, most natural GCRs appear to be quite unlikely, making the annual expected mortality from natural GCRs low (e.g., perhaps in the hundreds or thousands; more on the distinction between natural and anthropogenic GCRs below). The potential importance of GCRs comes both from novel technological threats, which could be much more likely to cause devastating impacts, and from considering the very long-term impacts of a low-probability catastrophe: depending on the moral weight one assigns to potential future generations, the expected harm of (even very unlikely) GCRs may be quite high relative to other problems.
  • Crowdedness. Because GCRs are generally perceived to have a very low probability, many other social agents that are normally devoted to protecting against risks (e.g. insurance companies, governments in wealthy countries) appear not to pay them much attention. This should not necessarily be surprising, since many of the benefits of averting GCRs seem to accrue to future generations, which cannot hold contemporary institutions accountable, and to the extent they accrue to present generations, they are distributed very widely, with no clear concentrated constituency that has an incentive to prioritize them. The possibility that a long time horizon may be required to justify investment in averting GCRs also seems to make them a good conceptual fit for philanthropy, which, as GiveWell board member Rob Reich has argued, is unusually institutionally suited to long time horizons. This makes it all the more notable that, with the key exception of climate change, most potential global catastrophic risks seem to receive little or no philanthropic attention (though some receive very significant government support). The overall lack of social attention to GCRs is not dispositive, but it suggests that if GCRs are genuinely worthy of concern, a new philanthropist aiming to address them may encounter some low-hanging fruit.
  • Tractability. The very low frequencies of GCRs suggest that tractability is likely to be a challenge. Humanity has little experience dealing with such threats, and it may be important to get them right the first time, which seems likely to be difficult. A philanthropist would likely struggle to know whether they were making a difference in reducing risks.

Our tentative conclusion on GCRs as a whole is that the balance of strong performance on the importance and crowdedness criteria outweighs low expected tractability, but we are open to revising that view on the basis of deeper explorations of particularly promising-seeming GCRs.

What we’ve done to investigate GCRs
We have published shallow investigations on both GCRs in general and a variety of specific (potential) GCRs:

We also have an investigation forthcoming on potential risks from artificial intelligence, and we commissioned former GiveWell employee Nick Beckstead to do a shallow investigation of efforts to improve disaster shelters to increase the likelihood of recovery from a global catastrophe. We are still hoping to conduct shallow investigations of nanotechnology, synthetic biology governance (aimed more at ecological threats than biosecurity), and the field of emerging technology governance, though we may not do so before prioritizing causes within GCRs.

Beyond the shallow level, we have done a deeper investigation on geoengineering research and continued our investigation of biosecurity through a number of additional conversations.

Our investigations have been far from comprehensive; we’ve prioritized causes we’ve had some reason to think were particularly promising, often because we suspected a lack of interest from other philanthropists relative to the causes’ humanitarian importance or because we encountered a specific idea from someone in our network.

We have also made attempts to have conversations with people who think broadly and comparatively about global catastrophic risks. As far as we can tell, most such people tend to be connected to the effective altruist community (to which we have strong ties and which tends to take a strong interest in GCRs). Many of our conversations with such people have been informal, but public notes are available from our conversations with Carl Shulman, a research associate at the Future of Humanity Institute, and Seth Baum, executive director of the Global Catastrophic Risk Institute.

General patterns in what we find promising
The following two general observations are major inputs into our thinking:

“Natural” GCRs appear to be less harmful in expectation.

After a number of shallow investigations, we’ve tentatively concluded that “natural” (i.e. not human-caused) GCRs seem to present smaller threats than “anthropogenic” (i.e. human-caused) GCRs. The specific examples we’ve examined and a general argument point in the same direction.

The general argument for being more worried about anthropogenic GCRs is as follows. The human species is fairly old (Homo sapiens sapiens is believed to have evolved several hundred thousand years ago), giving us a priori reason to believe that we do not face high background extinction risk: if we had a random 10% chance of going extinct every 10,000 years, we would have been unlikely to survive this long (taking 300,000 years as 30 such periods, the chance of survival would be 0.9^30 = ~4%). Note that anthropic bias can make this kind of reasoning suspect, but this reasoning also seems to map well to available data about different potential GCRs, as discussed below (i.e., we do not observe natural risks that appear likely to cause human extinction). By contrast with “natural” risks, anthropogenic risks present us with potentially unprecedented situations, for which history cannot serve as much of a guide. Atomic weapons and biotechnology are only decades old, and some of the most dangerous technologies may be those that don’t yet exist. With that said, some “natural” risks could present us with somewhat unprecedented situations, due to the modern world’s historically high level of interconnectedness and reliance on particular infrastructure.
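The survival arithmetic in this argument can be checked with a short calculation. This is only a rough sketch: the function name is hypothetical, and the 300,000-year span is an assumed reading of “several hundred thousand years,” chosen because it gives the 30 periods used above.

```python
def survival_probability(extinction_risk_per_period: float, periods: int) -> float:
    """Chance of surviving every one of `periods` independent periods."""
    return (1.0 - extinction_risk_per_period) ** periods


# Assumption: roughly 300,000 years of human history, in 10,000-year periods.
YEARS_OF_HISTORY = 300_000
PERIOD_LENGTH_YEARS = 10_000
periods = YEARS_OF_HISTORY // PERIOD_LENGTH_YEARS

# Hypothetical background risk from the argument: 10% per 10,000 years.
chance_of_surviving = survival_probability(0.10, periods)

print(f"periods: {periods}")
print(f"chance of surviving this long: {chance_of_surviving:.1%}")
```

Under these assumptions the result is about 4%. A longer assumed history, or a higher assumed risk per period, would make our survival look even less likely, which is the point of the argument.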

On the specifics of various “natural” GCRs:

  • Near earth asteroids. A 2010 U.S. National Research Council report estimates that the background annual probabilities of an impact as large as the one that is believed to have caused the extinction of the dinosaurs and a “possible global catastrophe” are 1/100 million and 1/700,000 respectively (PDF, page 19). NASA reports that it has tracked 93% of the near earth asteroids large enough to cause a “possible global catastrophe” and all of the ones as large as the one believed to have caused the extinction of the dinosaurs (and none of them are on track to hit Earth in the next few centuries), suggesting a residual probability of a “possible global catastrophe” of ~1/100,000 during the next century (and likely lower). There may be a comparable remaining risk from comets (Vaclav Smil claims that “probabilities of the Earth’s catastrophic encounter with a comet are likely less than 0.001% during the next 50 years,” which would be about the same as the remaining asteroid risk), but our understanding is that comets are much harder to detect. As a result of the attention from NASA and the B612 Foundation, this cause also appears more “crowded” than others, though seemingly more tractable as well.
  • Large volcanic eruptions. Estimates of the frequency of volcanic eruptions large enough to count as global catastrophic risks differ by several orders of magnitude, but our current understanding is that volcanic eruptions large enough to cause major crop failures are likely to occur no more frequently than once every 10,000 years, and perhaps significantly less frequently (suggesting a <1% chance of such an eruption in the next century). Large volcanic eruptions may be much more of a cause for concern than asteroid strikes, but this cause performs relatively poorly on tractability, since our ability to predict eruptions is limited, and we are not currently capable of preventing an eruption.
  • Antibiotic resistance. Microbes are currently evolving to be resistant to antibiotics faster than new antibiotics are being developed, posing a growing public health threat. However, antibiotic resistance is unlikely to represent a threat to civilization, since humanity survived without antibiotics until ~1940, including during the period when most gains against infectious diseases were made. We also expect other actors to work to address antibiotic resistance as it continues to become a more pressing public health issue. (More at our writeup.)
  • Geomagnetic storms. The major threat from geomagnetic storms is that they could imperil some large-scale power infrastructure, but the risks are not well understood. A consultant who has contributed to many of the published reports on the topic contends that a worst-case, 1-in-200-year storm could result in a “years-long global blackout,” but other sources show less concern (e.g. modeling the impact of a ~200-year storm as a risk of a blackout for ~10% of the U.S. population for somewhere between 2 weeks and 2 years).
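The per-century figures quoted for asteroids and volcanic eruptions can be roughly reproduced from the annual rates given above. This is one possible way to arrive at them, not necessarily how the cited sources did. It assumes that each year is independent and that the untracked 7% of asteroids carries a proportional share of the background risk; the function and variable names are hypothetical.

```python
def chance_within(annual_probability: float, years: int) -> float:
    """Probability of at least one event in `years`, assuming independent years."""
    return 1.0 - (1.0 - annual_probability) ** years


# Asteroids: background annual probability of a "possible global catastrophe"
# and the share of such asteroids NASA reports having tracked.
ASTEROID_ANNUAL = 1 / 700_000
TRACKED_SHARE = 0.93
# Assumption: risk scales in proportion to the untracked share of asteroids.
residual_annual = ASTEROID_ANNUAL * (1 - TRACKED_SHARE)
asteroid_century = chance_within(residual_annual, 100)

# Volcanoes: upper-bound frequency of one large eruption per 10,000 years.
volcano_century = chance_within(1 / 10_000, 100)

print(f"residual asteroid risk per century: about 1 in {1 / asteroid_century:,.0f}")
print(f"large eruption chance per century: {volcano_century:.3%}")
```

Under these assumptions the asteroid figure comes out at roughly 1 in 100,000 and the eruption figure at just under 1%, in line with the estimates above.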

The only GCRs that receive large amounts of philanthropic attention are nuclear security and climate change.

We do not have precise figures aggregated across causes, but our impression is that climate change is an area in which hundreds of millions of dollars a year are spent by U.S. philanthropic funders, while philanthropic funding addressing nuclear security appears to be in the tens of millions.

We don’t know of philanthropic funding for any of the other GCRs exceeding the single-digit millions of dollars per year.

Leading focus area contenders

The leading contenders described below are among the most apparently dangerous and potentially unprecedented GCRs (seemingly – to us – more worrisome than the “natural” GCRs listed above, though such a comparison is necessarily a judgment call). At the same time, all appear to have limited “crowdedness,” at least in terms of philanthropic attention, unlike nuclear security (and unlike most of the climate change space, though one of the contenders described below relates to climate change). They are discussed in the order I would pick between them if I had to pick today, though we have not decided how many we expect to commit to by the end of the year, and other GiveWell staff may disagree. Though these are the GCRs I would choose to work on if I were picking today, we don’t have high confidence that they represent the correct set. There are a number of questions (discussed below) that we hope to address before reaching a conclusion at the end of the year.

Biosecurity

By biosecurity, we mean the constellation of issues around pandemics, bioterrorism, biological weapons, and biotechnology research that could be used to inflict great harm (“dual use research”). Our understanding is that natural pandemics (especially flu pandemics) likely present the greatest current threat, but that the development of novel biotechnology could lead to greater risks over the medium or long term. We see this GCR as having a strong case for “importance” because it seems to combine relatively credible, likely, current threats with more speculative potential longer-term threats in a fairly coherent program area. The space receives significant attention from the U.S. government (with ~$5 billion in funding in 2012) but little from foundations: the Skoll Global Threats Fund is the only U.S. foundation we know to be engaging in this area currently, at a relatively low level, though the Sloan Foundation also used to have a program in this area. (We believe the distinction between government and philanthropic funding is at least potentially meaningful, as the two types of actors have different incentives and constraints; in particular, philanthropic funding could potentially influence a much larger amount of government funding.) Although we are not sure of the activities that would be best for a philanthropist to support, many people we spoke with argued that current preparedness is subpar and that there is significant room for a new philanthropic funder.

Although we have had a number of additional conversations since the completion of our shallow investigation, we continue to regard the question of what a philanthropist should fund within this broad issue as an open one. We expect to address it with a deeper investigation and a declared interest in funding.

Geoengineering research and governance

We see a twofold case for the importance of work on geoengineering research and governance:

Although solar geoengineering is in the news periodically, research on the science or governance appears to receive relatively little dedicated funding: our rough survey found about $10 million/year in identifiable support from around the world (mostly from government sources), and we are not aware of any institutional philanthropic commitment in the area (though Bill Gates personally supports some research in the area).

Our conversations have led us to believe that there is significant scientific interest in conducting geoengineering research and that funding is an obstacle, but, as with biosecurity, we do not have a very detailed sense of what we might fund. We’re wary of the concern that further geoengineering research could conceptually undermine support for emissions reductions, but we regard it as relatively unlikely, and also find it plausible that further research could contribute significantly to governance efforts.

We expect to address the question of what a philanthropist could support in this area with a deeper investigation and a declared interest in funding. Note that we don’t envision ourselves as trying to encourage geoengineering, but rather as trying to gain better information and governance structures for it, which could make the actual use more or less likely (and given the high potential risks of both climate change and geoengineering, we could imagine that shifting the probabilities in either direction – depending on what comes of more exploratory work – could do great good).

Potential risks from artificial intelligence

We are earlier in this investigation than in investigations of the above two causes, and have not yet produced a writeup. There is internal disagreement about how likely this cause is to end up as a priority; I don’t feel highly confident that it should be above some of the other contenders not discussed in depth here.

In brief, it appears possible that the coming decades will see substantial progress in artificial intelligence, potentially even to the point where machines come to outperform humans in many or nearly all intellectual domains, though it is difficult or impossible to make confident forecasts in this area. Such a scenario could carry great potential benefits, but could carry significant dangers (e.g. technological disemployment, accidents, crime, extremely powerful autonomous agents) as well. The majority of academic artificial intelligence researchers seem not to see the rapid development of powerful autonomous agents as a substantial risk, but to believe that there are some potential risks worth preparing for now (such as accidents in crucial systems or AI-enabled crime; see slides 20-22). However, some people, including the Machine Intelligence Research Institute and computer scientist Stuart Russell, feel that there are important things that should be done today to substantially improve the social outcomes associated with the rapid development of powerful artificial intelligence.

In general, my inclination would be to defer to the preponderance of expert opinion, but I think this area could potentially be promising for philanthropy partly because I have not seen a rigorous public assessment by credible AI researchers to support the (seemingly predominant) lack of concern over risks from the rapid development of powerful autonomous agents. Since this topic seems to be drawing increasing attention from some highly credentialed people, supporting such a public assessment seems like it could be valuable, even if the conclusion is that most researchers are right to not be concerned. The fact that a substantial portion of mainstream AI researchers also seem to think that more traditional risks from AI progress (e.g. accidents, crime) are worth addressing in the near term does increase my interest in the area, though not by much, since I don’t see those issues as GCRs, whereas the rapid development of powerful autonomous agents could conceivably be one. Should we decide to pursue this area further, I would guess that it would be at a lower level of funding than the other potential priority areas described above.

Note from Holden: I currently see this cause as more promising than Alexander does, to a fairly substantial degree. I agree that there are reasons, including the preponderance of expert opinion, to think that there is little preparatory work worth doing today; however, I see the stakes as large enough to justify work in this area even at a relatively low probability of having impact. I would like to see reasonably well-resourced, full-time efforts – with substantial input from mainstream computer scientists – to think about what preparations could be done for major developments in artificial intelligence, and my perception is that efforts fitting this description do not exist currently. We are currently working on trying to understand whether the seeming lack of activity comes from a place of “justified confidence that action is not needed now” or of “lack of action despite a reasonable possibility that action would be helpful now.” My current guess is that the latter is the case, and if so I hope to make this cause a priority.

We will be writing more on this topic in the future.

Why these three risks stand out

Generally speaking, the causes highlighted above (geoengineering, biosecurity and, pending more investigation, potentially artificial intelligence) seem to us to have:

  • Greater potential for the most extreme direct harms (extreme enough to make a substantial change to the long-term trajectory of civilization likely) relative to other risks we’ve looked at, with the exception of nuclear weapons (an area that we perceive as more “crowded” than these three).
  • Very difficult to quantify, but potentially reasonably high (1%+), risk of such extreme harm in the next 50-100 years.
  • Very little philanthropic attention.

Our guess is that most other candidate risks would, upon sufficient investigation, appear less worth working on than at least one of our top candidates – due to presenting less potential for harm, less tractability, or more crowdedness, while being roughly comparable on other dimensions. That said, (a) the specific assessment of artificial intelligence is still in progress and we don’t have internal agreement on it, as discussed above; (b) we have low confidence in our working assessment, and plan both to do more investigation and to seek out more critical viewpoints on our current priorities.

Topics for further investigation

While I currently see the three potential GCRs discussed above as the leading contenders for GCR focus areas, there are a number of questions we would like to answer before committing.

Our shallow investigations have generated a number of follow-up questions that we would like to resolve before committing to causes:

  • Our current understanding is that major volcanic eruptions are currently neither predictable nor preventable, making this cause apparently rather intractable. To what extent could further research help remedy these shortcomings, and are there other ways a philanthropist could help address the risk from a large volcanic eruption?
  • How do risks from comets compare to the remaining risks from untracked near-Earth asteroids? Our understanding is that these risks are likely to be an order of magnitude or two lower than volcanic eruption risks that would cause similar harm, but we aren’t sure how they compare in tractability. What could be done about potential risks from comets?
  • How credible are existing estimates of the potential harm of geomagnetic storms? In particular, how do experts assess the risks to the power grid from a rare geomagnetic event? How prepared are power companies for geomagnetic storms?
  • Are there any important gaps in current funding for efforts to improve nuclear security?

In addition, we are still hoping to conduct shallow investigations of nanotechnology, synthetic biology governance (aimed more at ecological threats than biosecurity), and the field of emerging technology governance as a whole, which we think could potentially be competitive with some of the risks described as potential focus areas.

Update on GiveDirectly

Three members of GiveDirectly’s board of directors (Paul Niehaus, Michael Faye, and Chris Hughes) are planning to start a for-profit technology company, Segovia, aimed at improving the efficiency of cash transfer distributions in the developing world. Segovia plans to sell software to developing-country governments for use in implementing their cash transfer programs.

This development was announced today (though we have been aware of and discussing it with GiveDirectly for some time). Some discussion is available at today’s post on the Development Channel blog.

GiveDirectly and Segovia will work out of the same office space in New York City.

Dr. Niehaus, who has been our primary contact at GiveDirectly and has unofficially played the role of GiveDirectly’s full-time Executive Director, will continue to devote significant time to GiveDirectly and serve as its President with primary responsibility for GiveDirectly. He will be co-employed by Segovia and has told us that he may spend up to 20% of his time on Segovia. Dr. Faye will become Segovia’s president. (Previously, both Dr. Niehaus and Dr. Faye have had full-time jobs outside of GiveDirectly, though they have had substantial responsibilities at GiveDirectly.)

We think this development is simultaneously a potentially very positive one broadly – bringing the possibility of greatly leveraged positive impact on the world – and one that raises new issues and risks for GiveDirectly and its donors.

We think these issues and risks (discussed further below) are noteworthy but ultimately similar in magnitude to, or smaller than, similar risks that exist for our other present and past recommended charities. We plan to continue recommending GiveDirectly as a top charity and continue to see it as an outstanding giving opportunity.

Note that we have discussed all of these issues with Dr. Niehaus and Dr. Faye – they have reviewed a draft of this post – and we believe they are aware of all of the issues we discuss below.

This post focuses on the following:

  • What costs and benefits does this decision pose for GiveDirectly right now?
  • What additional issues could arise in the future, particularly potential conflicts of interest between Segovia and GiveDirectly?
  • Why have Dr. Niehaus, Dr. Faye, and Mr. Hughes decided to serve developing country governments and why are they using a for-profit-company structure?
  • What effect will this have on our recommendation of GiveDirectly?

We have not tried to formulate a view on Segovia’s possible impact because this does not seem directly relevant to GiveWell or our donors. Based on what Dr. Niehaus and Dr. Faye have told us, we believe it’s plausible that given (a) the amount of money governments transfer to recipients and (b) the amount of money that may be lost by those programs due to negligence and/or corruption, Segovia could be very impactful and may represent some of the “upside” we hoped to see from GiveDirectly.

What costs and benefits does this decision pose for GiveDirectly right now?

We discuss several potential negative impacts Segovia could have on GiveDirectly; we also discuss potential positive impacts.

  1. What impact will Segovia have on key staff’s time allocation to GiveDirectly?
  2. Will Segovia’s existence affect the intensity with which GiveDirectly leadership work to maximize GiveDirectly’s impact?
  3. Will Segovia directly affect GiveDirectly’s ability to absorb and distribute funds to recipients?
  4. Will the general public react negatively to this announcement in a way that affects GiveDirectly’s ability to raise funds or otherwise distracts it from its core work?
  5. What benefits might Segovia have for GiveDirectly?

What impact will Segovia have on key staff’s time allocation to GiveDirectly?

Dr. Niehaus and Dr. Faye told us that they expect the following changes to staff time allocations due to Segovia:

  • Paul Niehaus, GiveDirectly’s President, had previously been splitting his time between GiveDirectly and his academic position at University of California at San Diego. Pending the university’s approval, he hopes to take a one-year leave of absence from his academic position to enable co-employment at GiveDirectly and Segovia. During this one-year leave of absence, he expects that the total amount of time he devotes to GiveDirectly will increase slightly and that he will spend a maximum of 20% of his time on Segovia.
  • Michael Faye, Segovia’s president and a member of GiveDirectly’s Board of Directors, had previously worked at a management consulting firm but spent significant time on GiveDirectly. He has now taken a leave of absence from his job and intends to spend the vast majority of his time on Segovia while still offering time to GiveDirectly. He expects the time he spends on GiveDirectly to increase. More on this below.
  • Melissa Harpool, Outreach Coordinator, will split her time between GiveDirectly and Segovia. Her current primary role is managing schedules, and the people whose schedules she manages will now be splitting time between Segovia and GiveDirectly. She had previously been full-time at GiveDirectly.

Dr. Niehaus and Dr. Faye told us that relevant staff track their time allocation to projects and will be able to share whether or not they have hit the targets described above.

Will Segovia affect the intensity with which GiveDirectly leadership work to maximize GiveDirectly’s impact?

Dr. Niehaus told us that he retains his ambitions for and commitment to GiveDirectly’s long term impact, but splitting attention between two organizations is difficult, especially when both are growing rapidly and likely to face significant obstacles.

It is plausible that given GiveDirectly’s and Segovia’s overlapping leadership, staff and office space, those involved with both might see Segovia as the more exciting opportunity. We believe that this could lead to reduced ambition or it could reduce the quality of the mental effort GiveDirectly’s leadership dedicates to maximizing GiveDirectly’s impact.

Will Segovia directly affect GiveDirectly’s ability to absorb and distribute funds to recipients?

Assuming that GiveDirectly staff meets the time targets described above, we don’t think Segovia will have a direct impact on GiveDirectly’s ability to absorb and distribute funds to recipients.

Will GiveDirectly receive a negative response from the general public that affects its ability to raise funds or otherwise distracts it from its core work?

We continue to see GiveDirectly as an outstanding giving opportunity and plan to continue recommending it to donors. That said, we are not confident about how others will react and remain concerned about the impact that the general public’s reaction might have on GiveDirectly’s future fundraising prospects.

Dr. Niehaus and Dr. Faye told us that they have attempted to reduce the likelihood that the response is negative by speaking at length with media in advance of the announcement so that stories written about their decision present a reasonable perspective on this new development. They have also communicated with their major donors and report that they have not encountered negative reactions.

What benefits will Segovia provide for GiveDirectly?

Potential benefits include:

  • GiveDirectly will receive an equity stake in Segovia, which could result in GiveDirectly’s receiving additional funding in the future. The size of the stake is not yet determined. Dr. Niehaus, Dr. Faye, and Mr. Hughes are currently discussing the size of this stake with potential investors.
  • The technology Segovia is planning to develop would likely be helpful to GiveDirectly. Segovia would give this technology to GiveDirectly without charge.
  • As discussed above, Paul Niehaus has been based in San Diego and the rest of GiveDirectly staff is in New York. Dr. Faye has been employed full-time at a management consulting firm. Dr. Niehaus will be spending half his time in New York and hopes to take leave from his academic position, and Dr. Faye will now be working full-time out of the same office. Dr. Niehaus’s co-location with the rest of GiveDirectly staff will likely improve his ability to manage other staff. Dr. Faye’s co-location with Dr. Niehaus and other GiveDirectly staff may also increase his contribution to GiveDirectly. (Dr. Faye has told us that the time he has spent on GiveDirectly has increased since he took leave of absence from his job.)
  • Mr. Hughes intends to significantly increase his work on advocating for cash transfers. This should benefit both Segovia and GiveDirectly.

What additional issues could arise in the future, particularly potential conflicts of interest between Segovia and GiveDirectly?

There may be cases where GiveDirectly has to consider actions that would maximize its impact but might harm Segovia’s interests. GiveDirectly board members (Paul Niehaus, Michael Faye, and Chris Hughes) will hold equity stakes in Segovia, so their financial interests could come into conflict with their roles as Directors of GiveDirectly. We see the following possible conflicts of interest:

  • GiveDirectly’s board members’ financial interest in Segovia could lead them to use GiveDirectly as a means to promote Segovia. This could be via using Segovia’s software even if it’s not well suited to GiveDirectly’s needs, or otherwise using contacts/meetings that might take place due to GiveDirectly (e.g., government, academic or media contacts) to promote Segovia’s offering.
  • Segovia will also have (a) investors and (b) staff who hold significant financial stakes in Segovia, which could lead to conflicts between maximizing profit and maximizing impact.
  • If Segovia were bidding on a contract with a particular government, would GiveDirectly avoid offering its service in the same area/to the same government so that Segovia would have an easier path to a sale?

We have spent significant time with Paul Niehaus and some time with Michael Faye and Chris Hughes over the past few years, and we believe they have good intentions.

In addition, Dr. Niehaus, Dr. Faye, and Mr. Hughes hope to identify investors whose primary motivation is social impact, and believe that choosing investors wisely is a priority. They have also told us that they plan to expand GiveDirectly’s board to 6-7 directors, 3-4 of whom have no overlap with Segovia. Dr. Niehaus told us that overlapping directors would recuse themselves from votes that involve conflicts.

Why have Dr. Niehaus, Dr. Faye, and Mr. Hughes decided to serve developing country governments and why are they using a for-profit-company structure?

Dr. Niehaus and Dr. Faye believe that Segovia’s product is one that governments will want to purchase, and the product will have significant social impact. They have had a long-standing interest in working directly with governments.

In November 2013, Dr. Niehaus and Dr. Faye told us of their hope that GiveDirectly would work with government-run cash transfer programs. We discuss this possibility in our review of GiveDirectly, relying on a summary of a conversation we had with them at the time.

Dr. Niehaus and Dr. Faye told us recently that they had initially hoped governments would transfer funds directly to/through GiveDirectly. The developing-country governments that GiveDirectly spoke with preferred technology to fully outsourcing implementation, saying that they already had a significant number of individuals employed to implement their cash transfer programs. Instead, governments asked for software that could improve their operations, which Segovia now aims to provide.

GiveDirectly still believes it will have opportunities to implement government programs, but Dr. Niehaus and Dr. Faye have come to the conclusion that there will be many more cases where governments want technology alone.

Dr. Niehaus and Dr. Faye pointed us to a World Economic Forum report estimating that developing-country governments distribute $400 billion in transfers each year. Dr. Niehaus and Dr. Faye have also told us that data showing leakage rates of 50% or more (i.e., the share of funds that never reaches the intended recipients) are not uncommon for large public-sector transfer programs. (More information about these sources in this footnote.) They believe that governments will see that purchasing Segovia’s product will save them money by allowing them to transfer more money to recipients at lower overall cost.

We find the above explanation of Segovia’s potential impact plausible but have not tried to vet it as we don’t think our take on it has direct relevance to GiveDirectly or the donors who use our research.

We have the impression that the belief that Segovia could have great social impact is the primary driver of Dr. Niehaus’s, Dr. Faye’s, and Mr. Hughes’ desire to start Segovia.

Why has GiveDirectly settled on this corporate structure as opposed to another structure?

Dr. Niehaus, Dr. Faye, and Mr. Hughes had initially expected to undertake this project as part of GiveDirectly’s existing non-profit structure but told us that they decided on the structure of a for-profit, independent company for three reasons:

  1. Recruiting. We spoke with the recruiting firm that GiveDirectly retained for this search, and the person who led the search told us that recruiting top technology talent was slow. In some cases, the engineers GiveDirectly contacted were not interested in working for a non-profit. Even when GiveDirectly offered compensation packages competitive with for-profit companies, some engineers balked when they saw the negative attention that the media and donors give to high salaries in the non-profit sector. Dr. Niehaus, Dr. Faye, and Mr. Hughes place high priority on recruiting the very best possible talent, so while they feel they could have reasonable success recruiting as a non-profit, they see the improved recruiting prospects associated with a for-profit to be a major consideration.
  2. Investment. GiveDirectly told us that there are investors who would support Segovia as a for-profit entity but would not be interested in supporting GiveDirectly, the non-profit.
  3. Legal advice. GiveDirectly received legal advice that an independent for-profit company is the most straightforward way to avoid jeopardizing GiveDirectly’s tax exempt status.

What effect will this have on our recommendation of GiveDirectly?

We do not expect the existence of Segovia to change our recommendation of GiveDirectly. We expect GiveDirectly to continue to successfully distribute cash to very poor individuals in the developing world, and believe that the issues and risks described above are smaller than, or at worst similar in importance to, those that exist with all of our other recommended charities.

We will continue to follow GiveDirectly closely and report on its progress.

We have written previously about the “upside” we saw in GiveDirectly. We think that Segovia may be one example of that “upside” — Dr. Niehaus and Dr. Faye, partly through their work on GiveDirectly, saw an opportunity for significant social impact and are now pursuing it. However, we think the attention they will now pay to Segovia likely diminishes the upside of future donations to GiveDirectly.


Footnote: On the World Economic Forum report described above, Dr. Niehaus wrote, “I have some questions about the methodology but believe the basic message that it is big and has problems.” On the leakage rates, he wrote, “India’s two largest social programs are the employment scheme (NREGS) and ration scheme (TPDS). For NREGS, the best nationally representative leakage estimate is by Imbert and Papp (published in R. Khera, editor, The battle for employment guarantee. Oxford University Press, 2011) who estimate that between 44% and 58% of participation reported in official figures is fictitious. This likely understates leakage in dollar figures since people who do work are often underpaid, but nationally representative data on earnings are not to the best of my knowledge available. For TPDS, the most recent nationally representative figures I know of are from the 2004-2005 NSS and are discussed in work by Svedberg in EPW who reports a national average estimate of 54% leakage of grains intended for the poor.”

Sequence Thinking vs. Cluster Thinking

Note: this is an unusually long and abstract post whose primary purpose is to help a particular subset of our audience understand our style of reasoning. It does not contain substantive updates on our research and recommendations.

GiveWell – both our traditional work and GiveWell Labs – is fundamentally about maximization: doing as much good as possible with each dollar you donate. This introduces some major conceptual challenges when making certain kinds of comparisons – for example, how does one compare the impact of distributing bednets in sub-Saharan Africa with the impact of funding research on potential high-risk responses to climate change, attempts to promote better collaboration in the scientific community or working against abuse of animals on factory farms?

Our approach to making such comparisons strikes some as highly counterintuitive, and noticeably different from that of other “prioritization” projects such as Copenhagen Consensus. Rather than focusing on a single metric that all “good accomplished” can be converted into (an approach that has obvious advantages when one’s goal is to maximize), we tend to rate options based on a variety of criteria using something somewhat closer to (while distinct from) a “1=poor, 5=excellent” scale, and prioritize options that score well on multiple criteria. (For example, see our most recent top charities comparison.)

We often take approaches that effectively limit the weight carried by any one criterion, even though, in theory, strong enough performance on an important enough dimension ought to be able to offset any amount of weakness on other dimensions. Relatedly, we look into a broad variety of causes, broader than can seemingly be justified by a consistent and stable set of values. Many others in the effective altruist community seem to have a strong and definite opinion on questions such as “how much animals suffer compared to humans,” such that they either prioritize animal welfare above all else or dismiss it entirely. (Similar patterns apply to views on the moral significance of the far future.) By contrast, we give simultaneous serious consideration to reducing animal suffering, reducing risks of global catastrophic events, reforming U.S. intellectual property regulation, global health and nutrition and more, and think it’s quite likely that we’ll recommend giving opportunities in several of these areas, while never resolving the fundamental questions that could (theoretically) establish one such cause as clearly superior to the others.

I believe our approach is justified, and in order to explain why – consistent with the project of laying out the basic worldview and epistemology behind our research – I find myself continually returning to the distinction between what I call “sequence thinking” and “cluster thinking.” Very briefly (more elaboration below),

  • Sequence thinking involves making a decision based on a single model of the world: breaking down the decision into a set of key questions, taking one’s best guess on each question, and accepting the conclusion that is implied by the set of best guesses (an excellent example of this sort of thinking is Robin Hanson’s discussion of cryonics). It has the form: “A, and B, and C … and N; therefore X.” Sequence thinking has the advantage of making one’s assumptions and beliefs highly transparent, and as such it is often associated with finding ways to make counterintuitive comparisons.
  • Cluster thinking – generally the more common kind of thinking – involves approaching a decision from multiple perspectives (which might also be called “mental models”), observing which decision would be implied by each perspective, and weighing the perspectives in order to arrive at a final decision. Cluster thinking has the form: “Perspective 1 implies X; perspective 2 implies not-X; perspective 3 implies X; … therefore, weighing these different perspectives and taking into account how much uncertainty I have about each, X.” Each perspective might represent a relatively crude or limited pattern-match (e.g., “This plan seems similar to other plans that have had bad results”), or a highly complex model; the different perspectives are combined by weighing their conclusions against each other, rather than by constructing a single unified model that tries to account for all available information.

A key difference with “sequence thinking” is the handling of certainty/robustness (by which I mean the opposite of Knightian uncertainty) associated with each perspective. Perspectives associated with high uncertainty are in some sense “sandboxed” in cluster thinking: they are stopped from carrying strong weight in the final decision, even when such perspectives involve extreme claims (e.g., a low-certainty argument that “animal welfare is 100,000x as promising a cause as global poverty” receives no more weight than if it were an argument that “animal welfare is 10x as promising a cause as global poverty”).
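As a rough illustration only, the contrast can be sketched in a few lines of Python. Everything here is an assumption made for the example rather than a formalization used in our research: the `Perspective` fields, the 0-to-1 `certainty` score, the signed-vote rule in `cluster_favors_x`, and all of the numbers are hypothetical. The point is simply that in the sequence-style calculation the size of the claim flows straight into the result, while in the cluster-style rule a low-certainty perspective carries the same small weight whether it claims 10x or 100,000x.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Perspective:
    name: str
    favors_x: bool        # does this perspective imply X (True) or not-X (False)?
    claimed_ratio: float  # how extreme the claim is (e.g. 10x or 100,000x)
    certainty: float      # hypothetical 0-to-1 certainty/robustness score


def sequence_value(best_guesses):
    """Sequence style: one model, multiply the best guesses along the chain."""
    return prod(best_guesses)


def cluster_favors_x(perspectives):
    """Cluster style (toy rule): each perspective votes with weight equal to
    its certainty. claimed_ratio is deliberately ignored, so an extreme claim
    gains no extra weight."""
    score = sum(p.certainty if p.favors_x else -p.certainty for p in perspectives)
    return score > 0


def build(ratio):
    # Two hypothetical perspectives: a speculative argument for X and a more
    # robust one against it. Only the size of the speculative claim varies.
    return [
        Perspective("speculative argument for X", True, ratio, 0.1),
        Perspective("robust track-record evidence against X", False, 1.0, 0.6),
    ]


modest = build(10.0)
extreme = build(100_000.0)

# Sequence style: treat the certainty as a best-guess probability and multiply
# it by the claimed ratio (an assumption made for this sketch).
seq_modest = sequence_value([modest[0].certainty, modest[0].claimed_ratio])
seq_extreme = sequence_value([extreme[0].certainty, extreme[0].claimed_ratio])

print(seq_modest, seq_extreme)
print(cluster_favors_x(modest), cluster_favors_x(extreme))
```

In this toy setup the sequence-style value grows in proportion to the claimed ratio, while the cluster-style decision is unchanged; the only way to move the latter is to raise the certainty of the speculative perspective.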

Finally, cluster thinking is often (though not necessarily) associated with what I call “regression to normality”: the stranger and more unusual the action-relevant implications of a perspective, the higher the bar for taking it seriously (“extraordinary claims require extraordinary evidence”).

I’ve tried to summarize the difference with the following diagram. Variation in shape size represents variation in the “certainty/robustness” associated with different perspectives, which matters a great deal when weighing different perspectives against each other for cluster thinking, but isn’t an inherent part of sequence thinking (it needs to be explicitly modeled by inserting beliefs such as “The expected value of this action needs to be discounted by 90%”).

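[Diagram: sequence thinking vs. cluster thinking; shape size represents the certainty/robustness associated with each perspective]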

I don’t believe that either style of thinking fully matches my best model of the “theoretically ideal” way to combine beliefs (more below); each can be seen as a more intellectually tractable approximation to this ideal.

I believe that each style of thinking has advantages relative to the other. I see sequence thinking as being highly useful for idea generation, brainstorming, reflection, and discussion, due to the way in which it makes assumptions explicit, allows extreme factors to carry extreme weight and generate surprising conclusions, and resists “regression to normality.” However, I see cluster thinking as superior in its tendency to reach good conclusions about which action (from a given set of options) should be taken. I have argued the latter point before, using a semi-formal framework that some have found convincing, some believe has flaws, and many have simply not engaged due to its high level of abstraction. In this post, I attempt a less formalized, more multidimensional, and hopefully more convincing (more “cluster-style”) defense. Following that, I lay out why I think sequence thinking is important and is probably more undersupplied on a global scale than cluster thinking, and discuss how I try to combine the two in my own decision-making. Separately from this post, I have also published a further attempt to formalize the underlying picture of an idealized reasoning process. 

By its nature, cluster thinking is hard to describe and model explicitly. With this post, I hope to reduce that problem by a small amount – to help people understand what is happening when I say things like “I see no problem with your reasoning, but I’m not placing much weight on it anyway” or “I think that factor could be a million times as important as the others, but I don’t want to give it 100x as much attention,” and what they can do to change my mind in such circumstances. (The general answer is to reduce the uncertainty associated with an argument, rather than simply demonstrating that no explicit flaws with the argument are apparent.)

In the remainder of this post, I:

  • Elaborate on my definitions of sequence and cluster thinking.
  • Give a variety of arguments for why one should expect cluster thinking to result in superior decisions.
  • Briefly note and link to a new page (published alongside this post) that attempts to formalize, to some degree, the “idealized thought process” I’m envisioning and how it reproduces key properties of cluster thinking.
  • Lay out some reasons that I find sequence thinking valuable, even if one accepts that cluster thinking results in superior decisions, and defend the idea of switching between “sequence” and “cluster” styles for different purposes. I believe sequence thinking is superior not only for purposes of discussion and reflection (due to its transparency), but also for reaching the sort of deep understanding necessary for intellectual progress, and for generating novel insights that can become overwhelmingly important.
  • Briefly discuss why cluster thinking can be confusing and challenging to deal with in a discussion, and outline how one can model and respond to cluster-thinking-based arguments that are often perceived as “conversation stoppers.”
  • Close with a brief discussion of how I try to combine the two in my own thinking and actions.

Before I continue, I wish to note that I make no claim to originality in the ideas advanced here. There is substantial overlap with the concepts of foxes and hedgehogs (discussed by Philip Tetlock); with the model combination and adjustment idea described by Luke Muehlhauser; with former GiveWell employee Jonah Sinick’s concept of many weak arguments vs. one relatively strong argument (and his post on Knightian uncertainty from a Bayesian perspective); with former GiveWell employee Nick Beckstead’s concept of common sense as a prior; with Brian Tomasik’s thoughts on cost-effectiveness in an uncertain world; with Paul Christiano’s Beware Brittle Arguments post; and probably much more.

Defining Sequence Thinking and Cluster Thinking
Say that we are choosing between two charities: Charity A vaccinates children against rotavirus, and Charity B does basic research aiming to improve the odds of eventual space colonization. Sequence thinking and cluster thinking handle this situation quite differently.

Sequence thinking might look something like:

Charity A spends $A per child vaccinated. Each vaccination reduces the odds of death by B%. (Both A and B can be grounded somewhat in further analysis.) That leaves an estimate of (B/A) lives saved per dollar. I will adjust this estimate down 50% to account for the fact that costs may be understated and evidence may be overstated. I will adjust it down another 50% to account for uncertainties about organizational competence.

Charity B spends $C per year. My best guess is that it improves the odds that space colonization eventually occurs by D%. I value this outcome as the equivalent of E lives saved, based on my views about when space colonization is likely to occur, how many human lives would be possible in this case, and how I value these lives. (C, D, and E can be grounded somewhat in further analysis.) That leaves an estimate of ((D*E)/C) lives saved per dollar. I will adjust this estimate down 95% to account for my high uncertainty in these speculative calculations. I will adjust it down another 75% to account for uncertainties about organizational competence, which I think are greater for Charity B than Charity A; down another 80% to account for the fact that expert opinion seems to look more favorably on Charity A; and down another 95% to account for the fact that charities such as Charity A generally have a better track record as a class.

After all of these adjustments, Charity B comes out better, so I select that one.
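As a rough illustration, the sequence-style calculation above can be written out in a few lines of Python. The post does not supply values for A through E, so every number below is a hypothetical placeholder chosen only to make the arithmetic run, and the helper name `apply_discounts` is likewise illustrative.

```python
# Hypothetical inputs: the post leaves A, B, C, D, and E unspecified.
A_COST_PER_CHILD = 10.0        # $A: dollars per child vaccinated
B_DEATH_REDUCTION = 0.001      # B%: reduction in odds of death, as a fraction
C_ANNUAL_BUDGET = 1_000_000.0  # $C: dollars spent per year
D_ODDS_IMPROVEMENT = 1e-9      # D%: improvement in odds of colonization, as a fraction
E_LIVES_EQUIVALENT = 1e16      # E: value of colonization in lives-saved equivalents


def apply_discounts(estimate, discounts):
    """Adjust an estimate down by each fractional discount in turn."""
    for discount in discounts:
        estimate *= 1.0 - discount
    return estimate


# Charity A: (B / A) lives per dollar, then two 50% discounts.
charity_a = apply_discounts(B_DEATH_REDUCTION / A_COST_PER_CHILD, [0.50, 0.50])

# Charity B: (D * E) / C lives per dollar, then discounts of 95%, 75%, 80%, and 95%.
charity_b = apply_discounts(
    D_ODDS_IMPROVEMENT * E_LIVES_EQUIVALENT / C_ANNUAL_BUDGET,
    [0.95, 0.75, 0.80, 0.95],
)

choice = "Charity B" if charity_b > charity_a else "Charity A"
print(f"A: {charity_a:.3g}, B: {charity_b:.3g}, choice: {choice}")
```

With these placeholders, the four discounts on Charity B multiply out to a factor of 8,000 (0.05 × 0.25 × 0.20 × 0.05 = 0.000125), yet the single large value chosen for E still carries the result. That is the behavior the sequence-thinking description is pointing at.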

Cluster thinking might look something like:

Explicit expected-value calculations [such as the above] imply a strikingly good cost-per-life-saved for Charity A, and I think the estimate is unlikely to be badly mistaken. That’s a major point in favor of Charity A. Similar calculations imply good cost-per-life-saved for Charity B, but this is a much more uncertain estimate and I don’t put much weight on it. The fact that Charity B comes out ahead even after trying to adjust for other factors is a point in favor of Charity B. In addition, Charity A seems like a better organization than Charity B, and expert opinion seems to favor Charity A, and organizations such as Charity A generally have a better track record as a class, and all of these are signals I have a fair amount of confidence in. Therefore, Charity A has more certainty-weighted factors in its favor than Charity B.

Note that this distinction is not the same as the distinction between explicit expected value and holistic-intuition-based decision-making. Both of the thought processes above involve expected-value calculations; the two thought processes consider all the same factors; but they take different approaches to weighing them against each other. Specifically:

  • Sequence thinking considers each parameter independently and doesn’t do any form of “sandboxing.” So it is much easier for one very large number to dominate the entire calculation even after one makes adjustments for e.g. expert opinion and other “outside views” (such as the track record of the general class of organization). More generally, it seems easier to reach a conclusion that contradicts expert opinion and other outside views using this style. This style also seems more prone to zeroing in on a particular category of charity as most promising: for example, often one’s estimate of the value of space colonization will either be high enough to dominate other considerations or low enough to make all space-colonization-related considerations minor, even after many other adjustments are made.
  • The two have very different approaches to what some call Knightian uncertainty (also sometimes called “model uncertainty” or “unknown unknowns”): the possibility that one’s model of the world is making fundamental mistakes and missing key parameters entirely. Cluster thinking uses several models of the world in parallel (e.g., “Expert opinion is correct”, “The track record of the general class of an organization predicts its success”, etc.) and limits the weight each can carry based on robustness (by which I mean the opposite of Knightian uncertainty: the feeling that a model is robust and unlikely to be missing key parameters); any chain of reasoning involving high uncertainty is essentially disallowed from making too much difference to the final decision, regardless of the magnitude of effect it points to. Sequence thinking involves the use of a single unified framework for decision analysis and by default it treats “50% probability that a coin comes up heads” and “50% probability that Charity B will fail for a reason I’m not anticipating” in fundamentally the same way. When it does account for uncertainty, it’s generally by adjusting particular parameters (for example, increasing “0.00001% chance of a problematic error” to “1% chance of a problematic error” based on the chance that one’s calculations are wrong); after such an adjustment, it uses the “highly uncertain probabilities adjusted for uncertainty” just as it would use “well-defined probabilities,” and does not disallow the final calculation from carrying a lot of weight.

Robustness and uncertainty

For the remainder of this piece, I will use the term robustness to refer to the “confidence/robustness” concept discussed immediately above (and “uncertainty” to refer to its opposite). I’m aware that I haven’t defined the term with much precision, and I think there is substantial room for sharpening its definition. One clarification I would like to make is that robustness is not the same as precision/quantifiability; instead, it is intended to capture something like “odds that my view would remain stable on this point if I were to gain more information, more perspectives, more intelligence, etc.” or “odds that the conclusion of this particular mental model would remain qualitatively similar if the model were improved.”

Regression to normality

A final important concept, which I believe is loosely though not necessarily related, is that of regression to normality: the stranger and more unusual the implications of an argument, the more “robustness” the supporting arguments need to have in order for it to be taken seriously. One way to model this concept is to consider “Conventional wisdom is correct and what seems normal is good” to be one of the “perspectives” or “mental models” weighed in parallel with others. This concept can potentially be modeled in sequence thinking as well, but in practice does not seem to be a common part of sequence thinking.

A couple more clarifications

Note that sequence thinking and cluster thinking converge in the case where one can do an expected-value calculation with sufficiently high robustness. “Outside view” arguments inherently involve a substantial degree of uncertainty (there are plenty of examples of expert opinion being wrong, of longstanding historical trends suddenly ending, etc.) so a robust enough expected-value calculation will carry the decision in both frameworks.

Note also that cluster thinking does not convert “uncertain, speculative probabilities” automatically into “very low probabilities.” Rather, it de-weights the conclusions of perspectives that overall contain a great deal of cumulative uncertainty, so that no matter what conclusion such perspectives reach, the conclusion is not allowed to have much influence on one’s actions.
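One way to picture this de-weighting is a toy scoring rule in which each perspective casts a vote whose size is limited by its robustness, however large a magnitude it claims. The sketch below is only an assumption-laden illustration: the `Perspective` class, the 0-to-1 robustness scores, the cap, and all example numbers are hypothetical, and the post does not commit to any particular formula.

```python
from dataclasses import dataclass


@dataclass
class Perspective:
    name: str
    favors: str              # "A" or "B"
    claimed_strength: float  # how strongly the perspective claims to favor its option
    robustness: float        # hypothetical 0..1 score; higher means more robust


def cluster_totals(perspectives, cap=1.0):
    """Sum capped, robustness-weighted votes for each option."""
    totals = {"A": 0.0, "B": 0.0}
    for p in perspectives:
        # The cap keeps a claimed magnitude from overwhelming the robustness weight.
        totals[p.favors] += p.robustness * min(p.claimed_strength, cap)
    return totals


def example_perspectives(speculative_strength):
    """Hypothetical perspectives for the Charity A vs. Charity B example."""
    return [
        Perspective("expected-value estimate for A", "A", 1.0, 0.7),
        Perspective("adjusted expected-value comparison", "B", speculative_strength, 0.1),
        Perspective("organizational quality", "A", 1.0, 0.6),
        Perspective("expert opinion", "A", 1.0, 0.6),
        Perspective("track record of the class", "A", 1.0, 0.6),
    ]


modest = cluster_totals(example_perspectives(50.0))
extreme = cluster_totals(example_perspectives(2e11))
print(modest, extreme)
```

In this toy rule, raising the speculative perspective's claimed strength from 50 to 200 billion leaves the totals unchanged. The only way to move them is to raise that perspective's robustness.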

Summary of properties of sequence thinking and cluster thinking

| | Sequence thinking | Cluster thinking |
| --- | --- | --- |
| Basic structure | Tries to combine all relevant beliefs into a prediction using one model (“If A, B, C, … N, then X”) | Weighs different mental models, each implying its own prediction (“A implies X; B implies ~X; C implies X; … therefore X”) |
| How much can a high-uncertainty parameter affect the conclusion? | One big enough consideration can outweigh all others, even if it’s an uncertain “best guess” | Any conclusion reached using uncertain methods has limited impact on the final decision |
| “Inside views” (laying out a causal chain) vs. “outside views” (expert opinion, “regression to normality,” historical track record of superficially similar decisions, etc.) | No obvious way of integrating inside and outside views; integration is often done via ad hoc adjustments and inside views often end up dominating the decision | High-uncertainty inside views are usually dominated by outside views no matter what conclusions they reach |

Why Cluster Thinking?
When trying to compare two very different options (such as vaccinations and space colonization), it seems at first glance as though sequence thinking is superior, precisely because it allows huge numbers to carry huge weight. The practice of limiting the weight of uncertain perspectives can have strange-seeming results such as (depending on robustness considerations) giving equal weight to “Charity A seems like the better organization” and “Charity B’s goal is 200 billion times as important.” In addition, I find cluster thinking far more difficult to formalize and describe, which can further lower its appeal in public debates about where to give.

Below, I give several arguments for expecting cluster thinking to produce better decisions. It is important to note that I emphasize “better decisions” and not “correct beliefs”: it is often the case that one reaches a decision using cluster thinking without determining one’s beliefs about anything (other than what decision ought to be made). In the example given in the previous section, cluster thinking has not reached a defined conclusion on how likely space colonization is, how valuable space colonization would be, etc. and there are many possible combinations of these beliefs that could be consistent with its conclusion that supporting Charity A is superior. Cluster thinking often ends up placing high weight on “outside view” pattern-matching, and often leads to conclusions of the form “I think we should do X, but I can’t say exactly why, and some of the most likely positive outcomes of this action may be outcomes I haven’t explicitly thought of.”

The arguments I give below are, to some degree, made using different vocabularies and different styles. There is some conceptual overlap between the different arguments, and some of the arguments may be partly equivalent to each other. I have previously tried to use sequence-thinking-style arguments to defend something similar to cluster thinking (though there were shortcomings in the way I did so); here I use cluster-thinking-style arguments.

Sequence thinking is prone to reaching badly wrong conclusions based on a single missing, or poorly estimated, parameter

Sequence-style reasoning often involves a long chain of propositions that all need to be reasonable for the conclusion to hold. As an example, Robin Hanson lays out 10 propositions that cumulatively imply a decision to sign up for cryonics, and believes each to have probability 50-80%. However, if even a single one ought to have been assigned a much lower probability (e.g., 10^-5) – or if he’s simply failed to think of a missing condition that has low probability – the calculation is completely off.
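The arithmetic behind this point is easy to check. The sketch below multiplies the step probabilities, which assumes the ten propositions can be treated as independent or as conditional on the earlier ones holding. The 0.5 and 0.8 values are simply the endpoints of the range quoted above, not figures taken from Hanson's own calculation.

```python
import math


def chain_probability(step_probabilities):
    """Probability that every step in the chain holds (product of the steps)."""
    return math.prod(step_probabilities)


# Ten steps at the top and bottom of the quoted 50-80% range.
optimistic = chain_probability([0.8] * 10)
pessimistic = chain_probability([0.5] * 10)

# The same optimistic chain, but one step should have been assigned 10^-5.
one_bad_step = chain_probability([0.8] * 9 + [1e-5])

print(f"all steps at 0.8:   {optimistic:.3g}")
print(f"all steps at 0.5:   {pessimistic:.3g}")
print(f"one step at 10^-5:  {one_bad_step:.3g}")
print(f"overestimate factor: {optimistic / one_bad_step:.0f}x")
```

A single misjudged step changes the result by far more than the whole spread between the optimistic and pessimistic versions of the chain.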

In general, missing parameters and overestimated probabilities will lead to overestimating the likelihood that actions play out as hoped, and thus overestimating the desirability of deviating from “tried and true” behavior and behavior backed by outside views. Correcting for missed parameters and overestimated probabilities will be more likely to cause “regression to normality” (and to the predictions of other “outside views”) than the reverse.

Cluster thinking is more similar to empirically effective prediction methods

Sequence thinking presumes a particular framework for thinking about the consequences of one’s actions. It may incorporate many considerations, but all are translated into a single language, a single mental model, and in some sense a single “formula.” I believe this is at odds with how successful prediction systems operate, whether in finance, software, or domains such as political forecasting; such systems generally combine the predictions of multiple models in ways that purposefully avoid letting any one model (especially a low-certainty one) carry too much weight when it contradicts the others. On this point, I find Nate Silver’s discussion of his own system and the relationship to the work of Philip Tetlock (and the related concept of foxes vs. hedgehogs) germane:

Even though foxes, myself included, aren’t really a conformist lot, we get worried anytime our forecasts differ radically from those being produced by our competitors.

Quite a lot of evidence suggests that aggregate or group forecasts are more accurate than individual ones … “Foxes often manage to do inside their heads what you’d do with a whole group of hedgehogs,” Tetlock told me. What he means is that foxes have developed an ability to emulate this consensus process. Instead of asking questions of a whole group of experts, they are constantly asking questions of themselves. Often this implies that they will aggregate different types of information together – as a group of people with different ideas about the world naturally would – instead of treating any one piece of evidence as though it is the Holy Grail. (The Signal and the Noise, pg 66)

In sequence thinking, a single large enough number can dominate the entire calculation. In consensus decision making, a person claiming radically larger significance for a particular piece of the picture would likely be dismissed rather than given special weight; in a quantitative prediction system, a component whose conclusion differed from others’ by a factor of 10^10 would be likely to be the result of a coding error, rather than a consideration that was actually 10^10 times as important as the others. This comes back to the points made by the above two sections: cluster thinking can be superior for its tendency to sandbox or down-weight, rather than linearly up-weight, the models with the most extreme and deviant conclusions.
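A small numerical sketch shows the difference between letting an outlier scale the answer and sandboxing it. The five model outputs below are hypothetical. The median is used only as one simple example of a robust aggregator, not as a description of how any particular forecasting system actually combines its models.

```python
import statistics

# Hypothetical outputs from five models estimating the same quantity.
# Four roughly agree; one differs by a factor of about 10^10.
estimates = [1.0, 2.0, 0.5, 1.5, 1e10]

# A plain average lets the outlier set the answer.
mean_estimate = statistics.mean(estimates)

# The median treats the outlier as one voice among five.
median_estimate = statistics.median(estimates)

print(f"mean:   {mean_estimate:.3g}")
print(f"median: {median_estimate:.3g}")
```

The mean lands near 2 billion, which none of the four agreeing models would endorse, while the median stays within their range.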

A cluster-thinking-style “regression to normality” seems to prevent some obviously problematic behavior relating to knowably impaired judgment

One thought experiment that I think illustrates some of the advantages of cluster thinking, and especially cluster thinking that incorporates regression to normality, is imagining that one is clearly and knowably impaired at the moment (for example, drunk), and contemplating a chain of reasoning that suggests high expected value for some unusual and extreme action (such as jumping from a height). A similar case is that of a young child contemplating such a chain of reasoning. In both cases, it seems that the person in question should recognize their own elevated fallibility and take special precautions to avoid deviating from “normal” behavior, in a way that cluster thinking seems much more easily able to accommodate (by setting an absolute limit to the weight carried by an uncertain argument, such that regression to normality can override it no matter what its content) than sequence thinking (in which any “adjustments” are guessed at using the same fallible thought process).

The higher one’s opinion of one’s own rationality relative to other people, the less appropriate the above analogy becomes. But it can be easy to overestimate one’s own rationality relative to other people (particularly when one’s evidence comes from analyzing people’s statements rather than e.g. their success at achieving their goals), and some component of “If I’m contemplating a strange and potentially highly consequential action, I should be wary and seek robustness (not just magnitude) in my justification” seems appropriate for nearly everyone.

Sequence thinking seems to tend toward excessive comfort with “ends justify the means” type thinking

Various historical cases of violent fanaticism seem somewhat fairly modeled as sequence thinking gone awry: letting one’s decisions become dominated by a single overriding concern, which then justifies actions that strongly violate many other principles. (For example, justifying extremely damaging activities based on Marxist reasoning.) Cluster thinking is far from a complete defense against such things: the robustness of a perspective (e.g., a Marxist perspective) can itself be overestimated, and furthermore a “regression to normality” can encourage conformism with highly problematic beliefs. However, the basic structure of cluster thinking does set up more hurdles for arguments about “the ends” (large-magnitude but speculative down-the-line outcomes) to justify “the means” (actions whose consequences are nearer and clearer).

I believe that invoking “the ends justify the means” (justifying near and clear harms by pointing to their further-out effects) is sometimes the right thing to do, and is sometimes not. Specifically, I think that the worse the “means,” the more robust (and not just large in claimed magnitude) one’s case for “the ends” ought to be. Cluster thinking seems to accommodate this view more naturally than sequence thinking.

(Related piece by Phil Goetz: Reason as memetic immune disorder)

When uncertainty is high, “unknown unknowns” can dominate the impacts of our actions, and cluster thinking may be better suited to optimizing “unknown unknown” impacts

Sequence thinking seems, by its nature, to rely on listing the possible outcomes of an action and evaluating the action according to its probability of achieving these outcomes. I find sequence thinking especially problematic when I specifically expect the unexpected, i.e., when I expect the outcome of an action to depend primarily on factors that haven’t occurred to me. And I believe that the sort of outside views that tend to get more weight in cluster thinking are often good predictors of “unknown unknowns.” For example, obeying common-sense morality (“ends don’t justify the means”) heuristics seems often to lead to unexpected good outcomes, and contradicting such morality seems often to lead to unexpected bad outcomes. As another example, expert opinion often seems a strong predictor of “which way the arguments I haven’t thought of yet will point.”

It’s hard to formalize “expecting unknown unknowns to be the main impact of one’s action” in a helpful way within sequence thinking, but it’s a fairly common situation. In particular, when it comes to donations and other altruistic actions, I expect the bulk of the impact to come from unknown unknown factors including flow-through effects.

Broad market efficiency

Another way of thinking about the case for cluster thinking is to consider the dynamics of broad market efficiency. As I stated in my post on that topic:

the more efficient a particular market is, the higher the level of intensity and intelligence around finding good opportunities, and therefore the more intelligent and dedicated one will need to be in order to consistently “beat the market.” The most efficient markets can be consistently beaten only by the most talented/dedicated players, while the least efficient ones can be beaten with fairly little in the way of talent and dedication.

When one is considering a topic or action that one knows little about, one should consider the broad market to be highly efficient; therefore, any deviations from the status quo that one’s reasoning calls for are unlikely to be good ideas, regardless of the magnitude of benefit that one’s reasoning ascribes to them. (An amateur stock trader should generally assume his or her opinions about stocks to be ill-founded and to have zero expected value, regardless of how strong the “inside view” argument seems.) By contrast, when one is considering a topic or action that one is relatively well-informed and intelligent about, contradicting “market pricing” is not as much of a concern.

This is a special case of “as robustness falls, the potential weight carried by an argument diminishes – no matter what magnitudes it claims – and regression to normality becomes the stronger consideration.”

Sequence thinking seems to over-encourage “exploiting” as opposed to “exploring” one’s best guesses

I expect this argument to be least compelling to most people, largely because it is difficult for me to draw convincing causality lines and give convincing examples, but to me it is a real argument in favor of cluster thinking. It seems to me that people who rely heavily on sequence thinking have a tendency to arrive at a “best guess” as to what cause/charity/etc. ought to be prioritized, and to focus on taking the actions that are implied by their best guess (“exploiting”) rather than on actions likely to lead to rethinking their best guess (“exploring”). I would guess that this is because:

  • To the extent that sequence thinking highlights opportunities for learning, it tends to focus on a small number of parameters that dominate the model, and these parameters are often the least tractable in terms of learning more (for example, the value of space colonization). It thus seems often to encourage continued debate on largely intractable topics. Cluster thinking highlights many consequential areas of uncertainty and promises returns to clearing up any of them, leading to more traction on learning and more reduction in “unknown unknowns” over time.
  • Sequence thinking has a tendency to make different options seem to differ more in value, while cluster thinking tends to make it appear as though any high-uncertainty decision is a “close one” that can be modified with more learning. I believe the latter tends to be a more helpful picture.
  • Cluster thinking tends to have heavier penalties for uncertainty, due to its feature of not allowing the magnitude of a model parameter to overwhelm adjustments for uncertainty. When people are promoting speculative arguments, having to contend with and persuade “cluster thinkers” seems to cause them to do more investigation, do more improving of their arguments, and generally do more to increase the robustness of their claims.

In the domains GiveWell focuses on, it seems that learning more over time is paramount. We feel that much of the effective altruist community tends to be quicker than we are to dismiss large areas as unworthy of exploration and to focus in on a few areas.

Formal framework reproducing key qualities of cluster thinking

Cluster thinking, despite its seeming inelegance, is in some ways a closer match to what I see as the “idealized” thought process than sequence thinking is. On a separate page, I have attempted to provide a formal framework describing this “idealized” thought process as I see it, and how this framework deals with extreme uncertainty of the kind we often encounter in making decisions about where to give.

According to this framework, formally combining different mental models of the world has a tendency to cap the decision-relevance of highly uncertain lines of reasoning – the same tendency that distinguishes cluster thinking from sequence thinking. For more, see my full writeup on this framework, which I have confined to another page because it is long and highly abstract.

Writeup on modeling extreme model uncertainty
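The capping tendency can be seen in a much simpler setting than the full framework. The sketch below is not a reproduction of that writeup. It uses textbook inverse-variance weighting of a normal prior and a normal estimate, plus one hypothetical assumption: that a speculative claim's standard deviation is as large as the claim itself. The function names and all numbers are illustrative.

```python
def combine_normal(prior_mean, prior_sd, estimate_mean, estimate_sd):
    """Posterior mean for a normal prior and a normal estimate.

    Each input is weighted by its precision (1 / variance).
    """
    prior_precision = 1.0 / prior_sd**2
    estimate_precision = 1.0 / estimate_sd**2
    weighted_sum = prior_mean * prior_precision + estimate_mean * estimate_precision
    return weighted_sum / (prior_precision + estimate_precision)


def posterior_for_claim(claim, prior_mean=0.0, prior_sd=1.0):
    """Hypothetical assumption: the claim's uncertainty scales with its size."""
    return combine_normal(prior_mean, prior_sd, claim, estimate_sd=claim)


claims = [1.0, 10.0, 1_000.0, 1e10]
posteriors = [posterior_for_claim(claim) for claim in claims]

for claim, posterior in zip(claims, posteriors):
    print(f"claimed value {claim:>14,.0f} -> posterior mean {posterior:.3g}")
```

Under these assumptions the posterior mean works out to claim / (claim² + 1), which never exceeds 0.5 and shrinks as the claim grows. A larger claimed magnitude ends up carrying less weight in the combined view, not more.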

Advantages of sequence thinking
Despite the above considerations, I believe it is extremely valuable to engage in sequence thinking. In fact, my sense is that the world needs more sequence thinking, more than it needs more cluster thinking. While I believe that cluster thinking is more prone to making the correct decision between different possible (pre-specified) actions, I believe that sequence thinking has other benefits to offer when used appropriately.

To be clear, in this section when I say “engaging in sequence thinking” I mean “working on generating and improving chains of reasoning along the lines of explicit expected-value calculations,” or more generally, “Trying to capture as many relevant considerations as possible in a single unified model of the world.” Cluster thinking includes giving some consideration and weight to the outcomes of such exercises, but does not include generating them. Many of the advantages I name have to do with the tendency of sequence thinking to underweight, or ignore, “outside views” and crude pattern-matches such as historical patterns and expert opinion, as well as “regression to normality”; sequence thinking can make adjustments for such things, but I generally find its method for doing so unsatisfactory, and feel that its greatest strengths come when it does not involve such adjustments.

Sequence thinking can generate robust conclusions that then inform cluster thinking

There are times when a long chain of reasoning can be constructed that has relatively little uncertainty involved (it may involve many probabilistic calculations, but these probabilities are well-understood and the overall model is robust).

The extreme case of this is in some science and engineering applications, when sequence thinking is all that is needed to reach the right conclusion (I might say cluster thinking “reduces to” sequence thinking in these cases, since the sequence-thinking perspective is so much more robust than all other available perspectives).

A less extreme case is when someone simply puts a great deal of work into doing as much reflection and investigation as they can of the parameters in their model, to the point where they can reasonably be assumed to have relatively little left to learn in the short to medium term. People who have reached such status have, in my opinion, good reason to assign much less uncertainty to their sequence-thinking-generated views and to place much more weight on their conclusions. (Still, even these people should often assign a substantial amount of uncertainty to their views.)

There are many times when I have underestimated the weight I ought to place on a sequence-thinking argument because I underestimated how much work had gone into investigating and reflecting on its parameters. I have been initially resistant to many ideas that I now regard as extremely important, such as the greater cost-effectiveness of developing-world as opposed to developed-world aid, the potential gains to labor mobility, and views of “long-term future” effective altruists on the most worrying global catastrophic risks, all of which appeared to me at first to be based on naïve chains of logic but which I now believe to have been more thoroughly researched – and to have less uncertainty around key parameters – than I had thought.

Sequence thinking is more favorable to generating creative, unconventional, and nonconformist ideas

I often feel that people in the effective altruist community do too little regression to normality, but I believe that most people in the world do far too much. Any thinking style that provides a “regression to normality”-independent way of reaching hypotheses has major advantages.

Sequence thinking provides a way of seeing where a chain of reasoning goes when historical observations, conventional wisdom, expert opinion and other “outside views” are suspended. As such, it can generate the kind of ideas that challenge long-held assumptions and move knowledge forward (the cases I list in the immediately previous section are some smaller-scale examples; many scientific breakthroughs seem to fit in this category as well). Sequence thinking is also generally an important component in the formation of expert opinion (more below), which is usually a major input into cluster thinking.

Sequence thinking is better-suited to transparency, discussion and reflection

I generally find it very hard to formalize and explain what “outside views” I am bringing to a decision, how I am weighing them against each other, and why I have the level of certainty I do in each view. Many of my outside views consist of heuristics (e.g., “actions fitting pattern X don’t turn out well”) that come partly from personal experiences and observations that are difficult to introspect on, and even more difficult to share in ways that would allow others to comprehend them and critique them in an informed way.

Sequence thinking tends to consist of breaking a decision down along lines that are well-suited to communication, often in terms of a chain of causality (e.g., “This action will lead to A, which will lead to B, which will lead to outcome-of-interest C if D and E are also true”). This approach can be clumsy at accommodating certain outside views that don’t necessarily apply to a particular sub-prediction (for example, many heuristics are of the form “actions fitting pattern X don’t turn out well for reasons that are hard to visualize in advance”). However, sequence thinking usually results in a chain of reasoning that can be explicitly laid out, reflected on, and discussed.

Consistent with this, I think the cost-effectiveness analysis we’ve done of top charities has probably added more value in terms of “causing us to reflect on our views, clarify our views and debate our views, thereby highlighting new key questions” than in terms of “marking some top charities as more cost-effective than others.” I have often been pushed, by people who heavily favor sequence thinking, to put more work into clarifying my own views, and I’ve rarely regretted doing so.

Sequence thinking can lead to deeper understanding

Partly because it is better-suited to explicit discussion and reflection, and partly because it tends to focus on chains of causality without deep integration of poorly-understood but empirically observed “outside view” patterns, sequence thinking often seems necessary in order to understand a particular issue very deeply. Understanding an issue deeply, to me, includes (a) being able to make good predictions in radically unfamiliar contexts (thus, not relying on “outside views” that are based on patterns from familiar contexts); (b) matching and surpassing the knowledge of other people, to the point where “broad market efficiency” can be more readily dismissed.

In my view, people who rely heavily on sequence thinking often seem to have inferior understanding of subjects they aren’t familiar with, and to ask naïve questions, but as their familiarity increases they eventually reach greater depth of understanding; by contrast, cluster-thinking-reliant people often have reasonable beliefs even when knowing little about a topic, but don’t improve nearly as much with more study. At GiveWell, we often use a great deal of sequence thinking when exploring a topic (less so when coming to a final recommendation), and often feel the need to apologize in advance to the people we interview for asking naïve-seeming questions.

In order to reap this benefit of sequence thinking, one must do a good job stress-testing and challenging one’s understanding, rather than being content with it as it is. This is where the “incentives to investigate” provided by cluster thinking can be crucial, and this is why (as discussed below) my ideal is to switch between the two modes.

Other considerations

Sequence thinking can be a good antidote to scope insensitivity, since it translates different factors into a single framework in which they can be weighed against each other. I do not believe scope insensitivity is the only, or most important, danger in making giving decisions, but I do find sequence thinking extremely valuable in correcting for it.

Many seem to believe that sequence thinking is less prone to various other cognitive biases, and in general that it represents an antidote to the risks of using “intuition” or “system 1.” I am unsure of how legitimate this view is. When making decisions with high levels of uncertainty involved, sequence thinking is (like cluster thinking) dominated by intuition. Many of the most important parameters in one’s model or expected-value calculation must be guessed at, and it often seems possible to reach whatever conclusion one wishes. Sequence thinking often encourages one to implicitly trust one’s intuitions about difficult-to-intuit parameters (e.g., “value of space colonization”) rather than trusting one’s more holistic intuitions about the choice being made – not necessarily an improvement, in my view.

Cluster thinking and argumentation
I’ve argued that cluster thinking is generally superior for reaching good conclusions, but harder to describe and model explicitly. While I believe transparency of thought is useful and important, it should not be confused with rationality of thought.

I’ve sometimes observed an intelligent cluster thinker, when asked why s/he believes something, give a single rather unconvincing “outside view” related reason. I’ve suspected, in some such cases, that the person is actually processing a large number of different “outside views” in a way that is difficult to introspect on and, being unable to cite the full set of perspectives with weights, returns a single perspective with relatively (but not absolutely) high weight. I believe this dynamic sometimes leads sequence thinkers to underestimate cluster thinkers.

One of my hopes for this piece is to help people better understand cluster thinking, and in particular, how one can continue to make progress in a discussion even after a seemingly argument-stopping comment like “I see no problem with your reasoning, but I’m not placing much weight on it anyway” is made.

In such a situation, it is important to ask not just whether there are explicit problems with one’s argument, but how much uncertainty there is in one’s argument (even if such uncertainty doesn’t clearly skew the calculation in one direction or another) and whether other arguments, using substantially different mental models, give the same conclusion. When engaging with cluster thinking, improving one’s justification of a probability or other parameter – even if it has already been agreed to by both parties as a “best guess” – has value; citing unrelated heuristics and patterns has value as well.

To give an example, many people are aware of the basic argument that donations can do more good when targeting the developing-world poor rather than the developed-world poor: the developing-world poor have substantially worse incomes and living conditions, and the interventions charities carry out are commonly claimed to be substantially cheaper on a per-person or per-life-saved basis. However, many (including myself) take these arguments more seriously on learning things like “people I respect mostly agree with this conclusion”; “developing-world charities’ activities are generally more robustly evidence-supported, in addition to cheaper”; “thorough, skeptical versions of ‘cost per life saved’ estimates are worse than the figures touted by charities, but still impressive”; “differences in wealth are so pronounced that ‘hunger’ is defined completely differently for the U.S. vs. developing countries”; “aid agencies were behind undisputed major achievements such as the eradication of smallpox”; etc. The function of such findings isn’t necessarily to address specific objections to the basic argument, but rather to put its claims on more solid footing – to improve the robustness of the argument.

The balance I try to strike
As implied above, I believe sequence thinking is valuable for idea generation, reflection and discussion, while cluster thinking is best for making the final choice between options. I try to use the two types of thinking accordingly. GiveWell often puts a great deal of work into understanding the causal chain of a charity’s activities, estimating the “cost per life saved,” etc., while ultimately being willing to accept some missing links and place limited weight on these things when it comes to final recommendations.

However, there are also times in which I let sequence thinking dominate my decisions (not just my investigations), for the following reasons.

One of the great strengths of sequence thinking is its ability to generate ideas that contradict conventional wisdom and easily observable patterns, yet have some compelling logic of their own. For brevity, I will call these “novel ideas” (though a key aspect of such ideas is that they are not just “different” but also “promising”). I believe that novel ideas are usually flawed, but often contain some important insight. Because the value of new ideas is high, promoting novel ideas – in a way that is likely to lead to stress-testing them, refining them, and ultimately bringing about more widespread recognition of their positive aspects – has significant positive expected value. At the same time, a given novel idea is unlikely to be valid in its current form, and quietly acting on it (when not connected to “promoting” it in the marketplace of ideas, leading to its refinement and/or widespread adoption) may have negative expected value.

One example of this “novel ideas” dynamic is the charities recommended by GiveWell in 2006 or 2007: GiveWell at that time had a philosophy and methodology with important advantages over other resources, but it was also in a relatively primitive form and needed a great deal of work. Supporting GiveWell’s recommendations of that time – in a way that could be attributed to GiveWell – led to increasing attention and influence for GiveWell, which was evolving quickly and becoming a more sophisticated and influential resource. However, if not for GiveWell’s ongoing evolution, supporting its recommended charities would not have had the sort of expected value that it naively appeared to (according to our over-optimistic “cost per life saved” figures of the time). (Note that this paragraph is intended to give an example of the “novel ideas” dynamic I described, but does not fit the themes of the post otherwise. Our recommendations weren’t purely a product of sequence thinking but rather of a combination of sequence and cluster thinking.)

For me, a basic rule of thumb is that it’s worth making some degree of bet on novel ideas, even when the ideas are likely flawed, when it’s the kind of bet that (a) facilitates the stress-testing, refinement, and growing influence of these ideas; (b) does not interfere with other, more promising bets on other novel ideas. So it makes sense to start, run, or support an organization based on a promising but (because dependent on sequence thinking, and in tension with various outside views) likely flawed idea … if (a) the organization is well-suited to learning, refining, and stress-testing its ideas and growing its influence over time; (b) starting or supporting the organization does not interfere with one’s support of other, more promising novel ideas. It makes sense to do so even when cluster thinking suggests that the novel idea’s conclusions are incorrect, to the extent that quite literal endorsement of the novel idea would be “wrong.”

When we started GiveWell, I believed that we were likely wrong about many of the things that seemed to us from an inside, sequence-thinking view to be true, but that it was worth acting on these things anyway, because of the above dynamic. (I am referring more to our theories about how we could influence donors and have impact than to our theories about which charities were best, which we tried to make as robust as we could, while realizing that they were still quite uncertain.) We believed we were onto some underappreciated truth, but that we didn’t yet know what it was, and were “provisionally accepting” our own novel ideas because we could afford to do so without jeopardizing our overall careers and because they seemed to be the novel ideas most worth making this sort of bet on. We expected our ideas to evolve, and rather than taking them as true we tried to stress-test them by examining as many different angles as we could (for example, visiting a recommended charity’s work in the field even though we couldn’t say in advance which aspect of our views this would affect). There were other novel ideas that we found interesting as well, but incorporating them too deeply into our work (or personal lives) would have interfered with our ability to participate in this dynamic.

The above line of argument justifies behavior that can seem otherwise strange and self-contradictory. For example, it can justify advocating and acting to some degree on a novel idea, while not living one’s life fully consistently with this idea (e.g., working to promote Peter Singer’s ideas about the case for giving more generously, while not actually giving as much as his ideas would literally imply one should). When considering possible actions including “avoiding factory-farmed meat,” “giving to the most apparently cost-effective charity,” etc., I am always asking not only “Does this idea seem valid to me?” but “Am I acting on this idea in a way that promotes it and facilitates its evolution, and does not interfere with my promotion of other more promising ideas?” As such, I tend to change my own behavior enough to reap a good portion of the benefits of supporting/promoting an idea but not as much as literal acceptance of the idea would imply. I have a baseline level of stability and conservatism in the way I live my life, which my bets on novel ideas are layered on top of in a way that fits well within my risk tolerance.

Promoting a sequence-thinking-based idea in a cluster-thinking-based world leads to examining the idea from many angles, looking for many unrelated (or minimally related) arguments in its favor, and generally working toward positive evolution of the idea. The ideal, from my perspective, is to use cluster thinking to evaluate the ultimate likely validity of ideas, while retaining one’s ability to (without undue risk) promote and get excited about sequence-thinking-generated ideas that may eventually change the world.

For one with few resources for idea promotion and exploration, this may mean picking a very small number of bets. For one who expects to influence substantial resources – as GiveWell currently does – it is rational to simultaneously support/promote work in multiple different causes, each of which could be promising under certain assumptions and parameters (regarding how much value we should estimate in the far future, how much suffering we should ascribe to animals, etc.), even if the assumptions and parameters that would support one cause contradict those that would support another. When choosing between causes to support, cluster thinking – rather than choosing one’s best-guess for each parameter and going from there – is called for.

Reflections on a Site Visit in Myanmar (Burma)

This is a cross-post from Good Ventures’ blog Give & Learn. It was co-authored by Cari Tuna, Good Ventures Co-Founder, and Natalie Crispin, GiveWell Research Analyst.

We recently traveled to Myanmar to visit a project Good Ventures is supporting to help prevent the spread of drug-resistant malaria. This post shares some observations from the trip and general thoughts about the value of site visits versus other ways of learning about the impact of one’s giving.

About the project

To recap, this project aims to rapidly replace one type of malaria treatment in Myanmar (AMT) with another (ACT) to reduce the risk of drug resistance, which could have a devastating effect on global malaria control efforts if left unchecked.

The project is being carried out by Population Services International (PSI) with support from the Gates Foundation, the UK Department for International Development (DFID) and Good Ventures. We contributed to the project as part of our effort to learn from other major funders through co-funding.

The project involves selling subsidized ACTs to the largest pharmaceutical distributor in Myanmar, sending “product promoters” to private drug providers to promote the appropriate use of ACTs, and piloting a project to promote the use of rapid diagnostic tests (RDTs) among such providers. It also involves encouraging the Myanmar government to follow through on its commitment to end the importation of AMTs.

The project got off to a slow start in 2012, but progress has accelerated over the course of 2013. The latest data show that the ratio of AMT to ACT in the market has decreased from about 20:1 in 2012 to about 1:2 in 2013. The subsidy has been largely passed on to patients: the price of a full course of ACT is less than or equal to the price of a typical (partial) dose of AMT in 94% of drug outlets. Due to quickly declining malaria rates in the country, PSI Myanmar estimates that the project has enough funding to continue into 2016, 18 months longer than originally planned. (A more detailed update is forthcoming.)
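To make the ratio change easier to picture, the short sketch below converts each AMT:ACT ratio into ACT’s share of the combined AMT and ACT market. It is only an illustration: the helper name `act_share` is hypothetical, the ratios are treated as exact even though the figures above are approximate, and other antimalarial drugs are ignored.

```python
from fractions import Fraction


def act_share(amt_parts: int, act_parts: int) -> Fraction:
    """Return ACT's share of combined AMT + ACT volume for a given AMT:ACT ratio."""
    return Fraction(act_parts, amt_parts + act_parts)


# Ratios quoted in the text (approximate in the source, treated as exact here).
share_2012 = act_share(20, 1)  # AMT:ACT of about 20:1
share_2013 = act_share(1, 2)   # AMT:ACT of about 1:2

print(f"2012 ACT share: {float(share_2012):.1%}")  # 1/21, roughly 4.8%
print(f"2013 ACT share: {float(share_2013):.1%}")  # 2/3, roughly 66.7%
```

Under these assumptions, ACT goes from roughly one in twenty-one packs to roughly two in three.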

Observations on the site visit

We spent five days with PSI Myanmar for the annual donor review of the project, which was also attended by representatives of the Gates Foundation and DFID. We spent the first day in PSI Myanmar’s office in Yangon reviewing the project’s progress and potential risks to its continued success and sustainability. We spent the next three days traveling through Mon State and interviewing people working at various levels of the supply chain, including itinerant drug vendors, village pharmacists, drug wholesalers, and representatives of the large pharmaceutical distributor, AA Medical Products. (We’ll refer to these three days as the “field visit.”) We spent the last day in Yangon reflecting on the visit with PSI and DFID. We’re deeply grateful to PSI Myanmar for hosting us and organizing the trip.

We’ll post detailed notes from our travels soon. In the meantime, we wanted to share some miscellaneous observations and reflections:

  • A major goal of ours in co-funding this project — and attending the annual review — was to learn about how funders such as the Gates Foundation and DFID operate.
    • These funders took a similar approach to the site visit as we would have taken on our own. They asked everyone we interviewed copious questions, both directly and indirectly related to the project. They tended to avoid leading questions and tried to ask the same questions in multiple ways, in order to increase their odds of getting an unbiased view of the situation.
    • We were impressed by Louise Mellor from DFID and her efforts to establish a positive rapport with the people we interviewed. She encouraged everyone in our entourage to introduce themselves by name at each stop. After we had asked our questions, she often offered the interviewees words of encouragement or thanks for the role they were playing to help prevent drug resistance. (Previously, Louise told me that establishing a positive rapport is important for helping people feel comfortable speaking freely.) We plan to be more intentional about establishing a positive rapport with interviewees on future site visits.
    • We were also impressed by the Gates Foundation’s Tom Kanyok, who volunteered to take a rapid diagnostic test (RDT) at a general store we visited in Mon State. The test involved the general store owner pricking his finger for a blood sample. It was helpful to see an RDT performed live.
    • DFID’s process for evaluating the project’s progress involved both a formal review of pre-determined, quantified indicators and unstructured reflection on what could be improved. At the end of our visit, DFID staff provided PSI with both overall feedback on how they believed the project was going (“a very positive review”) and some recommendations for PSI to consider over the next few months, such as studying the feasibility of reducing the ACT subsidy, exploring the use of local languages on drug packages, and providing more information to donors on issues raised during the review. DFID will also produce a formal report on the review (update: recently published here). In addition to DFID staff who are based in Myanmar, a DFID economic advisor, who is based in London and not normally involved in the project, attended the annual review, in order to provide an outside perspective.
    • DFID staff noted that while field observations cannot be treated as representative evidence, site visits help to provide a reality check on one’s assumptions, surface potential problems, and allow for relationship building and evaluation of project management. They said DFID is trying to incorporate more site visits into its project monitoring.
    • We were struck by how often Tom Kanyok from the Gates Foundation raised the topic of malaria elimination and eradication. (Representatives of the Gates Foundation previously told us that malaria eradication is a focus of the foundation. PSI told us that governments in the region, with support from the World Health Organization, have adopted the strategy of eliminating P. falciparum malaria, and believe it’s the only long-term solution for antimalarial drug resistance.)
  • ACTs were available in all but one of the 10 drug outlets we visited. The shopkeeper at the outlet that was out of stock told us that she had sold the last pack the previous day. At each outlet, we asked the provider whether he or she sold other antimalarial drugs, and we looked for such drugs on display. No provider said they carried oral AMTs. At one wholesale outlet, the shop owner told us that he did not sell oral AMTs, but we subsequently found them on display. The shop owner then became worried that he was in trouble. (It’s not against the law to sell oral AMTs, only to import them, but there seemed to be rumors that selling them was banned as well. What’s more, we were traveling with a representative of the Myanmar Ministry of Health, which could have increased the shop owner’s concern.)
  • From what we saw, the pilot program to encourage use of rapid diagnostic tests (RDTs) appeared to be going well. All of the pilot participants with whom we spoke reacted positively to questions about the program, and the shopkeeper who performed a demonstration test for us did so competently. She had completed only a few tests before our visit.
  • One issue that emerged as a concern to donors during the field visit — but wasn’t emphasized as a major risk to the project’s success during PSI’s earlier presentation — was the upcoming expiration of ACTs in the market. At the start of the project, PSI had to place its first order of ACTs with limited information about the size of the market. As a result, and because of declining malaria rates in Myanmar, PSI ordered more ACTs than have been needed. It had stock that would expire, unused, in March and April 2014. At the time of the site visit, PSI was waiting to begin selling a new batch of drugs, and replacing expired drugs in the market with new drugs, to minimize wastage. The issue surfaced during the field visit when a DFID participant noticed that all the ACTs we encountered in the market were set to expire in March or April. Donors raised the concern that PSI may be waiting too long to replace the drugs and that expired drugs may trickle down to harder-to-reach areas as a result. This prompted a conversation that led PSI to begin the replacement process slightly earlier than planned, after obtaining approval from the donors. PSI notes that it's not alone in having expired drug stock — all agencies in Myanmar, including the Ministry of Health, are currently seeing rapidly decreasing disease transmission resulting from aggressive control efforts, which has led them to need fewer drugs than originally anticipated. PSI also notes that overstock is greatly preferable to understock, which could lead to market demand for sub-standard products.
  • Because we were traveling with a large entourage (around 10 people) and making mostly pre-scheduled stops, it was difficult to know whether we were seeing an unbiased picture of circumstances on the ground. We do note that the purpose of the field visit was not to assess the project’s overall success, which is more appropriately done by looking at representative monitoring data rather than a small sample of anecdotal observations.
    • For instance, we were struck by the abundance of promotional materials for the PSI-subsidized ACTs on display at the outlets we visited, including at outlets where providers reported relatively low malaria caseloads. Many of the materials looked relatively new. A PSI representative assured us that extra materials were not hung up in anticipation of our visit. This may well have been the case; PSI staff said they had recently undertaken a big marketing push.
  • We were struck by the complexity of the operating environment in which the project is taking place. This served as a general reminder of the large number of ways in which a project can fall short of its objectives, including ways that would be difficult for a donor who is not highly informed to predict. In this case, complexities of the operating environment include violent conflict in some parts of the country where drug resistance is of particular concern, conflicting policies in different government departments, dissatisfaction with the goals of the project among some government physicians, a second common malaria parasite that is treated with a different drug regimen, and working conditions and vector behavior that make certain interventions, such as bednets, somewhat less effective for at-risk populations. 

General thoughts on the value of site visits

As we research potential focus areas, we’re often advised to “get out into the field” in order to better understand the work we’re funding or considering funding. We agree with the notion that site visits can be valuable for learning. That said, we’ve found that such visits are helpful for some — but not all — types of learning. They are not a substitute for desk research, though they can complement such research in important ways. Site visits also take a great deal of time to conduct, and hosting them requires a significant investment on the part of the nonprofit. These are trade-offs we take into account when deciding how to prioritize our time to maximize our learning.

What are site visits good for?

  • Field observations can provide a valuable reality check on our assumptions.
  • They can be helpful for raising questions and surfacing potential problems that may not have appeared important based on reviewing formal reports and monitoring data. In this case, for example, the issue of expiring drugs became a greater concern for donors because we encountered them in the market.
  • Field visits often involve spending an extended amount of time with project managers and staff — and in this case, other donors — including unstructured time on the road and eating meals, in addition to scheduled stops. This time is not only helpful for relationship building; it also allows us to ask many more questions than we could over the phone or by e-mail. We’ve consistently found this to be one of the biggest benefits to participating in such visits.
  • Field visits help us to develop a fuller picture of the context in which a nonprofit is working, including complexities that may be hard to appreciate from afar.
  • We’ve found that communicating about a project is easier after spending a lot of time immersed in the details of the work.
  • Lastly, visiting with beneficiaries of a project, as we did in our 2012 visit to GiveDirectly in Western Kenya, can lead to a greater emotional connection to the work.

What aren’t site visits good for?

In most cases, site visits do not seem to be an appropriate tool for learning how a project is going in general, because they only allow for a small number of anecdotal observations, which are not necessarily representative of the situation overall. Furthermore, despite our best efforts not to ask leading questions or prime interviewees to respond in certain ways, we’ve found it hard to know whether we’re getting a fully accurate view of circumstances on the ground in the places we’ve visited. This is to be expected and doesn’t diminish the other benefits of conducting site visits. But it does point to the importance of representative monitoring data and rigorous, independent evaluation in learning about whether the work we’re funding is succeeding in meeting its goals.