Geomagnetic storms: History’s surprising, if tentative, reassurance

This is the second post in a series about geomagnetic storms as a global catastrophic risk. A paper covering the material in this series was just released.

My last post raised the specter of a geomagnetic storm so strong it would black out electric power across continent-scale regions for months or years, triggering an economic and humanitarian disaster.

How likely is that? One relevant source of knowledge is the historical record of geomagnetic disturbances, which is what this post considers. In approaching the geomagnetic storm issue, I had read some alarming statements to the effect that global society is overdue for the geomagnetic “Big One.” So I was surprised to find reassurance in the past. In my view, the most worrying extrapolations from the historical record do not properly represent it.

I hasten to emphasize that this historical analysis is only part of the overall geomagnetic storm risk assessment. Many uncertainties should leave us uneasy, from our incomplete understanding of the sun to the historically novel reliance of today’s grid operators on satellites that are themselves vulnerable to space weather. And since the scientific record stretches back only 30–150 years (depending on the indicator) and big storms happen about once a decade, the sample is too small to support sure extrapolations of extremes.

Nevertheless the historical record and claims based on it are the focus in this and the next post. I’ll examine four (kinds of) extrapolations that have been made from the record: from the last “Big One,” the Carrington event of 1859; from the July 2012 coronal mass ejection (CME) that might have caused a storm as large if it had hit Earth; a more complex extrapolation in Kappenman (2010); and the formal statistical extrapolation of Riley (2012). I’ll save the last for the next post.

The Carrington event

A series of CMEs starting in late August 1859 caused an extraordinary, global geomagnetic storm. It came to be named for astronomer Richard Carrington, who linked the events on earth to an intense solar flare he had observed on the sun hours before. The Carrington event produced spectacular auroras as far south, in the northern hemisphere, as San Salvador and as far north, in the southern hemisphere, as Santiago. Vivid reports were made from around the world. Here is one from the Washington Territory of the US:

At 8 P.M. Aug. 28, 1859, a diffused light, without definite form, was observed a little east of north, covering about one-fourth of the heavens, which gradually increased to the west, sending across from east to west an arch of a whitish color, the arch itself being much brighter than the circumjacent light…At 9h20m P.M. strongly marked rays became visible, which rising from the horizon converged to a point on the arch a little south of the zenith, and in this position remained visible about one hour. The rays in the northwest were of a pink color, those in the southeast were purple, alternately brightening and fading to a whitish color. At midnight, all disappeared except the arch, and at intervals undulating flashes of light appeared not visible longer than three seconds. Occasionally streamers shot up from the horizon, the lower part disappearing before the upper part had reached the zenith. Sometimes these streamers were broad at the horizon, and came to a point near the zenith, and sometimes the reverse.

The storm hardly harmed human societies: it did little more than temporarily disrupt telegraph communications. Yet because of its apparent strength, the Carrington event of 1859 occupies a special place in the minds of those concerned about geomagnetic storms. It is their 1906 San Francisco earthquake: an event whose repetition is statistically inevitable. And next time global society may be much more vulnerable.

A critical question in assessing that vulnerability is: how much bigger was Carrington than the geomagnetic storms that have hit since the development of modern grids, which civilization has shrugged off? The paucity of high-quality scientific measurements from 1859 impedes comparisons (some magnetometers were operating, but went off scale). But scientists have made the most of the available data. The table below compares the Carrington event to more recent storms on several measures.

| Storm strength indicator | Carrington event, 1859 | Modern comparators | Sources |
| --- | --- | --- | --- |
| Lowest magnetic latitude where aurora visible | 23° | 29°, Mar. 1989 | Cliver and Svalgaard (2004), p. 417; Silverman (2006), p. 141 |
| Associated solar flare intensity (soft X-ray emissions) | 0.0045 W/m² | 0.0035 W/m², Nov. 2003 | Cliver and Dietrich (2013), pp. 2–3 |
| Transit time of CME to earth | 17.6 h | 14.6 h, Aug. 1972; 20.3 h, Oct. 2003 | Cliver and Svalgaard (2004), Table III |
| Dst (low-latitude magnetic field depression) | –850 nT | –589 nT, Mar. 1989 | Siscoe, Crooker, and Clauer (2006); Kyoto University |

W/m² = watts per square meter; h = hours; nT = nanotesla

Perhaps the most important comparison is in the last row. The storm-time disturbance (Dst) index measures the depression of Earth’s magnetic field as recorded at four low-latitude observatories; it roughly proxies the overall geomagnetic impact of a storm. The Dst has only been compiled since 1957; scientists estimate that the Carrington event would have registered at roughly –850 nanotesla (nT). As shown in the table, the biggest value on record is –589 nT, reached during the great storm of March 13–14, 1989, during which the Québec grid collapsed for some hours and permanently lost two transformers. (Last Tuesday a pretty-big CME drove the Dst to –195 nT, the largest reading in 10 years.) This and the other comparisons in the table suggest that the Carrington storm was, conservatively, no more than twice as strong as modern events.
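For readers who want to see the mechanics, here is a minimal toy sketch of a Dst-like calculation in Python. The real index involves careful quiet-time baselines and other corrections; the station set below follows the four observatories conventionally used (Hermanus, Kakioka, Honolulu, San Juan), but the field depressions and latitudes are invented for illustration.

```python
import math

# Invented storm-time depressions of the horizontal field (nT) at the four
# low-latitude Dst observatories, after removing a quiet-day baseline.
depressions_nT = {"Hermanus": -420.0, "Kakioka": -510.0, "Honolulu": -460.0, "San Juan": -490.0}

# Approximate geomagnetic latitudes (degrees) of the same stations.
latitudes_deg = {"Hermanus": -34.0, "Kakioka": 27.0, "Honolulu": 21.0, "San Juan": 28.0}

# A Dst-like value: normalize each station's depression to the equator by
# dividing by the cosine of its geomagnetic latitude, then average.
dst_like = sum(
    depressions_nT[s] / math.cos(math.radians(latitudes_deg[s])) for s in depressions_nT
) / len(depressions_nT)

print(f"Toy Dst-like value: {dst_like:.0f} nT")  # more negative = stronger storm
```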

The July 2012 near-miss

Another important comparator is the major CME of July 23, 2012. That CME missed Earth because it left from what was then the far side of the sun. However, the NASA probe STEREO-A was travelling along earth’s orbit about 4 months ahead of the planet, and lay in the CME’s path, while STEREO-B, trailing four months behind earth, was also positioned to observe. The twin probes produced the best measurements ever of a Carrington-class solar event (Baker et al. 2013). Since the sun rotates about its axis in less than a month, had the CME come a couple of weeks sooner or later, it would have smashed into our planet.

Two numbers convey the power of the near-miss CME. First is its transit time to earth orbit: at just under 18 hours, almost exactly the same as in the Carrington event.[1] A slower CME on July 19 appears to have cleared the interplanetary medium of solar plasma, resulting in minimal slowdown of the big CME on July 23. The second number is the strength of the southward component of the CME’s magnetic field. When a CME hits Earth, it strews the most magnetic chaos if its field points south, opposite to the northward-pointing field that Earth presents to the oncoming solar wind; then it is like slamming together two magnetized toy trains the way they don’t want to go. The southward component of the magnetic field of the great July 2012 CME peaked at 50 nT. Here, however, “south” means perpendicular to Earth’s orbital plane. Since Earth’s spin axis is tilted 23.5° and its magnetic poles deviate from the spin poles by about another 10°, the southerly magnetic force of the near-miss CME, had it hit Earth, could have been more or less than 50 nT, depending on the exact time of day and year. Baker et al. (2013, p. 590) estimate the worst case as 70 nT south, relative to earth’s magnetic orientation.

For comparison, the graph below shows the north-south component of the interplanetary magnetic field near earth since 1963, where north and south are also defined by the orientation of earth’s magnetic poles. Unfortunately, data are missing for the largest storm in the time range, the one of March 1989.[2] The graph does reveal a large northerly spike in 1972, which explains why that year’s great CME caused minimal disruption despite the record speed noted in the table above. Also shown are large southerly magnetic forces in storms of 1982 and 2003, the latter reaching 50 nT.

North-south component of interplanetary magnetic field near earth (nT)

Given the near-miss CME’s speed, its magnetic field, and its density, how big a storm could it have caused had it hit earth? Baker et al. (2013) estimate that it would have rated between –480 and –1182 nT on the Dst index, depending on the CME’s magnetic orientation relative to Earth’s at collision. Comparing the more extreme value to the modern record of –589 nT again points to a benchmark worst-case storm roughly twice as strong as anything experienced since the construction of modern grids.

In a companion paper, the same team of scientists ran computer simulations to develop a more sophisticated understanding of what would have happened if earth had been in STEREO-A’s place on July 23. Their results do not point clearly to a counterfactual catastrophe. “Had the 23 July CME hit Earth, there is a possibility that it could have produced comparable or slightly larger geomagnetically induced electric fields to those produced by previously observed Earth directed events such as the March 1989 storm or the Halloween 2003 storms.”

Kappenman’s factor of 10

In contrast, the prominent analyst John Kappenman has favored a factor of 10 to characterize the likely 100-year storm relative to the strongest recent storms.[3] Since this contrast with my interpretation of the evidence calls for explanation, I investigated the basis for the factor of 10. It appears to arise as the ratio of two numbers.

One represents the worst disruption that geomagnetic storms have wrought in the modern age, meaning, again, the Québec blackout in March 1989. As Kappenman points out, just as the Richter scale doesn’t tell you everything about the destructive force of an earthquake in any given spot—local geology, distance from the epicenter, and building construction quality matter too—the Dst index doesn’t tell you everything about the capacity of a storm to induce electrical surges in any given place. What matters is not the total perturbation in the magnetic field, globally averaged, but the rate of magnetic field change from minute to minute along power lines of concern.[4] By the laws of electromagnetism, a changing magnetic field induces a voltage; and the faster the change, the bigger the voltage. In Québec in 1989, the rate of magnetic field change peaked at 479 nT/min (nanotesla per minute) according to Kappenman.

While I did not find a clear citation of the source for this statistic, it looks highly plausible. The graph below, based on my own extracts of magnetic observatory data, shows the maximum horizontal field changes at 58 stations on that day in 1989, based on measurements taken every minute, on the minute. Each 3-letter code represents an observatory; e.g., FRD is Fredericksburg, VA, and BFE is Brorfelde, Denmark.[5]

Maximum one-minute change in horizontal magnetic field, March 13, 1989 (nT)

Ottawa (OTT, in red) recorded a peak change of 556 nT/min, between 9:50 and 9:51 p.m. universal time, which is compatible with Kappenman’s 479 nT/min for nearby Québec. Brorfelde recorded the highest value, 1994 nT/min.
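For the curious, here is a minimal sketch of the calculation behind the chart, under the assumption that each observatory supplies minute-cadence northward (X) and eastward (Y) field components in nanotesla; I take the per-minute change of the horizontal vector, though simply differencing the horizontal magnitude gives similar answers. The file name and column labels are hypothetical, not the actual SPIDR or IMAGE formats.

```python
import csv
import math

def max_one_minute_change(path):
    """Largest one-minute change in the horizontal field (nT/min) from a CSV of
    minute-cadence magnetometer readings with columns X and Y (both in nT)."""
    xs, ys = [], []
    with open(path) as f:
        for row in csv.DictReader(f):
            xs.append(float(row["X"]))
            ys.append(float(row["Y"]))
    # Magnitude of the minute-to-minute change of the horizontal field vector.
    return max(
        math.hypot(xs[i] - xs[i - 1], ys[i] - ys[i - 1]) for i in range(1, len(xs))
    )

# Hypothetical usage for one observatory's data for March 13, 1989:
# print(max_one_minute_change("OTT_1989-03-13.csv"))
```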

The other number in Kappenman’s factor-of-10 ratio represents the highest estimate we have of any per-minute magnetic field change before World War II, at least at a latitude low enough to concern Europe or North America. It comes from Karlstad, in southern Sweden, during the storm of May 13–15, 1921. The rate of change of the magnetic field was not measured there, but the electric field induced in a telegraph line was reportedly estimated at 20 volts/kilometer (V/km). Calibrating to modern observations, Kappenman calculates, “the 20 V/km observation…suggests the possibility that the disturbance intensity approached a level of 5000 nT/min.” Elsewhere Kappenman suggests 4800 nT/min. And 4800/479 is just about 10. That is, the worst case on record looks to be 10 times as bad as what caused the Québec blackout.

I have two doubts about this ratio. First, the top number appears to have been unintentionally increased by a scholarly game of telephone. As a source for the 20 V/km observation, Kappenman cites—and correctly represents—a 1992 conference paper by Elovaara et al., who write, “The earth surface potentials produced are typically characterized by the value 1 V/km, but in extreme cases much higher values has been recorded like 20 V/km in a wire communication system in Sweden in May 1922 [sic].” No source is given there; but Jarmo Elovaara pointed me to Sanders (1961) as likely. Indeed, there we read, “In May, 1921, during an outstanding magnetic storm, the largest earth-current voltages measured on wirelines in Sweden ranged from 6.3 to 20 v/km.” The cited source for that range is the “Earth Currents” article of the 1943 Encyclopedia Britannica, which states: “In May 1921, during an outstanding magnetic storm, Stenquist calculated from the fusing of some copper wires and the non-fusing of others that the largest earth current voltage in Sweden lay between 6.3 and 20 volts per kilometre.” “Stenquist” is David Stenquist, a Swedish telegraph engineer who in 1925 published Étude des Courants Telluriques (Study of Earth Currents). The pertinent passage thereof comes on page 54:

Stenquist 1925, Étude des Courants Telluriques, page 54

Translation:

Nevertheless I tried to calculate the largest value of the telluric [earth] currents. Until now, the standard opinion was that the largest potential differences in the earth due to telluric currents are two volts per kilometer. During the nights of May 13–14 and 14–15, this value was greatly exceeded. In many cases the currents in the copper lines (3 mm [millimeters]) were so strong that the conduits melted, i.e. the current exceeded 2.5 amps. Because the copper wire just mentioned had a resistance of 2.5 ohms per kilometer, we get a potential difference of 6.3 volts per kilometer. In contrast, the [fusion tubes?] placed on the iron lines (4 mm) did not melt. These iron lines have a resistance of 8 ohms per kilometer. So we know that 20 volts per kilometer was not exceeded. With reasonably high confidence, one can say that a difference of 10 volts per kilometer occurred.

Stenquist believed the electric field reached 10 V/km but explicitly rejected 20. Yet through the chain of citations, “20 volts n’ont pas été dépassés” became “higher values has been recorded like 20 V/km.” Using Kappenman’s rule of thumb, Stenquist’s 10 V/km electric field suggests peaks of roughly 2500 rather than 5000 nT/min of magnetic change on that dark and geomagnetically stormy night in Karlstad.
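Stenquist’s bounds follow from Ohm’s law applied to the fusing current; here is a quick check using the figures quoted in the translated passage above (the 2.5-amp fusing threshold and the per-kilometer resistances):

```python
fusing_current_amps = 2.5            # current at which the protective elements melt

copper_resistance_ohm_per_km = 2.5   # 3 mm copper line
iron_resistance_ohm_per_km = 8.0     # 4 mm iron line

# Copper-line elements melted, so the earth-potential gradient exceeded this lower bound:
lower_bound_v_per_km = fusing_current_amps * copper_resistance_ohm_per_km  # 6.25, i.e. "6.3 V/km"

# Iron-line elements did not melt, so the gradient stayed below this upper bound:
upper_bound_v_per_km = fusing_current_amps * iron_resistance_ohm_per_km    # 20 V/km

print(lower_bound_v_per_km, upper_bound_v_per_km)  # 6.25 20.0
```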

The second concern I have about the estimated ratio of 10 between magnetic fluctuations in the distant and recent past is that it appears to compare apples to oranges—an isolated, global peak value in one storm to a wide-area value in another. As we have already seen, the highest value observed in 1989 was not the 479 nT/min Kappenman uses to represent that storm but 1994 nT/min, in Brorfelde. And during the storm of July 13–14, 1982, the Lovo observatory, at the same latitude as Karlstad, experienced 2688 nT/min according to my calculations from the public data. Both instances were isolated: most observatories at comparable latitudes reported much lower peaks. It is therefore not clear that the 1921 storm, with its isolated, inferred 2500 nT/min, exceeded those of the 1980s at all, let alone by a factor of 10. Maximum magnetic changes and voltages may have been comparable.

While there is apparently no evidence that fluctuations as great as 4800 nT/min have happened over large areas, Kappenman’s simulated 100-year storm scenario assumes that extremes of this order would occur across the US in a 5-degree band centered on 50° N geomagnetic latitude (an area 350 miles wide north–south and 3,000 miles long east–west)—4800 nT/min east of the Mississippi and 2400 nT/min to the west. The associated estimate that a 100-year storm would put 365 high-voltage transformers at risk of permanent damage, out of some 2,146 in service, affect regions home to 130 million people, and reduce economic output by trillions of dollars, entered an oft-cited National Research Council conference report. Yet to me, this seems like a highly unrepresentative extrapolation from history.

I found one other independent review of the Kappenman analysis. It was commissioned in 2011 by the US Department of Homeland Security from JASON, a group of scientists that advises the government on issues at the intersection of science and security. The JASON report concludes: “Because mitigation has not been widely applied to the U.S. electric grid, severe damage is a possibility, but a rigorous risk assessment has not been done. We are not convinced that the worst-case scenario of [Kappenman] is plausible. Nor is the analysis it is based on, using proprietary algorithms, suitable for deciding national policy….[W]e are unlikely to experience geomagnetic storms an order-of-magnitude more intense than those observed to date.”

Summary

Despite some spectacular reports from the 1859 Carrington event and some equally spectacular scenario forecasts, to this point in our inquiry the historical record appears surprisingly reassuring. There is little suggestion that the Carrington event—or any other in the last two centuries—was more than twice as strong as the biggest storms of recent decades, in 1982, 1989, and 2003. And civilization shrugged those off, with only a few high-voltage transformers taken out of commission. This does not prove geomagnetic storms pose no global catastrophic risk; like the JASON group, I don’t feel we can rule that out. But it does lead us to more focused questions: What are the risks posed by a doubling of storm strength relative to recent experience? Under what assumptions would the effects be extremely disproportionate to the increase in magnetic disruption? What steps could be taken to change that?

In fact, I don’t feel that I have satisfactory answers to those questions. The area appears under-researched, and that may point to an opportunity for philanthropy.

But I get ahead of myself. In the next post, I will take one final pass at the historical record, this time more systematic and statistical. I think it’s important, but it will not change the conclusion much.

Footnotes

[1] The eruption occurred at about 2:05 universal time on July 23, 2012. STEREO-A began to sense it around 21:00.

[2] Downloaded from cdaweb.sci.gsfc.nasa.gov/cdaweb/sp_phys, data set OMNI2_H0_MRG1HR, variable “1AU IP Bz (nT), GSM” (meaning 1 astronomical unit from sun, interplanetary magnetic field Z component, nanotesla, geocentric solar magnetospheric coordinates). Readings are hourly, with gaps.

[3] “Because the [1-in-100 year scenario] 4800nT/min threat environment is ~10 times larger than the peak March 1989 storm environment, this comparison also indicates that resulting GIC peaks will also in general be nearly 10 times larger as well” (Kappenman 2010, p. 4-12). “This disturbance level is nearly 10 times larger than the levels that precipitated the North American power system impacts of 13 March 1989” (Kappenman 2004). “Measured data has shown that storms with impulsive disturbance levels that are 4 to 10 times larger than those that impacted the North American grid in March 1989 have occurred before” (Kappenman 2012, p. 2).

[4] Pulkkinen et al. (2008) find that standard modeling approaches for translating geomagnetic disturbances into induced voltages are reasonably accurate using 1-minute time resolution data, with the modeled peak more than 80% of the true peak. Going to a cadence of 10 seconds eliminates the remaining gap.

[5] Plotted are all stations with data for the period in NOAA’s SPIDR system or the Nordic IMAGE network. Geomagnetic latitudes are from NASA’s geomagnetic coordinate calculator. A list and maps of the observatories are available here.

Geomagnetic storms: An introduction to the risk

The Open Philanthropy Project has included geomagnetic storms in its list of global catastrophic risks of potential focus.

To be honest, I hadn’t heard of them either. But when I was consulting for GiveWell last fall, program officer Howie Lempel asked me to investigate the risks they pose. (Now I’m an employee of GiveWell.)

It turns out that geomagnetic storms are caused by cataclysms on the sun, which fling magnetically charged matter toward earth. The collisions can rattle earth’s magnetic field, sending power surges through electrical grids. The high-speed particles can also take out satellites critical for communication and navigation. The main fear is that an extreme storm would so damage electrical grids as to black out power on a continental scale for months, even years. The toll of such a disaster would be tallied in economic terms, presumably in the trillions of dollars. It would also be measured in lives lost, since all the essential infrastructure of civilization, from food transport to law enforcement, now depends on being able to plug things in and turn them on (NRC 2008, pp. 11–12).

Having examined the issue, especially its statistical aspects, I am not convinced that this scenario is as likely as some prominent voices have suggested. For example, as I will explain in a later post, Riley’s (2012) oft-cited estimate that an extreme storm—stronger than any since the advent of the modern grid—has a 12%-per-decade probability looks like an unrepresentative extrapolation from the historical record. I put the odds lower. My full report has just been posted, along with data, code, and spreadsheets.

Nevertheless, my reassurance is layered in uncertainty. The historical scientific record is short: we get a big storm about once a decade, and good data have only been collected for 30–150 years, depending on the indicator. Scientific understanding of solar dynamics is limited. Likewise for the response of grids to storms. My understanding of the state of knowledge is itself limited. On balance, significant “tail risk”—of events extreme enough to cause great suffering—should not be ruled out.

This is why I think the geomagnetic storm risk, even if overestimated by some, deserves more attention from governments than it is receiving. To date, the attention has been minimal relative to the stakes.

The rest of this post delineates how geomagnetic storms come about and why they may particularly threaten one critical component of modern electrical grids, the high-voltage transformer. Later posts will delve into what the available evidence says about the chance of a geomagnetic “perfect storm.”

How turbulence on the sun produces power surges on earth

A distinctive feature of the geomagnetic storm issue is the sequential, probabilistic nature of the phenomenon of concern. The storms originate in cataclysmic explosions on the face of the sun with the power of a billion hydrogen bombs.[1] Each event may throw off some amount of magnetically charged plasma—from tens of megatons to tens of gigatons (Gopalswamy 2006, p. 244). In the abstract, this coronal mass ejection (CME) has some probability of hitting the earth, which depends on the CME’s angular breadth. If the CME hits, it will do so at some speed, perhaps as high as 1% of the speed of light, 3,000 kilometers per second.

The CME’s magnetic field may by chance point substantially opposite to the earth’s (that is, southward), producing a magnetic collision (Gopalswamy 2006, p. 248) rather like slamming together two magnets the way they don’t want to go. Sometimes several CMEs will fly out over a few days, the first clearing a path through the interplanetary matter and speeding the transit of its successors. Each magnetic blast will, over hours or days, bend the earth’s magnetic field. This will accelerate electrical currents that flow at great heights above the planet, such as the “electrojets” that cause the Aurora Borealis and Aurora Australis. The gusts of solar weather will also strew turbulence in the earth’s magnetic field, like a strong wind over water (Kappenman 2005, p. 6), producing even sharper, if more transient and localized, magnetic oscillations across the surface of the earth. Scientists will declare the arrival of a geomagnetic storm.

According to the laws of electromagnetism, when the magnetic field fluctuates in a spot, it induces a voltage there. The faster the magnetic change, the greater the voltage. Before the Industrial Revolution, electrical pressures induced by magnetic storms could only be relieved by the flow of electric charge through air, sea, or land. But now people have laced the planet with less resistive conduits: high-voltage power lines stretching hundreds of miles. Especially when crossing terrain whose (igneous) mineralogy resists electrical current—or when terminating near conductive seawater—and especially when the wires happen to align with the induced electrical force, these cables become geomagnetic lightning rods.
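To make the “lightning rod” picture concrete, here is a rough, illustrative sketch; every number in it is an assumption chosen for round figures, not a measurement from any particular line. The induced geoelectric field integrates along the line, and the resulting quasi-direct current is roughly that voltage divided by the total resistance of the loop through the conductors, the transformer windings, and the ground.

```python
# All values below are illustrative assumptions, not measurements.
geoelectric_field_v_per_km = 5.0   # induced field along the line during a strong storm
line_length_km = 300.0             # a long high-voltage transmission line
loop_resistance_ohms = 5.0         # conductors + transformer windings + grounding, end to end

driving_voltage = geoelectric_field_v_per_km * line_length_km      # 1500 V of quasi-DC drive
quasi_dc_current_amps = driving_voltage / loop_resistance_ohms     # 300 A through the transformers

print(f"~{quasi_dc_current_amps:.0f} A of quasi-direct current through the grounded transformers")
```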

How power surges threaten grids

Like lightning rods, long-distance, high-voltage power lines are grounded: for safety, they are connected to the earth at either end. But at each end of most of these power lines, interposed between them and the earth, are transformers, garage-sized or bigger.

Transformers put the “high-voltage” in “high-voltage power line.” In preparation for long-distance transmission, the transformers take the electricity produced by a windfarm or coal plant and step up its voltage, to as high as 765,000 volts in the US. They feed this transformed electrical energy into the long-distance lines. (Boosting the voltage for long-distance transmission cuts energy losses from the electrical resistance of the power lines.) At the receiving end, similar transformers symmetrically step the voltage back down for distribution to factories, offices, and homes.
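A back-of-the-envelope illustration of why stepping up the voltage cuts losses (the power level and line resistance below are assumed for illustration): for a given power sent down the line, the current falls in proportion to the voltage, so resistive losses fall with its square.

```python
# Illustrative numbers only.
power_sent_watts = 500e6          # 500 MW sent down the line
line_resistance_ohms = 10.0       # total resistance of the conductors

for voltage in (138e3, 765e3):    # a mid-range vs. a top US transmission voltage
    current = power_sent_watts / voltage           # I = P / V
    loss = current ** 2 * line_resistance_ohms     # P_loss = I^2 * R
    print(f"{voltage / 1e3:.0f} kV: {loss / 1e6:.1f} MW lost "
          f"({100 * loss / power_sent_watts:.1f}% of the power sent)")
```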

Transformers exploit the symmetry of electromagnetism: just as a changing magnetic field induces a voltage, so does the movement of electrical charge (electricity) produce a magnetic field. Inside each transformer, two wires, one connected to the input line and one to the output, coil hundreds of times within or around a shared core of magnetically permeable material such as silicon steel. The normal input is alternating current (AC), like that in an ordinary home, its voltage flipping from positive to negative and back 50 or 60 times a second. The oscillating electricity in the wire produces an oscillating magnetic field in the transformer’s core. That in turn induces an oscillating current in the output wire, typically at a different voltage. The capacity of AC to be transformed in this way for low-loss, long-distance transmission is precisely why at the dawn of the electrical age AC beat out DC—constant, “direct” current—as the standard for power systems.
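And here is the ideal-transformer bookkeeping behind that step-up, again with made-up numbers (the generator-side voltage and turns counts are hypothetical): the output voltage scales with the ratio of turns, and, for the same power, the current scales inversely.

```python
# Hypothetical, illustrative values.
v_in = 20_000.0            # volts on the generator side
n_in, n_out = 100, 3_825   # turns on the input and output coils

v_out = v_in * (n_out / n_in)          # 765,000 V for long-distance transmission
power_watts = 500e6                    # power passing through (losses ignored)
i_in = power_watts / v_in              # 25,000 A on the input side
i_out = power_watts / v_out            # ~654 A on the output side

print(f"{v_out:,.0f} V out; {i_in:,.0f} A in -> {i_out:,.0f} A out")
```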

Under design conditions, a transformer’s core is magnetically capacious enough to carry the whole field produced by the input wire. But if too strong a current enters, the core will saturate. Magnetic force fields will stray out of the core and into the surrounding wires, where they can wreak invisible mayhem: random currents in both the input and output wires and “hot spots” of burnt insulation. Possibly, the transformer will fail immediately. Or it may continue operating while the hot spots cool into something analogous to dots of rust: they escape attention at first, but initiate degradation that spreads over weeks or months. Eventually a failure may be triggered, which engineers may not even recognize as storm damage (Albertson et al. 1973, p. 475; Gaunt and Coetzee 2007, p. 444). High-voltage transformers are nodes in the grid. When they fail, links in the power system are sundered.

Geomagnetic storms can send damaging currents into transformers in two ways. The storms can directly induce them, as just described. Or the storms can disrupt currents, voltages, and frequencies in an operating grid enough to overwhelm the equipment meant to counteract such distortions, and thus trigger shutdowns of power plants or disconnections between sections of the grid. These automatic responses are designed to protect the grid, and may largely do so—but perhaps not completely in extreme cases. In Québec during the great storm of March 1989, the sudden disconnection of the La Grande hydroelectric dam complex from the rest of the grid overloaded and damaged two big transformers, part of a larger cascade of events that led to a widespread blackout (NERC 1990, p. 42).

A wildcard that has emerged since 1989 is that a storm might damage GPS and communications satellites, which utilities have increasingly used to coordinate components of the grid. (Giant generators spinning 50 or 60 times per second, hundreds of miles apart, must be precisely synchronized if serving the same grid.)

Debating the worst-case scenario

In the worst case, argues geomagnetic storm expert John Kappenman, a storm would take out hundreds of high-voltage transformers across a continent-wide area. High-voltage transformers are large, expensive, custom industrial products. There are not a lot of spares around. New ones take months each to manufacture, and limited global production capacity could produce a backlog of years. The effects of a long-term blackout would cascade to all corners of industrial societies—pipelines, sewage treatment, police, air traffic control, hospitals. The scariest potential consequence is the loss of cooling at storage facilities for spent nuclear fuel, as at Fukushima in 2011 (Foundation for Resilient Societies 2011).

Offsetting such risks is the paradoxical resilience built into grids, as seen in Québec. If a geomagnetic storm sufficiently distorts the current entering or exiting a major transformer, safety equipment trips, shutting it down. Large areas may be blacked out within seconds. But, contend Ramsis Girgis and Kirin Vedante of transformer manufacturer ABB, the quiescent grid may be protected from more permanent damage. Short-term fragility bestows long-term resilience. In Québec, power was largely restored (after nine hours), and life went on.

In addition, the power system is arguably more prepared for electrical storm surges today. Satellite-based warning systems are more sophisticated (“GoreSat” was launched on February 11 to strengthen capacity to monitor solar activity). Since 1989, utility officials have wised up to the danger and are perhaps more ready to preemptively shut down grids to protect them. And some systems have been modified to make them more robust.

In the end, I did not come to understand power engineering well enough to make a call on these contending considerations. I am convinced, however, that how power systems will respond to extreme geomagnetic storms has been too little researched. Few experiments have been conducted under realistic conditions. Much of what is known is locked in the minds and computers of transformer manufacturers and power system operators, who may have incentives not to share everything they know.

My next few posts will focus on a question I am more competent to explore, which is what the historical record tells us about the probabilities of extreme storms in the future.

Footnote

[1] A 2-megaton nuclear detonation would by definition release 2 × 4.184 PJ = 8.368 × 10²² erg. Gopalswamy (2006), p. 244, reports that CMEs can attain kinetic energies as high as 10³² erg, a billion times larger.
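A quick arithmetic check of that footnote (1 megaton of TNT is defined as 4.184 PJ, and 1 joule is 10⁷ erg):

```python
megaton_tnt_joules = 4.184e15          # 1 Mt TNT = 4.184 PJ by definition
erg_per_joule = 1e7

bomb_erg = 2 * megaton_tnt_joules * erg_per_joule   # 2-megaton detonation: 8.368e22 erg
cme_erg = 1e32                                      # upper-end CME kinetic energy (Gopalswamy 2006)

print(f"ratio: {cme_erg / bomb_erg:.2e}")           # ~1.2e9, i.e. about a billion
```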

Key questions about philanthropy, part 1: What is the role of a funder?

This post was updated on July 6 with language edits but substantially unchanged content.

As a new funder, we’ve found it surprisingly difficult to “learn the ropes” of philanthropy. We’ve found relatively little reading material – public or private – on some of the key questions we’re grappling with in starting a grantmaking organization, such as “What sorts of people should staff a foundation?” and “What makes a good grant?” To be sure, there is some written advice on philanthropy, but it leaves many of these foundational questions unaddressed.

As we’ve worked on the Open Philanthropy Project, we’ve accumulated a list of questions and opinions piecemeal. This blog post is the first in a series that aims to share what we’ve gathered so far. We’ll outline some of the most important questions we’ve grappled with, and we’ll give our working answer for each one, partly to help clarify what the question means, and partly to record our thoughts, which we hope will make it easier to get feedback and track our evolution over time.

We’d love to see others – particularly experienced philanthropists – write more about how they’ve thought through these questions, and other key questions we’ve neglected to raise. We hope that some day new philanthropists will be able to easily get a sense for the range of opinions among experienced funders, so that they can make informed decisions about what kind of philanthropist they want to be, rather than starting largely from scratch.

This post focuses on the question: “what is the role of a funder, relative to other organizations?” In brief:

  • At first glance, it seems like a funder’s main comparative advantage is providing funding, and one might guess that a funder would do well to stick to this role as closely as possible. In other words, a funder might seek to play a “passive” role, by considering others’ ideas and choosing which ones to fund, without trying to actively influence what partner organizations work on or how they work on it.
  • In practice, this doesn’t seem to be how the vast majority of major funders operate. It’s common for funders to develop their own strategies, provide funding restricted for specific purposes, develop ideas for new organizations and pitch them to potential founders, and more. Below, we lay out a spectrum from “highly passive” funders (focused on supporting others’ ideas) to “highly active” funders (focused on executing their own strategies, with strong oversight of grantees).
  • In the final section of this post, we lay out our rough take on when we think it’s appropriate for us, as a funder, to do more than write a check. In addition to some roles that may be familiar from for-profit investing – such as providing connections, helping with fundraising and providing basic oversight – we believe it is also worth noting the role funders play via cause selection, and the role a funder can play in filling gaps in a field by creating organizations.

The spectrum from passive to active funding
There is a good deal of variation in how “active” different funders seek to be. If I were to articulate two ends of the spectrum, I’d say that:

  • One end is roughly represented by groups like Ashoka and the Skoll Foundation, both of which consider proposals from “social entrepreneurs” in a wide variety of areas and fund the ones they find strongest. Neither is a purely passive funder, but both appear focused on identifying and supporting others’ ideas.
  • The other end might be represented by groups that started as foundations but eventually made transitions, becoming public charities or operating nonprofits and choosing to focus on running their own programs rather than making grants. The Kaiser Family Foundation appears to be an example, as does the Pew Charitable Trusts.

Most major funders seem to be somewhere in between. They provide a mix of unrestricted and restricted funding. They develop their own in-house expertise, create their own strategies, pitch ideas to potential grantees, assemble convenings, and often get involved in grantees’ work at a level beyond simply cutting a check. At the same time, their relationship to most grantees is that of a supporter who checks in periodically rather than a partner who is involved day-to-day.

Our provisional take on the funder’s role
As we’ve written before, we initially envisioned taking a highly passive approach, but we have learned that there is a strong case for being active in certain ways. For us, the key question is what we, as the funder, are positioned to do better than others. We believe it makes sense to be active where we can offer something (besides money) that our grantees don’t have. But we want to avoid micro-managing grantees, who have more knowledge of their issues and their capabilities than we do.

What a given funder has to offer will depend on what sorts of expertise and staff that funder has built up. But to generalize, it currently seems to us that:

  • Funders have much of their impact via cause selection: choosing what problems and issues – for example, criminal justice reform vs. global health vs. biomedical innovation – to prioritize. This is a personal-values-laden decision, and not one that grantees are well positioned to help with, as they tend to have a focus on a particular problem or issue baked into their mission. Accordingly, it is often appropriate for a funder to support a grantee only on condition that they prioritize a given cause/problem/issue/goal. When a funder is interested in an issue that gets relatively little attention, it may be necessary for the funder to proactively build the field by holding convenings, speaking publicly about the issue’s importance, etc. I believe there is a significant difference with for-profit investing here: in the for-profit world, there generally is high alignment on the ultimate goal (making money) between prospective investors and investees, whereas in the nonprofit world, alignment on ultimate goals is the exception rather than the rule.
  • A funder may be well-positioned to identify a gap in a field – a type of organization, collaboration or project that ought to exist and doesn’t. This is by virtue of having a broad view of, and relationships with, all of the organizations working on a particular cause, and having the funds to support the creation of a new one. Funders are not the only actors who can create new organizations; they can also be created by entrepreneurial individuals pitching their own plans. However, we’ve observed that funders often are key players in the creation of new nonprofit organizations (see our previous post on the subject). In order to be well positioned to identify gaps, we’ve tried to learn about what sorts of organizations can exist, particularly by looking at successful, well-funded causes. With this “inventory” in hand, we’re better able to look at a given field and see what’s “missing.” An example of such an “inventory” is our list of the different avenues by which nonprofits can influence policy.
  • There may be times when a grantee has a basic, structural deficiency that a funder is able to spot and help to address. One example is the Sandler Foundation’s support for improving the communications capacity of the Center on Budget and Policy Priorities. Here again, it seems helpful to have a good sense for the basic “inventory” of different ways to accomplish philanthropic goals, as well as a strong network so that one can connect organizations with partners who are strong where they are weak. However, as a funder, we believe we should be careful to distinguish between (a) areas where the grantee has a weakness that we can help to address and (b) cases where the grantee and we simply disagree, and the grantee is well positioned to have an informed view, to which we ought to defer. We suspect that many organizations are eager enough for funding that they may act against their own best interests when pushed to do so by an opinionated funder. We also believe that if we try to help to address a weakness, we should focus on connecting the organization to qualified advisers, rather than on getting directly involved in the organization’s tactics and practices.
  • Relatedly, funders often have networks that can be useful for a grantee to tap into, especially for the purpose of fundraising. A common role of a funder, as in for-profit investing, is to provide fundraising leads and other connections.
  • Nonprofits often plan for their level of funding to stay roughly the same or grow modestly. When offered the chance to grow their budget substantially, some nonprofits have little idea at first of what they would do. But when continually encouraged to think about this question, they may come up with new ideas. We’ve seen examples of this as a funder, though they aren’t currently public. GiveWell also has experienced this dynamic as a grantee. We did not have the idea for the Open Philanthropy Project (previously GiveWell Labs) until Good Ventures expressed a high degree of interest in our work. Having a major funder encourage us to think about how we would allocate large amounts of money caused us to think more deeply about the matter than we would have otherwise. Now, as a funder, when we find strong people or organizations who aren’t actively seeking more funding, we often try to develop relationships with them and continually raise the possibility of providing funding, rather than simply expressing a basic level of interest and dropping out of touch.

Outside of the above situations, we ideally seek to have only high-level involvement in our grantees’ work – more analogous to that of a board member than that of a manager or consultant. We aim to check in periodically, assess high-level progress, ask critical questions, and drill down when we don’t understand a grantee’s answers. We aim ultimately to defer to the grantee on most details. We don’t think this is necessarily how every funder should behave – funders with certain kinds of expertise or staff may seek more involvement, or even to operate their own programs – but it is what we are aiming for.

Incoming Program Officer for Criminal Justice Reform: Chloe Cockburn

We’re excited to announce that Chloe Cockburn has accepted our offer to join the Open Philanthropy Project team as a Program Officer, leading our work on criminal justice reform. She expects to start in August and to work from New York, where she is currently based. She will lead our work on developing our grantmaking strategy for criminal justice reform, selecting grantees, and sharing our reasoning and lessons learned.

Chloe comes to us from the American Civil Liberties Union (ACLU), where she currently serves as the Advocacy and Policy Counsel for the ACLU’s Campaign to End Mass Incarceration, heading up the ACLU’s national office support to state-level ACLU affiliates.

The search to fill this role has been our top priority within U.S. policy over the last few months. We conducted an extensive search for applicants and interviewed many strong candidates.

We feel that hiring Chloe is one of the most important decisions we’ve yet made for the Open Philanthropy Project. In the future, we plan to write more about how we conducted the search and why we ultimately decided to make Chloe an offer.

We’re very excited to have Chloe on board to lead our investment in substantially reducing incarceration while maintaining or improving public safety.

Corrections in our review of Development Media International

Recently, we discovered a few errors in our cost-effectiveness analysis of Development Media International (DMI). After correcting these errors, our best guess of DMI’s cost per life saved has increased from $5,236 to $7,264. Additionally, we discovered some errors in our analysis of DMI’s finances. The corrected cost-effectiveness analysis is here.

These changes do not affect our bottom line about DMI, and we continue to consider it a standout charity.

What were the errors?

Crediting DMI with changes in antimalarial compliance. DMI broadcasts voice-acted stories embedded with health advice over radio into areas with high childhood mortality. Among other advice, the messages encourage families to seek treatment for malaria when their child has a fever. However, the messages do not specifically address what is called “compliance”: completing the full course of malaria treatment, rather than treating the child only until symptoms stop.

DMI’s midline results found that antimalarial compliance had increased more in intervention areas than in control areas (the difference was not statistically significant). In our original analysis, we gave the option of crediting or not crediting DMI’s intervention with the increased compliance (with the default set to “yes, give credit”). We originally assumed that DMI’s campaign included messages specifically about complying with antimalarial treatment. Recently, we learned that it did not. While it’s possible that the DMI campaign had an effect on compliance without messaging on it, knowing that antimalarial compliance messages were not broadcast leads us to change our best guess. In our updated estimate, we have set the default compliance option to “no, don’t credit DMI for the increased compliance.” The option to credit DMI for the increase is still available in our model. (Note 1)

Not crediting DMI with increases in antimalarial compliance increased the cost per life saved by 38.7% (from $5,236 per life saved to $7,264 per life saved). This change accounts for the entire increase in headline cost per life saved, as the errors below are contained within the antimalarial compliance calculation, and thus only affect the headline cost per life saved if DMI is credited with improving antimalarial compliance.
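For concreteness, the percentage changes quoted in this post (here and in the bullets below) are simple ratios of the two cost-per-life-saved estimates; a quick check:

```python
def pct_change(before, after):
    """Percentage change from one cost-per-life-saved estimate to another."""
    return 100 * (after - before) / before

print(pct_change(5236, 7264))   # ~+38.7%: dropping the antimalarial-compliance credit
print(pct_change(6607, 5236))   # ~-20.7%: the compliance-formula errors (a deflation)
print(pct_change(5899, 5236))   # ~-11.2%: the malaria-mortality-bound error (a deflation)
print(pct_change(446, 1006))    # ~+125.6% with these rounded inputs (the post reports 125.5%)
```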

Other errors in our cost-effectiveness analysis. In addition to mistakenly crediting DMI with the changes in antimalarial compliance, we discovered several other errors in our analysis. These errors did not cause any change in our headline cost per life saved estimate.

  • Antimalarial compliance calculation: Two formulas in our compliance calculation used incorrect inputs. If we credited DMI for increasing antimalarial compliance, and did not fix other errors, these errors caused a 20.7% deflation in our cost per life saved (from $6,607 per life saved to $5,236 per life saved). (Note 2)
  • Size of malaria mortality burden: We incorrectly used the upper bound of a mortality estimate instead of the point estimate. If we credited DMI for increasing antimalarial compliance, and did not fix other errors, this error caused an 11.2% deflation in our cost per life saved (from $5,899 per life saved to $5,236 per life saved). (Note 3)
  • Cameroon data used in Burkina Faso calculation: We used data from Cameroon in our analysis of Burkina Faso, which we calculated as a comparison to the Cameroon cost per life saved. Holding other errors constant, this error caused a 125.5% inflation in our estimate of cost per life saved in Burkina Faso (from $446 per life saved to $1,006 per life saved). (Note 4)

Categorization of past expenditures. In our review of DMI, we included a categorization of DMI’s spending for 2011 to 2014. This categorization contained some errors, which caused our calculation of DMI’s total 2011-2014 spending to be $212,650 higher than its actual total spending (an inflation of 2.5%). Since we based our estimate of DMI’s costs in Cameroon on its projection of those costs rather than on past spending in Burkina Faso, these errors did not affect our final cost-effectiveness estimate for DMI. (Note 5)

How did we discover these errors?

We discovered these errors in two ways:

First, when revisiting our cost-effectiveness analyses (as part of our broader effort to improve our cost-effectiveness analyses this year), one of our research analysts discovered two of the errors (the antimalarial compliance calculation mistake and the size of malaria mortality burden mistake). As we were correcting the analysis, we discovered the use of Cameroon data in the Burkina Faso analysis, and realized that we weren’t certain whether the DMI campaign had messaged on antimalarial compliance. DMI clarified that it had not.

Second, as part of our standard process, an analyst (who did not conduct the original work) carefully reviews a page before we publish it. We call this process a vet. While vetting our review of DMI, one of our research analysts discovered the expenditure categorization errors. This vet occurred after the page had been published. Our standard process is to vet pages before they are published, but in this case we published the page without a vet in order to meet our December 1st deadline for publishing our new recommendations last year.

We have added these errors to our mistakes page.

How do these corrections affect GiveWell’s view of DMI?

As noted above, these changes do not affect our bottom line about DMI, and we continue to consider it a standout charity.

In particular, the change as a result of our error is small relative to our uncertainty about other inputs into our model. Specifically:

  • Our estimate of $7,264 per life saved relies solely on data from Cameroon because we guessed that Cameroon was the country where DMI was most likely to spend additional funds. We remain uncertain about where DMI will spend additional funds, and a more robust estimate of its cost-effectiveness would also incorporate estimates from other countries.
  • Our estimate credits DMI with affecting behavior for pneumonia and diarrhea but not malaria because DMI’s midline results only measured a 0.1% increase in treatment seeking for malaria in the intervention group compared to the control group. It is arguably unlikely that DMI would cause behavior change for pneumonia and diarrhea treatment-seeking, but not malaria treatment-seeking, given that the promoted behaviors are relatively similar.
  • As we wrote last December, we are uncertain about whether we should put more credence in our estimate of DMI’s cost-effectiveness based on available data about behavior change, or in DMI’s own projection. Our cost-effectiveness analysis predicts a 3.2% decline in child mortality; DMI’s projection, made by the people carrying out the study and paying the considerable expenses associated with it, is a 10-20% decline. More in our December 2014 post.

We have not incorporated the above considerations into our cost-effectiveness analysis, but we would guess that incorporating the above could cause changes in our estimate of DMI’s cost-effectiveness significantly larger than the 38% change due to the error discussed in this post.

Footnotes

Note 1: See Cell D76.

Note 2: We are not sure how often ceasing antimalarial treatment prematurely is as bad (for the survival of the child) as not giving antimalarials at all; without an authoritative source we guessed that this is true 25% of the time.

One formula in our spreadsheet left this 25% figure out of the calculation, effectively assuming that 100% of non-compliance cases were as bad as not giving any antimalarials at all. Because the estimate now defaults to not crediting for compliance (see previous error), this error does not affect our updated headline figure for cost per life saved.

In our original cost-effectiveness estimate, Cell D88 (effective coverage before the campaign) erroneously incorporated Cell D75 (raw compliance before the campaign) as an input. In the updated cost-effectiveness estimate, Cell D88 incorporates Cell D79 (effective compliance accounting for the benefit from non-compliance).

In the original cost-effectiveness estimate, Cell D92 (effective coverage after the campaign) erroneously incorporated Cell D77 (raw compliance after the campaign) as an input. In the updated cost-effectiveness estimate, Cell D92 incorporates Cell D80 (effective compliance accounting for the benefit from non-compliance).

Our estimate of lives saved by pneumonia treatment did not contain an equivalent error, and we did not include an equivalent compliance factor for diarrhea since treatment is only needed for as long as symptoms persist. Our model still defaults to crediting DMI with an increase in pneumonia compliance, because DMI’s campaign messaged specifically on completing courses of pneumonia treatment.
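To make the corrected structure concrete, here is a minimal sketch of one plausible reading of the compliance adjustment described in this note; the function, variable names, and example compliance rates are our own illustration, not values or formulas from the actual spreadsheet.

```python
# Illustrative values only; not figures from the cost-effectiveness spreadsheet.
SHARE_AS_BAD_AS_NO_TREATMENT = 0.25   # guess: how often stopping early is as bad as no antimalarials

def effective_compliance(raw_compliance):
    """Count a non-compliant case as worth 75% of a compliant one, per the 25% guess."""
    return raw_compliance + (1 - raw_compliance) * (1 - SHARE_AS_BAD_AS_NO_TREATMENT)

raw_before, raw_after = 0.50, 0.60    # hypothetical raw compliance before/after the campaign

# Corrected approach: downstream "effective coverage" builds on effective compliance...
print(effective_compliance(raw_before), effective_compliance(raw_after))   # 0.875 0.9
# ...whereas the erroneous formulas fed raw compliance straight in, implicitly treating
# every non-compliant case as if no antimalarials had been given at all.
print(raw_before, raw_after)
```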

Note 3: We use the Institute for Health Metrics and Evaluation’s data visualization tool to estimate the number of deaths from specific causes in target countries. For malaria deaths, ages 1-4, in Cameroon, we incorrectly used the upper bound of the estimate (18,724.2 deaths), rather than the point estimate (9,213.71 deaths). The RCT midline results did not report an increase in malaria treatment coverage, though antimalarial compliance did increase. Because the estimate now defaults to not crediting for compliance (see above), this error does not affect our updated headline figure for cost per life saved.

In the original cost-effectiveness estimate, Cell D106 erroneously included the upper bound of age 1-4 deaths from malaria (see Cell E106 for search parameters and calculation). In the updated cost-effectiveness estimate, Cell D106 includes the point estimate for age 1-4 deaths from malaria (see Cell E106 for search parameters and calculation).

Note 4: This comparison did not affect our headline cost per life saved, because we think a campaign in a country similar to Cameroon is a more likely use of marginal unrestricted funding directed to DMI. The Burkina Faso analysis was structurally the same as the Cameroon analysis, and included the compliance calculation error described above. In addition, the Burkina Faso analysis incorrectly used information about Cameroon, rather than Burkina Faso (specifically the number of under-5 deaths from malaria, pneumonia, and diarrhea; and the campaign cost estimate).

See columns G to I in the cost-effectiveness spreadsheet for the model of the Burkina Faso campaign. See cells G105, G106, and G107 for the data on deaths from pneumonia, malaria, and diarrhea. See cell G117 for the Burkina Faso campaign cost. In the original cost-effectiveness estimate, all of these cells duplicated the data for Cameroon (see D105, D106, D107, and D117). In the updated cost-effectiveness analysis, these cells have been updated with data pertaining to Burkina Faso.

Note 5: Our categorization process involved assigning a category code to each line item of DMI’s budget, then aggregating the subtotals for each category. Two types of errors occurred during this process:

  • A line item was coded to an incorrect category that wasn’t aggregated, causing the item to not be counted in the subtotals.
  • Some formulas for aggregating category subtotals drew inputs from incorrect ranges, causing some items to be double-counted.

DMI has requested that its budget be kept private. Because our categorization process involved coding the line items of DMI’s budget, we are unable to share our categorization files and the specific details about these errors.

Update on GiveWell’s web traffic / money moved: Q1 2015

In addition to evaluations of other charities, GiveWell publishes substantial evaluation of itself, from the quality of its research to its impact on donations. We publish quarterly updates regarding two key metrics: (a) donations to top charities and (b) web traffic.

The tables and chart below present basic information about our growth in money moved and web traffic in the first quarter of 2015 compared to the last two years (note 1).

Money moved and donors: first quarter

Table: money moved and donors, first quarter, 2015 vs. prior years

Money moved by donors who have never given more than $5,000 in a year increased 78% to about $760,000. The total number of donors in the first quarter increased to about 3,400, up 70% compared to last year and roughly consistent with the previous year’s growth.

Most of our money moved is donated near the end of the year (we tracked about 70% of the total in the fourth quarter each of the last two years) and is driven by a relatively small number of large donors. Because of this, our year-to-date total money moved provides relatively limited information, and we don’t think we can reliably predict our year-end money moved (note 2). Mid-year we primarily use data on donations from smaller donors, rather than total money moved, to give a rough indication of how our influence on donations is growing.

Web traffic through April 2015

Table: web traffic through April 2015

Web traffic excluding Google AdWords grew moderately in the first quarter. Last year, we saw a drop in total web traffic because we removed ads on searches that we determined were not driving high quality traffic to our site (i.e. searches with very high bounce rates and very low pages per visit).

GiveWell’s website receives elevated web traffic during “giving season” around December of each year. To adjust for this and emphasize the trend, the chart below shows the rolling sum of unique visitors over the previous twelve months, starting in December 2009 (the first period for which we have 12 months of reliable data due to an issue tracking visits in 2008).

Chart: unique visitors, rolling 12-month sum

We use web analytics data from two sources: Clicky and Google Analytics (except for those months for which we only have reliable data from one source). The data on visitors to our website differs between the two sources. We do not know the cause of the discrepancy (though a volunteer with a relevant technical background looked at the data for us to try to find the cause; he didn’t find any obvious problems with the data). (See Note 3 on how we count unique visitors.)

The raw data we used to generate the chart and table above (as well as notes on the issues we’ve had and adjustments we’ve made) is in this spreadsheet.



Note 1: Since our 2012 annual metrics report we have shifted to a reporting year that starts on February 1, rather than January 1, in order to better capture year-on-year growth in the peak giving months of December and January. Therefore, metrics for the “first quarter” reported here are for February through April.

Note 2: In total, GiveWell donors directed $1.76 million to our top charities in the first quarter of this year, compared with $1.45 million that we had tracked in the first quarter of 2014. For the reason described above, we don’t find this number to be particularly meaningful at this time of year.

Note 3: We count unique visitors over a period as the sum of monthly unique visitors. In other words, if the same person visits the site multiple times in a calendar month, they are counted once. If they visit in multiple months, they are counted once per month. Google Analytics provides ‘unique visitors by traffic source’ while Clicky provides only ‘visitors by traffic source.’ For that reason, we primarily use Google Analytics data in the calculations to exclude AdWords visitors.
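For what it’s worth, here is a minimal pandas sketch of the counting convention described in this note, plus the 12-month rolling sum used for the chart above; the raw-log format and column names are assumptions, not our actual analytics exports.

```python
import pandas as pd

# Assumed raw visit log: one row per visit, with a visitor ID and a timestamp.
visits = pd.DataFrame({
    "visitor_id": ["a", "a", "b", "a", "c", "b"],
    "timestamp": pd.to_datetime([
        "2015-02-03", "2015-02-20", "2015-02-21",
        "2015-03-05", "2015-03-09", "2015-04-12",
    ]),
})

# Count each visitor at most once per calendar month (the convention in this note).
monthly_uniques = (
    visits.assign(month=visits["timestamp"].dt.to_period("M"))
          .groupby("month")["visitor_id"].nunique()
)

# Rolling sum over the previous twelve months, as plotted in the chart above.
# (With only three months of toy data, the full 12-month window is not yet filled.)
rolling_12m = monthly_uniques.rolling(window=12, min_periods=12).sum()

print(monthly_uniques)
print(rolling_12m)
```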