From sports announcers to political pundits to friends gossiping about romantic interests, lots of people make probabilistic predictions about the future. But only some actually follow up to see how well their predictions performed. For instance, you may have heard that weather forecasters predicted the 2024 hurricane season had an 85% chance of being more active than normal, with 17 to 25 named storms. But by September, they were surprised that the season had so far been unexpectedly quiet, with climate change likely affecting weather patterns in ways scientists don’t fully understand.
GiveWell researchers often make forecasts about the activities, milestones, and outcomes of the programs and research studies we recommend funding, as well as about decisions GiveWell will make in the future. For example, we might forecast whether we’ll fund more hospital programs to implement Kangaroo Mother Care by 2027, or whether data collected on how many people are using chlorinated water will align with our expectations.1
As a way to solicit external feedback on some of our predictions, we just launched a page on Metaculus, an online forecasting platform. We’ll periodically post forecasts there about GiveWell’s research and grants so that members of the public can make their own predictions. Metaculus and other contributors will award $2,250 in prizes to people who leave insightful comments on a select group of forecasting questions. The deadline for comments is December 1, 2024.
There’s a wide literature on optimal forecasting,2 how to score predictions, and ways to learn from your predictions—both the good ones and the ones that completely missed the mark. Usually predictions are completed, or “resolved,” as “yes” (the thing happened) or “no” (the thing didn’t happen). If you guessed that there was a 5% chance of the U.S. earning the most Olympic medals this year,3 and it happened, you might think you were wrong, or that you’re bad at guessing sports outcomes. In reality, you weren’t exactly wrong—even something with a 5% probability happens five out of every 100 times. It’s often more useful to think about the aggregate: looking at a large group of predictions you made, how close were your guesses to what actually happened? And what can you learn from that?
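As a rough illustration of what scoring a batch of resolved forecasts can look like, here is a minimal Python sketch (not GiveWell’s actual tooling) that computes an average Brier score, one common way to measure how close a set of probability estimates came to the outcomes that actually occurred. The forecasts in the example are made up.

```python
# Minimal sketch: scoring a batch of resolved yes/no forecasts with the Brier score.
# A Brier score of 0.0 is perfect; always guessing 50% gives 0.25.
# These forecasts are made up for illustration.

forecasts = [
    # (predicted probability the event happens, whether it actually happened)
    (0.05, True),   # a long shot that came true anyway
    (0.60, True),
    (0.80, False),
    (0.17, False),
]

def brier_score(predictions):
    """Average squared gap between each predicted probability and the 0/1 outcome."""
    return sum((p - (1.0 if happened else 0.0)) ** 2
               for p, happened in predictions) / len(predictions)

print(f"Average Brier score: {brier_score(forecasts):.3f}")
```

A single surprising outcome (like the 5% long shot above) pushes the score up, but over many forecasts the average tells you more about your judgment than any one prediction does.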
At GiveWell, while we’ve historically followed up on individual predictions to compare our initial guesses to the actual outcomes, we hadn’t taken a broader look at how our forecasts have performed as a whole, or what lessons the larger set of forecasts could teach us about our decisions.
So, last year, we gathered all our forecasts into a database to track them. We grouped together the 198 grant-related forecasts that had already resolved (out of 660 that we’d made between 2021 and 2023) to see if we could learn anything from the broad set. What we learned was that, while our “score” was reasonable (more on that later), we still had a lot of work to do to make the exercise of forecasting genuinely useful and worthwhile for our research.
How are we doing at making predictions?
The chart above shows all 198 of our resolved forecasts, grouped into 10% “buckets,” with each bucket’s average predicted probability plotted against the percentage of forecasts in it that actually happened.4 For example, in the 10-20% range, the average prediction was a 17% likelihood, but 25% of those forecasts actually came true. So, things we thought were 10-20% likely actually happened more often than we guessed they would.
The diagonal line that cuts across the graph shows what “perfect” prediction would look like (for example, everything we predicted was 80% likely would actually happen 80% of the time). You can see that, generally, for things we thought were unlikely to happen (less than 50% predicted probability), we were slightly too pessimistic, while for things we thought were likely to happen (more than 50% predicted probability), we were slightly too optimistic.
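For readers who want to see the mechanics behind a chart like this, here is a minimal Python sketch (again using made-up forecasts rather than GiveWell’s data) of the bucketing described above: each resolved forecast goes into a 10% bucket based on its predicted probability, and the bucket’s average prediction is compared against the share of its forecasts that actually came true.

```python
# Minimal calibration sketch: within 10% buckets, compare the average predicted
# probability to the share of forecasts that actually happened.
# These forecasts are made up for illustration.
from collections import defaultdict

forecasts = [
    # (predicted probability, whether the event actually happened)
    (0.12, False), (0.15, True), (0.17, False), (0.18, False),
    (0.55, True), (0.62, False), (0.85, True), (0.90, True),
]

buckets = defaultdict(list)
for prob, happened in forecasts:
    bucket = min(int(prob * 10), 9)  # e.g., 0.17 -> the 10-20% bucket
    buckets[bucket].append((prob, happened))

for bucket in sorted(buckets):
    entries = buckets[bucket]
    avg_prediction = sum(p for p, _ in entries) / len(entries)
    actual_rate = sum(1 for _, happened in entries if happened) / len(entries)
    print(f"{bucket * 10}-{(bucket + 1) * 10}% bucket: "
          f"average prediction {avg_prediction:.0%}, "
          f"actually happened {actual_rate:.0%} ({len(entries)} forecasts)")
```

Plotting each bucket’s average prediction against its actual rate, with a diagonal reference line, produces a calibration chart like the one above.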
What else have we learned?
For us, the biggest lesson came not from “how good” we were at predicting things about our grants, but from what we could do to actually make forecasting a useful exercise for our researchers.
We noticed that sometimes we made predictions about future analyses we would do (such as updating a cost-effectiveness estimate by a specific date), but by the time these forecasts were due, we hadn’t completed the analysis yet. This was typically because timelines for grant activities extended longer than we’d predicted, or because we hadn’t thought carefully about the right timeline for updating our models.
For example, when we made a grant to Evidence Action for its Dispensers for Safe Water program in January 2022, we predicted: “By April 2023, we think there is a 60% chance that our best guess cost-effectiveness across all countries funded under this grant (including Kenya) will be equal to or greater than 6x to 8x cash.” However, we didn’t update our cost-effectiveness analysis by April 2023, because we didn’t yet have enough new data. Once we do complete the update, we can still check whether our prediction was true—the timeline for this work will just be longer than we originally anticipated.
Another big takeaway from looking at our forecasts in this way was that we—and the organizations we work with—are often overly optimistic about timelines. We’ve identified this as an area for improvement and will keep monitoring it.
What’s next?
We’re planning to do another analysis of all our forecasts at the end of this year and to continue this annually from now on. We’re also setting up training for researchers so that we can get better at forecasting—not just improving our “score,” but also our strategies for how to conceptualize, write, and make predictions.
We also launched a fun forecasting tournament among staff, where we’re predicting real-world events such as the results of 2024 elections, end-of-year status of celebrity couples, Olympic wins, and GiveWell metrics.
In addition to Metaculus, we’re considering working with other prediction platforms to get more outside perspectives on our work.
Finally, along with resolving our forecasts each year, we’re taking on a more comprehensive project of looking back on our past grants to see how well they performed compared to our expectations. We plan to publish more on this soon.
Notes
↑1 We publish grant-related forecasts on each of our grant pages, which are linked in this grant database.
↑2 Forecasting has been brought to popular attention in books such as Superforecasting: The Art and Science of Prediction by Philip E. Tetlock and Dan Gardner, and The Signal and the Noise: Why So Many Predictions Fail–but Some Don’t by Nate Silver.
↑3 At the 2024 Paris Olympics, the U.S. earned the most medals (126), followed by China (91) and Great Britain (65).
↑4 Here is a subset of forecasts included in this chart. More of them can be seen on our public grants Airtable.