For a little over a year, GiveWell has assigned zero- to three-star ratings to all charities we’ve examined. We’ve done so in response to constant requests from our fans and followers. We’ve been told that people want easily digested, unambiguous “bottom line” information that can help them make a decision in a hurry and with a clean conscience. We understand this argument. But right now, we feel that the costs of ratings outweigh the benefits, and we’re likely on the brink of getting rid of our ratings.
To be clear, we are not going to stop comparing, evaluating, and recommending charities. As we did for our first couple of years of existence, we will rank and promote a number of recommended charities, while sharing the reasons why we do not recommend other charities. What we are going to stop doing is boiling down our view of each charity examined into a single quantifiable data point. We’re going to go back to “bottom lines” that are qualified and sometimes difficult to interpret without reading further (for example, instead of “zero stars,” our bottom line will say something more like “Did not pass heuristics to qualify for further investigation”). We know we’ll be sacrificing the kind of simplicity that appeals to many, but we still think it’s worth it.
In trying to provide star ratings, we’ve run into fundamental questions that we don’t have good answers to:
- Should we rate charities in an “absolute” sense (based on our confidence that they have positive impact) or in a “relative” sense (based on how they compare to other charities working on similar issues)?
- How should we deal with charities that we feel do excellent work, but have limited or no room for more funding? Should we rate them above or below charities that do less excellent work but have more definite needs? Should our ratings reflect our opinion of organizations’ work or our opinion of whether undecided donors should give to them?
- The vast majority of charities share no substantive information on their effectiveness, making it impossible to evaluate them. Should such charities receive “no rating” (in which case we would rate very few charities, and may provide incentives for charities with low effectiveness to remain opaque) or our lowest rating (which creates considerable offense and confusion among those who feel we have judged their work ineffective)?
Each of these issues involves an ambiguity in what precisely star ratings mean, and we need ways to resolve the ambiguity in a very clear, easily digested, instantly understood way or we lose the benefit we were hoping to gain from handing out ratings in the first place. At this point we cannot construct a system that accomplishes this.
We believe that these issues are unavoidable when assessing charities based on their impact. We believe that nobody else has yet run into these problems because nobody else has yet tried to rate charities based on the case for their impact, i.e., their effects on the people and communities they serve.
Problem 1: are ratings “absolute” or “relative to a cause”?
How does Doctors Without Borders rate? The answer depends partly on whether you’re looking at it as a global health organization or as a disaster relief organization. Compared to other global health organizations, its transparency and documented effectiveness do not seem top-notch (though they are better than average). Compared to other disaster relief organizations (based on our preliminary and subject-to-change impressions), it stands out.
An organization may be top-notch compared to other water organizations, while mediocre in terms of proven health impact. Our view of a charter school organization depends on whether we’re comparing it to other education groups or to U.S. equality of opportunity organizations of all kinds. The more one tries to accommodate wishes like fighting a specific disease or attacking a problem in a specific way – i.e., the more one explores and subdivides different causes – the more these difficult questions come up.
We have been rating each organization “relative to” the cause in which it seems to fit most intuitively. However, this is confusing for donors who don’t have strong cause-based preferences and take a broad view of charity as “helping people in general.” (Usually these are the donors who are a particularly good fit for what we provide.) Alternately, we could rate each organization using an “absolute” scale (taking the cause into account), but if we did this we’d rank even relatively mediocre international aid charities above the outstanding Nurse-Family Partnership, and that would create considerable confusion among people who didn’t agree with our (highly debatable) view on international vs. domestic aid.
In the end we don’t feel comfortable rating Nurse-Family Partnership higher than Small Enterprise Foundation … or lower … or the same. They’re too different; your decision on which to give to is going to come down to judgment calls and personal values.
It is possible for ratings systems to deal effectively with “apples and oranges” comparisons. Consumer websites (e.g., Amazon) provide ratings for products in many different categories; consumers generally seem to understand that the ratings capture something like “how the product performs relative to expectations,” and expect to supplement the ratings with their own thoughts about what sort of product and what features they want. However, in that domain we feel that consumers generally have a good sense of what different product categories and features consist of (for example, a shopper knows what to expect from a laser vs. inkjet printer, and doesn’t assume that this issue is captured in the rating). In the charity world, there is often just as little to go on regarding “what can be expected from an education charity?” as there is regarding “which education charity is best of the bunch?” So there is ambiguity regarding the extent to which a rating includes our view of the charity’s general cause.
While this problem isn’t a fatal one for charity ratings, it brings some complexity and confusion that is compounded by the issues below.
Problem 2: do ratings incorporate whether a group has room for more funding?
The question is how to rate an organization such as Aravind Eye Care System, AMK or (arguably) Nurse-Family Partnership – an organization that we largely think is doing excellent work, but has limited room for more funding. On one hand, we need donors to know that their money may be more needed/productive elsewhere; giving a top-notch organization a top-notch rating does not communicate this. On the other hand, if we were to lower Nurse-Family Partnership’s rating, that would imply to many that we do not have as high an opinion of their work, and might even result in reduced support from existing donors, something we definitely don’t want to see happen.
Then there are organizations which we do not investigate, even though they are promising and pass our initial heuristics, because it comes out early in the process that they have no room for more funding. We therefore have no view of these organizations’ work, one way or the other; we simply know that they are not a good fit for the donors using our recommendations.
The ambiguity here is regarding whether ratings represent our view of an organization’s work or our view of it as an opportunity for new (rather than existing) donors.
Problem 3: how should we rate the vast majority of charities that share no substantive information?
If a charity doesn’t collect, and share, substantive information on its effectiveness, there is no way of gauging its effectiveness. From what we’ve seen, the vast majority of charities do not both collect and share substantive information on their effectiveness. This gives us two unattractive options:
1. Give ratings only to charities that share enough information to make it possible to gauge their impact. If we did this, we would have a tiny set of rated charities, with all the rest (including some of the largest and least transparent charities such as UNICEF) marked as “Not rated.” Our lowest-rated charities would in fact be among the most transparent and accountable charities; we would effectively be punishing charities for sharing more information; people who wanted to know our view of UNICEF would wrongly conclude that we had none.
2. Give our lowest rating to any charity that shares no substantive information. This is the approach we have taken. This results in the vast majority of our ratings being “zero stars,” something that makes many donors and charities uncomfortable and leads to widespread confusion. Many people think that a “zero star” rating indicates that we have determined that a group is doing bad work, when in fact we simply don’t have the information to determine its effectiveness one way or the other. We have tried to reduce confusion by modeling our ratings on the zero-to-three-star Michelin ratings (where the default is zero stars and even a single star is a positive mark of distinction) rather than on the more common one-to-five-star system (where one star is a mark of shame), but confusion persists.
All of the above issues involve ambiguity in how our ratings should be read. Any of them might be resolvable, on its own, with some creative presentation. However, we have not been able to come up with any system for addressing all of these issues that remains simple and easily digestible, as ratings should be. So we prefer, at least for the time being, to sacrifice simplicity and digestibility in favor of clear communication. Rather than giving out star ratings, we will provide more complex and ambiguous bottom lines that link to our full reviews.
We understand that this is a debatable decision. We wish to identify outstanding charities and drive money to them; we wish to have a reputation for reliability and integrity among thoughtful donors; the goal of giving large numbers of people a bottom-line rating on the charity of their choice is less important to us. We know that other organizations may make the tradeoff differently and don’t feel it is wrong to do so.
Note that we haven’t finalized this decision. We welcome feedback on how to resolve the tensions above with a simple, easily understood ratings system.