Why Charity Ratings Don’t Work (as of now)

For a little over a year, GiveWell has assigned zero- to three-star ratings to all charities we’ve examined. We’ve done so in response to constant requests from our fans and followers. We’ve been told that people want easily digested, unambiguous “bottom line” information that can help them make a decision in a hurry and with a clean conscience. We understand this argument. But right now, we feel that the costs of ratings outweigh the benefits, and we’re likely on the brink of getting rid of our ratings.

To be clear, we are not going to stop comparing, evaluating, and recommending charities. As we did for our first couple of years of existence, we will rank and promote a number of recommended charities, while sharing the reasons why we do not recommend other charities. What we are going to stop doing is boiling down our view on each charity we examine into a single quantifiable data point. We’re going to go back to “bottom lines” that are qualified and sometimes difficult to interpret without reading further (for example, instead of “zero stars,” our bottom line will say something more like “Did not pass heuristics to qualify for further investigation”). We know we’ll be sacrificing the kind of simplicity that appeals to many, and we still think it’s worth it.

In trying to provide star ratings, we’ve run into fundamental questions that we don’t have good answers to:

  • Should we rate charities in an “absolute” sense (based on our confidence that they have positive impact) or in a “relative” sense (based on how they compare to other charities working on similar issues)?
  • How should we deal with charities that we feel do excellent work, but have limited or no room for more funding? Should we rate them above or below charities that do less excellent work but have more definite needs? Should our ratings reflect our opinion of organizations’ work or our opinion of whether undecided donors should give to them?
  • The vast majority of charities share no substantive information on their effectiveness, making it impossible to evaluate their effectiveness. Should such charities receive “no rating” (in which case we would rate very few charities, and may provide incentives for charities with low effectiveness to remain opaque) or our lowest rating (which creates considerable offense and confusion among those who feel we have judged their work ineffective)?

Each of these issues involves an ambiguity in what precisely star ratings mean, and we need ways to resolve the ambiguity in a very clear, easily digested, instantly understood way or we lose the benefit we were hoping to gain from handing out ratings in the first place. At this point we cannot construct a system that accomplishes this.

We believe that these issues are unavoidable when assessing charities based on their impact. We believe that nobody else has yet run into these problems because nobody else has yet tried to rate charities based on the case for their impact, i.e., their effects on the people and communities they serve.

Problem 1: are ratings “absolute” or “relative to a cause?”

How does Doctors Without Borders rate? The answer depends partly on whether you’re looking at it as a global health organization or as a disaster relief organization. Compared to other global health organizations, its transparency and documented effectiveness do not seem top-notch (though they are better than average). Compared to other disaster relief organizations (based on our preliminary and subject-to-change impressions), it stands out.

An organization may be top-notch compared to other water organizations, while mediocre in terms of proven health impact. Our view of a charter school organization depends on whether we’re comparing it to other education groups or to U.S. equality of opportunity organizations of all kinds. The more one tries to accommodate wishes like fighting a specific disease or attacking a problem in a specific way – i.e., the more one explores and subdivides different causes – the more these difficult questions come up.

We have been rating each organization “relative to” the cause in which it seems to fit most intuitively. However, this is confusing for donors who don’t have strong cause-based preferences and take a broad view of charity as “helping people in general.” (Usually these are the donors who are a particularly good fit for what we provide.) Alternately, we could rate each organization using an “absolute” scale (taking the cause into account), but if we did this we’d rank even relatively mediocre international aid charities above the outstanding Nurse-Family Partnership, and that would create considerable confusion among people who didn’t agree with our (highly debatable) view on international vs. domestic aid.

In the end we don’t feel comfortable rating Nurse-Family Partnership higher than Small Enterprise Foundation … or lower … or the same. They’re too different; your decision on which to give to is going to come down to judgment calls and personal values.

It is possible for ratings systems to deal effectively with “apples and oranges” comparisons. Consumer websites (e.g., Amazon) provide ratings for products in many different categories; consumers generally seem to understand that the ratings capture something like “how the product performs relative to expectations,” and expect to supplement the ratings with their own thoughts about what sort of product and what features they want. However, in this domain I feel that consumers generally have a good feel for what different product categories and features consist of (for example, I know what to expect from a laser vs. inkjet printer, and don’t assume that this issue is captured in the rating). In the charity world, there is often just as little to go on regarding “what can be expected from an education charity?” as there is regarding “which education charity is best of the bunch?” So there is ambiguity regarding the extent to which a rating includes our view of the charity’s general cause.

While this problem isn’t a fatal one for charity ratings, it brings some complexity and confusion that is compounded by the issues below.

Problem 2: do ratings incorporate whether a group has room for more funding?

We’ve argued before that the issue of room for more funding is drastically underappreciated and under-discussed, and it creates major challenges for a ratings system.

The question is how to rate an organization such as Aravind Eye Care System, AMK or (arguably) Nurse-Family Partnership – an organization that we largely think is doing excellent work, but has limited room for more funding. On one hand, we need donors to know that their money may be more needed/productive elsewhere; giving a top-notch organization a top-notch rating does not communicate this. On the other hand, if we were to lower Nurse-Family Partnership’s rating, that would imply to many that we do not have as high an opinion of their work, and may even result in reduced support from existing donors, something we definitely don’t want to see happening.

Then there are organizations which we do not investigate, even though they are promising and pass our initial heuristics, because it comes out early in the process that they have no room for more funding. We therefore have no view of these organizations’ work, one way or the other; we simply know that they are not a good fit for the donors using our recommendations.

The ambiguity here is regarding whether ratings represent our view of an organization’s work or our view of it as an opportunity for non-preexisting donors.

Problem 3: how should we rate the vast majority of charities that share no substantive information?

If a charity doesn’t collect, and share, substantive information on its effectiveness, there is no way of gauging its effectiveness. From what we’ve seen, the vast majority of charities do not both collect and share substantive information on their effectiveness. This gives us two unattractive options:

1. Give ratings only to charities that share enough information to make it possible to gauge their impact. If we did this, we would have a tiny set of rated charities, with all the rest (including some of the largest and least transparent charities such as UNICEF) marked as “Not rated.” Our lowest-rated charities would in fact be among the most transparent and accountable charities; we would effectively be punishing charities for sharing more information; people who wanted to know our view of UNICEF would wrongly conclude that we had none.

2. Give our lowest rating to any charity that shares no substantive information. This is the approach we have taken. This results in the vast majority of our ratings being “zero stars,” something that makes many donors and charities uncomfortable and leads to widespread confusion. Many people think that a “zero star” rating indicates that we have determined that a group is doing bad work, when in fact we simply don’t have the information to determine its effectiveness one way or the other. We have tried to reduce confusion by modeling our ratings on the zero-to-three-star Michelin ratings (where the default is zero stars and even a single star is a positive mark of distinction) rather than on the more common one-to-five-star system (where one star is a mark of shame), but confusion persists.
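The incentive logic behind option 2 can be sketched in code. This is a hypothetical toy scoring rule, not GiveWell's actual methodology; the function, the inputs, and the 0-3 scale are all illustrative assumptions. The key property is that any charity sharing substantive information, even information showing no effect, outranks one that shares nothing, so opacity can never be the winning strategy.

```python
# Toy sketch of the ordering rule described above (illustrative assumptions,
# not GiveWell's actual system): opacity gets the floor rating by construction,
# so disclosure is never penalized relative to sharing nothing.

def rating(shares_info: bool, evidence_of_impact: int) -> int:
    """Return a 0-3 star rating.

    evidence_of_impact is a stylized 0-3 score that only matters
    if the charity shares substantive information at all.
    """
    if not shares_info:
        return 0  # no substantive information -> lowest rating
    # Transparency alone earns at least one star; evidence can add more.
    return max(1, min(3, evidence_of_impact))

# Charity A shares nothing; Charity B shares data showing little impact so far.
a = rating(shares_info=False, evidence_of_impact=0)
b = rating(shares_info=True, evidence_of_impact=0)
assert b > a  # a transparent-but-unproven charity still outranks an opaque one
```

Under this rule a bad charity cannot improve its rating by withholding information, which is exactly the incentive property option 2 is meant to preserve.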

Bottom line

All of the above issues involve ambiguity in how our ratings should be read. Any of them might be resolvable, on its own, with some creative presentation. However, we have not been able to come up with any system for addressing all of these issues that remains simple and easily digestible, as ratings should be. So we prefer, at least for the time being, to sacrifice simplicity and digestibility in favor of clear communication. Rather than giving out star ratings, we will provide more complex and ambiguous bottom lines that link to our full reviews.

We understand that this is a debatable decision. We wish to identify outstanding charities and drive money to them; we wish to have a reputation for reliability and integrity among thoughtful donors; the goal of giving large numbers of people a bottom-line rating on the charity of their choice is less important to us. We know that other organizations may make the tradeoff differently and don’t feel it is wrong to do so.

Note that we haven’t finalized this decision. We welcome feedback on how to resolve the tensions above with a simple, easily understood ratings system.

Comments


  1. What about something along the lines of the investment question: buy, sell, hold? For philanthropy, it could be: give, don’t give, wait.

    The difference between don’t give and wait would be where you can draw distinctions that arise from Problems 2 and 3. “Wait” may mean, “This charity is good but has no room for funding.” “This charity doesn’t give us enough info” can be classified as either “don’t give” or “wait” according to the distinctions you parse out above.

    This doesn’t solve the problems that a simplistic model inherently has. But at the same time, choosing a charity boils down to a yes/no question. No matter how complex the reasoning, donors still have to decide to give or not.

  2. I think you’re right that the current system is too ambiguous, which means that there’s going to have to be a tradeoff of simplicity for specificity.

    I’d suggest using color (with 0 to 3 as a numeric backup). The idea is that a charity gets one or more lines, one per mission or way of looking at it, and three dots after it, representing transparency, need for funds and effectiveness. The reason for the order is that a lack of transparency would preclude a meaningful rating on the other two, while no need for funds would indicate that further evaluation wasn’t done.

  3. Holden – Thanks for raising this important issue.

    Like Brigid and John I think that it would be useful to have some sort of brief indication of what GiveWell’s attitude toward a given charity is. I like Brigid’s idea of labeling charities with words, but am not sure what the most suitable words should be.

    One relevant point in my mind is that donors who donate to charities other than GiveWell’s top recommendation in a given cause seem likely to be influenced by GiveWell’s short-hand star rating. In light of this, I think that it’s fine to use a relatively coarse short hand.

    How about rating the charities in a given cause by using the words “Recommended”, “Noteworthy” and “No Recommendation”? A “Recommended” or “Noteworthy” rating could be accompanied by a “but no room for more funding” qualification if appropriate.

    “Recommended” could be used to mean “for donors interested in this cause, this charity is as good as any that we know of.” “Noteworthy” could be used to mean “this charity is not among the most promising that we see in this cause but stands out based on our criteria.”

    I guess I would suggest collapsing the 1-star and 2-star rated charities into the hypothetical “Noteworthy” category – I don’t see any reason to distinguish between the charities with 1-star ratings and the charities with 2-star ratings on the level of the shorthand ratings.

  4. On second thought, regarding my comment

    “Recommended” could be used to mean “for donors interested in this cause, this charity is as good as any that we know of.” “Noteworthy” could be used to mean “this charity is not among the most promising that we see in this cause but stands out based on our criteria.”

    this only makes sense for causes in which there are charities which are strong on all of GiveWell’s criteria. For example, GiveWell hasn’t found any charities that are strong on all of GiveWell’s criteria in the cause of developing world education so based on what GiveWell says about Pratham I would label it as “Noteworthy” rather than “Recommended.”

  5. It’s funny, I ran into those exact same issues as a teacher. I hated giving out grades because they often didn’t accurately portray what was actually happening with the student.

    It’s good that you’re contemplating these issues. I struggle with the same issue of the nuances of the issues vs. having to give a quick and simple answer that is far less accurate.

    Good luck in trying to decide how to best handle the issue.

  6. I just stumbled on the following items of interest and wanted to make sure you are aware of them.

    How Donors Choose Charities:
    http://www.kent.ac.uk/sspssr/cphsj/documents/How%20Donors%20Choose%20Charities%2018%20June%202010.pdf

    Benefits and Beneficiaries: four inter-related investigations:
    http://www.kent.ac.uk/sspssr/cphsj/research/charity/benefits-beneficiaries.html

    Beth Breeze’s blog at Kent University:
    https://blogs.kent.ac.uk/philanthropy/

  7. Thanks for your conclusions and clarification! I will feel less conflicted now when I donate to organizations focusing on my preferred causes – education, of which you include just one US-based one in your top charities list. I would still, like the other respondents, like to see some type of “opinion labels” from you based on your research, as I will continue to follow your findings and rely on your collective skills to analyze and disseminate this important information.

  8. You’ve obviously been trying to resolve the rating problem: you’ve come up against many of the problems. The work of charities is complex and difficult to measure. Boiling a rating down to stars isn’t possible, and it is not something to which we should aspire.

    Charities are not toasters: You can’t give them stars based on a simple outcome. But Amazon goes beyond the star-system. Raters have a comment section, in which they can detail why they rated that toaster as 5-star or a zero. It’s in those details that the savvy shopper finds the information needed. “This toaster won’t take bagels” is of no importance to those who don’t eat bagels so that low rating can be kicked out. But if the comment says “Didn’t work,” that matters to everyone.

    This is true of all aspects of ranking, from service impact to overhead.

    Any rating system for charities needs to be nuanced, to allow explanations that require donors to do at least some thinking about what matters to them. After all, they do that much when they buy a toaster.

  9. Thanks for the comments.

    Brigid, we have talked about something similar to your idea. I think it would solve Problem 2. I don’t think it would solve Problem 3. If charity A shares no information and charity B shares information that demonstrates a lack of effectiveness to date, we feel the only reasonable move is to rate charity B higher than charity A. If we give out anything but our worst rating to opaque charities, then every bad charity can improve (or safeguard) its rating by making sure to disclose nothing.

    John Roth, I think that giving a charity multiple ratings would address Problem 1 but at too high a cost in terms of confusion. I think that your proposal for handling Problem 2 would have the problem that Nurse-Family Partnership would appear to be rated below weaker but bigger-funding-gap charities, and we want to express that its work is superior. Finally, I think Problem 3 would remain with your system since the vast majority of charities would receive none of the dots.

    Jonah, we are probably going to end up with something along the lines of what you’re saying – giving “badges” to remarkable charities but not having any kind of quantitative overall scale.

    Overall, I think we could address Problem 1 or Problem 2 if that were the only problem we had. But addressing all 3 at once while keeping the system simple is something we haven’t come up with a solution to.

  10. You definitely need some kind of simple approval system for people to review. When I’m using your site, I find I will use that as my first scan of which charities to review further and which to ignore.

    My first scan will leave in the net any two- or three-star organisations, but leave out one-star ones. So if a ‘recommended’ is going to equate to a three star, how am I to differentiate between a ‘maybe’ (which is how I read a two star) and a ‘no’ (one star)?

    I’d suggest that if you have concerns about the rating system, put up a page stating what they are whilst still leaving the system in place for those who see value in it.

  11. Jacinta,

    Reading over my first comment I noticed that I had made a nontrivial typo, the comment should have read

    One relevant point in my mind is that donors who donate to charities other than GiveWell’s top recommendation in a given cause seem likely to be influenced by factors other than GiveWell’s short-hand star rating. In light of this, I think that it’s fine to use a relatively coarse short hand.

    My thinking was that most of GiveWell’s users are either looking for a quick recommendation of the best charities in a given cause or that they’re inclined to deemphasize the shorthand ratings in favor of some combination of a closer look at GiveWell’s research process and/or outside information, and that for neither group does shorthand distinction between 2-star charities and 1-star charities play a crucial role. It sounds like you may be a counterexample – can you say more about how you use GiveWell’s research? Would your reservations about my proposal be addressed by the use of labels of the type that I describe in combination with a ranked list?

    There’s an inherent tradeoff between the simplicity of a rating system and the information conveyed. My own intuition is that at the level of shorthand ratings it’s best to err on the side of simplicity, but as the article (Generalizing From One Example) discusses, we have a tendency to unwarrantedly assume that our own preferences and tastes are representative. There’s great variability in individual preferences and tastes, and I’m interested in hearing more about other people’s.

  12. This might be more confusing than you would want to use, but I think you could use several different ratings (1-10 or stars could work). For example, you could have a rating of good done compared to similar-cause charities, a rating of actual good done (not compared to charities with the same cause), a rating of ability to use more funding, and then an overall rating (based on the expected amount of good done per dollar by future donations). Maybe this is too many ratings, but I would be for a change in the whole rating system…it just doesn’t seem reasonable with the current system; a charity that is educating people in the U.S. can get a higher rating than a good international charity that can help more people with less money.

  13. This is my first visit to givewell.org and although I didn’t get what I wanted you definitely piqued my interest. I will be back! Skip the first 2 paragraphs to get to my rating suggestions!

    What did I want? A top-rated UK-based children’s charity to donate to. I could not find it quickly enough but got sidetracked with the issues you have with delivering information quickly as ratings. I was impressed with how much thought, self-scrutiny, and honesty I perceived in my very short visit to your site, and although I still don’t have the answer I want, I would like to have my two pence worth on the rating debate.

    I will continue to give your rating issues some real thought over the next few weeks but here are my initial thoughts. I want to put my money where it makes the most difference but I don’t want to spend all day finding where that is. I might want it to be targeted to a need I can relate to like children or cancer. I find visual information is quicker to skim and I tend to dig in deeper where I feel the need. Maybe others are like me and maybe the following system would help.

    You have assessment criteria (which I have not read yet but will). It seems to me that the criteria could be narrowed down to a few bullet points (I think 5 is good). These could be represented visually as an icon and then coloured based on a percentage rating for that charity in that bullet point. For example I might represent “Transparency” as a pie chart. A small wedge of colour and a large wedge of grey would indicate that a charity does not do so well in this area. (Let’s say a “grey area” is a negative or questionable aspect on any of the 5 icons or bullet points, for consistency reasons.) This grey area on the pie icon could be because they lack substantive information on their effectiveness and/or because they have closed books. When I hover over the icon I get a brief generic tooltip describing the bullet point. If I click on the icon it links to either a more verbose generic description of how that area is rated or, better still, the parts of the actual full review that make up the bullet point.

    You guys are the experts, my comments are just to get the juices flowing; how the categories break down will make more sense to you. Hopefully your criteria categories already have “scores” of some variety and these can “roll up” into a wider category which can be visualised as an icon. The more fully the icon is coloured in, the better the charity met your criteria. Looking at “SharePoint dashboards” for a large enterprise might also give you ideas on how to roll large volumes of information up into quick visual ratings which can be drilled down into more depth if the user desires.
    The other 4 icons I might pick are…

    A Bell
    Symbolising alarm bells – this could visualise areas of perceived fundamental failings in the model the charity adopts – giving a goat for example. Again I can click on the visual icon rating to find out why it was good or poor if I want and it might link to your goat article and other related articles for that type of charity? The tool tip might say “A fully coloured bell could mean no alarm bells or a lack of information to raise any alarm bells, considering the pie chart icon or clicking through to the full review provides much better detail if this category is important to you.” In this way, looking at all the icons as a whole or combinations of icons can also quickly tell me where I might want to dig deeper. (hope that makes sense).

    A $ symbol
    This may visualise how badly they need funding and other related money issues such as “bang for buck” or the % of money that reaches the coal face. The tooltip might tell me “If you are giving to this charity and you are happy with their performance on the other icons, keep giving. If you are new to giving and looking for a charity, your money might be more productive elsewhere.” You might even offer “tags” which link to a higher-rated, similar charity when I delve in deeper by clicking the icon.

    A Hammer
    Impact based criteria could be visualised with a hammer.

    And a simple bar or fuel gauge.
    This one is a little different. This might summarise the categories the charity perceives itself to fall within (or tags). For example cancer, cancer research, children’s charity, global health, disaster relief, the list goes on. Givewell.org can assign as many categories to a charity as they want or even offer for the charity to nominate their own. The main Icon is an average of their ratings in all the categories but the user can click on this to link to an expanded view which might show how Doctors without borders compares against other charities that are “tagged” as global health and then also against charities “tagged” as disaster relief.

    I would also like to see searching developed. I would like to be able to search charities based on location (global, UK, US, Africa) or by type or tags (cancer, cancer research, children’s charity, global health, disaster relief)

    I am no IT guru but I believe all the above is possible, and if you decide how you want to visually summarise your criteria and link it to the detailed review, you could publish it as a project on vworker, guru, elance, odesk and ask IT gurus to donate their time instead of a goat!

    I apologise for the long dump and I hope I got the basic idea across. It is hard to describe a visual rating system in mere words. At the very least I hope it works as a catalyst for a visual rating system, because getting rid of ratings seems a bit like throwing the baby out with the bath water. Yes, the existing system has very fundamental issues, but the principle of quick visual information is crucial to the usability of your site in my opinion, and I firmly believe that “us users” can be trained how to quickly drill down through thousands of hours of your brilliant research to extract what means the most to us and hopefully become more charity savvy on the way. As an analogy – when you look at a profit and loss statement you might look at the bottom line and some key operating cost figures; when you see something unusual you might dig a little deeper and investigate the detail to see if it matters to you. Truth is, if you’re short on time (which we all are) you are not going to go through all the numbers line by line, in which case the person preparing the numbers will have wasted their efforts. I really like what you are doing and will be looking at the “getting involved” link and “unrestricted donation” links next. Good luck & feel free to reply if I have piqued your interest but not explained myself clearly enough.

    Phil