# Should charity evaluation be “fair”?

One goal we do not have is being “fair” to charities, in the sense expressed in this comment:

You are right on in your focus on provable results, but some areas are much easier to evaluate than others … A bad charity to me is one that is misleading, not transparent, or ineffective in comparison to its peers.

It seems common that people wish to be “fair” to charities, focusing on merit (how well they do what they do) rather than results. (Objections that our process favors large charities often come from a similar place.) How do people trade off merit vs. results in other domains?

• Investing: if company A looks more likely to be profitable than company B, most people will invest in company A. They won’t give a second’s thought to considerations like “But company B is in a more difficult sector.”
• Consuming: if you’re buying an iPad, I doubt you’ve worried much about how unfortunate Notion Ink is for not having the brand name, marketing budget and sheer size to compete for your dollars.
• Sports: here we take significant steps to “level the playing field.” We are very careful about which advantages (strength, speed) we allow and which (equipment, performance-enhancing drugs) we do not.

We feel strongly that charity evaluation should not be seen as a contest, bringing glory to meritorious charities and dishonor to scams. Instead, it should be seen as a pragmatic way to get money to where it can do as much good as possible. “Merit,” “A good attempt given the difficulty of the problem,” etc. should be left out of the picture.

That’s why we’ve always taken more care to eliminate “false positives” than “false negatives” in our recommendations. If there’s an ineffective charity we recommend, that’s a real problem. If there’s a good one that our process has failed to identify, that may be bad from a “fairness” perspective, but it’s not nearly so problematic for the goal of helping donors accomplish good. Evaluators that are more inclined to give charities the benefit of the doubt probably “wrong,” and anger, fewer charities – but they’re less effective in directing funding to where it will accomplish a great deal of good.

And that’s why we’re unapologetic about our bias toward charities working on tangible, measurable goals. Health is an “easier” sector than economic empowerment for clearly defining goals, tracking progress toward them, learning from mistakes, and ultimately making a positive difference. That’s a reason to prefer health charities, not a reason to “handicap” them. (Note that we do think there can be good reasons to give to “less measurable” causes, including philosophical preferences; more on this in a future post.)

When we spend money on ourselves (investing or consuming), we think exclusively about meeting our own needs and wants. It’s only fair that when we spend money on others (charity), we should think exclusively about meeting theirs.

• Laura Deaton on April 14, 2010 at 10:09 am said:

Excerpt from above: “If there’s an ineffective charity we recommend, that’s a real problem.” I completely agree, and your current methodology leaves you wide open for that happening on a regular basis.

Please review your assessment of Partners in Health, which you give 2/3 stars and recommend funding. Here’s what you say directly above the “Donate Now” button:

“We have little formal evidence regarding the quality and outcomes of PIH’s medical care, but feel it faces a lower burden of proof than most charities because of the nature of its activities, and are largely convinced that it is improving health outcomes. Its cost-effectiveness is somewhat unclear; we do not have reason to believe that PIH’s activities are as cost-effective as those of the strongest charities, but still likely within a reasonable range for the cause of international aid.”

In the details below, you say:

“We have little formal evidence regarding the quality of PIH’s care or the outcomes of its treatments.”

So, since you have no formal evidence of results, you then use the absence of negative data as a justification for recommending PIH:

“Ultimately, despite the absence of formal evaluations, we feel that PIH would be unable to maintain its high profile if it were not providing quality medical care, and that providing medical care – in this case – can reasonably be equated to changing lives.”

Finally, you actually mention some real concerns:

“PIH does include some programming that we are significantly less confident in, including HIV prevention education, housing support, coverage of clients’ school fees, and even microfinance projects.”

So: no evidence of effectiveness, no proof of high-quality care except “high profile,” and some concerns about its other programs. That then converts to “recommended”… all from looking at data on a tax return.

No wonder “fairness” continues to be a concern. I’d add “reasonable,” “rational,” and “logical” as concerns too. And I’d be concerned that it’s entirely possible that you do have a real problem and that you may be recommending ineffective charities.

• Suzy Meneguzzo on April 14, 2010 at 10:52 am said:

I find this entire line of reasoning troubling. In the example of investing, isn’t it the mindset of straight profit returns that got us into our current economic situation?

Nonprofit evaluation, like investments, selection of a favorite sports team and even buying the latest tech gadget, is far more complicated than your current methods recognize.

“…it should be seen as a pragmatic way to get money to where it can do as much good as possible.” If I allow you to make these assessments on my behalf, that assumes that you and I have a common understanding of “good”; we do not. You’ve shown the very reason here:

“…we’re unapologetic about our bias toward charities working on tangible, measurable goals. Health is an ‘easier’ sector than economic empowerment for clearly defining goals, tracking progress toward them, learning from mistakes, and ultimately making a positive difference.”

Your approach is, by your own words, “easier,” and it ignores the many ways that I define good. And indeed many things that are good are very difficult to track progress towards AND make a positive difference.

Nice try, but many good things are neither easy nor direct, yet they are important! I’ll get my information elsewhere.

• Holden on April 14, 2010 at 6:20 pm said:

Suzy: agreed that many things worth doing aren’t easy. But if two goals have comparable value and one is substantially easier to achieve, would you not prioritize the easier goal?

Laura, our evaluation of Partners in Health uses a substantial amount of information beyond tax documents, including:

• Information on budget allocation by program and activity. Among other things, this information implies that the activities we have less confidence in are a small part of the budget.
• A detailed report on the Rwanda program (that we haven’t been cleared to publish but quote extensively) with limited information on treatments and outcomes and detailed information on operations.

As we state in the review, a key aspect of our recommendation is that we feel Partners in Health’s activities – directly delivering proven medical treatments – face a lower burden of proof than most international aid activities (and we feel they meet this burden).

Standing on ceremony about “formal evidence,” despite seeing a strong case that a charity is effective, would be putting “fairness” over the pragmatic goal of getting money to great charities. We are comfortable with our two-star rating of Partners in Health (indicating a stronger case for confidence than the vast majority of charities, while still falling short of our top rating).

• Perla Ni on April 14, 2010 at 7:25 pm said:

well said – “We feel strongly that charity evaluation should not be seen as a contest, bringing glory to meritorious charities and dishonor to scams. Instead, it should be seen as a pragmatic way to get money to where it can do as much good as possible.”

In this time of all these online contests for nonprofits – this is refreshing.

Perla

• Laura Deaton on April 15, 2010 at 4:17 pm said:

Thanks for your response, Holden, and for your willingness to continue the dialogue. Partners in Health was intended to be an example of what I see as a problem with your approach, but I’ll continue the example as a useful illustration of my concerns.

Under your criteria for ranking a charity, the very first criterion is whether there is evidence that the charity’s programs are effective. For Partners in Health, you say “We have little formal evidence regarding the quality of PIH’s care or the outcomes of its treatments.” I’m aware that you go on to say that you “feel” that it requires a lower burden of proof than other organizations, but you still acknowledge that you have little-to-no concrete (“limited, small-sample-size”) information that it is providing quality care or achieving strong treatment outcomes. Then you list several PIH programs that you are “significantly less confident in, including HIV prevention education, housing support, coverage of clients’ school fees, and even microfinance projects.” So for your first criterion, you have an absence of data, a presumption of outcomes, a reliance on your “feelings,” and concerns about several other programs that make up a smaller part of PIH’s budget. That leads you to give it a “moderate” score on that criterion. I’m fairly certain that I wouldn’t have come to the same conclusion about “evidence of effectiveness” based on the data at hand. You may (rightfully) point out that I can’t get my hands on the Rwanda report, so I acknowledge that I have less information at my fingertips, but you did disclose that it is primarily operational and doesn’t contain much helpful or guiding information on effectiveness. From my perspective, you simply do not have enough quality information to rank this organization effectively on this criterion.

The next criterion on your list is cost-effectiveness, and I appreciate your frankness about the difficulty of including cost-effectiveness in your criteria, both because of misleading or overly optimistic information from charities and because it is often an apples-to-oranges comparison. For Partners in Health, you acknowledge that a portion of its programming includes cost-effective treatments like tuberculosis treatment, but that it also provides services that are known not to be cost-effective, and that you do not have enough information to really do more than “guess that they are outside, though not necessarily far outside,” a reasonable range of cost-effectiveness. So, for your second criterion, you once again have an absence of data, an “extremely rough estimate” of spending per death averted, and a “guess” on reasonableness. As with program effectiveness, PIH gets a “moderate” on this criterion. Again, I know that we’re all entitled to our own opinions and gut instincts, but I’d end up in a different place again, even if I likely agreed with you on your guesses and estimates. Once again, this would be a big question mark for me, with a “need to know more,” if I were attempting to provide high-quality information to donors to help them make well-informed giving decisions.

Until I know the answer to the question I just posed, I don’t really know enough about transparency to understand whether I would end up giving Partners in Health an “above-average” rating for transparency as you have. Can you tell me what led to that rating? I didn’t see any information in the review itself that had to do with “transparency,” and there wasn’t an extended discussion about it as a basic criterion (unlike the other three).

Likewise, I see that Partners in Health was given an “above-average” rating for “monitoring and evaluation.” Can you tell me why that would be the case given the lack of information about medical outcomes, and what led you to that ranking? I didn’t see a reference to it in the report.

One final and more general question: Since Partners in Health is at best “moderate”, and only a “two-star” organization, are you actively looking for international health organizations who rank excellent on these criteria so that you can “bump” PIH out of #7 on the top-rated list and put other, higher ranking charities in that category ahead of them?

I’m legitimately interested in the answers to my questions, above. I’m also deeply concerned about the quality of your reports, the depth of your analysis, and the reasonableness of your conclusions, and PIH is just one example. While you acknowledge that donors have every right to draw their own conclusions that are different from your “gut” responses, I do believe that anyone who puts themselves forward as charity rankers or raters has the minimum obligation to do it with integrity and to base their conclusions and recommendations on much more than lack of information, “gut responses”, “guesses”, and “estimates”. Even if you don’t want to be “fair,” one of your key differentiators from other raters is to “focus on how well programs actually work – i.e., their effects on the people they serve.” You and your funders (like the William and Flora Hewlett Foundation) own the responsibility to provide donors with more than conjecture.

• Holden on April 16, 2010 at 2:30 pm said:

Laura, thanks for continuing to engage. I’ll respond point by point.

Re: evidence of effectiveness.

• PIH focuses on delivering medical treatments that have themselves been subject to highly rigorous studies and for which very strong evidence of impact exists. The general evidence base for a program is a significant part of the “evidence of impact” picture; simply by choosing to provide antiretroviral therapy and malaria medicine, PIH has gained a significant edge in “evidence of effectiveness” over an organization funding (for example) agriculture development programs.
• I think a decent definition of “evidence of effectiveness” is “observed phenomena that would be unlikely if programs were failing.” PIH’s combination of a high profile, significant attention from parties who are knowledgeable and not overly invested in the organization, and few public complaints about quality of care is, to my mind, “evidence of effectiveness” for quality of care. It is not formal and it is not as strong as that of our top-rated charities, but it is evidence.

These are reasons we consider the evidence of effectiveness to be moderate rather than poor (for most charities it is the latter).

Re: cost-effectiveness. All of PIH’s interventions are estimated within an order of magnitude of what we consider “highly cost-effective.” So I think a “moderate” rating is reasonable. Most charities’ work is both more difficult to assess and could easily fare far worse on this metric.

Re: room for more funding. We have asked PIH for information along these lines. Initially PIH declined to submit it (citing the extra work it would take to produce); they later told us they might be able to get us more information; but now our talks with them are on hold (we have held off on nagging them since the Haiti disaster). That brings me to the next point:

Re: transparency. We are highly unsatisfied with PIH’s information disclosure. Of our recommended charities, it has been one of the least responsive to us and probably has the least helpful website. However, its clarity of funding by program (publicly disclosed on the website) and willingness to share the Rwanda document (even internally) make it easily above average.

Re: monitoring and evaluation. The Rwanda document does not address outcomes, but is still a well-done piece of monitoring and evaluation covering key questions about the population served, costs, and operations.

Since Partners in Health is at best “moderate”, and only a “two-star” organization, are you actively looking for international health organizations who rank excellent on these criteria so that you can “bump” PIH out of #7 on the top-rated list and put other, higher ranking charities in that category ahead of them?

We are always looking for more excellent charities. We are always open to “raising the bar” and knocking existing recommended charities off the list.

Bottom line and what should be expected of GiveWell.

We have plenty of reservations about PIH, but we also feel that it performs better, by our criteria, than the vast majority of the many charities we’ve examined.

You say:

anyone who puts themselves forward as charity rankers or raters has the minimum obligation to do it with integrity and to base their conclusions and recommendations on much more than lack of information, “gut responses”, “guesses”, and “estimates”. Even if you don’t want to be “fair,” one of your key differentiators from other raters is to “focus on how well programs actually work – i.e., their effects on the people they serve.” You and your funders (like the William and Flora Hewlett Foundation) own the responsibility to provide donors with more than conjecture.

I agree with some but not all of this.

I agree that integrity is important. I.e., we should apply similar degrees of skepticism to all the charities we rate and should avoid basing our ratings on irrelevant factors such as personal relationships, friendliness to GiveWell, etc.

I agree that we should focus on “how well programs actually work – i.e., their effects on the people they serve.”

Where I disagree is with the idea that it is inappropriate to use guesses, estimates, and conjecture. I do not believe it is possible to make a statement about a charity’s impact that does not incorporate all 3 of these things.

As discussed recently, even the Disease Control Priorities Report uses heavy amounts of guesses, estimates and conjecture in assigning cost-effectiveness to some of the world’s best-studied humanitarian interventions. When evaluating actual charities, the lack of information becomes even more problematic. We may wish it were otherwise, but that’s a fact of charity that any donor has to deal with.

In fact, we consider our lack of pretense to objectivity to be one of our major differentiating factors. If you insist only on systematic, objective measures, you will not get meaningful ones. Only by being willing to make judgment calls can one incorporate the issues that are actually relevant.

Rather than pursue systematicity, certainty and objectivity, we pursue a level of transparency that allows donors to see which judgment calls we’re making and decide for themselves how they feel about them. (We have been on record with this view since 2007.)

Many who have read our research very thoroughly and carefully have concluded that they are comfortable following our recommendations. You have concluded that you aren’t. This divergence is exactly what I would expect for a high-quality, but also transparent and clear, evaluation in an area as uncertainty-filled as charitable giving.

I don’t believe in holding charities or charity evaluators to abstract standards; instead, they should be compared to alternatives. With that in mind, I’d ask:

• Do you believe there is a way to arrive at good recommendations for individual donors without incorporating guesses, estimates and judgment calls?
• Do you believe there are charities not on our top-rated list that outperform PIH on our criteria?

• Laura Deaton on April 17, 2010 at 11:45 am said:

On PIH and Evidence of Impact
• I agree that it is possible to impute some evidence of impact for organizations that choose to provide anti-retroviral therapy and malaria medicine. This assumption would be true for all organizations that provide these services, not just PIH. It is why I would hope that GiveWell is affirmatively seeking “3-Star” organizations to recommend who do this AND that can demonstrate high levels of evidence of effectiveness.
• I disagree that a reasonable definition of “evidence of effectiveness” is “observed phenomena that would be unlikely if programs were failing.” From my perspective, a much more reasonable, and helpful definition is “observed indicators of successful program outcomes.” Evidence then becomes defined based on “success factors” instead of indicators of “lack of failure.” As a donor, I simply cannot make an informed decision based on evidence of a lack of failure. That’s all that I see presented for PIH. Evidence of successful program outcomes would substantially increase my confidence in your research and your recommendations.
• I disagree that a charity’s high profile and lack of direct customer complaints are success factors. High visibility simply does not equal effectiveness. More often than not, it is instead an indicator of a finely-tuned marketing and communications machine, and may also include some celebrity or expert endorsements. Lack of public complaints does not equal quality of care, particularly when those receiving services are living in poverty, without access to appropriate means for indicating poor service, and without substantive alternatives.

On PIH and Cost-Effectiveness
• Donors (and most of the rest of the universe) don’t think in logarithms. We generally process information linearly. I understand when I delve into your criteria that those who are rated as “excellent” in this case have a cost-per-death-averted (CPDA) of $1,000 or less. And I can personally see that PIH’s CPDA of $3,500 is likely to be at least 250% higher than that of programs that rate “excellent,” but you don’t provide that information in a way that donors can see and easily understand when reviewing individual data. Why not actually put that information directly into your reports? It would increase transparency, provide a stronger framework for donor evaluation, and provide a point of comparison that would be much more valuable than “moderate” and “inside an order of magnitude.”
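A rough worked version of this comparison, using only the two figures quoted in the comment (an “excellent” threshold of $1,000 per death averted and PIH’s estimate of about $3,500), shows both framings side by side. On a linear scale PIH’s figure is 3.5 times the threshold, or 250% higher; on a logarithmic scale it still falls within one order of magnitude of it. Because “excellent” charities come in at $1,000 or less, 250% is a lower bound on the gap, and both inputs carry all the roughness of the underlying estimates.

```latex
\frac{\$3{,}500}{\$1{,}000} = 3.5
\quad\Longrightarrow\quad
\frac{\$3{,}500 - \$1{,}000}{\$1{,}000} = 2.5 = 250\%\ \text{higher},
\qquad
\$3{,}500 < 10 \times \$1{,}000 = \$10{,}000 .
```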

On PIH and Funding Gap Analysis and Transparency
• The documentation of your request for funding gap information and PIH’s responses should be included in your report, as should a section on transparency, since it is one of your criteria. By just reading the report, I would have no way to know that GiveWell had asked for funding information, that they had initially declined, and that now talks are on hold. I also wouldn’t know that GiveWell is “highly unsatisfied with PIH’s information disclosure” or that of GiveWell’s recommended charities, “PIH has been one of the least responsive…” I’m sure you’ll agree that this information would provide value to potential donors, particularly when the charity is on a top-charity “recommended” list. I was already concerned about the lack of information that was being used to recommend PIH, and I’m now even more concerned based on this information.
• I’d like to see you further define and refine “transparency” as a criterion since it has been done for all of the other criteria.

PIH and Monitoring/Evaluation
• I can’t speak directly to the Rwanda report since I don’t have access to it, so I’ll just reinforce that you have very limited content in your report that addresses this explicitly, and you don’t reference in your report that you have ranked them as “above-average.” Your matrix of ratings for top charities should not include information that isn’t discussed in the “full” report (same is true of transparency for PIH).

Holden, I’m glad to hear that GiveWell is always looking for more excellent charities so that an organization such as PIH that only rates “above-average” and “moderate” won’t eventually show up as the #5 “top-rated international aid charity” and #7 on GiveWell’s “top-rated charities overall.” In the meantime, let me suggest, since fairness isn’t a concern for you, that you consider the impact that the word “top” has on donors, and that your recommendations might be taken far more seriously if “top” actually equaled “excellent.” Your list would be shorter, but it would only highlight the “best” charities out there, and it would be a much more accurate representation of your logarithmic lens. It would also keep an organization that hasn’t been responsive, and that has left you highly unsatisfied with its information disclosure, off that “top” list, especially when you haven’t shared that information with donors. And it would keep you from having to bump organizations off the list over time, because the above-averages wouldn’t have been on that list to start with. Instead, as you discover more excellent charities, your list will grow and become a real “top” list from which donors can choose.

(1) I do not believe that there is a way to arrive at good recommendations for donors without incorporating some guesses, estimates, and judgment calls. It is why I find rating and ranking systems to be generally harmful to the sector as a whole, and unhelpful to me as a donor. I’m totally fine with people making subjective calls about their favorite charities, but when they do so under the rubric of a nonprofit organization that has some expertise to bring to the table, then I grow concerned, especially when I see very little expertise, very little high quality data, and limited qualitative analysis to accompany the quantitative. I believe that you and I share a belief that systems that are predicated exclusively on data (990 analysis, ratios, etc) are not effective. Where we likely disagree is that I find your approach and the current quality of your reports to be an unsatisfying alternative that may mislead donors. What we don’t need is another pitifully inaccurate 3rd party rating system that doesn’t give donors high quality information and empower them to make stronger decisions. Framing the “top” or the “best” charitable organizations in the context of a handful of organizations that someone has decided to research and give a “gold star” is misleading to donors at best (by its omissions) and does a disservice to the sector as a whole, at worst.

• David Barry on April 17, 2010 at 8:38 pm said:

that you consider the impact that the word “top” has on donors, and that your recommendations might be taken far more seriously if “top” actually equaled “excellent.”

Holden can of course answer for himself, but I would like to respond to this point from the perspective of another donor. I agree with a lot of your criticisms about the ranking of PIH, but I don’t think that the word “top” as applied to PIH is diverting many donations towards it.

The relevant table was shown in this blog post. Adding up the first four columns (pledges, large gifts, donations through website) gives a total of $480k for the top four charities (VillageReach, Stop TB, AMF, PSI) and only $10k for PIH.

My speculation is that there are two main reasons for this – one is that a lot of donors would just give to the highest-rated. The other is that a lot of donors will carefully read the reports, conclude that the other “top” charities are better than PIH, and therefore not donate to PIH.

I agree that the PIH report could do with some of the extra information raised in this comments thread. It is probable then that some of the other reports could also be improved. But I think that there’s enough information in them at present to be considered something “of reasonable value to donors”.

• Laura Deaton on April 18, 2010 at 10:45 am said:

David – Thanks so much for adding your thoughts here. It’s good to know that there is someone else who has concerns about PIH’s ranking, and who can see opportunities for improvement in the reports.

You’re right about the grid. The one saving grace is that PIH’s “top” listing isn’t really funneling much money their way, although when you look practically at ROI, that’s the equivalent of at least 10 deaths that could have been averted by another organization compared to the less than 3 for PIH if those estimates can be relied on in any way (still questionable at best on the extreme-roughness of the estimate).
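The arithmetic behind these figures can be sketched as follows, assuming, as the comment does, roughly $1,000 per death averted for the “excellent” charities and roughly $3,500 for PIH, and applying both rates to the roughly $10k directed to PIH. The function name is purely illustrative, treating cost per death averted as a flat rate is a simplification, and the output inherits all the roughness of the underlying estimates.

```python
def deaths_averted(dollars: float, cost_per_death_averted: float) -> float:
    """Rough deaths averted, treating cost per death averted as a flat rate.

    Hypothetical helper for illustration only; real cost-effectiveness
    estimates are far less tidy than a single division.
    """
    if cost_per_death_averted <= 0:
        raise ValueError("cost_per_death_averted must be positive")
    return dollars / cost_per_death_averted


if __name__ == "__main__":
    donated = 10_000  # approximate donations to PIH cited in the thread
    # Cost-per-death-averted figures quoted in the discussion (both rough).
    excellent = deaths_averted(donated, 1_000)  # "excellent" threshold
    pih = deaths_averted(donated, 3_500)  # PIH's rough estimate
    print(f"at $1,000 per death averted: {excellent:.1f}")  # 10.0
    print(f"at $3,500 per death averted: {pih:.2f}")  # 2.86
```

Since “excellent” charities come in at $1,000 or less per death averted, 10 is a floor for them, while PIH’s figure stays just under 3.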

And, you’re right. We don’t know why money isn’t going to PIH.

You’ve mentioned that it could be:

(1) Many donors would just give to the highest rated. If you’re correct, then there’s really no need for a “Top” charities list at all. GiveWell could just list “Our Pick” in each category. Actually, I’d be much more comfortable with that approach instead of “Top,” because “Our Pick” doesn’t imply “best” or “better” than others; it just says “we recommend it,” which in this case is a much closer match to what GiveWell is doing. My concern with the word “Top” is that it implies that someone (or more than one person/entity) with some knowledge and expertise has looked at all of the options, created a system that appropriately measures organizations on a comparative scale, and then chosen the best of the best to recommend. That’s not what’s happened here. GiveWell has evaluated a small set of organizations and has simply put the highest ranking of those that have been evaluated on a recommended list and labeled them “Top.” What the “lazy” donor likely infers is that GiveWell has carefully evaluated the full playing field and these are the best. Also not true here. This is much more akin to a “King of the Mountain” game, where the best that have been evaluated make it to the top and those that haven’t been evaluated… well, too bad. Putting an “average” or “moderate” charity on this list above charities that haven’t even been evaluated and calling them “top” is misleading. Based on what we know (or in this case don’t know) about PIH, it is troublesome to have them listed as one of the Top 5 International Aid charities and one of the Top 7 charities period, and since fairness isn’t a concern, I can see no value to be gained by doing so. Again, PIH is just an example to discuss the problem with the model as a whole; it’s simply illustrative. Holden’s already said that he’d be happy to knock them off the mountain if a better charity came around. Why not knock them off the mountain now, and just reserve the mountain for those whom GiveWell can confidently recommend as excellent and who have earned the precious “3-star” rating? That provides a much better list for donors.

(2) Many donors could carefully read the reports, conclude that the other charities are better poised to deliver effective services, and decide not to give to PIH. I hope that this is happening more frequently than the first option (although, given the quality of the reports presented here, I must admit that I still hope donors are going elsewhere for information and not relying on any of this). But if this is really what IS happening, then it tells me that donors have already disagreed with GiveWell’s analysis and decided that PIH simply isn’t really “Top.” That’s another reason to get them off that list. GiveWell can still provide its report, still rank them as 2 stars, and still give donors the information in the report without listing them as a Top 5 or Top 7 organization. Again, this isn’t about PIH itself. It is merely an example of the methodological problem with all similarly-ranked organizations.

Finally, here’s a scary and troublesome real-life example of what happens when raters and rankers mislead donors with “Top” lists. At the end of March, Guidestar announced that it would be including information from several raters, including GiveWell, on its site. If you go to Guidestar’s TakeAction:Global Health page and click on the tab for Recommended Charities (sorry, it’s an asp website so there was no direct link), you’ll see that there are only 6 charities listed with GiveWell as a source. Because they are listed alphabetically, PIH shows up as NUMBER 3, ABOVE VillageReach (GiveWell’s top-rated charity), the Stop Tuberculosis Partnership (GiveWell’s #2 charity) and Population Services International (GiveWell’s #4 charity). So now Guidestar, the premier independent resource for high-quality charity information, is listing and recommending PIH in 3rd position on a list of 6 recommended global health organizations. Sure, they say that these were recommended by GiveWell, but there aren’t any recommendations in this category from Great Nonprofits or Philanthropedia, so it’s just GiveWell’s “Top” list in the wrong order. It’s flat-out misleading to donors and has the potential to redirect significant funding (if you’re right about your first option of donors just choosing the Top organizations) to an organization that we don’t even have any positive program outcome information about. Given the lack of information about outcomes contained in the report, and Holden’s disclosures and expressed concerns in this blog thread about PIH’s transparency, I’m simply not comfortable with any of this as an outcome. Are you?

If GiveWell has failed to verify the information and the way that it is being presented on GuideStar as part of this partnership, I’d say that this is troubling enough that it’s probably time to add something new to that “Mistakes” list, particularly in light of the detailed discussion above. I’d also add to that list the omission of funding gap and transparency information from PIH’s report, and the inclusion of any “non-excellent” charities on a “Top” list. So that’s just one quick look at one report, and a bunch of problems as a result. Think about what we’d find if we actually scrutinized every report and asked more questions. It’s why the very first sentence of my very first post said that I believe GiveWell’s methodology leaves them wide open to recommending an ineffective charity.

• Holden on April 19, 2010 at 8:01 pm said:

Further responses to Laura:

Re: “evidence of effectiveness.” I think we are miscommunicating about my definition. I do not mean “lack of evidence for failure,” but “phenomena that are in fact observed, yet would be unlikely to be observed in the case of failure.”

For example, if World Vision were failing, I don’t feel that this would make it much less likely for it to be successfully raising money or having a strong brand (as you point out). Thus, the presence of a strong brand and fundraising are not “evidence of effectiveness” by my definition. By contrast, if Partners in Health were failing, I would expect to have seen complaints from the relatively uninvested, medically knowledgeable people who visit it.

My definition maps pretty well to standard statistical analysis, in which a p-value represents the probability of observing a pattern at least as extreme as the one actually seen if the “null hypothesis” (usually something like “no impact”) were true. This is distinct from the probability that the null hypothesis is true, and also distinct from “lack of evidence that impact isn’t happening.”
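The p-value idea can be made concrete with a small permutation test. This is a sketch under stated assumptions, not GiveWell’s method: `permutation_p_value` is a hypothetical helper, any outcome data passed to it would be illustrative, and the test is one-sided (it asks only whether treated outcomes are higher). It simulates a world where the program has no impact by shuffling the treated/control labels, then counts how often that world produces a difference at least as large as the one observed.

```python
import random
from statistics import fmean


def permutation_p_value(treated, control, n_resamples=10_000, seed=0):
    """One-sided permutation test: a sketch, not GiveWell's method.

    If the program has no impact (the null hypothesis), the treated/control
    labels are arbitrary. We shuffle them and count how often a random
    relabeling gives a mean difference at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    observed = fmean(treated) - fmean(control)
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = fmean(pooled[:n_treated]) - fmean(pooled[n_treated:])
        if diff >= observed:
            at_least_as_extreme += 1
    # This estimates P(pattern this extreme | no impact),
    # not P(no impact | pattern).
    return at_least_as_extreme / n_resamples
```

For example, if every treated and control outcome is identical, every relabeling ties the observed difference and the function returns 1.0; if the treated group is uniformly better, almost no relabeling matches it and the result is close to 0. In neither case does the number tell you the probability that the program is failing.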

Re: cost-effectiveness, funding gap, transparency, M&E

It’s true that we haven’t fully synchronized the reviews with the table at http://www.givewell.org/charities/top-charities. Making sure each review explicitly addresses every column in the table is a good addition for us, and I appreciate your raising the concern.

Clarification on our process, and what constitutes a “top” charity

The “top” charities are not the complete set of charities we’ve examined. They are the charities that stand out, by our criteria, from the hundreds of charities we’ve examined. We have examined every relevant/eligible charity we’ve found (narrowing the field using heuristics). More at our page specifying the process and criteria we used to identify top international charities.

In our view,

• PIH shares more substantive information on its website than the average charity.
• PIH is focused far more on proven programs than the vast majority of charities.
• PIH’s cost-effectiveness appears solidly worse than that of the best charities we’ve found, but nowhere near as far below theirs as an ineffective program’s would be.

While we haven’t examined every charity in existence, we have examined every relevant/eligible charity we’ve found and we’ve made a significant effort to find relevant/eligible charities.

Outside the realm of formal competitions (which I contrasted with charity evaluation in the post above), I think you’d be hard-pressed to find any “top” list that doesn’t have a similar dynamic. Whether we’re talking about the “most beautiful people” or “top microwaves”, “top” means not “We have definitively established that we have examined every potential contender in existence and provide you with a list of products about which we have no concerns and that cannot be improved on,” but “We have followed a process, involving heuristics and judgment calls, to arrive at our best guess as to which items stand out from the crowd.”

In almost all of these cases there is a lot of subjectivity, many debatable claims and much missing information. I think people expect that from any “top” list.

In some cases lower-ranked items on the list are clearly inferior to higher-ranked items, but this doesn’t stop them from being classified as “top” (any more than the existence of LeBron James disqualifies Paul Pierce as a “top basketball player”).

One more note on this point. You say: “Holden’s already said that he’d be happy to knock them off the mountain if a better charity came around. Why not knock them off the mountain now, and just reserve the mountain for those who GiveWell can confidently recommend as excellent and who have earned the precious “3-star” rating?” I’d also be happy to knock our three-star charities off the list if we found a significant number of stronger charities. We have been open that we have significant concerns about these charities as well. None of this seems to invalidate the listing and recommending of charities or use of the word “top” as we do.

Re: GuideStar. We noticed the same problem and emailed them about it on March 10, though we haven’t pushed aggressively on it in phone conversations (focusing on other important fixes). We will continue to push on this. TakeAction is a beta and there are many things we’re still working on.

Re: misleading donors. As argued above, I think the language we use is appropriate for what we are doing and is likely to be interpreted accurately.

I believe the donors who are coming to GiveWell and TakeAction are likely extremely low on info. I believe it is important and valuable to provide them with recommendations that are likely based on much more research than they have the ability to do.

Any time public information is published, there is a risk of misinterpretation. There is a risk that donors will interpret our recommendations as so scientific and certain that they should outweigh the donors’ own knowledge. However, (a) I would guess that this is a pretty unusual occurrence, since donors who already have lots of information, strong commitments to an organization and strong opinions in general are unlikely to put much trust in the contradicting views of strangers; (b) I don’t see anything we’re doing to make this occurrence unreasonably likely.

On the flip side, I think that underselling our recommendations would be as harmful as overselling them. Your comment seems concerned that people might give to one charity and save a few fewer lives than they could by giving to another charity (while acknowledging the difficulty of knowing when this is the case), so I imagine that you are on the same page with us here: when we have one charity that stands out from others, and there are some donors who (for philosophical reasons) would prefer funding its activities to funding our top charity’s activities, not recommending and promoting said charity could do harm.

And again, I don’t see the presence of subjectivity, missing info, etc. as undermining the case for these recommendations. The quality of a movie is subjective, but people still find movie critics useful and few argue that these critics are hurting the movie industry by creating an impression of infallibility.

Re: mistakes log. My feeling is that none of the things you’re pointing to as “mistakes” rise to the level of being on that page.

1. TakeAction is clearly marked as a beta. We’ve expected from the beginning to have issues that need fixing. I don’t think we have done anything mistaken here.
2. We stand by PIH’s inclusion on our “top charities” list, for reasons discussed above.
3. Your suggestion to explicitly address each item on our table is a good one, but you haven’t pointed to anything inaccurate in the reviews, just ideas for making them clearer. We expect our reports to undergo continuous improvement.

Bottom line

• I think there is plenty of room for improvement on clarity of communication. I think you have raised some good points on this front.
• I stand by our use of the term “top charities” and our current rating and ranking of PIH.
• I do not think you have made a compelling argument for the idea that the presence of GiveWell specifically or independent raters generally is doing damage.

• Holden on April 19, 2010 at 8:04 pm said:

Laura, another thought. I think the analogy to movie/restaurant reviews is a very helpful one; in some ways we aspire to be to charities what a movie reviewer is to movies.

• Movie reviewers share both quantified, judgmental “ratings” and (some of) the reasoning behind their ratings.
• Reviews involve substantial judgment calls.
• Some people read the reviews to get a sense of the reviewers’ values, biases, etc. and decide for themselves how much to weigh them. Some people just look at the ratings.
• Movie reviewers will sometimes put out an “X best movies of the year” list, even though they haven’t seen all (or even a substantial portion) of the movies made in a year.
• Large information providers, including both newspapers and aggregators such as Metacritic/Rotten Tomatoes, will collect the opinions of many movie reviewers in one place and offer them up as a service to people deciding which movies to watch. They will put their name behind the reviews even though they don’t endorse every word of them.
• The whole system seems to be valuable to a lot of people and to be clearly superior to alternatives such as (a) ratings provided without any reasoning; (b) ratings based on purely objective criteria such as “percent of film budget spent on actors’ salaries”; (c) ratings withheld entirely out of concern for misleading overly impressionable viewers.

Similar systems exist for a lot of consumer decisions.

Any metaphor breaks down and I wouldn’t want this one taken too far. But it seems appropriate for the question at hand, which is whether we are “overpromising” by issuing ratings, evaluations and “top charities” lists. Based on what these terms mean in other domains, I think we are closer to overdelivering.

• Laura Deaton on April 20, 2010 at 10:59 am said:

Holden, I think the horse is almost dead, so rather than sending it to its final resting place, I’ll just thank you for your responses and continue to disagree with you about the quality of your existing reports and the usefulness of your methodology.

I’m too appalled by the callous comparison to movie ratings to even engage on that one, so someone else will have to beat that drum on your new blog post.

In the meantime, I’m keeping my fingers crossed that this time wasn’t completely wasted and that this dialogue will actually generate some good for the philanthropic sector down the road in some way. Unfortunately, your responses to my comments lead me to believe that very little will actually change as a result of the feedback that I shared.

Best of luck to you and your team.

• Ryan Schoop on April 29, 2010 at 3:47 pm said:

Holden, Laura, and others:

I am new to reading this blog and appreciate your passion for this topic. As a former development officer, I often found that donors are largely uninformed about the more than 1,000,000 charities they could give to. Unfortunately, it often becomes a matter of which organization is able to build the best relationship with the prospect, regardless of the organization’s accountability. I hope that GiveWell and other rating services can help to change this.

Laura, while I think your questioning is admirable, I think the bigger picture needs to be looked at. Seeing as there are over 1,000,000 charities out there, and very few rating sites (the only other I am aware of is Charity Navigator), the service that Holden and his team are providing is of unbelievable importance. The fact of the matter is that attempting to rate these organizations is a subjective practice. Ratings continue to encourage non-profits to become more and more transparent, and hence make the ratings more objective; in the meantime, providing a rating, even with some subjectivity, is better than complete ignorance, which in my experience is the situation that the average donor is in. Simply put, there is no Super Bowl to determine the best non-profit. Holden and his team are just getting at the tip of the iceberg, and while surely there is much tweaking to do, any indication of a charity’s effectiveness is an invaluable improvement over none at all.

Keep up the good work, Holden.

• J. S. Greenfield on May 1, 2010 at 7:05 pm said:

The above discussion in comments may be interesting, but I’m not sure it really has much to do with the fundamental question in the original post, regarding whether charity evaluation should be “fair.”

Regarding that, I think you are actually mistaken in your characterization of investing, as an analogy. Sure, people don’t directly handicap based on how challenged a sector is, but if you really look at how investing works, that is undoubtedly considered, indirectly. That’s because (fundamentals-based) investing isn’t driven strictly by profits, but by how much it costs to buy those profits. Stocks of companies in challenged industries usually have prices that are, from a profit valuation perspective, lower than companies in booming industries, making them an appealing investment for many investors.

At the end of the day, they still do have non-zero prices, demonstrating that an intrinsic investment arbitrage process has priced the stock of companies in challenged sectors so that they are, to the average investor, as appealing as the stock of companies in booming sectors. The stock price reflects a discount based on how positive or negative the outlook is, and based on the level of uncertainty in such.

So what does this have to tell us about evaluation of charities?

Perhaps nothing, but perhaps the analogy is actually apposite, and there should be an analog in charity evaluation that takes into account whether a sector is “challenged” in terms of having good objective measures of performance.

Elsewhere, you’ve previously acknowledged the difficulty comparing the relative value of very different types of charitable efforts. You’ve cited a DALY measurement methodology to compare at least a subset of charitable efforts. But of course, DALY is not a purely objective measurement process, since the notion of disability adjustment implies some subjective valuation of various disabilities.

The fact is that while objective measurement methodologies are very desirable, and to be aspired to, there are many, many areas where we have not yet identified good, objective measurement methodologies. Likely, there are many areas where there simply aren’t any such good objective measurement methodologies to be identified. And that’s not just for charities.

In business, I may be able to effectively measure the performance of a call center customer service representative using a wide variety of objective criteria, ranging from how quickly they are able to handle calls, to how many of their calls result in repeat calls, to how callers rate their experience and results in post-call surveys. But achieving effective objective measures is not nearly as easy when it comes to attempting to measure the performance of, say, an architect or a programmer.

Of course, we don’t give up on attempting to rate the performance of employees working in fields where effective objective measures are inadequate or non-existent. We simply resort to significantly incorporating subjective measures into the process.

I suspect that an optimal strategy for evaluating charities similarly incorporates subjective measures, sometimes significantly so. The additional degree of uncertainty they introduce justifies applying a discount to perceived benefits/impact. But that uncertainty almost certainly does not justify discounting the value to zero.

(And that would also be a direct analog to investing, where uncertainty in aspects of an evaluation leads to discounting.)

Now, this may be what you effectively mean when you say that you have a “bias” toward charities with measurable impacts. In my view, you already incorporate this into your methodology, at least minimally (albeit seemingly without a formal framework), when you say that, in your opinion, Partners in Health faces a lower burden of proof.

But it would probably make sense to incorporate it into a more rigorous, formal framework. And it is probably not optimally efficient to end up rating (or rating above zero) only charities that fall into a few categories, for which you have high confidence in the objective measures you have developed. At the end of the day, many donors are likely to have their own biases as to the types of charities that they are interested in, and perhaps believe fundamentally must be beneficial, even in the absence of objective proof, and so they apply their own “discounting” to the class (potentially negative discounting, uplifting their evaluation of some favored types of charities). For such donors, having even imperfect information to help select among charities in their favored classes may well be valuable.

You have very few charities with non-zero ratings at this point, so perhaps the issue is premature. But certainly, if you reach the point where you have many non-zero ratings for charities in several classes, I would expect you should be asking yourself the question of whether rating yet another charity within one of the well-represented classes would provide a greater return on your effort than turning your attention to a class of charities not well-represented (likely because they lack strong, identified objective measures).

And this would seem precisely analogous to an investor (or more properly, an investment advisory firm) who has already identified many strong investment opportunities within specific industries, or within specific classes of assets, and who can probably benefit more by turning his attention to finding the best investment opportunities in other industries or asset classes, rather than churning out yet another opportunity within one of the well-represented groups.

• Holden on May 5, 2010 at 3:11 pm said:

J.S. Greenfield: I agree that “investing isn’t driven strictly by profits, but by how much it costs to buy those profits.” (The concept of ours that maps most closely to “how much it costs to buy those profits” is room for more funding – whether a program is under- or over-funded.) However, I think that assessing this issue is a very different process from using a sector-based “handicap.” In some cases, a stock may be underpriced (/a program may be underfunded) because the difficulty of the sector scares other investors/donors off. But in other cases, investors/donors may underestimate (or, in the case of charity, simply be indifferent to) the difficulty of a sector, causing it to be overinvested in. The difficulty of the sector may be related to its “price,” but I don’t think it’s fully appropriate as a proxy, especially for charity specifically.

One way in which I think the analogy breaks down is that I don’t feel that philanthropists are anywhere near as efficient about fully funding the best opportunities, and “pricing in” the degree of difficulty. The most demonstrably profitable, “slam-dunk” companies are likely to be widely recognized and heavily invested in; by contrast, the most “slam-dunk” charities we’ve been able to find, in the “easiest” sectors, have substantial room for more funding.

Otherwise, I agree with your comments. We do intend to expand our coverage to different causes, including those that are less prone to measurement, and we agree that the benefits of doing so can be substantial. However, I stand by the decision to start with, and for now emphasize, the charities with the most demonstrable cases for impact because I think these charities have substantial room/need for more funding.

• J. S. Greenfield on May 5, 2010 at 10:57 pm said:

Holden, I appreciate that your concept of “room for more funding” is probably the closest analog to “how much it costs to buy those profits.” And I suspect that we are, in reality, not terribly far apart in our thinking.

However, I do think there’s an important point that is not captured by your “room for more funding” concept, and it’s probably the key aspect in which it breaks down as an analogy to the investment case.

You characterize “room for more funding” as a measure of whether a program is under-funded or over-funded. But that notion of under- vs. over-funding is an absolute one that exists in a vacuum. There is another notion, which is relative under- or over-funding.

For example, let’s suppose that you evaluate the Susan G. Komen foundation and find that it effectively uses the funds it raises to reduce deaths due to breast cancer. Let’s suppose you further conclude that incremental funds could produce a similarly efficient further incremental reduction in breast cancer deaths. You would presumably conclude that the charity is underfunded, and rate it highly.

This is an absolute evaluation: is the charity underfunded, in the sense that additional dollars will be used efficiently?

I, as a donor, however (and many like me), might evaluate the situation differently, in a relative sense. We might look at breast cancer and conclude that it receives an enormous amount of charitable (and governmental) funding compared to a large variety of “orphan” diseases. That is, breast cancer may receive far more funding per death, or even per instance of disease, than many orphan diseases.

I might, as a result, conclude that in fairness, I should direct my charitable donations intended to fight disease toward one or more of the orphan diseases, which are underfunded in this relative sense.

And I would assert that this is not an irrational choice, even though it may be the case that funding for those orphan diseases cannot immediately generate an easily identifiable end result, or an identifiable result that is as large as that for breast cancer. For example, incremental funding for breast cancer might generate incremental screening, generating incremental early detection, and fewer deaths. On the other hand, the current situation for an orphan disease like pancreatic cancer may be so dismal that early screening may be impractical, and even with early detection, many fewer lives might be saved. The situation might be such that the only effective investment is in research, which offers very little in the way of immediately measurable results.

Does that mean that it is rational to direct any incremental funding to breast cancer, rather than pancreatic cancer? Or perhaps even to justify withdrawing all existing funding for pancreatic cancer activities and redirecting it to the more (measurably) efficient breast cancer activities?

This would seem to be the inexorable conclusion of an evaluation methodology that focuses solely on measurable results.

Or is it reasonable and rational to evaluate the two _relative_ to one another, considering the _potential_ for ultimate benefit? If so, then it is rational to conclude that it is better to invest incremental funds in pancreatic cancer, because it is underfunded relative to breast cancer in terms of funding per unit of suffering (say, using a DALY methodology) and therefore offers a greater potential return, ultimately, for an incremental investment.
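The “funding per unit of suffering” comparison is a simple ratio. The sketch below assumes, for illustration only, that each cause can be summarized by an annual funding total and an annual count of DALYs lost; `funding_per_daly`, `rank_by_relative_underfunding` and both cause names are hypothetical, and the figures are invented rather than real data for breast or pancreatic cancer.

```python
def funding_per_daly(annual_funding_usd: float, annual_dalys_lost: float) -> float:
    """Dollars of funding per DALY lost: a crude measure of relative neglect."""
    if annual_dalys_lost <= 0:
        raise ValueError("annual_dalys_lost must be positive")
    return annual_funding_usd / annual_dalys_lost


def rank_by_relative_underfunding(causes):
    """Return (name, funding per DALY) pairs, most relatively underfunded first.

    `causes` maps a cause name to (annual funding in USD, annual DALYs lost).
    """
    ratios = [(name, funding_per_daly(f, d)) for name, (f, d) in causes.items()]
    return sorted(ratios, key=lambda pair: pair[1])


# Hypothetical figures for two unnamed causes; not real data.
example = {
    "well_funded_cause": (1_000_000.0, 100.0),
    "neglected_cause": (500_000.0, 200.0),
}
for name, ratio in rank_by_relative_underfunding(example):
    print(f"{name}: ${ratio:,.0f} per DALY")
```

A low ratio only flags relative neglect; on its own it does not show that an additional dollar to that cause will achieve more.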

I think this latter notion is precisely a notion of “fairness” with respect to the purely results-based paradigm GiveWell has been pursuing, and I also think it is a completely rationally defensible approach to evaluation.

If you accept that, then using the kind of methodology GiveWell pursues to compare the relative worth of charities within this overarching “fairness” model is potentially reasonable.

And while I’ve provided a very specific example relating to two easily comparable charitable alternatives (i.e., both relating to disease funding), I suspect that the concept can probably be rationally expanded to address the relative value of less similar charitable alternatives. The top-level comparison will, I think, necessarily include a subjective comparison on the part of the donor…but as you’ve acknowledged before, that level of comparison may best be left to donors.

Now, I don’t suggest that I know how GiveWell could effectively incorporate this into its processes. I suppose that’s an exercise left to the reader, for the time being. 😉

But to the basic question of whether “fairness” properly has any role in evaluating charitable alternatives, to my view, the answer is yes, it does.

Unless you are prepared to argue that measurable results are the only rational approach to evaluation, and to accept the consequences of extending that notion to its rational extremes (for example, concluding that all current funding for activities still in a basic research phase, or even just in an immature phase where their measurable results are still small, should be redirected to more mature activities that produce greater measurable results, because they are more efficient), I think that you probably have to conclude that “fairness” _does_ have a rational place in evaluation of charitable alternatives, even if GiveWell is not (yet) prepared to incorporate such into its own evaluation activities.

• Holden on May 6, 2010 at 10:35 am said:

J.S., I think there are two issues here that it is helpful to separate.

1. Measurable vs. unmeasurable. It is almost certainly true that in some cases, an unmeasurable intervention does more good than a measurable one. However, since we have no way of knowing this and are seeking to advise individual donors with little expertise/relationships, we think it is appropriate to have a bias toward (not an absolute rule favoring) more measurable results.

There may be times when intervention A is less measurable than intervention B, but its upside is so much higher as to make it a better investment on balance. We will write about this more in the future.

2. Maximum impact vs. fairness. If, in fact, an individual donor’s expected impact is higher when funding “neglected” diseases – for any combination of reasons, including the possible ones you gave in your hypothetical – then we think that doing so is appropriate. However, the mere fact that an area is “neglected” does not mean that donations to it are relatively cost-effective. For example, the fact that “orphan” diseases get less attention is partly due to the fact that they affect fewer people. So focusing on them may be more or less cost-effective than focusing on better-known diseases; it comes down to specifics.

What we advocate is to (a) donate solely on the basis of impact; (b) incorporate questions such as “What is neglected/underfunded?” to the extent that, and only to the extent that, they bear on how much impact can be achieved for a given donation; (c) give no weight to “fairness” in and of itself.

• J. S. Greenfield on May 7, 2010 at 9:42 pm said:

Again, I think we largely agree, but I would note that:

1) Measurable vs. unmeasurable is not necessarily a binary scale. So even within domains that are less measurable, there may be some measures that can distinguish among different charities working in that domain. It is potentially useful to help donors draw such distinctions, assuming they have concluded that there is sufficient potential upside to overcome an appropriate discount for less-than-optimal measures.

This is one place where I would expect there is potentially some role for GiveWell, as it grows and expands over time.

2) There’s no question that being neglected does not, in and of itself, mean that it makes sense to commit additional resources, which is precisely why I suggested potential measures to quantify and compare the level of funding: funding dollars per death, or per incidence of the disease.

But this is another area where GiveWell might be able to play a useful role, in the future — helping donors to identify rational ways to evaluate and compare charities working in domains where measuring results is not currently a viable methodology (which is probably currently a very large set of domains, perhaps even the majority or the vast majority).

3) As to the issue of this kind of “impact” based methodology being distinct from “fairness” — at the end of the day, that’s a matter of semantics. But I’ll note that, in the original post, you posed the concept of “fairness” as taking into account “merit” vs. “results.” By that definition, considering anything other than directly measurable results is, presumably, a methodology that incorporates a sense of “fairness.”

I certainly don’t argue that donor decisions should consider “fairness” in the sense of distributing funds to all parties just because they exist, or based on an evaluation of how hard they try, with no concern for any form of merit. But it didn’t seem that the original post was defining “fairness” quite so narrowly. 😉

• Holden on May 10, 2010 at 11:32 am said:

I agree on #1 and #2.

Re: #3, the post refers to likely/expected results, not proven results. A program with a murky track record + high upside could easily be considered to have superior “expected results” to a program with a very strong track record + lower upside, for example. The intended argument is that we should chase the best expected/likely results even if that biases us toward certain sectors, certain size/age charities, etc.
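The contrast between expected and proven results amounts to a probability-weighted average. The sketch below assumes, purely for illustration, that each program can be summarized as a few (probability, impact) outcomes in arbitrary impact units; `expected_impact` and both programs are hypothetical, and the numbers are invented rather than drawn from any GiveWell analysis.

```python
def expected_impact(outcomes):
    """Probability-weighted average impact.

    `outcomes` is a list of (probability, impact) pairs whose probabilities
    should sum to 1.
    """
    total = sum(p for p, _ in outcomes)
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"probabilities sum to {total}, expected 1")
    return sum(p * impact for p, impact in outcomes)


# Hypothetical programs; the numbers are invented purely for illustration.
murky_high_upside = [(0.3, 100.0), (0.7, 0.0)]   # uncertain track record, big upside
proven_lower_upside = [(0.9, 20.0), (0.1, 0.0)]  # strong track record, modest upside

print(expected_impact(murky_high_upside))    # 30.0
print(expected_impact(proven_lower_upside))  # 18.0
```

Under these made-up numbers the murky, high-upside program has the higher expected result even though it most likely achieves nothing; with different probabilities the ranking flips, so the quality of the inputs matters far more than the arithmetic.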