The GiveWell Blog

What’s so hard about rigorous self-evaluation?

I’m not trying to be a jerk. I honestly want to know.

Most of the self-evaluation we see from charities looks at clients, but doesn’t compare them to (otherwise similar) non-clients. So it’s probably effective at preaching to the choir, but not at winning over skeptics. When we bring up the issues with it, we constantly hear things like “Of course we’d love to do a randomized/longitudinal study, but those are expensive and difficult and the funding isn’t there.”

This is how I imagine an interested charity could evaluate itself rigorously:

  1. Use a lottery to pick clients. Many/most humanitarian charities don’t have enough money to serve everyone in need. (And if they do have enough, there’s an argument that they don’t need more.) Instead of dealing with this issue by looking for the “most motivated” applicants, or using first-come first-serve (as most do), get everyone interested to sign up, fill out whatever information you need (including contact info), etc. – then roll the dice.

    Unethical? I don’t think so. I don’t see how this is any worse than “first-come first-serve.” It could be slightly worse than screening for interest/motivation … but (a) it’s also cheaper; (b) I’m skeptical of how much is to be gained by taking the top 60% of a population vs. randomly selecting from the top 80%; (c) generating knowledge of how to help people has value too.

    Cost: seems marginal, cheaper than using an interest/motivation screen. You do have to enter the names into Excel, but for 1000 people, that’s what, 5 hours of data entry?
    Staff time: seems marginal.

  2. Get contact information from everyone. This is actually part of step 1, as described. These days you can get on the Web in a library, homeless people blog, and I’m guessing that even very low-income people can and do check email. Especially if you give them a reason to (see below).

    Cost: see above
    Staff time: none

  3. Follow up with both clients and randomly selected non-clients, offering some incentive to respond. Incentives can vary (many times, with clients, providing information can be a condition of continuing support services). But if push comes to shove, I don’t think a ton of people would turn down $50.

    Cost: worst case, sample 100 clients and 100 non-clients per class and pay them $50 ea. That’s a decent sample size. Follow up with each cohort at 2, 5, and 10 years. Once the program has been running for a decade, three cohorts come due each year, so that’s a total of 600 people followed up with per year = $30k/yr, absolute maximum.
    Staff time: you do have to decide how to follow up, but once you’ve done that, it’s a matter of sending emails.

  4. Check and record the followup responses. If possible, get applicants to provide proof (of employment, of test scores) for their $50. Have them mail it to you, and get temps to audit it and punch it into Excel.

    Cost: assuming each response takes 30min to process and data entry costs $10/hr, that’s $3k/yr.
    Staff time: none.

  5. And remember: Have Fun! Did you think I was going to put something about rigorous statistical analysis here? Forget it. Data can be analyzed by anyone at any time; only you, today, can collect it. When you have the time/funding/inclination, you can produce a report. But in the meantime, just having the data is incredibly valuable. If some wealthy foundation (or a punk like GiveWell) comes asking for results, dump it on them and say “Analyze it yourself.” They’re desperate for things to spend money on; they can handle it.

    (Also, I’m not a statistics expert, but it seems to me that if you have data that’s actually randomized like this, analyzing it is a matter of minutes not hours. Writing up your methodology nicely and footnoting it and getting peer-reviewed and all that is different, but you don’t need to do that if you’re just trying to gauge yourself internally.)
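
For what it’s worth, the lottery in step 1 and the follow-up sample in step 3 need very little machinery. Here is a minimal Python sketch, assuming applicants are just a list of names. The function names, the 600 available slots, and the seeds are made up for illustration; they aren’t anything a particular charity uses.

```python
import random


def run_lottery(applicants, n_slots, seed=None):
    """Randomly split applicants into clients (served) and non-clients."""
    rng = random.Random(seed)      # a recorded seed lets the draw be re-checked later
    shuffled = list(applicants)    # copy so the sign-up list itself is untouched
    rng.shuffle(shuffled)
    return shuffled[:n_slots], shuffled[n_slots:]


def follow_up_sample(group, k, seed=None):
    """Pick up to k people at random from a group to contact at follow-up."""
    rng = random.Random(seed)
    return rng.sample(group, min(k, len(group)))


# Hypothetical sign-up list: 1000 applicants, room to serve 600 of them.
applicants = [f"applicant_{i:04d}" for i in range(1000)]
clients, non_clients = run_lottery(applicants, n_slots=600, seed=2007)

# 100 clients and 100 non-clients to follow up with, as in step 3.
to_contact = (follow_up_sample(clients, 100, seed=1)
              + follow_up_sample(non_clients, 100, seed=2))
```

Writing down the seed alongside the sign-up list is one simple way to let an outsider confirm that the split really was random.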

The big key here, to me, is randomization. Trying to make a good study out of a non-randomized sample can get very complicated and problematic indeed. But if you separate people randomly and check up on their school grades or incomes (even if you just use proxies like public assistance), you have a data set that is probably pretty clean and easy to analyze in a meaningful way. And as a charity deciding whom to serve, you’re the only one who can take this step that makes everything else so much easier.
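
To make “easy to analyze” concrete: with a randomized split, a first-pass estimate of impact is just the difference in average outcomes between clients and non-clients, plus a rough standard error. A sketch, assuming a yes/no outcome such as “employed at follow-up.” The numbers below are placeholders to show the mechanics, not real results, and the function name is my own.

```python
from math import sqrt
from statistics import mean, variance


def difference_in_means(treated, control):
    """Return (estimated effect, standard error) for two randomized groups."""
    diff = mean(treated) - mean(control)
    # Unequal-variance standard error of a difference between two sample means.
    se = sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    return diff, se


# Placeholder outcomes (1 = employed at follow-up, 0 = not). NOT real data.
clients_outcome = [1, 0, 1, 1, 0, 1, 1, 0]
non_clients_outcome = [0, 0, 1, 0, 1, 0, 0, 1]

effect, std_err = difference_in_means(clients_outcome, non_clients_outcome)
print(f"estimated effect: {effect:.3f} (standard error {std_err:.3f})")
```

This only covers the people who actually respond. It says nothing about who drops out of each group, which would need separate handling.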

I tried to be generous in estimating costs, and came out to ~$35k/yr, almost all of it in incentives to get people to respond. Nothing to sneeze at, but for a $10m+ charity, this doesn’t seem unworkable. (Maybe that’s what I’m wrong about?) And this isn’t $35k per study – this is $35k/yr to follow every cohort at 2, 5, and 10 years. That dataset wouldn’t be “decent,” it would be drool-worthy.
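
The arithmetic behind that figure, using only the numbers from the steps above, can be laid out as follows. The one assumption I’ve added is pricing the initial 5 hours of name entry at the same $10/hr as the follow-up data entry.

```python
# Figures taken from the steps above.
INCENTIVE_PER_RESPONSE = 50        # dollars paid per follow-up response
CLIENTS_PER_COHORT = 100
NON_CLIENTS_PER_COHORT = 100
FOLLOW_UP_YEARS = (2, 5, 10)       # each cohort is contacted at these points
MINUTES_PER_RESPONSE = 30          # temp time to audit and enter one response
DATA_ENTRY_RATE = 10               # dollars per hour
LOTTERY_ENTRY_HOURS = 5            # typing ~1000 applicant names into a spreadsheet

# In steady state, one cohort hits each follow-up point every year.
responses_per_year = (CLIENTS_PER_COHORT + NON_CLIENTS_PER_COHORT) * len(FOLLOW_UP_YEARS)

incentives = responses_per_year * INCENTIVE_PER_RESPONSE
processing = responses_per_year * MINUTES_PER_RESPONSE * DATA_ENTRY_RATE // 60
# Assumption: lottery data entry is paid at the same hourly rate.
lottery_entry = LOTTERY_ENTRY_HOURS * DATA_ENTRY_RATE

total = incentives + processing + lottery_entry
print(f"incentives ${incentives:,} + processing ${processing:,} "
      f"+ lottery entry ${lottery_entry:,} = ${total:,} per year")
```

That comes to $33,050 per year, which I rounded up to ~$35k to leave some slack.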

And the benefits just seem so enormous. First and foremost, for the organization itself – unless its directors are divinely inspired, don’t they want to know if their program is having an impact? Secondly, as a fundraising tool for the growing set of results-oriented foundations. Finally, just for the sake of knowledge and all that it can accomplish. Other charities can learn from you, and help other people better on their own dime.

The government can learn from you – stop worrying about charity replacing government and instead use charity (and its data) as an argument for expanding it. In debates from Head Start to charter schools to welfare, the research consensus is that we need more research – and the charities on the ground are the ones who are positioned to do it, just by adding a few tweaks onto the programs they’re already conducting.

So what’s holding everyone back? I honestly want to know. I haven’t spent much time around disadvantaged people and I’ve never run a large operation, so I could easily be missing something fundamental. I want to know how difficult and expensive good self-evaluation is, and why. Please share.


  • michael vassar on October 27, 2007 at 3:11 am said:

    I think that people at most charities are committed to the belief that they are doing a good job and donors need to donate more. Any information that could threaten that belief is a threat to them.

    Basically, it’s the usual problem that almost everyone thinks it’s legitimate to have preferences over beliefs (and in a sense they may be right; fact is, I do have such preferences even if I’m uncomfortable about it).

    By the way, this is very interesting

  • Stephanie Hope on November 1, 2007 at 1:15 pm said:

    I am currently the project coordinator for a university-run study examining the effectiveness of a program for foster kids through a randomized design.

    I agree that the information gained through a randomized design is worth the effort it takes to collect it (and the lottery seems like a good system), but I think you are underestimating the effort it takes to collect data, especially in underprivileged populations.

    If you just send out a questionnaire you are probably going to get a 50% response rate at best, and it is VERY LIKELY THAT THE RATE AND COMPOSITION OF RESPONDERS WILL DIFFER BETWEEN INTERVENTION AND CONTROL GROUPS. You need to have people tracking each case and getting on the phones or going out into the field to collect data and to ensure that the rate of attrition is minimal and does not differ meaningfully between groups.

    Another issue to consider is consent to research. I don’t know how this would work in a non-university setting, but here we are mandated to obtain informed consent for research and to protect the rights of research subjects (especially if the subjects are children).

    And if you are working with an underprivileged population there are barriers in tracking participants. Cell phones get disconnected, people change addresses or enter institutions (prisons, hospitals, rehab centers), and many do NOT have email and computer access.

    I remember reading a previous blog entry about researchers and charities joining forces. This is what we do in our study (we work in collaboration with foster care agencies) and I think this may be the most feasible option.

  • Holden on November 2, 2007 at 8:36 am said:

    Thanks for the comments, Stephanie. It’s great to get a more experienced perspective.

    Everything you say makes sense to me. I do want to put it out there that a lot of charities already do the hard work of following their clients after “graduation” from their programs – in many cases the randomization is actually the only thing that is really missing (although I also recognize that following non-clients is probably much harder even than following former clients).

    One other thing I wonder is how much the problems you describe could be mitigated by the way you use incentives. My basic reasoning is that if you have $X, rather than giving it to an employee to hound clients and follow up with them, it seems potentially better to give it to the clients themselves, conditional on participation. Partly because the money is presumably worth more to them; partly because if they really want to participate, they will have a much easier time doing so than your tracker has finding them; and partly because you can structure the incentive in particularly appealing ways. (For example, as a lottery with a big grand prize – this seems to have big psychological appeal, esp. to low-income people).

    What incentives do people have to participate in your study? What do you think of this idea?

  • Stephanie on November 2, 2007 at 12:57 pm said:

    We do use a $50 incentive to compensate participants for completing a 2-hour interview, and it definitely makes a difference (and only seems fair after all). Used on their own, however, incentives may set up a responder bias; people who are more influenced by money will be more likely to respond. You still need someone to go after non-responders. But overall, yes, incentives help a lot.

    And if charities already have the systems in place to track their participants, it wouldn’t be too hard to track non-participants as well.

  • Holden on November 2, 2007 at 1:08 pm said:

    The other thing I think is that you can get pretty far without the 2-hour interview. If you have someone’s Social Security Number and their permission, can’t you check up on pretty basic stuff without their help? I don’t know the answer to this … but how about checking the public assistance rolls, the correction system, etc.? I know there’s some kind of publicly available database where you can look up who’s enrolled in what college (one of our applicants used it – if they’d combined it with a lottery this would have been the cheapest rigorous study ever).

    Somewhere in between is the approach I outlined above – just asking people to send a letter or email reporting something basic like their income or educational status. I do think the 2-hour interview approach is ideal, but what I’m describing might allow a charity to gather meaningful data more continuously, rather than saving up for years for “the big study.”

  • Amanda Fernandez on November 12, 2007 at 9:48 am said:

    You could write a dissertation on the many reasons charities don’t evaluate their activities more. The reasons run the gamut from fear of being “found out” to the lack of resources provided for this, to the complexity involved, to the donor’s lack of interest, to flat out rejection from participants to be analyzed more.

    Having worked with many poor communities in the past, most are pretty sick of being evaluated, while nothing changes for them. And the fact is there are tons of studies out there on the problems of poverty and the poor – more information on “what’s wrong” is not lacking.

    And, re: Holden’s last post, participant data and information is incredibly sensitive (things like SS numbers, legal status, income, investments). Lots of legal and privacy issues arise when dealing with this sort of information. Very complex issue, not to be taken lightly.

    On a positive note, one donor that takes data and evaluation super seriously is the Baltimore-based Annie E. Casey Foundation (a UPS charity). This group is pretty cutting edge in terms of how it uses data to shape programs and inform the public about policy implications (in no small part b/c their board is made up of number-crunching UPS execs).

  • Hi Amanda, welcome and thanks for your comment. A couple thoughts:

    1. I agree with you that legal and privacy issues aren’t to be taken lightly. But I don’t see why they can’t be a condition of aid. This is another way in which charities are uniquely positioned to carry out research; presumably, if what they offer is very valuable, people will be willing to participate in a study to get it.

    2. I don’t agree with you that there’s already enough information on “what’s wrong.” As far as I can tell, nobody knows much of anything about what really works in improving life outcomes for the disadvantaged. If you do, could you point me to a starting point?

    3. It makes perfect sense to me that people are “sick of being evaluated.” Firstly, any kind of learning and improvement, esp. in this area, is going to take a long time no matter what. But also, I feel like most of the evaluation that takes place today results in nothing – it’s done because it was part of the grant proposal, but it is rarely rigorous enough to conclude anything, nor are many interested in using it to inform their opinions. This post was about the former – my main point being that randomizing evaluation would slightly increase the cost of it while changing it from basically useless to extremely informative – and the latter issue remains, but GiveWell hopes to make a dent in it.

  • Amanda Fernandez on November 23, 2007 at 3:35 pm said:

    Hi Holden,

    A few responses to your last:

    1. Few organizations want the responsibility of housing sensitive info like SS#s in house. Some fear the data could be lost or misused somehow (credit card companies lose sensitive info all the time – imagine a barebones nonprofit?) or that the government will come knocking asking for access for more nefarious purposes (like going after people using false SS#s in employment projects and free tax preparation programs, or Homeland Security coming after it to find & purge people for security reasons).

    Also, what you need to understand is that most NGOs’ and charities’ strengths are not centered around data collection. Most DO collect data from their clients, but they are not very good at designing data systems, collecting data, storing data and developing systems to use/analyze it over time. This all costs money. If their donors really cared, they would fund rigorous evaluation – as you point out, few obviously do.

    2. The info is absolutely out there on what works in improving life outcomes for the disadvantaged. It is just very scattered, and of course people have strong opinions on what aspects should take priority – juvenile justice reform or education, poverty alleviation or improved preventative health care, youth employment, etc.

    Anyway, one starting point for your edification is The Annie E. Casey Foundation website. FYI – this foundation brought together experts over a period of years to discuss how to do community development best and improve family outcomes. They covered what worked and what didn’t extensively and used this information to design their huge, 10-year “Making Connections” initiative underway now which has a huge, nationwide data collection component. Check out their website to see what data they have already collected and analyzed (

    This foundation also funds and publishes a great resource called “Kids Count”. This annual report is data heavy and basically outlines the status of states vis-a-vis the most significant indicators that can determine whether kids succeed in life in America. Again – check out

    I worked for this foundation so I know their work better than others, and how seriously they take data collection/evaluation, but all the big foundations fund data collection to some extent, especially when it comes to different pieces of poverty alleviation. Go to the Brookings Institution (a big recipient of these funds) for more great data and info on “what’s wrong”.

    3. Sure, there are many NGOs that do crummy evaluation work, particularly in the U.S. But to see something impressive, check out the data component that is being designed by AECF for the Making Connections initiative – extensive & super pricey, but will ultimately be extremely informative on whether the project is accomplishing what it sets out to do.

  • One good practice that is emerging from academic work on experimental design for evaluations is to choose pairs prior to randomization that are exactly alike or nearly so. The treatment is subsequently randomized among these pairs and the result is a control group that closely mirrors the treatment group in every way minus access to the intervention. This technique is useful because it reduces the likelihood that the impact of the intervention will be conflated with other systematic differences between groups. If an evaluation has already been carried out and there is concern that groups are dissimilar in ways other than the allocation of the treatment, matches can be generated based on the likelihood of belonging to the treatment versus control group based on other characteristics for which the analyst has data.
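
The pair-then-randomize design described in the last comment can be sketched roughly as follows. This is a simplified illustration, assuming a single baseline measure to match on (real designs typically match on several characteristics). The function name, the `baseline` field, and the example scores are all hypothetical.

```python
import random


def matched_pair_assignment(units, key, seed=None):
    """Pair units with similar baseline values, then randomize within each pair.

    Returns (treatment, control, leftover); leftover holds the unpaired unit
    when the number of units is odd.
    """
    rng = random.Random(seed)
    ordered = sorted(units, key=key)   # neighbors in this order are most alike
    treatment, control = [], []
    for i in range(0, len(ordered) - 1, 2):
        pair = [ordered[i], ordered[i + 1]]
        rng.shuffle(pair)              # coin flip: who in the pair gets the program
        treatment.append(pair[0])
        control.append(pair[1])
    leftover = ordered[-1:] if len(ordered) % 2 else []
    return treatment, control, leftover


# Hypothetical applicants with a made-up baseline score (e.g. a prior test score).
scores = [52, 47, 61, 49, 58, 63, 50]
units = [{"id": i, "baseline": b} for i, b in enumerate(scores)]

treatment, control, leftover = matched_pair_assignment(
    units, key=lambda u: u["baseline"], seed=42
)
```

Because every pair contributes one person to each group, the two groups end up with nearly identical baseline distributions by construction, rather than only on average.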
