Note: a positive effect of recent events was a set of substantive concerns about our model raised by non-profit insiders and others. We put all our reasoning and assumptions on our website precisely because we value critiques that will help us improve, and we look forward to responding to and discussing all concerns shortly. This post, however, focuses on the work we’ve been doing recently: evaluations for Cause 4 (K-12 education).

Three of our finalists focus on creating new schools. Replications, Inc. does so through the public school system, while the Knowledge is Power Program (KIPP) and Achievement First create charter schools. In evaluating these programs, we’re hesitant to focus too much on test scores – the link between test scores and useful skills, or later life outcomes, is far from clear – but test scores are a start, and both the media coverage and the organizations themselves have focused on test scores as the primary evidence of their value-added. So what does the evidence say?
So far, it’s very unclear. All three of the organizations discussed sent us data from an incomplete set of their schools, and when it came to comparison-school data (to put those numbers in context), what we received was even more incomplete. The same is true of the independent reports available on the organizations’ websites. For example, the independent reports that KIPP lists either focus on very few schools or examine many schools for only one year. It’s often unclear to us why particular schools, grades, and years were examined and others weren’t, especially in light of the fact that nearly all test-score data is publicly available through state governments. (Collecting such data may be time-consuming, and we wouldn’t necessarily expect the organizations themselves to do so – but what about funders?)
In our minds, if we want to gauge these organizations’ impact, we need to collect all easily available and relatively recent data (to avoid concerns about cherry-picking particularly successful schools) and study not just the schools relative to district- or state-wide averages, but the trajectory of students’ scores. The question has been raised whether KIPP, for example, effectively selects stronger students for participation, both through its acceptance process and through attrition (students dropping out as time goes on). So even if its students outperform “comparison group” students, this isn’t necessarily an effect of the schools themselves; if, on the other hand, we can establish that its students enter at similar levels to their peers and then improve relative to them over time, the “KIPP effect” will become much clearer.
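To make that concrete, here’s a minimal sketch (in Python) of the kind of cohort-trajectory comparison we have in mind. The column names and file layout here are hypothetical, not the actual format of any state’s data; the point is simply that tracking each entering cohort over time, rather than comparing single-year snapshots, is what lets us separate a genuine school effect from selection and attrition.

```python
# Illustrative sketch only -- the CSV and its columns are hypothetical.
# Expected columns: school, group ("program" or "comparison"),
# cohort (year the cohort entered 5th grade), grade, mean_scaled_score
import pandas as pd

scores = pd.read_csv("test_scores_long.csv")

# Average score by group, entering cohort, and grade
trajectories = (
    scores
    .groupby(["group", "cohort", "grade"], as_index=False)["mean_scaled_score"]
    .mean()
)

# Baseline: do program students enter (at 5th grade) at similar levels to peers?
baseline = trajectories[trajectories["grade"] == 5].pivot(
    index="cohort", columns="group", values="mean_scaled_score"
)
print(baseline.assign(gap=baseline["program"] - baseline["comparison"]))

# Trajectory: how does the program-vs-comparison gap change as each cohort
# moves up in grade?
gap_by_grade = trajectories.pivot_table(
    index=["cohort", "grade"], columns="group", values="mean_scaled_score"
)
gap_by_grade["gap"] = gap_by_grade["program"] - gap_by_grade["comparison"]
print(gap_by_grade["gap"].unstack("grade"))
```

The code itself isn’t the point – what matters is that a similar entering baseline followed by a widening gap is much harder to explain by selection or attrition than a one-year comparison of averages.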
This sort of observation wouldn’t support a “KIPP effect” as strongly as a true randomized study would, but it would still be much stronger than what we’ve seen to date. We’ve seen comparisons of particular schools to nearby schools and to statewide averages, but nothing that is comprehensive or that systematically attempts to address the issues discussed above. For example, this report by the Education Policy Institute looks across many KIPP schools, but only at 5th-grade test scores in 2003-04. Other available reports look only at a few schools, or at schools in a particular city for a single academic year.
So, at this point our plan is to bite the bullet: get all the data we can from state governments, and do our own gruntwork putting it in a form that can be systematically analyzed. We certainly don’t want to reinvent the wheel if someone has already done this work – we’d rather read research from experts who have spent much more time on this than we have – but at this point we don’t see an alternative, because we can’t find anyone publishing research on the effect of these programs in a way that addresses concerns about selection/attrition/publication bias.
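As a rough illustration of what that gruntwork might look like – assuming, hypothetically, one downloaded CSV per year with one row per school and grade (real state downloads vary in format and will each need their own cleanup) – the assembly step is mostly just stacking files into one long table:

```python
# Hypothetical sketch: combine per-year state downloads into a single table
# suitable for the cohort-trajectory analysis sketched above.
import glob
import pandas as pd

frames = []
for path in glob.glob("state_downloads/scores_*.csv"):
    year = int(path.split("_")[-1].split(".")[0])  # e.g. "scores_2005.csv" -> 2005
    df = pd.read_csv(path)
    df["year"] = year
    frames.append(df)

all_scores = pd.concat(frames, ignore_index=True)
all_scores.to_csv("test_scores_long.csv", index=False)
```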
What do you think? Are we going too far? Is there a faster and better way to answer these questions? Do you know of any research that already exists on these issues (for these charities)?
Transparency is a really big deal to us because we believe that no matter how much we learn and no matter how hard we work, we can always be wrong. That’s why we invite as many people as possible into the conversation.
Measurement is about inviting someone else into the conversation: The Facts. The Facts have a lot to say, and they often contradict what we would have thought. That’s why we have to listen to them. Like transparency, measurement takes a lot of extra effort and expense; like transparency, it can’t solve all your problems by itself; and like transparency, it’s easily worth it if you agree that the issues are extremely complex, and that no matter how much sense something makes in your head, The Facts might disagree.
To a lot of people, humility means speaking with a certain tone of voice, or just plain keeping your mouth shut. If that’s what you think humility is, we don’t have it. To us, humility is constantly saying “The things that make sense to me could be wrong – that’s why I’m going to do everything I can to test them, against others’ ideas and against reality.” Instead of being silently dissatisfied with charity, we’re loudly dissatisfied, so that anyone who disagrees can respond. Instead of happily assuming our dollars are doing good, we demand to see The Facts, so they can respond too.
Audio for last Monday’s board meeting is on the way; in the meantime, here’s a summary. The meeting ran about five hours and had heated arguments, tension, drama, a couple of car chases, and down-to-the-wire votes. The highlight was that Elie and I ended up reversing our positions on 2 of the 3 causes we voted on. Here’s the story.