Note: a positive effect of recent events was a set of substantive concerns about our model raised by non-profit insiders and others. We put all our reasoning and assumptions on our website precisely because we value critiques that will help us improve, and we look forward to responding to and discussing all concerns shortly. This post, however, focuses on the work we’ve been doing recently: our evaluations for Cause 4 (K-12 education).

Three of our finalists focus on creating new schools. Replications, Inc. does so through the public school system, while the Knowledge is Power Program (KIPP) and Achievement First create charter schools. In evaluating these programs, we’re hesitant to focus too much on test scores - the link between test scores and useful skills, or later life outcomes, is far from clear - but test scores are a start, and both the media coverage and the organizations themselves have focused on test scores as the primary evidence of their value-added. So what does the evidence say?

So far, it’s very unclear. All three of the organizations discussed sent us data from an incomplete set of their schools; when it came to comparison-school data (needed to put their results in context), what we were provided was even more incomplete. The same is true for the independent reports available on the organizations’ websites. For example, the independent reports that KIPP lists either focus on very few schools, or examine many schools but look only at one year. It’s often unclear to us why particular schools, grades, and years were examined and others weren’t, especially in light of the fact that nearly all test-score data is publicly available through state governments. (Collecting such data may be time-consuming, and we wouldn’t necessarily expect the organizations themselves to do so - but what about funders?)

In our minds, if we want to gauge these organizations’ impact, it’s necessary to collect all easily available and relatively recent data (to avoid concerns about cherry-picking particularly successful schools), and study not just the schools relative to district- or state-wide averages, but the trajectory of students’ scores. The question has been raised whether KIPP, for example, effectively selects stronger students for participation, both through its acceptance process and through attrition (students’ dropping out as time goes on). So even if their students outperform “comparison group” students, this isn’t necessarily an effect of the schools themselves; if we can establish, on the other hand, that their students enter at similar levels to their peers but improve gradually over time, the “KIPP effect” will become much clearer.
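To make the idea concrete, here is a rough sketch of the kind of trajectory comparison we have in mind. Everything in it is an assumption for illustration: the school names, the numbers, the record layout, and the function names `cohort_trajectories` and `summarize` are all hypothetical, and real state data will be messier. It follows one entering class through successive grades, tracks the gap between the school and its district, and records how many of the original students are still being tested.

```python
from collections import defaultdict

# Hypothetical records, one per (school, entering class, grade):
# (school, cohort_entry_year, grade, school_pct_proficient,
#  district_pct_proficient, n_students_tested)
# All values are invented for illustration; they are not real results.
RECORDS = [
    ("School A", 2003, 5, 40.0, 42.0, 80),
    ("School A", 2003, 6, 55.0, 43.0, 72),
    ("School A", 2003, 7, 68.0, 45.0, 60),
    ("School B", 2003, 5, 60.0, 42.0, 90),
    ("School B", 2003, 6, 61.0, 43.0, 88),
    ("School B", 2003, 7, 63.0, 45.0, 87),
]


def cohort_trajectories(records):
    """Group records by (school, cohort) and compute, for each grade,
    the gap versus the district and the share of the entering class
    still being tested (a crude proxy for attrition)."""
    grouped = defaultdict(list)
    for school, cohort, grade, school_pct, district_pct, n in records:
        grouped[(school, cohort)].append((grade, school_pct - district_pct, n))

    trajectories = {}
    for key, rows in grouped.items():
        rows.sort()  # order by grade, earliest first
        entry_n = rows[0][2]
        trajectories[key] = [
            {"grade": grade, "gap": gap, "retention": n / entry_n}
            for grade, gap, n in rows
        ]
    return trajectories


def summarize(trajectory):
    """Reduce one cohort's trajectory to three headline numbers."""
    first, last = trajectory[0], trajectory[-1]
    return {
        "entry_gap": first["gap"],                 # selection at entry?
        "gap_change": last["gap"] - first["gap"],  # improvement over time?
        "final_retention": last["retention"],      # attrition?
    }


if __name__ == "__main__":
    for key, trajectory in sorted(cohort_trajectories(RECORDS).items()):
        print(key, summarize(trajectory))
```

In this made-up data, School A's class enters at roughly the district level and pulls ahead, but a quarter of the class is gone by 7th grade, so attrition could account for some of the gain. School B's class starts well above the district and stays about the same distance ahead, which is what selection at entry would look like. Neither pattern proves anything on its own; the point is only that entry level, change over time, and retention need to be looked at together.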

This sort of observation wouldn’t support a “KIPP effect” as strongly as a true randomized study would, but it would still be much stronger than what we’ve seen to date. We’ve seen comparisons of particular schools to nearby schools, and to statewide averages, but nothing that either is comprehensive or systematically attempts to address the issues discussed above. For example, this report by the Education Policy Institute looks across many KIPP schools, but only at 5th grade test scores in 2003-04. Other available reports look only at a few schools, or at schools in a particular city for a single academic year.

So, at this point our plan is to bite the bullet: get all the data we can from state governments, and do our own gruntwork putting it in a form that can be systematically analyzed. We certainly don’t want to reinvent the wheel if someone has already done this work - we’d rather read research from experts who have spent much more time on this than we have - but we don’t see an alternative, because we can’t find anyone who is publishing research on the effect of these programs in a way that addresses concerns about selection/attrition/publication bias.

What do you think? Are we going too far? Is there a faster and better way to answer these questions? Do you know of any research that already exists on these issues (for these charities)?