# Evaluating charter schools

Note: one positive effect of recent events has been a set of substantive concerns about our model, raised by non-profit insiders and others. We put all our reasoning and assumptions on our website precisely because we value critiques that will help us improve, and we look forward to responding to and discussing all of these concerns shortly. This post, however, focuses on our recent work: evaluations for Cause 4 (K-12 education).

Three of our finalists focus on creating new schools. Replications, Inc. does so through the public school system, while the Knowledge is Power Program (KIPP) and Achievement First create charter schools. In evaluating these programs, we’re hesitant to focus too much on test scores – the link between test scores and useful skills, or later life outcomes, is far from clear – but test scores are a start, and both the media coverage and the organizations themselves have focused on test scores as the primary evidence of their value-added. So what does the evidence say?

So far, it’s very unclear. All three organizations sent us data from an incomplete set of their schools, and the comparison-school data they provided (to put their results in context) was even more incomplete. The same is true of the independent reports available on the organizations’ websites. For example, the independent reports that KIPP lists either focus on very few schools or examine many schools for only one year. It’s often unclear to us why particular schools, grades, and years were examined and others weren’t, especially given that nearly all test-score data is publicly available through state governments. (Collecting such data may be time-consuming, and we wouldn’t necessarily expect the organizations themselves to do so – but what about funders?)

In our minds, if we want to gauge these organizations’ impact, it’s necessary to collect all easily available and relatively recent data (to avoid concerns about cherry-picking particularly successful schools), and study not just the schools relative to district- or state-wide averages, but the trajectory of students’ scores. The question has been raised whether KIPP, for example, effectively selects stronger students for participation, both through its acceptance process and through attrition (students’ dropping out as time goes on). So even if their students outperform “comparison group” students, this isn’t necessarily an effect of the schools themselves; if we can establish, on the other hand, that their students enter at similar levels to their peers but improve gradually over time, the “KIPP effect” will become much clearer.
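The trajectory comparison described above could be sketched roughly as follows. This is a minimal illustration under assumptions, not our actual method: each record is a hypothetical `Record` with a student ID, a group label (`"charter"` or `"comparison"`), a grade, and a score assumed to be already standardized within state, grade, and year. The sketch reports three things: the gap in entry scores (a hint of selection at admission), the gap in average gains among students observed at both grades, and the attrition rate for each group.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Record:
    student_id: str
    group: str    # hypothetical label: must be "charter" or "comparison"
    grade: int
    score: float  # assumed already standardized within state, grade, and year


def entry_gap_and_gain(records, entry_grade, later_grade):
    # Collect each student's scores by grade, keyed by (group, student_id).
    by_student = {}
    for r in records:
        by_student.setdefault((r.group, r.student_id), {})[r.grade] = r.score

    entry = {"charter": [], "comparison": []}
    gains = {"charter": [], "comparison": []}
    for (group, _), scores in by_student.items():
        if entry_grade in scores:
            entry[group].append(scores[entry_grade])
            # Only students observed at both grades contribute a gain, so a
            # change in who is enrolled between grades does not mechanically
            # shift the comparison. Stayers may still differ from leavers,
            # which is why attrition is reported as well.
            if later_grade in scores:
                gains[group].append(scores[later_grade] - scores[entry_grade])

    return {
        # A large positive entry gap would suggest selection at admission.
        "entry_gap": mean(entry["charter"]) - mean(entry["comparison"]),
        # Difference in average improvement among students who stayed.
        "gain_gap": mean(gains["charter"]) - mean(gains["comparison"]),
        # Share of entrants with no later-grade score (attrition).
        "attrition": {g: 1 - len(gains[g]) / len(entry[g]) for g in entry},
    }
```

A small entry gap combined with a large gain gap and similar attrition rates is the pattern that would make a "KIPP effect" more plausible; a large entry gap or lopsided attrition would point back toward selection.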

This sort of observation wouldn’t support a “KIPP effect” as strongly as a true randomized study would, but it would still be much stronger than what we’ve seen to date. We’ve seen comparisons of particular schools to nearby schools and to statewide averages, but nothing that is comprehensive or that systematically attempts to address the issues discussed above. For example, a report by the Education Policy Institute looks across many KIPP schools, but only at 5th-grade test scores in 2003-04. Other available reports look only at a few schools, or at schools in a particular city for a single academic year.

So, at this point our plan is to bite the bullet: get all the data we can from state governments, and do our own grunt work putting it in a form that can be systematically analyzed. We certainly don’t want to reinvent the wheel if someone has already done this work – we’d rather read research from experts who have spent much more time on this than we have – but at this point we don’t see an alternative, because we can’t find anyone publishing research on the effect of these programs in a way that addresses concerns about selection, attrition, and publication bias.
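One small piece of that grunt work can be sketched as a coverage check, under assumptions: state files are taken to have already been parsed into hypothetical `(school_id, year, grade, mean_score)` tuples (real state file layouts vary and are not described here). Before any analysis, the sketch lists which school-years are missing, so gaps stay visible instead of quietly turning the dataset into a cherry-picked subset.

```python
from itertools import product


def coverage_report(rows, schools, years):
    """Compare pooled rows against the full school-by-year grid we intend to cover.

    rows: iterable of hypothetical (school_id, year, grade, mean_score) tuples.
    schools, years: every school and year that should appear in the analysis.
    """
    have = {(school, year) for school, year, _, _ in rows}
    # Every (school, year) cell in the intended grid that has no data at all.
    missing = [(s, y) for s, y in product(schools, years) if (s, y) not in have]
    total = len(schools) * len(years)
    return {"covered": total - len(missing), "total": total, "missing": missing}
```

Reporting the missing cells alongside any result makes it easy to tell whether a finding rests on all schools and years or only on those that happened to be available.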

What do you think? Are we going too far? Is there a faster and better way to answer these questions? Do you know of any research that already exists on these issues (for these charities)?

Hi Elie.

Bob, given your experience in the field, which organizations or experts do you think have done research which could provide insight on how best to answer the questions which Elie poses in his post?

More generally, the process of determining which organization's programming will allow a donor to make the largest impact for their dollar is challenging. As a result, we encourage the community to challenge our thinking and point us in the direction of excellent research (this is the reason Elie asks the questions at the end of his post) that helps us answer these tough questions. Ultimately this will improve the public resource we create for donors who use the main GiveWell site and ensure that the donations that we make as an organization make the most impact possible.

• One crazy idea might be to try to find small individual communities where these charities have operated and look for changes in broader statistics — maybe adolescent crime or truancy rates if such things exist.

The other idea might be to randomly sample yourself instead of getting all the available data. Pick 100 students who left public schools for charity-sponsored charter schools and pick 100 who did not. Then try to find some statistically significant change around the time they entered the new school.
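The sampling idea could be sketched like this, under stated assumptions: each student is represented by a hypothetical score gain (score after entering the new school minus score before), and a two-sided permutation test, one choice among several and not something the comment specifies, stands in for "statistically significant." Random sampling reduces the data-collection burden but does not remove selection: students who chose to switch may differ from those who stayed.

```python
import math
import random


def draw_samples(switchers, stayers, n=100, seed=0):
    """Randomly pick n students from each list (pass lists, not sets)."""
    rng = random.Random(seed)
    return rng.sample(switchers, n), rng.sample(stayers, n)


def _mean(xs):
    # fsum is exactly rounded, so the result does not depend on element order.
    return math.fsum(xs) / len(xs)


def permutation_test(gains_a, gains_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in mean score gain.

    Returns (observed difference, p-value).
    """
    rng = random.Random(seed)
    observed = _mean(gains_a) - _mean(gains_b)
    pooled = list(gains_a) + list(gains_b)
    n_a = len(gains_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = _mean(pooled[:n_a]) - _mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # The +1 terms count the observed labeling itself, so p is never zero.
    return observed, (extreme + 1) / (n_permutations + 1)
```

In use, one would compute each sampled student's gain and pass the two lists to `permutation_test`. A small p-value says the difference in mean gains is unlikely under random relabeling; it does not by itself show that the school caused the difference.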

The type of test is important as well. I’ve read that scores on tests of general intelligence tend to be remarkably consistent after the age of 5 or so. But, on the other hand, a history test might be completely dependent on how consistently and effectively a teacher presents the material. Does a change in score on a history test indicate the new school is doing more good for the student?

I guess the question is, if you had every statistic in the world at your fingertips, what would you be looking for?

What you’re seeking to do is extremely ambitious. You’re asking questions that evaluators from within the worlds of education and government have been asking for at least a hundred years. What constitutes educational success? What data do you look for? How do you compare apples to apples? How do you control for the affluence of the neighborhood, nutrition, parental support at home, or other outside-of-school factors? How do you control for the tremendous variance in schools due to local control and differences in state legislation and type of school charter?

It is a nightmare to prove the success or failure of even one testing program or federal school initiative, let alone to define and compare schools as though the school were the only factor in student effectiveness. You might want to begin understanding the world of education evaluation here: http://ies.ed.gov/ncee/
Or begin reading about educational evaluation at ERIC, the Educational Resources Information Center:
http://eric.ed.gov/

The wheel doesn’t need to be reinvented; it needs to be invented. That’s a lot for you guys to bite off. Can you be a little more clear or specific about a single point of difference you are seeking, which might stand a chance of being evidenced in the data you’re requesting?

• Michelle on January 16, 2008 at 11:28 am said:

Also, do you know about the many existing rankings sites for parents?

http://www.greatschools.net/ is just one.

You’re in New York. Why not pay a few hundred dollars to someone from Bank Street College of Education for a three-hour tutorial on the problems of educational evaluation?

Better yet, why not take a course in social science research? As you’re seeing, measuring human progress is a matter of much more than crunching numbers. It’s choosing what numbers to crunch, and understanding why you believe those numbers might mean anything to long-term success, and whether you have any reason to think you’re correct about that. Any conclusions you come to with heterogeneous data and a poor understanding of social science research and reporting will forever be of very limited usefulness.

To quote from the WSJ comments: Doing good work means having great people, great technology, and self-evaluation. That’s all “overhead.” The problem in the nonprofit sector is not too much of it, but too little.

It seems like you aren't convinced that test scores are a meaningful method of comparing schools, but then you go ahead and jump right in and use them because that's what the media does and what the organizations do. Why? Is it because evaluating something like the effectiveness of schools is difficult to do? It seems like you skip over a pretty important question in doing this. You seem to be saying right off the bat that test scores of school A compared to test scores of school B don't tell you enough about the schools. How does taking the derivative of the test scores make this a more useful number? There's a common criticism that statewide testing encourages teachers to teach the test alone and ignore untested subjects (the arts, history, etc.). This seems like it could improve test scores, but unless you believe better test scores are the goal of schools you still don't have a tool that's useful for evaluating them.

• Sean Stannard-Stockton on January 19, 2008 at 2:15 pm said:

Something for people to consider. Whether you think Elie and Holden are qualified or not to evaluate anything, every donor (foundation or individual) in the world is evaluating nonprofit programs before donating. GiveWell is just showing you what they are thinking.

They are not doing anything different than any of the many foundations in the world. Except they are letting you in on their thinking.

Michelle, I’ve been very impressed with your comments across the range of sites discussing GiveWell. But when you say, “What you’re seeking to do is extremely ambitious,” I have to disagree. GiveWell is trying to decide which nonprofits to give money to. Almost every American makes a decision like this every year.

You guys are so over. No one will ever take you seriously again.

I'm astounded, but not surprised, by the juvenility of most of the posts to this thread. Elementary school education in this country is failing way too many kids, and well-meaning people (including, in their own earnest way, Holden and Elie) are doing their best to identify solutions to the problem. And what do self-styled vigilantes from online "communities" (I use the term loosely) have to add to the conversation? Scatological references and insults. Holden and Elie, maybe it's time you guys thought about moderating your comments and creating a space for a civil discussion about these issues.

Mitch, your initial take on GiveWell and MetaFilter wasn’t exactly glowing, so I’m not expecting some sudden reversal here, but you’re being awfully lazy about blaming the juvenility here on MetaFilter, and you’re failing to credit the civil discussion from those of us who are interested in the conversation.

I can’t see any reason why Bob et al. shouldn’t moderate the true crap comments here (one of the nimrods you’re talking about is essentially trying to start a fight about it), but the goal should be to keep out noise, not criticism, and I can see a pretty clear distinction in the comments here between those two camps.

• Erich Riesenberg on January 22, 2008 at 11:46 am said:

Sean Stannard-Stockton writes: They are not doing anything different than any of the many foundations in the world. Except they are letting you in on their thinking. … GiveWell is trying to decide which nonprofits to give money too. Almost every American makes a decision like this every year.

Are you truly this clueless Sean? Givewell is not merely letting us in on their thinking, they are in fact telling people to follow their advice, and ridiculing people who reach different conclusions, in much the way you do. You need to pick a new horse, Sean.

• Matt on January 17, 2008 at 4:42 pm said:

It's one thing to encourage feedback. It's another to have a research effort so flawed that it requires a massive public intervention. If you want this to be a public effort, make it a public effort. Provide us with some incentive to do your work for you. Make a wiki, say, so we feel like we own the contributions we make instead of like we're feeding a parasitic host. And lose the 65k/yr babysitters if we're the ones you want doing the heavy lifting – deciding your whole methodology. Same goes for branding: if this is going to be a community instead of a private concern, it's got to be represented differently. What I want you to do is lose the whiz-kid branding, the corporate facade, and make this place a user-driven public resource, spurred by the same curiosity that made the kids start this up in the first place.

That's a pipe dream, though. I expect business as usual.