The GiveWell Blog

Fryer and Dobbie on the Harlem Children’s Zone: Significance

My last post summarized a very recent paper by Fryer and Dobbie, finding large gains for charter school students in the Harlem Children’s Zone. (You can get the study here (PDF)).

I believe that this paper is an unusually important one, for reasons that the paper itself lays out very well in its first couple of pages: a wealth of existing literature has found tiny or zero effects from attempts to improve educational outcomes. This is truly the first time I (and, apparently, the authors) have seen a rigorously demonstrated, very large effect on math performance from any education program.

This study does not show that improving educational outcomes is easy. It shows that it has been done in one case by one program.

The program in question is a charter school that is extremely selective and demanding in its teacher hiring (see pages 6-7 of the study), and involves highly extended school hours (see page 6). Other high-intensity charter schools with similar characteristics have long been suspected of achieving extraordinary results (see, for example, our analysis of the Knowledge is Power Program). This study is consistent with – but more rigorous than – existing evidence suggesting that such high-intensity charter schools can accomplish what no other educational program can.

Those who doubt the value of rigorous outcomes-based analysis should consider what it has yielded in this case. Instead of scattering support among a sea of plausible-sounding programs (each backed with vivid stories and pictures of selected children), donors – and governments – are now in a position to focus on one approach far more promising than the others. They can work on scaling it up across the nation and investigate whether these encouraging effects persist, and just how far the achievement gap can be narrowed. As a donor, the choice is yours: support well-meaning approaches with dubious track records (tutoring, scholarships, summer school, extracurriculars, and more), or an approach that could be the key to huge gains in academic performance.

Reasons to be cautious

The result is exciting, but:

  • This is only one study, and the sample size isn’t huge (especially for the most rigorous randomization-based analysis). It’s a very new study – not even peer-reviewed yet – and it hasn’t had much opportunity to be critiqued. We look forward to seeing both critiques of it and (if the consensus is that it’s worth replicating) the results of replications.
  • Observed effects are primarily on math scores. Effects on reading are encouraging but smaller. Would this improved performance translate to improvements on all aspects of the achievement gap (either causally or because the higher test scores are symptomatic of a general improvement in students’ personal development)? We don’t know.
  • Just because one program worked in one place at one time doesn’t mean funding can automatically translate to more of it. Success could be a function of hard-to-replace individuals, for example. Indeed, if the consensus is that this program “works,” figuring out what about it works and how it can be extended will be an enormous challenge in itself.
  • With both David Brooks and President Obama lending their enthusiasm fairly recently, the worthy mission of replicating and evaluating this program could have all the funding it needs for the near future. Individual donors should bear this in mind.


Fryer and Dobbie on Harlem Children’s Zone: What they found

The Fryer/Dobbie study on the Harlem Children’s Zone is, in my view, an extremely important work that should seriously affect how donors think about the cause of promoting equality of opportunity in the U.S. (Longtime readers of this blog know that we don’t often say something like this.) This post will simply summarize what it found, and I’ll discuss my views on the significance in another post.

Here’s the link to the paper on Roland Fryer’s website (PDF). Note that it is prominently marked “PRELIMINARY AND INCOMPLETE” and is not yet peer reviewed. All page numbers refer to this paper.

The big picture

The study tracks children in the Harlem Children’s Zone, “a 97-block area in central Harlem, New York, that combines reform-minded charter schools with a web of community services created for children from birth to college graduation that are designed to ensure the social environment outside of school is positive and supportive” (page 2).

It makes a rigorous case, using two different methodologies, that the charter school investigated in this area (Promise Academy I, 6-8 grade) had a huge effect on children’s academic performance as measured by standardized tests. The effect size was particularly large in math – enough to close the black-white achievement gap, something that no previous (rigorously demonstrated) effect has come close to. It was much smaller in English Language Arts (ELA), but still as large an effect as has been rigorously demonstrated by just about any past study. (The paper itself gives many references to such past studies on pages 1-2.)
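For readers unfamiliar with how such results are reported: effect sizes here are expressed in standard deviation units – the difference in mean scores between groups, divided by the spread of scores in the comparison population. A minimal sketch of that arithmetic, using hypothetical numbers rather than the study's actual data:

```python
# Standardized effect size: difference in mean test scores divided by
# the standard deviation of scores in the comparison population.
# All numbers below are hypothetical, for illustration only.

def effect_size(treated_mean, control_mean, control_sd):
    """Difference in means, expressed in standard deviation units."""
    return (treated_mean - control_mean) / control_sd

# Suppose a hypothetical test has mean 650 and standard deviation 40
# in the comparison group, and treated students average 690.
d = effect_size(690.0, 650.0, 40.0)
print(d)  # an effect of 1.0 standard deviations
```

An effect of 1.0 on this scale is enormous by the standards of the education literature, where most rigorously measured interventions land well below 0.5.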

From limited evidence, the authors also argue that other social programs in Harlem Children’s Zone did not have large effects, and were likely not the key to the school’s effects.

The charts below are taken from pages 32-33. They show the progress of scores over time for attendees (compliers) vs. non-attendees (CCM), measured in standard deviations. (The actual meaning of these lines involves some extrapolation to correct for issues of bias – discussed more below).

Some details

For the full details, read the full study (linked above). However, here are answers to some of the first questions that I usually raise with studies like these. I split them into three parts: charter school analysis I (the methodology that generated the headline results and the charts above); charter school analysis II (an alternative methodology that found similar results); other analysis (less rigorous and less conclusive analysis about other aspects of the Harlem Children’s Zone).

Charter school analysis I

The study compared lottery winners to lottery losers at the oversubscribed Promise Academy. Due to oversubscription, the two groups had been separated using a randomized lottery – implying that the only salient difference between them was randomly determined. (For more on the advantages of randomization in assessing program effects, see the Poverty Action Lab’s writeup.) 182 6-8 grade “lottery winners” were compared to 304 “lottery losers”; the groups were initially similar according to available information (see page 35).

Attrition (students’ leaving the school) was significant, as it is in many schools (see page 12). However, the analysis did not directly compare attendees to non-attendees. It first compared “lottery winners” to “lottery losers,” ignoring whether winners had actually enrolled, and found significant effects on both math and reading scores (see the top charts on pages 32-33; these are not the same as the charts above in this post). It then used statistical extrapolation (a two-stage least squares, or 2SLS, approach) to estimate the effect of actually attending, comparing those who attended with those who stayed away only because they had lost the lottery. (The second approach is what generated the charts above.)
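The logic of these two comparisons can be sketched on simulated data. This is not the paper's actual estimation (which uses 2SLS with covariates); with a single binary instrument and no covariates, the 2SLS estimate reduces to the simple Wald ratio: the intent-to-treat effect divided by the difference in enrollment rates. All the numbers below (enrollment rates, the assumed 1.0 SD effect of attending) are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

won_lottery = rng.integers(0, 2, size=n)  # randomized offer of a seat
# Winners enroll with high probability; a few losers enroll anyway
# (hypothetical rates, chosen only for illustration).
enrolled = np.where(won_lottery == 1,
                    rng.random(n) < 0.80,
                    rng.random(n) < 0.05).astype(int)

# Simulated test scores in SD units: attending adds 1.0 SD (assumed).
scores = rng.normal(0.0, 1.0, size=n) + 1.0 * enrolled

# Intent-to-treat: compare winners to losers, ignoring enrollment.
itt = scores[won_lottery == 1].mean() - scores[won_lottery == 0].mean()

# First stage: effect of winning the lottery on actually enrolling.
first_stage = (enrolled[won_lottery == 1].mean()
               - enrolled[won_lottery == 0].mean())

# Wald/IV estimate of the effect of attending, for those who attend
# only because they won the lottery.
late = itt / first_stage
print(round(itt, 2), round(first_stage, 2), round(late, 2))
```

The intent-to-treat estimate is diluted by winners who never enroll; dividing by the first stage recovers (approximately) the assumed 1.0 SD effect of attendance itself.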

Statistically significant and unprecedented gains were seen in math scores for all subgroups. Effects on ELA test scores were relatively large compared to past effects of educational interventions, but were not as large, and were not statistically significant except in 8th grade. (See page 38.)

A somewhat similar analysis was used for grades K-3, but the school was not significantly oversubscribed (see page 3), making the estimation messier. The study found positive but not statistically significant effects on test scores, and statistically significant effects on attendance rates (see page 41).

Added 2:29pm: the study also partially addresses the possibility of cheating:

Jacob and Levitt (2003) use an algorithm for detecting teacher cheating to show there are serious cases of teacher or administrator cheating on high-stakes tests in four to five percent of Chicago elementary schools. While we do not have the question-by-question data necessary to run the Jacob-Levitt algorithm, we have the results of low-stakes interim test scores given by the charter schools for internal instruction purposes. Student performance on the low-stakes tests is comparable to the high-stakes tests.

Charter school analysis II

The authors did an alternate analysis “tak[ing] advantage of two important features of the HCZ charter schools: (1) anyone is eligible to enroll in HCZ’s schools, but only students living inside the Zone are actively recruited by the HCZ staff; and (2) there are cohorts of children that are ineligible due to the timing of the schools’ inception and their age” (page 3). Examining tens of thousands of children, their estimation found that being eligible due to age and physically in the Harlem Children’s Zone had significant effects, over and above having either one of these characteristics (page 37).

This analysis is not as rigorous as the first method described, but still seems reasonably strong/persuasive to me. It found, again, large and statistically significant effects on math and positive but smaller/non-statistically significant effects on ELA scores for 6-8 graders (page 37). It also found large and statistically significant effects on both for grades K-3 (page 40).

Other analysis

Pages 19-21 describe less rigorous analysis of the Harlem Children’s Zone’s early childhood programs, failing to find significant effects.

Page 22 argues that the effects described above should be attributed primarily to the schools and not to the network of community services, for reasons including:

  • School attendees living outside of the Harlem Children’s Zone (and thus ineligible for many of the community services) benefited just as much as those within the Zone.
  • Lottery winners’ siblings (who became eligible for many family benefits) saw positive but small gains in test scores and attendance.


Harlem Children’s Zone closes achievement gap?

Fascinating claim reported by David Brooks. The study (by Roland Fryer and Will Dobbie) doesn’t seem to be available anywhere as of this writing.

Fryer and his colleague Will Dobbie have just finished a rigorous assessment of the charter schools operated by the Harlem Children’s Zone. They compared students in these schools to students in New York City as a whole and to comparable students who entered the lottery to get into the Harlem Children’s Zone schools, but weren’t selected …. the most common education reform ideas — reducing class size, raising teacher pay, enrolling kids in Head Start — produce gains of about 0.1 or 0.2 or 0.3 standard deviations … Promise Academy produced gains of 1.3 and 1.4 standard deviations. That’s off the charts. In math, Promise Academy eliminated the achievement gap between its black students and the city average for white students.

It’s a strong claim about one of the best-reputed and -publicized charities working on the cause of equality of opportunity. We’ve been checking the Harlem Children’s Zone website (and contacting their representatives) for years, without seeing any documentation of impact – but perhaps it’s now on the way. If anyone gets a copy of this paper or knows how we can, please let us know.

Update: thanks to everyone who sent me a link to the paper. I have now read it, and my initial impression is that this is an extremely important work that should seriously affect the way a donor views this cause. I will be making at least one more post on the topic today, and possibly more than one.

Small, unproven charities

Imagine that someone came to you with an idea for a startup business and offered you a chance to invest in it. Which of the following would you require before taking the plunge?

  • Familiarity with (or at least a lot of information about) the people behind the project
  • Very strong knowledge of the project’s “space” (understanding of any relevant technologies, who the potential customers might be, etc.)
  • As much information as possible about similar projects, both past and present

Unless you’re an unusually adventurous investor, you probably answered with “All of the above.” After all, there’s a risk of losing your investment – and unlike with established businesses (which have demonstrated at least some track record of outdoing the competition), here your default assumption should be that that’s exactly what will happen.

Now what is the difference between this situation and giving to a startup charity?

One difference is that with a charity, you know from the beginning that you won’t be getting your donation back. But this doesn’t mean there isn’t risk – the risk just takes a different form. Presumably, your goal in donating to a charity is to improve the world as much as possible. If the startup charity you help get off the ground ends up being much less impactful (on a per-dollar basis) than established charities, then your support was a mistake. If it ends up having no meaningful impact, you’ve lost your shirt.

And in my opinion, the worst case possible is that it succeeds financially but not programmatically – that with your help, it builds a community of donors that connect with it emotionally but don’t hold it accountable for impact. It then goes on to exist for years, even decades, without either making a difference or truly investigating whether it’s making a difference. It eats up money and human capital that could have saved lives in another organization’s hands.

As a donor, you have to consider this a disaster that has no true analogue in the for-profit world. I believe that such a disaster is a very common outcome, judging simply by the large number of charities that go for years without ever even appearing to investigate their impact. I believe you should consider such a disaster to be the default outcome for a new, untested charity, unless you have very strong reasons to believe that this one will be exceptional.

So when would I consider it appropriate for a donor to invest in a small, unproven charity? I would argue that all of the following should be the case:

  1. The donor has significant experience with, and/or knowledge regarding, the nonprofit’s client base and the area within which it’s working. For example, a funder of a new education charity should be familiar with the publicly available literature on education, as well as with the specific school system (and regulations) within which the project is working. A funder of a project in Africa should be familiar with past successes and failure in international aid in general, and should spend time in the area where the project will be taking place.
  2. The donor has reviewed whatever information is available about past similar projects and about the assumptions underlying this project. If similar, past projects have failed, the donor has a clear sense of why they failed and what about the current project may overcome those obstacles.
  3. The donor has a good deal of confidence in the people running the nonprofit, either because s/he knows them personally or because s/he has an excellent sense of their previous work and past accomplishments. (Enough confidence in the people can lower the need for the above two points, to some extent.)
  4. The donor feels that the organization is doing whatever it reasonably can to measure its own impact over time. The donor is confident that – within a reasonable time frame – if the project succeeds, it will be able to prove its success; if it fails, it will see this and it will fold. Until impact is demonstrated, there is no need for the kind of scale that comes with taking many donations from casual donors. As stated above, I believe that the overwhelming majority of charities do not meet this criterion.

If you know a lot about cars, you might try to build your own car. But if you don’t, you’re much better off with a name brand. Likewise, casual donors are better off funding charities that have track records; experimental charities should start small and accumulate track records. This is why we are comfortable with our bias toward larger charities.

Road safety

From the abstract of a new study from the Center for Global Development:

In the experiment, messages designed to lower the costs of speaking up were placed in a random sample of over 1,000 minibuses in Kenya. Analysis of comprehensive insurance data covering a two year period that spanned the intervention shows that insurance claims for treated vehicles decreased by one-half to two-thirds, compared with the control group. In addition, claims involving an injury or death decreased by at least 50 percent. Passenger and driver surveys indicate that passenger heckling contributed to this reduction in accidents.

I haven’t read this paper (just the abstract), largely because we haven’t seen any major charities focusing on interventions like this one. Note that the Disease Control Priorities Project sees “increased speeding penalties, enforcement, media campaigns, and speed bumps” as having high potential cost-effectiveness (see this table).

Qualitative evidence vs. stories

Our reviews have a tendency to discount stories of individuals, in favor of quantitative evidence about measurable outcomes. There is a reason for this, and it’s not that we only value quantitative evidence – it’s that (in our experience) qualitative evidence is almost never provided in a systematic and transparent way.

If a charity selected 100 of its clients in a reasonable and transparent way, asked them all the same set of open-ended questions, and published their unedited answers in a single booklet, I would find this booklet to be extremely valuable information about their impact. The problem is that from what we’ve seen, what charities call “qualitative evidence” almost never takes this form – instead, charities share a small number of stories without being clear about how these stories were selected, which implies to me that charities select the best and most favorable stories from among the many stories they could be telling. (Examples: Heifer International, Grameen Foundation, nearly any major charity’s annual report.)
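The selection step of the hypothetical booklet above is trivial to do transparently. A sketch, assuming (hypothetically) that the charity has a full client roster for the reporting period:

```python
import random

# Hypothetical client roster; in practice this would be the charity's
# complete list of clients for the period being reported on.
clients = [f"client-{i:03d}" for i in range(1, 501)]

# Fixing and publishing the seed makes the sample reproducible, so
# readers can verify the stories were not cherry-picked after the fact.
rng = random.Random(2009)
sample = rng.sample(clients, 100)

print(len(sample), sample[:3])
```

The hard part is not the sampling but the commitment: publishing every sampled client's unedited answers, favorable or not.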

A semi-exception is the Interplast Blog, which, while selective rather than systematic in what it includes, has such a constant flow of stories that I feel it has assisted my understanding of Interplast’s activities. (Our review of Interplast is here.)

I don’t see many blogs like this one, and I can’t think of a particularly good reason why that should be the case. A charity that was clear, systematic and transparent before-the-fact about which videos, pictures and stories it intended to capture (or that simply posted so many of them as to partly alleviate concerns about selection) would likely be providing meaningful evidence. If I could (virtually) look at five random clients and see their lives following the same pattern as the carefully selected “success stories” I hear, I’d be quite impressed.

But this sort of evidence seems to be even more rare than quantitative studies, which are at least clear about how data was collected and selected.