The GiveWell Blog

Followup on Fryer/Dobbie study of “Harlem miracle”

I recently posted about a new, intriguing study on the Harlem Children’s Zone. It’s now been a little over a week since David Brooks’s op-ed brought the study some major attention, and I’ve been keeping up with the reaction of other blogs. Here’s a summary:

Methodology: unusually strong

I haven’t seen any major complaints about the study’s methodology (aside from a couple of authors who appear to have raised possible concerns without having fully read the study – concerns that I don’t believe apply to it). The Social Science Statistics Blog described it as “a nice example of careful comparisons in a non-experimental situation providing useful knowledge.”

Many studies in this area – particularly those put out by charities – have major and glaring methodological flaws, or leave open alternative explanations (example). We feel that this one doesn’t, which is part of what makes it so unusual and interesting.

Significance: possibly oversold

David Brooks came under a lot of criticism for his optimistic presentation of the study, stating “We may have found a remedy for the achievement gap.” Thoughts on Education Policy gives a particularly thorough overview of reasons to be cautious, including questions about whether improved test scores really point to improved opportunities and about whether this result can be replicated (“Each school has an inordinate number of things that make it unique — the Promise Academy more so than most”).

The blog’s “What should we learn from the Promise Academy?” series (begun today) looks interesting; it elaborates on the latter point by highlighting the many ways in which this school is unusual.

We feel that these concerns are valid, and expressed similar concerns ourselves (here and here). However, given the weak results from past rigorous studies of education, we still feel that the results of this study bear special attention (and possible replication attempts).

Teaching to the test?

Aaron Pallas’s post on Gotham Schools raises the most interesting and worrying concern that I’ve seen.

In the HCZ Annual Report for the 2007-08 school year submitted to the State Education Department, data are presented on not just the state ELA and math assessments, but also the Iowa Test of Basic Skills. Those eighth-graders who kicked ass on the state math test? They didn’t do so well on the low-stakes Iowa Tests. Curiously, only 2 of the 77 eighth-graders were absent on the ITBS reading test day in June, 2008, but 20 of these 77 were absent for the ITBS math test. For the 57 students who did take the ITBS math test, HCZ reported an average Normal Curve Equivalent (NCE) score of 41, which failed to meet the school’s objective of an average NCE of 50 for a cohort of students who have completed at least two consecutive years at HCZ Promise Academy. In fact, this same cohort had a slightly higher average NCE of 42 in June, 2007. [Note that the study shows a huge improvement on the high-stakes test over the same time period, 2007-2008.]

Normal Curve Equivalents (NCE’s) range from 1 to 99, and are scaled to have a mean of 50 and a standard deviation of 21.06. An NCE of 41 corresponds to roughly the 33rd percentile of the reference distribution, which for the ITBS would likely be a national sample of on-grade test-takers. Scoring at the 33rd percentile is no great success story.
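
As a quick check on that arithmetic – a minimal sketch, assuming NCE scores are normally distributed with the mean and standard deviation quoted above – the percentile conversion looks like this:

```python
from statistics import NormalDist

# NCE scale as described above: mean 50, standard deviation 21.06
NCE_MEAN, NCE_SD = 50, 21.06

def nce_to_percentile(nce: float) -> float:
    """Convert an NCE score to its percentile under a normal reference distribution."""
    z = (nce - NCE_MEAN) / NCE_SD
    return NormalDist().cdf(z) * 100

print(round(nce_to_percentile(41)))  # -> 33, matching the "33rd percentile" figure
```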

One possible interpretation is that cheating occurred on the higher-stakes tests, but this seems unlikely since performance was similarly strong on lower-stakes practice tests (specifics here). Another possible interpretation is that Harlem Children’s Zone teachers focused so narrowly on the high-stakes tests that they did not teach transferable skills (as Mr. Pallas implies).

We haven’t worried much about the “teaching to the test” issue to date, if only because so few interventions have shown any impact on test scores; at the very least, raising achievement test scores doesn’t appear to be easy. But this is a major concern.

Another possible interpretation is that stronger students were absent on the day of the low-stakes test, for some irrelevant reason – or that Mr. Pallas is simply misinterpreting something (I’ve only read, not vetted, his critique).

Bottom line

We know that the Fryer/Dobbie study shows an unusually encouraging result with unusual rigor. We don’t know whether it’s found a replicable way to improve important skills for disadvantaged children.

We feel that the best response to success, in an area such as this one, is not to immediately celebrate and pour in funding; it’s to investigate further.

“Did it happen?” and “did it work?”

You donate some money to a charity in the hopes that it will (a) carry out a project that (b) improves people’s lives. To feel confident in your donation, you should feel confident about both.

In most areas of charity, we feel that people overfocus on “did it happen?” relative to “did it work?” People often worry about charities’ stealing their money, swallowing it up in overhead, etc., while assuming that if the charity ultimately uses the funds as it says it will, the result will be good. Yet improving lives is more complicated than charities generally make it sound (see this recent post of ours). This partial list of failed programs is made up entirely of programs that appear to have been carried out quite competently, and simply didn’t improve the lives of clients.

In international aid, the relative importance of “did it happen?” grows for a few reasons:

  • International charities work far away and often in many different countries at once. It often isn’t feasible for their main stakeholders (Board members, major donors, etc.) to check that projects are being carried out.
  • International charities are working within foreign political systems, cultures, etc. Materials can be stolen or misappropriated en route. Locals can take advantage of their superior knowledge and “game the system.”
  • Many of the activities international charities carry out are proven to work (though many others are not). Using insecticide-treated nets will reduce the risk of malaria (more); an appropriate drug regimen will cure tuberculosis (more); vaccinations will prevent deadly diseases (writeup forthcoming). These claims have been proven and are essentially not subject to debate. This is not the case in the developed world – most of the programs charities work on have not been shown to improve outcome measures of health, standard of living, etc. (See, for example, this guest blog post.)

“Did it happen?” is a question that can largely be answered by informal, qualitative spot-checks. That’s why we would like to see more and better qualitative evidence. By contrast, to know whether a program worked, you need to somehow compare what happened to clients with what would have happened without the program – something that is often hard to have confidence in without formal outcomes tracking and evaluation.
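
To make that distinction concrete, here’s a toy sketch (the numbers are hypothetical, invented purely for illustration): a program can clearly “happen” – clients served, scores recorded – while its estimated effect, measured against a comparison group standing in for the counterfactual, is close to zero.

```python
from statistics import mean

# Hypothetical outcome scores, for illustration only.
participants = [62, 58, 71, 66, 60]   # clients who received the program
comparison   = [61, 57, 70, 64, 59]   # similar people who did not

# "Did it happen?" -- the program ran, and clients finished with these scores.
print(f"participant mean: {mean(participants):.1f}")

# "Did it work?" -- compare against the counterfactual, proxied here by
# the comparison group. The estimated effect is the difference in means.
effect = mean(participants) - mean(comparison)
print(f"estimated effect: {effect:+.1f} points")  # small, despite decent scores
```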

Therefore, we believe that the role of site visits, qualitative evidence, spot-checks, etc. is likely more important in international giving than in domestic giving. In international aid, delivering proven programs (particularly medical ones) is a large part of the battle. In the U.S., most reputable charities are probably doing what they say they’re doing; the question is whether what they’re doing is effective.

Funding research

At some point we’d like to investigate the idea of donating to research projects. Research not motivated by profit is credited with many large and meaningful successes, both in medical areas (most recently, the development of a rotavirus vaccine) and in other areas (most notably, the Green Revolution).

There are serious concerns when donating in this area. For example, a recent working paper by Austan Goolsbee (h/t Greg Mankiw) argues that an additional dollar of U.S. research funding doesn’t lead to more research – only to higher salaries for the people already in the field.

One of the big challenges, it seems to me, will be coming up with ways to make educated guesses about which areas of research are funded to the point of diminishing returns, and which aren’t.

Key resources will be surveys such as the relatively new public survey of research & development funding for “neglected diseases” (h/t Malaria Matters), defined as diseases that predominantly affect the developing world. Data like these could allow a “rough cut” at which areas are over- vs. under-funded: compare each area’s funding dollars to its death toll, DALY burden, etc.
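
To illustrate the “rough cut” – a minimal sketch with entirely hypothetical funding and burden figures, not numbers from the survey:

```python
# Hypothetical figures, for illustration only; real inputs would come from
# a funding survey like the one above plus published DALY burden estimates.
funding_usd  = {"disease_a": 500e6, "disease_b": 50e6, "disease_c": 5e6}
burden_dalys = {"disease_a": 1.0e6, "disease_b": 2.0e6, "disease_c": 0.3e6}

# Rough cut: research dollars per DALY of burden. A low ratio hints at
# relative underfunding; a high one hints at possible diminishing returns.
ratios = {d: funding_usd[d] / burden_dalys[d] for d in funding_usd}
for disease, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{disease}: ${ratio:,.0f} of R&D funding per DALY")
```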

Of course such a heuristic can’t capture the whole picture – certain diseases may be costlier to investigate, there may be more promising paths on some than on others, etc. Is there a better way to get at this question?

Two worlds

Why are people so excited about one study of one charter school showing improved performance on math tests? (Our coverage of the study is here.)

It’s because in academic circles, improving academic performance is seen as an extremely thorny problem with a very long list of past failures. (See pages 1-2 of the paper for an overview.) The very strong default assumption is that an education program will fail to improve performance – to the point where a one-time, one-standard-deviation bump in math scores is considered (by David Brooks) to be a “miracle.”

But you’d never know it from the world of education philanthropy. Attend any fundraiser or read any annual report and all you’ll hear is stories of success.

There’s a similar split between two worlds in international philanthropy. Academics nearly all stress the challenges, the frustrations, and the sense that progress hasn’t matched expectations. Talk to a charity and you’ll hear “success, success, success.”

Many people are incredulous that we recommend so few charities. I can only guess that that’s because they’re coming from the world of fundraisers, where every charity is assumed to be a success. In our world, “recommended” is the exception, not the default.

Where I stand on education, my former favorite cause

Education used to be my favorite cause. My enthusiasm waned as I saw both the cost-effectiveness of international aid and the apparent futility of efforts to improve education. (Elie’s 2007 post captures many of my thoughts.) The study that I’ve been blogging about today (here and here) provides a firmer grounding for our optimism about high-intensity charter schools, and challenges the idea that there aren’t good opportunities for donors in education.

However, I’m still not ready to prioritize education again, personally. One of the things that surprised me most in studying education was not just the difficulty of finding programs that could improve academic performance, but also the complete lack of rigorous evidence that education is key to later life outcomes. I would be fascinated to see a rigorous study of how the students who benefit from excellent charter schools perform later in life – in terms of income, job satisfaction, criminal records, etc. Without such evidence, I’m not convinced that raising a child’s math score raises their life prospects, especially in a way that goes beyond “signaling” (i.e., allowing them to outcompete other people due to superior credentials).

Would putting every child in America in a good school that makes sure they can do math lead to a much better society? I used to assume it would; I’m no longer so sure, and recent information doesn’t change that.

For now, I’m going to wait and see. I’d like to see the academic reaction to the Fryer and Dobbie paper on the Harlem Children’s Zone. If others agree about the rigor and significance of its findings, I’d like to see who steps forward to continue replicating and examining this approach. The Social Innovation Fund would seem to be one strong candidate.

In the meantime, I’m going to be putting my own money into programs that are proven and replicable and don’t have enough funding – things like tuberculosis control and distribution of insecticide-treated nets.

Perhaps, at some point, I will feel that there is an education program that meets all three of these criteria as well. At that point I may start giving to it, even if it’s many times as expensive per person as developing-world aid.

Fryer and Dobbie on the Harlem Children’s Zone: Significance

My last post summarized a very recent paper by Fryer and Dobbie, finding large gains for charter school students in the Harlem Children’s Zone. (You can get the study here (PDF)).

I believe that this paper is an unusually important one, for reasons that the paper itself lays out very well in the first couple of pages: a wealth of existing literature has found tiny, or zero, effects from attempts to improve educational outcomes. This is truly the first time I (and, apparently, the authors) have seen a rigorously demonstrated, very large effect on math performance for any education program at all.

This study does not show that improving educational outcomes is easy. It shows that it has been done in one case by one program.

The program in question is a charter school that is extremely selective and demanding in its teacher hiring (see pages 6-7 of the study), and involves highly extended school hours (see page 6). Other high-intensity charter schools with similar characteristics have long been suspected of achieving extraordinary results (see, for example, our analysis of the Knowledge is Power Program). This study is consistent with – but more rigorous than – existing evidence suggesting that such high-intensity charter schools can accomplish what no other educational program can.

Those who doubt the value of rigorous outcomes-based analysis should consider what it has yielded in this case. Instead of scattering support among a sea of plausible-sounding programs (each backed with vivid stories and pictures of selected children), donors – and governments – are now in a position to focus on one approach far more promising than the others. They can work on scaling it up across the nation and investigate whether these encouraging effects persist, and just how far the achievement gap can be narrowed. As a donor, the choice is yours: support well-meaning approaches with dubious track records (tutoring, scholarships, summer school, extracurriculars, and more), or an approach that could be the key to huge gains in academic performance.

Reasons to be cautious

The result is exciting, but:

  • This is only one study, and the sample size isn’t huge (especially for the most rigorous randomization-based analysis). It’s a very new study – not even peer-reviewed yet – and it hasn’t had much opportunity to be critiqued. We look forward to seeing both critiques of it and (if the consensus is that it’s worth replicating) the results of replications.
  • Observed effects are primarily on math scores. Effects on reading are encouraging but smaller. Would this improved performance translate to improvements on all aspects of the achievement gap (either causally or because the higher test scores are symptomatic of a general improvement in students’ personal development)? We don’t know.
  • Just because one program worked in one place at one time doesn’t mean funding can automatically translate to more of it. Success could be a function of hard-to-replace individuals, for example. Indeed, if the consensus is that this program “works,” figuring out what about it works and how it can be extended will be an enormous challenge in itself.
  • With both David Brooks and President Obama lending their enthusiasm fairly recently, the worthy mission of replicating and evaluating this program could have all the funding it needs for the near future. Individual donors should bear this in mind.