Futility of standardized metrics: An example

This post is more than 15 years old

We often hear calls to “standardize metrics” so that nonprofits’ outcomes can be compared directly to each other. As an example of why we find this idea unpromising, I’d like to review some of our work on the first cause we ever investigated: employment assistance in NYC.

We received 19 applications from employment assistance programs in NYC. 7 applicants were able to provide relatively long-term data on job placements and retentions. We initially hoped to compare these outcomes to each other and get some sense of which organization was delivering the most “jobs created per dollar.” It didn’t work out (to put it mildly).

Breakdown #1: youth vs. adults

The HOPE Program, Highbridge Community Life Center, St. Nick’s and CCCS serve unemployed/underemployed adults; Covenant House, Year Up and The Way to Work (formerly Vocational Foundation) explicitly focus on “disconnected youth.” It was immediately clear to us that we would have to subdivide the organizations, and could not directly compare something like Year Up to something like The HOPE Program, since the challenges and opportunities are so different for a youth seeking a “first job” vs. a struggling adult.

Breakdowns beyond the mission statements

The three “youth” organizations listed above may appear similar, if you’re going only off of their basic mission statements.

Organization	Mission statement
Covenant House	Through our job training programs, homeless teens can gain skills in a specific vocation and also learn what they need to know about job hunting and the professional world. We also give them interview clothes. Job training programs include courses in the culinary arts, desktop publishing, healthcare, public safety, computer skills, woodworking, and more.
Year Up	Year Up’s mission is to close the Opportunity Divide by providing urban young adults with the skills, experience, and support that will empower them to reach their potential through professional careers and higher education. We achieve this mission through a high support, high expectation model that combines marketable job skills, stipends, internships, college credit, a behavior management system and several levels of support to place these young adults on a viable path to economic self-sufficiency.
Way to Work (formerly Vocational Foundation)	At The Way to Work we are committed to empowering young New Yorkers ages 17-24 with the tools needed to achieve their highest potential. Formerly known as the Vocational Foundation, Inc. or VFI, we have created lasting impact through our comprehensive, individualized approach to career training, GED preparation, professional and personal counseling, job placement and retention services.

But when we got into the details of how the different organizations recruit and select clients, it became clear that these three organizations cannot at all be compared in an apples-to-apples way.

Year Up, for example, not only requires a high-school degree (or GED) of all applicants, but conducts a competitive application process and – from the data we looked at – accepted fewer than 1/3 of applicants. (Details.) By contrast, Covenant House asserts that over half its clients have dropped out of school by tenth grade. Year Up places a substantially higher portion of its clients in substantially better-paying jobs than Covenant House, but given the differences in whom they serve, shouldn’t this be expected regardless of the impact of the actual employment assistance programs?

What about Way to Work (formerly Vocational Foundation)? It’s clear that the organization is far less selective than Year Up, as only 23% of its clients have a high school degree or GED, and there does not appear to be a “competitive” process (i.e., willing applicants’ being turned away). However, there does appear to be a good deal of “self-selection” going on – 2/3 of clients drop out early in the program. (Details.) We have no directly comparable data on this organization’s clients and Covenant House’s: 75% of Way to Work’s clients live in “public housing” vs. 53% of Covenant House’s clients who are “homeless” (staying in a Covenant House shelter), and 77% of Way to Work’s clients have no high school degree (or GED) vs. 50%+ of Covenant House’s clients having dropped out of high school by 10th grade. Covenant House has lower placement rates, and we would guess that it is serving the more challenged population, but we can neither verify nor quantify the extent to which it is.

The importance of differences in target populations

At one point we had hoped to – and attempted to – use Census data to figure out how important the differences in target populations were. This proved futile as anything but a super-rough contextualization: the Census data can only be narrowed down in certain ways, and we only had certain information about target populations, and there was no way to really make them match up. However, for this post I pulled together some data on 1999 wage earnings (focusing on the percentage of relevant people who earned more than $20k in 1999) to give a sense of how much small differences can matter.

Among 18-24 year olds no longer in school, 27% of those with a high school degree (only) made $20k+; only 17% of those without a high school degree did. People with higher degrees made far more.
Among 18-24 year olds no longer in school with a high school degree, earnings varied substantially by neighborhood. 59% of those in the same area as the Covenant House office (Chelsea/midtown) made $20k+, while 27% of those in the same area as the Year Up and Way to Work offices (Financial District) made $20k+.

Details here (XLS)
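
To make the size of these gaps concrete, here is a minimal sketch that only restates the percentages quoted above as ratios. The dictionary and function names are my own illustrative assumptions; they do not come from the spreadsheet.

```python
# Shares earning $20k+ in 1999 among 18-24 year olds no longer in school,
# copied from the two figures quoted above (not re-derived from Census data).
share_20k_by_education = {
    "high school degree only": 0.27,
    "no high school degree": 0.17,
}
share_20k_by_area = {  # among those with a high school degree
    "Chelsea/midtown": 0.59,
    "Financial District": 0.27,
}


def relative_gap(shares):
    """Return the ratio of the highest share to the lowest share."""
    return max(shares.values()) / min(shares.values())


education_gap = relative_gap(share_20k_by_education)
area_gap = relative_gap(share_20k_by_area)

print(f"education gap: {education_gap:.2f}x")
print(f"area gap: {area_gap:.2f}x")
```

On these figures, the neighborhood gap is larger than the gap associated with having a high school degree, which is the sense in which seemingly small differences in target population can matter a great deal.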

Apples to oranges

Looking over our overview table for finalists in this cause, it becomes clear how many differences make it impossible to compare their outcomes directly. Their programs target different populations, have different requirements (often varying significantly even within a single charity, which may offer different programs), and train clients in different skills. The adult programs have even clearer differences than the youth programs. If St. Nick’s places 2/3 of its clients in Environmental Remediation Technician jobs while Highbridge places just under half in Nurse Aide jobs … what does this tell us about how the two compare?

I wouldn’t be ready to ascribe meaning to a direct comparison of job placements between two charities unless the two charities were working in the same region, with the same requirements, similar actual client populations (in terms of age, educational attainment, etc.) and essentially the same program (since self-selection effects would be different for different programs). Even then, something as simple as a difference in advertising techniques could cause differences in the target populations, differences that could swamp any effects of the programs.

Outcomes vs. impact

What if we could know, for each charity, not just how many clients they placed in jobs, but how many they placed in jobs who wouldn’t have gotten such jobs without the charity’s assistance?

If we had this information, I’d be more ready to compare it across charities. But this information is impact – what Tactical Philanthropy has called the “holy grail” of philanthropy. It’s extraordinarily rare for a charity even to attempt to collect reliable evidence of impact (more).
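
The distinction can be sketched in a few lines. Everything below is hypothetical: the two programs, their client counts, and especially the `counterfactual_rate` (the share of clients who would have found such jobs anyway) are assumptions invented for illustration, not data from any charity discussed here. In practice that rate is exactly the number we almost never have.

```python
from dataclasses import dataclass


@dataclass
class Program:
    name: str
    clients: int
    placed: int                 # observed outcome: clients placed in jobs
    counterfactual_rate: float  # ASSUMED share who would get such jobs anyway

    def placement_rate(self):
        """Observed outcome: share of clients placed in jobs."""
        return self.placed / self.clients

    def impact(self):
        """Placements beyond what would have happened without the program."""
        return self.placed - self.counterfactual_rate * self.clients


# Hypothetical numbers for illustration only.
selective = Program("Selective program", clients=100, placed=80,
                    counterfactual_rate=0.70)
open_door = Program("Open-door program", clients=100, placed=40,
                    counterfactual_rate=0.15)

for p in (selective, open_door):
    print(f"{p.name}: placement rate {p.placement_rate():.0%}, "
          f"extra placements {p.impact():.0f}")
```

Under these made-up assumptions, the program with the higher placement rate is the one with the smaller impact, which is why outcomes alone cannot stand in for impact.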

Our current approach is to seek out the few charities that can give at least somewhat compelling evidence of impact, and recommend them, with the quantification of outcomes and cost-effectiveness as a secondary consideration.

It is simply not feasible to compare charities across large sectors (something like “Employment assistance for disconnected youth in New York City”) in an apples-to-apples way. Even if the charities collected all the information we would like, the fundamentals of their programs and target populations would have to reach an unrealistic degree of similarity.

Comments

Steve Goldberg on May 12, 2010 at 12:27 pm said:

Holden, your blog prompts me to ask, Why would we want to compare nonprofits’ outcomes directly to each other? Is that the way anyone makes funding decisions? Do we want to take a bunch of ostensibly similar nonprofits, rank them in order of impact from 1 through n, and then fund them in order? Or would we like to identify a group of well-run, effective nonprofits and a group of poorly-run, less effective nonprofits and spend our time figuring out which of the former we’d like to support? How about four such groups: excellent, good, average, and poor? Maybe I want to help a “good” nonprofit that’s near me or helps a group I care about rather than an “excellent” one farther away or serving a different population.

It would be a very good thing if people used performance metrics to narrow down their choices and stop supporting demonstrably weak nonprofits and collectively gave more to growth-ready organizations that have some meaningful evidence of impact. As the Hewlett Foundation recently observed in its excellent whitepaper, “The Nonprofit Marketplace: Bridging the Information Gap in Philanthropy,” “Reallocating just 10 percent of the current $300 billion annual fund flow to the best performers would have a similar effect as raising billions in new funds—with nowhere near the same cost in fundraising time and energy.” That’s not futile at all.

J. S. Greenfield on May 12, 2010 at 3:33 pm said:

At the risk of pulling this back toward a prior discussion on “fairness,” I find it interesting that the Tactical Philanthropy post you cite seems to argue for criteria that is the opposite of the criteria GiveWell espouses.

That is, they seem to be arguing that focusing solely on results/impact is an unreasonable standard, because 1) it is too biased toward mature charities because of the inherent inability to reasonably measure impact in the near-term, and to a lesser degree, because 2) it fails to distinguish between charities that have been lucky (and are unlikely to be able to sustain such), vs. those that are high-performing (and are likely to be able to sustain such).

They seem to argue that, instead, one should be betting on the team, and near-term recognizable high-performing behaviors, based on the expectation that high performing people/programs will have the best chance at achieving the highest impact, in the long run.

I’m a bit confused since you cite the Tactical Philanthropy post seemingly in concurrence with it, yet it seems counter to GiveWell’s approach and philosophy of charity evaluation.

Do I misread that? Or how do you reconcile such?

Holden on May 14, 2010 at 7:46 am said:

Steve, what I’m saying is that I don’t see a meaningful way to use outcomes data to distinguish between charities at all. Year Up, VFI and Covenant House have wildly diverging rates of job placement, but I don’t think this divergence can be used to come to any meaningful conclusions about the difference in effectiveness between the three.

I’m not saying that there’s no way of distinguishing between charities; I’m saying that comparing outcomes data is a dead end.

J.S., I linked to the Tactical Philanthropy post only to make the point that evidence of impact is considered rare. I didn’t mean to imply an agreement with the post as a whole (in fact we criticized that post and related ones at the time). That said, I think Sean intended to argue for funding high-performance charities as a way to get more impact in the end, not for reasons of “fairness.”

J. S. Greenfield on May 15, 2010 at 8:35 am said:

It definitely wasn’t and isn’t my desire to debate the semantics of the term “fairness” again. My mention of such was intended merely as an acknowledgment that my comment was taking the discussion a bit off-topic, to the domain of another recent discussion. 😉

Bill Huddleston on May 20, 2010 at 12:04 pm said:

Geography Matters

That, and a lot of other fundamentally common-sense observations, appear to be missing from many of these discussions about non-profit effectiveness.

If the non-profit serves a children or youth population, the physical location of the non-profit will be an important consideration, and indeed, at times may be the most important consideration. This is on a micro level, e.g. in the greater Washington DC area no one is going to transport a child from the Route 1 corridor in Alexandria to a youth program in Silver Spring (opposite sides of the Washington Beltway), much less on a national level.

Regards,
Bill Huddleston
The CFC Coach
BillHuddleston1 at gmail dot com

P.S. In terms of workplace giving fundraising, such as the Combined Federal Campaign, geography matters there too.

Kevin Rafter on May 21, 2010 at 6:55 pm said:

It seems to me that you’re focused on the organizations’ outputs (employment assistance) but what you want to “buy” are social outcomes (fewer unemployed adults). Each of these organizations has different approaches to the same essential challenge. You may not be able to find simple measures to grade them on, but you can conduct an analysis of which combination of strategy and execution leads to better outcomes.

Holden on May 24, 2010 at 11:18 am said:

Bill, I completely agree. Geography can be one of many factors that makes apples-to-apples comparisons impractical.

Kevin, I agree with you as well and that was largely the point I was trying to make – that we need context-sensitive analysis rather than standardized metrics.

Comments are closed.
