The GiveWell Blog

Not everyone under-evaluates …

Fundraisers seem to do a phenomenal job.

Somehow, you don’t see fundraisers making a lot of arguments that “Money spent on evaluation means fewer letters mailed out” or “Evaluation is difficult and you can never really isolate causality perfectly.” Instead, you see them testing. And testing. And testing. And learning things that are far from obvious. And testing again.

Maybe it’s because they’re the ones the nonprofits rely on to stay in business. Program, on the other hand, doesn’t have to be as good as it can be … unless we demand it.

Literature reviews

All I was trying to say in the last post could be summed up like this:

This is an awesome literature review about early child care programs. It describes how the author found all the papers she discusses … it is totally straightforward about what the methodological strengths and weaknesses of each paper are … and, hold on to your seats – the tables on pages 218 and 223 can only be described as “rock star.” At a glance, you can see all the studies under discussion, who they looked at, how they looked, and what they found.

The literature review I discussed on Saturday is nothing of the sort. It’s unclear about study design, it makes broad claims whose support is unclear, and of course, there are no awesome tables.

If only people were as determined to take a tough look at microlending as they are at Head Start. But of course, Head Start is politics; it affects us all; it’s important; it’s difficult; it’s controversial; it needs to be argued about. Microlending is charity, so it’s none of those things. That all checks out, right?

Microlending: The mystery deepens

Goodness, this post is long and dry. The headline is: I read the paper everyone points to as “hard evidence of microfinance’s effectiveness,” and I came out with tons of questions and a need to visit the library. I’ve learned nothing about how microlending works (is it financing investment? Smoothing consumption? What kinds of activity is it making possible?), and all of the data for how well it works leaves me with about 1000 methodological concerns, possibly just because of how vaguely the studies are described.

The paper is published by the Grameen Foundation and available here.

Concerns about bias

As in education, selection bias is a major concern in evaluating microfinance. If Person X is ready to take out a loan, confident that she will pay it back, while Person Y isn’t – who would you bet on, even if no loan is given? The Coleman study described on pages 20-21 gives an excellent illustration of this issue. It’s the only study in the paper that uses what I would call the “ideal” design: inviting twice as many participants to a program as it has slots, a year in advance, and then choosing the participants at random. The study found that participants in the credit program were generally wealthier than non-participants, but that once you controlled for this, the program didn’t appear to make them any better off.

The review author points out that the study was done in Thailand, which already has a government-sponsored subsidized credit system. So we agree that the paper doesn’t tell us much about the impact of microlending in general … but it does show the perils of selection bias, and I’ve led with it because this problem affects so many of the other studies.

The worst are the studies – and there are many – that simply compare living standards for clients vs. non-clients, without examining whether the two groups may have been different to begin with. These could easily just be showing the same effect as the study above: borrowers are wealthier before they ever borrow. Most of the studies discussed early in the paper look likely to have this problem (or at least the description doesn’t make it clear how they deal with it): Hossain 1988 (page 16), SEWA study (24-25), Mibanco study (26 – it isn’t entirely clear who’s being compared to whom, but all the differences discussed are between clients and non-clients with no discussion of where they started out), ASHI study (27), and the FINCA studies (28). The last two look at changes in incomes, not just incomes, but if one group started off wealthier, I’d think they’d be more likely to increase their incomes too (regardless of any help from charities).
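To make this concern concrete, here’s a tiny simulation I put together (entirely my own illustration – nothing like it appears in the review): even in a world where loans accomplish absolutely nothing, letting people self-select into borrowing produces a borrower group that looks richer than non-borrowers.

    import random

    random.seed(0)

    # Toy population: each person has some "drive," and income partly reflects it.
    people = []
    for _ in range(10_000):
        drive = random.gauss(0, 1)
        income = 1000 + 300 * drive + random.gauss(0, 300)
        people.append((drive, income))

    # Self-selection: the more driven are the ones who take out loans.
    # The loan itself has ZERO effect on income in this toy world.
    borrowers = [income for drive, income in people if drive > 0.5]
    non_borrowers = [income for drive, income in people if drive <= 0.5]

    print(f"Average income, borrowers:     {sum(borrowers) / len(borrowers):.0f}")
    print(f"Average income, non-borrowers: {sum(non_borrowers) / len(non_borrowers):.0f}")
    # Borrowers come out several hundred dollars "richer," even though the
    # "program" did nothing – exactly the gap a naive client/non-client
    # comparison would report as impact.

A client-vs.-non-client comparison with no baseline data has no way of distinguishing that gap from a real program effect.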

Incoming-client comparisons: still a long way from easing my mind

The vast majority of the studies discussed try to get around this problem by comparing incoming clients to existing clients. This seems better than simply comparing clients to non-clients in the same region: incoming clients presumably share most qualities of existing clients, aside from the fact that they haven’t yet benefited from microcredit. But this test is still miles from rigorous. Page 7 points out a couple of potential problems with it – eager borrowers may differ from “wait and see” types, and more importantly (to me), MFIs may loosen their admission standards over time, which would mean that incoming clients are systematically worse off than existing clients for reasons that have nothing to do with the benefits of microloans. And then, of course, there’s just the fact that times change. For example, if microloan programs systematically attract people above a certain income level (as the Coleman study implies), and an economic boom makes everyone wealthier, you’ll see existing clients (originally the only people wealthy enough to enter the program) doing better than new clients (who have just now become wealthy enough).

A relatively simple (imperfect) way to adjust for all this would be to compare incoming clients both to existing clients today, and to those same existing clients at the time they entered the program. This would at least check whether existing clients were systematically better off to start with. Here’s the bad news: if the studies discussed do this, the literature review very rarely mentions it. It occasionally points to a divergence between existing clients and incoming clients on some random quality like age (see page 36), rural/urban (36), schooling (33), etc., implying that the two groups are often not very comparable … but the review is generally short on the details, and in almost every case does not address the issue I’ve highlighted here.
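Here’s roughly what I mean, as a sketch with made-up numbers (the review doesn’t present any dataset in this form, so every figure and variable name below is hypothetical):

    # Hypothetical panel: existing clients' incomes at entry and today,
    # plus a cross-section of incoming clients today. All numbers invented.
    existing_at_entry = [410, 380, 450, 500, 390]   # existing clients, when they joined
    existing_now      = [520, 470, 610, 640, 480]   # the same clients, today
    incoming_now      = [400, 370, 430, 480, 410]   # incoming clients, today

    def mean(xs):
        return sum(xs) / len(xs)

    # Naive comparison: existing clients today vs. incoming clients today.
    naive_gap = mean(existing_now) - mean(incoming_now)

    # The check I want: were existing clients already better off at entry?
    baseline_gap = mean(existing_at_entry) - mean(incoming_now)

    print(f"Naive gap (existing now vs. incoming now):         {naive_gap:.0f}")
    print(f"Baseline gap (existing at entry vs. incoming now): {baseline_gap:.0f}")
    print(f"Gap left over after netting out the head start:    {naive_gap - baseline_gap:.0f}")

It’s not a perfect control – times change, and so might admission standards – but it would at least flag cases where the “impact” is mostly a head start.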

Nearly all of these “incoming-clients” studies show significant positive differences between existing and incoming clients, which would imply that microfinance has improved its clients’ lives – if the concerns above were addressed. But I’m going to have to check out the papers myself before I feel very convinced.

Here’s what we’re left with:

These are all of the studies discussed that appear to address the concerns above in any way:

  1. A 2004 study of a Pakistan program (33-34) compared clients to non-clients starting at similar levels of income, and showed a much larger increase in income for clients (though both groups experienced huge increases – roughly 30% for clients vs. roughly 20% for non-clients). This doesn’t account for “motivation bias” (the “optimism and drive” that taking out a loan may indicate), but at least it’s looking at people who started in about the same place.
  2. A similar study was done on a Bosnia/Herzegovina program in 2005 (35-36), again showing much larger income gains for participants in the programs, and again adjusting for starting income though not for the “optimism and drive” bias.
  3. The Second Impact Assessment of the BRAC program (29) compared changes in clients vs. non-clients; between 1993 and 1996, the % of clients with a sanitary latrine went from 9 to 26, while the % of non-clients with a sanitary latrine went from 10 to 9. The latrine variable is the only one where the paper makes clear that the two groups started in the same place, and the rest of the discussion of the study seems to imply that they were pretty different to start with in other ways.
  4. Page 21 claims that Gwen Alexander “recreated” the design of the randomized Coleman study I led off with, using the same dataset that a bunch of the other studies were working off. It’s totally unclear to me how you recreate a randomized-design study using data that didn’t involve randomized design, and the paper doesn’t fill me in at all on this.
  5. Finally, pages 17-20 discuss a back-and-forth between two scholars, in which the details of what it means to own a “half acre of land” – as well as a debate over a complicated, unexplained methodology that actually appears to be called “weighted exogenous sampling maximum likelihood-limited information maximum likelihood-fixed effects” – appear to make the difference between “microfinance is phenomenal” and “microfinance accomplishes nothing.” The part I’m most interested in is the final paper in this series, Khandker (2005) (discussed on page 19), which draws incredibly optimistic conclusions (crediting microfinance with more than a 15% reduction in poverty over a 6-year period). Unfortunately, the review gives no description of the methodology here, particularly how all the concerns about bias were addressed: all it says is that the methodology was “simpler” than whatever wacky thing was done in the first paper.

Bottom line:

So, bottom line: we have 3 studies (the first three above) showing promising impacts at particular sites, though they were not done by independent evaluators and may suffer from both “publication bias” (charities’ refraining from publishing negative reports) and the “optimism/motivation bias.” We have 2 studies that the review claims found great results with a rigorous methodology, but its description leaves the details of this methodology completely unclear. And, we have a host of studies that could easily simply have been observing the phenomenon that (relatively) wealthier people are more likely to take advantage of microlending programs.

My conclusion? We have to get to the library and read these papers, especially the ones that are claimed to be rigorous.

Conclusion of the review? “The previous section leaves little doubt that microfinance can be an effective tool to reduce poverty” (see page 22 of the 47-page study – before 80% of the papers had even been discussed!). And in the end, that’s why I’m so annoyed right now. This paper does not, to me, live up to the promise it makes on page 6, to “[compile] all the studies … and present them together, in a rigorous and unbiased way, so that we could finally have informed discussions about impact rooted in empirical data rather than ideology and emotion.” It covers some truly low-information studies (like the first set I discussed) while presenting them as evidence of effectiveness; it discusses the most important studies without giving any idea of how (or whether) they corrected for the most dangerous forms of bias. It calls the fact that a paper stimulated debate, rather than unanimity, “unfortunate” (18). It’s peppered throughout with excited praise (like the quote above, and the super-annoying parenthetical on page 29). In the end, I don’t feel very inclined to take its claims at face value until I hit the library myself.

Without either the detail I need or a tone I trust, I don’t feel very convinced right now that microlending improves lives, especially reliably. (I’d put it around 63%.) I’m surprised that this is the paper everyone points to; I’m tempted to say that if more people read it, as opposed to just counting the pages and references, something clearer and more neutral would have become available by now.

Averages

Averages really annoy me. Average income, average test score, etc. When we’re talking about any kind of analysis of people, I have a hard time thinking of any case where you should be looking at the average of anything.

I much prefer “% of people above some threshold” type measures: % of students who graduated in 4 yrs or less, % of students scoring at proficiency level 3 or higher, % of families earning $20k/yr or less. This kind of metric is about 1.2x as complicated, and 2000x as meaningful, as an average.
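A tiny made-up example of what I mean (the numbers are invented purely for illustration): two classrooms can have the exact same average score while telling completely different stories about how many students actually cleared the bar.

    # Two invented classrooms with identical average scores.
    class_a = [81, 81, 81, 81, 81, 80, 80, 80, 80, 80]        # everyone clustered right at the bar
    class_b = [100, 100, 100, 100, 100, 61, 61, 61, 61, 61]   # half far above, half far below

    for name, scores in [("Class A", class_a), ("Class B", class_b)]:
        average = sum(scores) / len(scores)
        pct_at_bar = 100 * sum(s >= 80 for s in scores) / len(scores)
        print(f"{name}: average = {average:.1f}, % scoring 80 or above = {pct_at_bar:.0f}%")

    # Both print an average of 80.5 – but 100% of Class A cleared 80,
    # versus only 50% of Class B.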

Just thought you’d like to know.

Experience vs. data, or, why I just muted the Yankees game

So I’ve been watching the ballgame, and it struck me how much sports announcers have impacted my outlook on charity. I can explain.

The most common form of “evidence” we get from charities goes something like this: “We don’t have the data, but we’re here, every day. We work with the children, personally. We’ve been doing this for decades and we’ve accumulated a lot of knowledge that doesn’t necessarily take the form of statistics.”

Put aside, for a minute, the fact that we get that same story from all 150 charities we’re deciding between (all of which presumably think their activities are most deserving of more funding). There’s another problem with the attitude above, one that occurs to me every time I hear Michael Kay announcing a baseball game. In sports, unlike in charity (and really unlike in most things, which is why I find it an interesting case study), the facts are available – and when you look at them, you realize just how little that “on the ground” experience can be worth.

The fact is that baseball announcers and sportswriters spend their entire lives watching, studying, and thinking about sports. Many of them are former athletes who have played the game themselves. They are respected, they are paid to do what they do, and they are more experienced (i.e., they’ve seen more) than I’ll ever be in my life. And yet so many of them truly know absolutely nothing.

“Jeter’s a whole different player in October,” says Mr. Kay (demonstrably false). “You don’t want young pitchers carrying you in the playoffs.” (Comically false – 3 of the last 5 World Series champions had rookie closers.) I’m not giving any more examples – this post would hit 30,000 words in a heartbeat. But I’m happy to refer you to sources that give 2-3 examples per day of seasoned professionals – who’ve spent their whole lives on this stuff – saying things that are obviously, intuitively, factually, empirically, demonstrably, completely wrong.

It hits me over and over again, and I still haven’t quite gotten used to it. My only explanation is that humans have an incredible ability to ignore what they actually see, in favor of (a) what they expect to see and (b) what they want to see. Now when I talk to an Executive Director or Development Officer whose life consists of running a charity and whose livelihood depends on convincing people that it’s the world’s best way to help people … I don’t know how much these factors cloud their judgment. Maybe not at all, in some (truly amazing, borderline inhuman) cases. But when they assure me that outcomes data isn’t necessary because they’ve been doing this for years, forgive me for having trouble swallowing this: I can’t help but think of Michael Kay, a man who’s done very little with his life but watch the Yankees, and still manages to know nothing about them.