The GiveWell Blog

Microfinance/education program didn’t work as expected

A reader was good enough to send in a Lancet article (free registration required for full text) about a well-designed study of a combination microfinance/education program in South Africa.

Study design, strengths and weaknesses

A program consisting of both loans and group meetings was rolled out to 8 villages in rural South Africa, but the villages were randomly split into 4 that received it right away and 4 that received it 3 years later. Meetings included a curriculum that “covered topics including gender roles, cultural beliefs, relationships, communication, intimate-partner violence, and HIV, and aimed to strengthen communication skills, critical thinking, and leadership” (pg 1975).

Researchers hypothesized that (a) women in the loan groups would have fewer experiences of intimate-partner violence (presumably due to being financially/culturally more empowered); (b) this in turn would be connected with less unprotected sex in their households; (c) this in turn would slow the spread of HIV in their villages. A very ambitious theory of how to slow the spread of HIV – but to the researchers’ credit, they specified their hypotheses formally before conducting the study, as well as registering it on ClinicalTrials.gov. Combined with the use of randomization, this study had just about all the ingredients for avoiding the plague of publication bias.

A problem with the study, which the researchers partially acknowledge (pg 1981), is that it was only conducted in 8 villages total (4 receiving the program and 4 not receiving it). Therefore, it’s hard to say with confidence that any observed differences were due to the program as opposed to other differences between one randomly chosen set of 4 villages and another. Villages were similar on most observable characteristics, but very different on a few (see pg 1980).
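
To give a feel for how fragile a 4-versus-4 comparison is, here is a minimal simulation sketch. The baseline rate and the amount of village-to-village variation are invented for illustration; the point is only that, with so few clusters, sizable gaps between arms arise by chance even when there is no program effect at all.

```python
# Toy illustration: with only 4 villages per arm and plausible village-to-village
# variation, the two arms often differ substantially by chance alone.
# All numbers here are invented for illustration.
import random

random.seed(1)

def village_rate():
    # Hypothetical baseline outcome rate in a village: ~30% on average,
    # with substantial variation between villages (sd of 10 points).
    return min(1.0, max(0.0, random.gauss(0.30, 0.10)))

trials = 10_000
big_gaps = 0
for _ in range(trials):
    arm_a = sum(village_rate() for _ in range(4)) / 4
    arm_b = sum(village_rate() for _ in range(4)) / 4
    if abs(arm_a - arm_b) > 0.10:  # a 10-percentage-point gap, with zero true effect
        big_gaps += 1

print(f"chance of a >10-point gap between arms: {big_gaps / trials:.0%}")
```

Under these made-up (but not implausible) numbers, a double-digit gap between arms shows up in a sizable share of runs purely by chance, which is why the small number of villages matters so much here.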

Results

The study concludes that the program resulted in less intimate-partner violence, but not in less unprotected sex or a slower spread of HIV.

A few possible interpretations of this result:

  • The researchers’ interpretation is that the program was responsible for reductions in violence, but that these simply didn’t translate into slowing the spread of HIV. Definitely a possibility. (If this is right, by the way, I’d call this a great program solely on the basis of its successfully reducing intimate-partner violence. That would be a great accomplishment in its own right, even if it didn’t have the hoped-for effect on the spread of HIV.)
  • It’s also possible that the program had no effect, and that the observed change was a change in reported episodes of violence. Perhaps women who participated in the program came to feel more shame about reporting these episodes. (It’s also possible that the measurement error is in the other direction – that women in the program felt more pressure to report episodes, and that the fall in violence was greater than what was measured. This is the researchers’ theory, given on pg 1982.)
  • And it’s possible that random fluctuations simply swamped any effects of the program itself. As mentioned above, it examined only 8 villages; and there was definitely a lot else happening in these villages over the time period in question. For example, the unspecified measure of “greater food security” had a huge rise across all villages studied, whether or not they received the program (see pg 1980). I can’t help but wonder: if this had been a more typical (less rigorous) study without a comparison group, would this increase in food security have been touted as a success of the program?

The one thing I feel fairly sure of after reading this study is that the researchers’ elaborate, multi-step theory of how loans and education can slow the spread of HIV didn’t come out looking great when all the facts were in. For every community program that publishes a study like this (and this is one of the very few I’ve seen), there are many more similar programs, with similarly involved theories of the linkages between credit, knowledge, health, empowerment, etc. that have simply never been checked in any way.

Antiretroviral treatment (ART): Things to look out for

Antiretroviral treatment (ART) is one of the more well-publicized ways to help people in the developing world. The (RED) campaign puts it front and center, and the Gates Foundation places heavy emphasis on it as well. It seems at first glance like a fairly straightforward, if expensive, intervention: directly treat HIV-positive people with proven drugs to extend their lifespan and improve quality of life.

But the Copenhagen Consensus disease experts (also lead authors on the Disease Control Priorities report) make the case for caution (from pg 40 – emphasis mine):

  • Poor implementation (low adherence, development of resistance, interruptions in drug supplies) is likely to lead to very limited health gains, even for individuals on therapy. (This outcome is unlike that of a weak immunization program in which health gains still exist in the fraction of the population that is immunized.) Poorly implemented antiretroviral drug delivery programs could divert substantial resources from prevention or from other high-payoff activities in the health sector. Even worse, they could lead to a false sense of complacency in affected populations: evidence from some countries suggests that treatment availability has led to riskier sexual behavior and increased HIV transmission. The injunction to “do no harm” holds particular salience.
  • Unless systematic efforts are made to acquire hard knowledge about which approaches work and which do not, the likelihood exists that unsuccessful implementation efforts will be continued without the appropriate reallocation of resources to successful approaches. Learning what works will require major variations in approach and careful evaluation of effects. Failing to learn will lead to large numbers of needless deaths. Most efforts to scale up antiretroviral therapy unconscionably fail to commit the substantial resources required for evaluation of effects. Such evaluations are essential if ineffective programs are to be halted or effective ones are to receive more resources.
  • Many programs rely exclusively on the cheapest possible drugs, thereby risking problems with toxicity, adherence, and drug resistance. From the outset a broader range of drug regimens needs to be tested.

An ART program needs to use the right drugs, ensure compliance, be there for the long haul, and deal with side effects (both medical and behavioral). None of these are a given, with the (RED) campaign’s beneficiaries or anyone else, until you see the evidence that the programs are working.

And ART costs can be in the range of $600 per patient treated per year. Compare with vaccinations, which are estimated to save lives for as little as $200 apiece, have a strong track record of success, and in many ways introduce less potential for complications.

Malaria treatment

The Disease Control Priorities Report says:

The recommended treatments for malaria in areas with resistance to single drugs are combination treatments, preferably artemisinin combination therapy (ACT) (WHO 2001a, 2001b, 2003a, 2005).

But knowing that your charity of choice runs this program is not sufficient to know that it’s improving lives. Bill Brieger at Malaria Matters points to this article in the WSJ, which says:

Cures for malaria are largely designed for adults; the pills are often bitter and too big to swallow for children, who account for most of the more than one million people killed each year by the mosquito-borne disease, malaria experts say.

Bill Brieger adds:

Three challenges that are not mentioned in the article include –

  • For one, when drugs are made available for free or at reduced cost only for children, there will be leakage into wider use as health workers or medicine shop keepers will provide multiple packets of the child drugs to satisfy their adult clients/customers.
  • A second unmentioned challenge is the tendency to overprescribe malaria drugs, especially among adults. The answer to this is case management that includes diagnosis using a laboratory, but more likely rapid diagnostic tests, which can be used at the primary care level.
  • Finally there is the issue of compliance. Artemisinin-based combination therapy generally is taken twice a day for three days. If medicine providers do not counsel clients on the need for full compliance children may swallow only a few doses and not only fail to be cured but also contribute to drug resistance.

Malaria case management is a complicated process that begins with the drug manufacturer and ends in the home. All partners along the way must be [vigilant] if children’s lives are to be saved.

Surgeries performed vs. cases of blindness prevented

We’ve written before about the possibility that surgeries to correct blindness are extremely cost-effective. In summarizing the evidence of effectiveness for trachoma interventions, we’ve learned more, and it’s clear that equating surgeries performed with cases of blindness prevented is plain wrong.

I read Trachoma: an overview, a literature review of the evidence of effectiveness for the SAFE Strategy, the WHO-recommended approach to trachoma control.

Matthew J. Burton, the author, reviews each of the four components of the strategy, including surgery. He writes (Pg 109, in the PDF version):

There are about 10 million people with trachomatous trichiasis (TT) worldwide who are at increased risk of developing irreversible blinding corneal opacification (CO). Surgical correction of TT probably reduces the risk of progressive CO and blindness. The indications for TT surgery vary between control programmes. Some advocate early surgery when one or more lashes touch the eye, whereas others practice epilation until more severe TT develops. As the progression of TT can be quite swift in some people, where access to ophthalmic services is limited, surgery for mild disease is a logical approach.

A major problem limiting the effectiveness of surgery is the recurrence of trichiasis following surgery, which can be as high as 40–60%…. There can be a small improvement in vision following surgery of about a line of Snellen visual acuity.

The quote tells me that:

  1. Surgeries are performed for people who are at risk of becoming blind, but not yet blind. The review doesn’t specify the probability that they’ll become blind.
  2. Surgeries are not performed on those already blind — trachoma-caused blindness is irreversible.
  3. There’s significant chance of recurrence, so performing a surgery is not the same as preventing the patient from ever having the condition again.
  4. There is some vision improvement for those with TT and low vision, but it’s extremely small. (According to Wikipedia, the Snellen visual acuity chart is the eye chart we’re all used to at the doctor’s office, and one line is not much.)

We don’t have the data to make a reasonable estimate of the cost per case of blindness prevented. We’ve tried unsuccessfully to find the percentage of those with TT who eventually become blind.

Assuming that it costs ~$20-60 to perform one surgery, and assuming 50% recurrence and that 50% of people with TT become blind, the cost per blindness averted would be $80-240. But, assuming 50% recurrence and that only 5% of people with TT become blind, the cost per blindness averted would be well over $1,000.
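
To make that arithmetic explicit, here is a minimal sketch of the calculation. The surgery cost, recurrence rate, and progression-to-blindness probabilities are the assumptions stated above, not measured values.

```python
# Back-of-the-envelope cost per case of blindness averted.
# All inputs are illustrative assumptions from the text, not measured values.

def cost_per_blindness_averted(cost_per_surgery, recurrence_rate, p_blind_if_untreated):
    # A surgery averts a case of blindness only if the trichiasis does not recur
    # AND the person would otherwise have gone blind.
    p_averted = (1 - recurrence_rate) * p_blind_if_untreated
    return cost_per_surgery / p_averted

for cost in (20, 60):
    print(cost, cost_per_blindness_averted(cost, 0.5, 0.50))  # 50% progress to blindness: $80 / $240
    print(cost, cost_per_blindness_averted(cost, 0.5, 0.05))  # 5% progress to blindness: $800 / $2,400
```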

Addendum added by Holden: Adding a little more context on the ratio between TT infection and blindness. Estimates of total blindness due to trachoma vary a lot – this PLoS paper (table 3) puts the number around 3 million, while this WHO report puts it closer to 1 million in the same year (37 million total people blind worldwide, 3.6% due to trachoma). Assuming constant prevalence of both TT and blindness would imply that TT turns to full-blown blindness 10-30% of the time, which in turn implies (using the 50% “recurrence risk” figure) that there’s 1 case of full-blown blindness averted for every 6-20 surgeries performed.
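
Spelling out the addendum’s arithmetic under the same assumptions (a rough sketch; the underlying prevalence figures are themselves uncertain):

```python
# Implied surgeries per case of blindness averted, from the prevalence estimates above.
tt_cases = 10_000_000                          # ~10 million people with trachomatous trichiasis
blindness_estimates = (1_000_000, 3_000_000)   # WHO vs. PLoS estimates of trachoma blindness
recurrence_rate = 0.5                          # post-surgery recurrence of trichiasis

for blind in blindness_estimates:
    p_progress = blind / tt_cases                     # 10% or 30% of TT ends in blindness
    p_averted = (1 - recurrence_rate) * p_progress    # per surgery performed
    print(f"progression {p_progress:.0%}: ~{1 / p_averted:.1f} surgeries per case averted")
    # prints ~20 and ~6.7, i.e. roughly the 6-20 range quoted above
```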

Surgeries have other benefits too, and with all the layers of uncertainty about prevalence, the 6-20 range could be off by a lot in either direction. It seems safe, though, to agree with the top-line statement that equating surgeries with “blindness prevented” would substantially overstate what you’re getting for your dollar.

Publication bias: Over-reporting good news

As we look for evidence-backed programs, a major problem we’re grappling with is publication bias – the tendency of both researchers and publishers to skew the evidence to the optimistic side, before it ever gets out in the open where we can look at it. It sounds too scary to be real – how can we identify real good news if bad news is being buried? – but it’s a very real concern.

Publication bias takes several forms:

  • Bias by publishers: journals are more likely to publish papers that “find” meaningful effects, as opposed to papers that find no effects (of a medicine, social program, etc.). A recent Cochrane review documents this problem in the field of medicine, finding a link between “positive” findings and likelihood of publication; a 1992 paper, “Are All Economic Hypotheses False?”, suggests that it affects economics journals.
  • Bias by researchers: David Roodman writes (pg 13):

    A researcher who has just labored to assemble a data set on civil wars in developing countries since 1970, or to build a complicated mathematical model of how aid raises growth in good policy environment, will feel a strong temptation to zero in on the preliminary regressions that show her variable to be important. Sometimes it is called “specification search” or “letting the data decide.” Researchers may challenge their own results obtained this way with less fervor than they ought … Research assistants may do all these things unbeknownst to their supervisors.

The effect of these problems has been thoroughly documented in many fields (for a few more, see these Overcoming Bias posts: one, two, three, four). And philanthropy-related research seems particularly vulnerable to this problem – a negative evaluation can mean less funding, giving charities every incentive to trumpet the good news and bury the bad.

How can we deal with this problem?

A few steps we are taking to account for the danger of publication bias:

  1. Place more weight on randomized (experimental) as opposed to non-randomized (quasi-experimental) evaluations. A randomized evaluation is one in which program participants are chosen by lottery, and lotteried-in people are then compared to lotteried-out people to look for program effects. In a non-randomized evaluation, selection of which two groups to compare is generally done after-the-fact. As Esther Duflo argues in “Use of Randomization in the Evaluation of Development Effectiveness” (PDF):

    Publication bias is likely to be a particular problem with retrospective studies. Ex post the researchers or evaluators define their own comparison group, and thus may be able to pick a variety of plausible comparison groups; in particular, researchers obtaining negative results with retrospective techniques are likely to try different approaches, or not to publish. In the case of “natural experiments” and instrumental variable estimates, publication bias may actually more than compensate for the reduction in bias caused by the use of an instrument because these estimates tend to have larger standard errors, and researchers looking for significant results will only select large estimates. For example, Ashenfelter, Harmon and Oosterbeek (1999) show that there is strong evidence of publication bias of instrumental variables estimates of the returns to education: on average, the estimates with larger standard errors also tend to be larger. This accounts for most of the oft-cited result that instrumental estimates of the returns to education are higher than ordinary least squares estimates.

    In contrast, randomized evaluations commit in advance to a particular comparison group: once the work is done to conduct a prospective randomized evaluation the results are usually documented and published even if the results suggest quite modest effects or even no effects at all.

    In short, a randomized evaluation is one where researchers determined in advance which two groups they were going to compare – leaving a lot less room for fudging the numbers (purposefully or subconsciously) later. (The short simulation sketch after this list illustrates how trying many after-the-fact comparison groups can manufacture an apparent effect where none exists.)

    In the same vein, we favor “simple” results from such evaluations: we put more weight on studies that simply measured a set of characteristics for the two groups and published the results as is, rather than performing heavy statistical adjustments and/or claiming effects for sub-groups that are chosen after the fact. (Note that the Nurse-Family Partnership evaluations performed somewhat heavy after-the-fact statistical adjustments; the NYC Voucher Experiment original study claimed effects for a subgroup, African-American students, even though this effect was not hypothesized before the experiment.)

  2. Place more weight on studies that would likely have been published even if they’d shown no results. The Poverty Action Lab and Innovations for Poverty Action publish info on their projects in progress, making it much less feasible to “bury” their results if they don’t come out as hoped. By contrast, for every report published by a standard academic journal – or worse, a nonprofit – there could easily be several discouraging reports left in the filing cabinet. I would also guess that highly costly and highly publicized (in advance) studies are less likely to be buried, and thus more reliable when they bear good news.
  3. Don’t rely only on “micro” evidence. The interventions I have the most confidence in are the ones that have both been rigorously studied on a small scale and been associated with major success stories (such as the eradication of smallpox, the economic emergence of Asia, and more) whose size and impact are not in question. More on this idea in a future post.
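
Returning to the specification-search point in the Duflo quote above, here is a small simulation sketch (the setup and numbers are invented for illustration): when a program has no effect at all, an evaluator who tries many plausible after-the-fact comparison groups and reports the most favorable one will routinely “find” an effect, while a single pre-committed comparison will not.

```python
# Toy simulation of specification search: a program with zero true effect,
# evaluated against (a) one pre-specified comparison group, or (b) the most
# flattering of ten after-the-fact comparison groups. All numbers invented.
import random

random.seed(0)

def group_mean(n=200):
    # Average outcome for a group of n people; there is no program effect anywhere.
    return sum(random.gauss(0, 1) for _ in range(n)) / n

trials = 2_000
prespecified = []
best_of_ten = []
for _ in range(trials):
    treated = group_mean()
    prespecified.append(treated - group_mean())        # comparison committed in advance
    candidates = [group_mean() for _ in range(10)]      # ten plausible comparison groups
    best_of_ten.append(treated - min(candidates))       # report the most flattering one

avg = lambda xs: sum(xs) / len(xs)
print("average 'effect', pre-specified comparison:", round(avg(prespecified), 3))  # ~0.00
print("average 'effect', best of ten comparisons:", round(avg(best_of_ten), 3))    # noticeably > 0
```

In a real retrospective study the candidate comparison groups would not be independent random draws, but the basic dynamic is the one Duflo describes: pick the comparison after seeing the data, and the apparent effect drifts upward.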

BBB standards: Accountability or technicalities?

Yesterday we got an email from someone looking for help on where to give, noting that two of our top charities do not meet the Better Business Bureau (BBB)’s 20 standards for charity accountability.

We believe that both of these organizations are reputable, accountable, and excellent, and were surprised to hear this. After checking out the BBB reports on them, we stand by our recommendations and feel that the BBB’s reservations come from technicalities, not legitimate issues.

Population Services International (PSI): financial transparency, but not in the BBB’s format

The BBB report on PSI states that PSI meets 19 of the BBB’s 20 standards. The missing one:

the detailed functional breakdown of expenses within the organization’s financial statements only included one program service category. It did not include a detailed breakdown of expenses for each of its major program activities.

PSI responds (same page) that it has one high-level program, Social Marketing, breaking down into hundreds of sub-programs, and so chose to list only the one on the financial statement.

Is PSI stingy with financial information? We don’t think so – in fact, we think PSI stands out for its willingness to disclose meaningful, helpful information about where its money goes. In our full review of PSI, we’re able to break out its expenses not only by expense type (promotion, evaluation, materials, staff, etc.) but also by region and by product. Getting breakdowns from several different perspectives is useful for truly understanding their activities, and it’s something that many other charities can’t or won’t provide (for example, it’s common to be refused information on how much was spent in each country).

But the BBB doesn’t look at several different breakdowns – it looks only at the official audited financial statement, and apparently this one wasn’t broken out as they expected. To me this looks like a case of an organization that is more generous with financial data than most, but didn’t anticipate the BBB’s requirements on a particular form.

Partners in Health (PIH): the Board Chair is salaried

Like PSI, PIH meets 19 of the BBB’s 20 standards. The missing one (from the BBB report):

Standard 4: Compensated Board Members – Not more than one or 10% (whichever is greater) directly or indirectly compensated person(s) serving as voting member(s) of the board. Compensated members shall not serve as the board’s chair or treasurer.

PIH does not meet this Standard since the paid chief executive officer (CEO) also serves as the chair of the board.

The Board of Directors votes on compensation, so I can see why the BBB likes to see some distance between the Board and salaried staff. But it does allow paid staff to be on the Board as long as they don’t make up too many of its votes (as spelled out above); here the problem is that a salaried member has the formal Chair position.

I don’t have a copy of PIH’s bylaws, but to my knowledge (and based on our own bylaws, which are pretty standard and available at the bottom of this page), the Board Chair is distinguished from other members by procedural responsibilities, primarily presiding over meetings. The Chair does not have the power to cast extra votes, approve compensation without voting, or anything along those lines.

It seems worth keeping in mind that PIH is the same organization whose founder has been extensively written about and is known for things like not taking vacations because of his devotion to his work. I don’t know him personally and wouldn’t presume to guarantee an organization’s ethicality, but I’m guessing that if you polled relevant people, you’d find that Partners in Health is one of the more trusted and respected nonprofits out there, and that its choice of putting its CEO as Board Chair wouldn’t give pause to anyone in the know.

Bottom line

I think the BBB’s standards are well-intentioned and that there are sound principles behind them, but ultimately, they are measuring charities’ conformance to formal technicalities. I don’t believe there’s any substitute for carefully examining a charity’s activities, using all the documentation that’s available rather than just the documentation that’s standardized (such as the audited financial statement and bylaws). We reiterate our recommendations of both PSI (full review here) and PIH (full review here).