Published: 2009; Last updated: 2013

One of our core requirements for recommending a charity is that it have evidence of effectiveness. This page discusses the importance of this issue, as well as what we do and don't consider to be a demonstration of impact.

Note that the contents of this page relate to charities which we will consider for our top-charity rankings. This page does not cover GiveWell Labs.

Why demonstrated impact is important

As the following sections illustrate, generalizable evidence of effectiveness can be extremely expensive and difficult to produce, and there may be many charities that are doing good work even though they do not work on evidence-backed programs.

We do not feel that the donors who use our top-charity rankings can reasonably be confident in a charity that isn't working on an evidence-backed program – no matter how well-intentioned the charity or how appealing its activities. For illustrations of the many challenges of international aid and the many ways in which seemingly reasonable programs can fail, see our discussions of:

  • Education programs, which often have disappointing results when thoroughly evaluated – and may be having little impact even when they accomplish their immediate goals. More.
  • Agricultural support, which has a history of being poorly adapted to local conditions. More.
  • More examples of failure in international aid here.

Within the international aid community, calls for more and better measurement and evaluation are near-universal, as we discuss on our blog here. By donating only to charities that demonstrate their impact, you can not only have more confidence in your donation, but also play a part in creating appropriate incentives for charities. We believe that donors are too often uncritical, giving to charities based on very little meaningful information, with the result that charities are encouraged to continue executing – and not examining – programs of unknown effectiveness.

Anecdotes vs. representative information

In our experience, the most common way in which charities argue their effectiveness is by sharing moving stories of individuals.[1] However, it is generally unclear (a) how these stories were selected; (b) how representative they are of the charity's results as a whole; and (c) the extent to which creative framing might emphasize the positive aspects of the story while passing over the negative.

We seek evidence of impact that appears to be systematically collected (with as little opportunity for editorial slant as possible) and representative of overall results (which generally means collecting information on a large number of people rather than on a few).

In our experience, such evaluation generally takes the form of technical reports summarizing at least partly quantitative data, though we see no inherent reason that qualitative data can't also be shared in a systematic and credible way. We have also discussed this topic on our blog, here.

What constitutes "impact"?

We have a high standard for evidence: we seek out programs that have been studied rigorously and repeatedly, and whose benefits we can reasonably expect to generalize to large populations (though there are limits to the generalizability of any study results).

Our priority programs are listed on this page. They have the following characteristics:

  • The evidence for their effectiveness appears to have relatively high external validity and thus generalizability: it is relatively clear which components of the program are important for effectiveness, and thus we expect a higher-than-usual chance of being able to meaningfully assess a charity's impact when it focuses on these programs.
  • They appear to be potentially highly cost-effective.

Different programs aim for different sorts of life change, and must be assessed on different terms. We do not hold to a single universal rule for determining what "evidence of effectiveness" we're looking for; rather, what we look for varies by program type. Within the scope of international aid (which includes developing-world health, education, and economic empowerment), we consider the following to be evidence of impact.

For health programs: we often impose a lower burden of proof, because of the large number of health interventions with extremely strong evidence bases. For example, many vaccines have been thoroughly and rigorously tested, to the point where successful delivery of vaccinations can be reasonably assumed to result in improved health outcomes. In general, we require evidence that (a) medical treatments are administered appropriately; (b) health-related supplies (such as condoms and insecticide-treated nets) are used appropriately and consistently by beneficiaries; and (c) health-related behavior change programs succeed in changing behavior over the long term. We accept evidence of improved health outcomes (lowered incidence/prevalence of diseases; drops in death rates; etc.) as well. More detail on what we look for in different health programs – particularly the most promising ones – is available here.

For economic empowerment programs including microfinance, agriculture, business support, and other programs aimed at raising incomes: we require evidence either that (a) wealth is being transferred to low-income people (including strong evidence that recipients are low-income, not wealthier citizens taking advantage of handouts); (b) operations are being created – and have been created in the past – that can cover their expenses with revenues over time; or (c) programs are causing improvements in clients' incomes and standards of living. We discuss these criteria and the reasoning behind them more thoroughly here. Note that we do not consider information about loans made and repayment rates to be sufficient.

For infrastructure projects, including provision of clean water: we require – at a minimum – evidence that any infrastructure improvements are maintained over the long term, given aid's history of abandoned infrastructure projects. See our discussions of transportation infrastructure here and water infrastructure here.

For water projects in particular, we require a higher burden of proof, because there is reason to believe that even successful provision of clean water may not significantly impact health (if unaccompanied by other improvements in hygiene). We require evidence of impact on diarrhea incidence (or other health measures) in order to be confident in a clean water program.

For education programs: at a minimum, we require evidence regarding students' attendance rates, grade promotion/completion rates, or academic performance/aptitude as measured by exam scores. All of these metrics have major limitations. See our discussion on limitations of attendance here, and of test scores here. We ideally wish to see evidence regarding job opportunities, earnings, and/or health as well. Developing-world schooling appears to be a relatively thinly studied area, so there is no program we can feel confident in without such information (more here).

What does it take to demonstrate impact?

The challenge of demonstrating causality

We have found that even in the (relatively rare) cases where charities share the sort of information discussed in the previous section, they still stop short of providing convincing evidence that they caused positive changes. A case in point is Mothers2Mothers (discussed here), which highlights an impact study in which program participants were found (based on survey data) to have had higher drug adherence rates than non-participants (93% vs. 83%). To us, there is a clear alternative explanation for this result: the people who had chosen to participate in a program centered on drug adherence were already – independent of any program impact – more likely to be highly motivated to adhere to their drug regimens. This general issue is known as selection bias; it is one of the common problems we see with evaluations of programs' impact on people's lives.
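To illustrate how selection bias alone can produce a gap of this size, here is a minimal simulation sketch. It is not a model of Mothers2Mothers or of any real study: the function name, the "motivation" variable, and every number in it are hypothetical assumptions chosen for illustration. The simulated program has zero effect on adherence, yet participants still show higher adherence because more motivated people are more likely to enroll.

```python
import random


def simulate_adherence(n_people=100_000, seed=0):
    """Return (participant_rate, non_participant_rate) for a program with NO effect.

    All quantities are hypothetical assumptions made for illustration only.
    """
    rng = random.Random(seed)
    participants = []
    non_participants = []
    for _ in range(n_people):
        # Assumed latent motivation, uniform between 0 and 1.
        motivation = rng.random()
        # Assumption: more motivated people are more likely to enroll.
        enrolls = rng.random() < motivation
        # Adherence depends ONLY on motivation, never on enrollment,
        # so the program's true causal effect is zero by construction.
        adheres = rng.random() < 0.7 + 0.3 * motivation
        (participants if enrolls else non_participants).append(adheres)
    participant_rate = sum(participants) / len(participants)
    non_participant_rate = sum(non_participants) / len(non_participants)
    return participant_rate, non_participant_rate


if __name__ == "__main__":
    p_rate, n_rate = simulate_adherence()
    print(f"participants: {p_rate:.1%}, non-participants: {n_rate:.1%}")
```

Under these made-up assumptions, the expected adherence rates work out to about 90% for participants and about 80% for non-participants, even though enrollment changes nothing. A simple comparison of the two groups cannot distinguish this scenario from one in which the program genuinely works.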

Generally, we find the strongest type of formal evaluation to be a randomized controlled trial,[2] but we can be persuaded by a variety of different forms of evidence, on a case-by-case basis. Whenever discussing impact, we try to be clear about how we are assessing the question of whether a charity's program caused any observed improvements.

Ensuring that studies are representative

A single study – no matter how persuasive and rigorous – is not necessarily evidence that your donation can create similar effects in the future. Many charities work on many different types of projects, in many different areas. We find it important to consider whether any encouraging results might be (a) a simple fluke; or (b) applicable only to the strongest and most successful programs, not to a charity's activities as a whole.

We believe that evaluation is most compelling when it is conducted on a "spot-check" basis, i.e., with every major project having an equal chance of being selected for evaluation. We seek – generally through discussions with charities' representatives – to determine the extent to which any evidence of effectiveness we have is representative vs. biased toward the positive.
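The difference between a "spot-check" and an evaluation of hand-picked projects can be sketched as follows. This is an illustrative toy under stated assumptions, not a description of how any charity selects projects for evaluation: the function name, the number of projects, and the impact scores are all hypothetical.

```python
import random
from statistics import mean


def compare_selection_methods(n_projects=20, n_evaluated=3, seed=1):
    """Return (overall, showcase, spot_check) average impact scores.

    Impact scores are hypothetical values in arbitrary units.
    """
    rng = random.Random(seed)
    # Assumed "true" impact of each project; unknown to donors in practice.
    impacts = [rng.gauss(0.0, 1.0) for _ in range(n_projects)]
    overall = mean(impacts)
    # Showcase: only the strongest projects are chosen for evaluation.
    showcase = mean(sorted(impacts, reverse=True)[:n_evaluated])
    # Spot check: every project has an equal chance of being selected.
    spot_check = mean(rng.sample(impacts, n_evaluated))
    return overall, showcase, spot_check


if __name__ == "__main__":
    overall, showcase, spot_check = compare_selection_methods()
    print(f"overall {overall:.2f}, showcase {showcase:.2f}, spot check {spot_check:.2f}")
```

The showcase average can never fall below the overall average, so it overstates typical results whenever projects differ in quality. A single spot-check can land above or below the overall average by chance, but it is not systematically tilted toward the positive.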

Negative and offsetting impact

There are many ways in which charities' programs may cause harm, offsetting their positive impact partially or fully. Unfortunately, we have found essentially no hard evidence about the prevalence and significance of the following concerns; we simply note them as possibilities in principle. When discussing a charity (or program type), we generally include a section on potential negative/offsetting impact, state what little we know, and hazard a guess as to the significance of the risk.

  • Many charities give funding, or other support, to developing-world governments. Such support may in principle lower these governments' accountability to their citizens, and may even ultimately help bad governments to do more harm (particularly by freeing up health funding and thus allowing more military expenditure). This is a particular concern for charities that directly give cash to governments, especially governments with questionable human rights records.
  • On the flip side, charities whose programs are highly independent may interfere with, or end up replacing, services from both the government and the private sector. Charities providing free or below-market-price items run some risk of causing this sort of harm.
  • All charitable programs run a risk of diverting skilled labor from other productive pursuits. We cover this issue in some detail in our discussion of developing-world surgery, here. The salaries that charities pay locals are determined not by local demand, but by donors; it is therefore our (and your) responsibility to determine that the activities they're hired for are more beneficial to the population than the activities they would be carrying out otherwise.

    We have found that information on this concern is extremely thin. In general, we believe that the more a charity relies on highly skilled local labor, the greater the risk that it is creating small (or zero or negative) impact, as it simply switches skilled professionals from one useful, helpful job to another.

Sources

  • 1. Examples:

    • Save the Children, "Baby Joyce's Story: A Life Saved." Save the Children front page linked to this page as of July 6, 2009.
    • Heifer International, "When I Remember the Tsunami, I Still Cry." Heifer International front page linked to this page as of July 6, 2009.
    • Grameen Foundation, "Akosua, Ghana." Grameen Foundation front page linked to this page as of July 6, 2010.
  • 2. See J-PAL, "Overview."