One of our core requirements for recommending a charity is that it have evidence of effectiveness. This page discusses the importance of this issue, as well as what we do and don't consider to be a demonstration of impact.
As the following sections illustrate, generalizable evidence of effectiveness can be extremely expensive and difficult to produce, and there may be many charities that are doing good work even though they do not work on evidence-backed programs.
However, we do not feel that the donors who use our top-charity rankings can reasonably be confident in a charity that isn't working on an evidence-backed program – no matter how well-intentioned the charity or how appealing its activities. For illustrations of the many challenges of international aid and the many ways in which seemingly reasonable programs can fail, see our discussions of:
Within the international aid community, calls for more and better measurement and evaluation are near-universal, as we discuss on our blog here. By donating only to charities that demonstrate their impact, you can not only have more confidence in your donation, but also play a part in creating appropriate incentives for charities. We believe that donors are too often uncritical, giving to charities based on very little meaningful information, with the result that charities are encouraged to continue executing – and not examining – programs of unknown effectiveness.
In our experience, the most common way in which charities argue their effectiveness is by sharing moving stories of individuals.1 However, it is generally unclear (a) how these stories were selected; (b) how representative they are of the charity's results as a whole; and (c) the extent to which creative framing might emphasize the positive aspects of the story while passing over the negative.
We seek evidence of impact that appears to be systematically collected (with as little opportunity for editorial slant as possible) and representative of overall results (which generally means collecting information on a large number of people rather than on a few).
In our experience, such evaluation generally takes the form of technical reports summarizing at least partly quantitative data, though we see no inherent reason that qualitative data can't also be shared in a systematic and credible way. We have also discussed this topic on our blog, here.
We have a high standard for evidence: we seek out programs that have been studied rigorously and repeatedly, and whose benefits we can reasonably expect to generalize to large populations (though there are limits to the generalizability of any study results).
Different programs aim for different sorts of life change, and must be assessed on different terms. We do not hold to a single universal rule for determining what "evidence of effectiveness" we're looking for; rather, what we look for varies by program type. Within the scope of international aid (which includes developing-world health, education, and economic empowerment), we consider the following to be evidence of impact.
For education programs: at a minimum, we require evidence regarding students' attendance rates, grade promotion/completion rates, or academic performance/aptitude as measured by exam scores. All of these metrics have major limitations. See our discussion on limitations of attendance here, and of test scores here. We ideally wish to see evidence regarding job opportunities, earnings, and/or health as well. Developing-world schooling appears to be a relatively thinly studied area, so there is no program we can feel confident in without such information (more here).
For economic empowerment programs including microfinance, agriculture, business support, and other programs aimed at raising incomes: we require evidence either that (a) wealth is being transferred to low-income people (including strong evidence that recipients are low-income, not wealthier citizens taking advantage of handouts); (b) operations are being created – and have been created in the past – that can cover their expenses with revenues over time; or (c) programs are causing improvements in clients' incomes and standards of living. We discuss these criteria and the reasoning behind them more thoroughly here. Note that we do not consider information about loans made and repayment rates to be sufficient.
For infrastructure projects, including provision of clean water: we require – at a minimum – evidence that any infrastructure improvements are maintained over the long term, given aid's history of abandoned infrastructure projects. See our discussions of transportation infrastructure here and water infrastructure here.
For water projects in particular, we require a higher burden of proof, because there is reason to believe that even successful provision of clean water may not significantly impact health (if unaccompanied by other improvements in hygiene). We require evidence of impact on diarrhea incidence (or other health measures) in order to be confident in a clean water program.
For health programs: we often impose a lower burden of proof, because of the large number of health interventions with extremely strong evidence bases. For example, many vaccines have been thoroughly and rigorously tested, to the point where successful delivery of vaccinations can be reasonably assumed to result in improved health outcomes. In general, we require evidence that (a) medical treatments are administered appropriately; (b) health-related supplies (such as condoms and insecticide-treated nets) are used appropriately and consistently by beneficiaries; and (c) health-related behavior change programs succeed in changing behavior over the long term. We accept evidence of improved health outcomes (lowered incidence/prevalence of diseases; drops in death rates; etc.) as well. More detail on what we look for in different health programs – particularly the most promising ones – is available here.
We have found that even in the (relatively rare) cases where charities share the sort of information discussed in the previous section, they still stop short of providing convincing evidence that they caused positive changes. A case in point is Mothers2Mothers (discussed here), which highlights an impact study in which program participants were found (based on survey data) to have had higher drug adherence rates than non-participants (93% vs. 83%). To us, there is a clear alternative explanation for this result: the people who had chosen to participate in a program centered on drug adherence were already – independent of any program impact – more likely to be highly motivated to adhere to their drug regimens. This general issue is known as selection bias; it is one of the common problems we see with evaluations of programs' impact on people's lives.
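To make the selection-bias concern concrete, here is a minimal simulation sketch in Python. It is not an analysis of the Mothers2Mothers study: every number in it (the share of motivated people, the enrollment rates, and the adherence rates) is a hypothetical assumption chosen for illustration, and the function name `simulate_adherence_gap` is ours. The program is modeled as having zero effect, yet participants still show higher adherence because motivated people are more likely to enroll.

```python
import random


def simulate_adherence_gap(n_people=100_000, seed=0):
    """Return (participant_rate, non_participant_rate) of drug adherence.

    All probabilities below are hypothetical. The program itself has
    no effect on adherence in this model.
    """
    rng = random.Random(seed)
    participants = [0, 0]      # [number adhering, number of people]
    non_participants = [0, 0]

    for _ in range(n_people):
        # Assumption: half the population is highly motivated.
        motivated = rng.random() < 0.5
        # Assumption: motivated people are more likely to enroll.
        enrolls = rng.random() < (0.7 if motivated else 0.3)
        # Assumption: adherence depends only on motivation, not enrollment.
        adheres = rng.random() < (0.95 if motivated else 0.80)

        group = participants if enrolls else non_participants
        group[0] += int(adheres)
        group[1] += 1

    return (participants[0] / participants[1],
            non_participants[0] / non_participants[1])


participant_rate, non_participant_rate = simulate_adherence_gap()
print(f"participants:     {participant_rate:.1%}")
print(f"non-participants: {non_participant_rate:.1%}")
```

Under these assumed inputs, the expected adherence rates work out to about 90.5% for participants and 84.5% for non-participants, a gap of roughly six percentage points that arises purely from who chooses to enroll. A comparison of participants with non-participants cannot, on its own, distinguish this situation from one in which the program caused the difference.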
Generally, we find the strongest type of formal evaluation to be a randomized controlled trial,2 but we can be persuaded by a variety of different forms of evidence, on a case-by-case basis. Whenever discussing impact, we try to be clear about how we are assessing the question of whether a charity's program caused any observed improvements.
A single study – no matter how persuasive and rigorous – is not necessarily evidence that your donation can create similar effects in the future. Many charities work on many different types of projects, in many different areas. We find it important to consider whether any encouraging results might be (a) a simple fluke; or (b) applicable only to the strongest and most successful programs, not to a charity's activities as a whole.
We believe that evaluation is most compelling when it is conducted on a "spot-check" basis, i.e., with every major project having an equal chance of being selected for evaluation. We seek – generally through discussions with charities' representatives – to determine the extent to which any evidence of effectiveness we have is representative vs. biased toward the positive.
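As a rough illustration of what "an equal chance of being selected" can mean in practice, the sketch below draws projects for evaluation uniformly at random using Python's standard `random.sample`. The helper name `pick_spot_checks` and the project names are hypothetical; this is one simple way to select without favoring a charity's strongest projects, not a description of any particular charity's procedure.

```python
import random


def pick_spot_checks(projects, how_many, seed=None):
    """Choose `how_many` distinct projects, each with an equal chance.

    Passing a fixed `seed` makes the draw reproducible, so others can
    verify that the selection was not hand-picked.
    """
    rng = random.Random(seed)
    return rng.sample(list(projects), how_many)


# Hypothetical project list for illustration only.
projects = ["Project A", "Project B", "Project C", "Project D", "Project E"]
chosen = pick_spot_checks(projects, how_many=2, seed=1)
print(chosen)
```

Because the draw does not depend on how well any project is believed to be performing, results from the chosen projects are more plausibly representative of the charity's work as a whole.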
There are many ways in which charities' programs may cause harm, offsetting their positive impact partially or fully. Unfortunately, we have found essentially no hard evidence about the prevalence and significance of the following concerns; we simply note that they may conceptually be concerns. When discussing a charity (or program type), we generally include a section on potential negative/offsetting impact, state what little we know, and hazard a guess as to the significance of the risk.
We have found that information on this concern is extremely thin. In general, we believe that the more a charity relies on highly skilled local labor, the greater the risk that it is creating small (or zero or negative) impact, as it simply switches skilled professionals from one useful, helpful job to another.
2. See J-PAL, "Overview."