Criteria for Evaluating Programs - 2009-2011

We have published a more recent version of this page. See the most recent version of the page describing our criteria for grantmaking.

A note on this page's publication date

The content on this page has not been recently updated. This content is likely to be no longer fully accurate, both with respect to the research it presents and with respect to what it implies about our views and positions.

Published: November 2010

This page outlines the major criteria we used, from 2009-2011, in the initial formation of our list of priority programs in international aid.

We sought out programs that were associated with demonstrated past success in improving people's lives (i.e., that had a track record) and/or that were recommended by experts who explicitly seek to compare programs and identify the most promising ones. As discussed below, we have placed less weight on the latter criterion over time, due to issues that have emerged with the methodologies used for these comparisons.

Track record

We have a strong preference for programs that have had demonstrable past success in improving lives - a preference we believe is particularly appropriate for individual donors seeking to help people they ultimately know very little about. The evidence we consider falls into two categories:

  • "Micro" evidence refers to evidence from relatively contained, small-scale studies of a program's effects. We prefer to rely on randomized, controlled evaluations (more on this approach on the Poverty Action Lab website and in Duflo and Kremer 2008); we use the term "high quality" to refer to evaluations that use this basic approach. When necessary, we also sometimes utilize non-randomized evaluations, but generally focus more on "high quality" studies.

    A well-designed "micro" study can leave very little doubt about a program's effects at a particular time and place. On the other hand, it leaves open questions about how the results of a small, carefully executed program would translate to a larger scale - in new environments, often with less oversight, among other concerns. For example, a study may find that people with loans earn more than people without loans, but this could simply reflect a competitive advantage for those receiving loans - it may be that extending loans to everyone would not make an area better off on net.1

  • "Macro" evidence refers to evidence from programs carried out on a large scale (regional, national, or multinational) without separating people into "treatment groups" and "control groups." Because "macro" evidence inherently does not come from carefully controlled studies, it always carries a risk of wrongly attributing impact to a program - for example, one might observe that child mortality fell sharply following the introduction of a vaccination program, when in fact other factors (such as generally improving standards of living) had more to do with the decline. However, we consider a "macro" story to be an important indicator that a program can work on a large scale.

We have the highest confidence in programs that are supported by both types of evidence - programs that have been rigorously demonstrated to be effective on a small scale, and that have been associated (even if loosely) with larger-scale successes.
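
The contrast between the two kinds of evidence can be illustrated with a small simulation. The sketch below is purely hypothetical: the function name, the outcome scale, the size of the program effect, and the size of the background trend are all assumptions chosen for illustration, not figures from any study cited on this page. It compares a randomized "micro" estimate (treatment group minus control group in the same period) with a "macro" before-and-after comparison in which a background improvement coincides with the program.

```python
import random
from statistics import mean


def simulate(n=20000, true_effect=2.0, trend=5.0, seed=0):
    """Return (micro_estimate, macro_estimate) on hypothetical data.

    Outcome = individual baseline + background trend + program effect.
    All numbers are illustrative assumptions, not real-world estimates.
    """
    rng = random.Random(seed)

    # "Micro" study: individuals are randomly assigned within one period,
    # so the background trend affects both groups equally and cancels out.
    treated, control = [], []
    for _ in range(n):
        outcome = rng.gauss(50.0, 10.0) + trend
        if rng.random() < 0.5:
            treated.append(outcome + true_effect)
        else:
            control.append(outcome)
    micro_estimate = mean(treated) - mean(control)

    # "Macro" comparison: everyone receives the program, and we compare
    # outcomes before and after rollout. The background trend is mixed in.
    before = [rng.gauss(50.0, 10.0) for _ in range(n)]
    after = [rng.gauss(50.0, 10.0) + trend + true_effect for _ in range(n)]
    macro_estimate = mean(after) - mean(before)

    return micro_estimate, macro_estimate


micro, macro = simulate()
print(f"randomized estimate:   {micro:.2f}")
print(f"before/after estimate: {macro:.2f}")
```

Under these assumptions the randomized estimate lands near the assumed program effect, while the before-and-after estimate lands near the effect plus the trend. The simulation does not capture the opposite weakness of "micro" evidence described above, namely that a small, carefully run study may not carry over to a larger scale.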


Below are sources we've found particularly useful in assessing programs' track records.

Cochrane Library

The Cochrane Library2 publishes reviews of the evidence for healthcare interventions, focusing on high-quality "micro" evaluations (as defined above). We have found that its reports generally review a large number of studies and are very clear about the findings, strengths, and weaknesses of these studies. For health programs, where there are often many high-quality evaluations available, we therefore use Cochrane as our main source of information on "micro" evidence when possible.


Poverty Action Lab and Innovations for Poverty Action

The Abdul Latif Jameel Poverty Action Lab (J-PAL) and Innovations for Poverty Action (IPA) are groups that focus on the use of randomized (what we call "high quality") evaluations of developing-world anti-poverty programs. We believe that their lists of working papers,3 which include many evaluations that were not directly carried out by the organizations themselves, represent a significant proportion of the available high-quality "micro" evidence on relevant non-health-centered programs.

Millions Saved: Proven Successes in Global Health

Millions Saved: Proven Successes in Global Health (cited here as Levine et al. 2004) is a publication by the Center for Global Development with 20 case studies on large-scale success stories; it has been cited multiple times as an example of success stories in international aid.4 It is not a complete list,5 and we have not relied on it exclusively, but it is the only compilation of large-scale success stories we have found (in global health or in other areas) that is relatively clear about its information sources and about its criteria.6 We have done some vetting of its conclusions and feel that it is generally a fairly reliable source.7 It is our primary source for "macro" evidence.

Recommendations from comparative analyses

Since we seek to identify the most promising programs, we have sought out experts who explicitly seek to compare interventions - using reasonably clear, consistent, and transparent criteria, and without the aim of advocating for a particular charity. We have essentially found only one suitable source (as the authors of the three publications below overlap heavily). In 2011, we began to find major issues with the dominant methodology used by these sources (explicit cost-effectiveness analysis), and we have accordingly lowered the weight we place on these sources. We welcome referrals to any suitable sources we may have missed.

Disease Control Priorities Report

Disease Control Priorities in Developing Countries, 2nd edition is a publication produced for the World Bank (and in collaboration with the World Health Organization) by over 300 contributors.8 It provides information on a variety of health programs, discussing how they're implemented, potential risks and concerns, and estimated cost-effectiveness.

This report often gives references on the effectiveness of an intervention, but in our experience the quality of such references (as defined above) varies widely, and the references given are rarely as comprehensive or clearly described as they are in Cochrane reviews. In addition, in 2011 we investigated one of this report's cost-effectiveness figures and found major errors that raised general concerns. In brief, it appears that this source (a) uses simplified methodologies, such that minor variations in real-world conditions may cause major divergences from its estimates; (b) does not appear to have a strong process for providing "reality checks" on its estimates (and does not publish the details of its analysis such that outsiders can provide such "reality checks"); and (c) is now substantially out of date (it was published in 2006, and much of the data it draws on is older).

We use this publication as a source of general information and expert opinion rather than as a clear indicator of a program's track record or cost-effectiveness.
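
To illustrate concern (a) above, here is a minimal sketch of how a simplified cost-effectiveness formula can amplify modest changes in its inputs. The formula, the function name, and every number are hypothetical assumptions made for this example; they are not the report's methodology or figures.

```python
def cost_per_case_averted(cost_per_person, baseline_risk, risk_reduction, coverage):
    """Hypothetical simplified estimate: cost divided by expected cases averted.

    cases averted per person targeted =
        baseline_risk * risk_reduction * coverage
    """
    cases_averted_per_person = baseline_risk * risk_reduction * coverage
    return cost_per_person / cases_averted_per_person


# Assumed "on paper" inputs (illustrative only).
base = cost_per_case_averted(
    cost_per_person=1.00, baseline_risk=0.10, risk_reduction=0.50, coverage=0.80
)

# Assumed field conditions: each input is 20-25% less favorable.
field = cost_per_case_averted(
    cost_per_person=1.25, baseline_risk=0.08, risk_reduction=0.40, coverage=0.60
)

print(f"on-paper estimate: {base:.2f} per case averted")
print(f"field estimate:    {field:.2f} per case averted")
print(f"ratio:             {field / base:.2f}x")
```

Because the inputs multiply, four shifts of 20-25% each compound to a cost per case averted roughly 2.6 times the on-paper figure in this example.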

Copenhagen Consensus

We have examined the work of the 2008 Copenhagen Consensus, a panel of experts that explicitly aimed to answer the question: "Imagine you had $75bn to donate to worthwhile causes. What would you do, and where should we start?"9

The Copenhagen Consensus lists 30 ranked areas to focus on.10 Its scope is broader than ours, covering a variety of topics. We read the papers that were directly relevant to our focus areas11 and flagged endorsed programs.

We have found that the Copenhagen Consensus analysis tends to focus heavily on cost-effectiveness estimates, with little discussion of how track records differ between programs. We have placed less weight on this source as we have become more concerned about the weaknesses of cost-effectiveness estimates.

Copenhagen Consensus 2008 challenge paper on diseases

The paper on diseases for the 2008 Copenhagen Consensus is authored by Dean T. Jamison, Prabhat Jha, and David Bloom. It explicitly aims12 to build on the work of the Disease Control Priorities Report (discussed above). It presents its own list of ranked priorities, which differ (in rank order) from the Copenhagen Consensus master list,13 as well as several unranked priorities. We have noted its conclusions separately.

Like the Copenhagen Consensus, Jamison, Jha, and Bloom (2008) focuses largely on cost-effectiveness analysis; it mentions but does not compare the track records of interventions. We have placed less weight on this source as we have become more concerned about the weaknesses of cost-effectiveness estimates.


  • 1

    For more on this idea, see Rodrik 2009 and Deaton 2009.

  • 2

    Cochrane Collaboration, "Cochrane Reviews."

  • 3
    • Poverty Action Lab, "Publications."
    • Innovations for Poverty Action, "Publications."

  • 4

    Examples: Moss, Pettersson, and van de Walle 2006, Pg 5; Easterly 2009, Pg 407.

  • 5

    See Center for Global Development, "Frequently Asked Questions."

  • 6

    Center for Global Development, "What Is Success?"

  • 7

    Our notes from vetting this publication were sent to our public e-mail list; see GiveWell, "Vetting 'Millions Saved'."

    See also GiveWell, "Analysis of a Success Story: Implementation of the DOTS Strategy in China," for a more detailed analysis of one case study.

  • 8

    Jamison et al. 2006, Pgs iv, xxv-xxxiv.

  • 9

    Copenhagen Consensus Center, "The Experts."
    Quote from Copenhagen Consensus Center, "The Basic Idea."

  • 10

    Copenhagen Consensus Center, "Copenhagen Consensus 2008."

  • 11

    The directly relevant papers from this project are: Horton, Alderman, and Rivera 2008; Jamison, Jha, and Bloom 2008; King, Klasen, and Porter 2007; and Whittington et al. 2008.

  • 12

    See Jamison, Jha, and Bloom 2008, Pg 3.

  • 13

    Jamison, Jha, and Bloom 2008, Pg 51. For the Copenhagen Consensus master list, see Copenhagen Consensus Center, "Copenhagen Consensus 2008." To give one example of a difference in priority, Jamison, Jha, and Bloom (2008) ranks tuberculosis treatment first, while the master list ranks it below three of the other interventions listed in Jamison, Jha, and Bloom (2008) (expanded immunization coverage; heart attack acute management; malaria prevention and treatment).