Published: November 2019
- Principle 1: Put significant weight on our cost-effectiveness estimates.
- Principle 2: Consider additional information about an organization that we have not explicitly modeled.
In theory, our recommendations aim to maximize one thing: total improvement in well-being per dollar spent. This is what our cost-effectiveness estimates, referenced in Principle 1, are intended to capture.
In practice, there are costs and benefits that we do not observe and that our models do not estimate. In accordance with Principle 2, we make qualitative assessments to account for these unmodeled costs and benefits. These qualitative assessments help us prioritize among top charities when they have similar modeled cost-effectiveness.
We share our qualitative assessments and how we arrive at them below.
- Qualitative assessments of top charities
- What we include in our quantitative model and what we assess qualitatively
- What are proxies for the characteristics above that we can observe?
- How do we assess performance on each of these proxies?
- How we may be wrong
Qualitative assessments of top charities
We use four designations to indicate our assessment of the relative performance of our top charities on qualitative dimensions, listed below from strongest to weakest:
- "Stands out": the strongest designation
- "Relatively strong"
- "Average"
- "Relatively weak": the weakest designation
We believe our top charities are exceptional relative to the majority of organizations. When we refer to a top charity as "relatively weak" on an element below, we do so in the context of its existing standing as a top charity and strength on other dimensions of our review process. In other words, these assessments are intended to capture differences among GiveWell top charities, rather than absolute rankings among all charitable organizations.
| Proxy | Against Malaria Foundation | Evidence Action's Deworm the World Initiative | The END Fund's deworming program | GiveDirectly | HKI's VAS program | Malaria Consortium's SMC program | SCI Foundation | Sightsavers' deworming program |
|---|---|---|---|---|---|---|---|---|
| Responses to our questions | Average | Relatively strong | Relatively weak | Relatively strong | Relatively strong | Relatively strong | Relatively weak | Relatively weak |
| Prioritization discussions | Average | Relatively strong | Average | Average | Average | Relatively strong | Relatively weak | Average |
| Self-improvement and attitude toward mistakes | Average | Average | Relatively weak | Stands out | Average | Relatively strong | Relatively weak | Average |
| Role in field | Average | Relatively strong | Average | Stands out | Average | Relatively strong | Average | Average |
| Fundraising | Average | Average | Stands out | Relatively strong | Average | Average | Average | Relatively weak |
| Responsiveness | Relatively weak | Relatively strong | Relatively strong | Relatively strong | Average | Relatively strong | Average | Relatively strong |
| Giving us feedback | Average | Stands out | Average | Relatively strong | Average | Relatively strong | Relatively weak | Average |
| Quality of information shared | Relatively weak | Relatively strong | Average | Relatively strong | Relatively strong | Relatively strong | Relatively weak | Average |
What we include in our quantitative model and what we assess qualitatively
In this section, we share some context on what we include in our cost-effectiveness model. We then discuss the features of our top charities' work that can't be readily measured but that we believe affect the charities' true impact, and that we will use to compare our top charities when they have similar modeled cost-effectiveness.
What we include in our cost-effectiveness model
In deciding which inputs to include in the model versus which to consider outside of the model on a qualitative basis, we consider the impact that they may have on the bottom line and the feasibility of estimating them quantitatively.
Our model includes a number of inputs that are based on educated guesses in the absence of clear data. In these cases, we believe that our decision-making will be improved by using our best guess relative to leaving the inputs out and considering them qualitatively.
As one example, we estimate the likelihood that a key deworming study would find the same results if it were conducted again under the same conditions. We can't empirically measure this, but the strength of the deworming evidence is an important input into our understanding of the cost-effectiveness of deworming charities. We thus include in our model an educated guess based on the information we have access to (the study itself, which we can analyze to judge how high-quality it seems).
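To make the idea concrete, the sketch below shows one simple way a subjective replicability guess could enter a cost-effectiveness calculation: as a multiplier on the effect a study reports. This is an illustrative assumption, not a description of GiveWell's actual model; the function name, its parameters, and all numbers are hypothetical.

```python
def value_per_dollar(study_effect: float, replicability: float, cost_per_person: float) -> float:
    """Hypothetical sketch: expected benefit per dollar after a replicability adjustment.

    Assumes the study's effect holds with probability `replicability` and that
    the true effect is zero otherwise, so the adjustment is a simple multiplier.
    """
    if not 0.0 <= replicability <= 1.0:
        raise ValueError("replicability must be between 0 and 1")
    if cost_per_person <= 0:
        raise ValueError("cost_per_person must be positive")
    expected_effect = study_effect * replicability  # discount the study's reported result
    return expected_effect / cost_per_person


# Illustrative inputs only; these are not GiveWell figures.
unadjusted = value_per_dollar(study_effect=10.0, replicability=1.0, cost_per_person=0.5)
adjusted = value_per_dollar(study_effect=10.0, replicability=0.2, cost_per_person=0.5)
print(unadjusted, adjusted)
```

In this sketch, leaving the input out is equivalent to setting it to 1.0, which with these made-up numbers would overstate value per dollar fivefold (20.0 versus 4.0). That is the sense in which even a rough best guess can improve a decision relative to omitting the input.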
Characteristics of charities' work that affect true impact per dollar but are not included in our cost-effectiveness model
The factors we include in our qualitative assessments fall outside the scope of our cost-effectiveness model because we feel that either (a) they are not possible to quantitatively estimate in a reasonable way, or (b) the time it would take to collect the information necessary to make a reasonable quantitative estimate would be disproportionate to how large a difference the factor would make to the modeled cost-effectiveness.
Below, we describe three characteristics of our charities' work that are not fully captured in our cost-effectiveness model. This list is illustrative rather than comprehensive. We don't account for these characteristics directly but instead consider charities' performance via eight proxy metrics (included in the table above and discussed in the next section).
Allocation of funding
Charities may allocate funding among different locations and program participants based on considerations that we don't capture in our cost-effectiveness model.
- Our model uses estimates of disease burden by country. Charities may have access to better information about the true disease burden in the communities that participate in the program within a country and prioritize funding based on that information.
- Our model generally uses past costs in a country or across the program as a whole to estimate future costs in that location. Charities may have more accurate information about how their costs may be different in the future. For example:
  - We would expect future costs to be lower if:
    - the charity previously paid startup costs that will not be repeated.
    - the charity paid a fixed cost that will be spread over a larger group of program participants in the future.
  - We would expect future costs to be higher if:
    - the charity plans to expand to a harder-to-reach population.
- Charities may account for risks that our model does not, such as the likelihood of withdrawal of government support for the program or rising security concerns in their area of operation.
- Charities may account for benefits that our model does not, such as work in one country being more likely than work in another country to lead to a government taking over the costs of the program in the future.
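The two reasons for expecting lower future costs can be illustrated with simple arithmetic. The sketch below assumes a hypothetical two-part cost structure (a one-time cost plus a constant cost per participant); the function name and all figures are illustrative and do not come from any charity's budget or from GiveWell's model.

```python
def average_cost(one_time_cost: float, cost_per_participant: float, participants: int) -> float:
    """Average cost per participant when a one-time cost is spread across everyone served.

    Hypothetical cost structure: a one-time (startup or fixed) cost plus a
    constant marginal cost for each participant.
    """
    if participants <= 0:
        raise ValueError("participants must be positive")
    return (one_time_cost + cost_per_participant * participants) / participants


# Hypothetical numbers: $50,000 in one-time costs and $2 per participant.
past = average_cost(50_000, 2.0, 25_000)                   # historical average: 4.0
future_no_startup = average_cost(0, 2.0, 25_000)           # startup cost not repeated: 2.0
future_larger_group = average_cost(50_000, 2.0, 100_000)   # fixed cost spread wider: 2.5
print(past, future_no_startup, future_larger_group)
```

With these made-up figures, projecting future costs from the historical average (4.0) would overstate them by 60% to 100% relative to the two scenarios. A charity that knows which scenario applies has information that a model based on past costs would miss.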
Quality of implementation
There are aspects of the quality of the implementation of a program that our cost-effectiveness model doesn't capture.
- The quality of interactions between program participants and program staff may have long-term consequences for the costs and uptake of delivering the same or similar interventions in the future.
- The quality of communications about the program with participants can affect, for example, whether participants receive maximal benefit from the intervention (e.g. whether they consistently use an insecticide-treated net to prevent malaria).
- The equity of the charity's distribution of the intervention could, in some cases, affect whether the individuals who can most benefit are reached and can affect whether the program causes jealousy/conflict/distrust in the local community.
- Charities' decisions may affect local markets for talent or goods to varying degrees.
Additional positive impact
Charities may have positive impacts beyond those captured in our model of how they deliver their programs, such as:
- Conducting and disseminating research that improves other actors' decisions.
- Raising funds from donors who would have otherwise spent that funding on something less cost-effective.
- Building government capacity that carries over into other programs.
- Providing assistance to partners that increases those partners' impact outside of the specific intervention.
- Creating a model for a program that other entities copy.
What are proxies for the characteristics above that we can observe?
For some of the elements of each characteristic above, we could seek out information to more directly understand how well the charity performs and update our cost-effectiveness model to incorporate that information. For example, we could fund work to collect more precise data on disease burdens, interview charities' partners about the quality of their interactions with the charities, or investigate the impact charities' research has had on other actors' decisions. In some of these cases, this is work we aim to make progress on in the future, though we expect our answers to remain incomplete because these factors are difficult to measure. In other cases, getting direct information is infeasible or prohibitively expensive.
Where we have not observed or will not observe the organizational features discussed above, we rely on information by proxy: differences we've observed in how charities operate and communicate with us. For each of our top charities, we've subjectively answered the following questions:
- Responses to our questions: When we ask the charity a question, do its answers generally either indicate that it has thought through the question before or show us why getting an answer is not important to understanding its work?
- Prioritization discussions: Do the charity's explanations about how it allocates funding among different locations and program participants seem to be aimed at maximizing its impact per dollar? Is the charity consistent in what it says about how it prioritizes among different locations and program participants and is it able to clearly explain any changes in its approach?
- Self-improvement and attitude toward mistakes: Does the charity proactively share information with us and publicly about mistakes it has made? Has the charity designed systems to alert it to problems in its programs and has it made changes based on information from those systems? Has the charity experimented with ways to improve its impact?
- Role in field: Is the charity producing research aimed at informing policymakers or other implementers? Does it participate in global conversations about its field of work?
- Fundraising: Does the charity raise funding from donors who we think would have otherwise supported less cost-effective opportunities?
- Responsiveness: Does the charity send us information by mutually agreed-upon deadlines? Is it responsive to our emails?
- Giving us feedback: Does the charity catch our mistakes and let us know, thus improving our research? Does the charity make useful suggestions for how we could improve our research process and cost-effectiveness models?
- Quality of information shared: Have the documents that the charity has shared with us contained significant errors? Has the charity told us things that were inaccurate? Has the information provided been easy to interpret and use? Have the charity's projections of when it would achieve its goals generally been accurate?
How do we assess performance on each of these proxies?
Below, we share some illustrative examples of areas in which charities we've reviewed appeared to stand out, be relatively strong, be average, and be relatively weak on the proxies listed above. We don't provide an example for each performance rating on each proxy, but we hope those we do provide give a sense of how we consider these factors.
As noted above, we use four categories to indicate the relative performance of our top charities on these proxies, listed below from strongest to weakest:
- "Stands out": the strongest designation
- "Relatively strong"
- "Average"
- "Relatively weak": the weakest designation
These categories refer to the relative performance among our top charities rather than absolute rankings among all charitable organizations. We believe our top charities are exceptional relative to the majority of charitable organizations.
Responses to our questions
Our conversations with Malaria Consortium's seasonal malaria chemoprevention (SMC) Program Directors have stood out on this proxy and we view Malaria Consortium overall as relatively strong on this proxy. The Program Directors have consistently had knowledgeable and nuanced views on questions such as the pros and cons of methodological choices in coverage surveys, the funding landscape for SMC, logistical challenges to scaling up in various locations, global drug supply for SMC, and modifications to program implementation to account for new information or changing circumstances (e.g. a change in the timing and length of the rainy season attributed to climate change).
Seeming relatively weak on this proxy means that we are often unable to get satisfying answers to questions about the organization's decision-making.
Prioritization discussions
Stand-out performance on this proxy may include using a model to compare the cost-effectiveness of different programs that the organization is considering extending or starting, or rating opportunities for scale-up on factors believed to correlate with cost-effectiveness.
Seeming relatively weak on this proxy may include describing the prioritization process inconsistently, or increasing spending without increasing expected output when additional funding becomes available and not clearly explaining this choice.
Self-improvement and attitude toward mistakes
GiveDirectly stands out on this proxy. It has conducted a large number of experiments aimed at improving its impact. It has on several occasions shared information publicly about mistakes it made and how it responded to them, such as when it uncovered cases of its staff stealing from cash grant recipients. These cases also indicate that it has strong monitoring processes that have detected problems in the past.
Seeming relatively weak on this proxy may include having few, if any, past cases of proactively sharing mistakes; few, if any, experiments to improve impact; and relatively weak systems for detecting problems in program implementation.
Role in field
GiveDirectly stands out for the role it has played in raising the profile and understanding of cash transfers in global development. Our impression is that GiveDirectly plays a major role in its field through the research it produces and its participation in public discussions around cash transfers. Policy influence is a major goal for GiveDirectly, and many of its recent decisions are geared toward informing policy decisions.
We treat a lack of evidence of a prominent role in the field as "average" rather than "relatively weak" performance.
Fundraising
The END Fund stands out for its track record of raising funds for mass drug administration to treat neglected tropical diseases, including deworming. We understand that several major donors who were not previously giving significantly in this space have funded the END Fund. This case could use further vetting through discussions with the END Fund and its funders.
Seeming relatively weak on this proxy may include a limited track record of bringing in new sources of funding. If a charity implements multiple programs, another action we would rate as "relatively weak" is reacting to the availability of GiveWell-directed funding for the program of interest by reallocating flexible funding away from that program.
Responsiveness
New Incentives, a GiveWell Incubation Grant recipient, stands out for responsiveness. It consistently communicates clearly about timelines for sharing information and responds to our questions and requests in a timely manner, and its responses fully answer our questions. In addition, it has made it easy for us to access data about its program on demand through shared sources.
Seeming relatively weak on this proxy may include frequently missing mutually agreed-upon deadlines, not responding to emails in a timely way (particularly after multiple follow-ups), and not providing full responses to our requests (e.g. answering one question when an email has multiple questions).
Giving us feedback
Evidence Action's engagement with our room for more funding analyses of its Deworm the World Initiative stands out for the helpfulness of its feedback. It has both raised broad concerns about our approach (e.g. how we dealt with cases where we had a different prioritization of locations than Evidence Action and how we approached Evidence Action's plans to build up reserve funding) and found specific errors in our spreadsheets (here's an example).
Seeming relatively weak on this proxy may include having few, if any, past cases of providing feedback on our work, or demonstrating a lack of familiarity with our processes after multiple years of engagement.
Quality of information shared
Seeming relatively strong on this proxy includes (a) sharing written materials that are consistently accurate and easy to understand (with occasional errors), and (b) being well-calibrated in describing plans and expectations for the future (having a track record of accurate predictions about timelines on which work will be completed).
Seeming relatively weak on this proxy may involve (a) sharing information on several occasions that contained errors that were difficult to detect and/or had implications for program management (in other words, information that indicated that the organization was using inaccurate information internally that may have affected the operation of the program), and (b) lacking a track record of accurate predictions about timelines on which work will be completed.
How we may be wrong
While we find the above assessments to be helpful in prioritizing funding gaps with similar modeled cost-effectiveness, we think it's important to consider how our qualitative assessments may be inaccurate. A major risk is that the proxies we discuss above may be poor predictors of the underlying characteristics for which they are intended to stand in.
In addition, our assessment of how each top charity compares on each proxy may be biased. Some specific types of bias that we are concerned about:
- Halo/horn effect. Our assessments on the different proxies may be unreasonably influenced by each other. A positive opinion of a charity on one proxy may subconsciously lead us to assess the charity positively on another proxy. We've tried to counteract this by writing out lists of examples for each charity for each proxy (we plan to share a version of this at a later date), but in some cases we don't have examples to point to and have just recorded an overall feeling. Read more about this effect here.
- Unsystematic collection of examples. There are two ways in which this could cause our assessments to be incorrect:
  - Our framework for making qualitative assessments is new. In some cases, we haven't asked questions that would inform our views. For example: What activities has an organization undertaken to influence its field, and what happened as a result?
  - We have relied on memory and on revisiting a small, nonsystematic selection of past materials we've received from charities, with a focus on recent materials, to inform our assessments.
We have received a large number of documents and have had many conversations with our top charities over the years. We have not cataloged examples of the proxies above from these materials, and our lists are certainly missing past examples that could affect our overall assessments.
In the future, we aim to systematically track new examples that may update our views on these proxies in order to develop a more reliable information set.
The remaining five principles are outside the scope of this page.