Research on Moral Weights - 2019

Summary

Each year, GiveWell identifies more great giving opportunities than we are able to fully fund. As a result, in our charity recommendation decisions, we necessarily face very challenging questions, such as: How much funding should we recommend for programs that reduce poverty versus programs that reduce deaths from malaria? How should we prioritize programs that primarily benefit children versus adults? And, how do we compare funding those programs with others that have different good outcomes, such as reducing suffering from chronic health issues like anemia?

We recently received results from research we supported to help us answer these questions from the perspective of communities similar to those our top charities operate in.

Background on the project

We assess charities based on their overall impact per dollar. In order to compare the impact per dollar across programs, we assign quantitative "moral weights" to each good outcome. We have invested a significant amount of time to arrive at these weights, but we still find our conclusions unsatisfying, in large part because of the fundamental difficulty of these questions. We have worked to improve our process for valuing different outcomes over the years, but we believe our current process is far from ideal.

Moral weights seems to be a highly neglected research topic. Limited information exists on how people value different outcomes. In particular, very few researchers have asked people living in low-income countries how they would make these tradeoffs. We see this as a potentially important input into our weights but have been unable to incorporate this information because it largely did not exist.

We recently supported a project intended to help address this gap in the literature. We provided funding and guidance to IDinsight, a data analytics, research, and advisory organization, to survey about 2,000 people living in extreme poverty in Kenya and Ghana in 2019 about how they value different outcomes.

Survey results

The results from this research are now available here. Among other findings, they suggest that survey respondents place a higher value on saving lives (relative to reducing poverty), and a higher value on averting the deaths of children under 5 years old (relative to averting the deaths of individuals over 5 years old), than the values we had previously used in our decision-making.

Although we see these study results as adding to our understanding, we would caution against putting too much weight on them. Research methods like those used in the survey have major limitations, discussed here. This is one study that should be read in the context of a larger literature on these questions, and it represents one of many approaches to moral weights.

Nevertheless, we see this research as a valuable contribution to the literature on preferences and moral views in communities with high rates of extreme poverty. It seems to be the first study of its kind conducted in sub-Saharan Africa, and the people surveyed for this study had a substantially lower average consumption level than the samples of other studies using similar methods.

Preliminary conclusions and updates

We have provisionally updated our moral weights to place more emphasis on programs that avert deaths (relative to those that reduce poverty) and to value programs averting deaths at all ages more equally (relative to our previous assumption of valuing programs that avert deaths of individuals over 5 years old more highly). The direction of these updates was driven by this study and other, independent arguments for putting more weight on health relative to income. However, we have not yet thoroughly debated how to revise our framework for moral weights or fully completed our analysis of these results, so we see our current, provisionally-updated moral weights as a work in progress. We plan to revisit our framework for moral weights in the future.

These updates did not have a major impact on our recommended funding allocation to charities in 2019.

This summary has been cross-posted on the GiveWell blog.

Published: December 2019

Context for this project

GiveWell aims to recommend charities in order to help people as much as possible with the resources that we have.

To make these recommendations, we must compare programs that are working toward a variety of different good outcomes. These comparisons are often implicit choices people make when deciding which charity to support. We make them explicit so that we can compare programs fairly, share our reasoning, and identify the opportunities to have the most impact.

Examples of some of the challenging questions that we face include:

  • If one has $3,000 to donate, is it better to roughly double the annual consumption (and hence standard of living) of 3 households living in extreme poverty, or to give to a malaria prevention charity in order to avert one death?
  • If one has only enough charitable resources to fund a program that would avert either the death of one 2-year-old or the death of one 60-year-old, which should one choose?
  • How valuable is it to reduce suffering from chronic health issues, such as anemia or depression, relative to averting deaths?
  • How valuable is it to increase autonomy through interventions like providing access to family planning relative to reducing suffering from chronic health issues?

Unfortunately, because charitable resources are limited, answering questions like these cannot be avoided.

In our cost-effectiveness analyses, we assign quantitative "moral weights" to each outcome in order to compare across causes. For example, we might assume that averting a death is equivalent to doubling consumption for 50 people for one year. Our assumptions on these topics can make a large difference to our impact. If doubling consumption is actually much more valuable for people's welfare (relative to averting deaths) than we assumed, we would be missing out on the chance to direct funding where it can do the most good.
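To make the role of these weights concrete, the minimal sketch below shows how moral weights convert different outcomes into a common unit of value so that programs can be compared per dollar. This is only an illustration, not GiveWell's actual cost-effectiveness model; the program outcomes and costs are hypothetical, and the example weight of 50 comes from the sentence above.

```python
# Minimal sketch of how moral weights allow comparison across outcomes.
# The weights and program figures are hypothetical, for illustration only.

# Moral weights, expressed in units of "doubling consumption for one person
# for one year" (the example weight of 50 comes from the text above).
MORAL_WEIGHTS = {
    "consumption_doubling_year": 1,
    "death_averted": 50,
}

def value_per_dollar(outcomes: dict, cost: float) -> float:
    """Total weighted value of a program's outcomes, divided by its cost."""
    total_value = sum(MORAL_WEIGHTS[outcome] * n for outcome, n in outcomes.items())
    return total_value / cost

# Two hypothetical programs with the same budget:
poverty_program = {"consumption_doubling_year": 400}  # e.g., cash transfers
health_program = {"death_averted": 10}                # e.g., malaria prevention

print(value_per_dollar(poverty_program, cost=100_000))  # 0.004
print(value_per_dollar(health_program, cost=100_000))   # 0.005
```

Under these illustrative numbers the health program looks slightly more cost-effective, but changing the weight on averting a death would change that conclusion, which is why the choice of moral weights matters.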

We have invested a significant amount of time to arrive at our current moral weights, but we still find our conclusions unsatisfying, in large part because of the fundamental difficulty of these questions. Over the years, we have worked to improve our process for valuing different outcomes, but we believe it is far from ideal.

How has GiveWell chosen its moral weights in the past?

Historically, GiveWell has asked our staff to submit their moral weights and used the median answer to guide our recommendations. An example of GiveWell staff's moral weight inputs is here. We expect to change our moral weights process going forward (more details below).

GiveWell staff members' weights were informed by a number of factors and methods. Over the last several years, our process has included:

  • Building explicit models of the various morally-relevant aspects of a given outcome. For example, when estimating the value of averting a death, some staff have considered factors such as life expectancy, how someone's degree of personhood develops as they age, the level of grief associated with death at different ages, economic contributions made by people at different ages, etc.
  • Researching how other actors, such as governments and the World Health Organization, make these judgments. Other actors' assumptions about moral weights often rely on empirical literature from high-income countries. See our report here.
  • Considering moral weights from a variety of philosophical perspectives, and getting input from philosophers and other global prioritization researchers. E.g., see this blog post and conversation notes here and here.
  • Asking donors who use GiveWell's research to guide their giving for their input on moral weights. We typically found that our donors were hesitant to provide answers and preferred to defer to our conclusions, though we only sought input from a small sample due to limited capacity for this project.
  • Reviewing relevant empirical literature from low- and middle-income countries. However, this literature was limited, as discussed below.

Why did GiveWell support research on preferences in low-income contexts?

Limited existing research

We reviewed empirical literature from low- and middle-income countries that studied topics relevant to moral weights, but we found little research that we could use to inform our decision-making.

We are unaware of existing studies that asked people in sub-Saharan Africa how they would trade off money versus mortality risks (referred to by researchers as "stated preference value of statistical life (VSL) research"), even though there have been many such studies in high-income countries.1

We are aware of some "revealed-preference" studies in sub-Saharan Africa, where people's actions are observed and preferences are inferred from those actions. This literature seems to be relatively limited.2 We are also aware of studies that ask people about their preferences across different programs (example), but these are not attempting to directly assess how people value different outcomes.

We generally did not see much experimentation with alternative methods for learning about moral weights. This led us to be interested in working with IDinsight to pilot other methods, such as choice experiments (discussed below) and detailed qualitative interviews.

Goal in supporting IDinsight

We provided funding and guidance to IDinsight in order to help to fill this gap in the research.3 We believed this research might provide us with more context about the welfare effects of different outcomes in the communities our recommended charities operate in. We also thought that some staff and donors may want to defer, at least in part, to the surveyed values for moral weights.

We see this research as an important contribution to the literature on preferences and moral views in communities with high rates of extreme poverty. It seems to be the first study of its kind conducted in sub-Saharan Africa, and the study's sample had a substantially lower average consumption level than the samples of other studies using similar methods.4

We discuss our interpretation of the results below.

What was the research methodology?

IDinsight's final preferences report is here. Surveyors interviewed about 2,000 people living in extreme poverty in Kenya and Ghana between May and September 2019; the reported analyses cover the roughly 1,820 respondents who completed at least the VSL section of the survey.5

Approaches for asking questions

The major approaches that IDinsight used to ask these questions were:

  • A "value of statistical life" approach: "Value of Statistical Life," or VSL, is a measure often used by economists to estimate how much people are willing to pay to avert a death. The VSL methodology in IDinsight's study is intended to be comparable to VSL studies used in policymaking. This methodology roughly works as follows:6
    • The surveyor tells the respondent to imagine that a hypothetical disease is affecting their community and that their risk of dying from this disease is 20 in 1,000 over the next 10 years.
    • They tell them that a vaccine or medicine is available to prevent or treat the disease, and that it reduces the risk of dying from 20 in 1,000 to 15 in 1,000 over the next 10 years.
    • Then, the surveyor asks for the respondent's willingness to pay for the vaccine/medicine for themselves or their child. They tell the respondent that they can pay in small installments of their choosing over the 10 years of risk reduction, in order to reduce constraints on the respondent's ability to pay for the vaccine/medicine.
  • A "choice experiment" approach: This method asks respondents to choose how to allocate funding to hypothetical programs that achieve different outcomes for their community. A sample question is:7
    • "Program A saves the lives of 6 children aged 0-5 years AND gives $1,000 cash transfers to 5 families. Program B saves the lives of 5 children aged 0-5 years AND gives $1,000 cash transfers to 10 families. Which one would you choose?"

    As part of the choice experiment approach, this survey also asks questions specifically targeted at understanding how respondents value life at different ages, such as: "Program A saves 300 lives of people aged under 5. Program B saves 500 lives of people aged 19-40. Which one would you choose?" The surveyors then vary the number of people and age ranges used in the example.8

This study focused on the above methods after testing and deprioritizing a number of other methods during pilot research.9

Improving and assessing reliability of results

IDinsight took a number of steps to improve and assess the reliability of these results, including:

  • Quantitative training and testing: Before asking the "VSL" questions above, surveyors trained respondents on understanding small probabilities and tested that understanding, both to improve comprehension and to screen out respondents who did not demonstrate a sufficient grasp of the concepts.10 IDinsight's reported VSL results are based on the approximately 62% of the sample that demonstrated a basic understanding of the relevant math.11

    Further quantitative checks from this study are in the following footnote.12

  • Qualitative checks: The researchers cross-checked the quantitative answers they received with respondents' qualitative reasoning to assess whether the two seemed to correspond.13 Some examples of qualitative interviews with respondents are here.
  • Testing sensitivity to framing: The survey asked several slight variations on the questions above in order to understand whether the framing of questions had significant impacts on the answers given.14

Finally, in order to provide additional context on respondents' answers, IDinsight also asked people about their subjective well-being (e.g. questions similar to "All things considered, how satisfied are you with your life as a whole these days?") and conducted research on the economic and emotional burden of death at different ages.15

We and IDinsight recognize that these are highly sensitive questions to ask people. We believe that IDinsight did an admirable job of thoughtfully asking difficult questions in a challenging research context. IDinsight found that people were generally willing to discuss these questions and rarely opted not to finish the survey.16

What did the study find?

In brief, the headline findings were:17

  • The VSL approach found that the average willingness-to-pay for a hypothetical medicine that reduces mortality risk by 5 in 1,000 was about $204 for under-5 year olds, and about $169 for over-5 year olds.18 This can be translated into overall "values of statistical life" of about $40,763 and $33,798 respectively.19
  • In the choice experiments focused on the value of life versus cash transfers, about 38% of respondents always chose the program that saved children's lives over any number of cash transfers offered (up to the presented maximum of $10 million), 8% always chose the program that provided cash transfers over programs that saved lives, and 54% switched from the program that saved more lives to the cash transfer program once the number of cash transfers offered by the alternative program was increased far enough.20 IDinsight uses this method to estimate implied values of life of $91,049 for under-5 year olds and $47,644 for individuals 5 and older.21 These estimates are very imprecise and heavily affected by the proportion of the sample that always chose programs that saved children's lives (see the illustrative sketch after this list).22
  • In the choice experiments focused on averting deaths at different ages, respondents placed a higher value on averting the deaths of children under 5 than on averting the deaths of older individuals (depending on the method used, respondents valued averting the death of an under-5 year old roughly 1.2 to 1.9 times as much as averting the death of an older individual).23
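To illustrate why the share of respondents who never switch matters so much, the toy calculation below shows how a simple mean of implied values of life moves with that share. This is not IDinsight's estimation model (which is described in its report); the $30,000 figure loosely reflects the finding that about half of switchers imply a value of life of $30,000 or less, and treating non-switchers as if they were at the survey's $10 million cap is a deliberately extreme assumption.

```python
# Toy illustration (not IDinsight's estimation model) of why respondents who
# never switch to cash transfers can dominate a mean implied value of life.
# All numbers are illustrative assumptions.

def mean_implied_value(share_non_switchers: float,
                       non_switcher_value: float,
                       switcher_mean_value: float) -> float:
    """Simple mixture mean over non-switchers and switchers."""
    return (share_non_switchers * non_switcher_value
            + (1 - share_non_switchers) * switcher_mean_value)

# Treat non-switchers as if their implied value were the survey's $10 million
# cap, and give switchers an illustrative mean of $30,000.
print(mean_implied_value(0.38, 10_000_000, 30_000))  # ~3,818,600
print(mean_implied_value(0.10, 10_000_000, 30_000))  # ~1,027,000
```

The point is only directional: when non-switchers make up a large share of the sample, almost any assumption about how to value their responses dominates the resulting mean.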

IDinsight then converted these results into moral weights using additional assumptions.24 The implied moral weights, in comparison to GiveWell's previous weights, are:25

| | Value of doubling consumption for one person for one year | Value of averting the death of an under-5 year old | Value of averting the death of an over-5 year old |
|---|---|---|---|
| GiveWell's August 2019 moral weights | 1 | 47 | 85 |
| VSL approach | 1 | 143 | 118 |
| Choice experiment | 1 | 318 | 167 |

(Note on interpreting this table: the "VSL approach" row can be read as, "According to the VSL approach in this study, it is equally morally valuable to double consumption for 143 people for one year as it is to avert the death of one under-5 year old. It is equally morally valuable to double consumption for 118 people for one year as it is to avert the death of one over-5 year old.")
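For readers who want to trace the numbers, the sketch below reproduces the conversion arithmetic described in the footnotes: willingness to pay is divided by the 5-in-1,000 risk reduction to obtain a value of statistical life, and values of life are divided by $286 (the estimated annual consumption per capita of GiveDirectly recipients) to express them in consumption-doubling units. This is only the conversion arithmetic, not GiveWell's full cost-effectiveness model.

```python
# Sketch of the conversion arithmetic described in the footnotes.
RISK_REDUCTION = 5 / 1000        # risk reduction used in the main VSL questions
CONSUMPTION_DOUBLING_COST = 286  # estimated cost (USD) to double consumption
                                 # for one person for one year

def vsl_from_wtp(mean_wtp: float) -> float:
    """Value of statistical life: willingness to pay divided by the risk reduction."""
    return mean_wtp / RISK_REDUCTION

def moral_weight(value_of_life: float) -> float:
    """Value of averting a death, in consumption-doubling units."""
    return value_of_life / CONSUMPTION_DOUBLING_COST

# VSL approach: mean WTP of ~$204 (under-5) and ~$169 (over-5).
print(round(moral_weight(vsl_from_wtp(203.82))))  # ~143
print(round(moral_weight(vsl_from_wtp(168.99))))  # ~118

# Choice experiment: implied values of life of $91,049 (under-5) and $47,644 (over-5).
print(round(moral_weight(91_049)))                # ~318
print(round(moral_weight(47_644)))                # ~167
```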

What are the limitations of this research?

Although we see these study results as adding to our understanding, we would caution against putting too much weight on them. This is one study that should be put in the context of a larger literature,26 and it represents one approach to moral weights among many. We think research methods like those used in this survey have major limitations.

Major limitations of this research are discussed in the following sections.

VSL methods rely on understanding of small probabilities

VSL methods rely on respondents' understanding of small probabilities. Only about 34% of respondents answered the following question correctly: "Which risk of death is larger: 1 in 100 or 2 in 1,000?"27 Other VSL study samples may have similarly low levels of understanding, though the evidence we're aware of is limited on this point.28 We see this as illustrating the lack of robustness of VSL methods in general: if many people cannot correctly compare risks that differ by a factor of five, their answers to willingness-to-pay questions could be substantially misinformed.

People do not seem to proportionally update their willingness to pay for very different magnitudes of risk reduction. In this survey, people were not willing to pay proportionally more for a vaccine that reduced risk of death by 10 in 1,000 than one that reduced risk of death by 5 in 1,000.29 In earlier pilot surveys (with smaller sample sizes), IDinsight found that a 30x increase in hypothetical vaccine effectiveness only led to a ~2x increase in willingness to pay.30 Because of this, if much higher risk reduction levels had been chosen as the main question for the survey, the survey likely would have found much lower estimates for the value of life.31
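As a rough illustration of why this matters for the headline estimates, the toy calculation below (using illustrative figures in the spirit of the pilot finding that a ~30x larger risk reduction produced only ~2x higher willingness to pay) shows how the implied VSL shrinks when willingness to pay does not scale with the risk reduction:

```python
# Toy illustration: implied VSL is willingness to pay divided by the risk
# reduction, so if WTP rises much more slowly than the risk reduction, the
# implied VSL falls. Figures are illustrative assumptions.

def implied_vsl(wtp: float, risk_reduction: float) -> float:
    return wtp / risk_reduction

base_wtp, base_risk = 200.0, 5 / 1000          # e.g., ~$200 for a 5-in-1,000 reduction
print(implied_vsl(base_wtp, base_risk))        # ~40,000

# If a 30x larger risk reduction only doubles willingness to pay:
print(implied_vsl(2 * base_wtp, 30 * base_risk))  # ~2,667 -- about 15x lower
```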

Answering VSL questions is extremely challenging

Even if respondents have a strong mathematical understanding, answering VSL questions is extremely challenging. Stated preference VSL methods in general, and this study in particular, ask questions that are extremely hard to know how to answer, so the answers likely are not robust. Even though we think about small probabilities often as part of our research, we find ourselves very confused about how to answer questions like "How much are you willing to pay for a 5 in 1,000 reduction in your risk of death?"

"Social desirability bias" may impact answers

Respondents may tell surveyors what they think they should say, rather than their true preference. For example, perhaps people believed they would appear callous if they opted for receiving cash transfers instead of saving lives.

Although it is difficult to tell whether this is occurring, one piece of evidence that may suggest this bias is present is that the survey's findings seem qualitatively at odds with the general finding that people have a low willingness to pay for preventive medicine, which can be interpreted as implying a much lower value for health according to "revealed preference."32 However, we see good reason to put limited weight on revealed preference research as well: people may have incorrect beliefs about the effectiveness of preventive medicine or be subject to behavioral biases.

"Hypothetical bias" is also an issue. Respondents may not give answers that correspond to how they would behave if they were actually given these choices. For example, the questions asked in the community choice experiment are very different from choices people make in their daily lives, and if they were actually responsible for allocating community resources they may think about the relevant issues differently.

Treatment of extreme moral views

The conclusions from this study are sensitive to how we should treat relatively extreme moral views.

In the choice experiment framing, 38% of respondents did not believe that the equivalent of $10 million in cash transfers would be more beneficial than saving a life.33 Respondents sometimes discussed the "sanctity of life" and said that it could not be compared to cash.34 These respondents substantially increase this survey's results for the mean value of life.35

Additionally, this survey's estimation methods imply that community members may perceive over-40-year-olds as having net negative value.36 This finding may indicate some lack of reliability of these survey methods.

Other potential limitations of this study are referenced in the following footnote.37

Should we defer to a survey of moral values?

Apart from the challenges discussed above with interpreting the responses, it is also unclear how much weight to put on a survey of moral values in general. How much one defers to these responses might depend on:

  • Whether one thinks well-being consists in preference satisfaction or autonomy, or instead in some objective measure (in which case people's preferences might be informative but not decisive).
  • Whether one agrees with the moral frameworks that respondents provided for making their choices. To give an example that is especially likely to spark potential disagreement, some respondents argued that life could not be compared to cash based on moral arguments about the sanctity of life. IDinsight shared some qualitative data from the survey that indicates which frameworks were used.38
  • Whether one thinks survey respondents have better access to empirical facts, such as the well-being benefits of having more money, than other decision-makers.

What are GiveWell's preliminary conclusions from these results?

We have provisionally updated our moral weights to place more emphasis on programs that avert deaths (relative to those that reduce poverty) and to value averting deaths at all ages more equally (relative to our previous assumption of valuing averting deaths of individuals over 5 years old more highly). The direction of these updates was driven by this study and other, independent arguments for putting more weight on health relative to income.39 However, we have not yet thoroughly debated how to revise our framework for moral weights or fully completed our analysis of these results, so we see our current, provisionally-updated moral weights as a work in progress. We plan to revisit our framework for moral weights in the future.

We found the results about the relative value of health versus income to be surprising. These results suggested higher valuations for health than we would have expected based on a) extrapolating from other VSL research and b) apparently low willingness to pay for preventive health goods. We see this as a reason to believe that, in these contexts, health is more highly valued relative to income, and more important to welfare, than we previously expected.

Similarly, we previously thought it was possible that people would prefer to avert deaths of adults in their communities due to their responsibilities as caregivers and their economic contributions. The results from this study suggest that program participants value averting the deaths of children more highly than adults, so we expect to put more weight on that view than we did previously.

What does this mean for GiveWell's moral weights and recommendations going forward?

We plan to more thoroughly analyze this study and revisit our framework for moral weights in the future, so we do not yet have robust conclusions on what our moral weights will be going forward.

In the future, we expect to move away from using the median staff member's moral weights in our decision-making and to instead have a single set of moral weights. We expect that choosing this set will involve determining what moral weights would be implied by a variety of approaches or worldviews and then taking a weighted average of those views.40
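As a sketch of what that could look like mechanically, the snippet below combines moral weights implied by several approaches into a weighted average. The approach names, implied weights, and the credence placed on each approach are all hypothetical placeholders, not GiveWell's actual inputs.

```python
# Hypothetical sketch of combining moral weights across approaches/worldviews
# via a weighted average. All names and numbers are placeholders.

# approach: (implied weight for averting the death of an under-5 year old,
#            in consumption-doubling units; credence placed on the approach)
approaches = {
    "staff explicit models": (50, 0.4),
    "stated preference survey (this study)": (143, 0.3),
    "subjective well-being estimates": (80, 0.3),
}

weighted_sum = sum(value * credence for value, credence in approaches.values())
total_credence = sum(credence for _, credence in approaches.values())
print(weighted_sum / total_credence)  # ~86.9
```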

In the interim, we are planning to use the following moral weights. The direction of these updates was driven by this study and other, independent arguments for putting more weight on health relative to income.

| | November 2019 moral weights | August 2019 moral weights |
|---|---|---|
| Value of doubling consumption for one person for one year | 1 | 1 |
| Value of averting the death of an under-5 year old | 100 | 48 |
| Value of averting the death of an over-5 year old | 100 | 85 |

Importantly, we ran a sensitivity analysis and found that changing these moral weights to other plausible, but quite different, values does not lead to major changes in our planned allocations of funding among charities in the near term.41

Given a high level of uncertainty, we're defaulting to a position that many people would say is reasonable: treating all deaths averted equally.42 We are using round numbers to indicate lack of precision.43

What are valuable next steps on moral weights research more broadly?

Moral weights seems to be a highly neglected research topic. We would be excited to see more researchers from a variety of disciplines working on these questions.

For example, we would be interested to see:

  • Additional empirical research on moral weights in low-income countries, using methods like those used in this study as well as entirely new methods.
  • Additional research that addresses various weaknesses in the VSL literature. For example, how would the results change if very different risk levels were used within these surveys? Given how challenging these questions are, how robust and consistent are people's answers to them?
  • Philosophical research on questions like: What are the various philosophical perspectives on moral weights? How should we combine these perspectives in order to reach an overall moral weighting? Whose values should be factored in to our moral weights?

We think that this study has shown that there are valuable contributions to be made by beginning to research these very difficult questions, and we hope it is the beginning of substantially more work in this area.

Sources

| Document | Source |
|---|---|
| IDinsight, Beneficiary Preferences 2019 | Source |
| IDinsight, beneficiary preferences field test findings, November 2018 | Source |
| IDinsight, beneficiary preferences pilot final report, May 18, 2018 | Source |
| IDinsight, Beneficiary Profiles (qualitative interviews), November 2019 | Source |
| Robinson et al. 2018 | Source |

December 12, 2019 update: After drafting this report, we received a slightly revised version of IDinsight's study, which adds additional analysis to an appendix on pages 79-80.44 See the most recent version of the report here.

  • 1

    From Robinson et al. 2018:

    • "The value of mortality risk reductions is relatively well‐studied; recent reviews suggest that over 200 studies have been completed globally. Because of the importance of these estimates, substantial attention has been paid to developing criteria for evaluating study quality and applicability, particularly in high‐income settings. Relatively few studies have been conducted in low‐ and middle‐income countries, however." Pg. 4
    • "Many government agencies (particularly in high‐income countries) have published guidance on estimating VSL when assessing regulatory and other policies (for review, see Narain and Sall 2016, Robinson et al. 2017a). For low‐ and middle‐income countries, values are often extrapolated from the estimates used by either U.S. regulatory agencies or the OECD." Pg. 19
    • "In Appendix A, we report the results of our review of studies conducted in low‐ and middle‐income countries. The starting point for our work was the articles identified in previous reviews (including those listed in Table 3.3). We then searched the literature for studies conducted in the 172 countries categorized as low‐ or middle‐income by the World Bank in any of the past 20 years (see Appendix B). We found 17 stated‐preference studies and eight wage‐risk studies that meet our selection criteria. These 25 studies were conducted in 15 countries, all of which are now middle‐ or high‐income. Hence the available studies represent the preferences of only a small fraction of the population globally." Pg. 27
    • For list of lower- and middle-income countries where VSL studies have taken place, see "Table C.1. List of VSL Studies", Pg. 59.
    • For other estimates of VSL from low- and middle-income countries, see Table C.3, Pgs. 60-61.

  • 2
    • We discuss some studies on revealed preferences in low- or middle-income countries here (see footnotes 11-15 and the associated text in the body of the page).
    • Note that IDinsight piloted both stated and revealed preference approaches to gather data on the value of statistical life, and decided to focus on the former. When trying revealed preference approaches, it attempted to elicit respondents' valuation of health products (with information on how much the products reduce mortality risk) and then back out their valuation of mortality risk reduction, but this method turned out to be ineffective since respondents anchored on market prices for familiar health products, and the demand for unfamiliar products may have been affected by respondents' lack of trust in their efficacy.

  • 3

    Note: we and IDinsight previously used the title "Beneficiary preferences" to refer to this project. On further reflection, we think that the term "beneficiary" may reinforce paternalistic narratives in development work. We want to use language that respects the dignity of the people whom our top charities serve. We have not yet settled on the best term to use going forward but have tried to reduce our use of the term "beneficiary" in this report.

  • 4
    • For income levels in other studies, see Table C.3, Pgs. 60-61, Robinson et al. 2018 ("Reported income" columns). The lowest "reported income" figures from other "Stated preference" studies seem to be "$1,345 (Mahmud, 2009)" (a household-level figure) and "$2,315 (Gibson et al., 2007)" (unclear whether this is a household-level or per capita figure). The other studies from low- and middle-income countries typically have substantially higher reported income levels.
    • The mean "Annual consumption per capita (nominal USD, top 1% winsorized)" in IDinsight, Beneficiary Preferences 2019 is $296.73.

  • 5

    "In all presented analysis, we report on poor respondents who have completed at least the VSL section (1820 poor respondents total), unless noted otherwise." Pg. 58, IDinsight, Beneficiary Preferences 2019.

  • 6

    For more details, see Appendix 2, Pg. 60, IDinsight, Beneficiary Preferences 2019.

  • 7

    For more details, see Appendix 3, Pg. 71, IDinsight, Beneficiary Preferences 2019.

  • 8

    For more details, see Appendix 4, Pg. 80, IDinsight, Beneficiary Preferences 2019.

  • 9

    See "Appendix D - Discarded Methods," beginning on Pg. 27 of IDinsight, beneficiary preferences pilot final report, May 18, 2018.

  • 10

    "First, we trained respondents in basic understanding of probability and assessed their understanding with a set of test questions (all accompanied with visual aids):

    1. Basic understanding question 1: Imagine two lotteries. The chance of winning in one lottery is 5 in 1000, the chance of winning in the other lottery is 10 in 1000. Which lottery has the larger chance of winning?
    2. Basic understanding question 2: Imagine two roads that are prone to accidents. The risk of dying in an accident on the first road is 1 in 1000, and on the second road is 3 in 1000. Which road is riskier?
    3. Scale understanding question: Now imagine two different roads. The risk of dying in an accident on the first road is 1 in 100, and on the second road is 2 in 1000. Which road is riskier?
    4. Basic understanding question 3: Imagine two people. The first person’s chance of death is 5 in 1000 in the next 10 years. The second person’s chance of death is 10 in 1000 in the next ten years. Which person is more likely to die in the next ten years?
    5. Risk reduction question: Imagine a disease that kills 50 in 1000 people. There are three different vaccines available for the disease. Vaccine A reduces the risk of dying from this disease from 50 in 1000 to 20 in 10000, Vaccine B reduces the risk from 50 in 1000 to 40 in 1000, Vaccine C reduces the risk from 50 in 1000 to 30 in 1000. [Which vaccine has the largest risk reduction?]" Pg. 61, IDinsight, Beneficiary Preferences 2019.

  • 11
    • "For the VSL estimation, we include only respondents with sufficient understanding, which is about 62% of all respondents." Pg. 111, IDinsight, Beneficiary Preferences 2019.
    • IDinsight pre-registered that it would only use results from the sub-sample that showed some level of basic numeracy, which involves being able to compare two probabilities presented with the same denominator, following the convention of the VSL literature.

    See examples of the basic understanding questions that IDinsight used to arrive at its sample below:

    1. "Basic understanding question 1: Imagine two lotteries. The chance of winning in one lottery is 5 in 1000, the chance of winning in the other lottery is 10 in 1000. Which lottery has the larger chance of winning?
    2. Basic understanding question 2: Imagine two roads that are prone to accidents. The risk of dying in an accident on the first road is 1 in 1000, and on the second road is 3 in 1000. Which road is riskier?
    3. Basic understanding question 3: Imagine two people. The first person’s chance of death is 5 in 1000 in the next 10 years. The second person’s chance of death is 10 in 1000 in the next ten years. Which person is more likely to die in the next ten years?
    4. Risk reduction question: Imagine a disease that kills 50 in 1000 people. There are three different vaccines available for the disease. Vaccine A reduces the risk of dying from this disease from 50 in 1000 to 20 in 10000, Vaccine B reduces the risk from 50 in 1000 to 40 in 1000, Vaccine C reduces the risk from 50 in 1000 to 30 in 1000. [Which vaccine has the largest risk reduction?]" Pg. 61, IDinsight, Beneficiary Preferences 2019.

  • 12
    • See "Validity tests" section for additional VSL validity tests: Pgs. 63-65, IDinsight, Beneficiary Preferences 2019.
    • For choice experiments, IDinsight excluded respondents who chose options that were clearly dominated (i.e., programs with both fewer deaths and fewer cash transfers) and respondents who were "inconsistent" in the following sense: "For example, if a respondent preferred the program with more cash transfers, but then switched to the program that saved more lives when the number of cash transfers was increased, we classified the individual as inconsistent. 88.4% of respondents passed both of these tests." See Pgs. 71-72, IDinsight, Beneficiary Preferences 2019.

  • 13

    We have not yet asked IDinsight for a detailed description of this process. In its report, IDinsight writes, "We also examined our qualitative data, to see what evidence we have to support that respondents largely understood the presented scenarios, and to identify any misunderstandings or misconceptions that may have affected our data. Overall, a majority of respondents interpreted our questions correctly. As shown in Table 20 we found evidence of respondents clearly comparing and considering the risk reduction levels, and mapping WTP to the value of different lives. However, there were some common pitfalls that may have biased our VSL results. In particular, we know that for some we failed to overcome the liquidity constraint and the respondents answered purely based on available income. A number of respondents also anchored their response to the market value of vaccines/medicines. This is of some concern for the relative value of children vs adults, as a number of respondents noted that they would pay less as children’s medication is typically cheaper." See Pgs. 65-66, IDinsight, Beneficiary Preferences 2019.

  • 14

    Some examples:

    • VSL survey: IDinsight randomized asking about willingness to pay for risk reductions for adults versus children, randomized whether the item that reduced mortality risk was described as a "vaccine" or "medicine," and randomized the ordering of being offered a vaccine that reduces mortality risk by 5 in 1,000 versus 10 in 1,000. See Pg. 61, IDinsight, Beneficiary Preferences 2019.
    • Choice experiment: The survey included two alternatives to the choice experiment that 1) asked respondents to compare the value of life versus additional education, and 2) presented a more direct trade-off (sample question: "Suppose a donor is choosing between two options: buying a medicine that costs X USD which can be used to save the life of a Kenyan aged Y who would otherwise die from a disease, and giving cash transfers worth X to extremely poor Kenyan for them to improve their lives. Which one do you think the donor should choose?"). We have not yet vetted these analyses. IDinsight concluded: "We have low confidence in the monetary values obtained from our secondary methods and do not suggest these are used directly by GiveWell. This is due to the non-monetary values respondents attach to education, the additional biases created when using a direct (one-sided) framing, and the smaller sample size. However, we found that the derived values are within the same range as those from our primary values, and once again we find that highest value is placed on individuals under 5. This gives further confidence to the high-level findings of our main methods." Pg. 103, IDinsight, Beneficiary Preferences 2019.

  • 15

    See Sections 4 and 5, Pgs. 39-51, IDinsight, Beneficiary Preferences 2019.

  • 16

    "Our consent rates for this survey were particularly high as we had visited the household the day before to conduct PPI [Progress out of Poverty Index; used to establish household wealth], informing them about the study and letting respondents know we would return. There was some drop-off of respondents over the course of the interview; 94% completed all sections (see Table 14). In the case that an interview was not completed, we gained specific consent to use the data already captured. Therefore, data from the complete sections of incomplete interviews is still included in the analysis." Pg. 58, IDinsight, Beneficiary Preferences 2019.

  • 17 See pgs. 8-9, IDinsight, Beneficiary Preferences 2019.
  • 18 Calculations: $40,763/200 = $203.82. $33,798/200 = $168.99. IDinsight confirmed to us in writing that it calculated VSL by multiplying willingness to pay for a mortality risk reduction of 5 in 1,000 by 200.
  • 19

    IDinsight confirmed to us in writing that it calculated VSL by multiplying willingness to pay for a mortality risk reduction of 5 in 1,000 by 200. "Across data collected in Ghana and Kenya, the estimated mean VSL is $40,763 for individuals under 5 years and $33,798 for individuals 5 years and older (with a standard error of $2,201 and $6,397 respectively). The central estimates are relatively precise. However, this central estimate varies when alternate estimation models or samples are used (such as the different risk reduction levels, samples with varying levels of understanding, and weighting)." Pg. 19, IDinsight, Beneficiary Preferences 2019.

  • 20 See Pg. 22, IDinsight, Beneficiary Preferences 2019. About half of the 'switchers' do so at an implied value of life of $30,000 or less. For more details, see Figure 3, Pg. 23, IDinsight, Beneficiary Preferences 2019.
  • 21 "Using the community perspective, the implied value is $91,049 for individuals under 5 and $47,644 for individuals 5 and older." Pg. 18, IDinsight, Beneficiary Preferences 2019.
  • 22

    "The importance of non-switchers on our estimation is reflected in the stark differences between country-level estimates. In Ghana, where 52% of respondents always prefer life-saving programs the central estimate is $200,877 (SE: $1,352,999). Meanwhile for Kenya, where 24% of respondents always prefer life-saving programs, the central estimate is $14,499 (SE $67,173)." Pg. 24, IDinsight, Beneficiary Preferences 2019.

  • 23

    "Our two approaches yield two relatively similar central estimates of the relative value of individuals under 5 to those 5 and older: 1.2 in the individual perspective; 1.9 in the community perspective (see Figure 6). This is further supported by one of our secondary methods that captured a relative value of 1.3 (see Appendix 9). A comparison of the credibility of the results of our two approaches is found below in Table 12." Pg. 33, IDinsight, Beneficiary Preferences 2019.

  • 24

    IDinsight took the previously mentioned figures for the value of a statistical life and implied value of life from choice experiments and divided them by $286. $286 is the estimated annual consumption per capita of GiveDirectly recipients (more), and therefore is a hypothetical cost to double consumption for one person for one year. One can estimate the relative value of averting a death by, for example, comparing the value of a statistical life (e.g. $40,763) to the value of doubling consumption for one person for one year ($286). In this example, the moral weight for averting a death would be $40,763/$286 = ~143.

    We have not yet vetted this assumption.

  • 25
    • Value of averting the death of an under-5 year old, VSL approach: $40,763/$286 = ~143
    • Value of averting the death of an under-5 year old, choice experiment: $91,049/$286 = ~318
    • Value of averting the death of an over-5 year old, VSL approach: $33,798/$286 = ~118
    • Value of averting the death of an over-5 year old, choice experiment: $47,644/$286 = ~167

    See the previous footnote for an explanation of the reasoning behind these calculations.

  • 26

    The broader literature on VSL in lower- and middle-income countries suggests that such studies find widely varying results, and some studies conducted in higher-income contexts than this study found lower estimates for the value of life. We expect to consider the broader literature more carefully as part of our further work on moral weights. See Table C.3, Pgs. 60-61, Robinson et al. 2018.

  • 27

    More precisely the question was, "Scale understanding question: Now imagine two different roads. The risk of dying in an accident on the first road is 1 in 100, and on the second road is 2 in 1000. Which road is riskier?" Pg. 60, IDinsight, Beneficiary Preferences 2019. IDinsight reports that 34% of respondents answered correctly in Table 17, Pg. 62, IDinsight, Beneficiary Preferences 2019.

  • 28
    • "For the scale understanding question comparing 1/100 and 2/1000, 33.8% got it right the first time, and (including this 33.8%) overall 91.5% got it right with at most 2 explanations. There are a number of studies on the phenomenon of denominator neglect. [Footnote 114: For a review, see here.] Garcia-Retamero, Galesic, and Gigerenzer (2010) conducted experiments in Germany and found that in some cases, up to 50% of respondents compared risk reductions incorrectly due to denominator neglect, and that using visual aids (which we used in our scale question) significantly reduced the proportion who misunderstood. Note that this question is not part of our criteria for sufficient basic understanding of probabilities." Pgs. 62-63, IDinsight, Beneficiary Preferences 2019.
    • For basic understanding questions, IDinsight notes: "For the basic understanding questions, our respondents performed better than those in a study in Bangladesh with similar questions (Mahmud, 2011). Additionally, our respondents performed similarly as those from a study in China (Hoffman et al., 2017) and a study in Malaysia (Ghani and Yusoff, 2003). They performed worse than respondents of a study in Mongolia (Hoffman et al. 2012). Compared to stated preference VSL studies from high-income countries, our respondents perform slightly worse than respondents when asked the “two people” question (basic understanding question 3 above). In a study in the US and Canada (Alberini et al., 2002), 88% of respondents got this question right the first time, whereas 83.7% of our sample did." Pgs. 62-63, IDinsight, Beneficiary Preferences 2019.
    • Note that as previously mentioned, IDinsight pre-registered that it would only use results from the sub-sample that showed some level of basic numeracy, which involves being able to compare two probabilities presented with the same denominator, following the convention of the VSL literature (the previous question was considered beyond basic numeracy). Ultimately, 62% of the sample passed those tests. We have not yet asked IDinsight what proportion of this subsample answered the "1 in 100 vs. 2 in 1,000" question correctly, but it would be about 55% at most (34%/62% = ~55%).

  • 29

    "Strong external scope test: Failed. On average, the group of respondents given higher risk reductions for the first WTP [willingness-to-pay] question report higher willingness to pay, but the magnitudes are not proportional to the risk reduction levels. [Footnote 121: This is failed by all estimation models (subsamples with different requirements of understanding) for adult and child VSL. Passing this test would require the average WTP for the first 5/1000 risk reduction and half of the average for the first 10/1000 risk reduction be statistically indistinguishable.]" Pg. 64, IDinsight, Beneficiary Preferences 2019.

  • 30

    See Slide 22 in IDinsight, beneficiary preferences field test findings, November 2018. (See the row beginning with "Field Test data (RR=30/1)", with sample size of 94, results for "Strong External Scope Test.")

  • 31
    • For context, the main risk level used in this survey (5/1,000 reduction in risk of death) was chosen to be comparable to other stated preference VSL literature. We see this as a good indication of how results in VSL literature generally may be substantially affected by choices in the research methodology.
    • "For the stated‐preference studies, one indicator of validity is whether estimates of WTP are sensitive to scope; i.e., whether WTP for different magnitudes of risk reduction are statistically significantly different. Theory suggests that WTP should be larger for a larger risk reduction, and close to proportional to the risk change as long as WTP is small relative to income (see Corso et al. 2001, Alolayan et al. 2017). The common practice of applying a constant VSL across differently‐sized risk changes rests on this assumption of proportionality; if WTP is not proportional to the risk change, then estimated VSL (WTP divided by the risk change) depends on the magnitude of the risk change." Pg. 53, Robinson et al. 2018.
    • "The finding that, when scope tests are included, WTP is often relatively insensitive to risk magnitude is common in research conducted in high‐income countries as well (see, for example, USEPA 2010b, Robinson and Hammitt 2016). It suggests, for example, that the value of a 1 in 10,000 risk reduction is similar to the value of a 5 in 10,000 risk reduction. Using the same value for differently‐sized risk reductions in policy analysis would suggest that investing in policies that provide smaller risk reductions may be preferable (assuming the costs of implementing the policy increase with the size of the risk reduction), which seems nonsensical. It is more likely that individuals are misinterpreting the probabilities. This misunderstanding can be reduced by including educational materials in the survey then querying respondents to determine whether they comprehend the differences in probabilities. Using visual aids to illustrate the change in risk (such as a grid in which an area proportional to the risk reduction is colored) has been found to reduce this misunderstanding in several studies, but is not always effective." Pgs. 53-54, Robinson et al. 2018.

  • 32

    See, for example, discussion of Kremer et al. 2011 in footnote 15 here.

  • 33

    "Across our sample of low-income respondents in Ghana and Kenya, we found that 38% of respondents always chose the program that saved more children's lives over any number of cash transfers offered (up to the presented maximum of $10 million)." Pg. 22, IDinsight, Beneficiary Preferences 2019.

  • 34

    "Many respondents attribute a high valuation of life to an ethical rule that saving life is morally right, and equating life to a cash value is impossible and morally incorrect.

    “When you have life, God has been faithful, and you can expect life will grow to assist the community. But if one has all these cash and doesn’t have any extra life or child to spend it, then what is the profit in having all this cash?” (Female, Karaga, Ghana)

    “Life is very important and cannot be valued using cash” (Female, Migori, Kenya)

    There is notable religious influence in these justifications, including mentions of the 'sanctity of life' and the 'religious duty' to care for life." Pg. 37, IDinsight, Beneficiary Preferences 2019.

  • 35

    "The importance of non-switchers on our estimation is reflected in the stark differences between country-level estimates. In Ghana, where 52% of respondents always prefer life-saving programs the central estimate is $200,877 (SE: $1,352,999). Meanwhile for Kenya, where 24% of respondents always prefer life-saving programs, the central estimate is $14,499 (SE $67,173)." Pg. 24, IDinsight, Beneficiary Preferences 2019.

  • 36

    "Across respondents in both countries, the calculated value of individuals under 5 years relative to individuals 5 years and older is 3.7. While this ratio captures the overall strong preference for individuals below 5 years relative to other age groups, it is highly skewed by an estimated negative valuation of individuals above 40 years. In both Kenya and Ghana, the ratio of individuals over 40 years relative to individuals below 5 years has a negative coefficient. Qualitative data suggests that at least some respondents consider the elderly to contribute a net drain on household resources, although it is not clear to what extent this explains the negative valuation. Plus, it is contradicted by one of our secondary methods which did find a net positive valuation of individuals over 40 (albeit from a much smaller sample, see Appendix 9 for more details).

    While this finding is comparable to the results from a similar study in Bangladesh (Johansson-Stenman et al. 2009), it has a substantial effect on the overall results of our study. Plus, we know that few current GiveWell top charities support individuals over 40, so the relative value of this group is of less relevance to moral weights. Finally, we are not confident that the negative estimates for those above 40 reflect true preferences based on the mechanics of the analytical model. When comparing any two age groups against each other, over 40 is the group least frequently selected.

    Since choices involving those over 40 are more one-sided, the model may not have enough data to accurately estimate the value placed on those over 40, despite the apparently small standard error.

    As such, we suggest placing more weight on the ratio obtained when individuals over 40 years are excluded (1.9). This ratio is consistent with the range predicted by the literature for this population." Pg. 32, IDinsight, Beneficiary Preferences 2019.

  • 37

    There are a number of possible additional limitations. For some discussion of other limitations, see Appendix 10, Pg. 111, IDinsight, Beneficiary Preferences 2019. One example is: "Liquidity constraint: As discussed, we find evidence that the ten-year small installment repayment framing helped relax liquidity constraint, but cannot rule out that within each period respondents’ payment amounts are constrained by liquidity."

  • 38

    See Section 3, Pg. 35, IDinsight, Beneficiary Preferences 2019. See also the detailed qualitative interviews.

  • 39

    In the future, we expect to have a single set of moral weights and that choosing this set will involve determining what moral weights would be implied by a variety of approaches or worldviews and then taking a weighted average of those views.

    We have done rough, early-stage versions of this analysis internally, and aspects of this analysis also supported the direction of the previously mentioned updates.

    Examples of different approaches to moral weights could include:

    • Building explicit models of the various morally-relevant aspects of a given outcome. For example, when estimating the value of averting a death, some staff have considered factors such as life expectancy, how someone's degree of personhood develops as they age, the level of grief associated with death at different ages, economic contributions made by people at different ages, etc. One example of such a model is here.
    • Using our best guess of recipient preferences.
    • Considering moral weights from a variety of philosophical perspectives: e.g. total utilitarianism vs. the "time-relative interest stance".
    • Using best guess estimates from the stated preference VSL literature.
    • Using best guess estimates from the revealed preference VSL literature.
    • Valuing outcomes according to their estimated effect on subjective well-being.
    • Deferring to conventionally used moral weights.
    • Using pure intuition.

  • 40

    Examples of different approaches to moral weights could include:

    • Building explicit models of the various morally-relevant aspects of a given outcome. For example, when estimating the value of averting a death, some staff have considered factors such as life expectancy, how someone's degree of personhood develops as they age, the level of grief associated with death at different ages, economic contributions made by people at different ages, etc. One example of such a model is here.
    • Using our best guess of recipient preferences.
    • Considering moral weights from a variety of philosophical perspectives: e.g. total utilitarianism vs. the "time-relative interest stance".
    • Using best guess estimates from the stated preference VSL literature.
    • Using best guess estimates from the revealed preference VSL literature.
    • Valuing outcomes according to their estimated effect on subjective well-being.
    • Deferring to conventionally used moral weights.
    • Using pure intuition.

  • 41

    We will aim to avoid situations where we could cause disruptions in charities' programs by quickly and dramatically changing our moral weights.

  • 42

    This is informed by an understanding that malaria deaths aren't clustered among newborns or the elderly (where this intuition might break down).

  • 43

    However, we see arguments against this approach. Generally, when incorporating subjective inputs into our cost-effectiveness analysis, we don't see value in rounding numbers to communicate uncertainty. We focus instead on using the value that reflects our best guess, informed, whenever possible, by breaking down the question into components, estimating the components with data and/or intuition, and combining the components mathematically.

  • 44 This addition discusses how we might derive moral weights only from "switchers" in the choice experiment framing.

Source URL: https://www.givewell.org/how-we-work/our-criteria/cost-effectiveness/2019-moral-weights-research