GiveWell's Pre-Analysis Plan for the New Incentives RCT: Addendum on Internal and External Validity

Published: March 2020

Purpose of this page

As part of GiveWell’s work to support the creation of future top charities, in August of 2019, IDinsight received a GiveWell Incubation Grant to complete its randomized controlled trial (RCT) of New Incentives' conditional cash transfer (CCT) program for infant immunization in North West Nigeria. GiveWell plans to use the results of this RCT as a key input for an investigation during 2020 into how New Incentives' cost-effectiveness compares to other giving opportunities we are considering.

As the funder of this RCT, GiveWell will have a high degree of supervision over the endline analysis and wants to pre-register its assumptions and planned supplementary data analyses before reviewing any endline data or results. This page is a supplement to GiveWell's Annex 5 in the full New Incentives endline pre-analysis plan.1 Annex 5 gave a general overview of GiveWell's planned approach to using the results from this RCT to complete its top charity investigation of New Incentives. The full pre-analysis plan and the primary and secondary research questions that it specifies remain in effect and are the most important guides to the endline evaluation. This addendum gives further details about the additional information GiveWell plans to review and the factors we expect to consider that could increase or decrease our confidence in the main results or their validity outside of the evaluation period, grounded in hypotheses about mechanisms behind this program's theory of change and known limitations of the data. GiveWell may use these analyses to inform key inputs into the cost-effectiveness analysis for New Incentives, in particular an internal validity adjustment and an external validity adjustment used to adjust the main program results to reflect expected future impact.

This is the first time that GiveWell has been deeply involved in the design phase for a program evaluation that will determine if we fund program scale-up. We believe it is valuable to record our plans for analyzing the results, but we will also need some flexibility in how we interpret the findings. There will inevitably be some outcomes or combinations of outcomes that we do not anticipate. After we have analyzed the results from this evaluation, we plan to revisit the utility of this pre-analysis plan and make changes to the approach for future evaluations we participate in.

In a nutshell

GiveWell believes that the New Incentives endline RCT will be a high-quality, rigorous impact evaluation of New Incentives' CCT for immunization program. Nevertheless, several factors are likely to affect program performance or data quality and may lead to adjustments for internal and external validity of the results in our cost-effectiveness analysis of the program. Many of the pre-registered factors affecting external validity are likely to apply only if New Incentives were to change its area or mode of operation substantially in the future. This program's internal validity adjustment is likely to be larger and more important than the external validity adjustment due to issues with using self-reports to measure results.


Purpose of this page
In a nutshell
Factors affecting internal validity
- Self-report bias
  - Description of the problem
  - Planned analysis
- Additional factors affecting the internal validity adjustment
Factors affecting external validity
Limitations
Sources

Factors affecting internal validity

GiveWell believes that inaccuracy in self-reports of infant vaccination is likely to be the most important internal validity concern for this program. Inaccurate self-reports could ultimately lead to either upward or downward bias in RCT impact estimates, which GiveWell would like to account for in the internal validity adjustment for this program. While it is likely that self-reports are imperfect, on balance GiveWell has chosen self-reports of vaccination status as the primary outcome measure for this study because we believe that self-reports have fewer weaknesses than alternative impact measures. GiveWell will consider several pieces of evidence when evaluating the final accuracy of self-reported impact estimates. These additional validation exercises, while imperfect, are similar to those conducted in past work, and our impression is that past work did not find serious discrepancies between self-reports and other measures of vaccination status.2

Self-report bias

Description of the problem

Caregivers may misreport infant vaccinations, leading to false negatives and false positives in the data. Inaccurate reporting introduces measurement error into the study's key outcome variables (whether or not the child received each vaccine), which in most cases leads to bias in the study results. GiveWell is currently uncertain about the size and direction of self-report bias in the impact estimates. The possibilities range from substantial downward bias that could dilute any treatment effects present to substantial upward bias in the results due to increased reporting in the treatment group; the possibilities also include limited bias. This section describes how two common forms of response bias that may apply in the study context could affect study results.

Recall bias occurs when caregivers have imperfect memory of infant vaccinations that results in a pattern of reporting errors. GiveWell believes that recall bias is likely to be an issue for this RCT due to anecdotal reports of survey respondents struggling to remember and distinguish vaccinations, and on balance it probably manifests as caregivers systematically failing to remember and report vaccinations that happened rather than remembering vaccinations that did not happen. Recall bias could lead to under-reporting of coverage and substantial downward bias of the RCT point estimates, because some of any increase in vaccination coverage may also be forgotten and incorrectly reported as non-vaccination. However, it is also possible that the availability of CCTs for vaccination may help caregivers to remember vaccinations received at a higher rate in treatment areas. In this case, a higher proportion of the vaccinations that actually happened would be reported in the treatment group, leading to upward bias in the RCT results. Thus the net effect of recall bias on the impact estimate is uncertain.

Caregiver self-reports may also suffer from social desirability bias, which occurs if caregivers report false positives for vaccinations that did not actually happen, because they view getting infants vaccinated as “good behavior” and want to be viewed as conforming to this social norm. GiveWell is uncertain whether social desirability bias is a major factor for this RCT. Social desirability would bias self-reported coverage estimates upward, but reported baseline infant vaccination coverage was quite low in the area of North West Nigeria where New Incentives operates, so there may not be strong social norms surrounding vaccination.3 If social desirability bias exists and there is no differential social desirability bias between treatment and control groups, then it would lead to downward bias in the difference between treatment and control group coverage represented by the RCT impact estimates. This is because for any true increase in vaccination coverage due to the program, relatively more control group members are still reporting false positives for vaccination, diluting the true impact estimate.

However, social desirability bias could lead to upward bias in the impact estimates if New Incentives’ program messaging has increased social desirability bias in the treatment group relative to the control group, leading relatively more false positives to be reported in the treatment group. Thus, the net direction of social desirability bias is also uncertain.

In summary, the size and net direction of self-report bias in the New Incentives RCT are ambiguous. If treatment induces increased reporting of vaccinations, whether true or false, this will inflate the RCT results. If there is some degree of systematic underreporting or social desirability bias that does not differ by treatment status, the results will be downward biased. These biases could be additive or countervailing, and the net impact depends on the relative magnitude of these effects. It seems likely to GiveWell that recall bias and social desirability bias are both active to some degree, and that treatment status may have increased social desirability bias and improved recall in the treatment group as well.
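The direction-of-bias reasoning above can be illustrated with a small calculation. This is only a sketch under simplifying assumptions, not part of the registered analysis: it treats recall as a "sensitivity" (the share of true vaccinations that are reported) and social desirability as reducing "specificity" (the share of non-vaccinations correctly reported as such). The function names and all numbers are hypothetical.

```python
def observed_coverage(true_coverage, sensitivity, specificity):
    """Share of caregivers who report 'vaccinated', given reporting accuracy.

    sensitivity: share of true vaccinations that are reported (recall)
    specificity: share of non-vaccinations reported as non-vaccination
    """
    true_positives = true_coverage * sensitivity
    false_positives = (1 - true_coverage) * (1 - specificity)
    return true_positives + false_positives


def observed_effect(p_control, p_treat, sens_c, spec_c, sens_t, spec_t):
    """Treatment-minus-control difference in self-reported coverage."""
    return (observed_coverage(p_treat, sens_t, spec_t)
            - observed_coverage(p_control, sens_c, spec_c))


# Hypothetical true coverage: 30% in control, 50% in treatment (true effect 0.20).

# Case 1: identical reporting errors in both arms. The measured effect shrinks
# to true_effect * (sensitivity + specificity - 1), here about 0.14.
same_errors = observed_effect(0.30, 0.50, 0.80, 0.90, 0.80, 0.90)

# Case 2: recall improves in the treatment arm only. The measured effect
# (about 0.215) now exceeds the true effect of 0.20.
better_recall_in_treatment = observed_effect(0.30, 0.50, 0.80, 0.90, 0.95, 0.90)
```

With errors that do not differ by arm the measured effect is diluted, while better recall in the treatment arm alone can push it above the true effect, which is why the net direction cannot be signed in advance.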

Planned analysis

The net direction and magnitude of any response bias from self-reporting in the RCT results are unknown. GiveWell plans to take several additional analyses into account when developing an internal validity adjustment that accounts for response biases.

Cross-checks with alternative measures. There are several imperfect alternate data sources that GiveWell plans to use to cross-check the accuracy of self-reports. These are:

Child health cards (CHCs). When caregivers have CHCs available, endline surveyors will collect information from them. It is possible that New Incentives’ program has increased the proportion of caregivers who possess CHCs and the frequency at which they are marked.4 When a CHC is present, it may not be marked with all vaccinations that actually occurred. As a result, CHCs may suffer from both downward bias from unmarked vaccinations and upward bias from increased reporting rates in treatment areas. In addition, caregivers who maintain CHCs may be systematically different from those who do not. Impact results based on CHCs may not be representative of the entire population, and they could underestimate the effect of the program if only a subgroup of caregivers who vaccinate the most keep and maintain CHCs.
Administrative records: child immunization registers (CIRs) and tally sheets. Clinics keep paper administrative records of vaccinations they administer both by child name (in the CIRs) and as a volume count (tally sheets).5 These records are likely to be incomplete. GiveWell believes it is likely that New Incentives’ presence in treatment clinics has increased the rate of recordkeeping at these clinics.6 As a result, impact estimates from clinic tally sheets are likely to suffer from upward bias from increased reporting rates in treatment areas. In addition, treatment areas may attract more out-of-catchment vaccinations, which would further inflate estimates from tally sheets. CIRs will be used to verify whether reported vaccinations are confirmed by the ledger.7 GiveWell expects these records to be more complete in treatment clinics, but to remain imperfect, so a lack of CIR confirmation does not necessarily invalidate a positive self-report. CIRs should be unaffected by social desirability bias.
Bacillus Calmette–Guérin (BCG) scars. While not every BCG vaccination results in a scar, the injection leaves a scar on the arm in the majority of cases.8 GiveWell does not believe it is likely that the treatment induced different rates of scarring between treatment and control areas, so this measure should only be affected by attenuation bias, not differential reporting. However, alternative impact estimates using BCG scars could still be inaccurate if enumerators have trouble identifying subtle scarring, or if some infants are not available for inspection. There is also a concern that enumerators could violate protocol and use the presence of scars to correct self-reports, which would affect the use of scars to estimate the degree of error in self-reports. Estimates of reporting accuracy based on BCG scars may not generalize to other vaccines because BCG is the first vaccine in the sequence: its salience as the earliest vaccine, as well as the presence of the scar, may lead to a higher caregiver recall rate for BCG than for other vaccines. Generalizability may be particularly low for measles, the last vaccine in the sequence.
Self-reports of any vaccination. The endline survey asks caregivers if their child has ever been vaccinated in addition to asking about each individual vaccine. Recall of ever-vaccination is likely better than recall for specific vaccinations, because it may be harder to remember distinguishing details of each vaccination and the total number received. However, impact estimates on ever-vaccination could be quite different from individual vaccines even with perfect measurement. GiveWell is unsure whether New Incentives' program, if successful, would be more effective at inducing caregivers to start vaccinating infants or at inducing those who already vaccinate to seek more vaccinations.

GiveWell also considered collecting oral biomarkers to test for immunity to measles as an objective backcheck on self-reports, but we rejected using the biomarkers after they failed to detect immunity at sufficiently high rates during piloting. There is a range of possible explanations for what caused the biomarkers pilot to fail, including issues with the oral test and the possibility that the measles vaccine may have been less effective at inducing immunity than expected in this context. GiveWell may do more research into the question of vaccine effectiveness in the future, and we may decide to use conservative estimates of effectiveness for the first dose of the measles vaccine in our cost-effectiveness analysis.

As registered in the full pre-analysis plan, GiveWell plans to use self-reports as the main outcome measure, but we will also use impact estimates from each of the alternative measures as a robustness check to evaluate whether there are major inconsistencies with self-reported estimates. We will also consider the degree of agreement between self-reports and BCG scars, CIRs, and CHCs for individual respondents and use this information to update on the likelihood of systematic reporting biases. If self-reports indicate a positive program result and this is corroborated by multiple additional measures, this will increase GiveWell’s confidence in the self-reported estimates and its belief that a positive program result is unlikely to be spuriously driven by differential self-report biases in the treatment group. GiveWell will place the most weight on the BCG scar measurement, because it is the least likely to be influenced by treatment assignment. GiveWell understands that CIRs and CHCs may be of less value as cross-checks for the measles vaccination in particular. This is because measles is the last vaccination in the incentivized sequence, given at 9 months of age or older,9 and the longer interval of time elapsed since the second-to-last vaccination makes it more likely that caregivers lose or forget CHCs and more difficult for health workers to locate children in paper CIRs.

Weight on the prior. GiveWell has calculated a rough prior for program impact based on past research on CCTs for immunization that we consider to be the most similar to New Incentives' program based on considerations of program location, baseline vaccination rates, and similarity of the vaccines incentivized.10 This prior estimates roughly a 16 percentage point increase in vaccine coverage for each incentivized vaccine.11 Whether results from this RCT fall substantially above or below this prior may influence GiveWell’s views on whether the RCT results are more likely to be upward or downward biased.12

Changes in variables related to social desirability. As described above, increased social desirability bias in the treatment group could bias RCT results upward, but GiveWell is uncertain to what extent the program has increased the social desirability of vaccination. To inform its views, GiveWell will consider:

Whether there is evidence of increased false positive rates in treatment group self-reports of BCG vaccination as compared to control group self-reports as measured using BCG scars.
A comparison of the proportion of caregivers who report positive vaccination status and fail to have that report confirmed by CIRs in the treatment group relative to the control group. GiveWell expects the rate of unconfirmed reports to be lower in treatment areas because administrative reporting rates are likely to increase in treatment clinics. If those rates were in fact equal or higher in treatment areas than in control areas, this may be evidence of an increase in false positives reported in the treatment group due to increased social desirability bias.

Evidence of difficulty with caregiver recall. GiveWell expects that it will be difficult to determine the degree of recall bias present in the results, since it is likely to take the form of false negatives and incomplete information. The available cross-checks are also incomplete for various reasons. Nevertheless, to inform its views, GiveWell will consider:

The degree to which caregivers fail to report BCG vaccinations when a BCG scar is present, and whether there is evidence of a reduction in false negative rates for BCG confirmed by the scar in the treatment group. A limitation of this method is that GiveWell expects that caregiver recall may be particularly good for BCG compared to other vaccinations.
The degree to which caregivers fail to report vaccinations that are marked on CHCs. A limitation of this method is that caregivers who have CHCs available may have better recall than those who do not.
Results on ever-vaccination. In particular, a moderate to large program effect on ever-vaccination combined with a small impact estimate for individual vaccines would update GiveWell toward believing that the results could suffer from substantial downward recall bias because we expect reports of ever-vaccination to be less impacted by recall bias than reports for individual vaccines.
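The scar-based cross-checks described above could be tabulated roughly as follows. This is a minimal sketch, not IDinsight's actual analysis code: the field names (`arm`, `reports_bcg`, `bcg_scar`) and the example records are hypothetical. Because not every BCG vaccination leaves a scar, the share of positive reports without a scar is an upper bound on false positives rather than a direct measure.

```python
def rate(numerator, denominator):
    """Return numerator / denominator, or None when the group is empty."""
    return numerator / denominator if denominator else None


def bcg_cross_check(records, arm):
    """Compare BCG self-reports against observed scars within one study arm."""
    in_arm = [r for r in records if r["arm"] == arm]
    with_scar = [r for r in in_arm if r["bcg_scar"]]
    reported = [r for r in in_arm if r["reports_bcg"]]
    return {
        # Scar present but caregiver did not report BCG (a recall failure).
        "false_negative_rate": rate(
            sum(not r["reports_bcg"] for r in with_scar), len(with_scar)),
        # Caregiver reported BCG but no scar was observed (upper bound on
        # false positives, since some vaccinations leave no scar).
        "unconfirmed_positive_rate": rate(
            sum(not r["bcg_scar"] for r in reported), len(reported)),
    }


# Hypothetical records for illustration only.
records = [
    {"arm": "treatment", "reports_bcg": True, "bcg_scar": True},
    {"arm": "treatment", "reports_bcg": True, "bcg_scar": False},
    {"arm": "treatment", "reports_bcg": False, "bcg_scar": True},
    {"arm": "treatment", "reports_bcg": True, "bcg_scar": True},
    {"arm": "control", "reports_bcg": False, "bcg_scar": True},
    {"arm": "control", "reports_bcg": True, "bcg_scar": True},
    {"arm": "control", "reports_bcg": False, "bcg_scar": False},
    {"arm": "control", "reports_bcg": False, "bcg_scar": True},
]

treatment_check = bcg_cross_check(records, "treatment")
control_check = bcg_cross_check(records, "control")
```

A lower false negative rate in the treatment arm than in the control arm would be consistent with improved recall under treatment, and a higher unconfirmed positive rate in the treatment arm would be consistent with increased social desirability bias.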

Additional factors affecting the internal validity adjustment

Treatment spillovers. Caregivers from control clinic catchments may have attended treatment clinics to seek the cash incentive, which would inflate coverage rates in the control group and bias the RCT results. The evaluation design placed buffer areas between treatment and control clinics, and the cash incentives offered for immunization are unlikely to be large enough to defray travel costs over long distances, so GiveWell does not expect major spillover effects. However, GiveWell will review three sources of information to test this hypothesis:

At RCT endline, IDinsight will assess the proportion of control group CHCs bearing an All Babies Are Equal (ABAE) initiative stamp (the local name for New Incentives' program).13 If a substantial proportion of CHCs from control areas bear an ABAE ID sticker, GiveWell would update toward believing that there could be substantial program spillovers leading to downward bias in the RCT results.
GiveWell will also compare endline control group vaccination coverage rates to baseline coverage rates. If control group rates have greatly increased over time across multiple vaccines, this may be an indication that spillovers occurred.
GiveWell will review the results of the analysis estimating the relationship between program uptake and geographic distance to catchment clinic. If there is a large decline in program uptake as distance to the clinic increases, this would support the absence of substantial spillovers, whereas if any increases in coverage persist over geographic distance, this would update GiveWell toward believing that spillovers could be salient.

Plausibility of immunization incentives as a driver of observed behavior change. Evidence that incentives may have motivated caregivers to vaccinate would support the key program mechanism and increase GiveWell's confidence that New Incentives' activities led to any measured increase in vaccination coverage. The baseline and endline surveys collect information on caregivers' reasoning for vaccination. The endline report will provide summary statistics for the following survey questions, stratified by treatment and control groups:

(1) Main reason for getting vaccinated
(2) Main reason for not getting vaccinated
(3) The proportion of caregivers who report receiving the incentive as one of their reasons for vaccinating14

GiveWell plans to compare the endline responses to questions (1) and (2) to baseline responses and evaluate whether any barriers to or motivators for infant vaccination appear to have shifted over time. In particular, wanting to get the incentive is a possible response for the main reason for getting vaccinated, and it will be interesting to see if caregivers report factoring this into their reasoning, which would support a mechanism behind the program's theory of change. Question (3) gives a less restrictive signal of whether caregivers report factoring incentives into their vaccination decision. Positive evidence that more caregivers report seeking the incentive in treatment areas than control areas would increase GiveWell's confidence in the internal validity of these results and the value of the cash incentives. However, social desirability may discourage caregivers from reporting wanting to get the incentive as a reason for vaccination, so failing to find higher rates of incentive-seeking would not necessarily be a strong negative update on internal validity if the RCT demonstrates a large program result.

Plausibility of the magnitude of results. The baseline survey established that baseline vaccination rates were low, poverty was high, and prevalence of anti-vaccination views was low in the Nigerian population that New Incentives serves.15 Our literature review and prior for program impact indicated that other programs offering CCTs for immunization have shown large effects in contexts with similar characteristics. As a result, GiveWell currently believes that a large impact of this program is plausible. When assessing the plausibility of the final results for this program, GiveWell will take into account any updates between baseline and endline about key factors it believes could affect the external validity of the results, listed below. If the endline impact estimates are high, but we simultaneously find a high rate of vaccine stock-outs or a low rate of caregiver awareness of cash incentives, this may lead us to discount the internal validity adjustment based on plausibility.

Partially treated areas. The settlement lists that define clinic catchment areas have changed between the baseline survey in 2017 and the endline pilot survey in August 2019. GiveWell, IDinsight, and New Incentives have aligned on using the August 2019 settlement lists to define the intent-to-treat area for endline surveying because, on balance, they seem to be more representative of New Incentives' current operations than the baseline lists. However, it is likely that a small proportion of settlements on the new lists may only have been treated during part of the study period. GiveWell may adjust the results to estimate a treatment-on-the-treated impact to account for these areas that may contribute smaller results to the total than if they had been fully treated. To do so, we will consider a robustness check that excludes settlements where respondents report that they do not belong to the expected catchment clinic, information from New Incentives about the settlements in which it operates, and map verification analyses from IDinsight.
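One standard way to move from an intent-to-treat estimate to a treatment-on-the-treated estimate is to divide by the share of the sample that was actually treated. The sketch below illustrates that rescaling; it is not necessarily the adjustment GiveWell will use. It assumes, for simplicity, that settlements not fully treated received no program effect at all and that no control settlements were treated. The function name and numbers are hypothetical.

```python
def treatment_on_treated(itt_effect, share_fully_treated):
    """Rescale an intent-to-treat effect by the share of units actually treated.

    Assumes untreated units in the treatment arm experienced zero effect and
    that the control arm was untreated (a simplifying assumption).
    """
    if not 0 < share_fully_treated <= 1:
        raise ValueError("share_fully_treated must be in (0, 1]")
    return itt_effect / share_fully_treated


# Hypothetical: an 18 percentage point ITT effect with 90% of settlements
# fully treated implies roughly a 20 percentage point effect where treated.
tot_estimate = treatment_on_treated(0.18, 0.90)
```

If partially treated settlements in fact received some benefit, this rescaling would overstate the effect among fully treated settlements, so it is better read as a bound than as a point estimate.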

Alternative measles campaigns. GiveWell understands that government-sponsored Supplementary Immunization Activities (SIAs) have delivered door-to-door measles vaccinations during the RCT period,16 though we do not necessarily expect them to bias the RCT results. GiveWell does not have knowledge of alternative community campaigns for other incentivized vaccines besides measles during the RCT period. Similar campaigns also pre-dated the study period, and GiveWell does not have any evidence that these campaigns operated differently in treatment and control areas. Nevertheless, IDinsight will construct a separate impact estimate for "RI measles" (routine immunization measles), i.e., measles vaccinations that reportedly occurred at health clinics or via health clinic outreach activities and are eligible for New Incentives' program. If the impact estimate for RI measles is substantially different from the overall measles estimate, GiveWell may update toward believing that the effect size was affected by alternative measles campaigns and update the internal validity adjustment accordingly.

Study quality. GiveWell has followed this RCT closely and believes that it meets high quality standards. As a result, we do not expect to downgrade the internal validity adjustment for concerns about study design and quality, unless we receive unexpected negative updates about execution.

Survey quality control included blinding of the enumerators to treatment status,17 quality control backchecks for a high proportion of surveys and all enumerators, and re-surveys in cases where the backchecks identified major errors in survey technique.18 If we become concerned about an aspect of data quality, we expect to have more information available from these auditing procedures that we can review to understand potential weaknesses.

New Incentives plans to contract an outside expert to perform a push-button replication of the primary study results. If the replication finds that IDinsight's analysis deviated from its full pre-analysis plan, GiveWell will investigate and may negatively update on study quality.

Factors affecting external validity

This section lays out explicit factors and analyses that GiveWell expects to consider when setting the external validity adjustment for New Incentives' program. Many of these factors are likely to apply only if New Incentives were to change its area of operation substantially in the future. GiveWell understands that New Incentives has substantial room to scale up its program while remaining within the three Nigerian states included in this RCT.

GiveWell has several guiding hypotheses about necessary conditions for program success:

Hypothesis 1 - baseline coverage: Low baseline vaccination coverage rates leave more scope for New Incentives' CCT program to have a large impact than if coverage is already high. GiveWell's literature review indicates that programs offering incentives for vaccination tend to show larger impacts in contexts with low baseline coverage rates.

Planned analysis: The New Incentives baseline survey has already confirmed that baseline coverage is low in Katsina and Zamfara, two of the Nigerian states where New Incentives works;19 no baseline was conducted in the third state because this state was added to the program later. The endline report will provide average control group coverage rates. If this RCT shows a large impact of the program (at least a 10 percentage point coverage increase per incentivized vaccination), GiveWell will consider this evidence to be consistent with hypothesis 1.
Planned adjustment: None at present. GiveWell is likely to make a larger deduction for external validity if the program expands into a new region or country with higher baseline coverage or if GiveWell believes that an external factor has significantly raised vaccination coverage in the areas where New Incentives currently works.

Hypothesis 2 - poverty rates: Areas with high poverty rates are more likely to benefit from the program. New Incentives offers small cash transfers that serve as a nudge to follow the routine immunization schedule, and less wealthy households may be more motivated to seek these cash incentives for vaccination. The baseline survey also confirmed that vaccination coverage increased with higher socioeconomic status.20 Cash incentives may reduce barriers to seeking vaccination that households with lower socioeconomic status face, such as transportation costs, to help equalize coverage.

Planned analysis: The baseline survey found high poverty rates of over 50% in the areas where New Incentives currently works.21 To test the validity of this hypothesis, the endline report will include results from a secondary regression that interacts poverty status22 with treatment status to assess whether any increase in the average number of vaccines received is more concentrated in households below the poverty line.23
Planned adjustment: None at present. If this analysis shows a clear result, GiveWell would be likely to make an external validity adjustment to account for future new areas of operation with higher or lower poverty rates.
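To make the interaction analysis for Hypothesis 2 concrete: in a regression with only treatment status, poverty status, and their product, the interaction coefficient equals the treatment effect among poor households minus the treatment effect among non-poor households. The sketch below computes that quantity directly from group means. It is an illustration only; the actual secondary regression may include controls and clustered standard errors, and the data layout and values here are hypothetical.

```python
from statistics import mean


def interaction_estimate(rows):
    """rows: iterable of (treated, poor, n_vaccines) tuples.

    Returns (effect among poor, effect among non-poor, interaction), where
    interaction is the difference between the two subgroup effects.
    """
    rows = list(rows)

    def cell_mean(treated, poor):
        return mean([y for t, p, y in rows if t == treated and p == poor])

    effect_poor = cell_mean(True, True) - cell_mean(False, True)
    effect_nonpoor = cell_mean(True, False) - cell_mean(False, False)
    return effect_poor, effect_nonpoor, effect_poor - effect_nonpoor


# Hypothetical households: (treated, below poverty line, vaccines received).
households = [
    (False, True, 1), (False, True, 3),
    (True, True, 4), (True, True, 6),
    (False, False, 3), (False, False, 5),
    (True, False, 5), (True, False, 5),
]

effect_poor, effect_nonpoor, interaction = interaction_estimate(households)
```

A positive interaction, as in this made-up example, would indicate that the increase in vaccines received is more concentrated in households below the poverty line.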

Hypothesis 3 - cultural acceptability: Low prevalence of strongly held opinions against vaccination is important for New Incentives' theory of change. Reported barriers to vaccination should be factors that cash incentives and program marketing could plausibly overcome. It may be difficult to incentivize vaccination if vaccination is not culturally acceptable.

Planned analysis: Acceptability did not seem to be a major barrier to vaccination in the baseline survey, with fewer than 15% of caregivers reporting socio-cultural barriers or fear/mistrust as their main reason for not vaccinating.24 The endline report will report similar summary statistics that confirm whether attitudes have changed in the treatment group relative to the control group and whether a similar pattern holds in Jigawa state, which is part of the RCT but was not covered by the baseline survey. However, since there did not appear to be much scope for change in this factor at baseline, GiveWell believes that the endline is unlikely to be a major update.
Planned adjustment: None at present. If New Incentives expands into a new area, GiveWell is likely to assess whether acceptability is likely to be a barrier to vaccination there.

Hypothesis 4 - delivery through routine immunization (RI) services: Because New Incentives' program offers incentives for vaccinations only through Nigeria's RI services via health facilities and health facility outreach, a high percentage of infant vaccinations must occur through these sources in order for the program to be impactful.

Planned analysis: The baseline survey confirmed that clinics are the main source of vaccinations in the areas of North West Nigeria where New Incentives works (87.6% of vaccinations) and that clinic outreach services are the second most popular source (8% of vaccinations).25 New Incentives' delivery approach appears to be appropriate for the North West Nigerian context.
Planned adjustment: None at present. If New Incentives expands into a new area, GiveWell is likely to assess the delivery model for vaccinations in that context. This factor may also interact with catchment security status, which we address separately below.

Hypothesis 5 - geographic access: Caregivers' ease of access to clinics as the main site of infant vaccinations predicts program success. Some clinic catchment areas are more geographically spread out than others. Also, the cash incentive may not be enough to defray travel costs if the clinic is difficult to access.

Planned analysis: The map verification exercise for endline collected GPS data on the location of settlements within each catchment, and the endline survey will collect GPS records of the location of each household.26 At endline, IDinsight will run a regression of vaccination status on treatment status interacted with the distance from either the household to its catchment clinic or the household's settlement to its catchment clinic to assess if any increase in vaccination uptake is concentrated close to the clinic, and if coverage has increased at all far from the clinic.
Planned adjustment: None at present. GiveWell will interpret these estimates cautiously, since they are not necessarily causal. From the baseline survey, we know that catchment security status is somewhat confounded with geographic spread of clinics: Zamfara, the state which had the worst baseline security situation and where security has worsened over time, also has more spread-out clinics.27 Security concerns may worsen access to clinics apart from considerations of distance. However, if this analysis shows strong results, GiveWell may adjust the external validity adjustment to take into account any future changes in access and geographic spread around the clinics where New Incentives operates. RI services in which New Incentives participates sometimes include outreach posts and door-to-door delivery, which may increase access.28 GiveWell is likely to take into account changes in the share of RI vaccinations given via outreach services in the external validity adjustment as an indicator of access for remote settlements, tracked via New Incentives' monitoring data.

Hypothesis 6 - supply-side issues: Limiting supply-side issues with vaccine stock-outs or shortages of medical staff is important to program success. People who seek out the incentives for vaccination need to be able to receive them.

Planned analysis: Baseline data suggested that supply-side issues were not a major barrier to vaccination, but they were present and may have become more prevalent since baseline if a higher volume of children are now attending clinics for vaccination in treatment areas.29 The endline survey will ask caregivers whether they went to a health facility intending to get a child vaccinated but were not able to do so. The endline report will present means of this variable stratified by treatment and control areas. If significantly more caregivers report being unable to get vaccinations due to supply-side issues, GiveWell may view this as evidence that supply-side constraints limited New Incentives' impact. New Incentives also collects monitoring data on vaccine stock-outs, which is limited to treatment areas, and GiveWell may compare the future volume of recorded stock-outs to the volume during the evaluation period.
Planned adjustment: If the endline survey identifies a trend of stock-outs being significantly more likely to be reported in treatment areas than control areas at endline, or if they have become more prevalent since baseline, GiveWell may negatively update its external validity estimate for New Incentives if the program is expected to scale up further within the same region, which may be even more taxing for the existing health system. GiveWell will consider these stock-out results in conjunction with the analysis of treatment spillovers because, if there is strong evidence that spillovers occurred, expanding the program to control clinics may alleviate supply constraints on treatment clinics and increase program impact. Additionally, if there were evidence in the future that supply-side constraints were loosening, perhaps due to New Incentives' technical assistance with planning the vaccine supply, GiveWell may positively update its external validity adjustment. Changes in stock-outs relative to the evaluation period from New Incentives' monitoring data may serve as evidence of supply-side constraints tightening or loosening in the future.
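The treatment-versus-control comparison of caregivers who report being unable to get a vaccination could be sketched, under simplifying assumptions, as a two-proportion z-test. The counts below are hypothetical and the function name is ours. The sketch treats caregivers as independent observations, which ignores the clinic-level randomization of this RCT, so a real analysis would presumably need cluster-adjusted standard errors.

```python
import math

def compare_proportions(x_t, n_t, x_c, n_c):
    """Difference in proportions (treatment minus control) with a pooled
    two-proportion z statistic and a two-sided normal p-value.
    Assumes independent observations, so it does not account for clustering."""
    p_t, p_c = x_t / n_t, x_c / n_c
    pooled = (x_t + x_c) / (n_t + n_c)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))
    z = (p_t - p_c) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return p_t - p_c, z, p_value

# Hypothetical counts: caregivers reporting a failed vaccination attempt.
diff, z, p_value = compare_proportions(x_t=150, n_t=1000, x_c=120, n_c=1000)
print(f"difference = {diff:.3f}, z = {z:.2f}, p = {p_value:.3f}")
```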

Hypothesis 7 - program outreach capacity and intensity: The degree of program outreach and cooperation from local leaders in attempts to reach caregivers with basic information about vaccination and New Incentives' program may be important to program success, in addition to providing cash incentives.30

Planned analysis: GiveWell will review survey information on the proportion of caregivers who report being aware that a cash incentive is available in exchange for getting their child vaccinated in their clinic catchment, and whether the proportion differs between treatment and control areas as a benchmark for outreach performance during the study period. GiveWell understands that New Incentives may expand its messaging to include more types of community outreach in the event of a positive top charity decision post-RCT, when the program is no longer constrained by avoiding advertisement in control communities. We will ask New Incentives about its plans to change its outreach after the RCT.
Planned adjustment: If New Incentives plans to change its outreach, then GiveWell may consider the intensity of New Incentives' future community outreach activities relative to the RCT period in the external validity adjustment.

Hypothesis 8 - catchment security: Security concerns may limit travel to clinics and reduce program impact.

Planned analysis: This RCT stratified randomization on baseline catchment security status before the RCT began.31 Security status may have changed over time in non-random ways, but this stratification ensures a comparable group of treatment and control group clinics for each baseline security status. In the endline report, IDinsight will present results from a regression of the average number of vaccines received on treatment status interacted with indicators for baseline clinic security status and all other covariates from our primary outcome regressions.32 During the evaluation period, risks may have materialized in some high-risk areas but not in others. If it appears that heightened baseline security risk is associated with significantly reduced program impact, we may probe further into an analysis of catchments with severe security incidents over the course of the study.
Planned adjustment: If a negative correlation between heightened security risk and program impact exists, GiveWell is likely to positively or negatively update the external validity adjustment in response to a substantial improvement or deterioration in the security situation, respectively. Also, security concerns are likely to prevent surveying in the riskiest settlements at endline. If security status is strongly correlated with program impact, GiveWell is likely to assume that impact was smaller than average in areas that could not be surveyed, and negatively update the external validity adjustment accordingly.
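As a rough illustration of the security-status analysis, the sketch below computes treatment-minus-control differences in mean vaccines received within each baseline security stratum. Without additional covariates, these within-stratum differences are the stratum-specific effects that a regression fully interacting treatment with stratum indicators would imply. The records, outcome values, and function name are hypothetical, and the sketch omits the other covariates that the planned regression includes.

```python
from collections import defaultdict
from statistics import mean

def effects_by_stratum(records):
    """Return {stratum: mean outcome in treatment - mean outcome in control}.
    Each record is (stratum, treated, outcome) with treated in {0, 1}."""
    groups = defaultdict(lambda: {0: [], 1: []})
    for stratum, treated, outcome in records:
        groups[stratum][treated].append(outcome)
    return {s: mean(g[1]) - mean(g[0]) for s, g in groups.items()}

# Hypothetical records: (baseline security status, treated, vaccines received).
records = [
    ("No Security Issues", 1, 6), ("No Security Issues", 1, 8),
    ("No Security Issues", 0, 3), ("No Security Issues", 0, 5),
    ("Serious Security Issues", 1, 4), ("Serious Security Issues", 1, 4),
    ("Serious Security Issues", 0, 3), ("Serious Security Issues", 0, 3),
]

effects = effects_by_stratum(records)
for stratum, effect in effects.items():
    print(f"{stratum}: estimated effect = {effect}")
```

In this made-up data the estimated effect is smaller in the less secure stratum, which is the kind of pattern that would prompt the further analysis described above.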

Hypothesis 9 - New Incentives’ program implementation: The quality of program implementation may change post-RCT after program impact is no longer being measured. GiveWell will compare the quality of New Incentives' work in the RCT and post-RCT periods through the monitoring data that New Incentives collects (details to be determined).

Hypothesis 10 - Vaccine effectiveness: The effectiveness of incentivized vaccines at inducing immunity may vary by different contexts. We have not yet determined if assessing this is feasible.

Hypothesis 11 - Disease environment: GiveWell also expects context-specific mortality rates from vaccine-preventable diseases to be an important aspect of external validity for this program. We expect to adjust for this component separately from the program's external validity adjustment using external data on context-specific mortality rates. Other GiveWell cost-effectiveness analyses currently use data from IHME's Global Burden of Disease project to adjust for context-specific mortality.

Hypothesis 12 - Disease spillovers: Immunizing a greater proportion of the population against certain diseases may reduce the prevalence of those diseases in unvaccinated populations, and the magnitude of this effect probably varies with different levels of vaccine coverage. GiveWell may conduct additional research on this question and may add an explicit adjustment for this factor to the cost-effectiveness model. However, herd immunity seems unlikely unless the program achieves very high vaccine coverage.

Hypothesis 13 - Introduction of additional vaccines: As discussed in GiveWell's annex in the pre-analysis plan for this trial,33 several new vaccines may be introduced in North West Nigeria in the near future. GiveWell may extrapolate from the results of this trial to evaluate the cost-effectiveness of incentivizing additional vaccines that become available in Nigeria or are available in other contexts where New Incentives may work in the future.

Hypothesis 14 - Presence of competing vaccination campaigns: GiveWell may negatively update on external validity, even if the program continues operating in the same areas, if there is a scale-up in external programming supporting infant vaccination. GiveWell is currently aware of several programs that have the potential to compete with New Incentives: a World Bank CCT program pilot that may incentivize vaccinations in some program areas in the future, and several government-sponsored Supplementary Immunization Activities (SIAs), some of which do not qualify for New Incentives' program depending on the immunizations offered and the age of infants targeted.34 SIAs sometimes use volunteer community mobilizers (VCMs) or community health extension workers (CHEWs) to deliver infant vaccinations door-to-door. VCMs are only qualified to deliver oral vaccines. GiveWell's current understanding is that measles 1 is the only directly incentivized vaccine that faced competition from SIA programs during the study period.35

Hypothesis 15 - Inflation-adjusted transfer size: The value of the cash incentives New Incentives offers to program beneficiaries may change over time with the inflation rate, which could affect program uptake rates relative to the RCT period. GiveWell will consider any changes over time in inflation-adjusted transfer amounts and may negatively update the external validity adjustment if transfers fall substantially, particularly if they fall below the inflation-adjusted travel and clinic fees that caregivers reported paying for vaccination visits in the endline survey.
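The inflation comparison for this hypothesis can be made concrete with a short worked sketch. It deflates a nominal transfer to endline-period prices using a price index and checks whether the result still covers the costs caregivers reported at endline. The function names, the index values, and all amounts are hypothetical placeholders, not New Incentives' actual transfer sizes or survey results.

```python
def real_value(nominal, cpi_now, cpi_base):
    """Express a nominal amount paid at today's prices in base-period prices."""
    return nominal * cpi_base / cpi_now

def transfer_covers_costs(nominal_transfer, cpi_now, cpi_base, endline_costs):
    """True if the transfer, deflated to endline prices, is at least as large
    as the travel and clinic costs reported at endline (also in endline prices)."""
    return real_value(nominal_transfer, cpi_now, cpi_base) >= endline_costs

# Hypothetical inputs, all in the same currency units.
NOMINAL_TRANSFER = 500   # unchanged nominal transfer per visit (made up)
CPI_BASE = 100           # price index during the endline survey (made up)
CPI_NOW = 125            # price index at a later date, i.e. 25% inflation (made up)
ENDLINE_COSTS = 300      # reported travel plus clinic fees at endline (made up)

real_transfer = real_value(NOMINAL_TRANSFER, CPI_NOW, CPI_BASE)
still_covers = transfer_covers_costs(NOMINAL_TRANSFER, CPI_NOW, CPI_BASE, ENDLINE_COSTS)
print("real transfer in endline prices:", real_transfer)
print("transfer still covers reported costs:", still_covers)
```

With these placeholder numbers, 25% cumulative inflation reduces the real value of the transfer by a fifth, but it remains above the reported costs. A larger price increase or an unchanged nominal transfer over a longer period could reverse that.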

Limitations

GiveWell understands that the supplementary data analyses it plans to consider do not necessarily represent causal evidence on program mechanisms. Many of these analyses are imperfect and rely on data that were feasible to collect rather than strong evidence.

GiveWell may not complete all of the analyses outlined here, particularly if the program does not appear cost-effective using the main results. In each case, it will depend on whether the specified factor is likely to be decision-relevant for New Incentives' top charity decision or funding allocation. If GiveWell adjusts the main results using endline data analyses not specified on this page or in the full pre-analysis plan, it will write publicly about its reasoning for doing so.

Sources

Document	Source
Banerjee et al. 2010	Source
IDinsight, Endline Design Document Appendix	Unpublished
IDinsight, New Incentives Endline Pre-analysis Plan 2019	Source (archive)
IDinsight, New Incentives Evaluation Baseline Report 2019	Source
Innovations for Poverty Action, PPI for Nigeria	Source (archive)
Kusuma et al. 2017	Source (archive)
New Incentives, RCT Analysis List	Unpublished
New Incentives, RCT Endline Routine Immunization Survey questionnaire	Unpublished
New Incentives, Responses to 13-Feb-2020 Questions from GiveWell	Unpublished
Sato and Fintan 2019	Source (archive)
World Bank Project Appraisal Document, National Social Safety Nets Project	Source

1
The pre-analysis plan for the New Incentives RCT is published on three different registry databases. There is no meaningful difference in content among the three.
- US government database (clinicaltrials.gov)
- WHO database (ISRCTN)
- 3ie database (RIDIE)
2
The Banerjee et al. 2010 RCT of immunization incentives in India also considered administrative records, BCG scars, and immunization card records: "Self report can be influenced by recall bias (mothers, who are often illiterate, might not remember) and potentially by social desirability bias, which might be affected by their intervention group. We carried out several validation exercises in which we compared the self reports with the BCG scar, the immunisation card (available for 343 children), and a sample of children from intervention villages. The immunisation camp records were matched with the survey data. BCG self report seemed to be accurate (see appendix 1 on bmj.com). Immunisation status elicited from the survey instrument corresponded to within one injection of the status indicated on the card 79% of the time and to within one injection of the status indicated in the logbook 74% of the time. The mis-measurement was not correlated with the treatment status or the number of immunisations reported and was not systematically over-reported or under-reported and should therefore increase noise but not necessarily introduce bias (as measurement error is the dependent variable and is not differential in different treatment groups). While self reports of immunisation from mothers who do not have an immunisation card is not perfect, a meta-analysis of several validation studies has shown that they are generally of acceptable quality and that they represent 'the best available independent measure of DPT3 coverage.'" Banerjee et al. 2010, Pg 3.
3
"Our survey found that routine immunization coverage across Katsina and Zamfara is low. A third of 12 to 16-month olds (33.6%, 95% confidence interval (CI): 32.2%, 35.0%) have received at least one injectable vaccine… and only 10.2% (95% CI: 9.1%, 10.9%) of 12 to 24-month olds are fully immunized..." IDinsight, New Incentives Evaluation Baseline Report 2019, Pg 30
4
“As New Incentives’ program checks child health cards and child immunization registers and encourages officials to supply them, we expect the program will improve the degree to which clinic staff record vaccinations on the cards and on the register, as well as the degree to which caregivers retain child health cards.” IDinsight, New Incentives Endline Pre-analysis Plan 2019, Pg 13.
5
“Step 2: Paperwork
Caregivers are called sequentially according to their numbers, and a nurse will complete paperwork:
1. Fill out the clinic child health register. This contains a child’s complete vaccination history, phone number, and follow-up address.
2. Fill out the infant’s child health card, which the caregiver is supposed to keep at home between visits. If this has been lost, the caregiver is issued a new one using the information in the child health register, or a duplicate card kept at the clinic.
3. Tally vaccine doses on a tally sheet, which is aggregated through the local government area and state administrative areas to determine coverage rates.” IDinsight, New Incentives Evaluation Baseline Report 2019, Pg 74.
6
“As New Incentives’ program checks child health cards and child immunization registers and encourages officials to supply them, we expect the program will improve the degree to which clinic staff record vaccinations on the cards and on the register, as well as the degree to which caregivers retain child health cards.” IDinsight, New Incentives Endline Pre-analysis Plan 2019, Pg 13.
7
- “IDinsight will collect either Tally Sheets or Monthly Immunization Summaries, depending on the workload of all clinic-enumerator tasks, as determined during field practice prior to data collection. Tally Sheets are – we think – a slightly more accurate source, though our Midline analysis showed them to be quite consistent with Monthly Immunization Summaries, which are much easier to collect.” IDinsight, New Incentives Endline Pre-analysis Plan 2019, Footnote 20, Pg 6.
- “IDinsight collected these records as part of midline data collection in March/April 2019. We found a robust positive impact across vaccinations but these results were for a time period ending part-way through the RCT window so follow-on collection is necessary. These administrative records are imperfect, and we worry that the program itself may cause them to improve, leading to differential data quality in treatment and control. However, they remain an important alternative source that we expect to show results at least qualitatively similar to those found with self reports.” IDinsight, New Incentives Endline Pre-analysis Plan 2019, Footnote 21, Pg 6.
8
“During the baseline, around 29% of infants whose administrative records indicated BCG had no scar and 33% of infants who reported receiving BCG had no scar. The issue is more likely with the accuracy of the records than the enumerators. The opposite is not true for those with scars: 93% had BCG recorded on their card if they had a card, and 95% of these respondents reported BCG. In a study of 70 vaccinated infants at a hospital in India, Dhanawade (2015), found 91% of infants had scars indicating a scar failure rate of 9%.” IDinsight, New Incentives Endline Pre-analysis Plan 2019, Footnote 61, Pg 16.
9
See IDinsight, New Incentives Evaluation Baseline Report 2019, Pg 15. Table 1.
10
In addition, GiveWell has participated in funding another RCT of immunization incentives in Haryana, India. GiveWell expects a working paper for this project to be available soon, and it may amend the prior to incorporate these results if they are released in time. More information about this project is available here.
11
- See GiveWell’s New Incentives prior calculation, Column “Increase in vaccine take-up (percentage points),” Row “Prior for New Incentives.”
- One of the trials included in this prior, Banerjee et al. 2010, calculated outcomes based on increases in full infant immunization rather than individual vaccine uptake, so this input is likely to underestimate this intervention's impact on each individual component vaccine. This may imply that the Banerjee et al. input should be higher when interpreted as an increase in uptake for an individual infant vaccination, and that as a result, GiveWell's prior is slightly too low. However, GiveWell believes that differences between the Banerjee et al. intervention and New Incentives' program are likely to lead the Banerjee results to be higher than those for New Incentives, balancing out concerns of overestimation. This is because Banerjee et al. also expanded access to vaccination in treatment areas via regular mobile immunization camps (a second treatment group received only the immunization camps, and the point estimate used for the prior is the difference between the treatment group with incentives plus camps and the treatment groups with only camps). Baseline rates of full immunization were 1% in the Banerjee population, compared to 10% in New Incentives' target population, implying less access to vaccinations at baseline and a large scope for impact.
  - See Banerjee et al. 2010, Table 1, Pg 5, Row “Completely immunized.” The average of all three columns is 1%.
  - Though baseline rates are low among New Incentives’ target population, they are higher than among those studied by Banerjee et al.: “Our survey found that routine immunization coverage across Katsina and Zamfara is low. A third of 12 to 16-month olds (33.6%, 95% confidence interval (CI): 32.2%, 35.0%) have received at least one injectable vaccine (Table 2a) and only 10.2% (95% CI: 9.1%, 10.9%) of 12 to 24-month olds are fully immunized (loose definition). The coverage rate for all three doses of PENTA is only half of the rate for any PENTA.” IDinsight, New Incentives Evaluation Baseline Report 2019, Pg 30.
12
A limitation of this method is that the prior may also be subject to self-report biases, though the individual studies also completed various degrees of robustness checks on the accuracy of self-reports.
- “Our interviewers asked each woman about her vaccination status and also inspected, when available, the antenatal care card with the vaccine record to determine the status of tetanus vaccine uptake.” Sato and Fintan 2019, Pg 3.
- “In terms of data collection, the questionnaire combines two sources of child immunization status: health card record (KMS/KIA) and mother’s response on whether a child was immunized. We use the former in our main analysis for more reliability and provide the latter as robustness checks in the Appendix.” Kusuma et al. 2017, Pg 3.
- “We carried out several validation exercises in which we compared the self reports with the BCG scar, the immunisation card (available for 343 children), and a sample of children from intervention villages.” Banerjee et al. 2010, Pg 3.
13
- New Incentives, RCT Endline Routine Immunization Survey questionnaire (unpublished)
- New Incentives, RCT Analysis List (unpublished)
14
New Incentives, RCT Endline Routine Immunization Survey questionnaire (unpublished)
15
- "Our survey found that routine immunization coverage across Katsina and Zamfara is low. A third of 12 to 16-month olds (33.6%, 95% confidence interval (CI): 32.2%, 35.0%) have received at least one injectable vaccine (Table 2a) and only 10.2% (95% CI: 9.1%, 10.9%) of 12 to 24-month olds are fully immunized (loose definition). The coverage rate for all three doses of PENTA is only half of the rate for any PENTA." IDinsight, New Incentives Evaluation Baseline Report 2019, Pg 30.
- "Using the $1.25 /day poverty line (2005 purchasing power parity, or PPP), 52.6% of the sample falls below the poverty line. On average, our sample is relatively poorer than both Nigeria as a whole as well as the North West Nigeria region." IDinsight, New Incentives Evaluation Baseline Report 2019, Pg 35.
- "Socio-cultural considerations such as lack of permission from the husband or religious reasons were cited as the main reason for skipping vaccinations by only 7.3% of caregivers, suggesting that the majority of respondents are not opposed to vaccinations for socio-cultural reasons – or at least do not say so explicitly. A final category of interest was the mistrust or fear of vaccination, which was the most infrequently cited reason for not vaccinating children – only 5.5% cited medical reasons such as fears of side effects, bad reactions to previous vaccinations, or a fear of needles." IDinsight, New Incentives Evaluation Baseline Report 2019, Pg 50.
16
New Incentives, Responses to 13-Feb-2020 Questions from GiveWell (unpublished)
17
To the extent that was possible. GiveWell is aware that New Incentives may place program posters in treatment clinics, and some enumerators may have seen these advertisements and become aware that they were in a treatment area.
18
- IDinsight, conversation with GiveWell, February 5, 2020.
- “Approximately 10% of interviews from each round of data collection are back-checked to ensure data quality.” IDinsight, New Incentives Endline Pre-analysis Plan 2019, Pg 8.
- "We expect that each enumerator will have been back-checked at least once for household listings by the time data collection is completed in the first clinic (or equivalently by data collection day 5-6).
  We expect that each enumerator will have been back-checked at least once for Routine Immunization surveys by the time data collection is completed in the second clinic (or equivalently data collection day 10)." IDinsight, Endline Design Document Appendix (Unpublished), Pg 6.
- IDinsight, email to GiveWell, January 24, 2020.
19
“Our survey found that routine immunization coverage across Katsina and Zamfara is low. A third of 12 to 16-month olds (33.6%, 95% confidence interval (CI): 32.2%, 35.0%) have received at least one injectable vaccine (Table 2a) and only 10.2% (95% CI: 9.1%, 10.9%) of 12 to 24-month olds are fully immunized (loose definition). The coverage rate for all three doses of PENTA is only half of the rate for any PENTA.” IDinsight, New Incentives Evaluation Baseline Report 2019, Pg 30.
20
"Higher socioeconomic status, both self-reported and the calculated PPI, were indicators of higher coverage rates. To better compare self-reported wealth with the PPI quintiles, we combined rungs 4-6 into one category due to smaller sample sizes. A higher socioeconomic status represents greater resources and possibly time to commute to a nearby clinic to vaccinate a child." IDinsight, New Incentives Evaluation Baseline Report 2019, Pg 40.
21
"Using the $1.25 /day poverty line (2005 purchasing power parity, or PPP), 52.6% of the sample falls below the poverty line. On average, our sample is relatively poorer than both Nigeria as a whole as well as the North West Nigeria region." IDinsight, New Incentives Evaluation Baseline Report 2019, Pg 35.
22
As defined using Innovations for Poverty Action's Poverty Probability Index for Nigeria, calculated based on household assets following Schreiner, M. 2015. “Simple Poverty Scorecard® Nigeria.” Innovations for Poverty Action, PPI for Nigeria
23
New Incentives, RCT Analysis List (unpublished)
24
"Socio-cultural considerations such as lack of permission from the husband or religious reasons were cited as the main reason for skipping vaccinations by only 7.3% of caregivers, suggesting that the majority of respondents are not opposed to vaccinations for socio-cultural reasons – or at least do not say so explicitly. A final category of interest was the mistrust or fear of vaccination, which was the most infrequently cited reason for not vaccinating children – only 5.5% cited medical reasons such as fears of side effects, bad reactions to previous vaccinations, or a fear of needles." IDinsight, New Incentives Evaluation Baseline Report 2019, Pg 50.
25
"The majority of vaccinations take place at health facilities (87.6%). The second most common location for vaccination was during health facilities’ community outreach activities, although only 8.0% of reported vaccinations took place there. The remaining 4.4% of vaccinations took place either at home, during community campaigns, or at miscellaneous “other” locations." IDinsight, New Incentives Evaluation Baseline Report 2019, Pg 41.
26
- “In February/March 2018 and July/August 2019, IDinsight conducted “map verification” across the study area. This activity provided us with the following information: ...Global positioning system (GPS) coordinates for all study settlements.” IDinsight, New Incentives Endline Pre-analysis Plan 2019, Pg 11.
- "Plan for Endline: Collect the GPS location of every household that was surveyed." IDinsight, Endline Design Document Appendix (Unpublished), Pg 14-15.
27
"[I]n Zamfara, 15.2% of vaccinations took place during outreach activities compared to 5.0% in Katsina, which may be driven by a host of factors, including security (Zamfara is more insecure) or the geographic spread of clinics (which is greater in Zamfara).

Vaccination locations differed by the security status of the clinic (as defined by New Incentives), shown in Figure 10. Outreach is a more common source of vaccinations in areas assessed as having ‘Some Security Issues’ and ‘Serious Security Issues’ compared to areas with ‘No Security Issues’. Despite this, outreach accounts for a minimal number of vaccinations in the least secure catchments, with 97.5% of vaccinations in ‘No Go Zones’ taking place at health facilities.

It is difficult to interpret these results, although they may suggest that while community campaigns and outreach can still take place in areas with serious security issues, this is simply not possible for areas classified as No Go Zones. In No Go Zones therefore, health facilities have an even greater importance compared to relatively safe areas." IDinsight, New Incentives Evaluation Baseline Report 2019, Pg 42.
28
"Incentives can be received in the community as part of regular vaccination outreach organized by the clinic, when nurses go to the villages to administer vaccines." IDinsight, New Incentives Evaluation Baseline Report 2019, Pg 15.
29
- "Stock-outs and unreliability of clinics (3.7%) and poor customer service by clinic staff (0.5%) do not seem to be primary contributors to respondents’ lack of motivation to vaccinate their child. Elsewhere in the survey, 12.4% of all caregivers reported that they went to a health facility intending to get a child vaccinated but were not able to do so, mostly due to supply-side issues. Such supply-side pressures may become more important barriers to vaccination if a higher volume of children attend clinics for vaccinations due to the New Incentives’ program." IDinsight, New Incentives Evaluation Baseline Report 2019, Pg 50.
- "Nineteen percent of caregivers whose children ever received an injectable vaccination reported going to the facility with the intention of vaccinating their child and not being able to do so. Additionally, 9% of caregivers whose children never received an injectable vaccination attempted to vaccinate and failed. While the single most common issue was a lack of vaccine (35%), responses indicating confusion on when caregivers could access vaccination services were also prevalent (36%). These responses include the clinic was closed, the vaccinator was not there, or that the visit was not on a vaccination day." IDinsight, New Incentives Evaluation Baseline Report 2019, Pg 70.
30
"Lack of knowledge and ambivalence were the two most common reasons caregivers cited for not vaccinating their child, accounting for 53% and 11% of all caregivers who do not self-report fully vaccinating their child. Ensuring caregivers have basic information about the routine immunization could be an “easy win” in terms of increasing coverage. For example, many caregivers think that vaccinators will come to their doors as in the polio campaigns, and others think no vaccinations are necessary beyond the regular OPV doses children receive through campaigns. Some of this lack of knowledge may stem from the fact that many families live outside the clinic system with a quarter of children having never been to a clinic and only 8.6% born in facilities. Consequently, health workers have few opportunities to explain immunization. While social and cultural reasons for not vaccinating do affect a number of caregivers, these issues can be difficult to address directly…. [B]ased on the baseline, it seems that for many caregivers, knowledge alone is the biggest barrier." IDinsight, New Incentives Evaluation Baseline Report 2019, Pg 68.
31
“The primary use of the baseline data was to facilitate randomizing clinics into balanced treatment and control groups. IDinsight grouped clinics based on coverage, security status, remoteness/staffing, and state, and then randomly allocated half of the clinics from each group, or stratum, to the treatment arm.” IDinsight, New Incentives Endline Pre-analysis Plan 2019, Pg 7.
32
New Incentives, RCT Analysis List (unpublished)
33
See IDinsight, New Incentives Endline Pre-analysis Plan 2019, Pg 34.
34
- Regarding the World Bank Conditional Cash Transfer program:
  - "The project has developed a menu of co-responsibilities around health and nutrition, education, environment and productivity (box 1). Each state will, depending on their conditions and priorities, choose their co-responsibility area.” Pg 13.
  - “Box 1. Menu of Co-responsibilities” includes “Immunization of children.” Pg 13.
  - World Bank Project Appraisal Document, National Social Safety Nets Project
- New Incentives, Responses to 13-Feb-2020 Questions from GiveWell (unpublished)
35
This program currently operates in some clinic catchments, and we will attempt to control for it in the regressions producing impact estimates for this RCT.