# Schistosomiasis Blogs

Exploring how to get real change for your dollar.

### Allocation of discretionary funds from Q1 2018

Mon, 06/04/2018 - 14:46

In the first quarter of 2018, we received $2.96 million in funding for making grants at our discretion. In this post we discuss:

• The decision to allocate the $2.96 million to the Against Malaria Foundation (AMF) (70 percent) and the Schistosomiasis Control Initiative (SCI) (30 percent).
• Our recommendation that donors give to GiveWell for granting to top charities at our discretion so that we can direct the funding to the top charity or charities with the most pressing funding need. For donors who prefer to give directly to our top charities, we continue to recommend giving 70 percent of your donation to AMF and 30 percent to SCI to maximize your impact.

Allocation of discretionary funds

The allocation of 70 percent of the funds to AMF and 30 percent to SCI follows the recommendation we have made, and continue to make, to donors. For more discussion on this allocation, see our blog post about allocating discretionary funds from the previous quarter.
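The 70/30 split is simple arithmetic. As a minimal sketch (the function name `split_funds` is hypothetical, and the $2.96 million total is itself a rounded figure, so the resulting amounts are approximate), it can be computed in whole dollars:

```python
def split_funds(total_dollars, shares):
    """Split a whole-dollar total by integer percentage shares (must sum to 100)."""
    if sum(shares.values()) != 100:
        raise ValueError("shares must sum to 100 percent")
    # Integer arithmetic avoids floating-point rounding drift.
    return {name: total_dollars * pct // 100 for name, pct in shares.items()}


# Rounded Q1 2018 total from the post; the 70/30 shares are the stated allocation.
q1_2018 = split_funds(2_960_000, {"AMF": 70, "SCI": 30})
print(q1_2018)
```

On these rounded inputs, roughly $2.07 million goes to AMF and roughly $0.89 million to SCI.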

We also considered the following possibilities for this quarter:

Helen Keller International (HKI) for stopgap funding in one additional country

We discussed this possibility in our blog post about allocating discretionary funds from the previous quarter. After further discussing this possibility with HKI, our understanding is that (a) the amount of funding needed to fill this gap will likely be small relative to the amount of GiveWell-directed funding that HKI currently holds, and (b) we will have limited additional information in time for this decision round that we could use to compare this new use of funding to HKI’s other planned uses of funding. We will continue discussing this opportunity with HKI and may allocate funding to it in the future. Our current expectation is that we will ask HKI to make the tradeoff between allocating the GiveWell-directed funding it holds to this new opportunity and continuing to hold the funds. Holding the funds gives the current programs more runway (originally designed to fund three years) and gives HKI more flexibility to fund highly cost-effective, unanticipated opportunities in the future. We believe that HKI is currently in a better position to assess cost-effectiveness of the opportunities it has than we are, while we will seek to maximize cost-effectiveness in the longer run by assessing HKI’s track record of cost-effectiveness and comparing that to the cost-effectiveness of other top charities.

We remain open to the possibility that HKI will share information with us that will lead us to conclude that this new opportunity is a better use of funds than our current recommendation of 70 percent to AMF and 30 percent to SCI. In that case, we would allocate funds from the next quarter to fill this funding gap (and could accelerate the timeline on that decision if it were helpful to HKI).

Evidence Action’s Deworm the World Initiative for funding gaps in India and Nigeria

We spoke with Deworm the World about two new funding gaps it has due to unexpected costs in its existing programs in India and Nigeria.

In India, the cost overruns total $166,000. Deworm the World has the option of drawing down a reserve of $5.5 million (from funds donated on GiveWell’s recommendation). The reserve was intended to backstop funds that were expected but not fully confirmed from another funder. Given the small size of the gap relative to the available reserves, our preference is for Deworm the World to use that funding and for us to consider recommending further reserves as part of our end-of-year review of our top charities’ room for more funding.

In Nigeria, there is a funding gap of $1.7 million in the states that Deworm the World is currently operating in. Previous budgets assumed annual treatment for all children, and Deworm the World has since become aware of the existence of areas where worm prevalence is high enough that twice per year treatment is recommended. Our best guess is that AMF and SCI are more cost-effective than Deworm the World’s Nigeria program (see discussion in this post). It is possible that because additional funding would go to support additional treatments in states where programs already operate, the cost to deliver these marginal treatments would be lower. We don’t currently have enough data to analyze whether that would significantly change the cost-effectiveness in this case.

Deworm the World also continues to have a funding gap for expansion to other states in Nigeria. We wrote about this opportunity in our previous post on allocating discretionary funding.

Malaria Consortium for seasonal malaria chemoprevention (SMC)

We continue to see a case for directing additional funding to Malaria Consortium for SMC, as we did last quarter. Our views on this program have not changed. For further discussion, see our previous post on allocating discretionary funding.

What is our recommendation to donors?

We continue to recommend that donors give to GiveWell for granting to top charities at our discretion so that we can direct the funding to the top charity or charities with the most pressing funding need. For donors who prefer to give directly to our top charities, we are continuing to recommend giving 70 percent of your donation to AMF and 30 percent to SCI to maximize your impact. The reasons for this recommendation are the same as in our previous post on allocating discretionary funding.

The post Allocation of discretionary funds from Q1 2018 appeared first on The GiveWell Blog.

### Allocation of discretionary funds from Q4 2017

Fri, 04/06/2018 - 12:02

In the fourth quarter of 2017, we received $5.6 million in funding for making grants at our discretion. In this post we discuss:

• The decision to allocate the $5.6 million to the Schistosomiasis Control Initiative (SCI).
• Our recommendation that donors give to GiveWell for granting to top charities at our discretion so that we can direct the funding to the top charity or charities with the most pressing funding need. For donors who prefer to give directly to our top charities, we continue to recommend giving 70 percent of your donation to AMF and 30 percent to SCI to maximize your impact.

We noted in November that we would use funds received for making grants at our discretion to fill the next highest priority funding gaps among our top charities. We also noted that our best guess at the time was that we would give 70 percent to the Against Malaria Foundation (AMF) and 30 percent to SCI.

Based on information received since November, described below, we allocated the $5.6 million to SCI, rather than dividing these funds between AMF and SCI, as previously expected. GiveWell’s Executive Director, Elie Hassenfeld, the fund advisor on the Effective Altruism Fund for Global Health and Development, also recommended that the fund grant out the $1.5 million that it held to SCI.

Update on AMF

AMF has been somewhat slower to make commitments to fund distributions of insecticide-treated nets than we expected and our best guess is that its currently available funding will be sufficient to fund all distributions that it is likely to commit to before our next major round of funding allocations in November. Notwithstanding that fact, we continue to believe that AMF has room for more funding. Additional funds would reduce the risk that AMF’s progress will be slowed if it is able to sign several major agreements in the next few months, which, while somewhat unlikely in our estimation, remains a possibility.

We wrote in November 2017:

Progress at signing new agreements was slow in 2017, leaving AMF with a large amount of funds on hand.
We attribute this to the fact that countries spent much of 2017 applying for Global Fund funding and decisions about how much funding would be allocated to LLIN distributions for 2018-2020 and what the funding gaps would be for LLINs were being finalized in many countries as of October 2017. AMF noted that it did not commit to funding distributions earlier in part because GiveWell had asked AMF not to make funding commitments until the size of funding gaps were known.

Our expectation had been that the last couple months of 2017 and first months of 2018 would be a period in which AMF would commit a significant portion of its available funding to help fill these gaps because we expected countries to have more visibility into their funding gaps following finalization of Global Fund commitments around October 2017. This has not been the case. AMF recently told us that most of the countries that it was in discussions with did not have visibility into their funding gaps until December 2017, and in some cases it has taken longer than that.

In making the decision regarding the fourth quarter discretionary funds, we relied on a document from AMF detailing its signed and potential agreements as of early February. The document noted that AMF had committed to one new distribution since October, in Ghana in 2018. This distribution will cost about $8 million. (We have since learned that AMF has also committed to additional distributions in Papua New Guinea in 2019 and 2020, costing $5.2 million and signed in November 2017, and in Malawi in 2018, costing $10.1 million and signed in mid February.)

AMF’s pipeline of potential future distributions includes both repeat distributions with partners and in countries it has worked with in the past and distributions with new potential partners. AMF has decided to move somewhat slowly with both types of partners. In the case of repeat partners, for several distributions, AMF is waiting to verify that the partner is able to deliver all requested data from distributions that took place in 2017 (and the monitoring that follows each distribution) before agreeing to fund the next round of nets to be delivered in 2020. These decisions seem very reasonable to us, but do result in a short-term decrease in the amount of funding we expect AMF to be able to absorb. When it is ready to do so, AMF could potentially commit up to $50 million to distributions in this category.

For the largest potential new partnership that AMF is considering, there are some concerns about in-country capacity and AMF expects to commit to a smaller-scale distribution (with an estimated cost of $5 million) with the partner and assess the results of that distribution before committing to a larger-scale distribution. AMF is also considering two additional opportunities to commit $5 to $7 million each to distributions with new partners. It could potentially commit tens of millions of dollars to one or more of these countries in future rounds if the initial engagements go well. AMF is also in several early stage conversations about potential distributions with new partners.

According to the document that we relied on for this decision, AMF held $64 million in uncommitted funds, of which $15 million was set aside for “agreement imminent” distributions, leaving $49 million “available to allocate.” Accounting for the additional agreements for Papua New Guinea and Malawi noted above, we estimate that AMF had $49 million in uncommitted funds and $45 million available to allocate as of late February.

The combination of somewhat slower progress in signing distributions than expected and our updated understanding of AMF’s pipeline led us to conclude that AMF continues to have room for more funding, but that SCI’s funding needs were more urgent. Our best guess was that the $5.6 million from GiveWell discretionary funds and $1.5 million from the Effective Altruism Fund would have a greater impact if allocated to SCI.

Update on SCI

In November, we recommended that donors give 30 percent to SCI because SCI had additional room for more funding to sustain its work in its current countries of operation and would need to scale down without additional funding. SCI recently confirmed to us that it would need to cut budgets if it did not receive additional funds before setting its annual budget for April 2018 to March 2019 in March 2018. With AMF having a less urgent funding need than previously expected, we concluded that the best use of the fourth quarter discretionary funds would be to allocate them to SCI. It is also the case that in the last few months of 2017 SCI received less funding than we projected, both from donors influenced by GiveWell’s research and other donors.

We believe that SCI will continue to have room for more funding after the two grants totaling about $7 million. Recently, SCI sent us an early version of a budget for its 2018-19 budget year.
It includes funding requests from each country program, estimates of country program requests in cases where the country has not yet submitted a request, and estimates of SCI spending on central costs and research costs. We estimate that, assuming the same budget in each of the next three years, SCI’s funding gap for that period, after receiving the grants discussed above, is about $9 million. SCI could likely absorb funding beyond that level, as the budget does not include opportunities it has to expand to additional countries. It also assumes that SCI’s other major funders will continue their support at the same level, and some of this funding may be in doubt. We note that about 13 percent of treatments that would be delivered at this scale would be for adults (discussion of this here).

Other possibilities that we decided against

Helen Keller International (HKI) for stopgap funding in one additional country

In December, Good Ventures, on GiveWell’s recommendation, provided HKI with funding for vitamin A supplementation (VAS) programs in Burkina Faso, Mali, and Guinea. Since then, HKI has learned about an unanticipated funding gap for VAS in another country. As a result, a planned VAS distribution in September may not reach national scale and/or may not include deworming (as is common for VAS campaigns). We are in ongoing conversations with HKI about either HKI allocating some of the Good Ventures funding to this country, or GiveWell providing additional funding to cover the gap.

We plan to consider this funding opportunity when allocating discretionary funds from the first quarter of 2018. We expect to hold more than enough in discretionary funds (received in the first quarter of 2018) to fill the potential gap and HKI has told us that more information about the gap will be available in time for that decision.
(We grant out funds from the previous quarter about two months after the end of that quarter, after we have fully checked the accuracy of our data and the size of grants.)

Evidence Action’s Deworm the World for Nigeria

The grant that Good Ventures made to Evidence Action for Deworm the World in December 2017, based on our recommendation, did not include sufficient funds to fund expansion of Deworm the World’s work in Nigeria. Deworm the World sought funding for this work and we prioritized other charities’ funding gaps ahead of this work because we modeled the cost-effectiveness of this work as being lower. We noted in November, “its planned work in Nigeria is around three times as cost-effective as cash transfers (though this estimate is based on low-quality information).” We continue to think that AMF and SCI’s marginal uses of funding are likely more cost-effective than Deworm the World’s potential work in Nigeria, but this conclusion is highly dependent on a model that incorporates many highly uncertain values.

Malaria Consortium for seasonal malaria chemoprevention (SMC)

Our recommendation of Malaria Consortium has resulted in about $30 million in funding for its SMC program since November; however, we believe that there will still be a large funding gap for the program over the next three years. We decided against providing additional funding to Malaria Consortium at this time because of worries about increasing our already very large bet on a program that’s relatively new to us. We are not opposed to increasing this funding level in the future but on balance believe that granting additional funds to SCI is a stronger option at current levels. We’d also note that we’d expect additional funding at this time to go to funding SMC in 2019 and beyond (given the time needed to order drugs and plan programs for the 2018 SMC season) and there is some uncertainty as to the size of the funding gap for SMC in 2019.
The program is in a scale-up phase globally and other major funders may increase their contributions to SMC starting in 2019.

What is our recommendation to donors?

We continue to recommend that donors give to GiveWell for granting to top charities at our discretion so that we can direct the funding to the top charity or charities with the most pressing funding need. For donors who prefer to give directly to our top charities, we are continuing to recommend giving 70 percent of your donation to AMF and 30 percent to SCI to maximize your impact.

As part of the process we went through to decide where to allocate these funds, we also discussed whether we should update our recommendation for donors who prefer to give directly to our top charities. We ultimately decided to update that allocation only once each year (in November) unless we believe our previously recommended allocation is clearly suboptimal. Updating it is a difficult and time-consuming process, given the additional research and internal discussions involved, and, relatively speaking, few dollars follow this recommendation outside of giving season.

In this case, we believe that the $7 million in grants to SCI roughly brings the situation back in line with where it was in November, with AMF and SCI having the next most impactful funding gaps and it being difficult to distinguish on the margin between the quality of AMF and SCI’s funding gaps. SCI has better modeled cost-effectiveness, while AMF appears to be better on several qualitative factors, including monitoring of program performance.

The post Allocation of discretionary funds from Q4 2017 appeared first on The GiveWell Blog.

### Questioning the evidence on hookworm eradication in the American South

Thu, 12/07/2017 - 09:40

Summary

• Four of GiveWell’s top charities support deworming, the mass distribution of medicines to children in poor countries to rid their bodies of schistosomiasis, hookworm, and other parasites.
• GiveWell’s recommendation relies primarily on research from western Kenya finding that deworming in childhood boosted income in adulthood. GiveWell has also placed weight on a study by Hoyt Bleakley of the hookworm eradication effort in the American South 100 years ago.
• I reviewed the Bleakley study and reached a different conclusion than he did: the deworming campaign in the American South did not coincide with breaks in long-term trends that would invite eradication as the explanation.
• GiveWell research staff took the conclusions of this post into account when updating their recommendations for the 2017 giving season. GiveWell continues to recommend deworming charities.
• I also reviewed a separate Bleakley study of the impacts of malaria eradication in the United States, Brazil, Colombia, and Mexico. My reading there is more supportive. I’m finalizing the write-ups now and will share them soon.

Introduction

After the latest refresh, GiveWell’s list of top charities includes four that support deworming, the mass distribution of medicines to children to rid their guts of certain parasites.
Several dozen randomized studies measure the short-term effects of deworming programs (within a year or so) on everything from body weight to being in school. (The 2016 Campbell review finds 52 short-term studies with follow-up duration under five years. Most last one to two years.) If intestinal worms were often fatal, then short-term gains against them might be measured in lives saved, which could on its own make a decisive case for deworming. But the symptoms are normally subtler. On the other hand, some research finds that the aftereffects last into adulthood. This is why the long-term effects of deworming dominate GiveWell’s estimates of the cost-effectiveness of charities that support it. Unfortunately, only a handful of experimental studies assess deworming’s impacts over the long haul, and most of those are based on a single experiment in Kenya. For summaries, see this 2016 post, in the section entitled “The research on the long-term impacts of deworming.”

This paucity of experimental evidence has led GiveWell to place weight on a non-experimental, historical study of deworming. Hoyt Bleakley’s 2007 paper tracks the impacts of the campaign to eradicate hookworm from the American South a century ago.

As part of an ongoing effort to scrutinize the evidence on the long-term impacts of deworming (this, this), GiveWell worked over the past year to revisit the Bleakley study. With huge assists from Christian Smith, Zachary Tausanovitch, and Claire Wang, I have formed a fresh and critical assessment of the evidence. The hookworm eradication effort in the American South did not coincide with breaks in long-term trends that would invite eradication as the explanation.
For example, after the eradication campaign, outcomes such as school attendance indeed rose faster for children in historically worm-endemic areas, which could be taken as a sign of success. But that trend began decades before eradication. The full write-up is in this new working paper.

As John D. Rockefeller, arguably the richest human in history, entered philanthropy just over a century ago, he was persuaded to back large-scale, scientifically informed public health campaigns, not unlike Bill Gates in our era. In 1910, he gave $1 million to create the Rockefeller Sanitary Commission for the Eradication of Hookworm Disease. Across eleven southern states from North Carolina to Texas, the RSC soon launched what today would be called the War on Worms. Drugs were dispensed to treat infected children. Doctors, teachers, and the public were educated about the importance of sanitation, especially the use of proper privies.

From a researcher’s point of view, the suddenness and success of the campaign, and its broad geographic sweep, offer hope for credible impact assessment. If, for example, school attendance rates jumped just as infection rates plunged, that could be a compelling sign of the knock-on effects of mass deworming of children. The Bleakley (2007) study recognizes and exploits this opportunity for impact assessment. Paralleling the modern research out of Kenya, the study finds that after the RSC campaign, children in formerly worm-afflicted areas went to school more (a short-term development) and earned more as adults (a long-term effect).

In this post, I’ll explain how the GiveWell reanalysis of the Bleakley (2007) hookworm research differs from Bleakley’s original. Then I will show you some graphs that tell most of the analytical story.

I have also reviewed the related Bleakley (2010) study of the impacts of malaria eradication in the United States, Brazil, Colombia, and Mexico. There, my conclusion is more positive. I hope to release and blog that review in the next few weeks. Update: done.

What we did

The reanalysis of the Bleakley (2007) hookworm study included the following steps:

• Returning to primary sources to reconstruct the data set. The data and computer code for the study are not publicly available. In correspondence starting a year ago, Hoyt Bleakley stated that they are effectively inaccessible now. Re-gathering the data was a major undertaking because Bleakley culled nearly 50 variables from obscure, century-old books and articles. Some, such as the student-teacher ratio in each county of the eleven southern states, were found in state government reports that varied in completeness and reporting conventions. Christian Smith, Claire Wang, and, especially, Zachary Tausanovitch, poured many hours into this effort.
• Expanding the census data sets. Bleakley (2007) tracks outcomes such as school attendance, literacy, and income using U.S. census data. These come to us not from old books, but from the IPUMS online database. Until recently, all the IPUMS data sets were samples from a given year’s census records, taking, for example, one household from every fifth page of the enumeration. (Here’s a sample page from 1920 with my great-grandparents and family in rows 3–6.) When carrying out this research in 2003–05, Bleakley appears to have used the biggest sets then available, such as the 1-in-250 sample from the 1910 census and the 1-in-100 sample from 1920. No data were then to be had from 1930. The GiveWell reanalysis takes advantage of the newer, bigger samples, including preliminary 100% samples for 1910–40. In aggregate, the new data set is about 100 times larger than that in Bleakley (2007).
• Copying choices from one Bleakley (2007) table or figure to another. For example, one table in the paper estimates impacts on school enrollment, school attendance, and literacy. A corresponding figure, discussed soon, only depicts impacts on attendance. In the new paper, I rerun the figure for all three outcomes.
• Imposing an arguably tougher standard for proof of impact. I concur with Bleakley that after the eradication campaign swept through the South in 1911–14, prospects improved disproportionately for children born in areas historically prone to hookworm. This catch-up, or convergence, surfaces in the data whether comparing counties within the South (low-lying counties tended to have more hookworm than mountainous ones), or comparing southern states to other states. But that observation alone leaves me unconvinced that ridding children’s bellies of hookworm was the cause. What if the trend began well before eradication or continued well after? I therefore focus on this question: Did convergence temporarily accelerate in tandem with eradication? The Bleakley (2007) tables and figures do not approach this question so aggressively.
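
To give a rough sense of what a data set about 100 times larger buys in precision, here is a back-of-the-envelope sketch. It assumes simple random sampling, and the attendance rate and sample size are hypothetical values chosen for illustration, not figures from the paper. Real census samples are clustered by household and area, so the actual gain in precision would likely be smaller.

```python
import math


def proportion_se(p, n):
    """Standard error of a sample proportion under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)


p = 0.7              # hypothetical school attendance rate
n_old = 4_000        # hypothetical number of children in an older, smaller sample
n_new = 100 * n_old  # about 100 times more records

# Ratio of the old standard error to the new one.
ratio = proportion_se(p, n_old) / proportion_se(p, n_new)
print(f"standard error shrinks by a factor of about {ratio:.1f}")
```

Under these assumptions the standard error falls with the square root of the sample size, so a 100-fold increase in records narrows confidence intervals about tenfold.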

We shared drafts of the paper and this post with Hoyt Bleakley. This did not yield any additional insight into why our analysis differs from the original.

The short-term impact on schooling

The figure below, adapted from one edition of the Bleakley study, illustrates the finding that I just mentioned, that after eradication, school attendance surged among kids living where hookworm had been common. (Versions of Bleakley (2007) appeared in the Quarterly Journal of Economics, a World Bank report, and the site of the National Center for Biotechnology Information. They are nearly the same.) I will convey the gist of the figure first, then explain it more precisely. You can see that the central red line stays essentially flat from 1870 to 1910. Then it jumps to about zero between 1910 and 1920, census years bracketing the Rockefeller campaign. Thereafter, the red line mostly again holds steady. The one-time jump looks like a fingerprint of eradication.

What does the red line mean exactly? For each census round with available data between 1870 and 1950, Bleakley (2007) computes the association within Southern counties between the school attendance rate of 8–16-year-olds and the hookworm infection rate as measured at the start of eradication, circa 1910. (The regressions for each census year control for the interactions of sex and race on the one hand and age on the other. They do not include the other Bleakley (2007) controls. Samples are restricted to eleven Southern states. The unit of observation is the State Economic Area, which is an aggregation of several counties.) That the red line starts around –0.1 in 1870 means that on average, if a county’s child hookworm infection rate was 10 percentage points higher when measured around 1910, its school attendance rate in the 1870 census was 1 percentage point lower. More plainly, counties with more worms in kids had fewer kids in school. But between the 1910 and 1920 censuses, that bad-news association abruptly faded. As of 1920, a child in a historically high-hookworm county was no less likely to be in school. The black, dashed lines show confidence intervals for these census-by-census estimates (probably 95% confidence, but I cannot tell for sure).
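
To make the meaning of a coefficient of about -0.1 concrete, the sketch below fits a one-variable least-squares line to synthetic data. Everything in it is made up for illustration: the data-generating slope of -0.1, the noise level, the number of areas, and the helper `ols_slope` are all assumptions. The regressions behind the figure also include controls for sex, race, and age, which this sketch omits.

```python
import random


def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Synthetic areas: hookworm infection rate and school attendance rate, both as fractions.
hookworm = [rng.uniform(0.0, 0.8) for _ in range(500)]
attendance = [0.75 - 0.1 * h + rng.gauss(0.0, 0.03) for h in hookworm]

slope = ols_slope(hookworm, attendance)
# A 10-percentage-point rise in infection corresponds to slope * 0.10 in attendance.
implied_change = slope * 0.10
print(f"slope = {slope:.3f}")
print(f"implied attendance change = {implied_change * 100:.2f} percentage points")
```

With a slope near -0.1, an area whose infection rate is 10 percentage points higher has an attendance rate about 1 percentage point lower, which is the reading given in the text.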

Here is the best replication of that graph using the reconstructed data and code. I have drawn it differently to emphasize that we only have data from certain decennial censuses, and to depict the gradations of confidence within the 95% confidence intervals. (The 1890 census records were destroyed in a fire. 1930 records had not been digitized at the time Bleakley did this work.)

I discern a resemblance between the original graph and the reconstruction. In both, school enrollment rises especially quickly between 1910 and 1920 and then declines slightly. But there is a difference too, and it is more than cosmetic. Now it appears that children in hookworm-infested areas gained substantially on school attendance not just between 1910 and 1920 but between 1880 and 1900 as well—and maybe throughout 1880–1910. For lack of access to Bleakley’s data and code, I cannot explain the discrepancy between this reconstruction and the original. There could be an error in the new or the old, or some subtle difference in data or method.

The new graph’s ambiguous mix of confirmation and contradiction forces a question that is at once conceptual and practical. How do we systematically judge whether the signal of hookworm eradication is present amidst the noise of other influences? To what degree does the new graph confirm or contradict the old?

I think there is no one best way to answer that question. One approach that I took is depicted with the red lines in the reconstructed graph above, and in the p values in the bottom-left corner. I drew the red lines to connect the dots that surround the eradication campaign. I wanted to quantify how much the red contour bends upward in 1910 and downward in 1920—as in Bleakley’s graph—and with what statistical significance. That is: Suppose the education gains took place at a constant pace between 1900 and 1940 with no acceleration around the campaign in the early 1910s. (I would have substituted 1930 for 1940 were 1930 data available in this graph.) What is the chance that we would see as much bending in the red line as we do? The computer says that for the upward kink at 1910, the answer is 0.37, which is not very low. On the other hand, the deceleration around 1920 is quite hard to ascribe to pure chance, at p = 0.03. Still, the new graph casts doubt on the proposition that the campaign brought a big break with the past.
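
One way to picture the kink test is as a comparison of slopes on either side of a census year. The sketch below is not the method used in the working paper. It is a simplified normal-approximation version that assumes the census-by-census estimates are statistically independent, and the coefficient values and standard errors passed to it are hypothetical. The function name `kink_p_value` is likewise an invention for this illustration.

```python
import math


def kink_p_value(years, coefs, ses):
    """Two-sided p-value for a change in slope at the middle of three points.

    years, coefs, ses: three census years, the estimated coefficient at each,
    and its standard error. Assumes independent, normally distributed estimates.
    """
    (t0, t1, t2), (b0, b1, b2), (s0, s1, s2) = years, coefs, ses
    d1, d2 = t1 - t0, t2 - t1
    # Kink = slope after the middle year minus slope before it.
    kink = (b2 - b1) / d2 - (b1 - b0) / d1
    # Variance of that linear combination of b0, b1, and b2.
    var = (s2 / d2) ** 2 + (s1 * (1 / d1 + 1 / d2)) ** 2 + (s0 / d1) ** 2
    z = kink / math.sqrt(var)
    # Two-sided tail probability of a standard normal.
    return kink, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical estimates for 1900, 1910, and 1920 (not taken from the paper).
kink, p = kink_p_value((1900, 1910, 1920), (-0.08, -0.06, -0.01), (0.02, 0.02, 0.02))
print(f"kink = {kink:.4f} per year, p = {p:.2f}")
```

Because the function works with slopes per year, it also handles unequal gaps between censuses, such as 1910, 1920, and 1940. A large p-value means the observed bend is easy to attribute to chance under a constant pace of change.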

Having settled on an analytical approach, the next step was to add all the census data that has been digitized since Bleakley did his work. This brings an obvious change (see below; now that 1930 data are included, I extend the third red line only that far). Now it looks far more as though the high-hookworm parts of the South began closing the schooling gap with low-hookworm parts around 1880, some 30 years before the hookworm campaign:

In a final test, I recomputed the graph while incorporating all the Bleakley (2007) control variables. Hookworm eradication was hardly a clean experiment, in the sense that the geographic reach of the disease was not random going in. The South had it more than the rest of the country; within the South, the coastal plains had it most. If the beneficiaries of eradication differed systematically from the rest before eradication, they could continue to differ after for reasons having little to do with hookworm prevention, creating a false appearance of impact. Striving to statistically remove such initial differences, Bleakley (2007) introduces into some of the regressions an aggressive set of controls. They relate to education, health, agriculture, and race. The paper includes these controls in some of the schooling regressions reported in a table, but does not bring them to the schooling graph shown above. It turns out that doing so (in our expanded-data graph) removes most signs of any long-term gains:

The lack of upward trend here does not mean that the historically hookworm-burdened parts of the South did not after all close a schooling gap between 1880 and 1920. It does suggest that the closure was correlated with, and therefore potentially caused by, the non-hookworm factors that Bleakley sometimes controls for.[5]

In sum, I do not see robust evidence that schooling and literacy improved at an historically anomalous rate circa 1910, in a way naturally attributable to hookworm eradication.

The long-term impact on earnings

What the first half of the Bleakley (2007) study does for short-term impacts on schooling, the second does for long-term impacts on earnings. Here too, the conclusion is encouraging. “Long-term follow-up,” writes Bleakley, “indicates a substantial income gain as a result of the reduction in hookworm infection.” This finding resonates strongly with the GiveWell cost-effectiveness analysis, which makes a key assumption about how much deworming children boosts future income. The number we use for that impact comes from modern, experimental research in Kenya; yet Bleakley’s inference from American history had boosted our confidence in the Kenya number. (That said, GiveWell has discounted the Kenya number by 80–90% out of fear that it won’t replicate to other settings.[6])

The Bleakley (2007) graph I will focus on draws together data from censuses as ancient as 1870 and modern as 1990. One problem with measuring impacts on income over this span is that not until 1940 did Census takers begin asking people how much money they made. For this reason, the IPUMS census database provides proxies for income that reach back farther. One is the occupational income score (OIS), which is, approximately speaking, the average income in 1950 associated with a person’s census-reported profession. Thus, if lawyers averaged $10,000 in income in 1950, then any self-described lawyer between 1870 and 1990 is taken to earn that much. The OIS is expressed in hundreds of dollars of 1950, and is an example of an index of “occupational standing.”

Before scrutinizing the evidence of long-term impacts on occupational standing, I need to describe a twist that Bleakley (2007) introduces in moving from short- to long-term. As one tries to follow people over longer periods of time, the analytical tack that Bleakley took for schooling starts to break down. For it looks at how the people in given places fared over time. The problem is that sometimes people move, whether across the state or across the country. And in this analytical set-up, the researcher does not follow them. If deworming gave children in coastal Georgia more agency in life (better health, more education), perhaps they exercised that agency by moving to Atlanta. If we only looked at the incomes of the people who stayed behind, we would miss the full story. To minimize this attrition from migration, in studying long-term impacts, Bleakley (2007) groups census records not by place of residence at the time of census, but by place of birth.
Then, if a person was born in Georgia in 1915, showed up in the census in 1940 as a bricklayer in Atlanta, in 1950 as a general contractor in Lexington, and in 1960 as the manager of a construction company in Phoenix, all three census records would be associated with Georgia in 1915. After organizing the data this way, Bleakley (2007) could study whether children born in certain areas after eradication went on to earn more than those born in the same places before eradication.

Reorganizing the data this way generates two ripple effects. First, while census takers record place of residence with extreme precision, they only record place of birth by state. We cannot differentiate people by whether they were born in hookworm-prone areas within, say, Mississippi. We can only differentiate by whether they were native to a historically high-hookworm state such as Mississippi or a low-hookworm one such as Michigan. Thus, while the short-term analysis compares counties within 11 southern states, the long-term analysis compares states across the continental U.S.

The second ripple effect is that the data come to us at higher temporal resolution: by birth year, not census decade. In response, Bleakley (2007) hypothesizes that how much hookworm depressed adult earnings depended on the percentage of one’s childhood spent where it was endemic. If we take eradication to have occurred in 1910 and assume with Bleakley (2007) that childhood lasts 19 years, then babies born in or before 1891 would have reached adulthood before eradication, too soon to benefit. Babies born in endemic areas in 1892 would have been helped for one year (between their 18th and 19th birthdays); in 1893 for two; and so on. Those born in 1910 or later stood to reap the full 19 years of benefit. Bleakley (2007) therefore hypothesizes that the impact of eradication follows a sort of diagonal step shape with respect to birth year. The step starts rising in 1891 and stops in 1910.
Bleakley depicted that contour with dashed lines in this figure:

As you can see, Bleakley (2007) fit this contour to data, to see how well it could explain historical patterns. These dots are derived much as in the earlier Bleakley (2007) figure. For example, the leftmost dot is for the year 1825, and has a vertical coordinate of about –2. That means that among people born in 1825, being native to a state whose hookworm infection rate circa 1910 was 10 percentage points (0.1) higher corresponded to having an Occupational Income Score 0.20 lower. That means $20/year less income throughout adulthood, in the dollars of 1950. The graph shows that this association was generally negative in the mid-19th century and generally positive after 1910: formerly, coming from a hookworm zone depressed lifetime earnings. And the graph suggests that the transition followed the step pattern expected if the cause was hookworm eradication.
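The diagonal step contour and the reading of the leftmost dot can be made concrete with a short sketch. It takes the figures stated above (eradication in 1910, a 19-year childhood, OIS in hundreds of 1950 dollars) as parameters; the function names are hypothetical and are not taken from Bleakley's code.

```python
def childhood_exposure(birth_year, eradication_year=1910, childhood_years=19):
    """Fraction of childhood spent after eradication, between 0 and 1.

    Born in or before 1891 -> 0; born in 1892 -> 1/19; born in 1910
    or later -> 1. This traces the diagonal step described in the text.
    """
    years_helped = birth_year - (eradication_year - childhood_years)
    years_helped = min(max(years_helped, 0), childhood_years)
    return years_helped / childhood_years

def ois_gap_in_dollars(coefficient, infection_rate_gap):
    """Convert a plotted coefficient into 1950 dollars per year.

    coefficient: OIS points per unit (0 to 1) of hookworm infection rate.
    infection_rate_gap: difference in infection rates, e.g. 0.1 for
    10 percentage points. OIS is in hundreds of 1950 dollars.
    """
    return coefficient * infection_rate_gap * 100

for year in (1825, 1891, 1892, 1900, 1910, 1950):
    print(year, round(childhood_exposure(year), 3))

# The leftmost dot: coefficient of about -2, 10-point infection gap
print(round(ois_gap_in_dollars(-2, 0.1), 2))
```

If eradication caused the change, the plotted coefficients should be flat before 1891, climb roughly in proportion to `childhood_exposure` between 1891 and 1910, and be flat again afterward.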

Below is my best reproduction of that graph. As before, I have plotted both the dots and their 95% confidence intervals. I have avoided superimposing the step-like contour the way Bleakley (2007) does because I worry that it tricks the eye into believing that the contour fits the data better than it really does. But I have marked the years when the contour kinks, 1891 and 1910:

Here is the same graph when using the 100-times-bigger census data sets now available[7]:

And here is the graph when I copy Bleakley (2007) in incorporating all the controls for cross-state differences in health and health policy, education policy, and other traits[8]:

Does it look to you like the upward trends in these last two graphs accelerated around 1891 and decelerated around 1910, as predicted by the Bleakley (2007) theory? To me, I have to say, not much. The climbs look steady and longer-term.

Since “not much” is muddy, I moved once again to formalize my interpretation. In analogy with my earlier graphs for schooling, I fit lines to the data points in the 19 years between 1891 and 1910, as well as to the 19 years on either side. Then I checked whether any bending in 1891 and 1910 is statistically significant. The final two graphs fit lines to the dots in the previous two. The dots in these next graphs are the same as in the previous two. It doesn’t look that way because I erased the grey confidence bars in order to expand the vertical scales.

In both graphs the trend looks quite straight over the three generations surrounding the eradication campaign. The p values, shown in the bottom-right of each plot, are high.

Conclusion

Reanalyzing the Bleakley (2007) study left me unconvinced that the children who benefited from hookworm eradication went to school more or earned more as adults. Conceivably, if I had access to the original data and code, the confrontation with the reconstructed versions would expose errors in the new version that would alter my view. But this seems unlikely. The new census data sets are much bigger than the old, which improves precision. And most of the differences probably do not arise from clear-cut errors on either side, but from minor differences in implementation, such as taking education spending from a different edition of an annual government report. If the conclusions swing on such modest and debatable discrepancies, then they are not robust and reliable.

Finally, even if the two versions of the data matched exactly, I might still disagree on interpretation, since I use tests, illustrated above, that focus more exclusively on whether the time trends contain the temporal fingerprint of hookworm eradication. For me, that fingerprint is characterized not merely by once-high-hookworm areas catching up with low-hookworm ones, but catch-up that accelerates and decelerates at times that fit the timing of the eradication campaign.

The data and code for this study are here (2 GB). The full write-up is here.

Notes

1. ↑ The 2016 Campbell review finds 52 short-term studies with follow-up duration under five years. Most last one to two years.
2. ↑ Versions of Bleakley (2007) appeared in the Quarterly Journal of Economics, a World Bank report, and the site of the National Center for Biotechnology Information. They are nearly the same.
3. ↑ The regressions for each census year control for the interactions of sex and race on the one hand and age on the other. They do not include the other Bleakley (2007) controls. Samples are restricted to eleven Southern states. The unit of observation is the State Economic Area, which is an aggregation of several counties.
4. ↑ The 1890 census records were destroyed in a fire. 1930 records had not been digitized at the time Bleakley did this work.
5. ↑ Consistent with this graph, while the Bleakley (2007) full-controls regressions continue to put a statistically significant coefficient on the treatment proxy, the reconstructions do not. This is one of the few cases where the original results are not recognizable in the reconstruction. See Table 6, panel B, of the new paper.
6. ↑ See the “Replicability adjustment for deworming” row of the “Parameters” tab of the cost-effectiveness analysis spreadsheet.
7. ↑ In addition to adding data, this version mimics the rest of the Bleakley (2007) analysis in adding blacks and in fitting directly to census microdata rather than aggregates, in order to include controls for race, sex, and their interaction.
8. ↑ As I noted, when looking at short-term impacts on education, Bleakley (2007) does not plot a graph while incorporating all controls. But now, when looking at long-term impacts on occupational standing, Bleakley (2007) does also include such a graph. See the bottom right of this figure.

The post Questioning the evidence on hookworm eradication in the American South appeared first on The GiveWell Blog.

### Our top charities for giving season 2017

Mon, 11/27/2017 - 14:33

This year, we added two new top charities, Evidence Action’s No Lean Season program and Helen Keller International’s vitamin A supplementation program, and retained our seven top charities from 2016. We also added Evidence Action’s Dispensers for Safe Water program to our list of standout charities.

We recommend that donors give to GiveWell for granting to top charities at our discretion so that we can direct the funding to the top charity or charities with the most pressing funding need. For donors who prefer to give directly to our top charities, we recommend giving 70 percent of your donation to the Against Malaria Foundation (AMF) and 30 percent to the Schistosomiasis Control Initiative (SCI) to maximize your impact. We expect Good Ventures, a foundation with which we work closely, to provide significant support to each top charity; our recommendation to give to AMF and SCI is based on how much good we believe additional donations can do.

Our top charities and recommendations for donors, in brief

Top charities

We now have nine top charities. They are:

Standout charities

We also provide a list of standout charities. We believe they are implementing programs that are evidence-backed and may be extremely cost-effective. However, we do not feel as confident in the impact of these organizations as we do in our top charities.

Conference call to discuss recommendations

We are planning to hold a conference call at 1:30pm ET/10:30am PT on Thursday, November 30 to discuss our charity recommendations and answer your questions.

If you’d like to join the call, please register using this online form. If you can’t make this date, but would be interested in joining another call at a later date, please indicate this on the registration form.

Below, we provide:

• An explanation of changes to our recommended charity list and of major changes to our review process in the past year that are not specific to any one organization.
• A discussion of our approach to determining how much funding charities can use effectively (“room for more funding”) and our ranking of charities’ funding gaps.
• Reasoning behind how we have ranked charities’ funding gaps.
• Details about each of our new top charities, including an overview of what we know about their work and our understanding of their funding needs.
• Details about each of the top charities we are continuing to recommend, including an overview of their work, major changes over the past year, and our understanding of their funding needs.
• A brief overview of each of our standout charities.
• The process we followed that led to these recommendations.
• An update on giving to support GiveWell’s operations versus giving to our top charities.
Major changes in the last 12 months

Major changes to our recommended charities list and review process over the past year include:

• Overall, we believe our top charities are able to absorb more funding than they could in previous years. This is a result both of recent additions to the top charities list with large funding gaps (particularly Malaria Consortium) as well as expansion by top charities that have been on the list for a longer time (particularly Deworm the World and AMF).

We expect overall “room for more funding” to continue to expand as we gain more confidence in recently-added top charities and continue to add new top charities, particularly through GiveWell Incubation Grants, our program to grow the pipeline of potential future top charities and improve our understanding of our current top charities.

• We added two new programs to our list of top charities: vitamin A supplementation (VAS) and seasonal migration subsidies. We have not previously recommended charities that work on these programs.

We had considered VAS a priority program for a number of years but had not found an organization that was able to answer our key questions. While we have some remaining questions, we can now make a strong case for supporting HKI’s work on VAS.

We initially supported No Lean Season through GiveWell’s Incubation Grants program. No Lean Season is the first organization we have added to our top charity list through our Incubation Grants program.

• Last year, the charities we recommended on the margin were estimated to be about three times as cost-effective as unconditional cash transfers, the program implemented by top charity GiveDirectly. This year, we believe that the charities we are recommending on the margin are about six times as cost-effective as cash transfers. For the most part, this change was due to (a) a series of small adjustments to our cost-effectiveness model and (b) changes in which individuals contribute to the model and the values entered into the model by these and other contributors.

We now feel fairly confident that there will be large amounts of room for more funding in this range. As more time has passed without identifying opportunities that are considerably more cost-effective than this, we have become more pessimistic about finding such opportunities. Our current best guess is that, if they exist, they will be in the area of policy advocacy in developing countries, on issues like lead regulation and tobacco taxation. We intend to do further research in those areas.

• We made a significant change to our cost-effectiveness analysis to more formally incorporate adjustments for the way in which our top charities’ funding affects funding from other sources by (a) attracting more resources to the programs they work on (e.g., governments contributing staff time to support implementation of the programs) or (b) displacing resources that would have otherwise supported the programs. We will be writing more about this in a future post.

• We continued to analyze the complex evidence base for deworming (treating intestinal parasites), the program implemented by four out of our nine top charities.

At the end of 2016, David Roodman, a Senior Advisor to GiveWell, conducted a detailed review of the core evidence underlying our deworming recommendation (blog posts here and here).

This year, we saw new follow-up results on the main study that leads us to recommend deworming, which continued to show similar long-term impacts of deworming on adult earnings as were estimated previously.

Further investigation and updates based on new data led us to believe that two deworming studies (Croke 2014 and Bleakley 2007) no longer provide substantial support for the theory that deworming has long-term impacts. We plan to write more about this in the future. All together, this work led us to the same conclusion about deworming: that it is a reasonable bet to take based on its strong cost-effectiveness (which incorporates our uncertainty about the impact).

Room for more funding analysis

Types of funding gaps

In the last two years, we used a framework of “capacity funding” and “execution levels” to compare funding gaps (unfilled funding needs) across charities. This framework was intended to capture whether funding would enable a charity to expand or grow in important ways and how likely it was, in our estimation, that each top charity would be constrained by funding in the next year.

We developed this approach in response to a situation where we expected to direct more funding to several of our top charities than they would be able to use (commit or spend) in that year. We used capacity funding to describe opportunities to increase the amount of funding a charity might be able to absorb in the future (by, say, investing in expanding to a new location) and execution levels to describe the likelihood, down to the 5 percent level, that a charity would be able to make use of additional funding before encountering non-funding bottlenecks to their work.

This year, because we have added new top charities and most of our other top charities have more room for more funding than in previous years, we expect that the funding we will direct to each organization will not reach the level where they will encounter significant non-funding bottlenecks. As a result, we have moved away from describing capacity funding and execution levels.

Ranking funding gaps

The first million dollars to a charity can have a very different impact from the 20th million dollars. Accordingly, we have created a ranking of individual funding gaps that accounts for our best guess of the impact of additional funds at each level.

The below table lays out our ranking of funding gaps, up to $75.7 million in total funding. We expect Good Ventures to give $75 million to GiveWell’s top charities this year, so this table is our recommendation to Good Ventures, plus the allocation of funding that GiveWell holds to allocate at its discretion (currently $0.7 million). We then discuss our recommendation for all other donors.

The Open Philanthropy Project, which was incubated at GiveWell but is now a separate organization, plans to write more soon about the reasons for Good Ventures increasing its support of GiveWell top charities from $50 million last year to $75 million this year. In short, the amount was based on discussions about how to allocate funding across time and across cause areas. It was not set based on the total size of top charities’ funding gaps or the projection of what others would give.

| Charity | Description | Amount (millions) |
| --- | --- | --- |
| All top charities | Incentive grants: $2.5 million per charity | 22.5 |
| All standout charities | Standout grants: $100,000 per charity | 0.7 |
| Deworm the World | Funding gaps in India and Kenya over the next three years (including central costs) | 3.0 |
| Helen Keller International | Funding gaps over three years in Burkina Faso, Mali, and Guinea, countries that have missed recent vitamin A campaigns due to lack of funding | 4.7 |
| No Lean Season | Full funding gap over three years for implementing the program in Bangladesh | 9.0 |
| Deworm the World | Three years of funding for a new program in Pakistan and reserves to protect against funding shortfalls in India | 10.4 |
| Malaria Consortium | Part of the funding gap for SMC in Burkina Faso, Nigeria, and Chad over the next three years | 25.4 |

In total, we are recommending that Good Ventures make the following grants:

• Malaria Consortium’s seasonal malaria chemoprevention program: $27.9 million
• Evidence Action’s Deworm the World Initiative: $15.2 million. We are also recommending that GiveWell’s Board of Directors grant the $0.7 million in discretionary funds that we currently hold from the third quarter (from donors who selected to give to “Grants to recommended charities at GiveWell’s discretion” on our donation form) to Deworm the World, bringing the total to $15.9 million.
• Evidence Action’s No Lean Season program: $11.5 million
• Helen Keller International’s vitamin A supplementation program: $7.2 million
• Schistosomiasis Control Initiative: $2.5 million
• Against Malaria Foundation: $2.5 million
• Sightsavers’ deworming program: $2.5 million
• END Fund’s deworming program: $2.5 million
• GiveDirectly: $2.5 million
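The per-charity totals follow from adding each charity's $2.5 million incentive grant to its ranked funding gaps. The sketch below reproduces that arithmetic from the figures given above; the variable and function names are illustrative, and `Decimal` is used only to keep the dollar sums exact.

```python
from decimal import Decimal as D

INCENTIVE = D("2.5")  # millions of dollars per top charity
TOP_CHARITIES = [
    "Malaria Consortium", "Deworm the World", "No Lean Season",
    "Helen Keller International", "Schistosomiasis Control Initiative",
    "Against Malaria Foundation", "Sightsavers", "END Fund", "GiveDirectly",
]
# Ranked funding gaps from the table (millions of dollars)
RANKED_GAPS = [
    ("Deworm the World", D("3.0")),
    ("Helen Keller International", D("4.7")),
    ("No Lean Season", D("9.0")),
    ("Deworm the World", D("10.4")),
    ("Malaria Consortium", D("25.4")),
]
STANDOUT_TOTAL = D("0.7")   # $100,000 per standout charity
DISCRETIONARY = D("0.7")    # GiveWell-held funds, granted to Deworm the World

def totals_by_charity():
    """Incentive grant plus ranked gaps for each top charity."""
    totals = {name: INCENTIVE for name in TOP_CHARITIES}
    for name, amount in RANKED_GAPS:
        totals[name] += amount
    return totals

def table_total():
    """Total of every row in the funding-gap table."""
    return sum(totals_by_charity().values(), D("0")) + STANDOUT_TOTAL

totals = totals_by_charity()
for name, amount in totals.items():
    print(f"{name}: ${amount} million")
print("Table total:", table_total())
print("From Good Ventures:", table_total() - DISCRETIONARY)
```

The Deworm the World figure here includes the $0.7 million in discretionary funds, which is why the Good Ventures share of that grant is $15.2 million rather than $15.9 million.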

Our recommendation to donors

For donors who are interested in directing funding to whichever recommended charity or charities GiveWell believes has the most pressing funding need at the time the funds are granted, we recommend giving to “Grants to recommended charities at GiveWell’s discretion.” These grants will respond to the greatest funding need we see; they may not match the recommended allocation outlined below.

For donors (other than Good Ventures) who are interested in donating directly to our top charities, we recommend splitting your donation as follows:

• 70 percent to the Against Malaria Foundation
• 30 percent to the Schistosomiasis Control Initiative
Why these recommendations?

Our recommendations to donors, including Good Ventures, are based on:

1. Overall cost-effectiveness of the charity. Our cost-effectiveness model is a key input into our decision-making process, and large differences in modeled cost-effectiveness impact our recommendations. We try not to put significant weight on relatively small differences in cost-effectiveness according to the model because many inputs are highly uncertain.

Our model this year found relatively small differences between many top charities, with Deworm the World at ~12 times as cost-effective as cash transfers, four top charities in the ~6-10x cash transfers range, and three top charities in the ~3-5x cash transfers range. We consider differences between charities implementing the same intervention or interventions that have similar inputs and output in the model more meaningful (e.g., malaria nets and seasonal malaria chemoprevention) than differences between charities implementing quite different interventions.

We have completed a sensitivity analysis of our cost-effectiveness analysis to get a better sense of which parameters the results are most sensitive to. We are more hesitant to consider differences in cost-effectiveness as meaningful when they rely on very sensitive inputs.

2. Cost-effectiveness of particular funding opportunities. Charities’ work can vary significantly in cost-effectiveness across locations due to different costs, disease burdens, uptake in the targeted population, or probability that other funders would step in in GiveWell’s absence. While not a part of our formal cost-effectiveness model, we ran supplementary analyses of cost-effectiveness for some locations for which our top charities were seeking additional funding and considered the output as part of our prioritization of funding gaps.
3. Qualitative factors not captured in our cost-effectiveness model. The main factors we focused on were:
• Proportion of the global funding need for the program that is filled. We expect that funders will generally (but imperfectly) select the areas where cost-effectiveness is higher first, leaving the areas with higher costs, lower disease burden, lower cultural acceptance of the program, etc. for last. We believe we have captured some of the consequences of this in our cost-effectiveness analysis. For example, we use national level disease burden estimates for the countries in which each charity has worked and/or plans to work; charities working in higher burden countries are therefore modelled as more cost-effective. But we do not use sub-national estimates to distinguish the highest priority regions within a country; if charities are filling the lowest priority funding gaps within a country, they will likely be less cost-effective than our model suggests. This was an important consideration in comparing AMF and Malaria Consortium. We estimate that ~80 percent of the global funding need for nets (the program AMF implements) has been filled, and ~35 percent of the global funding need for seasonal malaria chemoprevention (the program Malaria Consortium implements).
• Our level of knowledge about the organization. We have recommended AMF, Deworm the World, SCI, and GiveDirectly for many years. We know less about Malaria Consortium and No Lean Season and the least about HKI. We seek to be somewhat conservative about recommending large amounts of funding to organizations where there is a relatively high chance that additional research could lead us to believe the program was less cost-effective than we previously thought.
• Ease of communication with the organization. It is important to us that we are able to learn over time about the charities we recommend, to enable us to improve our decisions. The ability to communicate effectively with an organization is a key factor in our ability to learn from the organization’s experiences.
• Ongoing monitoring and likelihood of detecting future problems. Evaluating an organization’s monitoring processes and results is an important part of our charity reviews and for the most part is not captured in our cost-effectiveness analysis. As with ease of communication, we have more confidence in recommending funds to an organization if we believe that we will learn about how successful its work has been.

Summary of key considerations for top charities

The table below summarizes the key considerations for our nine top charities. More detail is provided below, as well as in the charity reviews.

| Charity | Estimated cost-effectiveness (relative to cash transfers) | Our level of knowledge about the organization | Primary benefits of the intervention | Ease of communication | Ongoing monitoring and likelihood of detecting future problems | Room for more funding, after expected funding from Good Ventures and donors who give independently of our recommendation | Other major considerations |
| --- | --- | --- | --- | --- | --- | --- | --- |
| AMF | ~6x | High | Deaths averted and possible increased income in adulthood | Strong | Strong | High: could absorb tens of millions of dollars | High proportion (~80%) of global gap for program is filled |
| Malaria Consortium (SMC program) | ~7x | Moderate | Under-5 deaths averted and possible increased income in adulthood | Strong | Strong | High: could absorb tens of millions of dollars | Relatively low proportion (~35%) of global gap for program is filled |
| Helen Keller International (VAS program) | ~9x | Moderate | Under-5 deaths averted | Strong | Moderate | High: could absorb tens of millions of dollars | Learning benefits |
| Deworm the World | ~12x | High | Possible increased income in adulthood | Strong | Strong | Moderate: could absorb millions of dollars | |
| END Fund (deworming program) | ~4x | Moderate | Possible increased income in adulthood | Moderate | Moderate | Moderate: could absorb millions of dollars | |
| SCI | ~10x | High | Possible increased income in adulthood | Moderate | Moderate | High: could absorb tens of millions of dollars | |
| Sightsavers (deworming program) | ~5x | Moderate | Possible increased income in adulthood | Moderate | Moderate | Moderate: could absorb millions of dollars | |
| No Lean Season | ~5x | Moderate | Immediate increase in consumption | Strong | Moderate | Low: further funding would be used for different types of activities | Potential upside |
| GiveDirectly | Baseline | High | Immediate increase in consumption and assets | Strong | Strong | Very high: could absorb over 100 million dollars | |

Reasons for this funding gap ranking

Prioritization of funding that we have recommended to Good Ventures (we recommend Good Ventures fill the highest-priority funding needs first, to ensure these are funded):

1. We start by recommending that each top charity receive $2.5 million as an “incentive grant.” These grants are intended to be a major contribution to the charity’s work in recognition of the fact that they have met GiveWell’s criteria and have dedicated significant time to working with us to help us follow their progress and plans each year. We don’t want our top charity funding process to be winner-takes-all because we believe that charities would be less likely to want to participate in that case.

2. After incentive grants, we believe the next most valuable funding to provide is for Deworm the World’s work in Kenya and India over the next three years. Deworm the World’s work in Kenya and India is the most cost-effective opportunity we have found. We estimate that its work in Kenya is ~20x as cost-effective as cash transfers and in India is ~30x+ as cost-effective as cash transfers.

3. We rank providing funding to our two new top charities, Helen Keller International (HKI)’s VAS program and No Lean Season, next. We estimate that HKI could use $7.2 million over three years to support VAS campaigns in countries with high child mortality rates that have recently missed campaigns due to lack of funds. HKI’s cost-effectiveness is at the high end of the range for top charities (~9x cash transfers). We believe HKI could absorb more than $7.2 million in additional funding for VAS effectively but that this $7.2 million gap is likely more cost-effective than HKI’s average cost-effectiveness. Also, because HKI is a new top charity of ours, we expect this first part of its gap to have significant learning benefits for us: by giving this money, we’ll be better positioned to follow HKI’s work and review its monitoring, which we believe will make it more likely that we have a more accurate estimate of its impact in future years.

We decided to recommend funding all of No Lean Season’s funding gap in Bangladesh for the next three years. While No Lean Season’s cost-effectiveness is at the lower end of our top charities (~5x cash transfers), we see additional reasons to prioritize this gap. We believe No Lean Season is the top charity where there is the strongest case to be made for “upside”; our cost-effectiveness analysis may not capture the potential impact of scaling a new program that could lead to greater visibility and funding for a novel type of program.

4. We think the next highest priority funding to provide is $10.4 million to Deworm the World. This funding would support a new program in Pakistan and provide reserve funding for programs supported with restricted funds. We estimate that the program in Pakistan will be roughly 7x as cost-effective as cash transfers, though this estimate is very sensitive to estimates of worm burdens in the locations where Deworm the World plans to work. The reserve funding is intended to make it unlikely that the India program, which we believe is very highly cost-effective, will be interrupted; Deworm the World relies on restricted funding for this program and there is some chance that this funding will not be available in the future. It may use this GiveWell-directed funding for other opportunities if it is not needed to backstop restricted funding in India; we expect that it will have unfunded opportunities remaining in the next few years, particularly in Nigeria.

5. The last funding gap on our list of recommendations for Good Ventures is $23.6 million to Malaria Consortium for its work on SMC. When choosing which gap to recommend for the remainder of Good Ventures’ $75 million, we focused on the remaining funding needs for Malaria Consortium’s SMC program, AMF, and SCI, which we believe to have the next highest-value gaps. Our cost-effectiveness model indicates that SCI is the most cost-effective of these three organizations (~10x cash transfers, compared with ~6-7x cash transfers for AMF and Malaria Consortium), but when the difference in modelled cost-effectiveness between two charities is relatively small, we also put significant weight on qualitative factors. We believe that AMF and Malaria Consortium are stronger on some qualitative factors, particularly the likelihood that we will be able to learn about the programs’ performance through the monitoring they conduct.

Between AMF and Malaria Consortium, we have prioritized Malaria Consortium’s funding gap primarily due to the qualitative considerations discussed above around the proportion of the global funding need that is filled. After following Malaria Consortium for a second year, we believe that Malaria Consortium and AMF are comparable on other major qualitative factors, such as quality of ongoing monitoring and likelihood of detecting future problems. The total amount we are recommending for Malaria Consortium’s SMC program represents a rough compromise between providing a high level of funding to a program that we prefer to the next funding gap on the list and not wanting to make too large of a bet on an organization that we have less experience with than some other top charities.

Prioritization for non-Good Ventures donors:

1. Our current recommendation for donors is to give to GiveWell for making grants to top charities at our discretion. Our goal is for SCI to receive $9 million, in addition to the $2.5 million incentive grant that we are recommending to Good Ventures, and AMF to receive the remainder of expected GiveWell-directed funding because AMF and SCI represent the next highest-value funding opportunities we see. Giving us funding to grant at our discretion allows GiveWell to better target this allocation, and to adapt if we learn new information about pressing, high-value funding needs at our top charities.

2. For donors who prefer to give directly to charities, we recommend giving 70 percent to AMF and 30 percent to SCI. These percentages are our best guess of what will achieve our target allocation given our projections of total donations driven by our recommendations. This allocation comes from a belief that, at these margins, it is difficult to distinguish between the quality of AMF and SCI’s funding gaps. SCI has better modeled cost-effectiveness, while AMF appears to be better on several qualitative factors, including monitoring of program performance. We have roughly targeted a two-to-one ratio between the two.

Details on new top charities

Helen Keller International (HKI) for work on vitamin A supplementation

Our full review of HKI’s work on vitamin A supplementation is here.

Overview

HKI (http://www.hki.org/) is a large organization with multiple programs focused on reducing malnutrition and averting blindness and poor vision. Our review focuses on HKI’s work on vitamin A supplementation (VAS) and our recommendation is specific to its VAS program. HKI provides technical assistance, engages in advocacy, and contributes funding to government-run VAS programs.

There is strong evidence from many randomized controlled trials (RCTs) conducted in the 1980s and 1990s that VAS can substantially reduce child mortality, but weaker evidence on how effective VAS is in the places HKI would work with additional funding in the next few years. In particular, there is little available information on current rates of vitamin A deficiency in areas where HKI works. We have adjusted our cost-effectiveness analysis for our best guess of how much less effective VAS is today (~25 percent as effective as in the trials in the 1980s and 1990s); the intervention remains cost-effective with that adjustment.

We feel that the monitoring data that we have seen from HKI’s programs gives us limited information on HKI’s past performance, but demonstrates the types of data HKI is able to collect on program performance. We have requested that HKI collect this monitoring data of all programs funded with GiveWell-directed funds.

Overall, we have not yet investigated HKI at the same level of depth as some of our other top charities, which we have recommended for several years. We have reviewed documents from HKI, had a number of conversations with their staff, and spent three days meeting with HKI and observing a VAS campaign in Guinea. We have remaining questions about HKI’s work that we will seek more information on in the future, but overall we believe this program is, like our other top charities, an excellent giving opportunity.

Funding gap

We believe that HKI’s VAS work is highly likely to be constrained by funding next year. HKI has provided details of VAS programs that it could support with additional funding of up to about $41.4 million in 2018-2020. HKI appears to have limited prospects for funding these programs from other sources.

Our understanding is that with additional funds, HKI would cause additional rounds of VAS to occur in some countries, while in other countries, HKI primarily aims to increase coverage rates in rounds of VAS that would take place regardless of its involvement. We have asked HKI to prioritize use of GiveWell-directed funding in countries where it expects to cause additional rounds of VAS to occur. HKI’s funding gap for countries that have recently missed VAS campaigns due to lack of funds is $7.2 million.

HKI’s VAS work was supported by the Canadian government in the past. That funding ended in 2016 and has not been renewed. Over the past year, several VAS campaigns have been skipped in countries HKI previously supported.

Evidence Action’s No Lean Season program

Our full review of No Lean Season is here.

Overview

No Lean Season (https://www.evidenceaction.org/beta-no-lean-season/) provides no-interest loans to poor rural households during the season of income and food insecurity (‘lean season’) between planting and the major rice harvest in rural northern Bangladesh. Loans are conditional on a household member stating their intention to migrate to urban or other rural locations to seek short-term employment.

Several randomized controlled trials (RCTs) of subsidies to increase migration provide moderately strong evidence that such an intervention increases household income and consumption during the lean season. An additional RCT is ongoing.

We estimate that No Lean Season is roughly five times as cost-effective as cash transfers (see our cost-effectiveness analysis).

Evidence Action has shared some details of its plans for monitoring No Lean Season in the future, but, as many of these plans have not been fully implemented, we have seen limited results. Therefore, there is some uncertainty as to whether No Lean Season will produce the data required to give us confidence that loans are appropriately targeted and reach their intended recipients in full; that recipients are not pressured into accepting loans; and that participants successfully migrate, find work, and are not exposed to major physical and other risks while migrating.

Funding gap

We expect No Lean Season to have opportunities to spend $11.5 million more than we expect it to receive over the next three years to implement and monitor the program in Bangladesh. We expect it to have a further $3.9 million in opportunities to expand to other countries and do further research, in Bangladesh and other locations. Evidence Action is seeking funding beyond this level to allow it to build reserves for No Lean Season.

Details on top charities we are continuing to recommend

Against Malaria Foundation (AMF)

Our full review of AMF is here.

Background

AMF (againstmalaria.com) provides funding for long-lasting insecticide-treated net (LLIN) distributions for protection against malaria in developing countries.

AMF has conducted post-distribution surveys of all completed distributions to determine whether LLINs have reached their intended destinations and how long they remain in good condition. AMF’s post-distribution surveys have generally found positive results (with some exceptions); we believe they have some methodological limitations.

We estimate that AMF’s program is roughly six times as cost-effective as cash transfers (see our cost-effectiveness analysis). This estimate seeks to incorporate many highly uncertain inputs, such as the effect of mosquito resistance to the insecticides used in nets on how effective they are at protecting against malaria, how differences in malaria burden affect the impact of nets, and how to discount for displacing funding from other funders, among many others.

Important changes in the last 12 months

Prior to this year, we had seen results from AMF’s “post-distribution check ups” (PDCUs) from two countries, Malawi and the Democratic Republic of the Congo, and had significant uncertainties about the methodology used in each location. We have now also seen results from Ghana. We have more confidence in our understanding of AMF’s PDCUs than we did previously, though this work is ongoing. In particular, we commissioned IDinsight, an organization with which we are partnering as part of our Incubation Grants program, to observe post-distribution surveys in Malawi and Ghana and report their findings (see links). Further discussion of the strengths and weaknesses of PDCUs here.

In 2017, AMF signed relatively few new agreements to fund LLIN distributions and, as a result, has a balance of $58 million in uncommitted funds, or $35 million if distributions where AMF believes agreements are imminent are counted as committed. Our understanding is that many of AMF’s conversations with countries could not progress until decisions were made about how much Global Fund funding each country would allocate to LLIN distributions (as opposed to other malaria control efforts). This decision-making process extended into late 2017. Global Fund funding is allocated on three-year cycles and we do not expect this to continue to be a bottleneck for AMF in 2018.

Funding gap

We believe that AMF is very likely to be constrained by lack of funding. There is high uncertainty in the maximum amount of funding that AMF could use productively, though we expect the maximum to be much greater than what AMF is likely to receive. To fund all of the distributions that it is currently in detailed discussions about, AMF would need $50 million more than we project it will receive. The total funding gap for LLINs for 2018-2020 appears to be hundreds of millions of dollars.

With additional funding, AMF’s top priorities would be to fund a portion of the next round of distributions, in 2018-2020, in each of the countries in which it has recently funded distributions.

END Fund (for work on deworming)

Our full review of the END Fund’s work on deworming is here.

Background

The END Fund (end.org) manages grants, provides technical assistance, and raises funding for controlling and eliminating neglected tropical diseases (NTDs). We have focused our review on its support for deworming.

Slightly more than half of the treatments the END Fund has supported have been deworming treatments, while the rest have been for other NTDs. The END Fund has funded SCI, Deworm the World, and Sightsavers. We see the END Fund’s value-add as a GiveWell top charity as identifying and providing assistance to programs run by organizations other than those we separately recommend, and our review of the END Fund has excluded results from charities on our top charity list.

We have seen limited monitoring results on the number of children reached in END Fund-supported programs. In 2016, the END Fund began requiring that surveys be conducted to determine whether its programs have reached a large proportion of children targeted; we have seen coverage surveys for (a non-random sample of) 35 percent of its 2016 deworming grant portfolio. These studies leave us with some remaining questions about the program’s impact.

Important changes in the last 12 months

We significantly improved our understanding of the END Fund’s cost per treatment and the baseline prevalence in areas where the END Fund works (which is used in our cost-effectiveness analysis), though we continue to have lower confidence in our estimates than we do for the deworming organizations that we have recommended for several years. We also saw some monitoring from END Fund programs; previously our recommendation of the END Fund was based on specific monitoring plans that we found credible.

Funding gap

We believe the END Fund could substantially increase its deworming grantmaking with additional funds. We roughly estimate that there is a gap of $18 million between the amount of funding the END Fund will have available for grants for deworming and the amount of funding it would need to make all of the potential grants it has identified. Sources of major uncertainty in this estimate include whether the END Fund will encounter non-funding bottlenecks in some of its identified and early-stage opportunities, the amount of funding it will receive from other sources, the proportion of funding it will allocate to deworming, and costs other than grants.

Evidence Action’s Deworm the World Initiative

Our full review of Deworm the World is here.

Background

Evidence Action’s Deworm the World (evidenceaction.org/#deworm-the-world) advocates for, supports, and evaluates deworming programs. Its main countries of operation are India, Kenya, and Nigeria, and it is considering expanding to Pakistan.

Deworm the World retains or hires monitors who visit schools during and following deworming campaigns. We believe its monitoring is the strongest we have seen from any organization working on deworming. Monitors have generally found high coverage rates and good performance on other measures of quality.

As noted above, we believe that Deworm the World overall is the most cost-effective charity we have found. We estimate that it is ~12 times as cost-effective as cash transfers, but note that, due to differences in worm burdens and costs across countries, there is significant variation in cost-effectiveness across the countries in which it works. We estimate that its work to date in India has been more than 30 times as cost-effective as cash transfers, while its planned work in Nigeria is around three times as cost-effective as cash transfers (though this estimate is based on low-quality information).

Important changes in the last 12 months

We estimate that Deworm the World could absorb considerably more funding this year than we estimated last year, due to opportunities it has identified to expand its geographic reach. (More in the next section.)

The quality of the monitoring that we have seen from Deworm the World has remained high. To date, we have seen limited monitoring from Nigeria, which is a new addition to Deworm the World’s portfolio and is expected to become a major portion of its work in the future. This is of minor concern given the strong monitoring track record elsewhere and how new the program is in Nigeria.

Funding gap

We believe that Deworm the World is very likely to be constrained by funding. We expect Deworm the World to have opportunities to spend $18.9 million more than we expect it to receive over the next three years. Funding beyond this level would allow Deworm the World to build its reserves and take advantage of unanticipated opportunities.

With additional funding, Deworm the World would sustain its current work in Kenya and India, and would seek to expand its work in Nigeria and India to additional states and support the government in Pakistan to initiate a deworming program.

GiveDirectly

Our full review of GiveDirectly is here.

Background

GiveDirectly (givedirectly.org) transfers cash to households in developing countries via mobile phone-linked payment services. It targets extremely low-income households. The proportion of total expenses that GiveDirectly has delivered directly to recipients is approximately 82 percent overall. We believe that this approach faces an unusually low burden of proof, and that the available evidence supports the idea that unconditional cash transfers significantly help people.

We believe GiveDirectly to be an exceptionally strong and effective organization, even more so than our other top charities. It has invested heavily in self-evaluation from the start, scaled up quickly, and communicated with us clearly. We believe that GiveDirectly has been effective at delivering cash to low-income households. GiveDirectly has one major randomized controlled trial (RCT) of its impact and took the unusual step of making the details of this study public before data was collected. It continues to experiment heavily, with the aim of improving how its own cash transfer programs are run as well as those of governments. It has recently started work on a universal basic income trial and has started partnering with major funders on evaluations of cash transfers in new geographies with the aim of influencing the broader international aid sector to use its funding more cost-effectively.

We believe cash transfers are less cost-effective than the programs our other top charities work on, but have the most direct and robust case for impact. We use cash transfers as a “baseline” in our cost-effectiveness analyses and only recommend other programs that are robustly more cost-effective than cash.

Important changes in the last 12 months

We had previously expressed reservations about GiveDirectly’s targeting strategy: that by excluding the least poor households in each village, the program might lead to negative reactions by non-recipients, increase costs per household reached, and exclude households that were still quite poor. In 2017, GiveDirectly largely switched to a “saturation” approach of making transfers to all households in selected villages. It will continue to use a targeted approach in Rwanda, where government regulations require such an approach, but the saturation approach will be used in Kenya and Uganda.

In 2016, GiveDirectly built up its operations in Uganda and Kenya with the anticipation of revenue growth in 2017. Revenue growth has been slower than expected and GiveDirectly had to lay off some staff as a result.

GiveDirectly launched its universal basic income project this month.

In 2015, Good Ventures made a grant of $25 million to GiveDirectly on GiveWell’s recommendation. GiveDirectly’s goals for the grant were to expand its ability to raise funds from donors not influenced by GiveWell’s recommendation and to collaborate with large aid institutions or governments to address their questions about cash transfers. We expect to write more about the performance of the grant in the future, but, in short, our impression is that fundraising has progressed slower than expected and collaborative projects have progressed more quickly than expected.

Funding gap

We believe that GiveDirectly is highly likely to be constrained by funding next year. It expects to use additional funding primarily for standard cash transfers and for additional collaborative projects. For collaborative projects, GiveDirectly’s potential partners require it to contribute funding, which the partner matches (at a one-to-one ratio, minimum). These projects would largely be in countries GiveDirectly has not worked in before and many are at an early stage of discussion. We estimate that GiveDirectly could use more than $200 million in additional funding in 2018-2019.

Malaria Consortium (for work on seasonal malaria chemoprevention)

Our full review of Malaria Consortium’s seasonal malaria chemoprevention program is here.

Background

Malaria Consortium (malariaconsortium.org) works on preventing, controlling, and treating malaria and other communicable diseases in Africa and Asia. Our review has focused exclusively on its seasonal malaria chemoprevention (SMC) programs, which distribute preventive anti-malarial drugs to children 3 to 59 months old in order to prevent illness and death from malaria.

There is strong evidence that SMC substantially reduces cases of malaria. The randomized controlled trials on SMC that we considered showed a decrease in cases of clinical malaria but were not adequately statistically powered to find an impact on mortality.

Malaria Consortium and its partners have conducted studies in all of the countries where it has worked to determine whether its programs have reached a large proportion of children targeted. These studies have generally found positive results, though past surveys have been conducted after four rounds of SMC (SMC is given in a maximum of four treatment courses at monthly intervals) and may be subject to error due to inaccurate recall or recordkeeping. Starting in 2017, Malaria Consortium is conducting coverage surveys after each round of SMC, to reduce recall error.

Important changes in the last 12 months

We have increased our confidence in Malaria Consortium’s monitoring, though we have not yet seen all of the research that Malaria Consortium expected to share in 2017 (in particular, tracking of malaria cases and deaths over time in areas where Malaria Consortium works). Coverage survey results from 2016 were generally positive, with a couple of outliers. The change from conducting coverage surveys after four treatment cycles to conducting them after each cycle will increase our confidence in the results.

Last year, we had only a rough estimate of how much additional funding Malaria Consortium could use productively. We have significantly improved our understanding of its room for more funding this year.

Funding gap

We believe that Malaria Consortium could productively use more funding than it expects to receive to scale up its SMC activities. It appears that there is a large remaining global need for additional funding for SMC programs and that Malaria Consortium is well-positioned to fill these gaps, if it has sufficient funding to do so.

Malaria Consortium estimates that it could spend $28-30 million per year on SMC in each of the next three years and that this level of funding would largely fill the global funding gap for SMC, with the exception of Nigeria, where the scale of the gap would be beyond Malaria Consortium’s operational capacity in the short term. It appears to have limited prospects for major funding from other sources. The previous major grant for Malaria Consortium’s work on SMC, from Unitaid, is ending and Malaria Consortium told us that it will not be renewed.

Schistosomiasis Control Initiative (SCI)

Our full review of SCI is here.

Background

SCI (imperial.ac.uk/schisto) works with governments in sub-Saharan Africa to create or scale up deworming programs. SCI’s role has primarily been to identify partner countries, provide funding to governments for government-implemented programs, provide advisory support, and conduct research on the process and outcomes of the programs.

SCI has conducted studies to determine whether its programs have reached a large proportion of children targeted. These studies cover (a non-random sample of) about 40 percent of treatments SCI reports having delivered over the past few years. The studies have generally found moderately positive results, but leave us with some remaining questions about the program’s impact.

As noted above, we believe that SCI is less cost-effective than Deworm the World and more cost-effective than Sightsavers and the END Fund. Given the uncertainty in our cost-effectiveness model, we are hesitant to say that SCI is more cost-effective than AMF and Malaria Consortium, though taken literally, SCI is 1.5 times as cost-effective as AMF and Malaria Consortium (~10x cash transfers vs. ~6-7x cash transfers).

Important changes in the last 12 months

We continued to follow SCI’s progress in 2017 and there have not been many major changes to its work. As in the past, SCI shared monitoring of deworming coverage levels for a portion of its programs with us; there continue to be several SCI-supported countries for which we have not seen monitoring results.

In the past, we have noted that we had low confidence in the accuracy of the financial information that SCI provided and that SCI made significant improvements to its financial systems in 2016; our remaining concerns about SCI’s financial management and reporting are fairly minor.

In 2017, SCI allocated nearly all available funding to programs in its 2017-2018 budget year. This was a large increase in spending over the previous budget year ($9.6 million in 2016-2017 compared with $22.5 million in 2017-2018), driven in large part by a large increase in GiveWell-directed funding ($3.7 million in 2015 compared with $16.6 million in 2016). We believe this decision was due in part to a miscommunication with GiveWell: in a conversation with SCI in early 2017, we recommended that they treat the funds like a multi-year grant because of the risk of large fluctuations in GiveWell-directed funding, but we did not emphasize this point. SCI told us that it plans to allocate future funding over multiple years, noting that its funding allocation decisions in 2016-2017 were due to the desire to avoid allowing drugs to expire as well as a misunderstanding with GiveWell about how the funding was intended to be used.

Funding gap

We estimate that SCI could productively use about $30 million more than it expects to receive to deliver treatments to school-aged children over the next three years. It could use almost three times this amount if it were to follow World Health Organization guidelines, which include treating many adults; we are not recommending funding to treat adults because we haven’t seen sufficient evidence on the impact of treating adults.

The primary use of this funding, and SCI’s top priority, would be to sustain and expand work in current countries of operation. A smaller portion would be used to expand to up to four additional countries.

Sightsavers (for work on deworming)

Our full review of Sightsavers is here.

Background

Sightsavers (sightsavers.org) is a large organization with multiple program areas that focuses on preventing avoidable blindness and supporting people with impaired vision. Our review focuses on Sightsavers’ work to prevent and treat neglected tropical diseases (NTDs) and, more specifically, on its work advocating for, funding, and monitoring deworming programs. Deworming is a fairly new addition to Sightsavers’ portfolio; in 2011, it began delivering some deworming treatments through NTD programs that had been originally set up to treat other infections.

Sightsavers has shared surveys for some of its past NTD programs that measure whether these programs have reached a large proportion of children targeted. These studies have generally found moderately positive results, but leave us with some remaining questions about the program’s impact. We have seen very limited results from Sightsavers’ deworming programs specifically. For GiveWell-supported programs, Sightsavers has told us it will conduct coverage surveys for each round of deworming; we have reviewed one of those surveys to date.

Important changes in the last 12 months

In 2017, as expected, we learned relatively little about the performance of Sightsavers’ deworming programs, because programs funded with GiveWell-directed funds were at early stages. We did not expect to receive any monitoring results from programs funded with GiveWell-directed funds; however, Sightsavers shared a coverage survey from Guinea with us earlier than expected. The survey found middling coverage results.

We significantly improved our understanding of Sightsavers’ cost per treatment and the baseline prevalence in areas where Sightsavers works (which is used in our cost-effectiveness analysis), though we continue to have lower confidence in our estimates than we do for the deworming organizations that we have recommended for several years.

Funding gap

We believe that Sightsavers’ deworming work is likely to be constrained by funding next year. Sightsavers has provided details of deworming programs that it could fund with additional funding of up to about $6.4 million in 2018 and 2019. Sightsavers appears to have limited prospects for funding these programs from other sources. We believe it is likely that Sightsavers could absorb funding beyond this amount to extend programs to 2020 and/or seek out additional opportunities to fund deworming programs.

Of the $6.4 million, $2.8 million would be used to add deworming to existing NTD programs and $3.7 million would be used to fund NTD programs that would treat several NTDs in addition to schistosomiasis and STH. We will request that Sightsavers prioritize the first set of opportunities, because we believe they will likely be more cost-effective.

Standout charities

In addition to our top charities, we recognize standout charities—organizations that support programs that may be extremely cost-effective and are evidence-backed but for which we have less confidence in their impact than we do for our top charities. We have reviewed their work and feel these groups stand out from the vast majority of organizations we have considered in terms of the evidence base for the program they support, their transparency, and their potential cost-effectiveness. These organizations offer additional giving options for donors who feel highly aligned with their work.

We’ve added one organization to the list this year: Evidence Action’s Dispensers for Safe Water.

We don’t follow standout organizations as closely as we do our top charities. We generally have one or two calls per year with representatives from each group and publish notes on our conversations. We provide brief updates on these charities below.

New addition to the standout list:

• Evidence Action’s Dispensers for Safe Water. The Dispensers for Safe Water program provides chlorine dispensers for decontamination of drinking water to prevent diarrhea and associated deaths of young children. We believe that there is strong evidence that chlorination is biochemically effective at inactivating most diarrhea-causing microorganisms, but weaker evidence on the causal relationship between water chlorination programs and reductions in under-5 diarrhea and death. Our rough cost-effectiveness analysis of Dispensers for Safe Water suggests that the program is in a similar range of cost-effectiveness as unconditional cash transfer programs. Our review of Dispensers for Safe Water is here.

Organizations that have conducted randomized controlled trials of their programs:

• Development Media International (DMI). DMI produces radio and television programming in developing countries that encourages people to adopt improved health practices. It conducted a randomized controlled trial (RCT) of its child survival media campaign in Burkina Faso and has been highly transparent, including sharing preliminary results with us. The results of its RCT were mixed, with a household survey not finding an effect on mortality (it was powered to detect a reduction of 15 percent or more) and data from health facilities finding an increase in facility visits. (The results have not yet been published.) We believe there is a possibility that DMI’s work is highly cost-effective, but we see no solid evidence that this is the case. DMI is conducting an RCT of its family planning radio campaign in Burkina Faso and it is planning work on early child development in Burkina Faso and child survival in Mozambique. It is our understanding that DMI will be constrained by funding in the next year. Our full review of DMI is here and notes from our most recent conversation with DMI are here.
• Living Goods. Living Goods recruits, trains, and manages a network of community health promoters who sell health and household goods door-to-door in Uganda and Kenya and provide basic health counseling. They sell products such as treatments for malaria and diarrhea, fortified foods, water filters, bednets, clean cookstoves, and solar lights. Living Goods completed an RCT of its program and measured a 27 percent reduction in child mortality. Our best guess is that Living Goods’ program is less cost-effective than our top charities, with the possible exception of GiveDirectly. It is conducting a second RCT of its program and results are expected in 2020. Living Goods recently expanded the number of family planning products it offers and is interested in expanding to a third country. Living Goods is scaling up its program and could scale up more quickly with additional funding. Our review of Living Goods is here and notes from our most recent conversation with Living Goods are here.

Organizations working on micronutrient fortification:

We believe that food fortification with certain micronutrients can be a highly effective intervention. For each of these organizations, we believe they may be making a significant difference in the reach and/or quality of micronutrient fortification programs but we have not yet been able to establish clear evidence of their impact. The limited analysis we have done suggests that these programs are likely not significantly more cost-effective than our top charities—if they were, we might put more time into this research or recommend a charity based on less evidence.

• Food Fortification Initiative (FFI). FFI works to reduce micronutrient deficiencies (especially folic acid and iron deficiencies) by doing advocacy and providing assistance to countries as they design and implement flour and rice fortification programs. We have not yet completed a full evidence review of iron and folic acid fortification, but our initial research suggests it may be competitively cost-effective with our other priority programs. Because FFI typically provides support alongside a number of other actors and its activities vary widely among countries, it is difficult to assess the impact of its work. FFI’s recent work includes advocating for legislation to mandate that rice imported to West Africa is fortified with vitamins and minerals. Our full review is here and notes from our most recent conversation are here.
• Global Alliance for Improved Nutrition (GAIN) – Universal Salt Iodization (USI) program. GAIN’s USI program supports national salt iodization programs. We have spent the most time attempting to understand GAIN’s impact in Ethiopia. Overall, we would guess that GAIN’s activities played a role in the increase in access to iodized salt in Ethiopia, but we do not yet have confidence about the extent of GAIN’s impact. GAIN has focused its recent USI work on Tanzania, Mozambique, Ethiopia, and Kenya, which it targeted based on relatively low levels of coverage of iodized salt and strong relationships with stakeholders. It is our understanding that GAIN’s USI work will be constrained by funding in the next year. Our review of GAIN is here and notes from our most recent conversation are here.
• Iodine Global Network (IGN). Like GAIN-USI, IGN supports (via advocacy and technical assistance rather than implementation) salt iodization. IGN is small, and GiveWell-directed funding has made up a large part of its funding in recent years. It expects to have data from before and after its recent work in Madagascar, Lebanon, and possibly Israel by the end of 2018; this data may provide additional evidence of IGN’s impact. It is our understanding that IGN will be constrained by funding in the next year. Our review of IGN is here and notes from our most recent conversation are here.
• Project Healthy Children (PHC)/Sanku. PHC/Sanku aims to reduce micronutrient deficiencies by providing assistance to small countries as they design and implement food fortification programs and by enabling fortification among small-scale millers. PHC is scaling up its Sanku project, which equips small millers with a machine that enables them to fortify their flour with micronutrients; we have not done as much formal analysis of Sanku as of PHC’s core work on advocacy and technical assistance to countries to implement fortification. PHC/Sanku expects to be constrained by funding in the future. Our review of PHC/Sanku is here and notes from our most recent conversation are here.

Our research process in 2017

We plan to detail the work we completed this year in a future post as part of our annual review process. A major focus of 2017 was improving our recommendations in future years, in particular through our work on GiveWell Incubation Grants and prioritizing promising programs for further investigation.

Below, we highlight the key research that led to our current charity recommendations. This page describes our overall process.

• Following existing top charities. We followed the progress and plans of each of our 2016 top charities. We had several conversations by phone with each organization, met in person at least once with each top charity (including a three-day visit to Rwanda and the Democratic Republic of the Congo with the END Fund), and reviewed documents they shared with us.
• Identifying new top charities.
• No Lean Season. We had recommended a series of Incubation Grants to No Lean Season beginning in 2014 and have followed its progress since then. This year, due to the scale at which No Lean Season was operating and the track record it had established, we decided that the No Lean Season program was at a stage of development where we could evaluate it as a potential top charity. In addition to extensive communications with No Lean Season staff over the phone and reviewing documents they shared with us, GiveWell staff spent five days visiting the program in Bangladesh.
• Helen Keller International’s vitamin A supplementation program. Earlier this year, Research Analyst Chelsea Tabart began reaching out to organizations that might be a fit for our criteria but with which we had limited or no previous contact. As a result of that process, we reconnected with Helen Keller International (which we first considered as a potential top charity in 2007) and began to consider its vitamin A supplementation program as a potential top charity. In addition to extensive communications with HKI staff over the phone and reviewing documents they shared with us, GiveWell staff spent three days meeting with HKI staff in Guinea and observing a vitamin A supplementation program.
• Completing intervention reports on obstetric fistula surgery and measles vaccination campaigns; completing interim intervention reports on SMS reminders for vaccination, Sayana® Press (an injectable contraceptive), oral rehydration solution, and antiretroviral therapy for HIV/AIDS; and expanding our interim intervention report on seasonal malaria chemoprevention to a full intervention report.
• Staying up to date on the research for malaria nets, cash transfers, and deworming. We did not find major new research on cash transfers, nets, or deworming that affected our recommendation of GiveDirectly, AMF, or the organizations we recommend for their work on deworming. David Roodman published an in-depth review (parts 1 and 2) of the deworming studies that form the primary basis of our views on the impact of deworming (though much of this work was completed in 2016 and informed our top charity recommendations last year).
• Making extensive updates to our cost-effectiveness model and publishing several updates to the model over the course of the year. We instituted a process to track and report publicly on updates to the model to reduce the possibility of errors and make our process more transparent. This year, staff members have also provided substantially more detail in our cost-effectiveness file about why they have chosen particular inputs.

Giving to GiveWell vs. top charities

GiveWell is currently in a stable financial position. We project that our revenue and our expenses will be approximately equal in the future. However, this projection forecasts some growth in the level of operating support we receive.

In the long term, we seek to have a model where donors who find our research useful contribute to the costs of creating it, while holding us accountable to providing high-quality, easy-to-use recommendations. We retain our “excess assets policy” to ensure that if we fundraise for our own operations beyond a certain level, we will grant the excess to our recommended charities.

We cap the amount of operating support we ask Good Ventures to provide to GiveWell at 20 percent, for reasons described here. We thus ask that donors who use GiveWell’s research consider the following:

• If you have supported GiveWell’s operations in the past, we ask that you maintain your support. Having a strong base of consistent support allows us to make valuable hires when opportunities arise and to minimize staff time spent on fundraising for our operating expenses.
• If you have not supported GiveWell’s operations in the past, we ask that you designate 10 percent of your donation to help fund GiveWell’s operations. This can be done by selecting the option to “Add 10% to help fund GiveWell’s operations” on our credit card donation form or letting us know how you would like to designate your funding when giving another way.

We’re happy to answer questions in the comments below. Please also feel free to reach out directly with any questions.

The post Our top charities for giving season 2017 appeared first on The GiveWell Blog.

### Are GiveWell’s top charities the best option for every donor?

Wed, 06/21/2017 - 12:15

We’re sometimes asked whether we think GiveWell’s top charities are the “best,” in some absolute sense of the word, or whether we’d ever advise that a donor give to an opportunity outside of our recommendations. This post aims to clarify how GiveWell thinks about different giving options and their suitability for different types of donors.

We believe that GiveWell’s top charities offer donors an outstanding opportunity to do a lot of good and are the best option for most donors. However, some donors—those with a very high degree of trust in a particular individual or organization to make this decision, donors with lots of time (in excess of 50 hours per year, and likely more) to consider their giving decision, or donors whose values point strongly toward a particular cause outside of the ones GiveWell covers—may find opportunities to have a greater impact per dollar than GiveWell’s top charities. Note that we think these characteristics are likely to be necessary, but not sufficient, for finding these types of opportunities; we still expect good giving to be hard, and spending, for example, 50 hours per year on research isn’t necessarily going to yield better opportunities.

In this post, we describe relevant considerations for donors in greater detail.

Giving to GiveWell’s top charities

GiveWell was founded to serve donors with limited amounts of time to make giving decisions. GiveWell’s co-founders, Elie Hassenfeld and Holden Karnofsky, were in this situation when they started GiveWell as a side project in 2006. They found that determining where to give effectively was a full-time project and quit their jobs to start GiveWell in 2007.

GiveWell’s top charity recommendations serve all donors. We rely on evidence and publicly detail our rationale for each recommendation, so donors can vet our work; a strength of our recommendations is their falsifiability. We believe our top charity recommendations serve one group particularly well: donors who want to give as effectively as possible, have only limited time to determine where to donate, and (prior to GiveWell) had no trusted person or entity to outsource their thinking to. Our criteria and recommendations were designed with this type of donor in mind:

• Our top charities are largely uncontroversial and relatively straightforward ways to do a lot of good—for example, by providing direct aid such as insecticide-treated nets to prevent malaria and cash transfers to very poor households. There is room for debate on the evidence behind these interventions and their cost-effectiveness, but the basic case for them—and the fact that they are likely to do more good than harm—is subject to little debate, so a donor can feel fairly confident in these basics without needing to do their own research.
• GiveWell publishes the full details of our charity analyses so that donors can review and vet our work, and so that donors with very limited time can trust that any major problems would likely be caught by others (with more time).
• Because we lay out the entire case for the charities online, donors can spot-check any particular part of it to get a sense of whether we’re thinking reasonably about the issues that seem most salient to them.
• Our top charities have room for more funding. In other words, we believe additional marginal donations to these organizations enable them to do more good.

Our guess is that most donors that use GiveWell fit this profile (want to give as effectively as possible and have only limited time to determine where to donate, and no other trusted person or entity to outsource their thinking to).

Below, we discuss alternative donor profiles:

(1) Donors with limited time and a high amount of trust in a person or organization to inform their giving decisions

This group of donors has limited time to spend on making a giving decision and has an organization or person (other than GiveWell or GiveWell staff) they personally trust to make or inform this decision. In this case, they may defer to that person or organization’s recommendations.

(2) Donors with lots of time

Donors with a lot of time to spend on giving decisions (50+ hours per year) may be able to find opportunities that GiveWell hasn’t. For example, a donor might know someone who is starting a charity and feel, based on their research, that supporting their project at an early stage might be a particularly leveraged way to do good. A donor with lots of time may also be very familiar with a particular cause and feel highly confident in a particular organization and its need for funding. These donors may want to compare alternative opportunities to GiveWell’s top charities. They may also want to actively vet GiveWell’s recommendations as part of their research process.

Donors with lots of time may also wish to apply a different strategy to their giving. GiveWell largely recommends charities where sufficient evidence exists to make a fairly robust estimate of the expected value of a donation. Donors with much more time to spend (maybe even significantly more than 50 hours per year) thinking about where to give may want to take a “hits-based giving” approach—having a high tolerance for philanthropic risk, so long as the overall expected value is sufficiently high. This is the approach the Open Philanthropy Project, which was incubated at GiveWell, has taken, and we believe doing this well requires a lot of work, as the Open Philanthropy Project discussed in a blog post last year (emphasis original):

Aim for deep understanding of the key issues, literatures, organizations, and people around a cause, either by putting in a great deal of work or by forming a high-trust relationship with someone else who can. If we [the Open Philanthropy Project] support projects that seem exciting and high-impact based on superficial understanding, we’re at high risk of being redundant with other funders. If we support projects that seem superficially exciting and high-impact, but aren’t being supported by others, then we risk being systematically biased toward projects that others have chosen not to support for good reasons. By contrast, we generally aim to support projects based on the excitement of trusted people who are at a world-class level of being well-informed, well-connected, and thoughtful in relevant ways.

Achieving this is challenging. It means finding people who are (or can be) maximally well-informed about issues we’ll never have the time to engage with fully, and finding ways to form high-trust relationships with them. As with many other philanthropists, our basic framework for doing this is to choose focus areas and hire staff around those focus areas. In some cases, rather than hiring someone to specialize in a particular cause, we try to ensure that we have a generalist who puts a great deal of time and thought into an area. Either way, our staff aim to become well-networked and form their own high-trust relationships with the best-informed people in the field.

I [Open Philanthropy Project Executive Director Holden Karnofsky] believe that the payoff of all of this work is the ability to identify ideas that are exciting for reasons that require unusual amounts of thought and knowledge to truly appreciate.

(3) Donors with values that differ from GiveWell staff

Donors who hold different values than the majority of GiveWell staff, or who place more weight on a particular cause outside of the causes covered by GiveWell, may find other giving opportunities to be more attractive for reasons beyond the time/trust framework articulated earlier in this post. For example, individuals who place a very high value on farm animal welfare may wish to give a large proportion of their donation, if not all of their donation, to organizations working in that cause.

We’re happy to speak with you about giving decisions.

If you’re not sure which considerations apply to you, please reach out. We’re always happy to talk through giving decisions.

The post Are GiveWell’s top charities the best option for every donor? appeared first on The GiveWell Blog.

### How thin the reed? Generalizing from “Worms at Work”

Wed, 01/04/2017 - 14:37
Hookworm (AJC1/flickr)

My last post explains why I largely trust the most famous school-based deworming experiment, in particular the report in Worms at Work about its long-term benefits. That post also gives background on the deworming debate, so please read it first. In this post, I’ll talk about the problem of generalization. If deworming in southern Busia County, Kenya, in the late 1990s permanently improved the lives of some children, what does that tell us about the impact of deworming programs today, from sub-Saharan Africa to South Asia? How safely can we generalize from this study?

I’ll take up three specific challenges to its generalizability:

• That a larger evidence base appears to show little short-term benefit from mass deworming—and if it doesn’t help much in the short run, how can it make a big difference in the long run?
• That where mass deworming is done today, typically fewer children need treatment than in the Busia experiment.
• That impact heterogeneity within the Busia sample—the same treatment bringing different results for different children—might undercut expectations of benefits beyond. For example, if examination of the Busia data revealed long-term gains only among children with schistosomiasis, that would devalue treatment for the other three parasites tracked.

In my view, none of these specific challenges knocks Worms at Work off its GiveWell-constructed pedestal. GiveWell’s approach to evaluating mass deworming charities starts with the long-term earnings impacts estimated in Worms at Work. Then it discounts by roughly a factor of ten for lower worm burdens in other places, and by another factor of ten out of more subjective conservatism. As in the previous post, I conclude that the GiveWell approach is reasonable.

But if I parry specific criticisms, I don’t dispel a more general one. Ideally, we wouldn’t be relying on just one study to judge a cause, no matter how compelling the study or how conservative our extrapolation therefrom. Nonprofits and governments are spending tens of millions per year on mass deworming. More research on whether and where the intervention is especially beneficial would cost only a small fraction of all those deworming campaigns, yet potentially multiply their value.

Unfortunately, the benefits that dominate our cost-effectiveness calculations manifest over the long run, as treated children grow up. And long-term research tends to take a long time. So I close by suggesting two strategies that might improve our knowledge more quickly.

Here are Stata files for the quantitative assertions and graphs presented below.

Evidence suggests short-term benefits are modest

Researchers have performed several systematic reviews of the evidence on the impacts of deworming treatment. In my research, I focused on three. Two come from institutions dedicated to producing such surveys, and find that mass deworming brings little benefit, most emphatically in the short run. But the third comes to a more optimistic answer.

The three are:

• The Cochrane review of 2015, which covers 45 trials of the drug albendazole for soil-transmitted worms (geohelminths). It concludes: “Treating children known to have worm infection may have some nutritional benefits for the individual. However, in mass treatment of all children in endemic areas, there is now substantial evidence that this does not improve average nutritional status, haemoglobin, cognition, school performance, or survival.”
• The Campbell review of 2016, which extends to 56 randomized short-term studies, in part by adding trials of praziquantel for water-transmitted schistosomiasis. “Mass deworming for soil-transmitted helminths …had little effect. For schistosomiasis, mass deworming might be effective for weight but is probably ineffective for height, cognition, and attendance.”
• The working paper by Kevin Croke, Eric Hsu, and authors of Worms at Work. The paper looks at impacts only on weight, as an indicator of recent nutrition. (Weight responds more quickly to nutrition than height.) While the paper lacks the elaborate, formal protocols of the Cochrane and Campbell reviews, it adds value in extracting more information from available studies in order to sharpen the impact estimates. It finds: “The average effect on child weight is 0.134 kg.”

Before confronting the contradiction between the first two reviews and the third, I will show you a style of reasoning in all of them. The figure below constitutes part of the Campbell review’s analysis of the impact of mass administration of albendazole (for soil-transmitted worms) on children’s weight (adapted from Figure 6 in the initial version):

Each row distills results from one experiment; the “Total” row at the bottom draws the results together. The first row, for instance, is read as follows. During a randomized trial in Uganda run by Harold Alderman and coauthors, the 14,940 children in the treatment group gained an average 2.413 kilograms while the 13,055 control kids gained 2.259 kg, for a difference in favor of the treatment group of 0.154 kg. For comparability with other studies, which report progress on weight in other ways, the difference is then re-expressed as 0.02 standard deviations, where a standard deviation is computed as a sort of average of the 7.42 and 8.01 kg figures shown for the treatment and control groups. The 95% confidence range surrounding the estimate of 0.02 is written as [–0.00, 0.04] and is in principle graphed as a horizontal black line to the right, but is too short to show up. Because of its large sample, the Alderman study receives more weight (in the statistical sense) than any other in the figure, at 21.6% of the overall number. The relatively large green square in the upper right signifies this influence.
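The conversion of the Alderman row into standard deviations can be retraced in a few lines of Python. This is only a sketch: it assumes the "sort of average" of the two standard deviations is the usual pooled standard deviation, which the Campbell review may or may not compute in exactly this way, and the function and variable names are mine.

```python
import math

def pooled_sd(n1, sd1, n2, sd2):
    """Pooled standard deviation of two groups (an assumed formula, see above)."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))

# Figures quoted in the text for the Alderman trial in Uganda.
n_treat, gain_treat, sd_treat = 14940, 2.413, 7.42
n_ctrl, gain_ctrl, sd_ctrl = 13055, 2.259, 8.01

diff_kg = gain_treat - gain_ctrl                      # difference in mean weight gain
sd = pooled_sd(n_treat, sd_treat, n_ctrl, sd_ctrl)    # "sort of average" of the two SDs
smd = diff_kg / sd                                    # standardized mean difference

print(f"difference = {diff_kg:.3f} kg, pooled SD = {sd:.2f} kg, SMD = {smd:.2f}")
```

Under that assumption, a 0.154 kg difference divided by a pooled standard deviation of about 7.7 kg rounds to the 0.02 shown in the figure.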

In the lower-right of the figure, the bolded numbers and the black diamond present the meta-analytical bottom line: across these 13 trials, mass deworming increased weight by an average 0.05 standard deviations. The aggregate 95% confidence interval stretches from –0.02 to 0.11, just bracketing zero. The final version of the Campbell report expresses the result in physical units: an average gain of 0.09 kg, with a 95% confidence interval stretching from –0.09 kg to +0.28 kg. And so it concludes: “Mass deworming for soil-transmitted helminths with albendazole twice per year compared with controls probably leads to little to no improvement in weight over a period of about 12 months.”

Applying similar methods to a similar pool of studies, the Cochrane review (Analysis 4.1) produces similar numbers: an average weight gain of 0.08 kg, with a 95% confidence interval of –0.11 to 0.27. This it expresses as “For weight, overall there was no evidence of an effect.”

But Croke et al. incorporate more studies, as well as more data from the available studies, and obtain an average weight gain of 0.134 kg (95% confidence interval: 0.03 to 0.24), which they take as evidence of impact.

How do we reconcile the contradiction between Croke et al. and the other two? We don’t, for no reconciliation is needed, as is made obvious by this depiction of the three estimates of the impact of mass treatment for soil-transmitted worms on children’s weight:

Each band depicts one of the confidence intervals I just cited. The varied shading reminds us that within each band, probability is highest near the center. The bands greatly overlap, meaning that the three reviews hardly disagree. Switching from graphs to numerical calculations, I find that the Cochrane results reject the central Croke et al. estimate of 0.134 kg at p = 0.58 (two-tailed Z-test), which is to say, they do not reject with any strength. For Croke et al. vs. Campbell, p = 0.64. So the Croke et al. estimate does not contradict the others; it is merely more precise. The three reviews are best seen as converging to a central impact estimate of about 0.1 kg of weight gain. Certainly 0.1 kg fits the evidence better than 0.0 kg.
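For readers who want to retrace those two p-values, here is a minimal Python sketch. It assumes a normal approximation, backs each review's standard error out of its published 95% confidence interval, and treats the Croke et al. point estimate of 0.134 kg as a fixed value to test against. The function names are illustrative, and the Stata files linked above remain the authoritative version of the calculation.

```python
import math

Z95 = 1.959964  # two-sided 95% critical value of the standard normal

def se_from_ci(lower, upper):
    """Standard error implied by a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * Z95)

def two_tailed_p(estimate, null_value, se):
    """Two-tailed Z-test p-value for H0: the true effect equals null_value."""
    z = abs(estimate - null_value) / se
    return math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))

croke_estimate = 0.134  # kg, central estimate from Croke et al.

# (central estimate, CI lower bound, CI upper bound), all in kg, as cited above
reviews = {
    "Cochrane": (0.08, -0.11, 0.27),
    "Campbell": (0.09, -0.09, 0.28),
}

p_values = {}
for name, (estimate, lower, upper) in reviews.items():
    se = se_from_ci(lower, upper)
    p_values[name] = two_tailed_p(estimate, croke_estimate, se)
    print(f"{name}: SE = {se:.3f} kg, p = {p_values[name]:.2f}")
```

With these inputs the sketch gives p of about 0.58 for Cochrane and 0.64 for Campbell, in line with the figures quoted above.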

If wide confidence intervals in the Cochrane and Campbell reviews are obscuring real impact on weight, perhaps the same is happening with other outcomes, including height, hemoglobin, cognition, and mortality. Discouragingly, when I scan the Cochrane review’s “Summary of findings for the main comparison” and Campbell’s corresponding tables, confidence intervals for outcomes other than weight look more firmly centered on zero. That in turn raises the worry that by looking only at weight, Croke et al. make a selective case on behalf of deworming.[1]

On the other hand, when we shift our attention from trials of mass deworming to trials restricted to children known to be infected—which have more power to detect impacts—it becomes clear that the boost to weight is not a one-off. The Cochrane review estimates that targeting treatment at kids with soil-transmitted worms increased weight by 0.75 kilograms, height by 0.25 centimeters, mid-upper arm circumference by 0.49 centimeters, and triceps skin fold thickness by 1.34 millimeters, all significant at p = 0.05. Treatment did not, however, increase hemoglobin (Cochrane review, “Data and Analyses,” Comparison 1).

In this light, the simplest theory that is compatible with the evidence arrayed so far is that deworming does improve nutrition in infected children while leaving uninfected children unaffected; and that available studies of mass deworming tend to lack the statistical power to detect the diluted benefits of mass deworming, even when combined in a meta-analysis. The compatibility of that theory with the evidence, by the way, exposes a logical fallacy in the Cochrane authors’ conclusion that “there is now substantial evidence” that mass treatment has zero effect on the outcomes of interest. Lack of compelling evidence is not compelling evidence of lack.

Yet the Cochrane authors might be right in spirit. If the benefit of mass deworming is almost too small to detect, it might be almost too small to matter. Return to the case of weight: is ~0.1 kg a lot? Croke et al. contend that it is. They point out that “only between 2 and 16 percent of the population experience moderate to severe intensity infections in the studies in our sample that report this information,” so their central estimate of 0.134 could indicate, say, a tenth of children gaining 1.34 kg (3 pounds). This would cohere with Cochrane’s finding of an average 0.75 kilogram gain in trials that targeted infected children. In a separate line of argument, Croke et al. calculate that even at 0.134, deworming raises children’s weight more cheaply than school feeding programs do.

But neither defense gets at what matters most for GiveWell, which is whether small short-term benefits make big long-term earnings gains implausible. Is 0.134 kg in weight gain compatible with 15% income gain 10 years later reported in Worms at Work?

More so than it may at first appear, once we take account of two discrepancies embedded in that comparison. First, more kids had worms in Busia. I calculate that 27% of children in the Worms sample had moderate or serious infections, going by World Health Organization (WHO) guidelines, which can be viewed conservatively as double the 2–16% Croke et al. cite as average for the kids behind that 0.134 kg number.[2] So in a Worms-like setting, we should expect twice as many children to have benefited, doubling the average weight gain from 0.134 to 0.268 kg. Second, at 13.25 years, the Worms children were far older than most of the children in the studies surveyed by Croke et al. Subjects averaged 9 months of age in the Awasthi 2001 study, 12–18 months in Joseph 2015, 24 months in Ndibazza 2012, 36 months in Willett 1979, and 2–5 years in Sur 2005. 0.268 kg means more for such small people. As Croke et al. point out, an additional 0.268 kg nearly suffices to lift a toddler from the 25th to the 50th percentile for weight gain between months 18 and 24 of life (girls, boys).
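The scaling argument above is simple enough to write down as arithmetic. The sketch below uses only numbers quoted in the text; the one-tenth infected share is the illustrative "say, a tenth" figure rather than a measured value, and the variable names are mine.

```python
KG_TO_LB = 2.20462

avg_gain_kg = 0.134             # Croke et al. average effect across all treated children
assumed_infected_share = 0.10   # illustrative: "say, a tenth of children"

# If only the infected share benefits, the per-child gain among them is larger.
gain_per_infected_kg = avg_gain_kg / assumed_infected_share
gain_per_infected_lb = gain_per_infected_kg * KG_TO_LB

# Busia: 27% moderate or serious infections versus the 2-16% range in the reviewed studies.
busia_share = 0.27
ratio_vs_high_end = busia_share / 0.16
ratio_vs_low_end = busia_share / 0.02

# The text rounds this comparison to a factor of two.
busia_multiplier = 2
busia_avg_gain_kg = avg_gain_kg * busia_multiplier

print(f"gain per infected child: {gain_per_infected_kg:.2f} kg ({gain_per_infected_lb:.1f} lb)")
print(f"Busia / study infection ratio: {ratio_vs_high_end:.1f}x to {ratio_vs_low_end:.1f}x")
print(f"implied average gain in a Busia-like setting: {busia_avg_gain_kg:.3f} kg")
```

The ratio of infection shares runs from roughly 1.7 to 13.5 depending on which end of the 2–16% range is used, so the factor of two sits near the low end of that range.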

In sum, the statistical consensus on short-term impacts on nutritional status does not render implausible the long-term benefits reported out of Busia. The verdict of Garner, Taylor-Robinson, and Sachdev—“no effect for the main biomedical outcomes…, making the broader societal benefits on economic development barely credible”—overreaches.

In many places, fewer kids have worms than in Busia in 1998–99

If we accept the long-term impact estimates from Worms at Work, we can still question whether those results carry over to other settings. This is precisely why GiveWell deflates the earnings impact by two orders of magnitude in estimating the cost-effectiveness of deworming charities. One of those orders of magnitude arises from the fact that school-age children in Busia carried especially heavy parasite loads. Where loads are lighter, mass deworming will probably do less good. (The other order of magnitude reflects a more subjective worry that if Worms at Work were replicated in other places with similar parasite loads, it would fail to show any benefits there, a theme to which I will return at the end.)

GiveWell’s cost-effectiveness spreadsheet does adjust for differences in worm loads between Worms and places where recommended charities support mass deworming today. So I spent some time scrutinizing this discount or, more precisely, the discounts of individual GiveWell staffers. I worried in particular that the ways we measure worm loads could lead my colleagues to overestimate the need for and benefit of mass deworming.

As a starting point, I selected a few data points from one of the metrics GiveWell has gathered, the fraction of kids who test positive for worms. This table shows the prevalence of worm infection, by type, in Busia, 1998–99, before treatment, and in program areas of two GiveWell-recommended charities:

The first row, computed from the public Worms data set, reports that before receiving any treatment from the experiment, 81% of tested children in Busia were positive for hookworm, 51% for roundworm, 62% for whipworm, and 36% for schistosomiasis. 94% tested positive for at least one of those parasites. On average, each child carried 2.3 distinct types of worm. Then, from the GiveWell cost-effectiveness spreadsheet, come corresponding numbers for areas served by programs linked to the Schistosomiasis Control Initiative (SCI) and Deworm the World. Though approximate, the numbers suffice to demonstrate that far fewer children served by these charities have worms than in the Worms experiment. For example, the hookworm rate for Deworm the World is estimated at 24%, which is 30% of the rate of Busia in 1998–99. Facing less need, we should expect these charities’ activities to do less good than is found in Worms at Work.

But that comparison would misrepresent the value of deworming today if the proportion of serious infections is even lower today relative to Busia. To get at the possibility, I made a second table for the other indicator available to GiveWell, which is the intensity of infection, measured in eggs per gram of stool:

Indeed, this comparison widens the apparent gap between Busia of 1998–99 and charities of today. For example, hookworm prevalence in Deworm the World service areas was 30% of the Busia rate (24 vs. 81 out of every 100 kids), while intensity was only 20% (115 vs. 568 eggs/gram).
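The two hookworm ratios quoted above can be reproduced in a few lines of Python. This is only a sketch: the dictionary layout and names are illustrative, and the four numbers are the ones cited in the text for Busia and for Deworm the World service areas:

```python
busia = {"prevalence_pct": 81, "eggs_per_gram": 568}
deworm_the_world = {"prevalence_pct": 24, "eggs_per_gram": 115}


def ratio_to_busia(program, key):
    """Program value as a fraction of the Busia 1998-99 value for the same metric."""
    return program[key] / busia[key]


prevalence_ratio = ratio_to_busia(deworm_the_world, "prevalence_pct")
intensity_ratio = ratio_to_busia(deworm_the_world, "eggs_per_gram")

print(f"prevalence ratio: {prevalence_ratio:.0%}")
print(f"intensity ratio:  {intensity_ratio:.0%}")
```

The gap between the two ratios is the reason the choice of metric matters for the discount.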

After viewing these sorts of numbers, the median GiveWell staffer multiplies the Worms at Work impact estimate by 14%—that is, divides it by seven. In aggregate, I think my coworkers blend the discounts implied by the prevalence and intensity perspectives.[3]

To determine the best discount, we’d need to know precisely what characterized the children within the Worms experiment who most benefited over the long term—perhaps lower weight, or greater infection with a particular parasite species. As I will discuss below, such insight is easier imagined than attained. Then, if we had it, we would need to know the number of children in today’s deworming program areas with similar profiles. Obtaining that data could be a tall order in itself.

To think more systematically about how to discount for differences in worm loads, within the limits of the evidence, I looked to some recent research that models how deworming affects parasite populations. Nathan Lo and Jason Andrews led the work (2015, 2016). With Lo’s help, I estimated how the prevalence of serious infection varies with the two indicators at GiveWell’s fingertips.[4]

For my purposes, the approach introduces two key ideas. First, data gathered from many locales shows how, for each worm type, the average intensity of infection tends to rise as prevalence increases. Not surprisingly, where worm infection is more common, average severity tends to be higher too, and Lo and colleagues estimate how much so. Second is the use of a particular mathematical family of curves to represent the distribution of children by intensity level: how many have no infection, how many have 1–100 eggs/gram, how many are above 100 eggs/gram, etc. (The family, the negative binomial, is an accepted model for the prevalence of infectious diseases.) If we know two things about the pattern of infection, such as the fraction of kids who have it and their average intensity, we can mathematically identify a unique member of this family of distributions. And once a member is chosen, we can estimate the share of children with, for example, hookworm infections exceeding 2,000 eggs/gram, which is the WHO’s suggested minimum for moderate or heavy infection.
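To make the second idea concrete, here is a minimal Python sketch of how two statistics pin down a negative binomial distribution and how a share above a threshold follows from it. It is an illustration under stated assumptions, not the code behind the graphs: it takes prevalence and mean intensity as direct inputs instead of using the fitted prevalence-intensity curves of Lo et al., it treats the quoted Busia hookworm intensity as an average over all children (infected or not), and it counts eggs on a Kato-Katz slide with the standard multiplier of 24. All function names are hypothetical.

```python
import math

SLIDE_MULTIPLIER = 24  # Kato-Katz: eggs counted on a slide x 24 = eggs/gram


def prevalence(mean_count, r):
    """Share with at least one egg: P = 1 - (1 + M/r)^(-r)."""
    return 1.0 - (1.0 + mean_count / r) ** (-r)


def solve_dispersion(target_prevalence, mean_count):
    """Find the dispersion r by bisection; prevalence() rises monotonically with r."""
    if not 0.0 < target_prevalence < 1.0 - math.exp(-mean_count):
        raise ValueError("prevalence not attainable for this mean")
    lo, hi = 1e-6, 1e6
    for _ in range(200):
        mid = math.sqrt(lo * hi)  # bisect on a log scale
        if prevalence(mean_count, mid) < target_prevalence:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def log_pmf(count, mean_count, r):
    """Log probability of a slide count under the negative binomial (mean, r)."""
    return (math.lgamma(count + r) - math.lgamma(r) - math.lgamma(count + 1)
            + r * math.log(r / (r + mean_count))
            + count * math.log(mean_count / (r + mean_count)))


def share_above(threshold_epg, prevalence_any, mean_epg):
    """Share of children whose intensity exceeds threshold_epg eggs/gram."""
    mean_count = mean_epg / SLIDE_MULTIPLIER
    r = solve_dispersion(prevalence_any, mean_count)
    cutoff = math.floor(threshold_epg / SLIDE_MULTIPLIER)
    at_or_below = sum(math.exp(log_pmf(k, mean_count, r)) for k in range(cutoff + 1))
    return 1.0 - at_or_below


# Illustrative inputs: the Busia hookworm figures quoted in this post.
print(share_above(2000, prevalence_any=0.81, mean_epg=568))
```

Bisection suffices here because, for a fixed mean, prevalence rises steadily with the dispersion parameter, so at most one value of r matches an observed prevalence.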

The next two graphs examine how, under these modeling assumptions, the fraction of children with moderate/heavy infections varies in tandem with the two indicators at GiveWell’s disposal, which are prevalence of infection and average infection intensity:

The important thing to notice is that the curves are much curvier in the first graph. There, for example, as the orange hookworm curve descends, it converges to the left edge just below 40%. This suggests that if a community has half as many kids with hookworm as in Busia—40% instead of about 80%—then it could have far less than half as many kids with serious infections—indeed, almost none. But the straighter lines in the second graph mean that a 50% drop in intensity (eggs/gram) corresponds to a 50% drop in the number of children with serious disease.

While we don’t know exactly what defines a serious infection, in the sense of offering hope that treatment could permanently lift children’s trajectories, these simulations imply that it is reasonable for GiveWell to extrapolate from Worms at Work on the basis of intensity (eggs/gram).

Returning to the intensity table above, I find that the Deworm the World egg counts, by worm type, average 16% of those in Busia. For the Schistosomiasis Control Initiative, the average ratio is 7% (and is 6% just for SCI’s namesake disease). These numbers say—as far as this sort of analysis can take us—that GiveWell’s 14% discounts are about right for Deworm the World, and perhaps ought to be halved for SCI. Halving is not as big a change as it may seem; GiveWell has no illusions about the precision of its estimates, and performs them only to gauge the order of magnitude of expected impact.

Impact heterogeneity in the Worms experiment

Having confronted two challenges to the generalizability of Worms at Work (that short-term non-impacts make long-term impacts implausible, and that worm loads are lower in most places today than they were in Busia in 1998–99), I turned to one more. Might there be patterns within the Worms at Work data that would douse hopes for impact beyond? For example, if only children with schistosomiasis experienced those big benefits, that would call into question the value of treating geohelminths (hookworm, roundworm, whipworm).

Returning to the Worms at Work data, I searched for, and perhaps found, signs of heterogeneity in impact. I gained two insights thereby. The first, as it happens, is more evidence that is easier to explain if we assume that the Worms experiment largely worked, the theme of the last post. The second is a keener sense that there is no such thing as “the” impact of an intervention, since it varies by person, time, and place. That heightened my nervousness about extrapolating from a single study. Beyond that general concern, I did not find specific evidence that would cast grave doubt on whole deworming campaigns.

My hunt for heterogeneity went through two phases. In the first, motivated by a particular theory, I brought a narrow set of hypotheses to the data. In the second, I threw about 20 hypotheses at the data and watched what stuck: Did impact vary by sex or age? By proximity to Lake Victoria, where live the snails that carry Schistosoma mansoni?  As statisticians put it, I mined the data. The problem with that is that since I tested about 20 hypotheses, I should expect about one to manifest as statistically significant just by chance (at p = 0.05). So the pattern I unearthed in the second phase should perhaps not be viewed as proof of anything, but as the basis for a hypothesis that, for a proper test, requires fresh data from another setting.
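The data-mining worry can be put in numbers. This is a minimal Python sketch: the expected count of false positives follows directly from the figures above, while the probability of at least one false positive additionally assumes that the 20 tests are independent, which is an assumption made for illustration and not a property of the actual tests:

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of tests significant by chance when no effect is real."""
    return n_tests * alpha


def prob_at_least_one(n_tests, alpha):
    """Chance of one or more false positives, assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


print(expected_false_positives(20, 0.05))      # about 1 test
print(round(prob_at_least_one(20, 0.05), 2))   # roughly a 64% chance
```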

Introducing elevation

My search began this way. In my previous post, I entertained an alternative theory for Owen Ozier’s finding that deworming indirectly benefited babies born right around the time of the original Worms experiment. Maybe, I thought, the 1997–98 El Nino, which brought heavy flooding to Kenya, exacerbated the conditions for the spread of worms, especially at low elevations. And perhaps by chance the treatment schools were situated disproportionately at high elevations, so their kids fared better. This could explain all the results in Worms and its follow-ups. But the second link in that theory proved weak, especially when defining the treatment group as groups 1 and 2 together, as done in Worms at Work. (Group 1 received treatment starting in 1998, group 2 in 1999, and group 3 in 2001, after the experiment ended.) Average elevation was essentially indistinguishable between the Worms at Work treatment and control groups.

Nevertheless, my investigation of the first link in the theory led me to some interesting discoveries. To start, I directly tested the hypothesis that elevation mattered for impact by “interacting” elevation with the treatment indicator in a key Worms at Work regression. In the original regression, deworming is found to increase the logarithm of wage earnings by 0.269, meaning that deworming increased wage earnings by 30.8%. In the modified regression, the impact could vary with elevation in a straight-line way, as shown in this graph of the impact of deworming in childhood on log wage earnings in early adulthood as a function of school elevation:

The grey bands around the central line show confidence intervals rather as in the earlier graph on weight gains. The black dots along the bottom show the distribution of schools by elevation.
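The step from a coefficient on log earnings to a percentage change uses the exponential function. A short Python sketch, with a hypothetical function name:

```python
import math


def log_points_to_percent(coefficient):
    """Convert a change in log earnings into a percentage change in earnings."""
    return (math.exp(coefficient) - 1.0) * 100.0


print(round(log_points_to_percent(0.269), 1))
```

Applied to the rounded coefficient of 0.269, this gives a figure within about a tenth of a percentage point of the 30.8% quoted above. The small gap presumably reflects rounding of the published coefficient, though that is a guess.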

I was struck to find the impact confined to low schools. Yet it could be explained. Low schools are closer to Lake Victoria and the rivers that feed it; and their children therefore were more afflicted by schistosomiasis. In addition, geohelminths (soil-transmitted worms) might have spread more easily in the low, flat lands, especially after El Nino–driven floods. So lower schools may have had higher worm loads.[5]

To fit the data more flexibly, I estimated the relationship semi-parametrically, with locally weighted regressions[6]. This involved analyzing whether among schools around 1140 meters, deworming raised wages; then the same around 1150 meters, and so on. That produced this Lowess-smoothed graph of the impact of deworming on log wage earnings:

This version suggests that the big earnings impact occurred in schools below about 1180 meters, and possibly among schools at around 1250. (For legibility, I truncated the fit at 1270 meters, beyond which the confidence intervals explode for lack of much data.)
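The idea of re-estimating the impact around 1140 meters, then 1150, and so on can be sketched in Python. This is a deliberately stripped-down illustration, not the regression behind the graph: it uses tricube weights of the kind found in lowess smoothing, a fixed half-width in meters instead of a share of the elevation span, no control variables, and made-up school data. With treatment as the only regressor, the weighted regression coefficient equals the difference in weighted means computed here.

```python
def tricube(u):
    """Tricube kernel: full weight at the center, zero at or beyond the window edge."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def local_effect(schools, center, half_width):
    """Weighted treatment-control difference in outcomes near a given elevation.

    schools: iterable of (elevation_m, treated, outcome), with treated equal to 0 or 1.
    Returns None if either arm has no school inside the window.
    """
    totals = {0: [0.0, 0.0], 1: [0.0, 0.0]}  # arm -> [weighted outcome sum, weight sum]
    for elevation, treated, outcome in schools:
        w = tricube((elevation - center) / half_width)
        totals[treated][0] += w * outcome
        totals[treated][1] += w
    if totals[0][1] == 0.0 or totals[1][1] == 0.0:
        return None
    return totals[1][0] / totals[1][1] - totals[0][0] / totals[0][1]


# Hypothetical data: schools every 5 m, alternating arms, and a true
# effect of 0.5 log points that exists only below 1200 m.
schools = [(elev, i % 2, 0.5 * (i % 2) if elev < 1200 else 0.0)
           for i, elev in enumerate(range(1100, 1301, 5))]

for center in (1130, 1270):
    print(center, local_effect(schools, center, half_width=30))
```

In this toy setup the estimate is about 0.5 near 1130 meters and about zero near 1270 meters, which is the kind of contrast the smoothed graph is designed to reveal.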

Motivated by the theory that elevation mattered for impact because of differences in pre-experiment infection rates, I then graphed how those rates varied with elevation, among the subset of schools with the needed data.[7] Miguel and Kremer measure worm burdens in three ways: prevalence of any infection, prevalence of moderate or heavy infection, and intensity (eggs/gram). So I did as well. First, this graph shows infection prevalence versus school elevation, again in a locally smoothed way:

Like the first table in this post, this graph shows that hookworms lived in nearly all the children, while roundworm and whipworm were each in about half. Not evident before is that schistosomiasis was common at low elevations, but faded higher up. Roundworm and whipworm also appear to fall as one scans from left to right, but then rebound around 1260 meters.

The next graph is the same except that it only counts infections that are moderate or heavy according to WHO definitions[8]:

Interestingly, restricting to serious cases enhances the similarity between the infection curves, just above, and the earlier semi-parametric graph of earnings impact versus elevation. The “Total” curve starts high, declines until 1200 meters or so, then peaks again around 1260. Last, I graphed Miguel and Kremer’s third measure of worm burden, intensity, against elevation. Those images resemble the graph above, and I relegate them to a footnote for concision.[9]

These elevation-stratified plots teach three lessons. First, the similarity between the prevalence contours and the earnings impact contour shown earlier (high at the low elevations and then again around 1260 meters) constitutes circumstantial evidence for a sensible theory: children with the greatest worm burdens benefited most from treatment. Second, measuring worm load to reflect intensity, that is, moving to the graph just above from the one before, strengthens this resemblance and reinforces the notion of extrapolating from Worms at Work on the basis of intensity (average eggs/gram, not how many kids have any infection).

Finally, these patterns buttress the conclusion of my last post, that the Worms experiment mostly worked. If we grant that deworming probably boosted long-term earnings of children in Busia, then it becomes unsurprising that it did so more where children had more worms. But if we doubt the Worms experiments, then these results become more coincidental. For example, if we hypothesize that flawed randomization put schools whose children were destined to earn more in adulthood disproportionately in the treatment group, then we need another story to explain why this asymmetry only occurred among the schools with the heaviest worm loads. And all else equal, per Occam’s razor, more-complicated theories are less credible.

As I say, the evidence is circumstantial: two quantities of primary interest—initial worm burden and subsequent impact—relate to elevation in about the same way. Unfortunately, it is almost impossible to directly assess the relationship between those two quantities, to ask whether impact covaried with need. The Worms team did not test kids until their schools were about to receive deworming treatment “since it was not considered ethical to collect detailed health information from pupils who were not scheduled to receive medical treatment in that year.” My infection graphs are based on data collected at treatment-group schools only, just before they began receiving deworming in 1998 or 1999. Absent test results for control-group kids, I can’t run the needed comparison.

Contemplating the exploration to this point, I came to appreciate that while elevation might not directly matter for the impacts of deworming, introducing it exposed the grain of the data, like a saw through a log. It gave me insight into a relationship that I could not access directly, between initial worm load and subsequent benefit.

Mining in space

After I confronted the impossibility of directly testing whether initial worm burden influenced impact, I thought of one more angle from which to attack the question, if obliquely. This led me, unplanned, to explore the data spatially.

As we saw, nearly all children had geohelminths. So all schools were put on albendazole, whether during the experiment (for treatment groups) or not until after (control group). In addition, the pervasiveness of schistosomiasis in some areas called for a second drug, praziquantel. I sought to check whether the experiment raised earnings more for children in praziquantel areas. Such a finding could be read to say that schistosomiasis is an especially damaging parasite, making treatment for it especially valuable. Or, since the low-elevation schistosomiasis schools tended to have the highest overall worm burdens, it could be taken as a sign that higher parasite loads in general lead to higher benefit from deworming.

Performing the check first required some educated guesswork. The Worms data set documents which of the 50 schools in the treatment groups needed and received praziquantel, but not which of the 25 control group schools would have needed it in 1998–99. To fill in these blanks, I mapped the schools by treatment group and praziquantel status. Group 1 schools, treated starting in 1998, are green. Group 2 schools, treated starting in 1999, are yellow. And group 3 (schools not treated till 2001) are red. The white 0’s and 1’s next to the group 1 and 2 markers show which were deemed to need praziquantel, with 1 indicating need:

Most of the 1’s appear in the southern delta and along the shore of Lake Victoria. By eyeballing the map, I could largely determine which group 3 schools also needed praziquantel. For example, those in the delta to the extreme southwest probably needed it since all their neighbors did. I was least certain about the pair to the southeast, which lived in a mixed neighborhood, as it were; I arbitrarily marked one for praziquantel and one not.[10]

Returning to the Worms at Work wage earnings regression and interacting treatment with this new dummy for praziquantel need revealed no difference in impact between schools where only albendazole was deemed needed and given, and schools where both drugs were needed and given:

Evidently, treatment for geohelminths and schistosomiasis, where both were needed, did not help future earnings much more or less than treatment for geohelminths, where only that was warranted. So the comparison generates no strong distinction between the worm types.

After I mapped the schools, it hit me: I could make two-dimensional versions of my earlier graphs, slicing the data not by elevation, but by longitude and latitude.

To start, I fed the elevations of the 75 schools, marked below with white dots, into my statistics software, Stata, and had it estimate the topography that best fit. This produced a depiction of the contours of the land in southern Busia County, with the brightest reds indicating the highest areas:

I next graphed the impact of deworming on log wage earnings. Where before I ran the Worms at Work wage earnings regression centering on 1140 meters, then 1150, etc., now I ran the regression repeatedly across a grid, each time giving the most weight to the nearest schools[11]:

Two valleys of low impact dimly emerge, one toward the Lake in the south, one in the north where schools are higher up. Possibly these two troughs are linked to the undulations in my earlier, elevation-stratified graphs.

Next, I made graphs like these for all 21 baseline variables that Worms checks for balance, such as the fraction of students who are girls and average age. All the graphs are here. Now I wonder if this was a mistake. None of the graphs fit the one above like a key in a lock, so I found myself staring at blobs and wondering which vaguely resembled the pattern I sought. I had no formal, pre-specified measure of fit, which increased uncertainty and discretion. Perhaps it was just a self-administered Rorschach test. Yet the data mining had the power to dilute any p values from subsequent formal tests.

In the end, one variable caught my eye when mapped, and then appeared to be an important mediator of impact when entered into the wage earnings regression. It is: a child’s initial weight-for-age Z-score (WAZ), which measures a child’s weight relative to his or her age peers.[12] Here is the WAZ spatial plot. Compare it to the one just above. To my eye, where WAZ was high, subsequent impact was generally lower:

(Since most children in this sample fell below the reference median, their weight-for-age Z-scores were negative, so here average WAZ ranges between –1.3 and about –1.5.)

Going back to two dimensions, this graph more directly checks the relationship I glimpsed above, by showing how the impact of deworming on wage earnings varied with children’s pre-treatment weight-for-age Z-score:

It appears that only children below –2, which is the standard definition of “underweight,” benefited enough from deworming treatment that it permanently lifted their developmental trajectories.

If the pattern is real, two dynamics could explain it. Children who were light for their age may have been so precisely because they carried more parasites, and were in deep need of treatment. Or perhaps other health problems made them small, which also rendered them less resilient to infection, and again more needful of treatment. The lack of baseline infection data for the control group prevents me from distinguishing between these theories.

Struck by this suggestion that low initial weight predicted impact, and mindful of the meta-analytic consensus that deworming affects weight, I doubled back to the original Worms study to ask a final question. Were any short-term weight gains in Busia concentrated among kids who started out the most underweight? This could link short-term impacts on weight with long-term impacts on earnings, making both more credible. I made this graph of the one-year impact of deworming treatment on weight-for-age Z-score versus weight-for-age Z-score before treatment (1998)[13]:

The graph seems to support my hypothesis. Severely underweight children (at or below –3) improve by about 0.2 points in Z-score. Underweight children (at or below –2) gain perhaps 0.1 on average.

But there is a puzzling twist. While treatment raised weight among the most severely underweight children, it apparently reduced the weight of the heaviest children. (Bear in mind that in registering just above 0, the highest-WAZ children in Busia were merely surpassing 50th percentile in the global reference population.) Conceivably, certain worm infections cause weight gain, which is reversed by treatment; but here I am speculating. Statisticians might wonder if this graph reveals regression toward the mean. Just as the temperature must rise after the coldest day of the year and fall after the hottest, we could expect that the children who started the experiment the most underweight would become less so, and vice versa. But since the graph compares treatment and control schools, regression toward the mean only works as a theory if it occurred more in the treatment group. That would require a failure of randomization. The previous post argued that the imperfections in the Worms randomization were probably not driving the main results; but possibly they are playing a larger role in these second-order findings about heterogeneity of impact.
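Regression toward the mean is easy to simulate. The Python sketch below uses made-up parameters, not the Worms data: each child has a stable "true" weight-for-age Z-score that is measured twice with independent noise, and nobody is treated. Children who start in the lowest tenth appear to gain, and those in the highest tenth appear to lose.

```python
import random


def decile_changes(n=20000, seed=0):
    """Mean follow-up change for the lowest and highest baseline deciles, with no treatment."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        true_z = rng.gauss(-1.4, 0.8)        # hypothetical stable Z-score
        before = true_z + rng.gauss(0, 0.5)  # noisy baseline measurement
        after = true_z + rng.gauss(0, 0.5)   # noisy follow-up measurement
        pairs.append((before, after))
    pairs.sort()  # order children by baseline measurement
    k = n // 10

    def mean_change(rows):
        return sum(after - before for before, after in rows) / len(rows)

    return mean_change(pairs[:k]), mean_change(pairs[-k:])


lowest, highest = decile_changes()
print(f"lowest decile change:  {lowest:+.2f}")
print(f"highest decile change: {highest:+.2f}")
```

Because the same drift would arise in treatment and control schools alike, it cancels in a comparison between them unless the two groups differed at baseline, which is the point made above about randomization.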

Because of these doubts, and because I checked many hypotheses before gravitating to weight-for-age as a mediator of impact, I am not confident that physical health was a good predictor of the long-run impact of deworming on earnings. I view the implications of the last two graphs (that deworming increased weight in the short run and earnings in the long run only among the worst-off children) merely as intriguing. As an indicator of heavy worm burden or poor general health, low weight may have predicted impact. That hypothesis ought to be probed afresh in other data, this time with pre-registered transparency. The results from such replication could then sharpen our understanding of how to generalize from Worms at Work.

But I emphasize that I am more confident in my earlier findings revolving around elevation, because they came out of a small and theoretically motivated set of hypotheses. At elevations where worms were more prevalent, deworming did more long-term good.

Conclusions

I glean these facts:

• Treatment of children known to carry worms improves their nutritional status, as measured by weight and height.
• Typically, a minority of children in today’s deworming settings are infected, so impacts from mass deworming are smaller and harder to detect.
• In meta-analyses, 95% confidence intervals for the impacts of mass deworming tend to contain zero.
• In the case of weight—which is among the best-studied outcomes and more likely to respond to treatment in the short run—Croke et al. improve the precision of meta-analysis. Their results are compatible with others’ estimates, yet make it appear unlikely that average short-term impact of mass deworming is zero or negative.
• Though the consensus estimate of about 0.1 kg for weight gain looks small, once one accounts for the youth and low infection rates of the children behind the number, it does not sit implausibly with the big long-term earnings benefit found in Worms at Work.
• Extrapolating the Worms at Work results to other settings in proportion to infection intensity (eggs/gram) looks reasonable. This will adjust for the likelihood that as prevalence of infection falls, prevalence of serious infection falls faster. Extrapolating this way might leave GiveWell’s cost-effectiveness rating for Deworm the World unchanged while halving that for the Schistosomiasis Control Initiative (which is not a lot in calculations that already contain large margins of error).
• Within Busia, 1998–99, evidence suggests that the benefits of deworming were confined to the worst-off children, who were, for example, more numerous at elevations with the most worm infections.
• To speak to the theme of the previous post, this hint of heterogeneity is harder to explain if we believe randomization failure caused the Worms at Work results.
• I did not find heterogeneity that could radically alter our appraisal of charities, such as signs that only treatment of schistosomiasis had long-term benefits.

This recitation of facts makes GiveWell’s estimate of the expected value of deworming charities look reasonable.

Yet, it is also unsatisfying. It is entirely possible that today’s deworming programs do much less, or much more, good than implied by the most thoughtful extrapolation from Worms at Work. Worms, humans, institutions, and settings are diverse, so impacts probably are too. And given the stakes in wealth and health, we ideally would not be in the position of relying so much on one study, which could be flawed or unrepresentative, my defenses notwithstanding. Only more research can make us more sure. If donors and governments are willing to spend nine-figure sums on deworming, they ought to devote a small percentage of that flow to research that could inform how best to spend that money.

Unfortunately, research on long-term impacts can take a long time. In the hope of bringing relevant knowledge to light faster, here are two suggestions. All reasonable effort should be made to:

• Gather and revisit underlying data (“microdata”) from existing high-quality trials, so that certain potential mediators of impact, such as initial worm load and weight, can be studied. This information could influence how we extrapolate from the studies we have to the contexts where mass deworming may be undertaken today. As a general matter, it cannot be optimal that only the original authors can test hypotheses against their data, as is so often the case. In practice, different authors test different outcomes measured different ways, reducing comparability across studies and eroding the statistical power of meta-analysis. Opportunities for learning left unexploited are a waste potentially measured in the health of children.
• Turn past short-term studies into long-term ones by tracking down the subjects and resurveying them.[14] This is easier said than done, but that does not mean a priori that it would be a waste to push against this margin.

Addition, January 9, 2017: One other short-term source of long-term evidence is the impending analysis of the 2011–14 follow-up on the Worms experiment, mentioned in the previous post. If the analysis of the impacts on earnings—which GiveWell has not yet seen—reveals impacts substantially different from those found in the previous round, this could greatly affect GiveWell’s valuations of deworming charities.

Notes

[1] Croke et al. do motivate their focus on weight in a footnote. Only three outcomes are covered by more than three studies in the Cochrane review’s meta-analyses: weight, height, and hemoglobin. Height responds less to recent health changes than weight, so analysis of impacts on height should have lower power. Hemoglobin destruction occurs most with hookworm, yet only one of the hemoglobin studies in the Cochrane review took place in a setting with significant hookworm prevalence.

[2] I thank Kevin Croke for pointing out the need for this adjustment.

[3] Columns S–W of the Parameters tab suggest several choices based on prevalence, intensity, or a mix. Columns Y–AC provide explanations. GiveWell staff may then pick from suggested values or introduce their own.

[4] Lo et al. 2016 fit quadratic curves for the relationship between average infection intensity among the infected (in eggs/gram) and prevalence of any infection. The coefficients are in Table A2. If we then assume that the distribution of infection intensity is in the (two-parameter) negative binomial family, fixing two statistics (prevalence, and average intensity as implied by its quadratic relationship with prevalence) suffices to determine the distribution. We can then compute the number of people whose infection intensity exceeds a given standard. In the usual conceptual framework of the negative binomial distribution, each egg per gram is considered a “success.” A fact about the negative binomial distribution that helps us determine the parameters is P = 1–(1 + M/r)^(–r), where M is average eggs/gram for the entire population, including the uninfected; r is the dispersion parameter, i.e., the number of failures before the trials stop; and P is prevalence of any infection, i.e., the probability of at least one success before the requisite number of failures. One conceptual problem in this approach is that intensity in eggs/gram is not a natural count variable despite being modeled as such. Changing the unit of mass in the denominator, such as to 100 mg, will somewhat change the simulation results. In the graphs presented here, I work with 1000/24 = 41.67 milligrams as the denominator since that is a typical mass on the slide of a Kato-Katz test and 24 is thus a standard multiplier when performing the test.

[5] I also experimented with higher-order polynomials in elevation. This hardly changed the results.

[6] I rerun the Worms at Work regression repeatedly while introducing weights centered around elevations of 1140, 1150, … meters. Following the default in Stata’s lowess command, the kernel is Cleveland’s tricube. The bandwidth is 50% of the sample elevation span.

[7] The Worms research team tested random subsets of children at treatment schools just before they were treated, meaning that pre-treatment infection data are available for a third of schools (group 1) for early 1998 and another third (group 2) for early 1999. To maximize statistical power, I merge these pre-treatment samples. Ecological conditions changed between those two collection times, as the El Nino passed, which may well have affected worm loads. But pooling them should not cause bias if schools are reasonably well mixed in elevation, as they appear to be. Averages adjust for the stratification in the sampling of students for testing: 15 students were chosen for each school and grade.

[8] Miguel and Kremer modify the World Health Organization’s suggested standards for moderate infection, stated with reference to eggs per gram of stool. To minimize my discretion, I follow the WHO standards exactly.

[9] There are separate graphs for hookworm, roundworm, whipworm, and schistosomiasis. Here, the shades of grey do not signify levels of confidence about the true average value. Rather, they indicate the 10th, 20th, …, 90th percentiles in eggs per gram, while the black lines show medians (50th percentiles).

[10] Among the group 3 schools, I marked those with school identifiers 108, 218, 205, 202, 189, 167, 212, and 211 as warranting praziquantel.

[11] The spatially smoothed impact regressions, and the spatially smoothed averages of baseline variables graphed next, are plotted using the same bandwidth and kernel as before, except that now distance is measured in degrees, in two dimensions. Since Busia is very close to the equator, latitude and longitude degrees correspond to the same distances. Locally weighted averages are computed at a 21×21 grid of points within the latitude and longitude spans of the schools. Points more than .05 degrees from all schools are excluded. Stata’s thin-plate-spline interpolation then fills in the contours.
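
The grid construction in this note can be sketched as follows. This is an illustration only: the school coordinates are invented, distance is plain Euclidean distance in degrees as the note describes, and the spline interpolation step is left out.

```python
import math

def linspace(lo, hi, n):
    """n evenly spaced points from lo to hi inclusive."""
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]

def grid_near_schools(schools, n=21, max_dist=0.05):
    """Build an n-by-n grid over the schools' latitude and longitude spans,
    then keep only points within max_dist degrees of at least one school."""
    lats = [lat for lat, _ in schools]
    lons = [lon for _, lon in schools]
    grid = [(lat, lon)
            for lat in linspace(min(lats), max(lats), n)
            for lon in linspace(min(lons), max(lons), n)]
    kept = [(lat, lon) for lat, lon in grid
            if any(math.hypot(lat - s_lat, lon - s_lon) <= max_dist
                   for s_lat, s_lon in schools)]
    return grid, kept

# Hypothetical school locations (latitude, longitude) in degrees.
schools = [(0.0, 34.0), (0.2, 34.2)]
grid, kept = grid_near_schools(schools)
print(len(grid), len(kept))
```

A locally weighted average would then be computed at each kept point.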

[12] Weight-for-age z scores are expressed relative to the median of a reference distribution, which I believe comes from samples of American children from about 50 years ago. The WHO and CDC provide reference tables.

[13] The regressions behind the following two graphs incorporate all controls from the Baird et al. log wage earnings regression that are meaningful in this shorter-term context: all interactions of sex and standard (grade) dummies, zone dummies, and initial pupil population.

[14] This idea is inspired by a paper by Kevin Croke, although that paper links a short-term deworming study to long-term outcomes at the parish level, not the individual level.

The post How thin the reed? Generalizing from “Worms at Work” appeared first on The GiveWell Blog.

### Why I mostly believe in Worms

Tue, 12/06/2016 - 09:20

The following statements are true:

• “GiveWell is a nonprofit dedicated to finding outstanding giving opportunities through in-depth analysis. Thousands of hours of research have gone into finding our top-rated charities.”
• GiveWell recommends four deworming charities as having outstanding expected value. Why? Hundreds of millions of kids harbor parasitic worms in their guts[1]. Treatment is safe, effective, and cheap, so much so that where the worms are common, the World Health Organization recommends administering pills once or twice a year to all children without incurring the cost of determining who is infected.
• Two respected organizations, Cochrane and the Campbell Collaboration, have systematically reviewed the relevant studies and found little reliable evidence that mass deworming does good.

That list reads like a logic puzzle. GiveWell relies on evidence. GiveWell recommends mass-deworming charities. The evidence says mass deworming doesn’t work. How is that possible? Most studies of mass deworming track impact over a few years. The handful that look longer term find big benefits, including one in Kenya that reports higher earnings in adulthood. So great is that benefit that even when GiveWell discounts it by some 99% out of doubts about generalizability, deworming charities look like promising bets.

Still, as my colleagues have written, the evidence on deworming is complicated and ambiguous. And GiveWell takes seriously the questions raised by the Cochrane and Campbell evidence reviews. Maybe the best discount is not 99% but 100%. That would make all the difference for our assessment. This is why, starting in October, I delved into deworming. In this post and the next, I will share what I learned.

In brief, my confidence rose in that Kenya study’s finding of higher earnings in adulthood. I will explain why below. My confidence fell in the generalizability of that finding to other settings, as discussed in the next post.

As with all the recommendations we make, our calculations may be wrong. But I believe they are reasonable and quite possibly conservative. And notice that they do not imply that the odds are 1 in 100 that deworming does great good everywhere and 99 in 100 that it does no good anywhere. It can instead imply that kids receiving mass deworming today need it less than those in the Kenya study, because today’s children have fewer worms or because they are healthy enough in other respects to thrive despite the worms.

Unsurprisingly, I do not know whether 99% overshoots or undershoots. I wish we had more research on the long-term impacts of deworming in other settings, so that we could generalize with more nuance and confidence.

In this post, I will first orient you with some conceptual and historical background. Then I’ll think through two concerns about the evidence base we’re standing on: that the long-term studies lack design features that would add credibility; and that the key experiment in Kenya was not randomized, as that term is generally understood.

Background

Conclusions vs. decisions

There’s a deeper explanation for the paradox that opens this post. Back in 1955, the great statistician John Tukey gave an after-dinner talk called “Conclusions vs Decisions,” in which he meditated on the distinction between judging what is true—or might be true with some probability—and deciding what to do with such information. Modern gurus of evidence synthesis retain that distinction. The Cochrane Handbook, which guides the Cochrane and Campbell deworming reviews, is emphatic: “Authors of Cochrane reviews should not make recommendations.” Indeed, researchers arguing today over the impact of mass deworming are mostly arguing about conclusions. Does treatment for worms help? How much and under what circumstances? How confident are we in our answers? We at GiveWell—and you, if you’re considering our charity recommendations—have to make decisions.

The guidelines for the GRADE system for rating the quality of studies nicely illustrate how reaching conclusions, as hard and complicated as it is, still leaves you several logical steps short of choosing action. Under the heading, “A particular quality of evidence does not necessarily imply a particular strength of recommendation,” we read:

For instance, consider the decision to administer aspirin or acetaminophen to children with chicken pox. Observational studies have observed an association between aspirin administration and Reye’s syndrome. Because aspirin and acetaminophen are similar in their analgesic and antipyretic effects, the low-quality evidence regarding the potential harms of aspirin does not preclude a strong recommendation for acetaminophen.

Similarly, high-quality evidence does not necessarily imply strong recommendations. For example, faced with a first deep venous thrombosis (DVT) with no obvious provoking factor, patients must, after the first months of anticoagulation, decide whether to continue taking warfarin long term. High-quality randomized controlled trials show that continuous warfarin will decrease the risk of recurrent thrombosis but at the cost of increased risk of bleeding and inconvenience. Because patients with varying values and preferences are likely to make different choices, guideline panels addressing whether patients should continue or terminate warfarin may, despite the high-quality evidence, offer a weak recommendation.

I think some of the recent deworming debate has nearly equated the empirical question of whether mass deworming “works” with the practical question of whether it should be done. More than many participants in the conversation, GiveWell has seriously analyzed the logical terrain between the two questions, with an explicit decision framework that allows and forces us to estimate a dozen relevant parameters. We have found the decision process no more straightforward than the research process appears to be. You can argue with how GiveWell has made its calls (and we hope you will, with specificity), and such argument will probably further expose the trickiness of going from conclusion to decision.

The rest of this post is about the “conclusions” side of the Tukey dichotomy. But having spent time with our spreadsheet helped me approach the research with a more discerning eye, for example, by sensitizing me to the crucial question of how to generalize from the few studies we have.

The research on the long-term impacts of deworming

Two studies form the spine of GiveWell’s support for deworming. Ted Miguel and Michael Kremer’s seminal Worms paper reported that after school-based mass deworming in southern Busia county, Kenya, in the late 1990s, kids came to school more. And there were “spillovers”: even kids at the treated schools who didn’t take the pills saw gains, as did kids at nearby schools that didn’t get deworming. However, children did not do better on standardized tests. In all treatment schools, children were given albendazole for soil-transmitted worms—hookworm, roundworm, whipworm. In addition, where warranted, treatment schools received praziquantel for schistosomiasis, which is transmitted through contact with water and was common near Lake Victoria and the rivers that feed it.

Worms at Work, the sequel written with Sarah Baird and Joan Hamory Hicks, tracked down the (former) kids 10 years later. It found that the average 2.4 years of extra deworming given to the treatment group led to 15% higher non-agricultural earnings[2], while hours devoted to farm work did not change. The earnings gain appeared concentrated in wages (as distinct from self-employment income), which rose 31%.[3] That’s a huge benefit for a few dollars of deworming, especially if it accrued for years. It is what drives GiveWell’s recommendations of deworming charities.

Four more studies track impacts of mass deworming over the long run:

• In 2009–10, Owen Ozier surveyed children in Busia who were too young to have participated in the Kenya experiment, not being in school yet, but who might have benefited through the deworming of their school-age siblings and neighbors. (If your big sister and her friends don’t have worms, you’re less likely to get them too.) Ozier found that kids born right around the time of the experiment scored higher on cognitive tests years later.
• The Worms team confidentially shared initial results from the latest follow-up on the original experiment, based on surveys fielded in 2011–14. Many of those former schoolchildren now have children of their own. The results shared are limited and preliminary, and I advised my colleagues to wait before updating their views based on this research.
• Kevin Croke followed up on a deworming experiment that took place across the border in Uganda in 2000–03. (GiveWell summary here.) Dispensing albendazole (for soil-transmitted worms) boosted children’s scores on basic tests of numeracy and literacy administered years later, in 2010 and 2011. I am exploring and discussing the findings with Kevin Croke, and don’t have anything to report yet.
• In a remarkable act of historical scholarship, Hoyt Bleakley tracked the impacts of the hookworm eradication campaign initiated by the Rockefeller Foundation in the American South a century ago. Though not based on a randomized experiment, his analysis indicates that children who benefited from the campaign went on to earn more in adulthood.

These studies have increased GiveWell’s confidence in generalizing from Worms at Work—but perhaps only a little. Two of the four follow up on the original Worms experiment, so they do not constitute fully independent checks. One other is not experimental. For now, the case for mass deworming largely stands or falls with the Worms and Worms at Work studies. So I will focus on them.

Worm Wars

A few years ago, the International Initiative for Impact Evaluation (3ie) funded British epidemiologists Alexander Aiken and Calum Davey to replicate Worms. (I served on 3ie’s board around this time.) With coauthors, the researchers first exactly replicated the study using the original data and computer code. Then they analyzed the data afresh with their preferred methods. The deeply critical write-ups appeared in the International Journal of Epidemiology in the summer of 2015. The next day, Cochrane (which our Open Philanthropy Project has funded) updated its review of the deworming literature, finding “quite substantial evidence that deworming programmes do not show benefit.” And so, on the dreary plains of academia, did the great worm wars begin.

Much of that blogospheric explosion of debate[4] is secondary for GiveWell, because it is about the reported bump-up in school attendance after deworming. That matters less to us than the long-term impact on earnings. Getting kids to school is only a means to other ends, at best. Similarly, much debate centers on those spillovers: all sides agree that the original Worms paper overestimated their geographic reach. But that is not so important when assessing charities that aim to deworm all school-age children in a region rather than a subset as in the experiment.

I think GiveWell should focus on these three criticisms aired in the debate:

• The Worms experiment and the long-term follow-ups lack certain design features that are common in epidemiology, with good reason, yet are rare in economics. For example, the kids in the study were not “blinded” through use of placebos to whether they were in a treatment or control group. Maybe they behaved differently merely because they knew they were being treated and observed.
• The Worms experiment wasn’t randomized, as that term is usually meant.
• Against the handful of promising (if imperfect) long-term studies are several dozen short-term studies, which in aggregate find little or no benefit for outcomes such as survival, height, weight, hemoglobin, cognition, and school performance. The surer we are that the short-term impacts are small, the harder it is to believe that the long-term impacts are big.

I will discuss the first two criticisms in this post and the third in the next.

“High risk of bias”: Addressing the critique from epidemiology

Perhaps the most alarming charge against Worms and its brethren has been that they are at “high risk of bias” (Cochrane, Campbell, Aiken et al., Davey et al.). This phrase comes out of a method in epidemiology for assessing the reliability of studies. It is worth understanding exactly what it means.

Within development economics, Worms is seminal because when it circulated in draft in 1999, it launched the field experimentation movement. But it is not as if development economists invented randomized trials. Long before the “randomistas” appeared, epidemiologists were running experiments to evaluate countless drugs, devices, and therapies in countries rich and poor. Through this experience, they developed norms about how to run an experiment to minimize misleading results. Some are codified in the Cochrane Handbook, the bible of meta-analysis, which is the process of systematically synthesizing the available evidence on such questions as whether breast cancer screening saves lives.

The epidemiologists’ norms make sense. An experimental study is more reliable when there is:

• Good sequence generation: The experiment is randomized.
• Sequence concealment: No one knows before subjects enter the study who will be assigned to treatment and who to control. This prevents, for example, cancer patients from dropping out of a trial of a new chemotherapy when they or their doctors learn they’ve been put in the control group.
• Blinding: During the experiment, assignment remains hidden from subjects, nurses, and others who deliver or sustain treatment, so that they cannot adjust their behavior or survey responses, consciously or otherwise. Sometimes this requires giving people in the control group fake treatment (placebos).
• Double-blinding: The people who measure outcomes—who take blood pressure, or count the kids showing up for school—are also kept in the dark about who is treatment and who is control.
• Minimized incomplete outcome data (in economics, “attrition”): If some patients on an experimental drug fare so poorly that they miss follow-up appointments and drop out of a study, that could make the retained patients look misleadingly well-off.
• No selective outcome reporting: Impacts on all outcomes measured are reported—for otherwise we should become suspicious of omissions. Are the researchers hiding contrary findings, or mining for statistically significant impacts? One way researchers can reduce selective reporting and the appearance thereof is to pre-register their analytical plans on a website outside their control.

Especially when gathering studies for meta-analysis, epidemiologists prize these features, as well as clear reporting of their presence or absence.

Yet most of those features are scarce in economics research. Partly that is because economics is not medicine: in a housing experiment, to paraphrase Macartan Humphreys, an agency can’t give you a placebo housing voucher that leaves you sleeping in your car without your realizing it. Partly it is because these desirable features come with trade-offs: the flexibility to test un-registered hypotheses can let you find new facts; sometimes the hospital that would implement your experiment has its own views on how things should be done. And partly the gap between ideal and reality is a sign that economists can and should do better.

I can imagine that, if becoming an epidemiologist involves studying examples of how the absence of such design features can mislead or even kill people, then this batch of unblinded, un-pre-registered, and even un-randomized deworming studies out of economics might look passing strange.[5] So might GiveWell’s reliance upon them.

The scary but vague term of art, “high risk of bias,” captures such worries. The term arises from the Cochrane Handbook. The Handbook, like meta-analysis in general, strives for an approach that is mechanical in its objectivity. Studies are to be sifted, sorted, and assessed on observable traits, such as whether they are blinded. In providing guidance to such work, the Handbook distinguishes credibility from quality. “Quality” could encompass such traits as whether proper ethical review was obtained. Since Cochrane focuses on credibility, the handbook authors excluded “quality” from their nomenclature for study design issues. They settled on “risk of bias” as a core term, it being the logical antithesis of credibility.

Meanwhile, although some epidemiologists have devised scoring systems to measure risk of bias (plus 1 point for blinding, minus 2 for lack of pre-registration, etc.), the Cochrane Handbook says that such scoring “is not supported by empirical evidence.” So, out of a sort of humility, the Handbook recommends something simpler: run down a checklist of design features, and for each one, just judge whether a study has it or not. If it does, label it as having “low risk of bias” in that domain. Otherwise, mark it “high risk of bias.” If you can’t tell, call it “unclear risk of bias.”

Thus, when a study earns the “high risk of bias” label, that means that it lacks certain design features that all concerned agree are desirable. Full stop.

So while the Handbook’s checklist brings healthy objectivity to evidence synthesis, it also brings limitations, especially in our context:

• Those unversed in statistics, including many decision-makers, may not appreciate that “bias” carries a technical meaning that is less pejorative than the everyday one. It doesn’t mean “prejudiced.” It means “gives an answer different from the true answer, on average.” So, especially in debates that extend outside of academia, the term’s use tends to sow confusion and inflame emotions.
• The binaristic label “high risk of bias” may be humble in origins, but it does not come off as humble in use. At least to non-experts, the pronouncement, “the study is at high risk of bias,” seems confident. But how big is the potential bias and how great the risk? More precisely, what is the probability distribution for the bias? No one knows.
• While useful when distilling knowledge from reams of research, the objectivity of the checklist comes at a price in superficiality. And the trade-off becomes less warranted when examining five studies instead of 50. As members of the Worms team point out, some Cochrane-based criticisms of their work make less sense on closer inspection. For example, the lack of blinding in Worms “cannot explain why untreated pupils in a treatment school experienced sharply reduced worm infections.”
• The checklist is incomplete. E.g., with an assist from Ben Bernanke, economics is getting better at transparency. Perhaps we should brand all studies for which data and code have not been publicly shared as being at “high risk of bias” for opacity. The controversy that ensued after the 3ie-funded replication of Worms generated a lot of heat, but light too. There were points of agreement. Speaking personally, exploring the public data and code for Worms and Worms at Work ultimately raised my trust in those studies, as I will explain. If it had done the opposite, that too would have raised my confidence in whatever conclusion I extracted. Arguably, Worms is now the most credible deworming study, for no other has survived such scrutiny.

So what is a decisionmaker to do with a report of “high risk of bias”? If the choice is between relying on “low risk” studies and “high risk” studies, all else equal, then the choice is clear: favor the “low risk” studies. But what if all the studies before you contain “high risk of bias”?

That question may seem to lead us to an analytical cul-de-sac. But some researchers have pushed through it, with meta-epidemiology. A 1995 article (hat tip: Paul Garner) drew together 250 studies from 33 meta-analyses of certain interventions relating to pregnancy, labor, and delivery. They asked: do studies lacking blinding or other good features report bigger impacts? The answers were “yes” for sequence concealment and double-blinding and “not so much” for randomization and attrition. More studies have been done like that. And researchers have even aggregated those, which I suppose is meta-meta-epidemiology. (OK, not really.) One example cited by the Cochrane Handbook finds that lack of sequence concealment is associated with an average impact exaggeration of 10%, and, separately, that lack of double-blinding is associated with exaggeration by 22%.[6]

To operationalize “high risk of bias,” we might discount the reported long-term benefits from deworming by such factors. No one knows if those discounts would be right. But they would make GiveWell’s ~99% discount—which can compensate for 100-fold (10000%) exaggeration—look conservative.
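
To make the comparison concrete, here is a small arithmetic sketch. It assumes that an “exaggeration of 10%” means the reported effect is 1.10 times the true one, and that the two sources of exaggeration compound multiplicatively; both are simplifying assumptions, not claims from the meta-epidemiology literature.

```python
def discount_for_exaggeration(exaggeration):
    """Share to cut from a reported effect that overstates the truth
    by `exaggeration` (0.10 means reported = 1.10 x true)."""
    return 1.0 - 1.0 / (1.0 + exaggeration)

def exaggeration_covered(discount):
    """Fold-overstatement that a given discount would fully offset."""
    return 1.0 / (1.0 - discount)

concealment = 0.10  # lack of sequence concealment
blinding = 0.22     # lack of double-blinding
combined = (1 + concealment) * (1 + blinding) - 1  # assumes the two compound

print(discount_for_exaggeration(concealment))  # roughly 9%
print(discount_for_exaggeration(blinding))     # roughly 18%
print(discount_for_exaggeration(combined))     # roughly 25%
print(exaggeration_covered(0.99))              # roughly 100-fold
```

Under these assumptions, the discounts implied by the cited averages are far smaller than 99%.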

The epidemiological perspective should alert economists to ways they can improve. And it has helped GiveWell appreciate limitations in deworming studies. But the healthy challenge from epidemiologists has not undermined the long-term deworming evidence as completely as it may at first appear.

Why I pretty much trust the Worms experiment

Here are Stata do and log files for the quantitative assertions below that are based on publicly available data.

I happened to attend a conference on “What Works in Development” at the Brookings Institution in 2008. As economists enjoyed a free lunch, the speaker, Angus Deaton, launched a broadside against the randomization movement. He made many points. Some were so deep I still haven’t fully grasped them. I remember best two less profound things he said. He suggested that Abhijit Banerjee and Esther Duflo flip a coin and jump out of an airplane, the lucky one with a parachute, in order to perform a much-needed randomized controlled trial of this injury-prevention technology. And he pointed out that the poster child of the randomization movement, Miguel and Kremer’s Worms, wasn’t actually randomized—at least not as most people understood that term.

It appears that the charity that carried out the deworming for Miguel and Kremer would not allow schools to be assigned to treatment or control via rolls of a die or the computer equivalent. Instead, Deaton said, the 75 schools were listed alphabetically. Then they were assigned cyclically to three groups: the first school went to group 1, the second to group 2, the third to group 3, the fourth to group 1, and so on. Group 1 started receiving deworming treatment in 1998; group 2 in 1999; and group 3, the control, not until after the experiment ended in 2000. During the Q&A that day at Brookings, Michael Kremer politely argued that he could think of no good theory for why this assignment system would generate false results, that is, why it would cause, say, group 1 students to attend school more for some reason other than deworming.[7] I think Deaton replied by citing the example of a study that was widely thought to be well randomized until someone showed that it wasn’t.[8] His point was that unless an experiment is randomized, you can’t be sure that no causal demons lurk within.

This exchange came to mind when I began reading about deworming. As I say, GiveWell is less interested in whether treatment for worms raised school attendance in the short run than whether it raised earnings in the long run. But those long-term results, in Worms at Work, depend on the same experiment for credibility. In contrast with the meta-analytic response to this concern, which is to affix the label “high risk of bias for sequence generation” and move on, I dug into the study’s data. What I attacked hardest was the premise that before the experiment began, the three school groups were statistically similar, or “balanced.”

Mostly the premise won.

Yes, there are reasons to doubt the Worms experiment…

If I were the prosecutor in Statistical balance police v. Miguel and Kremer, I’d point out that:

• Deaton had it wrong: schools were not alphabetized. It was worse than that, in principle. The 75 schools were sorted alphabetically by division and zone (units of local geography in Kenya) and within zones by enrollment. Thus, you could say, a study famous for finding more kids in school after deworming formed its treatment groups on how many kids were in school before deworming. That is not ideal. In the worst case, the 75 schools would have been situated in 25 zones, each with three schools. The cyclic algorithm would then have always put the smallest school in group 1, the middle in group 2, and the largest in group 3. And if the groups started out differing in size, they would probably have differed in other respects too, spoiling credibility. (In defense of Deaton, I should say that the authors’ description of the cyclical procedure changed between 2007 and 2014.)
• Worms reports that the experimental groups did start out different in some respects, with statistical significance: “Treatment schools were initially somewhat worse off. Group 1 pupils had significantly more self-reported blood in stool (a symptom of schistosomiasis infection), reported being sick more often than Group 3 pupils, and were not as clean as Group 2 and Group 3 pupils (as observed by NGO field workers).” Now, in checking balance, Table I of Worms makes 42 comparisons: group 1 vs. group 3 and group 2 vs. group 3 for 21 variables. Even if balance were perfect, when imposing a p = 0.05 significance threshold, one should expect about 5% of the tests to show up as significant, or about two of 42. In the event, five show up that way. I confirmed with formal tests that these differences were unexpected in aggregate if the groups were balanced.
• Moreover, the groups differed before the experiment in a way not previously reported: in school attendance. Again, this looks very bad on the surface, since attendance is a major focus of Worms. According to school registers, attendance in grades 3–8 in early 1998 averaged 97.3%, 96.3%, and 96.9% in groups 1, 2, and 3 respectively. Notice that group 3’s rate put it between the others. This explains why, when Worms separately compares groups 1 and 2 to 3, it does not find terribly significant differences (p = 0.4, 0.12). But the distance from group 1 to 2—which is not checked—is more significant (p = 0.02), as is that from group 1 to 2 and 3 averaged together (p = 0.06). In the first year of the experiment, only group 1 was treated. So if it started out with higher attendance, can we confidently attribute the higher attendance over the following year to deworming?
Miguel and Kremer point out that school registers, from which those attendance rates come, “are not considered reliable in Kenya.” Indeed, at about 97%, the rates implausibly approach perfection. This is why the researchers measured attendance by independently sending enumerators on surprise visits to schools. They found attendance around 68–76% in the 1998 control group schools (bottom of Table VI). So should we worry about a tiny imbalance in nearly meaningless school-reported attendance? Perhaps so. I find that at the beginning of the experiment the school- and researcher-reported attendance correlated positively. Each 1% increase in a school’s self-reported attendance—equivalent to moving from group 2 to group 1—predicted a 3% increase in researcher-recorded attendance (p = 0.008), making the starting difference superficially capable of explaining roughly half the direct impact found in Worms.
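
The “about two of 42” arithmetic in the second bullet above can be reproduced with a simple binomial calculation. This is only a back-of-the-envelope sketch that treats the 42 tests as independent, which they are not (both sets of comparisons use group 3, and the variables are correlated), so it is no substitute for the formal tests mentioned there.

```python
from math import comb

def prob_at_least(k, n, p):
    """Binomial probability of k or more successes in n independent trials."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

n_tests = 42   # 21 variables x 2 comparisons in Table I
alpha = 0.05   # significance threshold
observed = 5   # comparisons that show up as significant

expected = n_tests * alpha
tail = prob_at_least(observed, n_tests, alpha)
print(expected, tail)
```

Here `expected` is the number of false positives one would anticipate under perfect balance, and `tail` is the chance of seeing five or more under the independence assumption.
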

…but there are reasons to trust the Worms experiment too

• Joan Hamory Hicks, who manages much of the ongoing Worms follow-up project, sent me the spreadsheet used to assign the 75 schools to the three groups back in 1997. Its contents do not approximate the worst case I described, with three schools in each zone. There are eight zones, and their school counts range from four to 15. Thus, cyclical assignment did introduce substantial arbitrariness with respect to initial school enrollment. In some zones the first and smallest school went into group 1, in others group 2, in others group 3.
• As for the documented imbalances, such as kids in group 1 schools being sick more often, Worms points out that these should probably make the study conservative: the groups that ultimately fared better started out worse off.
• The Worms team began collecting attendance data in all three groups in early 1998, before the first deworming visits took place. Those more-accurate numbers do not suggest imbalance across the three groups (p = 0.43). And the correlation of school-recorded attendance, which is not balanced, and researcher-recorded attendance, which is, is not dispositive. If you looked across a representative 75 New York City schools at two arbitrarily chosen variables (say, fraction of students who qualify for free meals and average class size), they could easily be correlated too. Finally, when I modify a basic Miguel and Kremer attendance regression (Table IX, col. 1) to control for the imbalanced school-recorded attendance variable, it hardly perturbs the results (except by restricting the sample because of missing observations for this variable). If initial treatment-control differences in school-recorded attendance were a major factor in the celebrated impact estimates, we would expect that controlling for the former would affect the latter.

In addition, three observations more powerfully bolster the Worms experiment.

First, I managed to identify the 75 schools and link them to a public database of primary schools in Kenya. (In email, Ted Miguel expressed concern for the privacy of the study subjects, so I will not explain how I did this nor share the school-level information I gained thereby, except the elevations discussed just below.) This gave me fresh school-level variables on which to test the balance of the Worms experiment, such as institution type (religious, central government, etc.) and precise latitude and longitude. I found little suggestion of imbalance on the new variables as a group (p = 0.7, 0.2 for overall differences between group 1 or 2 and group 3; p = 0.54 for a difference between groups 1 and 2 together and group 3, which is the split in Worms at Work). Then, with a Python program I wrote, I used the geo-coordinates of the schools to query Google for their elevations in meters above sea level. A test for differences in elevation across the groups returns p = 0.36, meaning once more that the hypothesis of balance on a new variable is not rejected. And if we aggregate groups 1 and 2 into a single treatment group as in Worms at Work, p = 0.97.

Second, after the Worms experiment finished in 2000—and all 75 schools were receiving deworming—Miguel and Kremer launched a second, truly randomized experiment in the same setting. With respect to earnings in early adulthood (our main interest), the new experiment generates similar, if less precise, results. The experiment took on a hot topic of 2001: whether to charge poor people for basic services such as schooling and health care, in order to make service provision more financially sustainable as well as more accountable to clients. The researchers took the 50 group 1 and group 2 schools from the first experiment and randomly split them into two new groups. In the new control group, children continued to receive deworming for free. In the new treatment group, for the duration of 2001, families were charged 30 shillings ($0.40) for albendazole, for soil-transmitted worms, and another 70 shillings ($0.90) for praziquantel, where warranted for schistosomiasis. In response to the “user fees,” take-up of deworming medication fell 80% in the treatment group (which therefore, ironically, received less treatment). In effect, a second and less impeachable deworming experiment had begun.

Like the original, this new experiment sent ripples into the data that the Worms team collected as it tracked the former schoolchildren into adulthood. Because the user fee trial affected a smaller group—50 instead of 75 schools—for a shorter time—one year instead of an average 2.4 in the original experiment—it did not generate impact estimates of the same precision. This is probably why Worms at Work gives those estimates less space than the ones derived from the original experiment.

But they are there. And they tend to corroborate the main results. The regression that has anchored GiveWell’s cost-effectiveness analysis puts the impact of the first experiment’s 2.4 years of deworming on later wage earnings at +31% (p = 0.002). If you run the publicly available code on the publicly available data, you discover that the same regression estimates that being in the treatment arm of the second experiment cut wage earnings by 14% (albeit with less confidence: p = 0.08). The hypothesis that the two implied rates of impact are equal—31% per 2.4 years and 14% per 80% x 1 year—fits the data (p = 0.44). More generally, Worms at Work states that among 30 outcomes checked, in domains ranging from labor to health to education, the estimated long-term impacts of the two experiments agree in sign in 23 cases. The odds of that happening by chance are 1 in 383.[9]
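The "1 in 383" figure can be checked with a few lines of Python. This is a minimal sketch that assumes, as note [9] does, that each of the 30 outcomes independently has a 50% chance of agreeing in sign if there is no real relationship; `binomial_cdf` is a helper name chosen here for illustration.

```python
from math import comb


def binomial_cdf(k, n, p=0.5):
    """Probability of at most k successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


n_outcomes = 30
n_agree = 23

# With p = 0.5 the distribution is symmetric, so the chance of 23 or more
# agreements equals the chance of 7 or fewer.
tail = binomial_cdf(n_outcomes - n_agree, n_outcomes)

print(round(tail, 5))   # 0.00261
print(round(1 / tail))  # 383
```

The independence assumption is generous, since outcomes such as earnings and hours worked are likely correlated, so the figure is best read as a rough gauge rather than an exact probability.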

The third source of reinforcement for the Worms experiment is Owen Ozier’s follow-up. In 2009 and 2010, he and his assistants surveyed 2400 children in the Worms study area who were born between about 1995 and 2001. I say “about” because their birth dates were estimated by asking them how many years old they were, and if a child said in August 2009 that she was eight, that meant that she was born in 2000 or 2001. By design, the survey covered children who were too young to have been in school during the original Worms experiment, but who might have benefited indirectly, through the deworming of their older siblings and neighbors. The survey included several cognitive tests, among them Raven’s Matrices, which are best understood by looking at an example.

This graph from the Ozier working paper shows the impact of Miguel and Kremer’s 1998–2000 deworming experiment on Raven’s Matrix scores of younger children, by approximate year of birth:

To understand the graph, look at the right end first. The white bars extending slightly below zero say that among children born in 2001 (or maybe really 2002) those linked by siblings and neighbors to group 1 or group 2 schools scored slightly lower than those linked to group 3 schools, but not with any statistical significance. The effective lack of difference is easy to explain since by 2001, schools in all three groups were or had been receiving deworming. (Though there was that user fee experiment in 2001….) For children in the 2000 birth cohort, no comparisons are made, because of the ambiguity over whether those linked to group 3 were born in 2000, when group 3 didn’t receive deworming, or 2001, when it did. Moving to 1999, we find more statistically significant cognitive benefits for kids linked to the group 1 and 2 schools, which indeed received deworming in 1999–2000. Something similar goes for 1998. Pushing farther back, to children born before the experiment, we again find little impact, even though a few years after birth some would have had deworming-treated siblings and neighbors and some not. This suggests that the knock-on benefit for younger children was largely confined to their first year of life.

The evidence that health problems in infancy can take a long-term toll is interesting in itself. But it matters for us in another way too. Suppose you think that because the Worms experiment’s quasi-randomization failed to achieve balance, initial cross-group differences in some factor, visible or hidden, generated the Worms at Work results. Then, essentially, you must explain why that factor caused long-term gains in cognitive scores only among kids born during the experiment. If, say, children at group 1 schools were less poor at the onset of the experiment, creating the illusion of impact, we’d expect the kids at those schools to be less poor a few years before and after too.

It’s not impossible to meet this challenge. I conjectured that the Worms groups were imbalanced on elevation, which differentially exposed them to the destructive flooding caused by the strong 1997–98 El Niño. But my theory foundered on the lack of convincing evidence of imbalance on elevation, which I described above.

At any rate, the relevant question is not whether it is possible to construct a story for how poor randomization could falsely generate all the short- and long-term impacts found from the Worms experiment. It is how plausible such a story would be. The more strained the alternative theories, the more credible does the straightforward explanation become, that giving kids deworming pills measurably helped them.

One caveat: GiveWell has not obtained Ozier’s data and code, so we have not vetted this study as much as we have Worms and Worms at Work.

Summary

I came to this investigation with some reason to doubt Worms and found more when I arrived. But in the end, the defenses persuade me more than the attacks. I find that:

• The charge of “high risk of bias” is legitimate but vague.
• Under a barrage of tests, the statistical balance of the experiment mostly survives.
• The original experiment is corroborated by a second, randomized one.
• There is evidence that long-term cognitive benefits are confined to children born right around the time of the experiment, a pattern that is hard to explain except as impacts of deworming.

In addition, I plan to present some fresh findings in my next post that, like Ozier’s, seem to make alternative theories harder to fashion (done).

When there are reasons to doubt and reasons to trust an experiment, the right response is not to shrug one’s shoulders, or give each point pro and con a vote, or zoom out and ponder whether to side with economists or epidemiologists. The right response is to ask: what is the most plausible theory that is compatible with the entire sweep of the evidence? For me, an important criterion for plausibility is Occam’s razor: simplicity.

As I see it now, the explanation that best blends simplicity and compatibility-with-evidence runs this way: the imbalances in the Worms experiment are real but small, are unlikely to explain the results, and if anything make those results conservative; thus, the reported impacts are indeed largely impacts. If one instead assumes the Worms results are artifacts of flawed experimental design, execution, and analysis, then one has to construct a complicated theory for why, e.g. the user fee experiment produces similar results, and why the benefits for non-school-age children appear confined to those born in the treatment groups around the time of differential treatment.

I hope that anyone who disagrees will prove me wrong by constructing an alternative yet simple theory that explains the evidence before us.

I’m less confident when it comes to generalizing from these experiments. Worms, Worms at Work, and Ozier tell us something about what happened after kids in one time and place were treated for intestinal helminths. What do those studies tell us about the effectiveness of deworming campaigns today, from Liberia to India? I’ll explore that next.

Notes

[1] The WHO estimates that 2 billion people carry soil-transmitted “geohelminths,” including hookworm, roundworm, and whipworm. Separately, it reports that 258 million people needed treatment for schistosomiasis, which is transmitted by contact with fresh water. Children are disproportionately affected because of their play patterns and poorer hygiene.

[2] Baird et al. (2016), Table IV, Panel A, row 3, estimates a 112-shilling increase over a control-group mean of 749/month. Panel B, row 1, suggests that the effect is concentrated in wage earnings.

[3] Baird et al. (2016), Table IV, Panel A, row 1, col. 1, reports 0.269. Exponentiating that gives a 31% increase.

[4] For an overview, I recommend Tim Harford’s graceful take. To dig in more, see the Worms authors’ reply and the posts by Berk Ozler, Chris Blattman, and my former colleagues Michael Clemens and Justin Sandefur. To really delve, read Macartan Humphreys, and Andrew Gelman’s and Miguel and Kremer’s responses thereto.

[5] For literature on the impacts of these study design features on results, see the first 10 references of Schulz et al. 1995.

[6] Figures obtained by dividing the “total” point estimates from the linked figures into 1. The study expresses higher benefits as lower risk estimates, in the sense that risk of bad outcomes is reduced.

[7] The Baird et al. (2016) appendix defends the “list randomization” procedure more fully.

[8] Deaton may have mentioned Angrist (1990) and Heckman’s critique of it. But I believe the lesson there is not about imperfect quasi-randomization but local average treatment effects.

[9] For the cumulative distribution function of the binomial distribution, F(30,7,0.5) = .00261.
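The arithmetic behind notes [2] and [3] is easy to reproduce. The sketch below assumes the 0.269 coefficient is measured in log points, which is what note [3]'s exponentiation implies; the helper name is illustrative.

```python
from math import exp


def pct_change_from_log_points(coef):
    """Convert a coefficient on a log outcome into a proportional change."""
    return exp(coef) - 1


# Note [3]: a coefficient of 0.269 on log wage earnings.
wage_effect = pct_change_from_log_points(0.269)
print(f"{wage_effect:.1%}")  # 30.9%, i.e. about 31%

# Note [2]: a 112-shilling gain over a control-group mean of 749 per month.
earnings_gain = 112 / 749
print(f"{earnings_gain:.1%}")  # 15.0%
```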

The post Why I mostly believe in Worms appeared first on The GiveWell Blog.

### Our updated top charities for giving season 2016

Mon, 11/28/2016 - 22:58

We have refreshed our top charity rankings and recommendations. We now have seven top charities: our four top charities from last year and three new additions. We have also added two new organizations to our list of charities that we think deserve special recognition (previously called “standout” charities).

Instead of ranking organizations, we rank funding gaps, which take into account both charities’ overall quality and cost-effectiveness and what more funding would enable them to do. We also account for our expectation that Good Ventures, a foundation we work closely with, will provide significant support to our top charities ($50 million in total). Our recommendation to donors is based on the relative value of remaining gaps once Good Ventures’ expected giving is taken into account. We believe that the remaining funding gaps offer donors outstanding opportunities to accomplish good with their donations.

Our top charities and recommendations for donors, in brief

Top charities

We are continuing to recommend the four top charities we did last year and have added three new top charities:

1. Against Malaria Foundation (AMF)
2. Schistosomiasis Control Initiative (SCI)
3. END Fund for work on deworming (added this year)
4. Malaria Consortium for work on seasonal malaria chemoprevention (added this year)
5. Sightsavers for work on deworming (added this year)
6. Deworm the World Initiative, led by Evidence Action
7. GiveDirectly

We have ranked our top charities based on what we see as the value of filling their remaining funding gaps. We do not feel a particular need for individuals to divide their allocation across all of the charities, since we are expecting Good Ventures will provide significant support to each. For those seeking our recommended allocation, we recommend giving 75% to the Against Malaria Foundation and 25% to the Schistosomiasis Control Initiative, which we believe to have the most valuable unfilled funding gaps.

Our recommendation takes into account the amount of funding we think Good Ventures will grant to our top charities, as well as accounting for charities’ existing cash on hand, and expected fundraising (before gifts from donors who follow our recommendations).
We recommend charities according to how much good additional donations (beyond these sources of funds) can do.

Other Charities Worthy of Special Recognition

As with last year, we also provide a list of charities that we believe are worthy of recognition, though not at the same level (in terms of likely good accomplished per dollar) as our top charities (we previously called these organizations “standouts”). They are not ranked, and are listed in alphabetical order.

Below, we provide:

• An explanation of major changes in the past year that are not specific to any one charity.
• A discussion of our approach to room for more funding and our ranking of charities’ funding gaps.
• Summary of key considerations for top charities.
• Detail on each of our new top charities, including an overview of what we know about their work and our understanding of each organization’s room for more funding.
• Detail on each of the top charities we are continuing to recommend, including an overview of their work, major changes over the past year and our understanding of each organization’s room for more funding.
• The process we followed that led to these recommendations.
• A brief update on giving to support GiveWell’s operations vs. giving to our top charities.

Conference call to discuss recommendations

We are planning to hold a conference call at 5:30pm ET/2:30pm PT on Thursday, December 1 to discuss our recommendations and answer questions. If you’d like to join the call, please register using this online form. If you can’t make this date but would be interested in joining another call at a later date, please indicate this on the registration form.

Major changes in the last 12 months

Below, we summarize the major causes of changes to our recommendations (since last year).
Most important changes in the last year:

• We engaged with more new potential top charities this year than we have in several years (including both inviting organizations to participate in our process and responding to organizations that reached out to us). This work led to three additional top charities. We believe our new top charities are outstanding giving opportunities, though we note that we are relatively less confident in these organizations than in our other top charities: we have followed each of the top charities we are continuing to recommend for five or more years and have only begun following the new organizations in the last year or two.
• Overall, our top charities have more room for more funding than they did last year. We now believe that AMF, SCI, Deworm the World, and GiveDirectly have strong track records of scaling their programs. Our new top charities add additional room for more funding and we believe that the END Fund and Malaria Consortium, in particular, could absorb large amounts of funding in the next year. We expect some high-value opportunities to go unfilled this year.
• Last year, we wrote about the tradeoff between Good Ventures accomplishing more short-term good by filling GiveWell’s top charities’ funding gaps and the long-term good of saving money for other opportunities (as well as the good of not crowding out other donors, who, by nature of their smaller scale of giving, may have fewer strong opportunities). Due to the growth of the Open Philanthropy Project this year and its increased expectation of the size and value of the opportunities it may have in the future, we expect Good Ventures to set a budget of $50 million for its contributions to GiveWell top charities. The Open Philanthropy Project plans to write more about this in a future post on its blog.

Room for more funding analysis

Types of funding gaps

We’ve previously outlined how we categorize charities’ funding gaps into incentives, capacity-relevant funding, and execution levels 1, 2, and 3. In short:

• Incentive funding: We seek to ensure that each top charity receives a significant amount of funding (and to a lesser extent, that charities worthy of special recognition receive funding as well). We think this is important for long-run incentives to encourage other organizations to seek to meet these criteria. This year, we are increasing the top charity incentive from $1 million to $2.5 million.
• Capacity-relevant funding: Funding that we believe has the potential to create a significantly better giving opportunity in the future. With one exception, we don’t believe that any of our top charities have capacity-relevant gaps this year. We have designated the first $2 million of Sightsavers’ room for more funding as capacity-relevant because seeing results from a small number of Sightsavers deworming programs would significantly expand the evidence base for its deworming work and has the potential to lead us to want to support Sightsavers at a much higher level in the future.
• Execution funding: Funding that allows charities to implement more of their core programs. We separated this funding into three levels: level 1 is the amount at which we think there is a 50% chance that the charity will be bottlenecked by funding; level 2 is a 20% chance of being bottlenecked by funding, and level 3 is a 5% chance.

Ranking funding gaps

The first million dollars to a charity can have a very different impact from, e.g., the 20 millionth dollar. Accordingly, we have created a ranking of individual funding gaps that accounts for both (a) the quality of the charity and the good accomplished by its program per dollar, and (b) whether a given level of funding is capacity-relevant and whether it is highly or only marginally likely to be needed in the coming year.

The below table lays out our ranking of funding gaps. When gaps have the same “Priority,” this indicates that they are tied. When gaps are tied, we recommend filling them by giving each equal dollar amounts until one is filled, and then following the same procedure with the remaining tied gaps. See footnote for more.*

The table below includes the amount we expect Good Ventures to give to our top charities. For reasons the Open Philanthropy Project will lay out in another post, we expect that Good Ventures will cap its giving to GiveWell’s top charities this year at $50 million. We expect that Good Ventures will start with funding the highest-rated gaps and work its way down, in order to accomplish as much good as possible.
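The rule for tied gaps (equal dollar amounts to each until one is filled, then repeat with the rest) can be sketched in a few lines. This is one illustrative reading of that rule, not GiveWell's own tooling; the function name and the gap sizes in the example are hypothetical.

```python
EPS = 1e-9  # tolerance for floating-point comparisons


def fill_tied_gaps(gaps, budget):
    """Split `budget` across tied gaps in equal dollar amounts.

    `gaps` maps a gap name to its size. Every unfilled gap receives the same
    amount until the smallest one is full; the procedure then repeats with
    the remaining tied gaps until the budget or the gaps run out.
    """
    remaining = dict(gaps)
    granted = {name: 0.0 for name in gaps}
    while budget > EPS and remaining:
        n = len(remaining)
        # Equal share, capped so that no gap is overfilled in this round.
        share = min(budget / n, min(remaining.values()))
        for name in list(remaining):
            granted[name] += share
            remaining[name] -= share
            if remaining[name] <= EPS:
                del remaining[name]
        budget -= share * n
    return granted


# Hypothetical tied gaps, in millions of dollars, with $9 million to allocate.
print(fill_tied_gaps({"A": 2.0, "B": 5.0, "C": 5.0}, 9.0))
# {'A': 2.0, 'B': 3.5, 'C': 3.5}
```

In the example, each gap first receives $2 million, which fills A; the remaining $3 million is then split evenly between B and C.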

Note that we do not always place a charity’s full execution level at the same rank and in some cases rank the first portion of a given charity’s execution level ahead of the remainder. This is because many of our top charities are relatively close to each other in terms of their estimated cost-effectiveness (and thus, the value of their execution funding). For reasons we’ve written about in the past, we believe it is inappropriate to put too much weight on relatively small differences in explicit cost-effectiveness estimates. Because we expect that there are diminishing returns to funding, we would guess that the cost-effectiveness of a charity’s funding gap falls as it receives more funding.

| Priority | Charity | Amount, in millions USD (of which, expected from Good Ventures*) | Type | Comment |
|---|---|---|---|---|
| 1 | Deworm the World | $2.5 (all) | Incentive | – |
| 1 | SCI | $2.5 (all) | Incentive | – |
| 1 | Sightsavers | $2.5 (all) | Incentive | – |
| 1 | AMF | $2.5 (all) | Incentive | – |
| 1 | GiveDirectly | $2.5 (all) | Incentive | – |
| 1 | END Fund | $2.5 (all) | Incentive | – |
| 1 | Malaria Consortium | $2.5 (all) | Incentive | – |
| 1 | Other charities worthy of special recognition | $1.5 (all) | Incentive | $250,000 each for six charities |
| 3 | SCI | $6.5 (all) | Fills rest of execution level 1 | Highest cost-effectiveness of remaining level 1 gaps |
| 4 | AMF | $8.5 (all) | First part of execution level 1 | Similar cost-effectiveness to END Fund and Sightsavers and greater understanding of the organization. Expect declining cost-effectiveness within Level 1, and see other benefits (incentives) to switching to END Fund and Sightsavers after this point. |
| 5 | END Fund | $2.5 (all) | Middle part of execution level 1 | Given relatively limited knowledge of charity, capping total recommendation at $5 million |
| 6 | Sightsavers | $0.5 (all) | Fills rest of execution level 1 | Similar cost-effectiveness to AMF and the END Fund |
| 7 | Deworm the World | $2.0 (all) | Fills execution level 2 | Highest-ranked level 2 gap. Highest cost-effectiveness and confidence in organization |
| 8 | SCI | $4.5 (all) | First part of execution level 2 | Highest cost-effectiveness of remaining level 2 gaps |
| 9 | Malaria Consortium | $2.5 (all) | Part of execution level 1 | Given relatively limited knowledge of charity, capping total recommendation at $5 million |
| 10 | AMF | $18.6 ($5.1) | Part of execution level 1 | Expect declining cost-effectiveness within level 1; ranked other gaps higher due to this and incentive effects |
| 11 | SCI | $4.5 ($0) | Fills execution level 2 | Roughly expected to be more cost-effective than the remaining $49 million of AMF level 1 |

* Also includes $1 million that GiveWell holds for grants to top charities. More below.

Summary of key considerations for top charities

The table below summarizes the key considerations for our seven top charities. More detail is provided below as well as in the charity reviews.

| Consideration | AMF | Malaria Consortium | Deworm the World | END Fund | SCI | Sightsavers | GiveDirectly |
|---|---|---|---|---|---|---|---|
| Estimated cost-effectiveness (relative to cash transfers) | ~4x | ~4x | ~10x | ~4x | ~8x | ~5x | Baseline |
| Our level of knowledge about the organization | High | Relatively low | High | Relatively low | High | Relatively low | High |
| Primary benefits of the intervention | Under-5 deaths averted and possible increased income in adulthood | Under-5 deaths averted and possible increased income in adulthood | Possible increased income in adulthood | Possible increased income in adulthood | Possible increased income in adulthood | Possible increased income in adulthood | Immediate increase in consumption and assets |
| Ease of communication | Moderate | Strong | Strong | Strong | Moderate | Moderate | Strongest |
| Ongoing monitoring and likelihood of detecting future problems | Moderate | Moderate | Strong | Moderate | Moderate | Moderate | Strongest |
| Room for more funding, after expected funding from Good Ventures and donors who give independently of our recommendation | High: less than half of Execution Level 1 filled | High: not quantified, but could likely use significantly more funding | Low: Execution Levels 1 and 2 filled | High: half of Execution Level 1 filled | Moderate: Execution Level 1 and some of Level 2 filled | Moderate: Execution Level 1 filled | Very high: less than 15% of Execution Level 1 filled |

Our recommendation to donors

If Good Ventures uses a budget of $50 million to top charities and follows our prioritization of funding gaps, it will make the following grants (in millions of dollars, rounded to one decimal place):

• AMF: $15.1
• Deworm the World: $4.5
• END Fund: $5.0
• GiveDirectly: $2.5
• Malaria Consortium: $5.0
• SCI: $13.5
• Sightsavers: $3.0
• Grants to other charities worthy of special recognition: $1.5

We also hold about $1 million that is restricted to granting out to top charities. We plan to use this to make a grant to AMF, which is the next funding gap on the list after the expected grants from Good Ventures.

We estimate that non-Good Ventures donors will give approximately $27 million between now and the start of June 2017; we expect to refresh our recommendations to donors in mid-June. Of this, we expect $18 million will be allocated according to our recommendation for marginal donations, while $9 million will be given based on our top charity list; this $9 million is considered ‘expected funding’ for each charity and therefore subtracted from their room for more funding.
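As a rough check, the 75%/25% split recommended to donors falls out of walking $18 million down the ranked list of funding gaps. The sketch below assumes that the unfilled part of each gap is its size minus the amount already expected from Good Ventures (including the $1 million GiveWell holds), using the priority 10 and 11 rows of the table above; the function name is illustrative.

```python
def allocate_in_order(ranked_gaps, budget):
    """Fill gaps in priority order until the budget runs out."""
    grants = {}
    for name, unfilled in ranked_gaps:
        amount = min(unfilled, budget)
        grants[name] = grants.get(name, 0.0) + amount
        budget -= amount
    return grants


# Amounts in millions of dollars: gap size minus expected funding.
ranked_gaps = [
    ("AMF", 18.6 - 5.1),  # priority 10
    ("SCI", 4.5 - 0.0),   # priority 11
]
marginal_donations = 18.0

grants = allocate_in_order(ranked_gaps, marginal_donations)
for name, amount in grants.items():
    print(name, f"{amount / marginal_donations:.0%}")
# AMF 75%
# SCI 25%
```

The unfilled $13.5 million of the AMF gap absorbs three quarters of the $18 million, and the remaining $4.5 million fills the SCI gap.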

$18 million spans two gaps in our prioritized list, so we are recommending that donors split their gift, with 75% going to AMF and 25% going to SCI, or give to GiveWell for making grants at our discretion and we will use the funds to fill in the next highest priority gaps.

Details on new top charities

Before this year, our top charity list had remained nearly the same for several years. This means that we have spent hundreds of hours talking to these groups, reading their documents, visiting their work in the field, and modeling their cost-effectiveness. We have spent considerably less time on our new top charities, particularly Malaria Consortium, and have not visited their work in the field (though we met with Sightsavers’ team in Ghana). We believe our new top charities are outstanding giving opportunities, though we think there is a higher risk that further investigation will lead to changes in our views about these groups.

A note about deworming

Four of our top charities, including two new top charities, support programs that treat schistosomiasis and soil-transmitted helminthiasis (STH) (“deworming”). We estimate that SCI and Deworm the World’s deworming programs are more cost effective than mass bednet campaigns, but our estimates are subject to substantial uncertainty. For Sightsavers and END Fund, our greater uncertainty about cost per treatment and prevalence of infection in the areas where they work leads us to the conclusion that the cost-effectiveness of their work is on par with that of bednets. It’s important to note that we view deworming as high expected value, but this is due to a relatively low probability of very high impact.
Our cost-effectiveness model implies that most staff members believe you should use a multiplier of less than 1% compared to the impact (increased income in adulthood) found in the original trials; this could be thought of as assigning some chance that deworming programs have no impact, and some chance that the impact exists but will be smaller than was measured in those trials. Full discussion in this blog post. Our 2016 cost-effectiveness analysis is here.

This year, David Roodman conducted an investigation into the evidence for deworming’s impact on long-term life outcomes. David will write more about this in a future post, but in short, we think the strength of the case for deworming is similar to last year’s, with some evidence looking weaker, new evidence that was shared with us in an early form this year being too preliminary to incorporate, and a key piece of evidence standing up to additional scrutiny.

END Fund (for work on deworming)

Our full review of END Fund is here.

Overview

The END Fund (end.org) manages grants, provides technical assistance, and raises funding for controlling and eliminating neglected tropical diseases (NTDs). We have focused our review on its support for deworming. About 60% of the treatments the END Fund has supported have been deworming treatments, while the rest have been for other NTDs. The END Fund has funded SCI, Deworm the World, and Sightsavers. We see the END Fund’s value-add as a GiveWell top charity as identifying and providing assistance to programs run by organizations other than those we separately recommend, and our review of the END Fund has excluded results from charities on our top charity list.

We have not yet seen monitoring results on the number of children reached in END Fund-supported programs. The END Fund has instituted a requirement that grantees conduct coverage surveys and the first results will be available in early 2017.
While we generally put little weight on plans for future monitoring, we feel that the END Fund’s commitment is unusually credible because surveys are already underway or upcoming in the next few months, we are familiar enough with the type of survey being used (from research on other deworming groups) that we were able to ask critical questions, and the END Fund provided specific answers to our questions.

We have more limited information on some questions for the END Fund than we do for the top charities we have recommended for several years. We do not have a robust cost per treatment figure, and also have limited information on infection prevalence and intensity.

Funding gap

We estimate that the END Fund could productively use between $10 million (50% confidence) and $22 million (5% confidence) in the next year to expand its work on deworming. By our estimation, about a third of this would be used to fund other NTD programs. This estimate is based on (a) a list of deworming funding opportunities that the END Fund had identified as of October and its expectation of identifying additional opportunities over the course of the year (excluding opportunities to grant funding to Deworm the World, SCI, or Sightsavers, which we count in those organizations’ room for more funding); and (b) our rough estimate of how much funding the END Fund will raise. The END Fund is a fairly new organization whose revenue comes primarily from a small number of major donors so it is hard to predict how much funding it will raise. The END Fund’s list of identified opportunities includes both programs that END Fund has supported in past years and opportunities to get new programs off the ground.

Sightsavers (for work on deworming)

Our full review of Sightsavers is here.

Overview

Sightsavers (sightsavers.org) is a large organization with multiple program areas that focuses on preventing avoidable blindness and supporting people with impaired vision.
Our review focuses on Sightsavers’ work to prevent and treat neglected tropical diseases (NTDs) and, more specifically, advocating for, funding, and monitoring deworming programs. Deworming is a fairly new addition to Sightsavers’ portfolio; in 2011, it began delivering some deworming treatments through NTD programs that had been originally set up to treat other infections.

We believe that deworming is a highly cost-effective program and that there is moderately strong evidence that Sightsavers has succeeded in achieving fairly high coverage rates for some of its past NTD programs. We feel that the monitoring data we have from SCI and Deworm the World is somewhat stronger than what we have from Sightsavers; in particular, the coverage surveys that Sightsavers has done to date were on NTD programs that largely did not include deworming. Sightsavers plans to do annual coverage surveys on programs that are supported by GiveWell-influenced funding.

We have more limited information on some questions for Sightsavers than we do for the top charities we have recommended for several years. We do not have a robust cost-per-treatment figure, though the information we have suggests that it is in the same range as the cost-per-treatment figures for SCI and Deworm the World. We also have limited information on infection prevalence and intensity in the places Sightsavers works. This limits our ability to robustly compare Sightsavers’ cost effectiveness to other top charities, but our best guess is that the cost-effectiveness of the deworming charities we recommend is similar.

Funding gap

We believe Sightsavers could productively use or commit between $3.0 million (50% confidence) and $10.1 million (5% confidence) in funding restricted to programs with a deworming component in 2017.
This estimate is based on (a) a list of deworming funding opportunities that Sightsavers created for us; and (b) our understanding that Sightsavers would not allocate much unrestricted funding to these opportunities in the absence of GiveWell funding. It’s difficult to know whether other funders might step in to fund this work, but Sightsavers believes that is unlikely and deworming has not been a major priority for Sightsavers to date. Sightsavers’ list of opportunities includes both adding deworming to existing NTD mass distribution programs and establishing new integrated NTD programs that would include deworming. It spans work in Nigeria, Guinea-Bissau, Democratic Republic of Congo, Guinea, Cameroon, Cote d’Ivoire, and possibly South Sudan.

Malaria Consortium (for work on seasonal malaria chemoprevention)

Our full review of Malaria Consortium is here.

Overview

Malaria Consortium (malariaconsortium.org) works on preventing, controlling, and treating malaria and other communicable diseases in Africa and Asia. Our review has focused exclusively on its seasonal malaria chemoprevention (SMC) programs, which distribute preventive anti-malarial drugs to children 3 months to 59 months old in order to prevent illness and death from malaria.

The evidence for SMC appears strong (stronger than deworming and not quite as strong as bednets), but we have not yet examined the intervention at nearly the same level that we have for bednets, deworming, unconditional cash transfers, or other priority programs. The randomized controlled trials on SMC that we considered showed a decrease in cases of clinical malaria but were not adequately powered to find an impact on mortality.

Malaria Consortium and its partners have conducted studies in most of the countries where it has worked to determine whether its programs have reached a large proportion of children targeted.
These studies have generally found positive results, but leave us with some remaining questions about the program’s impact. Overall, we have more limited information on some questions for Malaria Consortium than we do for the top charities we have recommended for several years. We have remaining questions on cost per child per year and on offsetting effects from possible drug resistance and disease rebound. Funding gap We have not yet attempted to estimate Malaria Consortium’s maximum room for more funding. We would guess that Malaria Consortium could productively use at least an additional$30 million to scale up its SMC activities over the next three to four years. We have a general understanding of where additional funds would be used but have not yet asked for a high level of detail on potential bottlenecks to scaling up.

We do not believe Malaria Consortium has substantial unrestricted funding available for scaling up its support of SMC programs and expect its restricted funding for SMC to remain steady or decrease in the next few years.

Details on top charities we are continuing to recommend

Against Malaria Foundation (AMF)

Our full review of AMF is here.

Background

AMF (againstmalaria.com) provides funding for long-lasting insecticide-treated net distributions (for protection against malaria) in developing countries. There is strong evidence that distributing nets reduces child mortality and malaria cases.

AMF provides a level of public disclosure and tracking of distributions that we have not seen from any other net distribution charity.

We estimate that AMF’s program is roughly 4 times as cost effective as cash transfers (see our cost-effectiveness analysis). This estimate seeks to incorporate many highly uncertain inputs, such as the effect of mosquito resistance to the insecticides used in nets on how effective they are at protecting against malaria, how differences in malaria burden affect the impact of nets, and how to discount for displacing funding from other funders, among many others.

Important changes in the last 12 months

In 2016, AMF significantly increased the number and size of distributions it committed funding to. Prior to 2015, it had completed (large-scale) distributions in two countries, Malawi and Democratic Republic of Congo (DRC). In 2016, it completed a distribution in Ghana and committed to supporting distributions in an additional three countries, including an agreement to contribute $28 million to a campaign in Uganda, its largest agreement to date by far.

AMF has continued to collect and share information on its past large-scale distributions. This includes both data from registering households to receive nets (and, in some cases, data on the number of nets each household received) and follow-up surveys to determine whether nets are in place and in use.

Our research in 2016 has led us to moderately weaken our assessment of the quality of AMF’s follow up surveys. In short, we learned that the surveys in Malawi have not used fully randomized selection of households and that the first two surveys in DRC were not reliable (full discussion in this blog post). We expect to see follow-up surveys from Ghana and DRC in the next few months that could expand AMF’s track record of collecting this type of data.

We also learned that AMF has not been carrying out data audits in the way we believed it was (though this was not a major surprise as we had not asked AMF for details of the auditing process previously).

AMF has generally been communicative and open with us. We noted in our mid-year update that AMF had been slower to share documentation for some distributions; however, we haven’t had concerns about this in the second half of the year.

In August 2016, four GiveWell staff visited Ghana where an AMF-funded distribution had recently been completed. We met with AMF’s program manager, partner organizations, and government representatives and visited households in semi-urban and rural areas (notes and photos from our trip).

Our estimate of the cost-effectiveness of nets has fallen relative to cash transfers since our mid-year update. At that point, we estimated that nets were ~10x as cost-effective as cash transfers, and now we estimate that they are ~4x as cost-effective as cash transfers. This change was partially driven by changes in GiveWell staff’s judgments on the tradeoff between saving lives of children under five and improving lives (through increased income and consumption) in our model, and partially driven by AMF beginning to fund bed net distributions in countries with lower malaria burdens than Malawi or DRC.

Funding gap

AMF currently holds $17.8 million, and expects to commit $12.9 million of this soon. We estimate it will receive an additional $4 million by June 2017 ($2 million from donors not influenced by GiveWell and $2 million from donors who give based on our top charity list) that it could use for future distributions. Together, we expect that AMF will have about $9 million for new spending and commitments in 2017.

We estimate that AMF could productively use or commit between $87 million (50% confidence) and $200 million (5% confidence) in the next year. We arrived at this estimate from a rough estimate of the total Africa-wide funding gap for nets in the next three years (from the African Leaders Malaria Alliance), estimated at $125 million per year. The estimate is rough in large part because the Global Fund to Fight AIDS, Tuberculosis and Malaria, the largest funder of LLINs, works on three-year cycles and has not yet determined how much funding it will allocate for LLINs for 2018-2020. We talked to people involved in country-level planning of mass net distributions and the Global Fund, who agreed with the general conclusion that there were likely to be large funding gaps in the next few years. In mid-2016, AMF had to put some plans on hold due to lack of funding.
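The "about $9 million" figure for AMF follows from simple arithmetic on the numbers quoted above. A minimal sketch, with variable and function names that are ours and purely illustrative:

```python
# All figures are in millions of USD and come from the funding-gap text above.
# The names below are illustrative, not GiveWell's or AMF's own terminology.

def funds_for_new_commitments(held, soon_committed, expected_inflow):
    """Funds left for new spending: cash on hand, minus amounts about to be
    committed, plus donations expected before the cutoff date."""
    return held - soon_committed + expected_inflow

held = 17.8                   # currently held by AMF
soon_committed = 12.9         # expected to be committed soon
expected_inflow = 2.0 + 2.0   # non-GiveWell donors + GiveWell-influenced donors

available = funds_for_new_commitments(held, soon_committed, expected_inflow)
print(round(available, 1))    # 8.9, which the post rounds to "about $9 million"
```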

We now believe that AMF has a strong track record of finding distribution partners to work with and coming to agreements with governments, and we do not expect that to be a limiting factor for AMF. The main risks we see to AMF’s ability to scale are the possibility that funding from other funders is sufficient (since our estimate of the gap is quite rough), the likelihood that government actors have limited capacity for discussions with AMF during a year in which they are applying for Global Fund funding, AMF’s staff capacity to manage discussions with additional countries (it has only a few staff members), and whether gaps will be spread across many countries or located in difficult operating environments. We believe the probability of any specific one of these things impeding AMF’s progress is low.

We believe there are differences in cost-effectiveness within execution level 1 and believe the value of filling the first part of AMF’s gap may be higher than additional funding at higher levels. This is because AMF’s priorities include committing to large distributions in the second half of 2019 and 2020, which increases the uncertainty about whether funding would have been available from another source.

We and AMF have discussed a few possibilities for how AMF might fill funding gaps. AMF favors an approach where it purchases a large number of nets for a small number of countries. This approach has some advantages including efficiency for AMF and leverage in influencing how distributions are carried out. Our view is that the risk of displacing a large amount of funding from other funders using this approach outweighs the benefits. If AMF did displace a large amount of funding which would otherwise have gone to nets, that could make donations applied to these distributions considerably less cost-effective. More details on our assessment of AMF’s funding gap are in our full review.

Deworm the World Initiative, led by Evidence Action

Our full review of Deworm the World is here.

Background

Deworm the World (evidenceaction.org/#deworm-the-world), led by Evidence Action, advocates for, supports, and evaluates deworming programs. It has worked in India and Kenya for several years and has recently expanded to Nigeria, Vietnam, and Ethiopia.

Deworm the World retains or hires monitors who visit schools during and following deworming campaigns. We believe its monitoring is the strongest we have seen from any organization working on deworming. Monitors have generally found high coverage rates and good performance on other measures of quality.

As noted above, we believe that Deworm the World is slightly more cost-effective than SCI, more cost-effective than AMF and the other deworming charities, and about 10 times as cost-effective as cash transfers.

Important changes in the last 12 months

Deworm the World has made somewhat slower progress than expected in expanding to new countries. In late 2015, Good Ventures, on GiveWell’s recommendation, made a grant of $10.8 million to Deworm the World to fund its execution level 1 and 2 gaps. Execution level 1 funding was to give Deworm the World sufficient resources to expand into Pakistan and another country. Deworm the World has funded a prevalence survey in Pakistan, which is a precursor to funding treatments in the country. It has not expanded into a further country that it was not already expecting to work in. As a result, we believe that Deworm the World has somewhat limited room for more funding this year.

Overall, we have more confidence in our understanding of Deworm the World and its parent organization Evidence Action’s spending, revenues, and financial position than we did in previous years. While trying to better understand this information this year, we found several errors. We are not fully confident that all errors have been corrected, though we are encouraged by the fact that we are now getting enough information to be able to spot inconsistencies. Evidence Action has been working to overhaul its financial system this year.

Our review of Deworm the World has focused on two countries, Kenya and India, where it has worked the longest. In 2016, we saw the first results of a program in another country (Vietnam), as well as continued high-quality monitoring from Kenya and India. The Vietnam results indicate that Deworm the World is using similar monitoring processes in new countries as it has in Kenya and India and that results in Vietnam have been reasonably strong.

Evidence Action hired Jeff Brown (formerly Interim CEO of the Global Innovation Fund) as CEO in 2015. Recently Evidence Action announced that he has resigned and has not yet been replaced. Our guess is this is unlikely to be disruptive to Deworm the World’s work; Grace Hollister remains Director of the Deworm the World Initiative.

Funding gap

We believe that there is a 50% chance that Deworm the World will be slightly constrained by funding in the next year and that additional funds would increase the chances that it is able to take advantage of any high-value opportunities it encounters. We estimate that if it received an additional $4.5 million its chances of being constrained by funding would be reduced to 20% and at $13.4 million in additional funding, this would be reduced to 5%.

In the next year, Deworm the World expects to expand its work in India and Nigeria and may have opportunities to begin treatments in Pakistan and Indonesia. It is also interested in using unrestricted funding to continue its work in Kenya, and puts a high priority on this program. Its work in Kenya has to date been funded primarily by the Children’s Investment Fund Foundation (CIFF) and this support is set to expire in mid 2017. It is unclear to us whether CIFF will continue providing funding for the program and, if so, for how long. Due to the possibility that Deworm the World unrestricted funding may displace funding from CIFF, and, to a lesser extent, the END Fund and other donors, we consider the opportunity to fund the Kenya program to be less cost-effective in expectation than it would be if we were confident in the size of the gap.

More details in our full review.

Schistosomiasis Control Initiative (SCI)

Our full review of SCI is here.

Background

SCI (imperial.ac.uk/schisto) works with governments in sub-Saharan Africa to create or scale up deworming programs. SCI’s role has primarily been to identify recipient countries, provide funding to governments for government-implemented programs, provide advisory support, and conduct research on the process and outcomes of the programs.

SCI has conducted studies in about two-thirds of the countries it works in to determine whether its programs have reached a large proportion of children targeted. These studies have generally found moderately positive results, but leave us with some remaining questions about the program’s impact.

As noted above, we believe that SCI is slightly less cost-effective than Deworm the World, more cost-effective than AMF and the other deworming charities, and about 8 times as cost-effective as cash transfers.

Important changes in the last 12 months

In past years, we’ve written that we had significant concerns about SCI’s financial reporting and financial management, and the clarity of our communication with SCI. In June, we wrote that we had learned of two substantial errors in SCI’s financial management and reporting that began in 2015. We also noted that we thought that SCI’s financial management and financial reporting, as well as the clarity of its communication with us overall, had improved significantly. In the second half of the year, SCI communicated clearly with us about its plans for deworming programs next year and its room for more funding.

SCI reports that it has continued to scale up its deworming programs over the past year and that it plans to start up new deworming programs in two states in Nigeria before the end of its current budget year.

This year, SCI has shared a few more coverage surveys from deworming programs in Ethiopia, Madagascar, and Mozambique that found reasonably high coverage.

Professor Alan Fenwick, Founder and Director of SCI for over a decade, retired from his position this year, though he will continue his involvement in fundraising and advocacy. The former Deputy Director, Wendy Harrison, is the new Director.

Funding gap

We estimate that SCI could productively use or commit a maximum of between $9.0 million (50% confidence) and $21.4 million (5% confidence) in additional unrestricted funding in its next budget year.

Its funding sources have been fairly steady in recent years with about half of its revenue in the form of restricted grants, particularly from the UK government’s Department for International Development (this grant runs through 2018), and half from unrestricted donations, a majority of which were driven by GiveWell’s recommendation. We estimate that SCI will have around $5.4 million in unrestricted funding available to allocate to its 2017-18 budget year (in addition to $6.5 million in restricted funding).

SCI has a strong track record of starting and scaling up programs in a large number of countries. SCI believes it could expand significantly with additional funding, reaching more people in the countries it works in and expanding to Nigeria and possibly Chad.

More details in our full review.

GiveDirectly

Our full review of GiveDirectly is here.

Background

GiveDirectly (givedirectly.org) transfers cash to households in developing countries via mobile phone-linked payment services. It targets extremely low-income households. The proportion of total expenses that GiveDirectly has delivered directly to recipients is approximately 82% overall. We believe that this approach faces an unusually low burden of proof, and that the available evidence supports the idea that unconditional cash transfers significantly help people.

We believe GiveDirectly to be an exceptionally strong and effective organization, even more so than our other top charities. It has invested heavily in self-evaluation from the start, scaled up quickly, and communicated with us clearly. It appears that GiveDirectly has been effective at delivering cash to low-income households. GiveDirectly has one major randomized controlled trial (RCT) of its impact and took the unusual step of making the details of this study public before data was collected (more). It continues to experiment heavily, with the aim of improving how its own and government cash transfer programs are run. It has recently started work on evaluations that benchmark programs against cash with the aim of influencing the broader international aid sector to use its funding more cost-effectively.

We believe cash transfers are less cost-effective than the programs our other top charities work on, but have the most direct and robust case for impact. We use cash transfers as a “baseline” in our cost-effectiveness analyses and only recommend other programs that are robustly more cost effective than cash.

Important changes in the last 12 months

GiveDirectly has continued to scale up significantly, reaching a pace of delivering $21 million on an annual basis in the first part of 2016 and expecting to reach a pace of $50 million on an annual basis at the end of 2016. It has continued to share informative and detailed monitoring information with us. Given its strong and consistent monitoring in the past, we have taken a lighter-touch approach to evaluating its processes and results this year.

The big news for GiveDirectly this year was around partnerships and experimentation. It expanded into Rwanda (its third country) and launched a program to compare, with a randomized controlled trial, another aid program to cash transfers (details expected to be public next year). The program is being funded by a large institutional funder and Google.org. It expects to do additional “benchmarking” studies with the institutional funder, using funds from Good Ventures’ 2015 $25 million grant, over the next few years.

It also began fundraising for and started a pilot of a universal basic income (UBI) guarantee, a program providing long-term, ongoing cash transfers sufficient for basic needs, which will be evaluated with a randomized controlled trial comparing the program to GiveDirectly’s standard lump sum transfers. The initial UBI program and study is expected to cost $30 million. We estimate that it is less cost-effective than GiveDirectly’s standard model, but it could have impact on policy makers that isn’t captured in our analysis.

We noted previously that Segovia, a for-profit technology company that develops software for cash transfer program implementers and which was started and is partially owned by GiveDirectly’s co-founders, would provide its software for free to GiveDirectly to avoid conflicts of interest. However, in 2016, after realizing that providing free services to GiveDirectly was too costly for Segovia (customizing the product for GiveDirectly required much more Segovia staff time than initially expected), the two organizations negotiated a new contract under which GiveDirectly will compensate Segovia for its services. GiveDirectly wrote about this decision here. GiveDirectly told us that it recused all people with ties to both organizations from this decision and evaluated alternatives to Segovia. Although we believe that there are possibilities for bias in this decision and in future decisions concerning Segovia, and we have not deeply vetted GiveDirectly’s connection with Segovia, overall we think GiveDirectly’s choices were reasonable. However, we believe that reasonable people might disagree with this opinion, which is in part based on our personal experience working closely with GiveDirectly’s staff for several years.

Funding gap

We believe that GiveDirectly is very likely to be constrained by funding next year. GiveDirectly has been rapidly building its capacity to enroll recipients and deliver funds, while some of its revenue has been redirected to its universal basic income guarantee program (either because of greater donor interest in that program or by GiveDirectly focusing its fundraising efforts on it).

We expect GiveDirectly to have about $20 million for standard cash transfers in its 2017 budget year. This includes raising about $15.8 million from non-GiveWell-influenced sources between now and halfway through its 2017 budget year (August 2017) and $4 million from donors who give because GiveDirectly is on GiveWell’s top charity list. $4 million is much less than GiveWell-influenced donors gave in the last year. This is because several large donors are supporting GiveDirectly’s universal basic income guarantee program this year and because one large donor gave a multi-year grant that we don’t expect to repeat this year.

GiveDirectly is currently on pace (with no additional hiring) to have four full teams operating its standard cash transfer model in 2017. To fully utilize four teams, it would need $28 million more than we expect it to raise. We accordingly expect that GiveDirectly will downsize somewhat in 2017, because we do not project it raising sufficient funds to fully utilize the increased capacity it has built to transfer money.

Given recent growth, we believe that GiveDirectly could easily scale beyond four teams and we estimate that at $46 million more than we expect it to raise ($66 million total for standard transfers), it would have a 50% chance of being constrained by funding.
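The GiveDirectly figures above can be cross-checked with a few lines of arithmetic. This is a rough sketch with illustrative variable names. Note that the $48 million four-team total is derived here from the post's numbers; the post itself does not state it.

```python
# Figures in millions of USD, taken from the GiveDirectly funding-gap text above.
non_givewell_revenue = 15.8   # expected from non-GiveWell-influenced sources
givewell_influenced = 4.0     # expected from GiveWell-influenced donors

expected_raw = non_givewell_revenue + givewell_influenced
expected = 20.0               # the post rounds the sum to "about $20 million"

# Totals implied by the stated shortfalls relative to expected revenue.
four_team_total = expected + 28.0          # derived here, not stated in the post
fifty_percent_total = expected + 46.0      # the post states $66 million

print(round(expected_raw, 1))   # 19.8
print(four_team_total)          # 48.0
print(fifty_percent_total)      # 66.0
```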

Other charities worthy of special recognition

Last year, we recommended four organizations as “standouts.” This year we are calling this list “other charities worthy of special recognition.” We’ve added two organizations to the list: Food Fortification Initiative and Project Healthy Children. Although our recommendation to donors is to give to our top charities over these charities, they stand out from the vast majority of organizations we have considered in terms of the evidence base for their work and their transparency, and they offer additional giving options for donors who feel highly aligned with their work.

We don’t follow these organizations as closely as we do our top charities. We generally have one or two calls per year with each group, publish notes on our conversations, and follow up on any major developments.

We provide brief updates on these charities below:

• Organizations that have conducted randomized controlled trials of their programs:
• Development Media International (DMI). DMI produces radio and television programming in developing countries that encourages people to adopt improved health practices. It conducted a randomized controlled trial (RCT) of its program and has been highly transparent, including sharing preliminary results with us. The results of its RCT were mixed, with a household survey not finding an effect on mortality (it was powered to detect a reduction of 15% or more) and data from health facilities finding an increase in facility visits. (The results, because the trial was only completed in the last year, are not yet published.) We believe there is a possibility that DMI’s work is highly cost-effective, but we see no solid evidence that this is the case. We noted last year that DMI was planning to conduct another survey for the RCT in late 2016; it has decided not to move forward with this, but is interested in conducting new research studies in other countries, if it is able to raise the money to do so. It is our understanding that DMI will be constrained by funding in the next year. Our full review of DMI, with conversation notes and documents from 2016, is here.
• Living Goods. Living Goods recruits, trains, and manages a network of community health promoters who sell health and household goods door-to-door in Uganda and Kenya and provide basic health counseling. They sell products such as treatments for malaria and diarrhea, fortified foods, water filters, bednets, clean cookstoves, and solar lights. Living Goods completed a randomized controlled trial of its program and measured a 27% reduction in child mortality. Our best guess is that Living Goods’ program is less cost-effective than our top charities, with the possible exception of cash. Living Goods is scaling up its program and may need additional funding in the future, but has not yet been limited by funding. We published an update on Living Goods in mid-2016. Our 2014 review of Living Goods is here.
• Organizations working on micronutrient fortification: We believe that food fortification with certain micronutrients can be a highly effective intervention. For each of these organizations, we believe they may be making a significant difference in the reach and/or quality of micronutrient fortification programs but we have not yet been able to establish clear evidence of their impact. The limited analysis we have done suggests that these programs are likely not significantly more cost-effective than our top charities—if they were, we might put more time into this research or recommend a charity based on less evidence.
• Food Fortification Initiative (FFI). FFI works to reduce micronutrient deficiencies (especially folic acid and iron deficiencies) by doing advocacy and providing assistance to countries as they design and implement flour and rice fortification programs. We have not yet completed a full evidence review of iron and folic acid fortification, but our initial research suggests it may be competitively cost effective with our other priority programs. Because FFI typically provides support alongside a number of other actors and its activities vary widely among countries, it is difficult to assess the impact of its work. Our full review is here.
• Global Alliance for Improved Nutrition (GAIN) – Universal Salt Iodization (USI) program. GAIN’s USI program supports national salt iodization programs. We have spent the most time attempting to understand GAIN’s impact in Ethiopia. Overall, we would guess that GAIN’s activities played a role in the increase in access to iodized salt in Ethiopia, but we do not yet have confidence about the extent of GAIN’s impact. It is our understanding that GAIN’s USI work will be constrained by funding in the next year. Our review of GAIN, published in 2016 based on research done in 2015, is here.
• IGN. Like GAIN-USI, IGN supports (via advocacy and technical assistance rather than implementation) salt iodization. IGN is small, and GiveWell-influenced funding has made up a large part of its funding in the past year. This year, we published an update on our investigation into IGN’s work in select countries in 2015 and notes from our conversation with IGN to learn about its progress in 2016 and plans for 2017. It is our understanding that IGN will be constrained by funding in the next year. Our review of IGN, from 2014, is here.
• Project Healthy Children (PHC). PHC aims to reduce micronutrient deficiencies by providing assistance to small countries as they design and implement food fortification programs. Our review is preliminary and in particular we do not have a recent update on how PHC would use additional funding. Our review of PHC, published in 2016 but based on information collected in 2015, is here.

Our research process in 2016

We plan to detail the work we completed this year in a future post as part of our annual review process. Much of this work, particularly our experimental work and work on prioritizing interventions for further investigation, is aimed at improving our recommendations in future years. Here we highlight the key research that led to our current recommendations. See our process page for our overall process.

• As in previous years, we did intensive follow up with each of our top charities, including publishing updated reviews mid-year. We had several conversations by phone with each organization, met in person with Deworm the World, SCI, and AMF (over the course of a 4-day site visit to Ghana), and reviewed documents they shared with us.
• In 2015 and 2016, we sought to expand top charity room for more funding and consider alternatives to our top charities by inviting other groups that work on deworming, bednet distributions, and micronutrient fortification to apply. This led to adding Sightsavers, the END Fund, Project Healthy Children, and Food Fortification Initiative to our lists this year. Episcopal Relief & Development’s NetsforLife® Program, Micronutrient Initiative, and Nothing but Nets declined to fully participate in our review process.
• We completed intervention reports on voluntary medical male circumcision (VMMC) and cataract surgery. We asked VMMC groups PSI (declined to fully participate) and the Centre for HIV and AIDS Prevention Studies (pending) to apply. We had conversations with several charities working on cataract surgery and have not yet asked any to apply.
• We did very preliminary investigations into a large number of interventions and prioritized a few for further work. This led to interim intervention reports on seasonal malaria chemoprevention (SMC), integrated community case management (iCCM), and ready-to-use therapeutic foods for treating severe acute malnutrition, and to our recommending Malaria Consortium for its work on SMC.
• We stayed up to date on the research for bednets, cash transfers, and deworming. We published a report on insecticide resistance and its implications for bednet programs. A blog post on our work on deworming is forthcoming. We did not find major new research on cash transfers that affected our recommendation of GiveDirectly.

Giving to GiveWell vs. top charities

GiveWell and the Open Philanthropy Project are planning to split into two organizations in the first half of 2017. The split means that it is likely that GiveWell will retain much of the assets of the previously larger organization while reducing its expenses. We think it’s fairly likely that our excess assets policy will be triggered and that we will grant out some unrestricted funds. Given that expectation, our recommendation to donors is:

• If you have supported GiveWell’s operations in the past, we ask that you consider maintaining your support. It is fairly likely that these funds will be used this year for grants to top charities, but giving unrestricted signals your support for our operations and allows us to better project future revenue and make plans based on that. Having a strong base of consistent support allows us to make valuable hires when opportunities arise and minimize staff time spent on fundraising.
• If you have not supported GiveWell’s operations in the past, we ask that you consider checking the box on our donate form to add 10% to help fund GiveWell’s operations. In the long term, we seek to have a model where donors who find our research useful contribute to the costs of creating it, while holding us accountable to providing high-quality, easy-to-use recommendations.

Footnotes:

* For example, if $30 million were available to fund gaps of $10 million, $5 million, and $100 million, we would recommend allocating the funds so that the $10 million and $5 million gaps were fully filled and the $100 million gap received $15 million.
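The footnote gives only an example, not a general rule. One rule that reproduces its numbers is an equal split with caps: divide the remaining budget equally among unfilled gaps, cap each at its gap size, and repeat with whatever is left over. The sketch below assumes that rule; the function name and the rule itself are our illustration, not a procedure confirmed by the post.

```python
def allocate_equal_with_caps(budget, gaps):
    """Split `budget` equally across funding gaps, never giving a gap more
    than its size, and redistribute any excess among still-open gaps.
    Assumed rule for illustration only; amounts are in millions of USD."""
    allocation = [0.0] * len(gaps)
    remaining = float(budget)
    open_gaps = [i for i, gap in enumerate(gaps) if gap > 0]

    while remaining > 1e-9 and open_gaps:
        share = remaining / len(open_gaps)
        still_open = []
        for i in open_gaps:
            grant = min(share, gaps[i] - allocation[i])
            allocation[i] += grant
            remaining -= grant
            if gaps[i] - allocation[i] > 1e-9:
                still_open.append(i)
        open_gaps = still_open

    return allocation


# The footnote's example: $30 million across gaps of $10M, $5M, and $100M.
print(allocate_equal_with_caps(30, [10, 5, 100]))  # [10.0, 5.0, 15.0]
```

Filling the smallest gaps first would give the same result for this example, so the footnote alone does not distinguish between the two rules.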

The post Our updated top charities for giving season 2016 appeared first on The GiveWell Blog.

### Deworming might have huge impact, but might have close to zero impact

Tue, 07/26/2016 - 12:48

We try to communicate that there are risks involved with all of our top charity recommendations, and that none of our recommendations are a “sure thing.”

Our recommendation of deworming programs (the Schistosomiasis Control Initiative and the Deworm the World Initiative), though, carries particularly significant risk (in the sense of possibly not doing much/any good, rather than in the sense of potentially doing harm). In our 2015 top charities announcement, we wrote:

Most GiveWell staff members would agree that deworming programs are more likely than not to have very little or no impact, but there is some possibility that they have a very large impact. (Our cost-effectiveness model implies that most staff members believe there is at most a 1-2% chance that deworming programs conducted today have similar impacts to those directly implied by the randomized controlled trials on which we rely most heavily, which differed from modern-day deworming programs in a number of important ways.)

The goal of this post is to explain this view and why we still recommend deworming.

Some basics for this post

What is deworming?

Deworming is a program that involves treating people at risk of intestinal parasitic worm infections with parasite-killing drugs. Mass treatment is very inexpensive (in the range of $0.50-$1 per person treated), and because treatment is cheaper than diagnosis and side effects of the drugs are believed to be minor, typically all children in an area where worms are common are treated without being individually tested for infections.

Does it work?

There is strong evidence that administration of the drugs reduces worm loads, but many of the infections appear to be asymptomatic and evidence for short-term health impacts is thin (though a recent meta-analysis that we have not yet fully reviewed reports that deworming led to short-term weight gains). The main evidence we rely on to make the case for deworming comes from a handful of longer term trials that found positive impacts on income or test scores later in life.

For more background on deworming programs see our full report on combination deworming.

Why do we believe it’s more likely than not that deworming programs have little or no impact?

The “1-2% chance” doesn’t mean that we think that there’s a 98-99% chance that deworming programs have no effect at all, but that we think it’s appropriate to use a 1-2% multiplier compared to the impact found in the original trials – this could be thought of as assigning some chance that deworming programs have no impact, and some chance that the impact exists but will be smaller than was measured in those trials. For instance, as we describe below, worm infection rates are much lower in present contexts than they were in the trials.

Where does this view come from?

Our overall recommendation of deworming relies heavily on a randomized controlled trial (RCT) (the type of study we consider to be the “gold standard” in terms of causal attribution) first written about in Miguel and Kremer 2004 and followed by 10-year follow up data reported in Baird et al. 2011, which found very large long-term effects on recipients’ income. We reviewed this study very carefully (see here and here) and we felt that its analysis largely held up to scrutiny.

There’s also some other evidence, including a study that found higher test scores in Ugandan parishes that were dewormed in an earlier RCT, and a high-quality study that is not an RCT but found especially large increases in income in areas in the American South that received deworming campaigns in the early 20th century. However, we consider Baird et al. 2011 to be the most significant result because of its size and the fact that the follow-up found increases in individual income.

While our recommendation relies on the long-term effects, the evidence for short-term effects of deworming on health is thin, so we have little evidence of a mechanism through which deworming programs might bring about long-term impact (though a recent meta-analysis that we have not yet fully reviewed reports that deworming led to short-term weight gains). This raises concerns about whether the long-term impact exists at all, and may suggest that the program is more likely than not to have no significant impact.

Even if there is some long-term impact, we downgrade our expectation of how much impact to expect, due to factors that differ between real-world implementations and the Miguel and Kremer trial. In particular, worm loads were unusually high during the Miguel and Kremer trial in Western Kenya in 1998, in part due to flooding from El Niño, and baseline infection rates are lower in places where SCI and Deworm the World work than in the relevant studies.

Our cost-effectiveness model estimates that the baseline worm infections in the trial we mainly rely on were roughly 4 to 5 times as high as in places where SCI and Deworm the World operate today, and that El Niño further inflated those worm loads during the trial. (These estimates combine data on the prevalence of infections and intensity of infections, and so are especially rough because there is limited data on whether prevalence or intensity of worms is a bigger driver of impact). Further, we don’t know of any evidence that would allow us to disconfirm the possibility that the relationship between worm infection rates and the effectiveness of deworming is nonlinear, and thus that many children in the Miguel and Kremer trial were above a clinically relevant “threshold” of infection that few children treated by our recommended charities are above.

We also downgrade our estimate of the expected value of the impact based on: concerns that the limited number of replications and lack of obvious causal mechanism might mean there is no impact at all, expectation that deworming throughout childhood could have diminishing returns compared to the ~2.4 marginal years of deworming provided in the Miguel and Kremer trial, and the fact that the trial only found a significant income effect on those participants who ended up working in a wage-earning job. See our cost-effectiveness model for more information.

Why do we recommend deworming despite the reasonably high probability that there’s no impact?

Because mass deworming is so cheap, there is a good case for donating to support deworming even when in substantial doubt about the evidence. We estimate the expected value of deworming programs to be as cost-effective as any program we’ve found, even after the substantial adjustments discussed above: our best guess considering those discounts is that it’s still roughly 5-10 times as cost-effective as cash transfers, in expectation. But that expected value arises from combining the possibility of potentially enormous cost-effectiveness with the alternative possibility of little or none.
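The arithmetic behind this kind of bet is a probability-weighted average. The sketch below is only an illustration: the two scenarios, their probabilities, and their multiples are hypothetical placeholders chosen so that the result lands inside the 5-10x range quoted above. They are not GiveWell's estimates, and `expected_multiple` is a made-up helper name.

```python
# Hypothetical scenarios (NOT GiveWell's figures):
# (probability, cost-effectiveness as a multiple of cash transfers)
scenarios = [
    (0.90, 0.0),   # little or no impact
    (0.10, 75.0),  # very large impact
]


def expected_multiple(scenarios):
    """Return the probability-weighted average multiple across scenarios."""
    total_probability = sum(p for p, _ in scenarios)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("scenario probabilities must sum to 1")
    return sum(p * multiple for p, multiple in scenarios)


print(expected_multiple(scenarios))
```

Under these placeholder inputs, a program that most likely does nothing still comes out at roughly 7.5 times cash transfers in expectation, because the small chance of a very large effect dominates the average.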

GiveWell isn’t seeking certainty – we’re seeking outstanding opportunities backed by relatively strong evidence, and deworming meets that standard. For donors interested in trying to do as much good as possible with their donations, we think that deworming is a worthwhile bet.

What could change this recommendation – will more evidence be collected?

To our knowledge, there are currently no large, randomized controlled trials being conducted that are likely to be suitable for long-term follow up to measure impacts on income when the recipients are adults, so we don’t expect to see a high-quality replication of the Miguel and Kremer study in the foreseeable future.

That said, there are some possible sources of additional information:

• The follow-up data that found increased incomes among recipients in the original Miguel and Kremer study was collected roughly 10 years after the trial was conducted. Our understanding is that 15 year follow-up data has been collected and we expect to receive an initial analysis of it from the researchers this summer.
• A recent study from Uganda didn’t involve data collection for the purpose of evaluating a randomized controlled trial; rather, the paper identified an old, short-term trial of deworming and an unrelated data set of parish-level test scores collected by a different organization in the same area. Because some of the parishes overlap, it’s possible to compare the test scores from those that were dewormed to those that weren’t. It’s possible that more overlapping data sets will be discovered and so we may see more similar studies in the future.
• We’ve considered whether to recommend funding for an additional study to replicate Baird et al. 2011: run a new deworming trial that could be followed for a decade to track long term income effects. However, it would take 10+ years to get relevant results, and by that time deworming may be fully funded by the largest global health funders. It would also need to include a very large number of participants to be adequately powered to find plausible effects (since the original trial in Baird et al. 2011 benefited from particularly high infection rates, which likely made it easier to detect an effect), so it would likely be extremely expensive.

For the time being, based on our best guess about the expected cost-effectiveness of the program when all the factors are considered, we continue to recommend deworming programs.

The post Deworming might have huge impact, but might have close to zero impact appeared first on The GiveWell Blog.

### Weighing organizational strength vs. estimated cost-effectiveness

Thu, 07/14/2016 - 15:00

A major question we’ve asked ourselves internally over the last few years is how we should weigh organizational quality versus the value of the intervention that the organization is carrying out.

In particular, is it better to recommend an organization we’re very impressed by and confident in that’s carrying out a good program, or better to recommend an organization we’re much less confident in that’s carrying out an exceptional program? This question has been most salient when deciding how to rank giving to GiveDirectly vs giving to the Schistosomiasis Control Initiative.

GiveDirectly vs SCI

GiveDirectly is an organization that we’re very impressed by and confident in, more so than any other charity we’ve come across in our history. Reasons for this:

But we estimate that marginal dollars to the program it implements (direct cash transfers) are significantly less cost-effective than bednets and deworming programs. Excluding organizational factors, our best guess is that deworming programs, which SCI supports, are roughly 5 times as cost-effective as cash transfers. As discussed further below, our cost-effectiveness estimates are generally based on extremely limited information and are therefore extremely rough, so we are cautious about assigning too much weight to them.

Despite the better cost-effectiveness of deworming, we’ve had significant issues with SCI as an organization. The two most important:

• We originally relied on a set of studies showing dramatic drops in worm infection coinciding with SCI-run deworming programs to evaluate SCI’s track record; we later discovered flaws in the study methodology that led us to conclude that they did not demonstrate that SCI had a strong track record. We wrote about these flaws in 2013 and 2014.
• We’ve seen limited and at times erroneous financial information from SCI over the years. We have seen some improvements in SCI’s financial reporting in 2016, but we still have some concerns, as detailed in our most recent report.

More broadly, both of these cases are examples of general problems we’ve had communicating with SCI over the years. And we don’t believe SCI’s trajectory has generated evidence of overall impressiveness comparable to GiveDirectly’s, discussed above.

Which should we recommend?

One argument is that GiveWell should only recommend exceptional organizations, and so the issues we’ve seen with SCI should disqualify them.

But, we think that the ~5x difference in cost-effectiveness is meaningful. There’s a large degree of uncertainty in our cost-effectiveness analyses, which is something we’ve written a lot about in the past, but this multiplier appears somewhat stable (it has persisted in this range over time, and currently is consistent with the individual estimates of many staff members), and a ~5x difference gives a fair amount of room for SCI to do more good even accounting both for possible errors in our analysis and for differences in organizational efficiency.

A separate argument that we’ve made in the past is that great organizations have upside that goes beyond the value of conducting the specific program they’re implementing. For example, early funding to a great organization may allow it to grow faster and increase the amount of money going to its program globally, either through proving the model or through its own fundraising. And GiveDirectly has shown some propensity for potentially innovative projects, as discussed above.

We think that earlier funding to GiveDirectly had this benefit, but it’s less of a consideration now that GiveDirectly is a more mature organization. We believe this upside exists for what we’ve called “capacity-relevant” funding, which is the type of funding need that we consider to be most valuable when ranking the importance of marginal dollars to each of our top charities, and refers to funding gaps that we expect will allow organizations to grow in an outsized way in the future, for instance by going into a new country.

Bottom line

Our most recent recommendations ranked SCI’s funding gap higher than GiveDirectly’s due to SCI’s cost-effectiveness. We think that SCI is a strong organization overall, despite the issues we’ve noted, and we think that the “upside” for GiveDirectly is limited on the margin, so ultimately our estimated 5x multiplier looks meaningful enough to be determinative.

The post Weighing organizational strength vs. estimated cost-effectiveness appeared first on The GiveWell Blog.

### Mid-year update to top charity recommendations

Thu, 06/23/2016 - 17:25

This post provides an update on what we’ve learned about our top charities in the first half of 2016.

We continue to recommend all four of our top charities. Our recommendation for donors seeking to directly follow our advice remains the same: we recommend they give to the Against Malaria Foundation (AMF), which we believe has the most valuable current funding gap.

Below, we provide:

• Updates on our view about AMF, which we consider the most important information we’ve learned in the last half-year (More)
• Updates on other top charities (More)
• A discussion of the reasoning behind our current recommendation to donors (More)

Background

AMF (www.againstmalaria.com) provides funding for long-lasting insecticide-treated net distributions (for protection against malaria) in developing countries. There is strong evidence that distributing nets reduces child mortality and malaria cases. AMF has relatively strong reporting requirements for its distribution partners and provides a level of public disclosure and tracking of distributions that we have not seen from any other net distribution charity. Overall, AMF is the best giving opportunity we are currently aware of. That said, we have concerns about AMF’s recent monitoring and transparency that we plan to focus on in the second half of the year.

Updates from the last six months

We are more confident than we were before in AMF’s ability to successfully complete deals with most countries it engages with. Over the past few years, our key concern about AMF has been whether it would be able to effectively absorb additional funding and sign distribution agreements with governments and other partners. At the end of 2013, we stopped recommending AMF because we felt it did not require additional funding, and our end-of-year analyses in 2014 and 2015 discussed this issue in depth. In early 2016, AMF signed agreements to fund two large distributions (totaling $37 million) of insecticide-treated nets in countries it has not previously worked in. We now believe that AMF has effectively addressed this concern. AMF is in discussions for several additional large distributions. AMF currently holds approximately $23.3 million, and we believe that it is very likely to have to slow its work if it receives less than an additional $11 million very quickly. It is possible that it could also use up to approximately $18 million more during this calendar year.

It may be more valuable to give to AMF now than it will be later this year or next year. AMF’s funding gap may be time-sensitive because:

1. AMF is in several discussions about distributions that would take place in 2017. It has told us that it needs to make decisions within a month or two about which discussions to pursue. We don’t have a clear sense for how long before a distribution AMF needs to be able to commit funding, and note that, for example, AMF committed in February 2016 to a distribution in Ghana taking place in June to August 2016. That said, it seems quite plausible that AMF needs to commit soon to distributions taking place in 2017.
2. We don’t know whether there will be large funding gaps for nets in 2018 and beyond. The price of nets has been decreasing and the size of grants from the two largest funders of nets, the Global Fund to fight AIDS, TB, and Malaria and the President’s Malaria Initiative, is not yet known. (The Global Fund is holding its replenishment conference in September, in which donor governments are asked to make three-year pledges, so we may know more before the end of the year.) It’s possible that these funders will fund all or nearly all of the net needs in countries other than those that are particularly hard to work in for 2018. If that happens, gifts to AMF in late 2016 could be less valuable than gifts in the next couple of months. (This could also mean that, if AMF fills gaps in 2017 that would have been filled by other funders in 2018, gifts now are less valuable than they have been in the past. We have added an adjustment for this to our cost-effectiveness analysis, but given the high degree of uncertainty, this could be a more important factor than we are currently adjusting for.)

Notwithstanding the above, we have important questions about AMF that we plan to continue to investigate. None of these developments caused us to change our recommendation about giving to AMF, but they are important considerations for donors:

1. Monitoring data: We have new concerns about AMF’s monitoring of its distributions, particularly its post-distribution check-up (PDCU) surveys. These surveys are a key part of our confidence in the quality of AMF’s distributions. For Malawi, where most of the PDCUs completed to date have been done, our key concern is that villages that surveyors visit are not selected randomly, but are instead selected by hand by staff of the organization that both implements and monitors the distributions, which seems fairly likely to lead to bias in the results. We have also seen results from the first two PDCUs from DRC. We have not yet looked at the DRC results in-depth or discussed them with AMF, but there appear to be major problems in how the surveys were carried out (particularly a high percentage of internally inconsistent data – around 40%-50%) and, if we believe the remaining data, fairly high rates of missing or unhung nets (~20% at 6-months) and nets that deteriorated quickly (65% were in ‘very good’ or ‘good’ condition at 6-months).
2. Transparency: Recently, AMF has been slower to share documentation from some distributions. AMF has told us that it has this documentation and we are concerned that AMF is not being as transparent as it could be. We believe this documentation is important for monitoring the quality of AMF’s distributions; it includes PDCUs, results from re-surveying 5% of households during pre-distribution registrations (AMF has told us that this is a standard part of its process, but we have not seen results from any distributions), and malaria case rate data from Malawi that AMF has told us it has on hand. AMF attributes the delays to lack of staff capacity. We plan to write more about monitoring and transparency in a future post.
3. Insecticide resistance: Insecticide resistance (defined broadly as “any ways in which populations of mosquitoes adapt to the presence of insecticide-treated nets (ITNs) in order to make them less effective”) is a major threat to the effectiveness of ITNs. Insecticide resistance seems to be fairly common across sub-Saharan Africa, and it seems that resistance is increasing. It remains difficult to quantify the impact of resistance, but our very rough best guess (methodology described in more detail below) is that ITNs are roughly one-third less effective in the areas where AMF is working than they would be in the absence of insecticide resistance. We continue to believe, despite resistance, ITNs remain a highly cost-effective intervention. See our full report for more detail.

• To better understand whether AMF is providing nets that would not otherwise have been funded, we considered five cases where AMF considered funding a distribution and did not ultimately provide funding. We then looked at whether other funders stepped in and how long of a delay resulted from having to wait for other funders. We published the details here. In short, most distributions took place later than they would have if AMF had funded them (on average over a year), which probably means that the people were not protected with nets during that time. We feel that these case studies provide some evidence that nets that AMF buys do not simply displace nets from other funding sources.
• We’ve noted in the past that the delays in AMF signing agreements for distributions may have been due to AMF’s hesitation about paying for the costs of a distribution other than the purchase price of nets. For the distributions that AMF has signed this year, AMF has agreed to pay for some non-net costs, particularly the costs of PDCUs. The Global Fund to fight AIDS, TB, and Malaria is paying for the other non-net costs of the distribution. AMF’s willingness to fund some of the non-net costs may have made it easier for it to sign distribution agreements and put funds to use more quickly.

Updates on our other top charities

Schistosomiasis Control Initiative (full report)

Background

SCI (www3.imperial.ac.uk/schisto) works with governments in sub-Saharan Africa to create or scale up deworming programs (treating children for schistosomiasis and other intestinal parasites). SCI’s role has primarily been to identify recipient countries, provide funding to governments for government-implemented programs, provide advisory support, and conduct research on the process and outcomes of the programs.

In past years, we’ve written that we had significant concerns about SCI’s financial reporting and financial management that meant we lacked high-quality, basic information about how SCI was spending funding and how much funding it had available to allocate to programs. We decided to focus our work in the first half of 2016 on this issue. We felt that seeing significant improvements in the quality of SCI’s finances was necessary for us to continue recommending SCI.

We believe that deworming is a program backed by relatively strong evidence. We have reservations about the evidence, but we think the potential benefits are great enough, and costs low enough, to outweigh these reservations. SCI has conducted studies in about half of the countries it works in (including the countries with the largest programs) to determine whether its programs have reached a large proportion of children targeted. These studies have generally found moderately positive results, but have major methodological limitations. We have not asked SCI for monitoring results since last year.

Updates from the last six months

We published a separate blog post on our work on SCI so far this year. Our main takeaways:

• SCI has begun producing higher-quality financial documents that allow us to learn some basic financial information about SCI.
• We learned of two substantial errors in SCI’s financial management and reporting: (1) a July 2015 grant from GiveWell for about $333,000 was misallocated within Imperial College, which houses SCI, until we noticed it was missing from SCI’s revenue in March 2016; and (2) in 2015, SCI underreported how much funding it would have from other sources in 2016, leading us to overestimate its room for more funding by $1.5 million.
• The clarity of our communication with SCI about its finances has improved, but there is still substantial room for further improvement.

We feel that SCI has improved, but we would still rank our other top charities ahead of it in terms of our ability to communicate and understand their work. Given this situation, we continue to recommend SCI now and think that SCI is reasonably likely to retain its top charity status at the end of 2016. We plan, in the second half of 2016, to expand the scope of our research on SCI.

We have not asked SCI for an update on its room for more funding (due to our focus on financial documents in the first half of the year). It’s our understanding that funds that SCI receives in the next six months will be allocated to work in 2017 and beyond. Because of this, we don’t believe that SCI has a pressing need for additional funds, though our guess is that it will have room for more funding when we next update our recommendations in November and that funds given before then will help fund gaps for the next budget year.

GiveDirectly (full report)

Background

GiveDirectly (www.givedirectly.org) transfers cash to households in developing countries via mobile phone-linked payment services. It targets extremely low-income households. The proportion of total expenses that GiveDirectly has delivered directly to recipients is approximately 83% overall. We believe that this approach faces an unusually low burden of proof, and that the available evidence supports the idea that unconditional cash transfers significantly help people.

We believe GiveDirectly to be an exceptionally strong and effective organization, even more so than our other top charities. It has invested heavily in self-evaluation from the start, scaled up quickly, and communicated with us clearly. It appears that GiveDirectly has been effective at delivering cash to low-income households. GiveDirectly has one major randomized controlled trial (RCT) of its impact and took the unusual step of making the details of this study public before data was collected (more). It continues to experiment heavily.

Updates from the last six months

• GiveDirectly announced an initiative to test a “basic income guarantee” to provide long-term, ongoing cash transfers sufficient for basic needs. The cost-effectiveness of providing this form of cash transfers may be different from the one-time transfers GiveDirectly has made in the past.
• GiveDirectly continues to have more room for more funding than we expect GiveWell-influenced donors to fill in the next six months. Its top priority is funding the basic income guarantee project.
• In late 2015 and early 2016, when GiveDirectly began enrolling participants in Homa Bay county, Kenya, it experienced a high rate of people refusing to be enrolled in the program. The reason for this is not fully clear, though GiveDirectly believes in some cases local leaders advised people to not trust the program. While GiveDirectly has temporarily dealt with this setback by moving its operations to a different location in Homa Bay county, it is possible that similar future challenges could reduce GiveDirectly’s ability to commit as much as it currently projects.
• GiveDirectly has reached an agreement with a major funder which provides a mechanism through which multiple benchmarking projects (projects comparing cash transfers to other types of aid programs) can be launched. The major funder may fund up to $15 million for four different benchmarking projects with GiveDirectly. GiveDirectly plans to make available up to $15 million of the grant it received from Good Ventures in 2015 to match funds committed by the major funder. GiveDirectly and its partner have not yet determined which aid programs will be evaluated or how the evaluations will be carried out.
• We are reasonably confident that GiveDirectly could effectively use significantly more funding than we expect it to receive, including an additional $30 million for additional cash transfers in 2016, though scaling up to this size would require a major acceleration in the second half of the year. We have not asked GiveDirectly how funding above this amount would affect its activities and plans (because we think it is very unlikely that GiveDirectly will receive more than $30 million from GiveWell-influenced supporters before our next update in November).

Deworm the World (full report)

Background

Deworm the World (www.evidenceaction.org/deworming), led by Evidence Action, advocates for, supports, and evaluates government-run school-based deworming programs (treating children for intestinal parasites).

We believe that deworming is a program backed by relatively strong evidence. We have reservations about the evidence, but we think the potential benefits are great enough, and costs low enough, to outweigh these reservations. Deworm the World retains monitors whose reports indicate that the deworming programs it supports successfully deworm children.

Updates from the last six months

• We asked Deworm the World whether additional funding in the next six months would change its activities or plans. It told us that it does not expect funding to be the bottleneck to any work in that time. We’d guess that there is a very small chance that it will encounter an unexpected opportunity and be bottlenecked by funding before our next update in November.
• Deworm the World appears to be making progress expanding to new countries. It has made a multi-year commitment to provide technical assistance and resources to Cross River state, Nigeria for its school-based deworming program (the first deworming is scheduled for the end of this month), and is undertaking a nationwide prevalence survey in Pakistan.
• In the past, we have focused our review of Deworm the World on its work in India. We are in the process of learning more about its work in other locations, particularly Kenya. The monitoring we have seen from Kenya appears to be high quality.

Summary of key considerations for top charities

The table below summarizes the key considerations for our four top charities. With the exception of modest changes to room for more funding, our high-level view of our top charities, as summarized in the table below, is the same as at our last update in November 2015.

| Consideration | AMF | Deworm the World | GiveDirectly | SCI |
| --- | --- | --- | --- | --- |
| Program estimated cost-effectiveness (relative to cash transfers) | ~10x | ~10x | Baseline | ~5x |
| Directness and robustness of the case for impact | Strong | Moderate | Strongest | Moderate |
| Transparency and communication | Strong | Strong | Strongest | Weakest |
| Ongoing monitoring and likelihood of detecting future problems | Strong | Strong | Strongest | Weakest |
| Organizational track record of rolling out program | Moderate | Moderate | Strong | Strong |
| Room for more funding | High | Limited | High | Likely moderate (not investigated) |

Reasoning behind our current recommendation to donors

Our recommendation for donors seeking to directly follow our advice is to give to AMF, which we believe has the most valuable current funding gap. We believe AMF will likely have opportunities to fund distributions this year which it will not be able to fund without additional funding. Due to the excellent cost-effectiveness of AMF’s work, we consider this a highly valuable funding gap to fill.

Our current estimate is that on average AMF saves a life for about every $3,500 that it spends; this is an increase from our November 2015 estimate and reflects changes to our cost-effectiveness model as well as some of our inputs into bed nets’ cost-effectiveness. As always, we advise against taking cost-effectiveness estimates literally and view them as highly uncertain.

The below table lays out our ranking of funding gaps for June to November 2016. The first million dollars to a charity can have a very different impact from, e.g., the 20th million dollars. Accordingly, our ranking of individual funding gaps accounts for both (a) the quality of the charity and the good accomplished by its program, per dollar, and (b) whether a given level of funding is highly or only marginally likely to be needed in the next six months.

We consider funding that allows a charity to implement more of its core program (without substantial benefits beyond the direct good accomplished by this program) to be “execution funding.” We’ve separated this funding into three levels:

• Level 1: the amount we expect a charity to need in the coming year. If a charity has less funding than this level, we think it is more likely than not that it will be bottlenecked (or unable to carry out its core program to the fullest extent) by funding in the coming year. For this mid-year update, we have focused on funds that are needed before our next update in November, with the exception of SCI where we believe funds will not affect its work until next year.
• Level 2: if a charity has this amount, we think there is an ~80% chance that it will not be bottlenecked by funding.
• Level 3: if a charity has this amount, we think there is a ~95% chance that it will not be bottlenecked by funding.

(Our rankings can also take into account whether a gap is “capacity-relevant” or providing an incentive to engage in our process. We do not currently believe that our top charities have capacity-relevant gaps and are not planning to make mid-year incentive grants, so we haven’t gone into detail on that here. More details on how we think about capacity-relevant and execution gaps in this post.)

| Priority | Charity | Amount (millions) | Type | Description | Comment |
| --- | --- | --- | --- | --- | --- |
| 1 | AMF | $11.3 | Execution level 1 | Fund distributions in two countries that AMF is in discussions with but does not have sufficient funding for | AMF is strongest overall |
| 2 | AMF | $7.3 | Execution level 2 | Fund the next largest gap on the list of remaining 2016-17 gaps in African countries | – |
| 3 | SCI | $10.1 | Execution level 1 | Very rough because we haven’t discussed this with SCI; further gaps not estimated | Not as strong as AMF in isolation, so ranked below for same type of gap |
| 4 | AMF | $10.5 | Execution level 3 | Fund the final two AMF-relevant gaps on the list of remaining 2016-17 gaps in African countries | – |
| 5 | GiveDirectly | $22.2 | Execution level 1 | Basic income guarantee program and additional standard transfers | Not as cost-effective as bednets or deworming, so lower priority |
| 6 | Deworm the World | $6.0 | Execution level 3 | A rough guess at the funding needed to cover a 3-year deworming program in a new country | Strong cost-effectiveness, but unlikely to need funds in the short-term |
| 6 | GiveDirectly | $7.8 | Execution level 2 | Funding for additional structured projects; further gaps not estimated | – |

We are not recommending that Good Ventures make grants to our top charities for this mid-year refresh. In November 2015, we recommended that Good Ventures fund 50% of our top charities’ highest-value funding gaps for the year and Good Ventures gave $44.4 million to our top four charities. We felt this approach resulted in Good Ventures funding its “fair share” while avoiding creating incentives for other donors to avoid the causes we’re interested in, which could lead to less overall funding for these causes in the long run. (More on this reasoning available here.)

The post Mid-year update to top charity recommendations appeared first on The GiveWell Blog.

### What we’ve learned about SCI this year

Mon, 06/20/2016 - 18:19

In past years, we’ve written that we had significant concerns about the financial reporting and financial management of the Schistosomiasis Control Initiative (SCI), one of our top charities since 2011. Our concerns have included:

• We had not been able to learn important and basic financial information about SCI. Despite substantial effort, before 2016 we were not able to determine the total amount of funding that SCI held at any one time. We also had very little information on what SCI’s funds were spent on within country programs.
• We found that SCI’s financial reports were prone to containing errors.

Due to these concerns, we decided to focus our research on SCI in preparation for our June 2016 top charities update only on the quality of its financial reporting and financial management. We felt that seeing significant improvements in the quality of SCI’s finances was necessary for us to continue recommending SCI.
Our main takeaways from our research on SCI so far in 2016:

• SCI has begun producing higher-quality financial documents that allow us to learn some basic financial information about SCI, including the total amount of funding it holds, how much funding has been allocated to its upcoming budget year, and how it spent restricted and unrestricted funds by country in the previous budget year. We have also been able to learn somewhat more about how its funds are spent within national deworming programs.
• We learned of two substantial errors in SCI’s financial management and reporting: (1) a July 2015 grant from GiveWell for about $333,000 was misallocated within Imperial College, which houses SCI, until we noticed it was missing from SCI’s revenue in March 2016; and (2) in 2015, SCI provided inaccurate information about how much funding it would have from other sources in 2016, leading us to overestimate its room for more funding by $1.5 million.
• The clarity of our communication with SCI about these financial errors and its plans for the upcoming year has improved in comparison with previous years.

Details follow.

SCI’s financial documents in 2016

As of the beginning of April 2016, SCI had $15.8 million ($8.6 million in restricted funding and $7.2 million in unrestricted funding) available to allocate to its April 2016 to March 2017 budget year, according to its recent financial documents. Despite our discovery of additional financial errors this year (discussed below), we feel fairly confident that this information is accurate. We’ve seen transaction-level detail for each of SCI’s accounts, asked SCI’s new Finance and Operations Manager questions about the data, and largely received clear and reasonable answers.

SCI also sent us detailed breakdowns of in-country spending in its 2015-16 budget year for six of its country programs. Although this spending data gives us some information about what SCI’s funds were spent on within country programs last year, we note that we have not seen spending breakdowns for the eleven other deworming programs supported by SCI in 2015-16. (Additional concerns about this spending data are discussed in our full review of SCI.)

Despite the improvements in SCI’s financial reporting that have allowed us to learn some basic financial information, we remain concerned about SCI’s use of Imperial College’s accounting system, which seems ill-suited to SCI’s needs. SCI has told us that it began using new accounting software in April 2016; we’re uncertain about the degree to which this will alleviate our concerns.

Financial errors we learned about in 2016

We’ve learned about two financial errors this year:

• Not realizing that it had not received a transfer of funds from GiveWell: In July 2015, we granted $333,414 to SCI, which included all donations we received designated for supporting SCI between February and May 2015. After reviewing SCI’s financial documents in March 2016, we informed SCI that the July 2015 funding did not appear to be accounted for. After investigating the issue, SCI found that the funds had been misallocated by Imperial College to a different part of the college. SCI did not receive the funds until April 2016. SCI has asked Imperial College why the error occurred, but has not yet received a substantive response.
• Underreporting available funding from DFID: In October 2015, SCI sent us its target treatment numbers for each national deworming program it supports, amounts of funding available from DFID and other large donors, and the amounts of additional funding required to deliver the targeted number of treatments and cover central expenditures for its April 2016 to March 2017 budget year. In March 2016, SCI sent GiveWell documents that indicated that around $1.5 million more funding was available from DFID to allocate to SCI’s 2016-2017 budget year than indicated in the October 2015 document. SCI told us that the October 2015 document included funding that was available from DFID to allocate to national deworming programs, but omitted $1.5 million in funding available from DFID to allocate to SCI’s central expenditures.

We consider both of these errors to be substantial. We are uncertain whether SCI would have ever received the funding from donations we collected on SCI’s behalf between February and May 2015 if we had not brought the issue to SCI’s attention. Our room for more funding analysis is a major factor in determining our funding recommendations to donors and to Good Ventures; an overestimation of SCI’s room for more funding by $1.5 million could have caused us to recommend donations to SCI that would have been better allocated to filling other funding gaps.

Our communication with SCI

Although we think that the financial errors we learned about in 2016 were substantial, we believe that it is a good sign that we were able to learn of these errors by communicating with SCI. In the past, we’ve noted that we’ve struggled to communicate effectively with SCI’s representatives, which sometimes meant that we were unable to clear up our confusion about inconsistencies we found in SCI’s documents.

We also feel that we’ve communicated clearly with SCI about its plans for the upcoming year and gained a better understanding of the factors that limit the delivery of additional deworming treatments in different contexts.

Bottom line

Given the improvements, we continue to recommend SCI now and think that SCI is a contender for a top charity recommendation at the end of 2016. We plan, in the second half of 2016, to expand the scope of our research on SCI to include looking at recent monitoring and evaluation, cost per treatment, and room for more funding in 2017 and beyond. We continue to have some concerns about SCI’s financial reporting and management (most notably, the errors noted above) and will be following up with SCI about our outstanding questions.

The post What we’ve learned about SCI this year appeared first on The GiveWell Blog.

### Our updated top charities for giving season 2015

Wed, 11/18/2015 - 14:06

We have refreshed our top charity rankings and recommendations. Our set of top charities and standouts is the same as last year’s, but we have introduced rankings and changed our recommended funding allocation, due to a variety of updates – particularly to our top charities’ room for more funding. In particular, we are recommending that Good Ventures, a foundation with which we work closely, support our top charities at a higher level than in previous years. This post includes our recommendations to Good Ventures, and gives our recommendations to individual donors after accounting for these grants.

Overall, we think the case for our top charities is stronger than in previous years, and room for more funding is greater.

Our top charities and recommendations for donors, in brief

Top charities

1. Against Malaria Foundation (AMF)
2. Schistosomiasis Control Initiative (SCI)
3. Deworm the World Initiative, led by Evidence Action
4. GiveDirectly

This year, we are ranking our top charities based on what we see as the value of filling their remaining funding gaps. Unlike in previous years, we do not feel a particular need for individuals to divide their allocation between the charities, since we are recommending that Good Ventures provide significant support to each. For those seeking our recommended allocation, we simply recommend giving to the top-ranked charity on the list, which is AMF.

Our recommendation takes the grants we are recommending to Good Ventures into account, as well as accounting for charities’ existing cash on hand and expected non-GiveWell-related fundraising, and recommends charities according to how much good additional donations (beyond these sources of funds) can do. (Otherwise, as explained below, Deworm the World would be ranked higher.) Thus, AMF’s #1 ranking is not based on its overall value as an organization, but based on the value of its remaining funding gap.

Standout charities

As with last year, we also provide a list of charities that we believe are strong standouts, though not at the same level (in terms of likely good accomplished per dollar) as our top charities. They are not ranked, and are listed in alphabetical order.

Below, we provide:

• An explanation of major changes in the past year that are not specific to any one charity.
• A summary of our top charities’ relative strengths and weaknesses, and how we would rank them if room for more funding were not an issue.
• A discussion of our refined approach to room for more funding.
• The recommendations we are making to Good Ventures, and how we rank our top charities after taking these grants (and their impact on room for more funding) into account.
• Detail on each of our top charities, including major changes over the past year, strengths and weaknesses for each, and our understanding of each organization’s room for more funding.
• The process we followed that led to these recommendations.
• A brief update on giving to support GiveWell’s operations vs. giving to our top charities.

Conference call to discuss recommendations

We are planning to hold a conference call at 5:30pm ET/2:30pm PT on Tuesday, December 1st to discuss our recommendations and answer questions.

If you’d like to join the call, please register using this online form. If you can’t make this date but would be interested in joining another call at a later date, please indicate this on the registration form.

Major changes in the last 12 months

Below, we summarize the major causes of changes to our recommendations (since last year).

Overall, the case for our top charities is stronger than it was in past years. The Deworm the World Initiative shared new monitoring and evaluation materials with us, so we are more confident than we were a year ago that it is a strong organization implementing high-quality programs. In addition, the extra year of work we have seen from AMF and GiveDirectly bolsters our view that they will be able to utilize additional funding effectively.

Our top charities have increased room for more funding. Last year, we expected donors following our recommendations to fully fill the most critical funding gaps of our top charities (excluding GiveDirectly) because they had limited room for more funding: GiveDirectly had a total funding gap of ~$40 million and our other three top charities had a total gap of ~$18 million. This year, all of our top charities have more room for more funding. We believe that GiveDirectly could absorb more than $80 million and other top charities together could collectively utilize more than $100 million. We do not expect donors following our recommendations to fully fill these gaps.

We are recommending that Good Ventures make larger grants to top charities. For reasons we will be detailing in a future post, we are recommending that Good Ventures make substantial grants to our top charities this year, though not enough to close their funding gaps.

Continued refinement of the concept of “room for more funding.” We’ve tried to create a much more systematic and detailed room for more funding analysis, because the stakes of this analysis have become higher due to (a) increased room for more funding across the board and (b) increased interest from Good Ventures in providing major support.

In past years, we’ve discussed charities’ room for more funding as a single figure without distinguishing between (a) the amount the charity would spend in the next 12 months, (b) the amount the charity needs to prevent it from slowing its work due to lack of funds, and (c) funding that would be especially important to the organization’s development and success (a dual benefit) in addition to expanding implementation of its program. This year, we’ve made three changes to our room for more funding analysis:

• We’ve made (a) an assessment of whether additional funds merely allow a charity to implement its program (“execution”) or (b) whether additional funds would be especially important to the charity’s development and success as an organization (“capacity-relevant”). We also explicitly note the role of incentives for meeting GiveWell’s top-charity criteria in our recommendations (we seek to ensure that each top charity receives at least $1 million, to encourage other organizations to seek to meet these criteria).
• We are explicitly assessing “execution”-related room for more funding based on our estimate of the probability that lack of funding will lead to a charity slowing its progress. We distinguish between Level 1, Level 2, and Level 3 “execution” funding gaps; a higher number means the money is less likely to be needed.
• We are now ranking “funding gaps,” not just ranking charities, because the first million dollars to a charity can have a very different impact from, e.g., the 20th million dollars. For example, if Charity A accomplishes more good per dollar with its programs than Charity B, we would rank Charity A above Charity B for a given type of gap (we would rank Charity A’s “Execution Level 1” gap above Charity B’s), but we might rank Charity B’s “Execution Level 1” gap (the amount of funding it will likely need) above Charity A’s “Execution Level 3” gap (the amount of funding gap it might, but probably will not, need to carry out more of its programs in the coming year).

We discuss these ideas in greater depth below.

Summary of key considerations for top charities

The table below summarizes the key considerations for our four top charities. More detail is provided below as well as in the charity reviews.

| Consideration | AMF | Deworm the World | GiveDirectly | SCI |
| --- | --- | --- | --- | --- |
| Program estimated cost-effectiveness (relative to cash transfers) | ~10x | ~10x | Baseline | ~5x |
| Directness and robustness of the case for impact | Strong | Moderate | Strongest | Moderate |
| Transparency and communication | Strong | Strong | Strongest | Weakest |
| Ongoing monitoring and likelihood of detecting future problems | Strong | Strong | Strongest | Weakest |
| Organizational track record of rolling out program | Moderate | Moderate | Strong | Strong |
| Room for more funding, after accounting for grants we are recommending to Good Ventures (more below) | Very high | Limited | Very high | High |

Overall, our ranking of the charities with room for more funding issues set aside (just considering a hypothetical dollar spent by the charity on its programs, without the “capacity-relevant funding” and “incentives” issues discussed below) would be:

1. AMF and Deworm the World
3. SCI
4. GiveDirectly

However, when we factor in room for more funding (including the impact of the grants we’re recommending to Good Ventures), the picture changes. More on this below.

Room for more funding analysis

Capacity-relevant funding and incentives

Capacity-relevant funding: additional funding can sometimes be crucial for a charity’s development and success as an organization. For example, it can contribute to a charity’s ability to experiment, expand, and ultimately have greater room for more funding over the long run. It can also be important for a charity’s ability to raise funds from non-GiveWell donors, which can be an important source of long-term leverage and can put the organization in a stronger overall position. We think of this sort of funding gap as particularly important to fill, because it can make a big difference over the long run; in particular, it may substantially affect the long-term quality of our giving recommendations.
“Capacity-relevant” funds can include (a) funds that are explicitly targeted at growth (e.g., funds to hire fundraising staff); (b) funds that enable a charity to expand into areas it hasn’t worked in before, which can lead to important learning about whether and how the charity can operate in the new location(s); and (c) funds that would be needed in order to avoid challenging contractions in a charity’s activities which could jeopardize the charity’s long-term growth and funding prospects. Some specific examples:

• The grant that Good Ventures made to GiveDirectly earlier this year is capacity-relevant because it will be used for: (a) building a fundraising team that will aim to raise substantial donations from non-GiveWell donors, and (b) developing partnerships with bilateral donors and local governments to deliver cash transfers or to run experiments comparing standard aid programs to cash transfers.
• Early funding that GiveDirectly received was capacity-relevant because it enabled GiveDirectly to rapidly grow from a small organization moving a few hundred thousand dollars per year to a much larger organization moving more than $10 million per year. If this funding hadn’t been forthcoming, GiveDirectly might be much smaller today and have much less room for more funding.
• We now think that some additional funding to AMF and Deworm the World will be capacity-relevant because each organization has only operated in a very small number of countries and new funding will enable each to enter new countries. This will allow them to learn how to operate there, and demonstrate that they can do so, increasing our willingness (and likely that of other donors) to recommend more to these organizations in the future.

It’s hard to draw sharp lines around capacity-relevant funding, and all funding likely has some effect on an organization’s development, but we have tried to identify and prioritize the funding gaps that seem especially relevant.

Execution funding allows charities to implement more of their core program but doesn’t appear to have substantial benefits beyond the direct good accomplished by this program. We’ve separated this funding into three levels:

• Level 1: the amount we expect a charity to need in the coming year. If a charity has less funding than this level, we think it is more likely than not that it will be bottlenecked (or unable to carry out its core program to the fullest extent) by funding in the coming year.
• Level 2: if a charity has this amount, we think there is an ~80% chance that it will not be bottlenecked by funding.
• Level 3: if a charity has this amount, we think there is a ~95% chance that it will not be bottlenecked by funding.
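
The three levels above can be read as cumulative funding thresholds. The following is a minimal sketch of that reading, not a tool used in this analysis: the function name `execution_level`, the label wording, and the example amounts are all hypothetical, and it assumes each level's gap is stated as an increment on top of the previous level.

```python
from bisect import bisect_right
from itertools import accumulate

# Labels paraphrase the level definitions above; the wording is illustrative.
LEVEL_LABELS = [
    "below Level 1: more likely than not to be bottlenecked",
    "Level 1 met: has the amount it is expected to need",
    "Level 2 met: ~80% chance of not being bottlenecked",
    "Level 3 met: ~95% chance of not being bottlenecked",
]


def execution_level(funding, level_gaps):
    """Return the label for the highest execution level that `funding` covers.

    `level_gaps` holds incremental amounts: the Level 1 gap, then the extra
    needed to reach Level 2, then the extra needed to reach Level 3.
    """
    thresholds = list(accumulate(level_gaps))  # cumulative amount per level
    # bisect_right counts how many thresholds are less than or equal to funding.
    return LEVEL_LABELS[bisect_right(thresholds, funding)]


# Hypothetical charity: Level 1 gap of 10, then 5 more for Level 2, 10 more for Level 3.
for amount in (9, 10, 16, 25):
    print(amount, "->", execution_level(amount, [10, 5, 10]))
```

Treating the gaps as increments matches how the per-level gaps are listed as separate rows and summed into a total for each charity.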

Incentives: we think it is important that charities we recommend get a substantial amount of funding due to being a GiveWell top charity, because this ensures that incentives are in place for charities (and potential charity founders) to seek to meet our criteria for top charities and thus increase the number of charities we recommend and the total room for more funding available, even when they don’t end up being ranked #1. We seek to ensure that each top charity gets at least $1 million as a result of our recommendation, and we consider this to be a high-priority goal of our recommendations.

The charity-specific sections of this post discuss the reasoning behind the figures we’ve assigned to “capacity-relevant” and “Execution Level 1” gaps, but they do not provide the full details of how we arrived at these figures (and do not explicitly address the “Execution Level 2” and “Execution Level 3” gaps). We expect to add this analysis to our charity reviews in the coming weeks.

Funding gaps

The total (i.e., Capacity-relevant, Execution Levels 1, 2, and 3, and Incentive) funding gaps (in millions of dollars, rounded to one decimal place) for each of our top charities are:

• AMF: $98.2
• Deworm the World: $19.0
• GiveDirectly: $84.0
• SCI: $26.3

However, for reasons described above, the first million dollars to a charity can have a very different impact from, e.g., the 20th million dollars. Accordingly, we have created a ranking of individual funding gaps that accounts for both (a) the quality of the charity and the good accomplished by its program, per dollar (as laid out above), and (b) whether a given level of funding is capacity-relevant and whether it is highly or only marginally likely to be needed in the coming year.

The below table lays out our ranking of funding gaps. When gaps have the same “Priority,” this indicates that they are tied. The table below includes the amount we are recommending to Good Ventures. For reasons we will lay out in another post, we are recommending to Good Ventures a total of ~$44.4 million in grants to top charities. Having set that total, we are recommending that Good Ventures start with funding the highest-rated gaps and work its way down, in order to accomplish as much good as possible.

When gaps are tied, we recommend filling them by giving each equal dollar amounts until one is filled, and then following the same procedure with the remaining tied gaps. See footnote for more.*

| Priority | Charity | Amount (millions) | Type | Recommendation to Good Ventures (millions) | Comments |
| --- | --- | --- | --- | --- | --- |
| 1 | DtWI | $7.6 | Capacity-relevant | $7.6 | DtWI and AMF are strongest overall |
| 1 | AMF | $6.5 | Capacity-relevant | $6.5 | See above |
| 1 | GD | $1.0 | Incentive | $1.0 | Ensuring each top charity receives at least $1 million |
| 1 | SCI | $1.0 | Incentive | $1.0 | Ensuring each top charity receives at least $1 million |
| 2 | GD | $8.8 | Capacity-relevant | $8.8 | Not as cost-effective as bednets or deworming, so lower priority, but above non-capacity-relevant gaps |
| 2 | DtWI | $3.2 | Execution Level 2 / possibly capacity-relevant | $3.2 | Level 1 gap already filled via “capacity-relevant” gap. See footnote for more** |
| 2 | AMF | $43.8 | Execution Level 1 | $16.3 | Exhausts remaining recommendations to Good Ventures |
| 3 | SCI | $4.9 | Execution Level 1 | 0 | Not as strong as DtWI and AMF in isolation, so ranked below them for same type of gap |
| 3 | AMF | $24.0 | Execution Level 2 | 0 | – |
| 4 | DtWI | $8.2 | Execution Level 3 | 0 | – |
| 4 | AMF | $24.0 | Execution Level 3 | 0 | – |
| 4 | SCI | $11.6 | Execution Level 2 | 0 | – |
| 5 | GD | $24.8 | Execution Level 1 | 0 | – |
| 5 | SCI | $8.8 | Execution Level 3 | 0 | – |
| 6 | GD | $20.9 | Execution Level 2 | 0 | – |
| 7 | GD | $28.6 | Execution Level 3 | 0 | – |

Our recommendations to Good Ventures and others

Summing the figures from the above table, we are recommending that Good Ventures make the following grants (in millions of dollars, rounded to one decimal place):

• AMF: $22.8
• Deworm the World: $10.8
• GiveDirectly: $9.8
• SCI: $1

We also recommend that Good Ventures give $250,000 to each of our standout charities. These grants go to the outstanding organizations and create additional incentives for groups to try to obtain a GiveWell recommendation.
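
The fill order described above (start with the highest-rated gaps, work down, and split equally among tied gaps until one is filled) can be sketched in a few lines. This is an illustration under stated assumptions rather than the procedure actually used: the function name `allocate` and the data layout are hypothetical, and the tie rule is one reading of the sentence on tied gaps, since the footnote it points to is not reproduced here. Gap amounts are in millions of dollars, copied from the table of ranked gaps.

```python
from fractions import Fraction

# (priority, charity, gap in $ millions), copied from the ranked-gaps table.
GAPS = [
    (1, "DtWI", "7.6"), (1, "AMF", "6.5"), (1, "GD", "1.0"), (1, "SCI", "1.0"),
    (2, "GD", "8.8"), (2, "DtWI", "3.2"), (2, "AMF", "43.8"),
    (3, "SCI", "4.9"), (3, "AMF", "24.0"),
    (4, "DtWI", "8.2"), (4, "AMF", "24.0"), (4, "SCI", "11.6"),
    (5, "GD", "24.8"), (5, "SCI", "8.8"),
    (6, "GD", "20.9"),
    (7, "GD", "28.6"),
]


def allocate(budget, gaps):
    """Fill gaps in priority order, splitting equally among tied gaps.

    Exact fractions are used so that one-decimal amounts do not pick up
    floating-point rounding error.
    """
    remaining = Fraction(budget)
    granted = {}
    for priority in sorted({p for p, _, _ in gaps}):
        # Mutable [charity, unfilled amount] pairs for gaps tied at this priority.
        tied = [[c, Fraction(a)] for p, c, a in gaps if p == priority]
        while remaining > 0:
            open_gaps = [g for g in tied if g[1] > 0]
            if not open_gaps:
                break
            # Every open gap gets the same amount: enough to fill the smallest
            # one, or an equal split of whatever budget is left.
            share = min(min(g[1] for g in open_gaps), remaining / len(open_gaps))
            for g in open_gaps:
                g[1] -= share
                granted[g[0]] = granted.get(g[0], Fraction(0)) + share
            remaining -= share * len(open_gaps)
    return granted


grants = allocate("44.4", GAPS)
for charity, amount in grants.items():
    print(f"{charity}: ${float(amount):.1f} million")
```

Under these assumptions, a $44.4 million budget is exhausted partway through AMF's Execution Level 1 gap, and the per-charity totals come out to the grant amounts listed above.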

After these grants, AMF will require an additional ~$27.5 million to close its Execution Level 1 gap (i.e., to make it more likely than not that it is able to proceed without being bottlenecked due to lack of funding). We rank this gap higher than any of the other remaining funding gaps for our top charities, as laid out in the table above. We estimate that non-Good Ventures donors will give approximately $15 million between now and January 31, 2016. Because we do not expect AMF’s remaining ~$27.5 million Execution Level 1 funding gap to be fully filled, we rank it #1 and recommend that donors give to AMF.

We rank the remaining charities for donors who are interested in having the greatest impact per dollar based on how highly their highest-rated remaining gap ranks in the table above. That results in the following rankings for individual donors:

1. AMF
2. SCI
3. Deworm the World Initiative
4. GiveDirectly

Details on top charities

We present information on our top charities in alphabetical order.

Against Malaria Foundation (AMF)

Our full review of AMF is here.

Background

AMF (www.againstmalaria.com) provides funding for long-lasting insecticide-treated net distributions (for protection against malaria) in developing countries. There is strong evidence that distributing nets reduces child mortality and malaria cases. AMF has relatively strong reporting requirements for its distribution partners and provides a level of public disclosure and tracking of distributions that we have not seen from any other net distribution charity.

In 2011, AMF received a large amount of funding relative to what it had received historically, so it began to focus primarily on reaching agreements for large-scale net distributions (i.e., distributions on the order of hundreds of thousands of nets rather than tens of thousands of nets). In its early efforts to scale up, AMF struggled to finalize large-scale net distribution agreements.
At the end of 2013, we announced that we planned not to recommend additional donations to AMF due to room for more funding-related issues (more detail in this blog post). In 2014, AMF committed most of its funds to several new distributions, some in Malawi and some in the Democratic Republic of the Congo (DRC), and we recommended it as a top charity again.

Important changes in the last 12 months

Previously, our confidence in AMF’s ability to scale had been limited by the fact that it had only completed large-scale distributions with one partner (Concern Universal) in one country (Malawi). However, AMF carried out its largest distribution to date (~620,000 nets) with a new partner in the DRC in late 2014. We have not yet seen some key documentation from the large DRC distribution, but early indications suggest that the distribution generally went as planned, despite our concern that the DRC may have been an especially challenging place to work (more details here). We see this as a positive update that AMF will be able to carry out high-quality large-scale distributions in a variety of locations in the future.

AMF has continued to collect and share follow-up information on its past large-scale distributions, and this information seems to support the notion that these distributions are high-quality (i.e., that nets are reaching the target population and are being used). We provide a summary of these reports in our review.

Funding gap

AMF currently holds $18.5 million, and we estimate it will receive an additional $1.6 million before January 31, 2016 (excluding donations influenced by GiveWell) that it could use for future distributions. AMF has told us that it has a pipeline of possible future net distributions that add up to roughly $100 million beyond what it currently holds (details in our review).

We believe that AMF’s progress would be slowed due to lack of funding were it to receive less than $50.3 million in additional funding (this is its total capacity-relevant and “Execution Level 1” gap as presented earlier in the post). In particular, we view the first additional $6.5 million that AMF would receive as capacity-relevant (and thus particularly valuable) because it would enable AMF to fund a distribution in a 5th country with a 5th partner, generating additional information about its ability to expand beyond the contexts in which it has worked to date. (Note that AMF already has funds on hand to enter its 3rd and 4th countries.)

We arrived at the capacity-relevant and Execution Level 1 figure by noting that AMF has $70.4 million worth of deals it is actively negotiating (5 deals in 4 countries) that it can only continue with if it holds the funds to do so. Subtracting the$20.1 million we expect to be available (the $18.5 million it currently holds plus the$1.6 million we expect it to receive in the coming months) leaves a $50.3 million funding gap. AMF failed to reach new distribution agreements in 2015; there is still significant uncertainty regarding AMF’s ability to finalize agreements with new partners and countries. Nevertheless, we see providing a large amount of additional funds to AMF as a reasonable bet, and see AMF as a very strong giving opportunity. We think it is possible that in November 2016 (when we next expect to complete a full refresh of our recommendations), we will recommend significantly less funding to AMF. We consider the funding we’re recommending to AMF now to be a good bet, but a risky one, because AMF currently has a relatively limited track record: it has worked with only two partners in two countries. Because of the lag between the time we provide funding and the time net distributions take place (often 2 years) and the additional lag caused by the time it takes to monitor distributions, we may not have additional information about whether or not AMF’s additional distributions were successful for 2-3 years. Next year, it is possible that we will choose to recommend significantly less funding to AMF while we wait for additional data to become available. There still appears to be a large global funding gap for bednets; a global bednet coordination group estimated that about 245 million additional nets would be needed in 2015-2017 (details in our review). Key considerations: • Program impact and cost-effectiveness. We estimate that bednets are ~10x as cost-effective as cash transfers. Our estimates are subject to substantial uncertainty. 
All of our cost-effectiveness analyses are available here. Our 2015 cost-effectiveness file is available here (.xlsx).
• Directness and robustness of the case for impact. We believe that the connection between AMF receiving funds and those funds helping very poor individuals is less direct than GiveDirectly’s and more direct than SCI’s or Deworm the World’s. The uncertainty of our estimates is driven by a combination of AMF’s challenges historically disbursing the funds it receives and a general recognition that aid programs, even those as straightforward as bednets, carry significant risks of failure via ineffective use of nets, insecticide resistance, or other risks we don’t yet recognize relative to GiveDirectly’s program. AMF conducts extensive monitoring of its program; these results have generally indicated that people use the nets they receive.
• Transparency and communication. AMF has been extremely communicative and open with us. We feel we have a better understanding of AMF than of SCI, and a similar level of knowledge about AMF as we have for Deworm the World, though our understanding is not as strong as our understanding of GiveDirectly. In particular, were something to go wrong in one of AMF’s distributions, we believe we would eventually find out (something we are not sure of in the case of SCI), but we believe our understanding would be less quick and complete than it would be for problems associated with GiveDirectly’s program (which has more of a track record of consistent intensive follow-up).
• Risks:
  • We are not highly confident that AMF will be able to finalize additional distributions and do so quickly. AMF could struggle again to agree to distribution deals, leading to long delays before it spends funds. We view this as a relatively minor risk because the likely worst-case scenario is that AMF spends the funds slowly (or returns funds to donors).
  • We remain concerned about the possibility of resistance to the insecticides used in bednets.
There don’t appear to be major updates on this front since our 2012 investigation into the matter; we take the lack of major news as a minor positive update.

Our full review of AMF is here.

Deworm the World Initiative, led by Evidence Action

Our full review of Deworm the World is here.

Background

Deworm the World (www.evidenceaction.org/deworming), led by Evidence Action, advocates for, supports, and evaluates government-run school-based deworming programs (treating children for intestinal parasites).

We believe that deworming is a program backed by relatively strong evidence. We have reservations about the evidence, but we think the potential benefits are great enough, and costs low enough, to outweigh these reservations. Deworm the World retains monitors whose reports indicate that the deworming programs it supports successfully deworm children.

Important changes in the last 12 months

In 2015, Deworm the World continued to support the scale-up and monitoring of deworming programs in India and Kenya. One of its notable activities this year was providing technical assistance to the Indian national government in support of India’s first national deworming day: a program in which the government provided assistance to Indian states to implement school-based deworming on a single day to encourage more states to implement the program. The first national deworming day took place in February 2015, and 12 states participated in the program (more details here).

The quality of the monitoring that we saw from Deworm the World improved in 2015. Deworm the World continued to hire and train third-party monitors to directly observe deworming activities, and it slightly improved its estimates of how many children were treated. This information strongly suggests that the programs are generally operating as intended. More details in our review.

Last year, Deworm the World stated to us that it could not use significant additional funding to scale up deworming programs.
Deworm the World now believes that it has identified countries where it could use additional funds to support the scale-up of deworming programs, beginning with a potential program in Punjab province, Pakistan (more). (Deworm the World also plans to use funds it already holds or expects to receive to expand into Ethiopia and Nigeria.)

Future donations to Deworm the World will likely be used outside of India, and in those cases governments may have less funding to support deworming. This may cause Deworm the World to pay a higher fraction of the overall cost of the program, making the potential for leverage of future donations more limited. Overall program costs may also be higher outside of India. More details in our review.

A significant organizational update is that Alix Zwane stepped down as Executive Director of Evidence Action in August; she left to join the Global Innovation Fund as CEO. Evidence Action has since hired Jeff Brown (formerly Interim CEO of the Global Innovation Fund) as Executive Director. Grace Hollister remains Director of the Deworm the World Initiative. Overall, our impression is that Dr. Zwane has been a highly effective leader of Evidence Action and her departure risks disruptions that could lead to us changing our view of the organization, though we would guess that this will not be the case.

In July, researchers published two new analyses of a key study regarding deworming (the most important piece of evidence we rely on), and the Cochrane Collaboration published an updated review of the evidence for mass deworming programs. The new papers did not change our overall assessment of the evidence on deworming. More in our blog post.

Funding gap

We believe that Deworm the World has significant opportunities to use additional funding to expand its program.
We believe it may have opportunities to enter at least two more countries (in addition to Nigeria and Ethiopia, which it will be able to enter with funds it already has or expects to receive). We estimate its funding need using the two countries it is most likely to enter (Pakistan and Nepal), though note that in both cases, we see these as representative of the types of opportunities it may have, rather than the specific opportunities we expect it to take. Altogether, Deworm the World estimates that it would need $11.25 million to commit to fully funding three years of deworming programs in both countries. Because it holds (or expects to receive shortly) funding that will total $3.6 million, we estimate its funding gap for this work at $7.6 million.
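
The subtraction behind this estimate can be checked with a short Python sketch. The function and variable names are ours and purely illustrative; the dollar figures are the ones quoted above. With these rounded inputs the difference comes to about $7.65 million, which the text reports as $7.6 million, presumably because the underlying figures are not exactly the rounded ones (that explanation is our assumption).

```python
# Figures in millions of US dollars, as quoted in the paragraph above.
three_year_cost_two_countries = 11.25  # three years of programs in both countries
funds_held_or_expected = 3.6           # held now or expected shortly


def funding_gap(total_need, funds_available):
    """Return the unfunded portion of a need, never below zero."""
    return max(total_need - funds_available, 0.0)


gap = funding_gap(three_year_cost_two_countries, funds_held_or_expected)
print(f"Estimated gap: ${gap:.2f} million")
```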

Funding this gap is capacity-relevant, and is therefore a high priority, because we would like to see Deworm the World try to work in additional countries beyond India and Kenya, where it has worked historically. Next year, Deworm the World will also enter Nigeria and Ethiopia (with funding already available), so it will likely end the year having had some experience in five or more countries. This could substantially increase Deworm the World’s long-term room for more funding.

A complicating factor in thinking about Deworm the World’s funding gap is that Deworm the World is part of a larger organization, Evidence Action. Funding for Deworm the World may be fungible with funding for Evidence Action’s other activities, such as its Dispensers for Safe Water initiative (which we believe to be substantially less cost-effective than deworming). Because of this, it is difficult to determine Deworm the World’s true funding gap, and it is possible that some additional funds given to support Deworm the World could effectively lead to additional funds for a non-Deworm the World project. We understand that Evidence Action has received approximately $2.4 million in unrestricted funding over the past year. Fully funding Deworm the World could potentially cause Evidence Action to redirect some or all of these funds to its other programs.

More details on all of the above are in our review.

Key considerations:
• Program impact and cost-effectiveness. We estimate that Deworm the World-associated deworming programs are ~10x as cost-effective as cash transfers. Our estimates are subject to substantial uncertainty. It’s important to note that we view deworming as high expected value, but this is due to a relatively low probability of very high impact. Most GiveWell staff members would agree that deworming programs are more likely than not to have very little or no impact, but there is some possibility that they have a very large impact. (Our cost-effectiveness model implies that most staff members believe there is at most a 1-2% chance that deworming programs conducted today have similar impacts to those directly implied by the randomized controlled trials on which we rely most heavily, which differed from modern-day deworming programs in a number of important ways.) Our 2015 cost-effectiveness file is available here (.xlsx).
• Directness and robustness of the case for impact.
Deworm the World doesn’t carry out deworming programs itself; it advocates for and provides technical assistance to governments implementing deworming programs, making direct assessments of its impact challenging. We have seen evidence that strongly suggests that Deworm the World-supported programs successfully deworm children. While we believe Deworm the World is impactful, our evidence is limited, and in addition, there is always a risk that future expansions will prove more difficult than past ones.
• Transparency and communication. Deworm the World has been communicative and open with us. We believe that were something major to go wrong with Deworm the World’s work, we would be able to learn about it and report on it.
• Risks:
  • Deworm the World is part of a larger organization, Evidence Action. It is possible that some additional funds given to support Deworm the World could effectively lead to additional funds for a non-Deworm the World project due to fungibility. Also, changes that affect Evidence Action (and its other programs) could indirectly impact Deworm the World. For example, if a major event occurs (either positive or negative) for Evidence Action, it is likely that it would reduce the time some staff could devote to Deworm the World.
  • Deworm the World is now largely raising funds to support programs that will be carried out under a different model in new countries, which makes it harder for us to predict future success based on historical results and may make it harder to understand and quantify Deworm the World’s impact even after the program is completed.

Our full review of Deworm the World is here.

GiveDirectly

Our full review of GiveDirectly is here.

Background

GiveDirectly (www.givedirectly.org) transfers cash to households in developing countries via mobile phone-linked payment services. It targets extremely low-income households.
The proportion of total expenses that GiveDirectly has delivered directly to recipients is approximately 85% overall. We believe that this approach faces an unusually low burden of proof, and that the available evidence supports the idea that unconditional cash transfers significantly help people.

We believe GiveDirectly to be an exceptionally strong and effective organization, even more so than our other top charities. It has invested heavily in self-evaluation from the start, scaled up quickly, and communicated with us clearly. It appears that GiveDirectly has been effective at delivering cash to low-income households. GiveDirectly has one major randomized controlled trial (RCT) of its impact and took the unusual step of making the details of this study public before data was collected (more). It continues to experiment heavily, to the point where every recipient is enrolled in a study or a campaign variation.

Important changes in the last 12 months

GiveDirectly continued to scale up significantly, utilizing most of the funding it received at the end of last year. It continued to share informative and detailed monitoring information with us. Overall, it grew its operations while maintaining the high quality of its program.

In August, Good Ventures granted $25 million to GiveDirectly to support potentially high-upside opportunities, such as (a) building a fundraising team that will aim to raise substantial donations from non-GiveWell donors, and (b) developing partnerships with bilateral donors and local governments to deliver cash transfers or to run experiments comparing standard aid programs to cash transfers.

GiveDirectly’s increased efforts to network with potential government and donor partners have led to some results in 2015. For example, GiveDirectly will be implementing cash transfers in a randomized controlled trial in Rwanda that will be funded by a bilateral aid donor and Google. The study will test cash transfers against another still-to-be-chosen aid program. GiveDirectly is currently in several preliminary conversations with partners for similarly large projects in the future.

Funding gap

GiveDirectly believes it could move a total of ~$94 million to poor households in the year following March 1, 2016, for which it expects to have ~$12.6 million available by March 1. We have classified ~$34.5 million of this as the total “Execution Level 1,” capacity-relevant, and incentive funding gap (more on what this means above). We arrived at this figure by assuming that GiveDirectly could double its operations in Kenya (from ~$16.5 million/year to ~$33 million/year) and scale up to ~$12.1 million/year in Uganda. This would cost a total of ~$45.1 million, of which GiveDirectly already has ~$10.6 million on hand (ignoring $2 million that we exclude due to donor coordination issues), which results in a ~$34.5 million gap.
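
The steps in this estimate can be laid out as a small Python sketch. The function and variable names are ours and purely illustrative; the inputs are the approximate figures quoted in the paragraph above, in millions of US dollars per year.

```python
# Approximate figures quoted above, in millions of US dollars.
KENYA_CURRENT = 16.5  # current annual scale in Kenya
UGANDA_TARGET = 12.1  # proposed annual scale in Uganda
FUNDS_ON_HAND = 10.6  # already excludes the $2 million ignored for donor coordination


def givedirectly_level1_gap(kenya_current, uganda_target, funds_on_hand):
    """Return (doubled Kenya budget, total cost, remaining gap)."""
    kenya_doubled = 2 * kenya_current           # doubling operations in Kenya
    total_cost = kenya_doubled + uganda_target  # Kenya plus Uganda
    gap = total_cost - funds_on_hand            # what remains unfunded
    return kenya_doubled, total_cost, gap


kenya_doubled, total_cost, gap = givedirectly_level1_gap(
    KENYA_CURRENT, UGANDA_TARGET, FUNDS_ON_HAND
)
print(f"Kenya doubled: ~${kenya_doubled:.1f}M")
print(f"Total cost:    ~${total_cost:.1f}M")
print(f"Gap:           ~${gap:.1f}M")
```

Rounded to one decimal place, the three values match the ~$33 million, ~$45.1 million, and ~$34.5 million figures given in the text.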

We’ve classified some of this as a “capacity-relevant” funding gap for our purposes (making it higher priority). First, we view the ~$12.1 million it would hope to spend in Uganda as capacity-relevant, in the sense that providing it could make a major difference to GiveDirectly’s long-term development. GiveDirectly told us that operating in Uganda is more challenging than in Kenya and that it expects to learn a significant amount as it grows. It is therefore planning to grow more slowly in Uganda than it did in Kenya. GiveDirectly made two arguments for Uganda being important for its long-term trajectory:

1. If GiveDirectly lost the ability to operate in Kenya, this would significantly diminish its ability to move funds out the door. Operating in Uganda is an important hedge against this risk.
2. Kenya is a particularly easy environment in which to operate because of the existence of M-PESA, a powerful and ubiquitous provider that enables GiveDirectly to transfer funds to recipients via mobile phones. The mobile payments network is significantly less developed outside of Kenya. As such, Uganda offers an important test case for operating in a more standard environment, which could be particularly valuable to GiveDirectly as it encourages aid agencies and country governments to expand direct cash assistance.

It’s harder to estimate how much of the Kenya funding needs are properly classified as “capacity-relevant” (an important distinction for our purposes, as discussed above). We guess that were GiveDirectly to be operating at a level 50% its current size (such that it only spent ~$8.25 million/year in Kenya), it would be able to build capacity from that level to its current level (and beyond) as quickly as it did in its recent past.
We therefore classify ~$8.25 million of the ~$16.5 million it hopes to spend in Kenya as “capacity-relevant” and ~$8.25 million as “execution.” We note that we are highly uncertain about these estimates and that were GiveDirectly to receive no additional funding, this would cause it to contract in Kenya and lay off some of its middle management, an action that would cause it to incur reasonably high costs; we think much more contraction than that would be significantly more challenging for GiveDirectly as an organization.

Based on the above, and based on GiveDirectly’s existing available funds (with some adjustments for coordination issues, along the lines of this discussion from last year) we estimate that GiveDirectly has ~$9.8 million worth of unfunded opportunities that we ought to classify as capacity-relevant or incentive funding. (We arrive at this estimate based on: ~$20.35 million (total amount we classify as capacity-relevant from Kenya and Uganda) – ~$10.6 million (funds on hand, excluding donations we ignore due to coordination issues) = ~$9.75 million.)

Longer-term, we expect to continue to view funding ~$8.25 million in Kenya as capacity-relevant support and would expect to consider future expansion in Uganda (up to the current level of Kenya, i.e., ~$16.5 million/year) capacity-relevant, as well. Once GiveDirectly reaches ~$16.5 million in Uganda and proves that it can operate at that level, we only expect to view ~$8.25 million as capacity-relevant and hope that it can raise funds from other sources to support its work.

More details in our review.

Key considerations:
• Program impact and cost-effectiveness. Our best guess is that deworming or distributing bednets achieves ~10x more humanitarian benefit per dollar donated than cash transfers. Our estimates are subject to substantial uncertainty. All of our cost-effectiveness analyses are available here. Our 2015 cost-effectiveness file is available here (.xlsx).
• Directness and robustness of the case for impact. GiveDirectly collects and shares a significant amount of relevant information about its activities. The data it collects show that it successfully directs cash to very poor people, that recipients generally spend funds productively (sometimes on food, clothing, or school fees, other times on investments in a business or home infrastructure), and that it leads to very low levels of interpersonal conflict and tension. We are more confident in the impact of GiveDirectly’s work than in that of any of the other charities discussed in this post; we believe that cash transfers face a lower burden of proof than other interventions.
• Transparency and communication. GiveDirectly has always communicated clearly and openly with us. It has tended to raise problems to us before we ask about them, and we generally believe that we have a very clear view of its operations. We feel more confident about our ability to keep track of future challenges than with any of the other charities discussed in this post.
• Risks:
  • GiveDirectly has scaled (and hopes to continue to scale) quickly. Thus far, it has significantly increased the amount of money it can move with limited issues as a result. The case of staff fraud that GiveDirectly detected is one example of an issue possibly caused by its pace of scaling, but its response demonstrated the transparency and rigor we expect.

Our full review of GiveDirectly is here.

Schistosomiasis Control Initiative (SCI)

Our full review of SCI is here.

Background

SCI (www3.imperial.ac.uk/schisto) works with governments in sub-Saharan Africa to create or scale up deworming programs (treating children for schistosomiasis and other intestinal parasites). SCI’s role has primarily been to identify recipient countries, provide funding to governments for government-implemented programs, provide advisory support, and conduct research on the process and outcomes of the programs.
Despite SCI sharing a number of spending reports with us, we do not feel we have a detailed and fully accurate picture of how SCI and the governments it supports have spent funds in the past. We don’t feel that SCI has ever purposefully been indirect with us, but we have often struggled to communicate effectively with SCI representatives. We still lack important and in some cases basic information about SCI’s finances, and we find this problematic.

We believe that deworming is a program backed by relatively strong evidence. We have reservations about the evidence, but we think the potential benefits are great enough, and costs low enough, to outweigh these reservations.

SCI has conducted studies in about half of the countries it works in (including the countries with the largest programs) to determine whether its programs have reached a large proportion of children targeted. These studies have generally found moderately positive results, but have some methodological limitations.

Important changes in the last 12 months

SCI reports that it has continued to scale up its deworming programs and that it has supported some programs in new countries, though we have limited monitoring information from these programs (e.g., we have not seen monitoring from its programs in Ethiopia, Sudan, Madagascar, and the DRC). This year, SCI has shared a few more coverage surveys that found reasonably high coverage of its programs.

We have continued to have communication challenges with SCI. In particular:
• We have a limited understanding of SCI’s work because we still lack important and basic information about how SCI spends money. SCI recognizes that its financial management system is disorganized, and some spending reports that SCI has sent us have contained errors.
• We have struggled to gain a confident understanding of how SCI will use additional funds, and we cannot check how its funds were used after the fact because we lack information about its spending.
In some cases, SCI has not spent additional funds as expected and it is unclear what caused the shift (more detail on one example in our August 2015 update).

In July, researchers published two new analyses of a key study regarding deworming (the most important piece of evidence we rely on), and the Cochrane Collaboration published an updated review of the evidence for mass deworming programs. The new papers did not change our overall assessment of the evidence on deworming. More in our blog post.

Funding gap

SCI estimates that it would use the following amounts of unrestricted funding in each of the next three years (in millions of US dollars):
• April 2016 – March 2017: $9.5
• April 2017 – March 2018: $13.6
• April 2018 – March 2019: $13.3

Our impression is that GiveWell-influenced donors contribute most of SCI’s unrestricted funds.

Our best guess is that, excluding the funds SCI may receive due to GiveWell’s recommendation, SCI will hold approximately $1.5 million in April 2016 that it could allocate to the above gaps. Also, after SCI set its fundraising targets, a funder committed $6 million over the next three years ($2 million per year) to deworming programs in Ethiopia, with which SCI is involved. Our best guess is that this funding reduces SCI’s “Execution Level 1” and incentive funding gap for the coming year from $9.5 million to $5.9 million. (We arrive at this estimate by subtracting ~$1.5 million and another $2 million from the total Level 1/incentive gap for the coming year). We do not classify any of this as “capacity-relevant” because we have little understanding of how it will be spent, and we do not expect to be able to understand how it was spent after the fact, either.

More details on SCI’s funding gap are in our review.

Key considerations:
• Program impact and cost-effectiveness. Our best guess is that deworming programs implemented by SCI are ~5x as cost-effective as cash transfers. Our estimates are subject to substantial uncertainty. It’s important to note that we view deworming as high expected value, but this is due to a relatively low probability of very high impact. Most GiveWell staff members would agree that deworming programs are more likely than not to have very little or no impact, but there is some possibility that they have a very large impact. (Our cost-effectiveness model implies that most staff members believe there is at most a 1-2% chance that deworming programs conducted today have similar impacts to those directly implied by the randomized controlled trials on which we rely most heavily, which differed from modern-day deworming programs in a number of important ways.) Our 2015 cost-effectiveness file is available here (.xlsx).
• Directness and robustness of the case for impact.
SCI doesn’t carry out deworming programs itself; it advocates for and provides technical assistance to governments implementing deworming programs, making direct assessments of its impact challenging. We have seen some evidence demonstrating that SCI-supported programs successfully deworm children, though this evidence is relatively thin. Nevertheless, deworming is a relatively straightforward program, and we think it is likely (though far from certain) that SCI-supported deworming programs successfully deworm people. We have had difficulties communicating with SCI, which has reduced our ability to understand it. We have also spent significant time interviewing SCI staff and reviewing documents over the past 6 years and have found minor but not major concerns.
• Transparency and communication. We don’t feel that SCI has ever purposefully been indirect with us, but we have often struggled to communicate effectively with SCI representatives. Specifically, (a) we had a major miscommunication with SCI about the meaning of its self-evaluations (more) and (b) although we have spent significant time with SCI, we remain unsure how SCI has spent funds and how much funding it has available (and we believe SCI itself does not have a clear understanding of this). Importantly, if there is a future unanticipated problem with SCI’s programs, we don’t feel confident that we will become aware of it. This contrasts with our other top charities, which we feel we have a strong ability to follow up on.
• Risks: There are significantly more unknown risks with SCI than our other top charities due to our limited understanding of its activities.

Our full review of SCI is here.

Standouts

As we did last year, we recommend four organizations as “standouts.” These charities score well on some of our criteria, but we are not confident enough in them to name them top charities.
This year, we retain the same four standout organizations: Development Media International (DMI), the Global Alliance for Improved Nutrition’s Universal Salt Iodization program (GAIN-USI), the Iodine Global Network (IGN), and Living Goods. We followed all four of these charities in 2015, but have only published an updated review for DMI. We expect to publish updated reviews for GAIN-USI, IGN, and Living Goods in the near future. We provide brief updates on these charities below:

• DMI. DMI produces radio and television programming in developing countries that encourages people to adopt improved health practices. It is a standout because of its commitment to monitoring and the possibility that it is implementing a highly cost-effective program. DMI has recently completed a randomized controlled trial of its program. Last year, we had midline results from this trial, which generally looked promising. In November 2015, DMI privately shared preliminary endline results from the RCT. These results did not find any effect of DMI’s program on child mortality, and found substantially less effect on behavior change than was found in the midline results. We (understandably) cannot publicly discuss the details of the endline results we have seen, because they are not yet finalized and because the finalized results will be embargoed prior to publication. DMI believes that there were serious problems with endline data collection (note that we have not yet tried to independently assess this claim). With the support of the trial’s Independent Scientific Advisory Committee, DMI is planning to conduct another endline survey in late 2016, with results available in 2017. We are impressed by DMI’s openness with us about its results (and its willingness for us to share the high-level summary), and we hope to have discussions with DMI about how it might be able to work toward becoming a top charity in the future. Our full review of DMI is here.
• GAIN-USI.
GAIN’s Universal Salt Iodization (USI) program supports national salt iodization programs. There is strong evidence that salt iodization programs have a significant, positive effect on children’s cognitive development. GAIN-USI does not work directly to iodize salt; rather, it supports governments and private companies to do so, which could lead to leveraged impact of donations or to low impact, depending on its effectiveness. Last year, we wrote, “We tried but were unable to document a demonstrable track record of impact; we believe it may have had significant impacts, but we are unable to be confident in this with what we know now. More investigation next year could change this picture.” In 2015, we continued our assessment of GAIN, focusing on its work in India and Ethiopia, including a site visit to Ethiopia in July. Overall, we tried but were unable to establish clear evidence of GAIN successfully contributing to the impact of iodization programs. This is primarily due to (a) the difficulty in attributing impact to specific activities that GAIN carried out and (b) challenges we have had communicating with GAIN about its work. We have not yet completed our final report on GAIN but hope to publish it in the near future. We have published notes from some of the conversations that were part of this research and they are available here. Our 2014 review of GAIN is here.
• IGN. Like GAIN-USI, IGN supports (via advocacy and technical assistance rather than implementation) salt iodization, and as with GAIN-USI, we tried but were unable to establish clear evidence of IGN successfully contributing to the impact of iodization programs. Unlike GAIN-USI, IGN is small, operating on a budget of approximately $0.5-$1 million per year, and relies heavily on volunteer time. We are planning to post an updated review in the near future. Our 2014 review of IGN is here.
• Living Goods recruits, trains, and manages a network of community health promoters who sell health and household goods door-to-door in Uganda and Kenya and provide basic health counseling. They sell products such as treatments for malaria and diarrhea, fortified foods, water filters, bednets, clean cookstoves, and solar lights. Living Goods completed a randomized controlled trial of its program and measured a 27% reduction in child mortality. We estimate that Living Goods saves a life for roughly each $10,000 it spends, approximately 3 times as much as our estimate for the cost per life saved of AMF’s program. We spoke with Living Goods and reviewed documents about their progress in 2015. We do not have major updates to report but are planning to post an updated review in the near future. Our 2014 review of Living Goods is here.

Our research process in 2015

This section describes the new work we did in 2015 to supplement our previous work on defining and identifying top charities. See the process page on our website for our overall process.

This year, we did not put a substantial amount of senior staff time into new top charities research work because (a) we were largely focused on building capacity, and (b) we reallocated a significant amount of capacity to the Open Philanthropy Project (see our post on our plans for 2015 for more details).

We focused the bulk of our research capacity for top charities work on staying up-to-date on our recommended charities. We also did an intensive evaluation of GAIN-USI, including a site visit (more details forthcoming).

We completed investigations of vitamin A supplementation and maternal and neonatal tetanus immunization campaigns. Both programs seem potentially competitive with our other priority programs, but we were not able to identify charities that worked on these programs that were willing to apply for a recommendation. We also made substantial progress on investigating several other programs, such as measles immunization, meningitis A vaccination, folic acid fortification, voluntary medical male circumcision for the prevention of HIV, and “Targeting the Ultra-Poor” (or “Ultra-Poor Graduation”) programs.

We stayed up to date on the research for bednets, cash transfers, and deworming.

We did not conduct an extensive search for new charities this year. We feel that we have a relatively good understanding of the existing charities that could potentially meet our criteria, based on past searches (see the process page on our website for more information). Instead, we solicited applications from organizations that we viewed as contenders for recommendations. A March post laid out which organizations we were hoping to investigate and why.

We did some initial research on several charities that we had not investigated before, but we did not complete the reviews in time for our 2015 recommendations. The organizations that we began investigating were:

We plan to complete these reviews in 2016.

Giving to GiveWell vs. top charities

We have grown significantly over the past few years and continue to raise funds to support our operations. This includes work on GiveWell’s top charities and the Open Philanthropy Project.

We plan to post an update on our funding situation before the end of the year.

The most up-to-date information available on this topic is linked from our June 2015 board meeting. The short story is that we are still seeking additional donations and encourage donors who feel they are sufficiently confident in our impact to give to us.

Footnotes:

* For example, if $30 million were available to fund gaps of $10 million, $5 million, and $100 million, we would recommend allocating the funds so that the $10 million and $5 million gaps were fully filled and the $100 million gap received $15 million.

This rule is material to the three gaps tied at priority level 2. It causes us to recommend that Good Ventures’ last $28.3 million to recommended charities be used to fully fill GiveDirectly’s $8.8 million capacity-relevant gap and Deworm the World’s $3.2 million Execution Level 2 (possible capacity-relevant) gap, but only fill $16.3 million of AMF’s Execution Level 1 gap.
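
The arithmetic in the footnote above can be reproduced with a short sketch. The rule itself is not spelled out in this footnote, so the code assumes one rule that is consistent with the example: tied gaps receive equal dollar amounts until the smallest is filled, and the remainder is then shared among the gaps still open. The function name `allocate_tied_gaps` and the gap labels are hypothetical, and amounts are in millions of dollars.

```python
def allocate_tied_gaps(budget, gaps):
    """Share a budget across tied funding gaps (assumed rule, see lead-in).

    Each open gap receives the same dollar amount per round. A round ends
    when the smallest open gap is filled or the budget runs out.
    """
    allocation = {name: 0.0 for name in gaps}
    open_gaps = {name: float(size) for name, size in gaps.items()}
    remaining = float(budget)

    while remaining > 1e-9 and open_gaps:
        names = list(open_gaps)
        # Equal share per gap, capped at the smallest remaining gap.
        step = min(remaining / len(names), min(open_gaps.values()))
        for name in names:
            allocation[name] += step
            open_gaps[name] -= step
            if open_gaps[name] <= 1e-9:
                del open_gaps[name]  # this gap is now fully filled
        remaining -= step * len(names)

    return allocation


# The footnote's example: $30 million across gaps of $10, $5, and $100 million.
example = allocate_tied_gaps(30, {"gap_10m": 10, "gap_5m": 5, "gap_100m": 100})
print(example)
```

Under this assumed rule the two smaller gaps are filled completely and the large gap receives the remaining $15 million, which matches the footnote.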

** This gap can’t be cleanly classified because we think the funding is relatively unlikely to be needed, but if it is needed, it is likely to have capacity-relevant effects. Thus, it is technically classified as Execution Level 2, but we think it has similar value to Execution Level 1.

The post Our updated top charities for giving season 2015 appeared first on The GiveWell Blog.

### New deworming reanalyses and Cochrane review

Fri, 07/24/2015 - 13:54

On Wednesday, the International Journal of Epidemiology published two new reanalyses of Miguel and Kremer 2004, the most well-known randomized trial of deworming. Deworming is an intervention conducted by two of our top charities, so we’ve read the reanalyses and the simultaneously updated Cochrane review closely and are responding publicly. We still have a few remaining questions about the reanalyses, and have not had a chance to update much of the content on the rest of our website regarding these issues, but our current view is that these new papers do not change our overall assessment of the evidence on deworming, and we continue to recommend the Schistosomiasis Control Initiative and the Deworm the World Initiative.

Key points:

• We’re very much in support of replicating and stress-testing important studies like this one. We did our own reanalysis of the study in question in 2012, and the replication released recently is more thorough and identifies errors that we did not.
• We don’t think the two replications bear on the most important parts of the case we see for deworming. Both focus on Miguel and Kremer 2004, which examines impacts of deworming on school attendance; in our view, the more important case for deworming comes from a later study that found impacts on earnings many years later. The school attendance finding provides a possible mechanism through which deworming might have improved later-in-life earnings; this is important, because (as stated below) the mechanism is a serious question.
• However, the replications do not directly challenge the existence of an attendance effect either. One primarily challenges the finding of externalities (effects of treatment on untreated students, possibly via reducing e.g. contaminated soil and water) at a particular distance. The other challenges both the statistical significance and the size of the main effect for attendance, but we believe it is best read as finding significant evidence for a smaller attendance effect. Regardless, the results we see as most important, particularly on income later in life, are not affected.
• The updated Cochrane review seems broadly consistent with the earlier version, which we wrote about in 2012. We agree with its finding that there is little sign of short-term impacts of deworming on health indicators (e.g., weight and anemia) or test scores, and, as we have previously noted, we believe that this does undermine – but does not eliminate – the plausibility of the effect on earnings.
• In our view, the best reasons to be skeptical about the evidence for deworming pertain to external validity, particularly related to the occurrence of El Nino during the period of study, which we have written about elsewhere. These issues are not addressed in the recent releases.
• At the same time, because mass deworming is so cheap, there is a good case for donating to support deworming even when in substantial doubt about the evidence. This has consistently been our position since we first recommended the Schistosomiasis Control Initiative in 2011. Our current cost-effectiveness model (which balances the doubts we have about the evidence with the cost of implementing the program) is here.
• While we think that replicating and challenging studies is a good thing, it looks in this case like there was an aggressive media push – publication of two papers at once coinciding with an update of the Cochrane review and a Buzzfeed piece, all on the same day – that we think has contributed to people exaggerating the significance of the findings.

Details follow. We also recommend the comments on this issue by Chris Blattman (whose post has an interesting comment thread) and Berk Ozler.

The reanalyses of Miguel and Kremer 2004

Aiken et al. 2015 and Davey et al. 2015 participated in a replication program hosted by the International Initiative for Impact Evaluation (3ie), in which Miguel and Kremer shared the data from their trials and Aiken, Davey and colleagues reanalysed them. Working paper versions of these reanalyses were published on the 3ie website dated October 2014, and Joan Hamory Hicks, Miguel and Kremer responded to both of them there. The World Bank’s Berk Ozler wrote a blog post in January reviewing the reanalyses and Hicks, Miguel, and Kremer’s replies.

Aiken et al. 2015 straightforwardly attempts to replicate Miguel and Kremer 2004’s results from data and code shared by the authors. They do a much more thorough job than when we attempted something similar in 2012, and find a number of errors.

Amongst a number of smaller issues, Aiken et al. find a coding error in Miguel and Kremer’s estimate of the externality impacts of deworming on students in nearby schools, in which Miguel and Kremer only counted the population of the nearest 12 schools. That coding error substantially changes estimates of the impact of deworming on both the prevalence of worm infections in nearby schools and the attendance of students in nearby schools, particularly estimates of the impact of further out schools, between 3 and 6 km away.

Aiken et al. state: “Having corrected these errors, re-analysis found no statistically significant indirect-between-school effect on the worm infection outcome, according to the analysis methods originally used. However, among variables used to construct this effect, a parameter describing the effect of Group 1 living within 0–3 km did remain significant, albeit at a slightly smaller size (original -0.26, SE 0.09, significant at 95% confidence level; updated -0.21, SE 0.10, significant at 95% confidence). The corresponding parameter for the 3–6 km distances became much smaller and statistically insignificant (original -0.14, SE 0.06, significant at 90% confidence; updated -0.05, SE 0.08, not statistically significant).” Aiken et al.’s supplementary material and Hicks, Miguel, and Kremer’s response to the 3ie replication working paper clarify this explanation. In short, fixing the coding error does not much affect estimates of the externality within 3 km of treatment schools, but does significantly change estimated externalities between 3 and 6 km out, and following the original Miguel and Kremer 2004 process for synthesizing those estimates into an overall estimate of the cross-school externality on worm prevalence, the resulting figure is not statistically significant. However, if you simply drop the 3-6 km externality estimate, which is now negative and no longer statistically significant, then you continue to see a statistically significant cross-school externality (see the second to last row of Table 1).

The same coding error also affects estimates of the externality effect on school attendance, in a broadly similar way. Aiken et al. write: “Correction of all coding errors in Table IX thus led to the major discrepancies shown in Table 3. The indirect-between-school effect [on attendance] was substantially reduced (from +2.0% to -1.7%) with an increased standard error (from 1.3% to 3.0%) making the result non-significant. The total effect on school attendance was also substantially reduced (from 7.5% to 3.9% absolute improvement), making it only slightly more than one standard error interval away [from] zero, hence also non-significant.” The correction to the coding error significantly increases the standard error of the 3-6km externality estimate, which then increases the standard error of the overall estimate significantly. The increased uncertainty, rather than the change in the point estimate of the externality, is what drives the conclusion that the total effect on school attendance is no longer statistically significant. As in the prevalence externality case, dropping the 3-6km estimate altogether preserves a statistically significant cross-school externality (and total effect).

We are uncertain about what to believe about the externality terms at this point. It seems fairly clear that had Miguel and Kremer caught the coding error prior to publication, their paper would have ignored potential externalities beyond 3km, and the replication done today would have found that the analysis up to 3km was broadly right. The replication penalizes the paper for having initially (incorrectly) found externalities further out. While we continue to be worried about the possibility of specification searching in the externality terms, and we see a case for treating the initial paper as a form of preregistration, we don’t see it as at all obvious that we should penalize the Miguel and Kremer results in the way that Aiken et al. suggest.

The Aiken et al. replication, like the original paper, finds no evidence of an impact on test scores.

Davey et al. 2015 is a more interpretive reanalysis, in which the authors use a more “epidemiological” analytical approach to reanalyze the data. The abstract states:

Results: Quasi-randomization resulted in three similar groups of 25 schools. There was a substantial amount of missing data. In year-stratified cluster-summary analysis, there was no clear evidence for improvement in either school attendance or examination performance. In year-stratified regression models, there was some evidence of improvement in school attendance [adjusted odds ratios (aOR): year 1: 1.48, 95% confidence interval (CI) 0.88–2.52, P = 0.150; year 2: 1.23, 95% CI 1.01–1.51, P = 0.044], but not examination performance (adjusted differences: year 1: −0.135, 95% CI −0.323–0.054, P = 0.161; year 2: −0.017, 95% CI −0.201–0.166, P = 0.854). When both years were combined, there was strong evidence of an effect on attendance (aOR 1.82, 95% CI 1.74–1.91, P < 0.001), but not examination performance (adjusted difference −0.121, 95% CI −0.293–0.052, P = 0.169).
Conclusions: The evidence supporting an improvement in school attendance differed by analysis method. This, and various other important limitations of the data, caution against over-interpretation of the results. We find that the study provides some evidence, but with high risk of bias, that a school-based drug-treatment and health-education intervention improved school attendance and no evidence of effect on examination performance.

Reviewing the key conclusions in order:

• “In year-stratified cluster-summary analysis, there was no clear evidence for improvement in either school attendance or examination performance.” The results of the year-stratified cluster-summary analysis are substantively the same as the results of the year-stratified regression models that Davey et al. use (next bullet), with wider confidence intervals resulting from the reduction in sample size caused by using unweighted school-level data (N=75). Table 2 reports a 5.5 percentage point impact on attendance in 1998 (corresponding to an odds ratio of 1.78) and a 2.2 percentage point impact for 1999 (corresponding to an odds ratio of 1.21). Davey et al.’s regressions find an odds ratio for 1998 of 1.77 (unadjusted, p=0.097) or 1.48 (adjusted, p=0.150) and for 1999 of 1.23 (unadjusted, p=0.047, or adjusted, p=0.044), i.e. the same point estimates with tighter confidence intervals. We don’t see it as surprising or problematic that collapsing a large cluster-randomized trial’s data to the cluster level results in a loss of statistical significance.
• “In year-stratified regression models, there was some evidence of improvement in school attendance [adjusted odds ratios (aOR): year 1: 1.48, 95% confidence interval (CI) 0.88–2.52, P = 0.150; year 2: 1.23, 95% CI 1.01–1.51, P = 0.044], but not examination performance (adjusted differences: year 1: −0.135, 95% CI −0.323–0.054, P = 0.161; year 2: −0.017, 95% CI −0.201–0.166, P = 0.854).” The lack of a result on exam performance echoes Miguel and Kremer 2004’s results. The “some evidence of improvement” result for school attendance is more striking, since the year 2 results are positive and statistically significant while the year 1 results are more positive but not statistically significant (due to a wider confidence interval). We read this as the test in year 1 being underpowered; treating years 1 and 2 as two independent randomized controlled trials, a fixed-effects meta-analysis would find a statistically significant overall effect.
• “When both years were combined, there was strong evidence of an effect on attendance (aOR 1.82, 95% CI 1.74–1.91, P < 0.001), but not examination performance (adjusted difference −0.121, 95% CI −0.293–0.052, P = 0.169).” These results accord with the Miguel and Kremer 2004 results.
• “We find that the study provides some evidence, but with high risk of bias, that a school-based drug-treatment and health-education intervention improved school attendance and no evidence of effect on examination performance.” The authors make two main arguments for the high risk of bias. First, they note (in Figure 3) that the correlation across schools between attendance rates and the number of attendance observations appears to differ across the treatment and control groups, with a broad tendency towards positive correlation between observations and attendance rates in the intervention group and a negative correlation in the control group, which would lead to estimates weighted by the number of observations to overestimate the true impact. However, we see three reasons not to regard this evidence as particularly problematic:
• Hicks, Miguel, and Kremer report conducting a test for the claimed change in the correlation and finding a non-statistically significant result (page 9). As far as we know, Davey et al. have not responded to this point, though we think it is possible that Hicks, Miguel, and Kremer’s test is underpowered.
• As noted above, the unweighted (year-stratified cluster-summary) estimates are not lower than the year-stratified regression models (which Davey et al. report do weight by observation–“we used random-effects regression on school attendance observations, an approach which gives greater weight to clusters with higher numbers of observations”), they just have wider confidence intervals. In order for the observed correlation to be biasing the weighted results, the weighted estimates would need to be meaningfully different from the unweighted ones, which is not the case here. Accordingly, we see little reason even in Davey et al.’s framework for preferring the less precise year-stratified cluster-summary results to the year-stratified regressions, which use significantly more information to reach virtually the same point estimates.
• Hicks, Miguel, and Kremer report results weighted by pupil instead of observation (Table 3), and find results strongly consistent with their attendance-weighted results, without the risk of being biased by attendance observations. However, their results imply treatment effects that are larger than the odds ratios reported in Davey et al.’s year-stratified regression models, which Davey et al. report do weight by observation. We’re not sure what to make of this discrepancy, and we haven’t seen Davey et al. respond on this point.
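
The fixed-effects meta-analysis mentioned in the bullets above can be sketched from the published figures alone. This is an illustration under stated assumptions, not the authors' calculation: it takes the year 1 and year 2 adjusted odds ratios and 95% confidence intervals quoted from Davey et al., backs out approximate standard errors on the log-odds scale from the interval widths, treats the two years as independent, and pools them with inverse-variance weights. All function names are hypothetical.

```python
import math


def log_or_and_se(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Approximate the log odds ratio and its standard error from a 95% CI."""
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    return log_or, se


def fixed_effect_pool(estimates):
    """Inverse-variance (fixed-effect) pooling of (estimate, se) pairs."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


def two_sided_p(z):
    """Two-sided p-value under a standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))


# Adjusted odds ratios for attendance as quoted from Davey et al. 2015.
year1 = log_or_and_se(1.48, 0.88, 2.52)
year2 = log_or_and_se(1.23, 1.01, 1.51)

pooled_log_or, pooled_se = fixed_effect_pool([year1, year2])
pooled_or = math.exp(pooled_log_or)
ci_low = math.exp(pooled_log_or - 1.96 * pooled_se)
ci_high = math.exp(pooled_log_or + 1.96 * pooled_se)
p_value = two_sided_p(pooled_log_or / pooled_se)

print(f"pooled OR {pooled_or:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}, p = {p_value:.3f}")
```

Under these assumptions the pooled odds ratio is roughly 1.26, with a 95% interval that excludes 1. The independence assumption is a simplification, since the same schools appear in both years.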

Second, and relatedly, Davey et al. note that the estimated attendance effect in the combined years analysis is larger than in either of the underlying years, and they suggest that the change is due to the inclusion of a before-after comparison for Group 2 (which switched from control in year one to treatment in year two) in the purportedly experimental analysis. We see this concern as more plausible, and don’t have a conclusive view on it at this point, but we think it would affect the magnitude of the observed effect rather than its existence (since we read the year-stratified regressions, which are not subject to this potential bias, as supporting an impact on attendance).

To summarize, we see no reason even based on Davey et al.’s own choices to prefer the year-stratified cluster-summary, which discards a significant amount of information, to the year-stratified regression models, which together point to a statistically significant impact on attendance. Hicks, Miguel, and Kremer make a variety of other arguments against decisions made by Davey et al., and they, along with Blattman and Ozler, argue that many of the changes are jointly necessary to yield non-significant results. We haven’t considered this claim fully because we see the Davey et al. results as supporting a statistically significant attendance impact, but if we turn out to be wrong about that, it would be important to more fully weigh the other deviations they make from Miguel and Kremer’s approach in reaching a conclusion.

School attendance data has never played a major role in our view about deworming (more on our views below), but we see little reason based on these re-analyses to doubt the Miguel and Kremer 2004 result that deworming significantly improved attendance in their experiment. We see much more reason to be worried about external validity, particularly related to the occurrence of El Nino during the period of study, which we have written about elsewhere.

The new Cochrane Review

The new Cochrane review on deworming reaches largely the same conclusions as the 2012 update, which we have discussed previously.

The new review incorporates the Aiken et al. and Davey et al. replications of Miguel and Kremer 2004 and the results of the large DEVTA trial, but continues to exclude Baird et al. 2011, Croke 2014, and Ozier 2011.

We agree with the general bottom line that there is little evidence for any biological mechanism linking deworming to longer term outcomes, and that that should significantly reduce one’s confidence in any claimed long-term effects of deworming. However, the Cochrane authors make some editorial judgments we don’t agree with.

They state:

• “The replication highlights important coding errors and this resulted in a number of changes to the results: the previously reported effect on anaemia disappeared; the effect on school attendance was similar to the original analysis, although the effect was seen in both children that received the drug and those that did not; and the indirect effects (externalities) of the intervention on adjacent schools disappeared (Aiken 2015).” As described above, in summarizing the results of Aiken et al. 2015, we would have noted that estimated cross-school externalities remain statistically significant in the 0-3km range.
• “The statistical replication suggested some impact of the complex intervention (deworming and health promotion) on school attendance, but this varied depending on the analysis strategy, and there was a high risk of bias. The replication showed no effect on exam performance (Davey 2015).” We think it is misleading to summarize the results as “[impact on school attendance] varied depending on the analysis strategy, and there was a high risk of bias.” Our read is that Davey et al. reported some analyses in which they discarded a significant amount of information and accordingly lost statistical significance, but found attendance impacts that were consistently positive and of the same magnitude (and statistically significant in analyses that preserved information).
• “There have been some recent trials on long-term follow-up, none of which met the quality criteria needed in order to be included in this review (Baird 2011; Croke 2014; Ozier 2011; described in Characteristics of excluded studies). Baird 2011 and Ozier 2011 are follow-up trials of the Miguel 2004 (Cluster) trial. Ozier 2011 studied children in the vicinity of the Miguel 2004 (Cluster) to assess long-term impacts of the externalities (impacts on untreated children). However, in the replication trials (Aiken 2014; Aiken 2015; Davey 2015), these spill-over effects were no longer present, raising questions about the validity of a long-term follow-up.” This last sentence seems problematic from multiple perspectives:
• Davey et al. 2015 does not mention or look for externalities or spill-over effects.
• Aiken et al. 2015 replicates Miguel and Kremer 2004’s finding of a statistically significant externality within 0-3 km, so summarizing it as “these spill-over effects were no longer present” seems to be an over-simplification.
• The lack of geographic externality is a particularly unpersuasive explanation for excluding Ozier 2011, which focuses on spill-over effects to younger siblings of children who were assigned to deworming, especially given that Aiken et al. confirm Miguel and Kremer’s finding of within-school externalities (which seems more similar to the siblings case). More generally, the fact that one study failed to find a result seems like a bad reason to exclude a follow-up study to it that did.

More generally, we agree with many of the conclusions of the Cochrane review, but excluding some of the most important studies on a topic because they eventually treated the control group seems misguided. Doing so structurally excludes virtually all long-term follow-ups, since they are often ethically required to eventually treat their control groups.

Our case for deworming

As we wrote in 2012, the last time the Cochrane review on deworming was updated, our review of deworming focuses on three kinds of benefits:

• General health impacts, especially on haemoglobin. We currently conclude, partly based on the last edition of the Cochrane review: “Evidence for the impact of deworming on short-term general health is thin, especially for soil-transmitted helminth (STH)-only deworming. Most of the potential effects are relatively small, the evidence is mixed, and different approaches have varied effects. We would guess that deworming populations with schistosomiasis and STH (combination deworming) does have some small impacts on general health, but do not believe it has a large impact on health in most cases. We are uncertain that STH-only deworming affects general health.” This last claim continues to be in line with Cochrane’s updated finding of no impact of STH-only deworming on haemoglobin and most other short-term outcomes.
• Prevention of potentially severe effects, such as intestinal obstruction. These effects are rare and play a relatively small role in our position on deworming.
• Developmental impacts, particularly on income later in life. The new Cochrane review continues to exclude the studies we see as key to this question. Bleakley 2004 is outside of the scope of the Cochrane review because it is not an experimental analysis, and Baird et al. 2011 is excluded because its control group eventually received treatment. However, as before, the Cochrane review does discuss Miguel and Kremer 2004, which underlies the Baird et al. 2011 follow-up; in their assessment of the risk of bias in included studies, Miguel and Kremer 2004 continues to be the worst-graded of the included trials. We also do not think that the Aiken et al. or Davey et al. papers should substantially affect our assessment of the Baird et al. 2011 results. Aiken et al.’s main finding is about the coding error affecting the 3-6km externality terms. I’m not clear on whether the coding error in the construction of the externality variable extends to Baird et al. 2011, but, regardless, the results we see as most important, particularly on income, do not rely on the externality term. Davey et al.’s key argument is against the combined analysis in which Group 2 is considered control in year one and treatment in year two. I remain uncertain about whether this worry is fundamentally correct, but Baird et al. is not subject to it because their estimates treat Group 2 as consistently part of the treatment group.

Nonetheless, we continue to have serious reservations about these studies and would counsel against taking them at face value.

We think it’s a particular mistake to analyze the evidence in this case without respect to the cost of the intervention. Table 4 of Baird et al. 2012 estimates that, not counting externalities, their results imply that deworming generates a net present value of $55.26, against an average cost of $1.07, i.e. that deworming is ~50 times more effective than cash transfers. We do not think it is appropriate to take estimates like these at face value or to expect them to generalize without adjustment, but the strong results leave significant room for cost-effectiveness to regress to the mean and still beat cash. In our cost-effectiveness model, we apply a number of ad-hoc adjustments to penalize for external validity and replicability concerns, and most of us continue to guess that deworming is more cost-effective than cash transfers, though of course these are judgment calls and we could easily be wrong.
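
The "~50 times" figure and the room it leaves for discounting can be checked with a few lines of arithmetic. This is a rough sketch, not the cost-effectiveness model referred to above. It assumes, as the "~50 times" comparison implies, that a cash transfer delivers about $1 of benefit per $1 spent. That benchmark and the function names are illustrative assumptions.

```python
def benefit_cost_ratio(npv_benefit, average_cost):
    """Dollars of estimated benefit per dollar spent."""
    return npv_benefit / average_cost


def max_discount_before_parity(ratio, benchmark_ratio=1.0):
    """Largest fractional discount on `ratio` that still matches the benchmark.

    benchmark_ratio=1.0 is an assumed stand-in for cash transfers.
    """
    return 1.0 - benchmark_ratio / ratio


# Figures quoted from Table 4 of Baird et al. 2012 in the paragraph above.
ratio = benefit_cost_ratio(55.26, 1.07)
discount = max_discount_before_parity(ratio)

print(f"benefit-cost ratio: {ratio:.1f}")
print(f"discount that still matches the assumed cash benchmark: {discount:.1%}")
```

On these assumptions the ratio is about 51.6, so the estimate could be discounted by roughly 98 percent before it fell to the assumed cash benchmark.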

The lack of a clear causal mechanism to connect deworming to longer term developmental outcomes is a significant and legitimate source of uncertainty as to whether deworming truly has any effect, and we do not think it would be inappropriate for more risk-averse donors to prefer to support other interventions instead, but we don’t agree with the Cochrane review’s conclusion that it’s the long-term evidence that is obviously mistaken in this case. (We have noted elsewhere that most claims for long-term impact seem to be subject to broadly similar problems.)

The importance of data sharing and replication

We continue to believe that it is extremely valuable and important for authors to share their data and code, and we appreciate that Miguel and Kremer did so in this case. We’re also glad to see the record corrected regarding the 3-6km externality terms in Miguel and Kremer 2004. But our overall impression is that this is a case in which the replication process has brought more heat than light. We hope that the research community can develop stronger norms supporting data sharing and replication in the future.

The post New deworming reanalyses and Cochrane review appeared first on The GiveWell Blog.