The Millennium Villages, evaluated? A skeptical view

Michael Clemens highlights a new paper by Wanjala and Muradian, who do a clandestine evaluation of Kenya’s Millennium Villages.

The result does not look good for the MVs:

While Wanjala and Muradian find that the project caused a 70% increase in agricultural productivity among the treated households, tending to increase household income, it also caused less diversification of household economic activity into profitable non-farm employment, tending to decrease household income. These countervailing effects are precisely what one might expect from a large and intensive subsidy to agricultural activity. On balance, households that received this large and intensive intervention have no more income today than households that did not receive the intervention.

The only problem: I’m not sure I believe it.

This blog has had long conversations on evaluating the MVs. Before I get into my reading of the paper, do I have a secret side? Not really. I think it’s fair to say I’m an MV agnostic. All the folks on both sides of the divide are close friends and colleagues. This could make me a neutral arbiter (but mostly it makes me disappointing to both).

In spite of this, I decided I’d take up the paper as if I were refereeing it for a journal, as dispassionately as possible (if somewhat hastily). So put on your propeller hats, folks, and join me under the fold.

My basic concern: there are a number of things that don’t add up in the paper. They may turn out fine once clarified, but I worry they won’t.

Let’s begin at the beginning: summary statistics. Before the authors make their adjustments, incomes are actually higher and poverty is lower among villagers in the Millennium Villages (MVs).

  • To see this, take a look at the simple differences between MV and non-MV people (Table 3). The MV people are 14 percentage points more likely to have a higher-quality home (one of the best quick indicators of poverty, however imperfect) and 12 percentage points more likely to have land. So it looks like durable assets may be greater by a third in the MVs.
  • Also, most sources of income are the same or greater in the MVs than in the non-MVs, at least before the adjustments.
  • There doesn’t appear to be much of a difference in total income, however. But is this an error? The sub-components of income add up to the total among the MV people, but not among the non-MV people. If it is an error, then income is about 10% higher in the MVs on average.
  • Finally, the MV people are 14 percentage points more likely to be engaged in agriculture, but agriculture is not an either/or proposition. Most households engage in many activities and are underemployed to begin with, so more time and higher productivity in agriculture need not crowd out small business or wage income. So I’m not even sure what to make of these indicators.
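The add-up check in the third bullet is simple arithmetic, sketched below in Python. The income categories and every number here are hypothetical placeholders chosen for illustration; they are not figures from Table 3 of the paper.

```python
# Hypothetical sketch: do reported income sub-components add up to the reported
# total, and how does the MV vs non-MV gap change if we trust the components?
# All numbers are made-up placeholders, NOT values from the paper.


def components_gap(components: dict[str, float], reported_total: float) -> float:
    """Reported total minus the sum of its sub-components (0 means they add up)."""
    return reported_total - sum(components.values())


def pct_difference(treated: float, control: float) -> float:
    """Percent by which the treated mean exceeds the control mean."""
    return 100.0 * (treated - control) / control


# Placeholder mean incomes by source (arbitrary units).
mv = {"farm": 50.0, "self_employment": 20.0, "wages": 25.0, "remittances": 15.0}
non_mv = {"farm": 40.0, "self_employment": 18.0, "wages": 22.0, "remittances": 20.0}
mv_total_reported = 110.0      # equals the sum of its components
non_mv_total_reported = 110.0  # does NOT equal the sum of its components

if __name__ == "__main__":
    print("MV gap:", components_gap(mv, mv_total_reported))
    print("non-MV gap:", components_gap(non_mv, non_mv_total_reported))
    print("% difference, reported totals:",
          pct_difference(mv_total_reported, non_mv_total_reported))
    print("% difference, component sums:",
          pct_difference(sum(mv.values()), sum(non_mv.values())))
```

If one group's total is inconsistent with its own components, the headline "no difference" can hinge entirely on which of the two numbers is right.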

Now, you might say, “Hey, we’re not comparing apples to apples. Maybe the MVP folks were richer to begin with.” You’d be right. This is what motivates the authors’ matching method: Let’s match MV people to similar-looking non-MV folks.

This is a great idea if you can match on the right things. But is that the case?

  • Usually you want to match on pre-program characteristics like initial income or prior agricultural work. Better yet, you want to match on pre-program trends, not levels. This way you avoid matching someone on a downswing to someone on an upswing who happens (at that particular moment in time) to have a similar level. Almost no one does this, but they should.
  • The authors can’t do either without pre-program data. So (as far as I can tell) they match on post-program data, like employment status, housing quality and agricultural employment. What this means is… wait a second… didn’t we just find out that the MV people are different along most of these characteristics? I think we have a problem.
  • What does this mean? I suspect the authors are (unknowingly) taking non-MV people who are unemployed or in poor-quality housing (possibly because they didn’t receive the MV project, who knows?) and matching them to unemployed MV people in poor-quality housing. If so, it’s no surprise that there is no difference in income, since the matching has controlled for the very impact of the MVs.
  • Meanwhile, you’re comparing employed and richer people in the MVs to the same kind of people in the non-MVs. But the MVs purportedly bump people on the margins of employment and poverty into slightly more employment and riches (i.e. the matching variables flip). Quite possibly these newly non-poor households have lower incomes than the average employed non-MV person in a nice house. But is that because they have only just gotten out of poverty, and have yet to rise further?
  • What we would like to know is whether previously poor and unemployed people have become more employed or less poor. I think the authors omit this possibility altogether, and so lose a lot of the potential power of the projects.
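To see why matching on a post-program variable can wash out a real effect, here is a toy simulation in Python. Everything in it is an assumption for illustration: a hypothetical true income effect of 1.0, a "good housing" indicator that the program pushes marginal households into, and (unlike the actual MVs) randomized assignment, so that the only problem left is the matching variable. Exact matching on the housing indicator compares like with like on housing, but treated households in good housing are on average poorer at baseline than control households in good housing, so the matched estimate shrinks well below the true effect.

```python
import random
import statistics


def simulate(n: int = 20000, tau: float = 1.0, seed: int = 0):
    """Toy data: (treated, good_house, income) per household. All parameters hypothetical."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        wealth = rng.gauss(0.0, 1.0)   # baseline wealth, never observed by the analyst
        treated = rng.random() < 0.5   # randomized here so matching is the only issue
        # Good housing is a post-program outcome: the program nudges marginal
        # households over the line.
        good_house = wealth + 1.0 * treated + rng.gauss(0.0, 0.5) > 0.0
        # Income depends on baseline wealth plus a true program effect tau.
        income = 2.0 * wealth + tau * treated + rng.gauss(0.0, 1.0)
        rows.append((treated, good_house, income))
    return rows


def naive_difference(rows):
    """Difference in mean income, treated minus control."""
    treated = [y for d, _, y in rows if d]
    control = [y for d, _, y in rows if not d]
    return statistics.fmean(treated) - statistics.fmean(control)


def stratified_on_housing(rows):
    """Exact matching on the housing indicator: within-stratum differences in means,
    weighted by the number of treated households in each stratum (ATT-style)."""
    weighted_sum, n_treated = 0.0, 0
    for house in (False, True):
        treated = [y for d, g, y in rows if d and g == house]
        control = [y for d, g, y in rows if not d and g == house]
        weighted_sum += len(treated) * (statistics.fmean(treated) - statistics.fmean(control))
        n_treated += len(treated)
    return weighted_sum / n_treated


if __name__ == "__main__":
    rows = simulate()
    print("true effect: 1.0")
    print("naive difference:", round(naive_difference(rows), 2))
    print("matched on post-program housing:", round(stratified_on_housing(rows), 2))
```

Running it shows the naive difference landing near the true effect and the housing-matched estimate far below it. Matching on a pre-program variable (here, baseline wealth, if it were observed) would not have this problem.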

So, it’s not clear to me that the matched estimates really mean anything, since they match on things that we think are affected by the program. If anything, they seem to me to indicate that MV households are better off even with the matching deck stacked against them:

  • First, the direction of the matched estimates suggests that self-employment and wage income are higher among MV people than non-MV people (even if the differences are not statistically significant). This seems at odds with a main claim of the paper.
  • Also, remittances from outside are way, way down in MV households. This is probably the best measured portion of income and (in my mind) a pretty good sign that MV households are better off.

I should stress that this is not a vindication of the Millennium Villages. The MV folks could be better off now because they were better off to begin with. Without a credible matching strategy or other research design, it’s very difficult to say.

Also, the real question is not whether the MVs reduce poverty. If you put in more inputs, you’ll get more outputs. That’s something we mostly know in aid at the micro level. If the MVs actually raise incomes by 10% and assets by a third, then they almost certainly pass a cost-benefit test. That is important.

The real assumption behind the MVs, however, is that the different interventions are complementary: the whole of poverty alleviation is greater than the sum of its parts. That is not at all clear from an evaluation like this.
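One way to make the complementarity question concrete: with a 2x2 design (neither intervention, A only, B only, both), complementarity means the effect of the bundle exceeds the sum of the separate effects. The Python sketch below computes that interaction contrast from four cell means; the cell values are hypothetical. A bundled-versus-nothing comparison only observes the "both minus neither" difference, and the example shows two worlds with the same bundle effect but opposite answers on complementarity.

```python
def interaction_contrast(y00: float, y10: float, y01: float, y11: float) -> float:
    """Interaction in a 2x2 factorial design from cell means.

    y00: neither intervention, y10: A only, y01: B only, y11: both.
    Positive: the bundle does more than the sum of its parts (complements).
    Negative: diminishing returns (substitutes).
    """
    effect_a = y10 - y00      # A alone vs. neither
    effect_b = y01 - y00      # B alone vs. neither
    effect_both = y11 - y00   # bundle vs. neither (all a bundled evaluation sees)
    return effect_both - (effect_a + effect_b)


if __name__ == "__main__":
    # Hypothetical cell means; both worlds share the same bundle effect (125 - 100).
    complements = (100, 110, 108, 125)
    substitutes = (100, 120, 118, 125)
    for label, cells in [("complements", complements), ("substitutes", substitutes)]:
        print(label, "bundle effect:", cells[3] - cells[0],
              "interaction:", interaction_contrast(*cells))
```

Both hypothetical worlds would look identical in an MV-versus-non-MV comparison, which is why an evaluation like this one cannot speak to whether the parts are complementary.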

My own theory of poverty is actually the opposite: there are diminishing marginal returns to aid in a single village. I believe in the possibility of increasing returns and complementarities, but mainly through broad, national institutional and technological change. I’m personally not convinced real poverty traps exist, or can be overcome, at the household or village level.

Before ending, a few other red flags in the paper, which may or may not be an issue:

  1. This is a small sample size (just over 400 households), but the real “smallness” comes from the fact that there are just 16 communities. Since the assignment to MVP was done at the community level, in some sense the sample size here is 16 and not 411. That’s not really true, but one does have to account for the fact that people within communities have similar outcomes and reactions to the MV or lack thereof (“clustering of standard errors”, in the lexicon). I can’t tell whether this was done, but it looks like it wasn’t. If not, the statistical significance of any differences is probably overstated, and none of the results are as significant as the paper says. I suspect sample sizes are too small to say whether there is an impact one way or the other.
  2. I’d like to know more about how household income was measured. This is famously difficult to capture when households have multiple, irregular income streams. Especially agricultural income, which should include consumption of own produce. A poor measure of income could be little better than noise, especially in small samples. More worrisome, since the MV people are more likely to be in agriculture, we might be systematically underestimating agricultural income and hence the effect of the MV project.
  3. Consumption and nutrition data would be one way to get around this issue. In fact, there’s a bunch of outcomes I would love to see measured: subjective well-being, distress and anxiety, social cohesion. These are all things I’ve seen impacted by aid in Uganda.
  4. I’d also like to see more detailed employment data, like hours instead of employment indicators. In rural Africa, there’s almost no such thing as “fully employed” or “unemployed”. It’s typically a matter of degree of underemployment.
  5. Matching estimates are famously sensitive to the matching method. These are not: some estimates do not change at all across methods. Probably this is because of the small sample size, but it’s also a red flag for coding issues.
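To get a feel for how much clustering matters with 16 communities and 411 households, the standard Kish design effect, 1 + (m - 1)ρ, where m is the average cluster size and ρ the intra-cluster correlation of the outcome, is a useful back-of-the-envelope tool. The Python sketch below assumes equal-sized clusters, and the ρ values are hypothetical illustrations, not estimates from this paper. At ρ = 1 the effective sample size collapses to exactly 16, the number of communities.

```python
import math


def design_effect(n_total: int, n_clusters: int, icc: float) -> float:
    """Kish design effect 1 + (m - 1) * icc, with m the average cluster size.
    Assumes roughly equal cluster sizes."""
    m = n_total / n_clusters
    return 1.0 + (m - 1.0) * icc


def effective_n(n_total: int, n_clusters: int, icc: float) -> float:
    """Sample size that would give the same precision under independent sampling."""
    return n_total / design_effect(n_total, n_clusters, icc)


if __name__ == "__main__":
    # Hypothetical intra-cluster correlations, from none to perfect.
    for icc in (0.0, 0.05, 0.2, 1.0):
        deff = design_effect(411, 16, icc)
        print(f"icc={icc:.2f}  design effect={deff:.2f}  "
              f"SE inflation={math.sqrt(deff):.2f}  "
              f"effective n={effective_n(411, 16, icc):.1f}")
```

Standard errors scale with the square root of the design effect, so even a modest correlation of 0.05 would widen confidence intervals by roughly half relative to treating the 411 households as independent.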

I could be wrong in my reading of the paper, especially as it was a hasty one. Clarifications and corrections welcome.