Am I actually sticking up for the Millennium Villages?

These are tough questions that the Millennium Villages Project will leave unanswered. For a huge pilot project with so much money and support behind it, and one that specifically aims to be exemplary (to “show what success looks like”), this is a disappointment, and a wasted opportunity.

On the Aid Watch blog, Laura Freschi takes aim at the Millennium Villages and their lack of rigorous evaluation. She also talks to my advisor and co-author:

Ted Miguel, head of the Center of Evaluation for Global Action at Berkeley, also said he would “hope to see a randomized impact evaluation, as the obvious, most scientifically rigorous approach, and one that is by now a standard part of the toolkit of most development economists. At a minimum I would have liked to see some sort of comparison group of nearby villages not directly affected by MVP but still subject to any relevant local economic/political ‘shocks,’ for use in a difference-in-differences analysis.”

Here’s the thing: I don’t know if rigorous evaluation is feasible with the MVs.

Usually the MVs are a cluster of perhaps 10 villages. This is, in some sense, a sample size of one (or 10, with high levels of cross-village correlation, which is not much of an improvement). Adding a few comparison clusters would be informative, but it wouldn’t provide the rigor or precision we would like. (Josh Angrist and Alan Krueger pointed out this flaw in famous difference-in-differences comparisons of US cities.)
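To see why ten correlated villages are not much better than one, here is a back-of-the-envelope sketch using the standard Kish design effect. The cross-village correlation values are assumptions for illustration only, not estimates from MVP data:

```python
# Back-of-the-envelope: how much independent information do 10 villages in one
# MV cluster really carry? Uses the Kish design effect, DEFF = 1 + (m - 1) * rho.
# The cross-village correlation (rho) values are illustrative, not MVP estimates.

def effective_n(villages_per_cluster: int, rho: float) -> float:
    """Effective number of independent village observations in one cluster."""
    deff = 1 + (villages_per_cluster - 1) * rho
    return villages_per_cluster / deff

for rho in (0.2, 0.5, 0.8):
    print(f"rho={rho}: {effective_n(10, rho):.1f} effective villages out of 10")

# With high cross-village correlation, 10 villages shrink toward 1-2
# independent observations -- the "sample size of one" problem.
```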

If we ran many more MV clusters, we might be able to test their impact more confidently. I think the MV guys would love to do this, but they have a hard enough time getting funding for a single cluster, let alone many.

Also, even if we looked at control villages and saw an impact, what would we learn from it? “A gazillion dollars in aid and lots of government attention produce good outcomes.” Should this be shocking?

We wouldn’t be testing the fundamental premises: the theory of the big push; the claim that high levels of aid, attacking many sectors and bottlenecks simultaneously, are needed to spur development; and the claim that there are positive interactions and externalities among multiple interventions.

The alternative hypothesis is that development is a gradual process, that marginal returns to aid may be high at low levels, and that we can also have a big impact with smaller, sector-specific interventions.

To test the big push and all these externalities, we’d need to measure the marginal returns to many single interventions as well as to these interventions in combination (to get at the externalities). I’m not sure a sample size exists that could do it.
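To give a rough sense of the scale involved, here is an illustrative count of treatment arms in a full factorial design over several interventions. The number of interventions and the clusters-per-arm figure are assumptions for illustration, not MVP numbers:

```python
# Rough illustration of why testing interventions singly and in combination blows up
# the required sample: a full factorial design over k interventions needs 2**k arms.
# The clusters-per-arm figure is an assumption for illustration, not an MVP number.

from itertools import combinations

def factorial_arms(k: int) -> int:
    """Number of treatment arms (every on/off combination of k interventions)."""
    return 2 ** k

def pairwise_interactions(k: int) -> int:
    """Number of two-way interactions you'd want to estimate."""
    return len(list(combinations(range(k), 2)))

k = 5                  # e.g., health, education, agriculture, infrastructure, finance
clusters_per_arm = 10  # assumed, just to detect fairly large effects
arms = factorial_arms(k)
print(f"{arms} arms, ~{arms * clusters_per_arm} village clusters, "
      f"{pairwise_interactions(k)} two-way interactions to estimate")

# 32 arms, ~320 village clusters, 10 two-way interactions -- far beyond
# the handful of MV clusters that have ever been funded.
```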

I once joked with a friend at DFID that we should raise the money to try. I wanted to call them the ‘Villennium Millages’. Now that would be a career maker.

Aid Watch is right to ask for more learning and evaluation. But we shouldn’t ask for rigor that isn’t feasible unless we’re prepared to fund thirty clusters of MVs for them to give it an honest try.

In the meantime, there are other paths to learning what works and why. I’m willing to bet there is a lot of trial-and-error learning in the MVs that could be shared. If they’re writing up these findings, I haven’t seen them. I suspect they could do a much better job, and I suspect they agree. But we shouldn’t hold them to evaluation goals that, from the outset, are bound to fail.

Here is what I suggest: we use the MVs to develop hypotheses about what interventions work and in what combinations, things we didn’t expect or didn’t know before. Then, if necessary, we go and test these more manageable claims in a nearby environment, to see if they’re worth scaling up. This seems like a more productive debate to have with the MVs.