Evaluating the Millennium Villages: The saga continues

Michael Clemens and Gabriel Demombynes, both friends and colleagues, sprang a provocative paper on me while on vacation. The basic argument: the Millennium Villages could and should be evaluated. No excuses. Sachs and Macarthur and others at the MV project respond here. And the debate would not be complete without a cannonade from Aid Watch, who suggest that the MV folks are moving their goalposts.

Let me stake out a different position. I’ve blogged about the Millennium Villages before, on what an evaluation could look like, but also why a rigorous evaluation might be infeasible. While guest blogging, Julian Jamison commented that it might actually be detrimental to a progressive agenda to evaluate too much.

Now, Michael and Gabriel write a good paper, and do a much more thorough job than I ever did, and more or less convince me that a rigorous evaluation is possible. The evaluation they propose, however, is one that measures the aggregate effect of the villages — e.g. “do the MVs reduce poverty?”.

This should probably get done. But for me this is not really the interesting question.

First, if the Villages “work”, why is that? Maybe it’s the combination of schools plus clean water plus agricultural productivity gains plus clinics. Or maybe it’s the fact that the local and district and state governments all know that the world is watching this one cluster of villages. So the state basically works the way it’s supposed to: services get delivered, they don’t send the idiot ag extension officers over there, and someone makes sure the police chief is not a thug.

If the Villages work, perhaps it’s not the aid or the money or the technical expertise or all the other things that Westerners are in a position to give. Maybe it’s because there’s finally some accountability. And while we outsiders can do a decent job of making the government accountable for 10 villages, there’s not much we can do for the other 9,990 in the country. That’s up to the citizens themselves.

That to me is the interesting question. And to answer it, the alternative intervention we would need is not a “no Millennium Village intervention” but an “institutions and accountability” intervention. I call it the Villennium Millage project, and I am working to interest the Liberian government in something like this as we speak. (I need all the luck you can wish me.)

The second interesting question is whether the whole is greater than the sum of its parts. This is (as I understand it) a fundamental premise of the MVs and (if I read him correctly) the fundamental development theory to which Jeff Sachs subscribes: that poverty is in part a bad equilibrium and needs a big push. Without coordinated action (schools plus clean water plus agricultural productivity gains plus clinics) you may as well be handing out right shoes to some villages and left shoes to others.

I don’t subscribe to the strong version of that theory (I view poverty alleviation as something that works on the margin) but there are almost certainly positive interactions between aid interventions. That would be a useful thing to test and understand.

We could get halfway if we could at least figure out whether the Millennium Villages have a level effect (you give more X, and you get more income), or if there’s a growth effect (you give more X, and you set a community off on a virtuous circle). Unfortunately I don’t think there’s enough statistical power (i.e. enough villages) to distinguish between level and growth effects.
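To make the power worry concrete, here is a rough sketch of how the chance of detecting an effect grows with the number of village pairs. It assumes a matched-pair design analyzed with a two-sided test on the mean within-pair difference, and it uses a normal approximation that is optimistic for small samples. The function name `paired_power` and the effect sizes in the loop are made up for illustration; they are not estimates from the Millennium Villages or from any paper.

```python
from math import sqrt
from statistics import NormalDist


def paired_power(n_pairs: int, effect_size: float, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided test on the mean within-pair difference.

    effect_size is the true mean difference divided by the standard deviation
    of the within-pair differences (a hypothetical, standardized quantity).
    Uses a normal approximation, which is optimistic for small n_pairs.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n_pairs)
    # Probability that the test statistic lands in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Illustrative values only: the pair counts and effect sizes are assumptions.
    for n in (10, 20, 100):
        for d in (0.3, 0.6):
            print(f"pairs={n:3d}  effect={d}  power={paired_power(n, d):.2f}")
```

Under these assumptions, power depends on the effect size times the square root of the number of pairs, so halving the detectable effect takes roughly four times as many villages. Telling a growth effect apart from a level effect means detecting a difference in trends rather than a single gap, which is one reason a handful of villages may not be enough.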

Okay, so say we can’t evaluate all these more interesting questions. The “do the Millennium Villages work at all?” question is still an important one, right? Yes, but as economists, we should be thinking in terms of opportunity cost. Research opportunities and funds are not limitless in the real world. What’s the first and most important kind of learning and research we’d like to see? I’m not convinced the answer is a randomized control trial.

The Millennium Promise CEO, John Macarthur, is also a friend of mine (we both worked for Sachs about ten years ago) and I still remember something he said to me: the learning that comes from the Millennium Villages is learning by doing. By trying to make these programs work together, and getting 10 or 15 villages right, the state learns how to do it better (mainly by screwing up along the way). It also shows the rest of the country that it’s doable. No one questions that delivery of public services like roads and schools and agricultural extension and clinics is important. That is a no-brainer. The point is to get the government moving and learning and practicing and showing off, in the hopes that it will spread.

By this argument, the research we need built around the Millennium Villages should come from process consultants (to learn how to deliver services better) or from people who think about how policy and institutional change actually happens, so the lessons and examples can spread beyond the example villages. We also need lots of close observation so that we can see why some interventions work well together, and why some don’t.

Speaking as someone who does rigorous program evaluation for a living, the idea that a rigorous program evaluation is the most useful evidence for most aid projects is kind of absurd. There are other paths to knowledge, especially practical knowledge, and most especially policy change.

When was the last time you met a prominent academic who said: “I study how good ideas actually get transformed into action.”? It doesn’t happen. The political economy of policy change is a field populated by probably six people we have never heard of. That is a sad state of affairs.

OK. Back to the MV project. They make a similar argument for process knowledge in their response to Michael and Gabriel, but they don’t get points for quality of sales pitch. Personally I would be convinced if the MV hit me with a barrage of amazing things they learned about how to do aid better, or examples of how good practices are spreading. These would ideally get peer reviewed, and would have a degree of independence and impartiality, as a check on the enthusiastic nature of the anecdote. (Apologies to the MV folks if this is out there.) I would still like to see a rigorous evaluation, if only to keep those gushing anecdotes in check, but it’s not my first concern.

Personally, I think this is a great debate that gets to the root of research in development, and where smart policy people should be focusing their energies (especially as the randomized control trial fad starts to fade).

Sadly, rumor has it that the open debate between Sachs and Clemens/Demombynes at the World Bank has been cancelled. (Aid Watch has that scoop.) If so, that is a shame. I would have liked to see that debate. Even more, I’d like to know why it was cancelled.

In my experience, organizations like the Bank and the MVs operate with a “bad stink minimization” strategy, so calling attention to the quiet cancelling of an important public debate could have good effect. Pass it on, and hope for some courage and conviction to appear on all sides.

12 thoughts on “Evaluating the Millennium Villages: The saga continues”

  1. Chris these are great points. I couldn’t agree more that there are many other interesting evaluation questions besides the overall impact, including the two good questions you highlight—the mechanisms and the synergies.

    A question is empirically interesting when there’s a low-variance prior that might be far from what a good new theory predicts, or when priors vary all over the place to include lots of existing theories. The overall impact of the Millennium Village Project definitely passes this test of ‘interesting’, because the Project’s prior is extremely narrow (it asserts with certainty that its intervention is the “solution to extreme poverty”) but a long experience of localized, intensive package interventions suggests different priors, as we review in the paper. That said, many other questions about the Project clearly pass the ‘interesting’ test, as you rightly point out.

    By the way, we’ve posted a response to the Project’s official statement on our paper. Our response points out that much of the Project’s statement reflects a remarkably basic misunderstanding of what a project impact evaluation is.

  2. Chris,
    Thanks very much for the thoughtful consideration of the paper.

    Yes, as we note in the paper, the MVP is also doing a process evaluation (we mention this on p. 15 of the paper). But our paper is exclusively about the MVP’s impact evaluation and not about process evaluation (we explain this on p.2).

    I largely see their emphasis on the process evaluation to be an instance of the moving of the goalposts that Laura Freschi writes about. The MVP predicted huge impacts when the project was just starting, they claimed huge impacts before they had any data, they interpret the changes based on before-and-after comparisons as the project’s “impacts,” and they have a detailed evaluation protocol which explains the rest of their planned impact evaluation. So the notion that it’s just me and Michael, the pointy-headed research-types, with an obsessive focus on impact–that’s incorrect. We focus on impact because that’s what the MVP has emphasized.

    The question of “Is a rigorous impact evaluation of the MVP worth the cost?” might have a little bit of bite if the MVP wasn’t already doing a costly (but poorly designed) impact evaluation. What Michael and I propose is almost exactly what they are already doing, but with several essential modifications: 1) randomized selection of treatment/control between matched pairs, rather than arbitrary treatment site selection and ex-post control site selection, 2) baseline data for control sites, 3) 20 pairs rather than 10 to get adequate statistical power, and 4) a long-term evaluation plan, rather than one ending at 5 years.

    If they had done what we propose from the beginning, the marginal cost of making the evaluation rigorous would have been low. Going forward, we calculate that doing a rigorous evaluation of the next 20 MVP sites would cost about 16% of the intervention cost per site, using a conservatively high estimate (p.23). (We don’t say this in the paper, but these costs are spread over 15 years, so in net present value terms they would be much less.)

    I always find the general point of “rigorous evaluation can’t answer all the interesting questions” a bit maddening. If I’m thinking about buying a bicycle, I would think about all the places I could go on that bicycle: the park, the library, the ice cream store. If someone told me “But you can’t ride your bicycle to the moon!” it’s not like I would say “Good point, forget the stupid bicycle.”

    With the MVP, the first and most important question is whether it’s having impacts on the order of what its promoters have imagined. And given the track record of similar programs, it is by no means a foregone conclusion that it will.

    We look forward to presenting the paper sometime soon, hopefully with someone from the MVP as discussant. Given my other commitments, this won’t be before December.

  3. These are good points, but if we’re going to generate a fuss, let’s generate a fuss about the important and fundamental questions. NGOs put all sorts of wacky claims in their materials. These might be useful to evaluate, but are not the first priority for moving forward. So rake NGOs over the coals for overselling their impact in their PR campaigns, but let’s write analytical papers about the important questions to answer and how to answer them.

    The big problem with 90% of RCTs is that they evaluate a specific program, and tell us very little about the premise or theory upon which that program was based. So external validity is pretty much nil.

    This is not asking bicycles to take us to the moon. It’s asking bicycles to actually take us to the ice cream store. A simple RCT of the MV project takes us to a gravel pit.

  4. The tone and terms of this evolving debate are most encouraging. It’s important to assess the extent to which the empirical claims made by the MVP leadership (now, and/or at the outset of the project) regarding efficacy are being realized, but I agree with Chris that an even more important task (for MVP and the rest of us) is getting a better sense of how (as opposed to just whether) any outcomes were or were not achieved. Michael and Gabriel’s push for a long-term assessment of MVP is also vital, for the reasons they articulate but also because knowledge of the impact trajectory of development interventions of any kind is (as I’ve written about) woefully thin, empirically and theoretically, yet crucial to accurate causal inference. Keep up the good work!

  5. @Gabriel: You say that you “…always find the general point of ‘rigorous evaluation can’t answer all the interesting questions’ a bit maddening.” But I think this misses the point. As I pointed out to you guys in my comments before, and as Chris has now repeated in his blog, perhaps we’re disagreeing about (i) whether the evaluation of the MVP as proposed IS interesting, and (ii) whether it’s worth all the cost and effort. We’re not saying that evaluations should answer ALL interesting questions, but rather that they should answer AT LEAST ONE, and preferably the most pertinent one. From the description, that question here seems to me to be the synergy/big bang question, as I pointed out before.

    As an aside, Chris’ blog confirmed to me that I should not be blogging myself. Someone will eventually say what I would have said, and perhaps more elegantly, and the minor delay is not costly to the development community. Thanks, Chris!

  6. Since they’re dropping by to chat, perhaps Clemens and/or Demombynes could comment on Chris’s last point on the cancellation of the Bank event.

  7. @Chris: The big claims for the MVP are not just in their promotional materials: they’re all over the place in the MVP documents. Just to take one small example where they are extremely specific: one of the principal hypotheses in their impact evaluation protocol is that the MVP will cut child mortality in half at the MVP sites.

    I’d still stick to my guns in saying that the first and most important question is whether the MVP is working to achieve its goals. The most consistent thing they’ve said is that they are aiming to break the MV sites out of the poverty trap and achieve the MDGs. The reason it’s important to know if the MV sites are achieving this goal is because they’ve advocated using the same approach–massive infusions of aid into rural communities–as the model for aid.

    @Berk: I agree the synergies question is interesting. To get at that, you would need a large number of sites (100+), with varied treatment across sites. Given their current setup–some large sites with 50K+ people and some with 5000 people–it would be too massive an undertaking and too expensive to have a large number of sites. But it’s conceivable that it could be done if the sites were all restricted to be on the small side.

  8. One more thing: I should mention that I wholeheartedly agree with the general point–which Michael Woolcock has written about eloquently–that there is a lot to be learned from lots of different kinds of evaluation, and I don’t by any means want to suggest that rigorous impact evaluation (the subject of our paper) is the only kind of evaluation worth doing. The PROGRESA case is yet again a good example on this front: the project was famous for its randomized impact evaluation design, but there was a lot of useful process and qualitative work done as part of the program as well.

  9. @Gabriel: I don’t think that’s exactly right. I outlined two theories that underpin the MV project: positive interactions between different poverty alleviation programs, and the role of accountability.

    You’re right that these would be difficult to evaluate in the MVs because of sample size, etc. But one could straightforwardly design interventions that are not the MVs, that have the right sample size and design, and that are more targeted towards these two questions–in principle two of the most fundamental questions in poverty alleviation.

    Doing so is not necessarily the job of the MVs. Whose job is it? If only there were a development and research organization that was five times larger than every other development and research organization on the planet.

    Wait a second…

    Indeed, I think this is exactly the kind of thing the World Bank should be talking about doing. Unfortunately, the Bank is completely wrapped up in running 600 rather minor, individual project evaluations with an external validity of zero. (I would count the Bank evaluations I am running chief among these.)

    This is not an argument for the Bank to stop running these project evaluations. We do learn a great deal. But (from my vantage point) there is little vision or energy behind a bigger and more ambitious randomized evaluation agenda that would answer fundamental questions in development like the ones I posed above.

    That would be a great follow-up paper to your current one.

    OK. Now that I have managed to grievously offend my friends, donors, project partners, and people who will one day write my tenure recommendation letters, I will sign off.

  10. Chris: I agree that the interactions question is interesting. Is this really the big unexplored terrain in impact evaluation? I’m not sure, but I’ll give it some thought.

    I’m not sure I understand the accountability point. You’re saying you think the MVP might be creating more accountability? For whom? If anything, my hypothesis would be that it undermines credibility of govt. officials, since the project largely bypasses them.

  11. Thanks Chris, this is a great discussion. You write, “NGOs put all sorts of wacky claims in their materials.” I agree completely. But the Millennium Village Project is not just another NGO project. Every official publication of the MVP has the name Columbia University stamped on the cover, the wacky claims you refer to have been made in the Proceedings of the National Academy of Sciences, and so on. These statements are knowingly cloaked in academic and scientific garb, and for that reason must be held to academic and scientific standards.