Should the Millennium Villages be randomly evaluated?

Michael Clemens says yes, in a new CGD blog post:

there was no fundamental reason why the selection of treatment villages for the MVP could not have been randomized. There was certainly a large pool of candidate villages, and the people running the MVP are some of the most capable scientists on earth, so they are very familiar with these methods and why they matter.

But treatment selection was not random, and it may be too late to evaluate the initial 13 MVs scientifically. It would be very easy, however, to scientifically evaluate the next wave.

My take: yes, evaluate away, but we probably won’t learn much that is useful from a simple randomized controlled trial. I’ve written about this before:

even if we looked at control villages and saw an impact, what would we learn from it? “A gazillion dollars in aid and lots of government attention produces good outcomes.” Should this be shocking?

We wouldn’t be testing the fundamental premises: the theory of the big push; that high levels of aid, simultaneously attacking many sectors and bottlenecks, are needed to spur development; that there are positive interactions and externalities from multiple interventions.

The alternative hypothesis is that development is a gradual process, that marginal returns to aid may be high at low levels, and that we can also have a big impact with smaller, sector-specific interventions.

To test the big push and all these externalities, we’d need to measure marginal returns to many single interventions as well as these interventions in combination (to get at the externalities). I’m not sure the sample size exists that could do it.
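
As a back-of-the-envelope illustration of that sample-size problem (my numbers, not anyone’s actual power calculation): a full factorial test of k interventions needs 2^k treatment arms, and if the village is the unit of randomization, even a modest detectable effect pushes the village count into the thousands. A minimal sketch in Python, assuming a minimum detectable effect of 0.2 standard deviations, 5% two-sided significance, and 80% power:

```python
import math

def factorial_design_size(k, mde_sd=0.2, z_alpha=1.96, z_beta=0.84):
    """Villages needed for a full factorial test of k interventions.

    Illustrative assumptions only: each village counts as one
    observation, and the usual two-sample normal approximation applies:
        n per arm = 2 * (z_alpha + z_beta)**2 / mde**2
    where mde_sd is the minimum detectable effect in SD units.
    """
    arms = 2 ** k  # every on/off combination of the k interventions
    n_per_arm = math.ceil(2 * (z_alpha + z_beta) ** 2 / mde_sd ** 2)
    return arms, n_per_arm, arms * n_per_arm

for k in (1, 3, 5):
    arms, per_arm, total = factorial_design_size(k)
    print(f"{k} interventions: {arms} arms x {per_arm} villages = {total:,}")
```

Even before accounting for within-village clustering (which only raises the numbers) or the extra power needed to detect interactions rather than main effects, five interventions already imply more than twelve thousand villages, against the MVP’s thirteen.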

We may (*gasp*) have to resort to non-random, even non-quantitative evaluation. Surely this is going on. I’d say the real question for the MVs is this: where’s the peer-reviewed evidence so far?

See my full post here.

8 Responses

  1. Having visited Sauri/Kenya in November 2006 and a Millennium Village close to Zomba/Malawi in October 2009, I have seen the enthusiasm of the people involved and the changes being achieved.

    But can it be done without Sachs, Jolie and the world watching?

    I asked a Sauri citizen. He immediately shared my doubt on that, but said he was happy with everybody watching and supporting… I would be too, if I were living in Sauri.
    But can Jack and Jill in rural wherever do it?
    Can they get local government moving on healthcare, electricity, roads and more?
    Not without lots of millennium money and everybody watching.
    There is no research needed to figure that out.

    The problem with the MVs is also this: even if the model ‘works’, can it be scaled up?

    http://vanstokkom.blogspot.com/2007/03/rapid-victories-against-extreme-poverty.html

  2. Thanks for this, Chris. I really agree with the big point you’ve argued before on this blog, that “Evaluation 2.0” requires all kinds of iterative and evolving learning, much of which is difficult to base on randomized evidence but can still be done rigorously. That said, I don’t agree that the “fundamental premise” of the MVs is untestable.

    The fundamental premise of the MVs is that a certain package of local interventions — together, as a package — can break village clusters out of poverty traps and place them on the “ladder of development”. That premise is testable and can be either supported or rejected without knowing which elements of the package were or were not responsible for a given change observed in the villages.

    What the MVP seeks to scale up is not different elements of the package in different places — schools in one place, fertilizer in other places — but rather the application of the package as a package. Whether or not the *package* has placed village clusters on a long-term trajectory of economic growth is the evaluation question of interest for the MVs, because that is their premise.

    If I understand correctly, you’re making the good point that we would learn a lot more about village-level development interventions if we could test the effectiveness of individual interventions in isolation against the effectiveness of bundles (packages) of individual interventions. I’m sure you’re absolutely right when you say that such complex tests are probably infeasible with the sample sizes of the MVP, and when you say that such tests would be more revealing than a zero-one test of whether a certain package is capable of lastingly freeing villages from poverty traps.

    But that doesn’t mean that such a zero-one test of the package per se is uninteresting or can’t be done. The MVP is making strong claims about the package, and testing what sort of lasting effects are attributable to the package per se is 1) possible with realistic sample sizes and 2) highly policy-relevant given the prominence and cost of the MVP.

  3. Indeed, +1 Amanda.

    “A gazillion dollars in aid produces good outcomes, but they don’t last” or “… doesn’t produce good outcomes” are still viable answers. That’s the fear, and that’s what randomized evaluations can answer.

  4. What evidence do you have that ‘a gazillion dollars in aid and lots of government attention produces good outcomes’? Are we even confident in this?

    You make a good point that subjecting the MVP to an RCT is hard. I agree that any kind of evaluation using a comparison group would be a step in the right direction. If this is really a ‘new approach to fighting poverty’, as the slogan suggests, we need to know if it does any good, and if the interventions merit the cost. (I know you know that.)

    But here are two things going for an RCT of the MVP: 1) even evaluating an interaction of some of the interventions would shed light on this approach and help in understanding the ‘big push.’ What if the evaluation interacted, say, five of the MVP interventions? That, to me, is better than a matching/qualitative evaluation (not saying that qualitative work shouldn’t be done too). Maybe this is what you are suggesting with ‘testing manageable claims’? 2) Some of the best minds in development economics do RCTs. These people care about what is happening with a ‘gazillion’ in aid dollars, and I am confident a meeting of smart minds could come up with sensible approaches to deal with the big-push challenge.

    I hope the MVP team responds to you and Michael. (I haven’t given up on Sachs yet.)

  5. I think your argument is a sound one, Chris. But aside from the specific question about randomized evaluation, it seems like there’s a valid broader critique of the MVs: they don’t seem to have been designed for any rigorous evaluation at all. Shouldn’t arguments like yours and Clemens’ and Freschi’s have been aired at the inception of this massive effort, rather than at this late point in the game? After all, these things are largely publicly funded.

  6. You are certainly right that we aren’t going to detect interesting results with N=13 villages. But I don’t think we would have to measure the marginal impact of the individual interventions to learn anything useful; the individual components have already been measured elsewhere. The great opportunity of the MVs, then, would be to test packages of components.

    This would require assembling a package that you think will have interesting interactions, say fertilizer, savings products, and irrigation pumps. You could then phase in the packages, randomizing by individual, enterprise, school, location, etc. (a toy sketch of one such phased assignment follows the comments). This would be somewhat counter to the big-push idea that everyone gets everything at the same time, but everyone would get something, and everyone would get everything eventually. External validity may be low if the control is also part of an MV, so randomly selected control villages getting no services might be helpful as well.

    There may not be the power to test interactions between packages, but you could compare the package to existing evaluations of the components and come to a conclusion about that particular delivery module. This may not produce cutting-edge economics research, but at the very least there is a lot the MV movement itself could learn from program evaluations of groups of services as it scales up.

  7. There is a “placebo effect” in aid interventions: attention from the international community has an effect that is sometimes more important than the effect of the added funding or expertise. I don’t have data to back this up, only some circumstantial evidence, but I have seen it time and again: under the spotlight, communities and administrations come up with their own solutions, get their act together, and find a spring in their step. Money and consultants can be the oil for the chain, but the motor is this aid placebo effect.
    With the Millennium Villages, the spotlight is quite strong.
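
On comment 6’s phased rollout: here is a minimal sketch of what that random phase-in could look like, assuming (my assumptions, not the MVP’s) that households are the unit of randomization and that the package arrives in three waves. Later waves serve as a randomized comparison group until their turn comes, so everyone eventually gets everything.

```python
import random

def phased_rollout(units, n_waves=3, seed=1):
    """Randomly assign units (households, schools, ...) to rollout waves.

    Everyone is eventually treated; until their wave arrives, later
    waves act as a temporary randomized control group.
    """
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    shuffled = list(units)
    rng.shuffle(shuffled)
    # Deal the shuffled units into waves round-robin to keep sizes balanced.
    waves = {w: [] for w in range(1, n_waves + 1)}
    for i, unit in enumerate(shuffled):
        waves[i % n_waves + 1].append(unit)
    return waves

# Hypothetical household IDs, purely for illustration.
households = [f"hh_{i:02d}" for i in range(12)]
for wave, members in sorted(phased_rollout(households).items()):
    print(f"wave {wave}: {members}")
```

The design choice worth noting: randomizing the *timing* rather than the *receipt* of the package preserves the big-push spirit (everyone gets the full package) while still generating experimental comparisons along the way.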