Three years ago I gave a talk at DFID called Impact Evaluation 2.0.
At the time my blog had a readership of about one (thanks, Mom!) and I never expected the talk would get around much, which is why I gave it such an arrogant title.
To my enduring surprise, some people actually read it. I have seen it circulated at conferences on impact evaluation, and to my horror someone was actually going to quote it in an academic article. (I persuaded them otherwise.)
The talk is in need of some serious revision. A paper is in the works. That probably means 2012.
My point in 2008: how impact evaluations could better serve the needs of policymakers and accelerate learning.
Frankly, the benefits of the simple randomized control trial have been (in my opinion) overestimated. But with the right design and approach, they hold even more potential than has been promised or realized.
I’ve learned this the hard way.
Many of my colleagues have more experience than I do, and have learned these lessons already. What I have to say is not new to them. But I don’t think the lessons are widely recognized just yet.
So, when asked to speak to DFID again yesterday (to a conference on evaluating governance programs), I decided to update a little. They had read my 2.0 musings, and so the talk was an attempt to draw out what more I’ve learned in the three years since.
The short answer: policymakers and donors — don’t do M&E, do R&D. It’s not about the method. Randomized trials are a means to an end. Use them, but wisely and strategically. And don’t outsource your learning agenda to academics.
Probably I should actually wait until I’ve published my field experiments before I shoot my mouth off any further. But as readers of this blog may know: why stop now?
So, I bring you slides and (abbreviated) speaking notes for… “Impact evaluation 3.0?”
Comments and criticisms welcome.
Update: Incidentally, I should also point people to others singing a similar (but more harmonious) tune:
- Here is a must-read paper on mechanism experiments, by Ludwig, Kling, and Mullainathan.
- Here is one by Duflo and Banerjee, and one by Duflo and Kremer.
11 Responses
I really appreciate this presentation too, but I think the ideas are being stifled by the Impact Evaluation framing. What you are really talking about is Knowledge Accumulation 1.0 with all the difficult practical and epistemological issues that raises. What theories inform our inquiry, what do we need knowledge for, what counts as knowledge, how much evidence do we need to act, etc? I think it’s time we approached these issues directly as opposed to coming at them tangentially through the RCT pro v. con debate.
Okay, so I love your presentation and find it reassuring that DFID are hearing these messages from thoughtful academics. (I work with them in a minor role, and it’s always nice to know I’m not the only one trying to help them think through these issues.)
But I can’t help wondering whether we really mean that ‘why it works is more important than whether it works’. Don’t we mean that we need them both, and that ‘why’ can be easier to explore, so it’s a good place to start? And perhaps that knowing whether it works isn’t good enough on its own. But I’m fairly sure (and still thinking about this, so not feeling dogmatic about it) that ‘why it works’ also isn’t good enough on its own.
I remember running training on impact evaluation in southern Africa in 2001 with a DFID member of staff amongst the trainees. He challenged me on why large scale impact evaluation was relevant when he had anecdotal evidence of a sex-education programme working in a school. I asked him if that was enough evidence to scale the programme up to a multi-million pound international scale and he immediately acknowledged the need for rigorous impact evaluation first. DFID does have big budgets and, like many funders, makes large scale international-level decisions about what to do. Given the way that they and many others operate, I would push evaluation of whether AND research into why interventions do or don’t work.
Point 3, that why it works is more important than whether it works, really made me think of a recent French evaluation of anonymous resumes (by CREST).
So the theory of change failed. The anonymous resume didn’t increase the likelihood of Ahmed being hired. It was bad without the anonymous resume, but worse with it. Cf http://www.crest.fr/images/CVanonyme/rapport.pdf
The anonymous resume legislation was killed. http://www.lemonde.fr/societe/article/2011/08/17/le-cv-anonyme-ne-sera-finalement-pas-generalise_1560612_3224.html
But the big question is why? How can the anonymous resume fail against what seems like quite obvious discrimination? The study simply gives no answer.
Super presentation, Chris. Thanks for sharing. You’ve put into words many of my own concerns – and come up with some thoughtful suggestions for how aid agencies can get more bang for their evaluation buck.
The point about project evaluations often generating little useful knowledge is extremely important. And you’ve helpfully elaborated on what causes that.
So very often the lack of generalizability is swept under the rug – and new projects are built along the same lines because X intervention was shown to “work” – without reference to what the intervention truly consisted of, or how the intervention influenced the local reality in order to generate the desired change.
Instead of spending hundreds of thousands of dollars to refine and test the impact, wouldn’t it be better to just give the money as a grant to the poorest of the poor?
Interesting.
I am sorry for the ignorance, but are there some RCTs that have been replicated in different regions of the world, and with what results? Or does that remain the next step?
Greetings.
Ongoing in many areas (conditional cash transfers, savings, microfinance, pro-poor targeting, elections) with some early reviews out. I don’t have the links handy, unfortunately.
I think what you have is a good description of how impact evaluations can better serve policymakers/donors’ needs to learn generalizable lessons. (I especially like the point about doing R&D, not M&E.) However, I think there’s another purpose for impact evaluations, which is to allow donors to hold implementing organizations accountable for good implementation. And for this purpose, I don’t think your recommendations necessarily apply.
I wonder if you agree, and if so, whether it would be worth making that distinction in the presentation.
I suspect the average impact evaluation takes years, bright academics or researchers, and hundreds of thousands of dollars, if not millions. And it only makes projects accountable to donors, not to the people who receive aid. And that accountability is very narrow, defined by a few measures.
A terrible approach to accountability?
Hm. Perhaps you are using the term “impact evaluation” more restrictively than I do. I think you can evaluate a project’s impact in cheaper ways that provide accountability to the donor and the community (e.g. through participatory methods). It won’t approach the scientific certainty promised by RCTs or similar methods, but it can still be rigorous under a different set of standards.
I like the Microsoft versus Google analogy. And also the focus on testing the theoretical assumptions behind programs versus just testing the method of implementation and whether it works.
I applied for the project associate/coordinator position with STYL in Monrovia and wasn’t selected, but I’m very excited to see the results of this study. This has been one of the better examples of using appropriate methodology to test worthwhile theory that I’ve come across.