The new must-read paper for field experimenters

The vast majority of randomized experiments in economics rely on a single baseline and single follow-up survey. If multiple follow-ups are conducted, the reason is typically to examine the trajectory of impacts over time, so that in effect only one follow-up round is being used to estimate each treatment effect of interest.

While such a design is suitable for the study of highly autocorrelated and relatively precisely measured outcomes, such as those common in health and education, this article makes the case that it is unlikely to be optimal for measuring noisy and less autocorrelated outcomes such as business profits, household incomes and expenditures, and episodic health outcomes.

Taking multiple measurements of such outcomes at relatively short intervals allows one to average out noise, increasing power. When the outcomes have low autocorrelation, it can make sense to do no baseline at all. Moreover, I show how for such outcomes, more power can be achieved with multiple follow-ups than by allocating the same total sample size to a single baseline and a single follow-up.

From a new working paper from David McKenzie.
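A back-of-the-envelope way to see the power argument (my sketch, not from the paper's replication files): normalize the outcome variance to 1 and let rho be the autocorrelation between survey rounds. With one baseline and one follow-up, the standard ANCOVA adjustment leaves per-unit residual variance 1 − rho²; with no baseline and two follow-ups averaged (equicorrelated rounds), the variance is (1 + rho)/2. The two-follow-up design wins whenever rho < 0.5, which is why the advice depends on how autocorrelated the outcome is.

```python
# Sketch comparing two designs that each use two survey rounds per unit,
# as a function of the between-round autocorrelation rho of the outcome.
# (Outcome variance normalized to 1; equicorrelated rounds assumed.)

def ancova_var(rho: float) -> float:
    """Per-unit residual variance with one baseline + one follow-up, ANCOVA."""
    return 1.0 - rho ** 2

def avg_followups_var(rho: float, t: int = 2) -> float:
    """Variance of the mean of t equicorrelated follow-ups, no baseline."""
    return (1.0 + (t - 1) * rho) / t

for rho in (0.1, 0.3, 0.5, 0.7, 0.9):
    a, b = ancova_var(rho), avg_followups_var(rho)
    winner = "2 follow-ups" if b < a else ("baseline+ANCOVA" if a < b else "tie")
    print(f"rho={rho:.1f}  ANCOVA var={a:.3f}  2-follow-up var={b:.3f}  -> {winner}")
```

At rho = 0.5 the two designs tie exactly; for noisier, less persistent outcomes like profits (low rho), the baseline-free design delivers more power from the same number of interviews.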

My two cents: the greatest value of baseline data is that it enables heterogeneity analysis, which helps you test theory and generalizability. See David’s paper on returns to capital, with de Mel and Woodruff, for a great example.

The counterpoint: If you’re the World Bank and you’re running a gazillion government program experiments and you’re pretty sure a majority are going to tank, then you can save a lot on baselines that go nowhere.

My counter-counterpoint: Holy crap. You’re running a gazillion experiments and a huge number fail? This sounds like ingredient number one in a recipe for publication bias.

Time to adopt the CONSORT system?

(Sorry World Bank friends. Please don’t take away my research funding. Think of it as only troubling the ones we love…)

2 thoughts on “The new must-read paper for field experimenters”

  1. This is indeed a very nice paper. I also worry that some researchers may take the advice on ‘baseline or not’ too far: baselines are useful for a variety of analyses, not just power (as David mentions in the concluding section). However, they are also expensive, and the temptation to skip them may be irresistible to many.

    Perhaps carefully designed light baseline surveys that collect some basic information (especially including variables with high autocorrelation) are an acceptable compromise…

  2. Thanks guys.
    Part of the message is that it is often better to do 2 follow-ups and 1 baseline with a smaller n than to do one follow-up and one baseline.

    In terms of publication bias and experiments failing – as Chris well knows, there are many, many ways a nice experiment ends up not getting implemented – a change of government means the program you planned to evaluate gets scrapped, for example. Clearly you can’t evaluate what doesn’t get done (except from a political economy point of view, perhaps).
    But it would be nice to have some database of failed experiments to refer to and learn from – so I look forward to the day we finally move beyond talk of this to action.