The vast majority of randomized experiments in economics rely on a single baseline and a single follow-up survey. When multiple follow-ups are conducted, the reason is typically to examine the trajectory of impacts, so that in effect only one follow-up round is used to estimate each treatment effect of interest.
While such a design is suitable for studying highly autocorrelated and relatively precisely measured outcomes in the health and education domains, this article makes the case that it is unlikely to be optimal for measuring noisy and less autocorrelated outcomes such as business profits, household incomes and expenditures, and episodic health outcomes.
Taking multiple measurements of such outcomes at relatively short intervals allows one to average out noise, increasing power. When the outcomes have low autocorrelation, it can make sense to do no baseline at all. Moreover, I show that for such outcomes, more power can be achieved with multiple follow-ups than by allocating the same total number of surveys to a single baseline and a single follow-up.
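The intuition can be sketched with textbook variance formulas. This is a rough back-of-the-envelope calculation, not taken from the paper itself: I assume an equicorrelated outcome with cross-round autocorrelation rho, a fixed budget of two survey rounds per unit, and the standard ANCOVA result that controlling for a baseline shrinks the outcome variance by a factor of (1 − rho²). The function names are mine.

```python
# Compare two designs with the same survey budget (2 rounds per unit):
#   Design A: one baseline + one follow-up, ANCOVA (control for baseline).
#   Design B: no baseline, average two follow-up measurements.
# rho = autocorrelation of the outcome across rounds (assumed equicorrelated).

def ancova_var(rho, sigma2=1.0):
    # Residual outcome variance after controlling for one baseline round.
    return sigma2 * (1 - rho**2)

def mean_followups_var(rho, T, sigma2=1.0):
    # Variance of the mean of T equicorrelated follow-up rounds.
    return sigma2 * (1 + (T - 1) * rho) / T

for rho in (0.2, 0.5, 0.8):
    a = ancova_var(rho)
    b = mean_followups_var(rho, T=2)
    print(f"rho={rho}: baseline+follow-up {a:.2f} vs two follow-ups {b:.2f}")
```

Under these assumptions the two designs tie exactly at rho = 0.5 (both give 0.75·sigma²), and two follow-ups deliver the lower variance, hence more power, whenever rho < 0.5. That matches the argument that baselines earn their keep only for highly autocorrelated outcomes.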
From a new working paper by David McKenzie.
My two cents: the greatest value of baseline data is that it enables heterogeneity analysis, which helps you test theory and generalizability. See David's paper on returns to capital, with de Mel and Woodruff, for a great example.
The counterpoint: If you’re the World Bank and you’re running a gazillion government program experiments and you’re pretty sure a majority are going to tank, then you can save a lot on baselines that go nowhere.
My counter-counterpoint: Holy crap. You’re running a gazillion experiments and a huge number fail? This sounds like ingredient number one in a recipe for publication bias.
Time to adopt the CONSORT system?
(Sorry World Bank friends. Please don’t take away my research funding. Think of it as only troubling the ones we love…)