After years of working on program evaluation and related things, it is with great joy that I toss causation out the window and learn to data mine.
A few years ago, a foundation said to me, “Hey, all that data you’re collecting to study property disputes and other violence in Liberia–could you use it to test early warning systems for riots and major crimes?” My reaction: “That sounds crazy. As if that’s possible.” Their response: “We will fund your survey if you try.” My reply: “Did I say crazy? I meant that sounds like a great idea.”
After six years of data collection, Rob Blair and Alex Hartman and I finally have a paper:
We use forecasting models and new data from 242 Liberian communities to show that it is possible to predict outbreaks of local violence with high sensitivity and moderate accuracy, even with limited data.
We train our models to predict communal and criminal violence in 2010 using risk factors measured in 2008. We compare predictions to actual violence in 2012 and find that up to 88% of all violence is correctly predicted. True positives come at the cost of many false positives, giving overall accuracy between 33% and 50%.
From a policy perspective, states, international organizations, and peacekeepers could use such predictions to better prevent and respond to violence. The models also generate new stylized facts for theory to explain.
In this instance, the strongest predictors of more violence are social (mainly ethnic) cleavages and minority group power-sharing.
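A quick aside on how "high sensitivity" and "accuracy between 33% and 50%" can both be true. The sketch below computes the two measures from a confusion matrix. The counts are invented for illustration (they are not the paper's numbers), chosen only so that they sum to 242 communities and land near the figures quoted above; the function name is likewise hypothetical.

```python
def sensitivity_and_accuracy(tp, fp, tn, fn):
    """Return (sensitivity, accuracy) from confusion-matrix counts."""
    # Sensitivity: share of communities that actually had violence and were flagged.
    sensitivity = tp / (tp + fn)
    # Accuracy: share of all communities classified correctly, violent or not.
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, accuracy


# Hypothetical counts for 242 communities (illustrative only, not from the paper).
tp, fn = 22, 3     # violent communities: flagged vs. missed
fp, tn = 120, 97   # peaceful communities: wrongly flagged vs. correctly cleared

sens, acc = sensitivity_and_accuracy(tp, fp, tn, fn)
print(f"sensitivity = {sens:.2f}, accuracy = {acc:.2f}")
# prints: sensitivity = 0.88, accuracy = 0.49
```

Because violence is rare, a model that flags many communities catches most of the real outbreaks but is wrong about a large share of the peaceful ones, which drags overall accuracy down.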
This is not precisely “big data” in that it’s a small number of villages and three years of events. But it’s “big” in the sense of having lots and lots of detailed information about the villages themselves, which is rare. We think of this as a pilot, or proof of concept for the approach, and plan to test it next on much bigger data from other countries.
The most interesting finding, to me, was how power-sharing at the local level was associated with more violence. There are actually a number of papers looking at national power-sharing right now that find the same thing. And yet the common political response to a crisis nowadays is to push for power-sharing. Worth investigating.
I would have liked to name this paper “I just ran 32 million regressions,” but besides other drawbacks, the more honest title would be “My RA just ran 32 million regressions,” which is slightly less compelling.