Can we use data and machine learning to predict local violence in fragile states? As it turns out, yes.

After years of working on program evaluation and related things, it is with great joy that I toss causation out the window and learn to data mine.

A few years ago, a foundation said to me, “hey, all that data you’re collecting to study property disputes and other violence in Liberia–could you use it to test early warning systems for riots and major crimes?” My reaction: “That sounds crazy. As if that’s possible.” Their response: “We will fund your survey if you try.” My reply: “Did I say crazy? I meant that sounds like a great idea.”

After six years of data collection, Rob Blair and Alex Hartman and I finally have a paper:

We use forecasting models and new data from 242 Liberian communities to show that it is possible to predict outbreaks of local violence with high sensitivity and moderate accuracy, even with limited data.

We train our models to predict communal and criminal violence in 2010 using risk factors measured in 2008. We compare predictions to actual violence in 2012 and find that up to 88% of all violence is correctly predicted. True positives come at the cost of many false positives, giving overall accuracy between 33% and 50%.
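
To see how high sensitivity and modest accuracy can coexist, here is a minimal Python sketch. The counts are invented for illustration and are not from the paper, and it assumes "accuracy" means the share of all communities classified correctly, which may not match the paper's exact definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of communities by predicted vs. actual violence."""
    true_pos: int   # violence predicted, violence occurred
    false_pos: int  # violence predicted, none occurred
    false_neg: int  # no violence predicted, violence occurred
    true_neg: int   # no violence predicted, none occurred


def sensitivity(c: ConfusionCounts) -> float:
    """Share of actually violent communities that the model flagged."""
    return c.true_pos / (c.true_pos + c.false_neg)


def accuracy(c: ConfusionCounts) -> float:
    """Share of all communities classified correctly."""
    total = c.true_pos + c.false_pos + c.false_neg + c.true_neg
    return (c.true_pos + c.true_neg) / total


# Hypothetical counts for 100 communities, NOT taken from the paper:
# violence is rare (10 communities) and the model flags 60 as at risk.
example = ConfusionCounts(true_pos=9, false_pos=51, false_neg=1, true_neg=39)

print(f"sensitivity: {sensitivity(example):.0%}")
print(f"accuracy:    {accuracy(example):.0%}")
```

With these made-up numbers the model catches 9 of the 10 violent communities, yet fewer than half of all communities are classified correctly, because 51 peaceful communities are flagged as well.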

From a policy perspective, states, international organizations, and peacekeepers could use such predictions to better prevent and respond to violence. The models also generate new stylized facts for theory to explain.

In this instance, the strongest predictors of more violence are social (mainly ethnic) cleavages and minority group power-sharing.

This is not precisely “big data” in that it’s a small number of villages and three years of events. But it’s “big” in the sense of having lots and lots of detailed information about the villages themselves, which is rare. We think of this as a pilot, or proof of concept for the approach, and plan to test it next on much bigger data from other countries.

The most interesting finding, to me, was how power-sharing at the local level was associated with more violence. There are actually a number of papers looking at national power-sharing right now that find the same thing. And yet the common political response to a crisis nowadays is to push for power-sharing. Worth investigating.

I would have liked to name this paper “I just ran 32 million regressions,” but besides other drawbacks, the more honest title would be “My RA just ran 32 million regressions,” which is slightly less compelling.

58 thoughts on “Can we use data and machine learning to predict local violence in fragile states? As it turns out, yes.”

  1. Just in time. Now we can turn to the deleterious impact of Ebola on the economic fabric of the country.

  2. Dear Chris, I was wondering which language(s) and package(s) you used to analyze your data.

  3. Interesting article. Couple of questions:
    1) Why say that the models apart from the neural network are not interactive or non-linear? Random forest is highly interactive and non-linear.
    2) Did you experiment with the number of trees in the random forest or with depth?
    3) You might want to also mention that neural networks are more prone to overfitting…
    4) Why lasso instead of elastic-net?

  4. @Luke and @RDub2:

    Thanks. @RDub2: We’re relative novices. We know the limitations of the statistics we use but aren’t sure what else we can add to fill out the picture. Suggestions?

    @Luke:
    1) You are right and we clarified what we meant in the new version.
    2) The robustness tables at the back show both. We’ve done more robustness checks. Mostly the results are stable, though least of all with RF if I recall.
    3) Would like more detail. I’m not familiar enough with NN to think about how this would be a worry with k-fold cross-validation and the forward-looking forecast.
    4) No good reason other than the simplicity of lasso appealed to us and we had to limit our models somehow.