After years of working on program evaluation and related things, it is with great joy that I toss causation out the window and learn to data mine.
A few years ago, a foundation said to me, “hey, all that data you’re collecting to study property disputes and other violence in Liberia–could you use it to test early warning systems for riots and major crimes?” My reaction: “That sounds crazy. As if that’s possible.” Their response: “We will fund your survey if you try.” My reply: “Did I say crazy? I meant that sounds like a great idea.”
After six years of data collection, Rob Blair and Alex Hartman and I finally have a paper:
We use forecasting models and new data from 242 Liberian communities to show that it is possible to predict outbreaks of local violence with high sensitivity and moderate accuracy, even with limited data.
We train our models to predict communal and criminal violence in 2010 using risk factors measured in 2008. We compare predictions to actual violence in 2012 and find that up to 88% of all violence is correctly predicted. True positives come at the cost of many false positives, giving overall accuracy between 33% and 50%.
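To make the gap between “sensitivity” and “accuracy” concrete, here is a minimal Python sketch that computes both from the four cells of a confusion matrix. The counts are hypothetical, chosen only for illustration, and are not the paper’s results. The sketch uses the textbook definition of accuracy (correct classifications over all communities) and also reports precision, since the paper’s exact definition of “overall accuracy” may differ.

```python
def classification_rates(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard rates from the four cells of a binary confusion matrix."""
    total = tp + fp + fn + tn
    return {
        # Share of actually violent communities that the model flagged.
        "sensitivity": tp / (tp + fn),
        # Share of flagged communities that actually turned violent.
        "precision": tp / (tp + fp),
        # Share of all communities classified correctly.
        "accuracy": (tp + tn) / total,
    }


# Hypothetical counts for 100 communities, 16 of which see violence.
# A cautious model flags 60 communities and catches 14 of the 16.
rates = classification_rates(tp=14, fp=46, fn=2, tn=38)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

Flagging more communities raises sensitivity but adds false positives, which pulls precision and accuracy down: the same trade-off the abstract describes.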
From a policy perspective, states, international organizations, and peacekeepers could use such predictions to better prevent and respond to violence. The models also generate new stylized facts for theory to explain.
In this instance, the strongest predictors of more violence are social (mainly ethnic) cleavages and minority group power-sharing.
This is not precisely “big data” in that it’s a small number of villages and three years of events. But it’s “big” in the sense of having lots and lots of detailed information about the villages themselves, which is rare. We think of this as a pilot, or proof of concept for the approach, and plan to test it next on much bigger data from other countries.
The most interesting finding, to me, was how power-sharing at the local level was associated with more violence. There are actually a number of papers looking at national power-sharing right now that find the same thing. And yet the common political response to a crisis nowadays is to push for power-sharing. Worth investigating.
I would have liked to name this paper “I just ran 32 million regressions,” but besides other drawbacks, the more honest title would be “My RA just ran 32 million regressions,” which is slightly less compelling.
58 Responses
@Luke and @RDub2:
Thanks. @RDub2: We’re relative novices. We know the limitations of the statistics we use but aren’t sure what else we could add to fill out the picture. Suggestions?
@Luke:
1) You are right and we clarified what we meant in the new version.
2) The robustness tables at the back show both. We’ve done more robustness checks. Mostly the results are stable, though least so with RF, if I recall.
3) I’d like more detail. I’m not familiar enough with NN to see why this would be a worry given k-fold cross-validation and the forward-looking forecast.
4) No good reason other than the simplicity of lasso appealed to us and we had to limit our models somehow.
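For readers unfamiliar with the two validation schemes mentioned in this exchange: k-fold cross-validation holds out each of k disjoint subsets of communities in turn, while a forward-looking forecast trains on an earlier period and is judged against a later one. The function below is a generic stdlib sketch that only generates k-fold index splits; it is not the authors’ code, and the fold count (5) and seed are arbitrary assumptions.

```python
import random


def k_fold_indices(n: int, k: int, seed: int = 0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Observations are shuffled once, then split into k nearly equal folds;
    each fold serves as the test set exactly once.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # The first n % k folds get one extra observation.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# 242 communities split into 5 folds (the fold count is an arbitrary choice).
for fold, (train, test) in enumerate(k_fold_indices(242, 5)):
    print(f"fold {fold}: {len(train)} train, {len(test)} test")
```

Each community is predicted by a model that never saw it, so k-fold estimates out-of-sample error within one period; comparing against a later year additionally tests whether relationships learned earlier still hold.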
As I’m sure you are aware, true positives/negatives are not always the best way to measure a model’s accuracy, and they are highly dependent on category size. Thanks for posting.
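The point about category size can be seen with a trivial baseline. In this sketch the base rate (10 violent communities out of 100) and both prediction vectors are hypothetical, chosen only for illustration: a “model” that never predicts violence scores higher raw accuracy than one that catches every violent case.

```python
def accuracy_and_sensitivity(actual: list[bool], predicted: list[bool]) -> tuple[float, float]:
    """Return (accuracy, sensitivity) for binary labels, True = violence."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    positives = sum(actual)
    true_pos = sum(a and p for a, p in zip(actual, predicted))
    return correct / len(actual), true_pos / positives


# Hypothetical sample: 10 of 100 communities experience violence.
actual = [True] * 10 + [False] * 90

# Baseline that always predicts "no violence".
never = [False] * 100
print(accuracy_and_sensitivity(actual, never))  # (0.9, 0.0)

# Model that flags 40 communities, including all 10 violent ones.
flags = [True] * 40 + [False] * 60
print(accuracy_and_sensitivity(actual, flags))  # (0.7, 1.0)
```

When violence is rare, raw accuracy rewards predicting “nothing happens,” which is why sensitivity (ideally alongside precision) is the more informative measure for an early warning system.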
Interesting article. Couple of questions:
1) Why say that the models other than the neural network are not interactive or non-linear? Random forest is highly interactive and non-linear.
2) Did you experiment with the number of trees in the random forest or with depth?
3) You might want to also mention that neural networks are more prone to overfitting…
4) Why lasso instead of elastic-net?
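For reference, the lasso vs. elastic-net question turns on the penalty term. Written for a generic loss L(β) over p coefficients (the notation is ours, not the paper’s), lasso adds an ℓ1 penalty with strength λ, while elastic net mixes ℓ1 and squared ℓ2 penalties with a weight α in [0, 1]:

```latex
\hat{\beta}^{\text{lasso}} = \arg\min_{\beta}\; L(\beta) + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert

\hat{\beta}^{\text{EN}} = \arg\min_{\beta}\; L(\beta) + \lambda \sum_{j=1}^{p} \Big( \alpha \lvert \beta_j \rvert + \tfrac{1-\alpha}{2}\, \beta_j^{2} \Big)
```

Setting α = 1 recovers the lasso and α = 0 gives ridge regression. The ℓ1 part sets some coefficients exactly to zero; the ℓ2 part tends to spread weight across groups of correlated predictors, where the lasso often keeps one and drops the rest.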
Dear Chris, I was wondering which language(s) and package(s) you used to analyze your data.
RT @cblatts: Can we use data and machine learning to predict local violence in fragile states? As it turns out, yes. http://t.co/43AfWlrdz8
RT @logtrust: Can we use #data and #machinelearning to predict local violence in fragile states? As it turns out, yes. http://t.co/FjRFp46y…
Can we use #data and #machinelearning to predict local violence in fragile states? As it turns out, yes. http://t.co/FjRFp46ye7 #tech
Can we use data and machine learning to predict local violence in fragile states? As it turns out, yes http://t.co/d17Xc7ZbDL
“Can we use data and machine learning to predict local violence in fragile states? As it turns out, yes.” http://t.co/JqMlHZOrhh
Is this a secret battle between you and Sala-i-Martin at Columbia?
http://www.jstor.org/discover/10.2307/2950909?uid=3739832&uid=2&uid=4&uid=3739256&sid=21104277908541
http://www.nber.org/papers/w6252
RT @zajacsannerholm: Can we use data and machine learning to predict local violence in fragile states? As it turns out, yes. http://t.co/x…
Can we use data and machine learning to predict local violence in fragile states? As it turns out, yes. http://t.co/xxOZRUVfHu
RT @treycausey: Can we use data and machine learning to predict local violence in fragile states? As it turns out, yes. http://t.co/bN1iVy…
Can we use data and machine learning to predict local violence in fragile states? As it turns out, yes http://t.co/OAM3zSHQRu
RT @SIMLab: Data and machine learning can predict local violence in fragile states. http://t.co/AAOGAN4KD8 #EWS #ICT
Can we use data and machine learning to predict local violence in fragile states? As it turns out, yes. http://t.co/7gdDvAhiNE
Can we use data and machine learning to predict local violence in fragile states? As it turns out, yes. http://t.co/DyTSjSOiTS
RT @BottmUpThinking: Another tech leapfrog moment for #globaldev? Is @cblatts helping make #minorityreport a reality in Liberia? http://t.c…
Data and machine learning can predict local violence in fragile states. http://t.co/AAOGAN4KD8 #EWS #ICT
RT @Markus_Ellmer: „Can we use data and machine learning to predict local violence in fragile states? As it turns out, yes.“ @cblatts http…
„Can we use data and machine learning to predict local violence in fragile states? As it turns out, yes.“ @cblatts http://t.co/HcysmeXWkb
Just in time. Now we can turn to the deleterious impact of Ebola on the economic fabric of the country.
RT @jeneambrose: In which @cblatts runs 32 million regressions & determines it’s possible to predict outbreaks of local violence. http://t.…
Can we use data and machine learning to predict local violence in fragile states? As it turns out, yes http://t.co/Squ5g3DIYO
Can we use #data and machine learning to predict local #violence in #fragile states? As it turns out, yes. http://t.co/TBaOHZ9wqd
RT @nancymbirdsall: Yup. Consider new “national unity” govt in Afghanistan RT @cblatts: Machine learning predicts violence in Liberia http:…
RT @adrianflorea13: Can we use data and machine learning to predict local violence in fragile states? As it turns out, yes. http://t.co/XR…
RT @cblatts: Machine learning predicts violence in Liberia http://t.co/a3nphOdwWz
@cblatts This was great–it’s rare to see data on a “peaceful” situation. I’d love to see a mixed-methods study on the power-sharing bit.