Popper on Collier

Bill Easterly took aim today at Paul Collier’s new book, Wars, Guns and Votes. The accusation? Data mining. In short: if you run enough regressions, you’ll find the answer you were looking for from the start.

My informed opinion on Collier’s book is going to have to wait until that elusive day when I have time to finish reading it. (So far no luck, in spite of the end of semester.) I know the literature on which it’s based, though, having just written a behemoth of a civil war lit review.

My own view: the cross-country regression is often a spurious thing. Sometimes incredibly useful, but just as often misleading.

When does statistical work lead us astray? For that we could do worse than to take a second look at Karl Popper. In a famous 1957 lecture, he gave us seven guidelines for scientific inquiry:

  1. It is easy to obtain confirmations, or verifications, for nearly every theory–if we look for confirmations.

  2. Confirmations should count only if they are the result of risky predictions; that is to say, if, unenlightened by the theory in question, we should have expected an event which was incompatible with the theory–an event which would have refuted the theory.

  3. Every ‘good’ scientific theory is a prohibition: it forbids certain things to happen. The more a theory forbids, the better it is.

  4. A theory which is not refutable by any conceivable event is nonscientific. Irrefutability is not a virtue of a theory (as people often think) but a vice.

  5. Every genuine test of a theory is an attempt to falsify it, or to refute it. Testability is falsifiability; but there are degrees of testability: some theories are more testable, more exposed to refutation, than others; they take, as it were, greater risks.

  6. Confirming evidence should not count except when it is the result of a genuine test of the theory; and this means that it can be presented as a serious but unsuccessful attempt to falsify the theory.

  7. Some genuinely testable theories, when found to be false, are still upheld by their admirers–for example by introducing ad hoc some auxiliary assumption, or by re-interpreting the theory ad hoc in such a way that it escapes refutation.

Point number one warns us against data mining. The rest give us a sense of how research should be done.
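
To make the data mining worry concrete, here is a minimal sketch of my own (nothing from Collier or Easterly). It assumes 50 hypothetical "countries", an outcome that is pure noise, and 100 candidate explanatory variables that are also pure noise. The function names, the sample sizes, and the critical value of 2.01 are all illustrative assumptions. It runs one bivariate regression per candidate and counts how many clear the conventional 5% bar.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def count_spurious_hits(n_countries=50, n_regressors=100, t_crit=2.01, seed=0):
    """Count pure-noise regressors that look 'significant' at roughly 5%.

    Assumption: t_crit=2.01 approximates the two-sided 5% critical value
    for 48 degrees of freedom, so it only matches n_countries=50.
    """
    rng = random.Random(seed)
    # The "outcome" (say, conflict risk) is noise by construction.
    outcome = [rng.gauss(0, 1) for _ in range(n_countries)]
    hits = 0
    for _ in range(n_regressors):
        # Each candidate "cause" is independent noise as well.
        regressor = [rng.gauss(0, 1) for _ in range(n_countries)]
        r = pearson_r(regressor, outcome)
        # t-statistic for the slope in a bivariate OLS regression.
        t_stat = r * math.sqrt((n_countries - 2) / (1 - r * r))
        if abs(t_stat) > t_crit:
            hits += 1
    return hits


if __name__ == "__main__":
    print(count_spurious_hits())
```

Nothing here is related to anything else, yet on average about 5 of every 100 candidates should pass the test by chance alone (the exact count depends on the seed). Report only those, and you have a "finding".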

Now, almost no cross-country study meets these strictures. That would probably be too much to ask; risky, refutable predictions are hard to come by, and the questions matter too much to ignore. When we have only confirming evidence and weak tests, we don’t ignore the evidence; we just treat it with caution.

This is where most of the literature fails: overconfidence in weak tests on poor data. I actually buy most of Collier’s conclusions, and share his intuition. But this is theory-building, not theory-proving, and the answer remains to be found.

3 thoughts on “Popper on Collier”

  1. Hi,

    I do cross-country regressions myself (actually, time-series cross-country) and was wondering: could you give us some suggestions (references) on why the “cross-country regression is often a spurious thing”?

    I mean, what is so different about a cross-country regression compared with a time-series regression, a survey-based regression, or whatever? Is it the data (some countries have poor record-keeping practices and their data are bad) or something else?

    Thanks in advance

  2. Manoel, especially in the civil war/development literature, “endogeneity” is the biggest problem with cross-country regression. Relationships between two variables such as “wealth” and “instability” can support a causal story in either direction (e.g. “more instability scares off investors and thus lowers growth” is just as plausible as “low wealth makes people discontented and more likely to engage in illegal activities, and thus increases instability”). Moreover, these variables may not even be related to each other, but instead be driven by a third factor that produces these patterns separately (e.g. “culture”). In panel data it is easier to separate out these competing explanations, although it is not always possible.

  3. For one thing, countries have locations and histories, which makes for complicated dependencies in their error terms.