What happens when a very good political science journal checks the statistical code of its submissions?

The Quarterly Journal of Political Science has a policy of in-house review of all statistical code. This is expensive and hard to do. Is it worth it?

According to Nick Eubank, an unambiguous yes:

Of the 24 empirical papers subject to in-house replication review since September 2012, only 4 packages required no modifications.

Of the remaining 20 papers, 13 had code that would not execute without errors, 8 failed to include code for results that appeared in the paper, and 7 failed to include installation directions for software dependencies.

13 (54 percent) had results in the paper that differed from those generated by the author’s own code. Some of these issues were relatively small — likely arising from rounding errors during transcription — but in other cases they involved incorrectly signed or mis-labeled regression coefficients, large errors in observation counts, and incorrect summary statistics.

You might think to yourself, “that would never happen in the very top journals,” or “that’s less likely among statisticians and economists.” While I think expertise might reduce errors, I’m not so sure. More senior people with more publications tend to rely more on research assistants. And, speaking from personal experience, I’ve found major problems in important people’s code before.
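The distinction Eubank draws above, between transcription rounding errors and substantive problems like incorrectly signed coefficients, can be checked mechanically. A minimal sketch of such a check (the function and tolerances are hypothetical, not part of QJPS's actual review process):

```python
import math

def classify_discrepancy(reported, computed, rel_tol=5e-3, digits=2):
    """Classify a mismatch between a value reported in a paper and the
    value produced by the replication code. Hypothetical helper; the
    tolerance and rounding precision are illustrative assumptions."""
    if math.isclose(reported, computed, rel_tol=rel_tol):
        return "match"
    # Opposite signs suggest an incorrectly signed coefficient,
    # not a transcription error.
    if reported * computed < 0:
        return "sign error"
    # Agreement after rounding to the reported precision suggests
    # a rounding/transcription issue.
    if round(computed, digits) == round(reported, digits):
        return "rounding"
    return "substantive difference"

print(classify_discrepancy(0.413, 0.4129))   # match
print(classify_discrepancy(-0.41, 0.4129))   # sign error
print(classify_discrepancy(0.41, 0.4129))    # rounding
```

Running every reported number through a check like this would separate the "relatively small" discrepancies from the ones that change a paper's conclusions.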

158 thoughts on “What happens when a very good political science journal checks the statistical code of its submissions?”

  1. Need of the hour: journals should create an escrow-style facility specifically for depositing the results and code of empirical submissions given provisional acceptance, for in-house replication.

  2. It absolutely happens among economists. “Replication in Empirical Economics” by Dewald, Thursby, and Anderson showed that (at least in 1986) this was true. One author even tried to help them get his regressions to work, and working together they still couldn’t do it. Data was also found to have been badly corrupted by transcription errors, and about 2/3 of the people who were “required” to provide the data they used failed to do so on repeated request.

    It’s a really eye-opening and disheartening read.

  3. Wouldn’t it be amazing if replication code and data (appropriately anonymized and secured) were made available to reviewers? Reviewers currently rely on some combination of trust and intuition to determine what analyses were actually executed and reported. This creates at least two problems: on the one hand, reviewers suggest bad “fixes” for problems that don’t exist; on the other, they fail to catch major errors!

  4. Ouch. A very good idea would be for folks to run their replication programs by a “devil’s advocate” RA or faculty member before submission, who specifically seeks out errors; maybe promise a free dinner for every mistake they find in the final paper? When I worked in econ consulting, we had to turn over replication programs for every number, figure, and table in our expert reports to the opponents’ expert, and they would set their RAs to work night and day looking for errors in our code. If the replication didn’t even run, well, I shudder to think what our bosses would have said, but even modest errors could be catastrophic for our experts’ credibility. To prevent this, we ‘double-did’ every calculation: a second analyst, working mostly in the dark, would start from the same raw data and replicate the analysis, often in a different stats package or just in Excel, then check every number for equality down to a certain number of digits. We would submit nothing to court that hadn’t been audited thus.
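The double-entry audit described in the last comment amounts to comparing two independently produced result tables number by number, within a tolerance. A minimal sketch (the function, labels, and tolerance are hypothetical, not the commenter's actual tooling):

```python
import math

def audit(primary, independent, rel_tol=1e-6):
    """Compare two independently produced result tables, each a dict
    mapping a label (e.g. a table/figure cell) to a number, and return
    the labels that disagree. Hypothetical sketch of a 'double-did'
    audit: a second analyst recomputes everything from raw data, then
    every number is checked for agreement."""
    mismatches = []
    for label, value in primary.items():
        other = independent.get(label)
        if other is None or not math.isclose(value, other, rel_tol=rel_tol):
            mismatches.append(label)
    # Numbers present only in the second analysis are also flagged.
    mismatches += [k for k in independent if k not in primary]
    return sorted(mismatches)

# Illustrative results from two independent replications of one report.
run_a = {"table1_mean": 12.345, "fig2_slope": -0.072, "n_obs": 1050.0}
run_b = {"table1_mean": 12.345, "fig2_slope": -0.027, "n_obs": 1050.0}
print(audit(run_a, run_b))  # ['fig2_slope']
```

A submission would only go out once this audit returns an empty list; the transposed digits in `fig2_slope` above are exactly the kind of transcription error the process is meant to catch.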