What happens when a very good political science journal checks the statistical code of its submissions?

The Quarterly Journal of Political Science has a policy of in-house review of all statistical code. This is expensive and hard to do. Is it worth it?

According to Nick Eubank, the answer is an unambiguous yes:

Of the 24 empirical papers subject to in-house replication review since September 2012, only 4 packages required no modifications.

Of the remaining 20 papers, 13 had code that would not execute without errors, 8 failed to include code for results that appeared in the paper, and 7 failed to include installation directions for software dependencies. (These categories overlap, since many packages had more than one problem.)
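To make those failure modes concrete, here is a minimal sketch of the kind of one-command entry point a replication package could ship with, written in Python for illustration. The script names are hypothetical, not drawn from any actual QJPS submission; the point is simply that a reviewer should be able to regenerate everything with a single command, and that errors should halt the run rather than pass silently.

    # run_all.py -- hypothetical entry point for a replication package.
    # Running this one script should regenerate every result in the paper.
    import subprocess
    import sys

    # Analysis scripts, listed in the order their results appear in the
    # paper (names are illustrative only).
    STEPS = [
        "01_clean_data.py",
        "02_main_regressions.py",
        "03_robustness_checks.py",
    ]

    def main() -> None:
        for script in STEPS:
            print(f"running {script} ...")
            # Fail loudly: a package that continues past an error can
            # silently leave stale or partial results on disk.
            result = subprocess.run([sys.executable, script])
            if result.returncode != 0:
                sys.exit(f"{script} failed with exit code {result.returncode}")
        print("all results regenerated")

    if __name__ == "__main__":
        main()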

13 (54 percent) had results in the paper that differed from those generated by the author’s own code. Some of these issues were relatively small — likely arising from rounding errors during transcription — but in other cases they involved incorrectly signed or mis-labeled regression coefficients, large errors in observation counts, and incorrect summary statistics.
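Transcription errors of this sort are easiest to avoid when the paper's tables are written directly by the analysis code rather than copied over by hand. Here is a minimal sketch of that practice in Python, using simulated data and statsmodels; the variable names and output file are illustrative assumptions, not the method of any particular paper.

    # Hypothetical illustration: write the regression table straight to the
    # file included with the manuscript, so no coefficient is hand-copied.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)          # fixed seed for reproducibility
    x = rng.normal(size=500)
    y = 2.0 * x + rng.normal(size=500)      # true slope is 2.0

    model = sm.OLS(y, sm.add_constant(x)).fit()

    # Export the summary as LaTeX; the manuscript can \input this file,
    # so the paper and the code cannot drift apart.
    with open("table1.tex", "w") as f:
        f.write(model.summary().as_latex())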

You might think to yourself, “that would never happen in the very top journals,” or “that’s less likely among statisticians and economists.” While expertise might reduce errors, I’m not so sure it does. More senior people with more publications tend to rely more heavily on research assistants. And, speaking from personal experience, I’ve found major problems in important people’s code before.