Detecting academic fraud, statistically

Fabricating data undetectably is difficult. It requires both (i) a good understanding of the phenomenon being studied (what do measures of this construct tend to look like? which variables do they correlate with, and by how much? how often, and of what type, are outliers observed?) and (ii) a good understanding of how sampling error is expected to influence the data (e.g., how much variation, and of what kind, should estimates of this construct exhibit given the observed sample size and design?).
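
To make point (ii) concrete, here is a minimal sketch (my illustration, not from the paper) of how much sample standard deviations should bounce around under honest random sampling; all parameter values below are assumptions chosen for the example:

```python
# Point (ii): under random sampling, summary statistics themselves vary by
# a predictable amount. Simulate how much sample standard deviations should
# fluctuate for a given sample size n. Parameters are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def sd_sampling_spread(pop_mean=50.0, pop_sd=10.0, n=20, reps=10_000):
    """Simulate the spread of sample SDs across `reps` random samples of size n."""
    samples = rng.normal(pop_mean, pop_sd, size=(reps, n))
    sds = samples.std(axis=1, ddof=1)
    return sds.mean(), sds.std()

mean_sd, spread = sd_sampling_spread()
print(f"Sample SDs average {mean_sd:.2f} and themselves vary with SD {spread:.2f}")
```

A faker who reports SDs that cluster far more tightly than this simulated spread allows is, in effect, forgetting that sampling error applies to the second moment too.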

This paper makes two main points. First, information already included in most psychology publications (e.g., means and standard deviations) can be analyzed in light of points (i) and (ii) above to identify likely cases of fraud. Second, the availability of raw data is an invaluable complement for verifying the presence, and identifying the precise nature, of fabrication. The paper illustrates these points through the analyses used to uncover two cases of fraud in psychology.

Both cases were first identified exclusively through the statistical analysis of means and standard deviations reported in published papers, by examining whether estimates across independent samples were too similar to each other to have originated from random samples.
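
Here is a hedged sketch of that test: given only the standard deviations reported for k independent samples, simulate how similar they should be under random sampling and ask how often chance alone produces SDs at least as similar as those reported. This is a simplified illustration of the idea, not the paper's exact procedure, and every name and number below is my own:

```python
# Detection idea from the excerpt: are the reported SDs across k independent
# samples "too similar" to be chance? Simulate the null (honest random
# sampling) and compute how often it yields SDs at least as tightly
# clustered as the reported ones. Simplified illustration, not the paper's
# exact method; inputs are made up.
import numpy as np

rng = np.random.default_rng(1)

def p_too_similar(reported_sds, n, sims=10_000):
    """P(spread of simulated sample SDs <= spread of reported SDs) under the null."""
    reported_sds = np.asarray(reported_sds, dtype=float)
    k = reported_sds.size
    observed_spread = reported_sds.std(ddof=1)
    # Null model (an assumption of this sketch): k independent normal samples
    # of size n, each drawn from a population with the pooled reported SD.
    pooled_sd = np.sqrt((reported_sds ** 2).mean())
    sim_samples = rng.normal(0.0, pooled_sd, size=(sims, k, n))
    sim_sds = sim_samples.std(axis=2, ddof=1)      # one SD per simulated sample
    sim_spreads = sim_sds.std(axis=1, ddof=1)      # how much the k SDs vary, per sim
    return (sim_spreads <= observed_spread).mean()

# Illustrative, suspiciously uniform inputs: ten reported SDs that barely vary.
print(p_too_similar([25.1, 25.2, 25.0, 25.1, 25.2, 25.1, 25.0, 25.2, 25.1, 25.1], n=15))
```

A p-value near zero says the reported SDs cluster more tightly than random sampling plausibly allows, which is the tell described in the abstract.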

Paper here.

h/t Joe McGee.

I suppose the trouble with this approach is that someone who understood the detection algorithm could falsify the data better. Then again, fakers may be fakers precisely because they are too lazy or dim to do even that. That has been my experience with student fraud, though I probably only catch the lazy and dim ones, which gives me a selected sample…