Detecting academic fraud, statistically

Fabricating data undetectably is difficult. It requires both (i) a good understanding of the phenomenon being studied (what do measures of this construct tend to look like? which variables do they correlate with, and by how much? how often are outliers observed, and of what type?), and (ii) a good understanding of how sampling error is expected to influence the data (e.g., how much variation, and of what kind, should estimates of this construct exhibit for the observed sample size and design?).

This paper makes two main points. First, information already included in most psychology publications, e.g., means and standard deviations, can be analyzed in light of points (i) and (ii) above to identify likely cases of fraud. Second, the availability of raw data is an invaluable complement for verifying the presence, and identifying the precise nature, of fabrication. The paper illustrates these points through the analyses used to uncover two cases of fraud in psychology.

Both cases were first identified exclusively through the statistical analysis of means and standard deviations reported in published papers, by examining whether estimates across independent samples were too similar to each other to have originated in random samples.
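A rough sketch of the kind of "too similar" check described above, under loudly stated assumptions: every condition is an independent normal sample of the same size, all conditions share one population SD (taken here as the pooled reported SD), and similarity is measured by the standard deviation of the reported SDs. The idea is to simulate many studies and ask how often chance alone produces SDs at least as alike as the published ones. This is an illustration, not the paper's exact procedure; the function names and the numbers are made up for the example.

```python
import math
import random
import statistics


def sd_of_sds(sds):
    """Spread of a list of standard deviations (smaller = more similar)."""
    return statistics.stdev(sds)


def too_similar_pvalue(reported_sds, n_per_cell, sims=10_000, seed=0):
    """Share of simulated studies whose cell SDs are at least as similar as reported.

    Hypothetical assumptions, for illustration only:
      * each cell is an independent normal sample of size n_per_cell;
      * all cells share one population SD, estimated by pooling the reported SDs.
    A very small return value means chance rarely produces SDs this alike.
    """
    if len(reported_sds) < 2 or n_per_cell < 2:
        raise ValueError("need at least two cells with at least two observations each")
    rng = random.Random(seed)
    k = len(reported_sds)
    # Pooled SD: square root of the average reported variance.
    sigma = math.sqrt(statistics.fmean([s * s for s in reported_sds]))
    observed = sd_of_sds(reported_sds)

    as_similar = 0
    for _ in range(sims):
        # Draw k fresh samples and record each one's sample SD.
        simulated = [
            statistics.stdev([rng.gauss(0.0, sigma) for _ in range(n_per_cell)])
            for _ in range(k)
        ]
        if sd_of_sds(simulated) <= observed:
            as_similar += 1
    return as_similar / sims


if __name__ == "__main__":
    # Made-up numbers: five conditions, 15 subjects each.
    suspicious = [2.50, 2.51, 2.49, 2.50, 2.52]
    ordinary = [2.1, 2.9, 2.4, 3.1, 2.6]
    print(too_similar_pvalue(suspicious, n_per_cell=15))
    print(too_similar_pvalue(ordinary, n_per_cell=15))
```

For normal data the sample SD varies by roughly σ/√(2(n−1)), about 0.47 here, so five SDs that agree to within a couple hundredths should almost never happen by chance: the first call returns a share near zero, while the second, more ordinary set does not stand out.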

Paper here.

h/t Joe McGee.

I suppose that the trouble with this approach is that someone who understood the detection algorithm could falsify the data better. I suppose the fakers might be fakers because they are too lazy or dim to do even that. This has been my experience with student fraud. Then again, I probably only catch the lazy and dim ones, giving me a select sample…

25 Responses


  2. Someone smart enough to commit hard-to-detect, convincing fraud is usually smart enough to do the actual work instead, and it’s easier. Unless they’ve got crazy cognitive weirdness going on, like Salvador Dalí (who seemed to *like* being a fraudster), they are unlikely to do so….
    ….unless committing fraud suits their larger political agenda. Which is why those who have an ax to grind or are funded by people with an ax to grind are the ones you have to watch out for.

  3. Hi Chris,
    ever thought about stopping retweets from being posted in the comments? It used to be nice to read other people’s views, but the comments have become a mess of useless noise.

  4. The underlying issue might be poor governance mechanisms for ensuring faithful and ethical data collection and analysis. Ethics committees, peer review, and ‘fraud’ analysis of both raw and final datasets made available for external scrutiny are steps in this direction, but I think the robust and scalable solution is eventually to have ‘open science’ – akin to open source.

    PS – your site’s repeating a couple of options below the post button:
    Notify me of followup comments via e-mail.
    Notify me of follow-up comments by email.

  5. Chris:
    This stuff has made a huge splash (as you might expect) in psychology. There is a nice summary article of it in Science or Nature.