This note (also available in pdf here) aims to feed the discussion about how to deal with fraud and other forms of academic misconduct in the wake of the Stapel and Smeesters affairs and the publication of the report by the Schuyt Commission of the Royal Netherlands Academy of Arts and Sciences (KNAW).
The recent fraud cases in psychology (the report of the Levelt committee that investigated the Stapel fraud is here: http://www.tilburguniversity.edu/nl/nieuws-en-agenda/finalreportLevelt.pdf; read more on Retraction Watch here) not only call the credibility of that particular field of science into question, but also damage the reputation of social science research generally. The KNAW report urges universities to educate employees and students in academic honesty but does not suggest implementing a specific policy to detect fraud and other forms of academic misconduct. The diversity in research practices between disciplines makes it difficult to impose a general policy to detect and deter misconduct. However, skeptics may view the reluctance of the KNAW to increase scrutiny as a way to cover up fraud and misconduct. Universities and science in general run a serious risk of losing their credibility in society if they do not deal with misconduct. With every new case that comes to light the public will ask: how is it possible that this case was not detected and prevented? In anticipation of a large-scale national investigation, universities could screen their employees using a list of risk factors for fraud and misconduct. This screening exercise may give a rough sense of how prevalent and serious academic misconduct is at their institution. Below I give some suggestions for such risk factors, relying on research on academic misconduct.
At present it is unclear how prevalent and serious academic misconduct is at universities. It is difficult to obtain complete, valid and reliable estimates of the prevalence and severity of academic misconduct. Just as with crime outside the walls of academia, it is likely that there is a dark number for academic misconduct that does not come to light because there are no victims or because the victims or other witnesses have no incentive to report misconduct or an incentive not to report it. Relying on a survey among 435 European economists (a 17% response rate), Feld, Necker & Frey (2012) report that less than a quarter of all forms of academic misconduct is reported. There is no official registration of cases of academic misconduct. Cases of misconduct are sometimes covered by news media or by academics on blogs like Retraction Watch (http://retractionwatch.wordpress.com/). Using surveys, researchers have tried to estimate misconduct relying on self-reports and peer reports. In a 2008 Gallup survey among NIH grantees, 7.4% of the respondents reported suspected misconduct (Wells, 2008). Other surveys suggest a much higher incidence of misconduct. John, Loewenstein and Prelec (2012) conducted a study among psychologists with incentives for truth-telling and found that 36% admitted to having engaged in at least one ‘questionable research practice’, a much higher incidence than the 9.5% reported by Fanelli (2009). The research available shows that fraud is certainly not unique to experimental (social) psychology, as the high-profile cases of Stapel, Smeesters and Sanna from the University of Michigan might seem to suggest. Fraud occurs in many fields of science. Retraction Watch profiles the cases of Jan Hendrik Schön in nanotechnology, Marc Hauser in biology, Hwang Woo-suk in stem cell research, Jon Sudbø and Dipak Das in medicine, and many other researchers working in the medical and natural sciences.
What forms of misconduct should be distinguished? Below is a list of behaviors that are mentioned in discussions on academic dishonesty and the code of conduct of the Association of Universities in the Netherlands (VSNU).
- Fabrication of data. Stapel fabricated ‘data’: he claimed data were collected in experiments while in fact no experiment was conducted and no data were collected. In less severe cases, data points are fabricated and added to a real dataset. List et al. (2001) report that 4% of economists admit having fabricated data. A similar estimate emerges from the more recent survey by Feld, Necker & Frey (2012). John, Loewenstein & Prelec report that 1.7% of psychologists admit fabrication of data. However, from this number, they estimate the true prevalence to be around 9%.
- Omission of data points. Smeesters admitted to having reworked datasets such that the hypotheses were confirmed, e.g. by fabricating and adding ‘data points’ that increased statistical significance and by omitting those that reduced it. John, Loewenstein & Prelec report that 43.4% of psychologists admit this.
- Invalid procedures for data handling. Errors in recoding, reporting or interpreting, inspired by and leading to support for the hypotheses. Research by Bakker & Wicherts (2011) shows this is quite common in psychology: 18% of statistical results in 2008 are incorrectly reported, commonly in the direction of the hypothesis favored by the author.
- ‘Data snooping’: ending data collection before a target sample is achieved when a significant result is realized. This increases the likelihood of false positives or Type I errors (Strube, 2006). John, Loewenstein & Prelec report that 58% of psychologists admit this.
- Cherry picking: not reporting on data that were collected because the results did not support the hypothesis. John, Loewenstein & Prelec report that 50% of psychologists admit this. Cherry picking results in the file drawer problem: the ‘unexpected’ results disappear into a drawer.
- ‘Harking’: Hypothesizing After Results are Known (Kerr, 1998). In a paper, reporting an unexpected finding as having been predicted from the start. John, Loewenstein & Prelec report that 35% of psychologists admit this.
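The effect of ‘data snooping’ on the false positive rate can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions, not the procedure used by Strube (2006): every simulated study draws data with a true effect of zero, the standard deviation is assumed known (so a z-test suffices), and the batch size of 10, the maximum of 50 observations and the function names are hypothetical choices.

```python
import random
from statistics import NormalDist

# Two-sided 5% critical value of the standard normal distribution (about 1.96).
Z_CRIT = NormalDist().inv_cdf(0.975)


def is_significant(sample):
    """Two-sided z-test of mean 0, assuming a known standard deviation of 1."""
    n = len(sample)
    z = (sum(sample) / n) * n ** 0.5
    return abs(z) > Z_CRIT


def false_positive_rate(peek, n_sims=4000, batch=10, max_n=50, seed=1):
    """Share of simulated null studies that end with a 'significant' result.

    peek=False: test once, when max_n observations have been collected.
    peek=True:  test after every batch and stop at the first significant result.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        sample = []
        significant = False
        while len(sample) < max_n:
            sample.extend(rng.gauss(0.0, 1.0) for _ in range(batch))
            if (peek or len(sample) == max_n) and is_significant(sample):
                significant = True
                break
        hits += significant
    return hits / n_sims


if __name__ == "__main__":
    print("fixed sample size:", false_positive_rate(peek=False))
    print("optional stopping:", false_positive_rate(peek=True))
```

With a fixed sample size the share of false positives should land near the nominal 5%; with repeated testing and early stopping it should be clearly higher, even though no single test looks improper on its own.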
All of the above forms of misconduct lead to artificially strong positive results that are difficult to replicate (Jha, 2012; Simmons, Nelson & Simonsohn, 2011). The positive publication bias is enhanced by high-impact journals that want ‘novel findings’ and refuse to report (failed) replications. In addition to forms of misconduct that lead to positive publication bias, there are several other forms of misconduct:
- Plagiarism. Cut & paste of text without quotation marks and/or proper references.
- Double publication. Sending essentially the same manuscript to different journals without informing them and accepting simultaneous publication without cross-references. Ironically, Bruno Frey, the third author of the Feld, Necker & Frey (2012) paper cited above, has engaged in this form of misconduct on several occasions. The Frey case is documented extensively by Olaf Storbeck on his Economics Intelligence blog (http://economicsintelligence.com/2012/03/19/self-plagiarism-bruno-frey-gets-away-with-a-slap-on-the-wrist/).
- Undeserved authorship. Listing as co-author someone who did not contribute to the paper. List et al. (2001) lumped together undeserved authorship and sending manuscripts simultaneously to two journals, and report that 7 to 10% of economists have engaged in this behavior.
- Not disclosing conflicts of interest (e.g., reviewing your own paper, a paper to which you contributed or a paper by a close colleague; sponsorship of the research by a party with interests in a certain outcome).
- Not observing professional codes of conduct. Each academic discipline has its own code of conduct. The content of these codes varies widely. Being aware of the code is phase 1; knowledge of its content is phase 2; observing it is phase 3.
Trends in misconduct
The recent Stapel and Smeesters cases suggest that misconduct is increasing. While Giner-Sorolla (2012) argues that the problems so vividly put on the agenda in this ‘year of horrors’ (Wagenmakers, 2012) are not new at all, Steen (2011) shows that the number of retractions of papers from academic journals covered in PubMed has increased sharply in the past years. These retractions are only the tip of the iceberg: they are the cases in which journal editors considered the evidence for misconduct sufficiently convincing. Whether the iceberg has in fact grown is not clear. Fanelli (2012) shows that negative results are disappearing from ISI journal articles in most disciplines and countries. Most troubling is that the proportion of positive results in journal articles from the Netherlands is higher than in many other countries (OR: 1.16; reference category: US). The proportion is also higher in the Social Sciences (OR: 2.14; reference category: Space Science) than in other disciplines, though lower than in Neuroscience (OR: 3.16), Psychology (OR: 2.99) and Economics (OR: 2.65).
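To get a feel for what odds ratios such as 1.16 and 2.14 mean, they can be translated into proportions once a baseline is fixed. The sketch below is a simplification: the baseline of 80% positive results in the reference category is a hypothetical figure chosen for illustration, not a number taken from Fanelli (2012), and the reported odds ratios may come from a model with additional controls, so the implied proportions are indicative only. The function name is likewise hypothetical.

```python
def proportion_from_odds_ratio(baseline_proportion, odds_ratio):
    """Proportion implied by applying an odds ratio to a baseline proportion."""
    if not 0.0 < baseline_proportion < 1.0:
        raise ValueError("baseline_proportion must lie strictly between 0 and 1")
    baseline_odds = baseline_proportion / (1.0 - baseline_proportion)
    odds = odds_ratio * baseline_odds
    return odds / (1.0 + odds)


# HYPOTHETICAL baseline: suppose 80% of the papers in the reference category
# report positive results. This is an assumption, not a figure from the source.
BASELINE = 0.80

for label, odds_ratio in [
    ("Netherlands vs US", 1.16),
    ("Social Sciences vs Space Science", 2.14),
]:
    implied = proportion_from_odds_ratio(BASELINE, odds_ratio)
    print(f"{label}: OR {odds_ratio} -> {implied:.1%} positive results")
```

The point of the conversion is that the same odds ratio corresponds to a different gap in percentage points depending on the baseline, which is why odds ratios across categories should be compared with some care.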
Characteristics of those who engage in misconduct
Little is known about the characteristics of those who engage in misconduct. List et al. (2001) find virtually no significant associations between self-reported misconduct and characteristics of economists. Stroebe, Postmes and Spears (2012) compared cases of academics caught for fraud and identified a set of common characteristics: the fraudsters were highly respected as researchers, published journal articles prolifically, advanced very quickly in their careers, and had perfect datasets. Nosek, Spies & Motyl (2012) vividly illustrate the social dilemma for young researchers trying to build a career with novel findings that they cannot replicate. Pretty much the same sketch emerges from an analysis of retracted publications in PubMed (Steen, 2011). While Stapel and Smeesters seem to have been isolated fraudsters, Steen (2011) finds that a fraudster whose PubMed publication has been retracted “more frequently publishes with at least one co-author who also has fraudulent publications”.
What can be done to reduce misconduct?
Nosek, Spies & Motyl (2012) and Stroebe (Hamel, 2012; Witlox, 2012) are skeptical about self-correction in science. At present, the benefits of misconduct are too high and the risk of getting caught is simply too low. The fraudsters lined up by Stroebe et al. were able to pass peer review procedures because the procedures were not stringent enough. Reviewers should be more aware of the possibility of fraud (Matías-Guiu & García-Ramos, 2010). Audits of random samples of journal articles and their authors would be a solution because they increase the detection risk, Stroebe et al. argue. Food scientist Katan proposed such an audit at a KNAW conference on data sharing in 2011 (KNAW, 2012, p. 47). However, audits are costly procedures. Another recommendation is that replications should be made public. This has also been the dominant response in academic psychology to the Stapel case (Wicherts, 2011). Researchers are often unwilling or reluctant to share data (Wicherts, Borsboom, Kats & Molenaar, 2006). At present the incentives discourage researchers from sharing data. Researchers save time by not making their data available (Firebaugh, 2007; Giner-Sorolla, 2012). The costs required to make data available are often not budgeted. If research funders such as the Netherlands Organization for Scientific Research (NWO) impose a data sharing requirement this will create an additional cost for researchers. This makes it improbable that scientists will adopt the solution without force. At present, reluctance to share data indicates lower quality of research (Wicherts, Bakker & Molenaar, 2011). While data-sharing is desirable for replication purposes, it is not something that universities can impose and only works in the long run. Journal editors and reviewers could insist on data-sharing, however.
This also goes for the idea to require a power analysis for experiments (Ioannidis, 2005; Ioannidis & Trikalinos, 2007; Simmons, Nelson, & Simonsohn, 2011), the proposal that reviews are published alongside the articles (Mooneyham, Franklin, Mrazek, & Schooler, 2012) and various other ideas proposed by Nosek & Bar-Anan (2012) such as a completely open access data repository. An even longer term proposal is that researchers pre-register their studies and indicate in advance the analyses they intend to conduct (Wagenmakers, Wetzels, Borsboom, Van der Maas, & Kievit, 2012). Generally speaking, academic misconduct is likely to be more prevalent and more severe as the benefits of misconduct are higher, the costs are lower, and the detection risk is lower. Stroebe makes this point in two recent interviews (Hamel, 2012; Witlox, 2012). The increasing publication pressure in many sciences increases the benefit of misconduct (John, Loewenstein & Prelec, 2012). The lack of attention to detail from overburdened reviewers, from co-authors who are happy to score an additional publication, and from dissertation supervisors loaded with work reduces the detection risk. The rat race increases the likelihood that the isolated cases of Stapel and Smeesters will be the ones who were stupid enough not to organize into a pack.
Lifelong anesthesia researcher Mutch (2011: 784) advises the following remedies against misconduct: “good mentoring, appropriately trained and diligent support staff, blinded assessment of data, data review by all investigators, a consensus agreement on data interpretation, a vigorous and independent Research Office, effective internal and external committees to assess adherence to protocols, and strong departmental leadership.” Conversely, it is likely that in the absence of these conditions, there are more opportunities for misconduct. Given all of the above, I propose the following list of conditions that increase the potential for fraud and academic misconduct. The list can be used as a checklist or screening device for academic publications. Obviously an article with a high score is not necessarily fraudulent, but it does require more detailed attention. I encourage universities, journal editors, and reviewers to use this list, and to suggest additions or modifications. It is by no means intended to be a definitive list, and replication is necessary.
|No.||Condition||Potential misconduct||Detection method|
|1.||The researcher worked alone. Nobody else had or has access to the ‘data’. Co-authors were not involved in the ‘data collection’ and/or ‘data analysis’.||Data fabrication as well as less serious forms of misconduct.||Ask co-authors.|
|2.||The ‘data’ were not collected by others, but by the researcher.||Data fabrication.||Ask co-authors and co-workers.|
|3.||There are no witnesses of the ‘data’ collection.||Data fabrication.||Ask co-authors and co-workers.|
|4.||The raw ‘data’ (documents, fieldwork notes, questionnaires, videos, electronic data files) are not available (anymore). They are reported confidential, missing, lost, or located on a previous computer.||Data fabrication.||Ask author and co-authors, check data archive.|
|5.||The statistics are ‘too good to be true’. The p-values of statistical tests are more often just below .050 than would be expected based on chance (Krawczyk, 2008; Simonsohn, 2012).||Data fabrication, selective omission of data points, cherry picking, and harking.||Compare p-values to expected distribution.|
|6.||The research only finds support for the hypotheses (Fanelli, 2012).||Cherry picking and harking.||Count the proportion of hypotheses supported.|
|7.||There is no fieldwork report or lab log entry available.||Data fabrication.||Check data archive, lab log, ask author.|
|8.||Data are provided but original code or description of procedures followed by the author is not available or it is unclear for others how to replicate the research.||Data fabrication, cherry picking, harking.||Ask author.|
|9.||Replication of the research is impossible with the available raw data and procedures and analyses described by the author.||Cherry picking, harking.||Ask author.|
|10.||Replication of the research is possible but yields no support for the original findings.||Cherry picking, harking.||Try to replicate the findings.|
|11.||The research appeared in high impact journals.||Misconduct with higher benefits.||Check impact factor.|
|12.||The author is early in his/her career.||Misconduct with higher benefits.||Check career stage.|
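The detection method for condition 5 in the table, comparing p-values to an expected distribution, can be sketched as a simple binomial comparison of how many reported p-values fall just below versus just above .05. This is a minimal sketch under stated assumptions and not the method of Krawczyk (2008) or Simonsohn (2012): the window width of .01, the function name `caliper_test`, and the working assumption that a p-value inside the window is equally likely to fall on either side of the threshold are all illustrative choices, and the example p-values are hypothetical.

```python
from math import comb


def caliper_test(p_values, threshold=0.05, width=0.01):
    """Compare counts of p-values just below and just above a threshold.

    Returns (n_below, n_above, one_sided_p). one_sided_p is the binomial
    probability of at least n_below values below the threshold, assuming a
    value inside the window is equally likely to fall on either side.
    """
    below = sum(threshold - width <= p < threshold for p in p_values)
    above = sum(threshold <= p < threshold + width for p in p_values)
    n = below + above
    if n == 0:
        # No p-values inside the window: nothing to compare.
        return below, above, 1.0
    tail = sum(comb(n, k) for k in range(below, n + 1)) / 2 ** n
    return below, above, tail


if __name__ == "__main__":
    # HYPOTHETICAL p-values collected from a set of articles by one author.
    reported = [0.041, 0.043, 0.044, 0.046, 0.047, 0.048, 0.049, 0.049,
                0.052, 0.058, 0.012, 0.20]
    n_below, n_above, p = caliper_test(reported)
    print(f"just below .05: {n_below}, just above .05: {n_above}, one-sided p = {p:.4f}")
```

A small one-sided probability flags a surplus of results just below .05. In line with the caveat above, this is a reason for closer inspection and not evidence of fraud in itself; selective reporting by journals can produce the same pattern.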
- Bakker, M. & Wicherts, J.M. (2011). The (mis)reporting of statistical results in psychology journals. Behavior Research Methods, 43: 666-678.
- Fanelli, D. (2009). How Many Scientists Fabricate and Falsify Research? A Systematic Review and Meta-Analysis of Survey Data. PLoS ONE 4(5): e5738. doi:10.1371/journal.pone.0005738
- Fanelli, D. (2012). Negative results are disappearing from most disciplines and countries. Scientometrics, 90(3): 891-904. DOI 10.1007/s11192-011-0494-7
- Firebaugh, G. (2007). Replication data sets and favored-hypothesis bias. Sociological Methods & Research 36: 200–209.
- Giner-Sorolla, R. (2012). Will We March to Utopia, or Be Dragged There? Past Failures and Future Hopes for Publishing Our Science. Psychological Inquiry, 23: 263-266. DOI: 10.1080/1047840X.2012.706506
- Ioannidis, J. P. A. (2005). Why most published research findings are false. Plos Medicine, 2(8), 696-701. doi:10.1371/journal.pmed.0020124
- Ioannidis, J. P. A., & Trikalinos, T. A. (2007). An exploratory test for an excess of significant findings. Clinical Trials, 4, 245-253. doi: 10.1177/1740774507079441
- Jha, A. (2012). False positives: fraud and misconduct are threatening scientific research: High-profile cases and modern technology are putting scientific deceit under the microscope. The Guardian, 13 September 2012. http://www.guardian.co.uk/science/2012/sep/13/scientific-research-fraud-bad-practice
- John, L.K., Loewenstein, G., & Prelec, D. (2012). Measuring the Prevalence of Questionable Research Practices With Incentives for Truth Telling. Psychological Science, doi:10.1177/0956797611430953
- Hamel, E.-J. (18 September 2012). ‘De wetenschap is niet zelfreinigend’. DUB, http://www.dub.uu.nl:8080/artikel/achtergrond/wetenschap-niet-zelfreinigend.html
- Kerr, N. L. (1998). HARKing: Hypothesizing after the results are known. Personality and Social Psychology Review, 2, 196-217. doi:10.1207/s15327957pspr0203_4
- Koninklijke Nederlandse Akademie van Wetenschappen (2012). Zorgvuldig en integer omgaan met wetenschappelijke onderzoeksgegevens. Amsterdam: KNAW.
- Krawczyk, M. (2008). Lies, Damned Lies and Statistics: The Adverse Incentive Effects of the Publication Bias. Working paper, University of Amsterdam. http://dare.uva.nl/record/302534
- List, J.A., Bailey, C.D., Euzent, P.J., Martin, T.L. (2001). Academic Economists Behaving Badly? A Survey on Three Areas of Unethical Behavior. Economic Inquiry, 39(1): 162-170.
- Matías-Guiu, J. & García-Ramos, R. (2010). Fraud and misconduct in scientific publications. Neurología, 25(1): 1-4.
- Mooneyham, B.W., Franklin, M.S., Mrazek, M.D., & Schooler, J.W. (2012). Modernizing Science: Comments on Nosek and Bar-Anan (2012). Psychological Inquiry, 23: 281-284. DOI: 10.1080/1047840X.2012.705246
- Mutch, W.A.C. (2011). Academic fraud: perspectives from a lifelong anesthesia researcher. Canadian Journal of Anesthesia, 58: 782-788. DOI 10.1007/s12630-011-9523-5
- Nosek, B.A. & Bar-Anan, Y. (2012). Scientific Utopia: I. Opening Scientific Communication. Psychological Inquiry, 23: 217-243.
- Nosek, B.A., Spies, J.R. & Motyl, M. (2012). Scientific Utopia: II. Restructuring incentives and practices to promote truth over publishability. Perspectives on Psychological Science, 7(6): 615-631.
- Simonsohn, U. (2012). Just post it: The lesson from two cases of fabricated data detected by statistics alone. Paper available at http://ssrn.com/abstract=2114571
- Simmons, J.P., Nelson, L.D. & Simonsohn, U. (2011). False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant. Psychological Science, 22(11): 1359-1366. DOI:10.1177/0956797611417632
- Steen, R.G. (2011a). Retractions in the scientific literature: is the incidence of research fraud increasing? Journal of Medical Ethics, 37:249-253. doi:10.1136/jme.2010.040923
- Steen, R.G. (2011b). Retractions in the scientific literature: do authors deliberately commit research fraud? Journal of Medical Ethics, 37: 113-117. doi:10.1136/jme.2010.038125
- Stroebe, W., Postmes, T. & Spears, R. (2012). Scientific misconduct and the myth of self-correction in science. Perspectives on Psychological Science, 7(6): 670-688.
- Strube, M.J. (2006). SNOOP: A program for demonstrating the consequences of premature and repeated null hypothesis testing. Behavior Research Methods, 38(1): 24-27.
- Wagenmakers, E.-J. (2012). A Year of Horrors. De Psychonoom, 27: 12-13.
- Wagenmakers, E.-J., Wetzels, R., Borsboom, D., Van der Maas, H.L.J., & Kievit, R.A. (2012). An Agenda for Purely Confirmatory Research. Perspectives on Psychological Science, 7(6): 632-638.
- Wells, J.A. (2008). Final Report: Observing and Reporting Suspected Misconduct in Biomedical Research. Washington: Gallup. http://ori.hhs.gov/sites/default/files/gallup_finalreport.pdf
- Wicherts, J. (2011). Psychology must learn a lesson from fraud case. Nature, 480: 7.
- Wicherts, J., Bakker, M. & Molenaar, D. (2011). Willingness to Share Research Data is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results. PLoS ONE, 6(11): e26828.
- Wicherts, J.M., Borsboom, D., Kats, J. & Molenaar, D. (2006). The poor availability of psychological research data for reanalysis. American Psychologist 61: 726–728.
- Witlox, M. (13 September 2012). Hoogleraar: ‘wetenschap heeft geen zelfreinigend vermogen’. Univers Online. http://universonline.nl/2012/09/13/hoogleraar-wetenschap-heeft-geen-zelfreinigend-vermogen/
- Yong, E. (2012). The data detective. Nature, 487: 18-19.