
What should we want from publishers?

Last week I attended a conference organized by the university libraries in the Netherlands with the straightforward title: “What do we want from publishers?” The conference served as a consultation with librarians and other academics before the universities in the Netherlands start negotiating deals with commercial publishers such as Elsevier. I had missed the morning session, in which a panel of representatives from publishers and academia shared their views on the current state and future directions of academic publishing. In the sessions I attended I noticed signs of rebellion and resistance, with one speaker saying “we should be prepared to walk away” when commenting on how libraries should approach negotiations with publishers.

Elsevier makes outrageously high profits, as do other academic publishers such as Springer/Nature, Taylor & Francis, and Wiley. As others have noted before (see here, here, and here), these profits often exceed those made by tech giants such as Apple and Google. The reason that profit margins of 30-40% per year are so outrageous is that publishers generate them with unpaid work by academics. Scientists publishing in academic journals do not get paid for their contributions. On the contrary: scientists often have to pay to publish their articles. Publishers charge ‘Article Processing Charges’ (APCs) or open access fees to authors to make the research available to the general public – research that the general public usually has already paid for (more about the business model of academic publishing here).

At the conference I learned that publishers seriously think that they add value to the work of academics by enticing us to spend our time as volunteer editors and peer reviewers. I’ll return to the merits of that argument below. First: why are academics – myself included – willing to volunteer their time for commercial companies?

One reason is that academics see it as their duty and responsibility to review work by their peers, so that they can improve it. The peer review system is an economy of favors that relies on generalized reciprocity. As a responsible academic and decent human being, you should not free ride and expect your peers to review your work while providing no service in return. If you receive three reviews for every article you submit, you should be prepared to review three other articles for the journal. There are also individual benefits to reviewing: you learn about new research before it gets published, and you get a chance to influence it.

A collective trap

The systemic problem is that the academic community is trapped. Professional organizations of researchers such as the American Psychological Association benefit financially from the publishing industry. The associations are not prepared to give up the revenues from journals – without them, they would not be able to finance their conferences, scholarships and other association activities. From my former role as secretary of a professional association of researchers in the field of nonprofit and philanthropy research, I completely understand this.

Universities and research funding agencies rely on the academic publishing industry to make decisions on research grants, tenure and promotion to higher ranks, the allocation of research time, and even bonuses to researchers. Universities and funding agencies do not directly evaluate the quality of the research that their employees produce. Instead they reward researchers who publish more prolifically, and especially in journals that are cited more frequently. Universities and funding agencies assume that a publication in a peer reviewed journal, especially in a journal that has published frequently cited work in the past, is a signal of quality. But here’s the catch: it’s not.

The average citation count of articles published in the past is not a good indicator of the quality of a newly published article. Citations follow a power law: a few articles are cited very often, while the majority receive only a small number of citations. Because publishing in journals with a higher average number of citations is more attractive, these journals also attract more fraud and bad science. Journals are struggling to maintain the quality of published research, and the number of retractions is rising sharply.

Contrary to popular belief, the fact that a study was peer reviewed does not indicate that it is of high quality. A skyrocketing number of ‘peer reviewed’ articles appear in predatory journals that accept papers in exchange for payment, without any substantive or critical review. So-called ‘paper mills’ sell publications. Even citations are for sale. And even at respectable journals, peer reviewers are not good at detecting errors. The peer review system is under pressure. As the number of submissions and publications continues to increase, the willingness of researchers to review articles declines. It takes more time and more invitations to get researchers to say yes to requests to review.

Journals seem to do everything to make it difficult to evaluate the quality of the peer review. An overwhelming majority of journals do not publish the peer reviews of accepted articles. Still, we know that peer review does not add quality in most cases. Published versions of articles that were posted earlier as a preprint are very similar to those preprints (see here and here). If the version published in a journal is the same as the preprint, the journal has added literally nothing.

What we want from publishers

On my way to the conference I imagined four answers to the question of what we want from publishers.

1) We want our money back, because publishers have extorted profits from academia.

2) Nothing at all, because publishers have not added value.

3) Radical transformation: commercial publishers exploiting public funding have no right to existence and should be transformed into non-profit organizations or social enterprises that put their profits back into science. Funding research on peer review, for instance.

Admittedly, these three demands are quite revolutionary and rebellious. At the conference they worked as devil’s advocate statements, forcing conversation partners to state clearly what value publishers could add.

4) If commercial publishers are a necessary evil, and continue to exist, we want value for our money. Let publishers propose changes that improve the transparency and quality of the publications. Here are some ideas.

These are just some of the things that journals could have done, but have neglected to enact in the past decades. It would be good if new deals contain agreements on these innovations.


5 ways students do not use ChatGPT and AI (you won’t believe #2!)

Gotcha. Or did the clickbait title not get you here? Regardless, I will try not to disappoint you. But before we get to the promised list, here’s why it is important.

The nightmare of every business: a new competitor suddenly emerges and wipes out the firm’s entire market share. When ChatGPT was released a little over a year ago, it was clear that it posed a threat to the business model of schools and universities. If freely accessible large language models can produce texts within seconds that students would otherwise have taken hours to write, and teachers find these texts acceptable, how can educational institutions continue to rely on writing assignments to evaluate student learning?

Some educational institutions, like mine, quickly forbade students to use AI writers. That was shortsighted, because it was clear that the technology would be omnipresent in the future. Why not teach students how to use it in a responsible way? The simple answer was: because schools and universities were not ready. My university forbade not only AI writers, but also tools that could detect their use. It took a long time before the center for teaching and learning issued guidelines for generative AI. Meanwhile the university has changed its policy: students may use AI tools, but only if their teachers explicitly allow it. So I decided to develop a policy for the master programme I am directing and test it.

A course I am teaching myself (course manual here) served as the test case. In the course, students read a lot of articles, most of them from disciplines other than their own. Each week we discussed a set of articles from a different perspective. Throughout the course students develop an essay, in which they analyze a case from at least two of the perspectives discussed in the course. To help students prepare for the essay I gave them suggestions on their draft research question, and an example of a good essay by a student from a previous year, along with the rubric I had completed to evaluate it. They knew exactly what elements should go into the essay. Students also peer reviewed each other’s draft essays.

This is the text we included in the course manual:

“It is fine to use generative artificial intelligence tools such as ChatGPT, Bing, Bard, Claude, Perplexity, Elicit or Research Rabbit, as long as you identify that you have used them, and how you have used them. Do so in sufficient detail for others to be able to reproduce your findings. This means that you specify the software version, settings, date of usage, the prompts and commands, and output with a URL or a screen dump. Whenever you use AI-generated content, independently verify the claims made and insert references to sources supporting the claims including DOIs (for scholarly publications) or URLs (to non-scholarly sources such as Wikipedia).”

Course manual, Foundations of Resilience 2023-2024, page 7

To evaluate the policy, I held short meetings with all students in the course who had submitted their final essay. I asked questions about the process. I was curious how students went about writing their essays. Did they start with an outline? How did they search for relevant literature? How did they come up with the case? Had they considered a comparison with other cases? How did they decide to select certain perspectives and not others? I figured that students who had not written their essays but instead had generative AI produce them would not be able to justify the decisions the language model had made. I asked them what software, databases and tools they had used. Did they use generative AI, and what were their experiences?

Short answer: students did not use AI tools much, and the tools they did use didn’t help them much. Students had not yet learned how to use the tools available in a sophisticated manner.

Five ways students did use AI

Before we get to the list of ways in which students didn’t use ChatGPT and other AI tools, let’s go over the tasks they did use them for.

1. Clarify terms

Several students said they had used ChatGPT throughout the course to clarify terms they did not know. The terms appeared in the weekly readings and other articles. They found the explanations useful. For this task they did not need references to sources, which ChatGPT does not provide.

2. Find synonyms

One surprising language-model-based tool that students used was a thesaurus for finding synonyms. One student said she disliked using the same words over and over again, and used a tool to suggest synonyms to use instead. I totally recognize the love of variety, but using synonyms is not a good idea for a scientific text. Scientific language should be boringly precise. It’s better to use the same term for the same thing throughout your essay than to switch between different terms.

3. Correct typos and improve writing style

Many students reported using writing assistants such as Grammarly, as well as the autocomplete suggestions and spell checkers integrated in MS Word and Google Docs. Using these is fine of course. Students also used ChatGPT to improve their writing. I could tell from their use (not: ‘utilization’) of words that hardly anybody uses in their own writing.

Remarkably, very few students submitted a perfect list of references. Many DOIs and URLs were missing. One feature of references copied directly from Google Scholar that gives away that they were not checked is the inconsistency in using capitals for the First Letters of Words in Names of Journals. Another identifying feature is the lack of final page numbers of articles. Using a reference manager like Zotero is the easiest way to obtain a complete and consistent list (more about how to provide references here).

4. Draft a conclusion section

One student had fed his draft essay to ChatGPT and asked it to write a conclusion section. It produced a text that included some of the statements from the main text, but did not integrate them into a synthesis. He deleted it and wrote his own conclusion.

5. Search for relevant research

Students primarily used Google Scholar, the university library search tool, or Scopus to find relevant publications. A problem that all students encountered was that the number of search results exceeded their screening capacity. Identifying the most relevant results was a challenge for them (suggestions on how to do that here). Once they’d identified a small set of relevant publications, some students used a backward search strategy to find more relevant publications, starting from the references cited. One student used a forward search using the ‘cited by’ option in Google Scholar.

Tools such as Research Rabbit and Elicit are very handy, but students were not yet familiar with them before the course. Those who tried them did not find them particularly helpful, but they also did not use them in the most productive ways. One student had entered keywords in Elicit as if it were an ordinary database, instead of asking substantive questions. The results were no better than those produced by the university library or Google Scholar.

5 ways students did not use AI

Now, as promised, let’s get to the list of ways in which students did not use ChatGPT and other AI tools. By the way, I did not use AI tools to write this text, other than the spell checker embedded in the blog editor software.

1. Creating an outline

Several students said they had started working on their essay with an outline. For some this was their default way of working, developed over the years, while others tried it for the first time. Working with an outline served them well (suggestions on how to do so here). None of the students said they had used some form of AI to produce the outline. I’ve not tried it myself, but I guess that large language models can easily produce an outline based on the assignment text.

2. Criticizing previous research

One thing students did not do well in their essays was criticize previous research. They had simply taken the claims in the sources they cited as given, as if once something is published, it must be true. This is unfortunate. The number of retractions is skyrocketing, and retractions are only the tip of the iceberg of bad science. Though the course was not designed to teach the identification of weaknesses in research design and analysis, I had given some examples of the grave shortcomings of peer review as a quality control instrument throughout the course.

I had hoped that students would demonstrate some awareness of the idea that every piece of research is an inherently provisional and partial answer to a question about a complex reality. That hope did not materialize in the essays, despite the fact that the rubric explicitly rewarded a critical approach.

3. Answering their research question

None of the students used AI writers in the way I had feared. I received no submissions in the typical confident mansplaining language that characterizes ChatGPT. In the essays, I could recognize the students’ own language from the weekly assignments they had submitted throughout the course. Students had also benefited from their peers’ reviews in ways that improved their research questions, cases and approaches. These suggestions were much more useful than the answers large language models could have provided.

4. Visualization

None of the students had added an AI generated image to illustrate the essay. In fact, only one of the essays included an image on the front page, and none of the others had graphics or other visuals. I missed them: a photo, graph or image often helps to capture the reader’s attention, spark the imagination, and summarize results.

5. Generate ideas for similar cases or future research

Many students had considered comparing the cases discussed in their essays with similar cases, in other times and locations. Though I imagine that large language models can easily produce comparable cases, none of the students had used them for this purpose. They had no trouble coming up with ideas for other cases, but the word limit for the essay made it difficult to include a comparison.

Another aspect that was missing in many essays was a paragraph with suggestions for future research. Again, large language models will easily produce suggestions for future research, but none of the students used them for this purpose.

Finally, none of the students reported how they had used generative AI in the essay. Perhaps this was the result of the rather poor results they obtained. Three students who did use some form of AI assistance apologized for not having disclosed it. Perhaps it was also a result of the fact that the rubric did not include a section explicitly mentioning and rewarding disclosure of AI tools. Next time I will be sure to include a tools transparency section in the rubric.

The takeaway

In retrospect, the policy was a solution for a problem that didn’t exist. But it was worth the investment. I learned that students didn’t use AI tools in sophisticated ways that could help them. Perhaps this blog will change that.

Also it is clear that we should educate students in critical thinking and analysis. Large language models will provide arguments for pretty much any hunch you feed them, and AI-powered research assistants are great tools for finding sources that support these claims. While this is one of the major weaknesses of large language models, it can also be used in a productive way. Once you’ve got your argument ready, provide the opposite claim to a language model, and take the output it provides as a starting point for critical thinking and analysis. In a conceptual form of critical analysis, students could find the flaws in the arguments and criticize the quality of published research (e.g., using rules of thumb such as these) to weigh the evidence in favor of either claim. We should educate our students to become Bad Science Detectives. In two further forms of analysis, students could learn to be Good Science Practitioners: designing research that tests the empirical validity of alternative claims (suggestions here), and conducting empirical data analysis to adjudicate the claims.

Thanks to the students in the Research Master Social Sciences for a Digital Society at VU Amsterdam


Research Integrity Policies in Social Science Research at Vrije Universiteit Amsterdam

René Bekkers, October 10, 2023. An MS Word version of this post is here.

At the Faculty of Social Sciences (FSS) of the Vrije Universiteit Amsterdam, research integrity is governed by seven policies:

  1. The overarching policy is the Netherlands Code of Conduct for Research Integrity adopted by the Royal Academy of Arts & Sciences (KNAW), the Netherlands Association of Universities (Universiteiten van Nederland, formerly VSNU), the Netherlands Organization for Scientific Research (NWO) and other organizations; https://www.universiteitenvannederland.nl/files/documents/Netherlands%20Code%20of%20Conduct%20for%20Research%20Integrity%202018.pdf
  2. The code of ethics for research in the social and behavioural sciences, adopted by the Deans of the Social Sciences (DSW); https://www.nethics.nl/onewebmedia/CODE%20OF%20ETHICS%20FOR%20RESEARCH%20IN%20THE%20SOCIAL%20AND%20BEHAVIOURAL%20SCIENCES%20v2%20230518-2018.pdf
  3. The procedures for ethics review at the Faculty of Social Sciences (FSS); https://assets.vu.nl/d8b6f1f5-816c-005b-1dc1-e363dd7ce9a5/c7e3795f-62b7-4b3f-9282-48859461e87e/RERC-Regulations-Feb18_tcm249-880617.pdf
  4. The national guidelines for archiving research data in the behavioural and social sciences; https://www.utwente.nl/en/bms/datalab/datasharing/guideline-faculties-of-behavioural-sciences-def.pdf
  5. The FSS data management policy, available here: https://vu-fss.github.io/RDM/fss-guidelines-rdm.html
  6. For PhD candidates: the doctorate regulations (‘promotiereglement’) of Vrije Universiteit Amsterdam: https://assets.vu.nl/d8b6f1f5-816c-005b-1dc1-e363dd7ce9a5/08b4502d-de82-47ca-9c4c-fd5ad70d47f0/20220901%20VU%20doctorate%20regulations.pdf
  7. For PhD candidates: the Graduate School for Social Sciences policies: see https://vu.nl/en/about-vu/more-about/the-graduate-school-of-social-sciences under ‘Assessments during your PhD trajectory’: the ‘go/no-go product’, https://assets.vu.nl/d8b6f1f5-816c-005b-1dc1-e363dd7ce9a5/7a3d5a64-b995-47c0-ad93-63fc80ad1a60/VU-GSSS%20Go%20No%20Go%20assessment%20-%20introduction%20and%20explanations.pdf the plagiarism check, https://assets.vu.nl/d8b6f1f5-816c-005b-1dc1-e363dd7ce9a5/e4be9bed-388e-45a9-8847-3bc085d0dec0/VU-GSSS%20Plagiarism%20check%20-%20background%20and%20procedure%20(1).pdf and particularly the final PhD portfolio, https://assets.vu.nl/d8b6f1f5-816c-005b-1dc1-e363dd7ce9a5/88862747-95eb-4f9e-9780-bdac1868f4a1/VU-GSSS%20Final%20PhD%20portfolio%20%28fill-in%20document%29.docx

In your particular discipline, additional policies or codes of conduct may apply.

Throughout the cycle of empirical research, researchers and students at the Faculty of Social Sciences should act in line with the principles and guidelines expressed in the above codes of conduct and policies. The policies employ four instruments to encourage research integrity:

  1. Personal responsibility – your own conscience and internalized norms of good research and ethical standards.
  2. Transparency – the openness you give about the procedures you have followed in your research.
  3. Peer review – the scrutiny of your work by others: supervisors, colleagues, critics.
  4. Complaint procedures – violations of norms of good research and ethical standards may be punished by the Board of the Faculty of Social Sciences, the academic integrity committee at Vrije Universiteit Amsterdam, and ultimately by the Netherlands office of research integrity (LOWI).

Note: The Faculty of Social Sciences does not have audits of research projects.

Step by step guide

At https://vu.nl/en/employee/social-sciences-getting-started/data-management-fss you’ll find step by step guidelines for the organization of research projects.

A. Planning your research

When you are planning research, check whether your study requires ethics review by the FSS Research Ethics Review Committee (RERC). Make sure you complete the checklist well ahead of the start of the data collection. In most cases ethics review takes less than a month, but if the research plans raise ethics issues, you may need three months to complete the entire ethics review process.

  1. Do the FSS ethics review self-check at https://vuletteren.eu.qualtrics.com/jfe/form/SV_6hCj2czIWzboW6V. Save the pdf you get. If the result is that your research does not need further review, you can start with your research. If the result is that your research needs further review, go to step 2.
  2. Discuss the risks with your supervisor and your department’s representative on the FSS Research Ethics Review Committee (RERC), https://vu.nl/en/employee/social-sciences-getting-started/research-ethics-review-fss. Revise your research plan to reduce and tackle risks. Go back to step 1: complete the self-check again based on the revised plan. If the result is still that full ethics review is necessary, proceed to step 3.
  3. Prepare a full ethics review. With your research team, create: (1) a short description of the research questions, the societal and scientific relevance of the research, and the research design (max. 1 A4); (2) the information for participants; (3) the consent form; (4) the research materials (manipulations, questionnaire, topic list); (5) the anonymization procedure; and (6) the data management plan. You can find examples of these materials at https://vu.nl/en/employee/social-sciences-getting-started/fss-templates-and-example-documents. If you have everything (i.e., six documents), go to step 4.
  4. Complete the online Ethics Review Application Form at https://vuass.eu.qualtrics.com/jfe/form/SV_9tBjPqFq6bxv2Sx and upload the required documents. Note that only research project leaders can submit an application for ethics review. If you are a PhD candidate, ask your supervisor to submit the materials.

B. Data collection

General information about research data management at the Faculty of Social Sciences is available at https://vu.nl/en/employee/social-sciences-getting-started/data-management-fss. If your project involves collection or analysis of data, write a Data Management Plan (DMP) before you start the data collection. Go to https://dmponline.vu.nl and create a new plan. DMPonline will guide you through the elements that comprise a good DMP. You can share your DMP with the faculty’s data steward Koen Leuveld (k.leuveld@vu.nl) to get feedback. Share the DMP with everyone involved in the research project. Update the data management plan when things change during the research project. Make sure to properly version the document, so changes can be tracked.

Store the data in a secure location. The Faculty of Social Sciences recommends using Yoda, https://portal.yoda.vu.nl/.

Pseudonymize raw data before analysis to prevent data leaks. Avoid working with the raw data to prevent data loss. Store the raw data and the pseudonymization key file in a secure location where they cannot be lost, corrupted, or accidentally edited. This could possibly be the same place where your raw data will be archived after the project. Make sure that wherever they are stored, the raw data are accompanied by all information needed to understand the data. This includes metadata on when, where, why and by whom the data were collected, and all documentation needed to understand variables, such as interviewer manuals. The faculty data steward can help in identifying what documentation or metadata to include.
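To make the pseudonymization step concrete, here is a minimal Python sketch of the idea: replace a direct identifier column with random pseudonyms and write the key file separately, so that it can be stored in a secure location apart from the working data. This is an illustration only, not an FSS-prescribed procedure; the file and column names are hypothetical.

```python
# Minimal pseudonymization sketch (illustration only, not FSS policy).
# Assumes a raw CSV file with one direct identifier column; the file and
# column names below are hypothetical.
import csv
import secrets

def pseudonymize(raw_path, data_out_path, key_out_path, id_column="email"):
    """Replace the identifier column with random pseudonyms and write the
    identifier-to-pseudonym key to a separate file."""
    key = {}  # identifier -> pseudonym
    with open(raw_path, newline="", encoding="utf-8") as f_in, \
         open(data_out_path, "w", newline="", encoding="utf-8") as f_out:
        reader = csv.DictReader(f_in)
        writer = csv.DictWriter(f_out, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            identifier = row[id_column]
            if identifier not in key:
                # A random token: without the key file, the pseudonym
                # cannot be traced back to the identifier.
                key[identifier] = secrets.token_hex(8)
            row[id_column] = key[identifier]
            writer.writerow(row)
    with open(key_out_path, "w", newline="", encoding="utf-8") as f_key:
        key_writer = csv.writer(f_key)
        key_writer.writerow([id_column, "pseudonym"])
        key_writer.writerows(key.items())

# Example with hypothetical file names:
# pseudonymize("raw_survey.csv", "survey_pseudonymized.csv", "pseudonym_key.csv")
```

Store the key file together with the raw data in the secure location, and work only with the pseudonymized file during analysis.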

C. Analysis & write-up

During the preparation of your research report, it is a good idea to discuss the analysis strategy and the findings with your supervisors and other colleagues. Document the code that produces the results reported. For suggestions see https://renebekkers.wordpress.com/2021/04/02/how-to-organize-your-data-and-code/.

To receive feedback on your work and improve it, you can prepare a working paper that you share with discussants and present at an internal research seminar. After internal discussion, it is a good idea to post a working paper in a public preprint repository such as SocArxiv or Zenodo and invite the academic community to review it and suggest improvements. Next, you can present your working paper at conferences. Based on the comments you received from peers, revise the working paper before submitting it to a journal, book editor, or to the funders of your research.

D. Publication

When you submit research reports based on the data you have collected to a journal or to book editors for peer review, also create a publication package containing pseudonymized data, analysis scripts, documentation, and metadata. Archive the publication package in a public repository. You can use Yoda for this purpose, https://portal.yoda.vu.nl/. Alternatively, you can use Dataverse https://dataverse.nl/dataverse/vuamsterdam, or Zenodo, https://zenodo.org/. You can also store data on the Open Science Framework, https://osf.io/ if you select an EU storage location.

Have a DOI assigned to your data so others can cite the data you have collected. You can choose to upload data, documentation and metadata separately and have multiple publication packages refer to the same data set if this works better for your project.

Never share privacy-sensitive raw data with the public. Such data should be stored securely. The Faculty of Social Sciences recommends using Yoda, https://portal.yoda.vu.nl/.

E. Review

When you are invited to review the work of others, it is a good principle to check whether the authors have made available the data and the code that they have used to produce the results they report. If not, you can ask the authors or the editors of the journal that invited you to review to make them available. With the data and code, you can verify whether they produce the reported results, and you can conduct robustness analyses.

When you review research reports by others, do so in a constructive way. Here are some suggestions on how to review empirical research: https://osf.io/7ug4w/

The guidelines for peer review of the Committee on Publication Ethics apply to all types of research: https://publicationethics.org/files/Ethical_Guidelines_For_Peer_Reviewers_2.pdf

Getting advice on ethics and integrity issues

When you are planning your research and have questions on ethical dilemmas, ask the FSS Research Ethics Review Committee (RERC) for advice. When you have questions on dilemmas during your research, ask colleagues and supervisors for advice. When you find errors in your own research after you published it, write to the journal or book editors to notify them of the error. In case of a minor problem, prepare a correction. When you no longer support the publication as a whole, ask for a retraction.

When you have questions about the integrity of research of others, consult https://vu.nl/en/about-vu-amsterdam/academic-integrity. Step 1 is to talk to one of the confidential counsellors for integrity (vertrouwenspersoon integriteit), https://vu.nl/en/about-vu-amsterdam/academic-integrity/confidential-counsellor. When you have good reasons to believe that others have violated norms of good science or ethical standards, you can submit a complaint to the executive board of the university, which can forward it to the Academic Integrity Committee (CWI). See the complaints procedure at https://assets.vu.nl/d8b6f1f5-816c-005b-1dc1-e363dd7ce9a5/facfccb1-2b51-4f42-b32c-8bebfb29b89f/Academic%20Integrity%20Complaints%20Procedure%20Vrije%20Universiteit%20Amsterdam%20April%202022.pdf


Ten Meta Science Insights to Deal With the Credibility Crisis in the Social Sciences

A decade of meta science research on social science research has produced devastating results. While the movement towards open science is gaining momentum, awareness of the credibility crisis remains low among social scientists. Here are ten meta science insights on the credibility crisis, along with solutions for how to fight it.

This is a blog version of the SocArxiv preprint at https://osf.io/preprints/socarxiv/rm4p8/

1. At least half of all researchers use questionable research practices

Research on research integrity has estimated the prevalence of integrity violations in many subfields of science, including the social and behavioral sciences. According to the best evidence to date from the Netherlands Survey of Research Integrity (Gopalakrishna et al., 2022a), half of researchers in the social and behavioral sciences (50.2%) reported having engaged in at least one questionable research practice in the past two years. The most common questionable research practices in the social and behavioral sciences are not submitting valid negative studies for publication (17.2%) and insufficient discussion of study flaws and limitations (17.2%). Two other frequently reported violations are inadequate note taking of the research process (14.4%) and selective citation of references to enhance findings or convictions (11%).

A smaller but still non-negligible proportion of researchers in the social and behavioral sciences in the Netherlands self-reports data fabrication or falsification in the past two years (5.7%). This proportion seems low, but it should be zero. The estimate implies that one out of every 17.5 researchers fabricated or falsified data.

These estimates are valid for researchers in the Netherlands who responded to a survey on research integrity. There are at least three reasons to suppose that these estimates are underestimates for the global community of social scientists. One reason is that socially undesirable behaviors such as research misconduct and questionable research practices are underreported in surveys (John, Loewenstein & Prelec, 2012). A second reason is that the response rate to the survey was only 21%. Non-response is usually selective, and higher among those who have an interest in the study topic (Groves et al., 2006). Among the non-respondents the proportion of researchers who engaged in violations of research integrity is likely to be higher than among respondents (Tourangeau & Yan, 2017). The third reason is that the survey was conducted in the Netherlands. A meta-analysis of studies on integrity violations found that estimates of the prevalence of violations are higher in lower and middle income countries than in high income countries such as the United States and the Netherlands (Xie, Wang, & Kong, 2021). One audit of survey research found that datasets produced outside the US contained more fabricated observations (Judge & Schechter, 2009).

Codes of conduct for scientific researchers such as the guidelines of the United States Office of Research Integrity (ORI; Steneck, 2007), the Netherlands Code of Conduct for Research Integrity (KNAW et al., 2018) and the European Code of Conduct for Research Integrity (ALLEA, 2017) explicitly forbid not only fraud, fabrication and falsification of data, but also mild violations of integrity and questionable research practices. Clearly, the mere existence of a code of conduct is not enough to eradicate bad research. To design more effective quality control procedures, it is important to understand how researchers make decisions in practice.


2. Researcher degrees of freedom facilitate widely different conclusions

When researchers are allowed to keep their workflow private, and only give access to the final results, it is difficult to detect data fabrication, falsification, and questionable research practices. Throughout the empirical research process, researchers have many degrees of freedom (Simmons, Nelson & Simonsohn, 2011): they can limit their samples to specific target groups, use different sampling strategies, use different modes of data collection, ask questions in particular ways, treat missing values in different ways, code variables in more or less fine-grained categories, add or omit covariates, and run different types of statistical tests and models. While some of these decisions are described in publications, many of the choices are not disclosed. Gelman & Loken (2014) compare these choices with a walk in a garden of forking paths. Taking different turns leads researchers down different paths, where they see different things and may end up at completely different exits.

Meta research projects of the ‘Many analysts, one dataset’ type, in which many researchers test the same hypothesis with the same dataset, demonstrate that researcher degrees of freedom easily lead to entirely different conclusions. In a recent study relying on international survey data, researchers were asked to estimate the association between immigration and public support for government provision of welfare (Breznau et al., 2022). 25% of the estimates were significantly negative, 17% were significantly positive, and 58% had a confidence interval including zero. Moreover, the magnitude of the relationships varied strongly, with standardized effect sizes ranging from less than -0.2 to more than +0.2. Even more striking differences between estimates emerge from projects relying on observational data on discrimination (Silberzahn et al., 2018), group processes (Schweinsberg et al., 2021) and financial markets (Menkveld et al., 2022). Documenting all steps taken to obtain the estimates is the only way in which the validity of the estimates can be evaluated.
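To get a feel for how quickly such forking paths multiply, here is a toy Python sketch – simulated data and hypothetical variable names, not any of the studies cited above – that estimates the same coefficient under a handful of defensible specification choices and reports the range of results.

```python
# Toy illustration of researcher degrees of freedom (simulated data only).
# Each combination of choices is a defensible "path"; the estimate of the
# coefficient of interest differs across paths.
import itertools
import numpy as np

rng = np.random.default_rng(42)
n = 500
x = rng.normal(size=n)                  # predictor of interest
age = rng.normal(50, 10, size=n)        # optional covariate
y = 0.1 * x + 0.05 * age + rng.normal(size=n)

def estimate(x, y, covariates, drop_outliers):
    """OLS estimate of the coefficient on x under one set of choices."""
    if drop_outliers:                   # one common discretionary choice
        keep = np.abs(y - y.mean()) < 2 * y.std()
        x, y = x[keep], y[keep]
        covariates = [c[keep] for c in covariates]
    X = np.column_stack([np.ones_like(x), x] + covariates)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

results = []
for include_age, drop_outliers in itertools.product([False, True], repeat=2):
    covs = [age] if include_age else []
    results.append(estimate(x, y, covs, drop_outliers))

print(f"{len(results)} specifications, estimates ranging "
      f"from {min(results):.3f} to {max(results):.3f}")
```

Two binary choices already yield four estimates; with realistic numbers of sampling, coding and modeling decisions, the number of paths quickly runs into the thousands.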


3. Published research contains questionable research practices

Even when researchers are not required to document all decisions they made in the collection, analysis and reporting of data, divergence from standards of good practice is apparent in the body of published research. Meta research in the past decade has identified many traces of questionable research practices in published research. Here are three indicators.

A first indicator is a suspiciously high rate of success in supporting the hypotheses. Across all social sciences, the proportion of studies supporting the hypotheses has increased in the past decades to unreasonably high levels (Fanelli, 2012). In some subfields of psychology such as applied social psychology, up to 100% of studies support the hypotheses (Schäfer & Schwarz, 2019). The trick is nothing short of magical – it works every time.

A second indicator is the excess of p-values just below the cut-off for statistical significance. Test statistics just above the critical value of 1.96 – corresponding to p-values just below .05 – are overrepresented in sociology (Gerber & Malhotra, 2008a), political science (Gerber & Malhotra, 2008b), psychology (Simonsohn, Nelson & Simmons, 2014), and economics (Brodeur, Lé, Sangnier, & Zylberberg, 2016). The prevalence of p-hacking is particularly high in online experiments in marketing conducted through MTurk (Brodeur, Cook & Heyes, 2022). A third indicator is the lack of statistical power to test hypotheses. Studies in psychology (Maxwell, 2004), economics (Ioannidis, Stanley & Doucouliagos, 2017) and political science (Arel-Bundock et al., 2022) tend to be underpowered.
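A simple way to check for such clustering is a caliper test in the spirit of Gerber & Malhotra: count how many reported test statistics fall just below versus just above the critical value. Below is a minimal Python sketch, assuming you have already extracted z-statistics from a set of articles; the example values are invented.

```python
# Caliper test sketch: are z-statistics overrepresented just above 1.96?
# Assumes `z_values` holds test statistics extracted from published articles;
# the values below are made up for illustration.
from scipy.stats import binomtest

z_values = [1.88, 1.97, 2.01, 2.03, 1.99, 2.10, 1.91, 2.05, 1.98, 2.02]

caliper = 0.10  # width of the window on each side of the critical value
below = sum(1.96 - caliper <= z < 1.96 for z in z_values)
above = sum(1.96 <= z < 1.96 + caliper for z in z_values)

# Under the null of no p-hacking, results should fall on either side of the
# threshold with roughly equal probability; a large surplus just above 1.96
# is a warning sign.
test = binomtest(above, n=above + below, p=0.5, alternative="greater")
print(f"{above} just above vs {below} just below 1.96, p = {test.pvalue:.3f}")
```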


4. Publication bias and questionable research practices reduce the reliability of published research

Besides questionable research practices, a second explanation for the low replicability of published research (see insight 6 below) is publication bias (Friese & Frankenbach, 2020; Smaldino & McElreath, 2016; Grimes, Bauch & Ioannidis, 2018). Negative or null findings are less likely to be published than positive findings, not only because researchers are less likely to write reports on negative results, but also because reviewers evaluate them less positively (Franco, Malhotra & Simonovits, 2014). Scarcity of resources and perverse effects of incentive systems in academia create a premium for novelty and prolific publication (Smaldino & McElreath, 2016; Brembs, 2019). It is no wonder that researchers engage in questionable research practices to obtain positive results and to get their research published.

Transparency does not guarantee quality; it enables a fair and independent assessment of quality (Vazire, 2019). Transparency is crucial for the detection of questionable research practices, fraud, fabrication, and plagiarism (Nosek, Spies & Motyl, 2012). Open science practices can improve the reliability of published research (Smaldino, Turner & Contreras Kallens, 2019). Studies with high statistical power, preregistration, and complete methodological transparency are more reliable and replicate well (Protzko et al., 2020). When studies are more replicable with the same methods and new data from the same target population, they are also more generalizable to other populations across time and place (Delios et al., 2022).


5. Closed science facilitates integrity violations

If the rate of questionable research practices is unacceptably high, why does that not change? The reason research integrity violations continue to be so prevalent is that researchers are allowed to hide them. Compared to other industries, science has a particularly lax system of quality control. Before roads are built and new toys for kids are allowed to be sold, there are safety and health checks of the builders, their materials, their construction plans and manufacturing processes, and ultimately the safety of their products. But when we do science, there is much less of this: we ask volunteers to look primarily at the final product. If we buy a car at an authorized dealer, there’s a money-back guarantee. But reviewers of scientific papers do not even start the engine of the data and code to check whether the thing actually works. That is not good enough.

Researchers are not required to be fully transparent about all the choices they have made. As a result, violations of research integrity are rarely detected in the classical peer review process (Altman, 1994; Smith, 2006, 2010). Without extensive training, peer reviewers are bad at catching mistakes in manuscripts (Schroter et al., 2008). The current peer review system is far from a guarantee of flawless research. If a study is published after it went through peer review, that does not mean it is true. Even at the journals with the highest impact factors, the review process does not successfully keep out bad science (Brembs, 2018).

In order to enhance the reliability of the published record of research, the peer review process needs to change in the direction of more openness and transparency (Smith, 2010; Munafò et al., 2017). In addition, transparency requirements can deter questionable research practices. Violations of research integrity are like crime: when the probability of being detected is high enough, potential perpetrators will not engage in violations. Data from the Netherlands Survey of Research Integrity show that a higher likelihood of data fabrication being detected by a reviewer or collaborator is associated with a lower likelihood of engaging in questionable research practices (Gopalakrishna et al., 2022a) and a higher likelihood of engaging in responsible research practices such as sharing data and materials (Gopalakrishna et al., 2022b).


6. Roughly half of all published studies do not replicate

With half of researchers admitting that they engage in questionable research practices, it is no surprise that research on the replicability of research published in the social sciences demonstrates that a large proportion of published findings claiming a general regularity in human behavior cannot be replicated with new data by new researchers. This conclusion holds not only for psychology (Open Science Collaboration, 2015), but also for other fields of the social and behavioral sciences, even for publications in the highest ranked journals, such as Nature and Science (Camerer et al., 2018). In psychology, 97% of 100 original studies reported significant effects, with an average standardized effect size of .40. In independent replications, only 36% produced significant effects, with an average standardized effect size of .20 (Open Science Collaboration, 2015). The independent replication project of studies published in Nature and Science found significant effects for 62% of original studies, with an average effect size of .25 – again about half of the original effect size (Camerer et al., 2018).

A key difference between original studies and replications that explains why replications are much less likely to achieve significant results is that original studies are not preregistered. Registered reports in psychology achieve positive results in only 44% of hypothesis tests, while standard reports obtain positive results in 96% of tests (Scheel, Schijen & Lakens, 2021).

The finding that individual studies are likely to present overestimates also implies that meta-analyses of published findings are too positive (Friese & Frankenbach, 2020). The presence of publication bias, data fabrication and falsification, and questionable research practices in the body of peer-reviewed publications implies that statistical meta-analysis is fundamentally unfit to estimate the size of an effect based on previous research. The only way to obtain an accurate estimate of a published association is to conduct an independent, preregistered replication (Van Elk et al., 2017). Effect sizes reported in meta-analyses are two to three times as large as those in independent preregistered replications (Kvarven, Strømland, & Johannesson, 2020).

These findings imply that the reliability of published research is low: you cannot trust roughly half of all published research. For each pair of published studies, only one will replicate in the original direction, and for each published study that does replicate, the magnitude of the association is only half of the original. You will have to evaluate the quality of published research yourself. A key indicator is whether it was preregistered. As a rule of thumb, divide the effect size reported in a study that was not preregistered by two, and the effect size from a meta-analysis by three.
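Expressed as code, this rule of thumb is nothing more than a crude discount factor – a heuristic reading of the replication evidence, not a precise statistical correction:

```python
# Back-of-the-envelope discounting of reported effect sizes, following the
# rule of thumb above (a heuristic, not a precise statistical correction).
def discounted_effect_size(reported, source):
    divisors = {
        "preregistered": 1.0,      # take preregistered studies at face value
        "not_preregistered": 2.0,  # divide by two
        "meta_analysis": 3.0,      # divide by three
    }
    return reported / divisors[source]

print(discounted_effect_size(0.40, "not_preregistered"))  # 0.2
print(discounted_effect_size(0.45, "meta_analysis"))      # 0.15
```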


7. Why ‘data available upon request’ is not enough

One way to detect data fabrication and falsification and questionable research practices is through close inspection of research data. Fabricated and falsified data contain patterns that original research data do not (Judge & Schechter, 2009; Heathers et al., 2018).

However, researchers rarely provide access to research data. A recent estimate by Serghiou et al. (2021) showed that only 8% of 27,000 social science articles included in the PubMed database provided access to data. In a smaller sample of 250 publications in Scopus-indexed outlets from the period 2014-2017, Hardwicke et al. (2020) found that 7% provided access to data.

A journal’s encouragement to share data is no guarantee that authors actually do share data. In a study among researchers who published in Nature and Science, which both require authors to promise they will give access to research data, still only 40% of psychologists and social scientists complied with a request to access the data (Tedersoo et al., 2021). In a study among economists who had indicated in their publications that data and materials were available upon request, only 44% complied in practice (Krawczyk & Reuben, 2012). Among psychologists who had published in the top journals of the field and promised that data and materials would be available upon request, only 26% complied (Wicherts et al., 2006).

Thus, a data sharing policy by itself is ineffective if it is not enforced (Stodden, Seiler, & Ma, 2018; Christensen et al., 2019). In practice, the promise that “data are available upon request” usually means that the data are not made available. A policy relying on such promises is not strict enough to prevent the publication of manuscripts containing results based on questionable research practices or on fabricated and falsified data. Authors should not only be required to share data and code; a data editor should also verify the computational reproducibility of the results.


8. Artificial intelligence can support peer review

Just like artificial intelligence facilitates plagiarism detection, it can also support the peer review process by screening manuscripts for errors and for information about relevant indicators of research quality. One example of a useful tool is StatCheck, which helps reviewers check the consistency between reported p-values and test statistics (http://statcheck.io; Nuijten & Polanin, 2020). Another example is the p-curve app, which quickly provides reviewers with relevant information about the evidentiary value of a set of experiments (https://shinyapps.org/apps/p-checker/; see Simonsohn, Nelson & Simmons, 2014a, 2014b).
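The kind of consistency check that StatCheck automates can be sketched in a few lines of Python. This is only an illustration of the idea – StatCheck itself parses APA-formatted results from full texts – and the reported numbers below are invented.

```python
# Recompute the p-value implied by a reported test statistic and compare it
# with the reported p-value: the kind of check StatCheck automates.
from scipy import stats

def check_t_test(t, df, reported_p, tolerance=0.005):
    """Return the recomputed two-sided p-value and whether it matches the report."""
    recomputed_p = 2 * stats.t.sf(abs(t), df)
    consistent = abs(recomputed_p - reported_p) <= tolerance
    return recomputed_p, consistent

# Example: an article reports t(48) = 2.10, p = .04
recomputed, ok = check_t_test(t=2.10, df=48, reported_p=0.04)
print(f"recomputed p = {recomputed:.3f}, consistent with report: {ok}")
```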

Advancements in natural language processing have enabled software engineers to build tools that automatically screen full texts of articles and extract information about ethics statements, randomization, sample sizes, sharing of data and code, and other indicators of research quality (Menke et al., 2020; Riedel, Kip & Bobrov, 2020; Serghiou et al., 2021; Zavalis & Ioannidis, 2022). Publishers should create an infrastructure in which new submissions are screened automatically and transparency indicators are reported. While peer review should not be automated altogether, artificial intelligence will certainly help improve peer review (Checco et al., 2021; Schulz et al., 2022).


9. Introducing registered reports will improve the credibility of research

A registered report is a new submission format that effectively eliminates the evaluation of results from the review process (Nosek & Lakens, 2014; Chambers & Tzavella, 2022). Reviewers only evaluate the research design: the hypotheses, data collection and analysis plans. Authors receive feedback and may alter their plans in a revised version. Editors then decide whether to accept the study for publication. Only after authors receive the acceptance letter do they proceed to collect and analyze the data. This format blinds both reviewers and researchers to the results, and increases the likelihood that null findings and negative results are published. Journals across all fields of the social sciences are introducing registered reports (Hardwicke & Ioannidis, 2018). When this format becomes the standard for academic research publications, the reliability of published research will increase.


10. Replications should be encouraged and actively facilitated

One way to encourage replications is to invite authors to submit preregistered replication reports of published research. A preregistration is a document describing the hypotheses, data collection and analysis plans for a study before it is conducted (Nosek, Ebersole, DeHaven & Mellor, 2018). Public preregistrations enable reviewers to check whether authors changed their hypotheses, whether they report on all of them, and how the data analysis reported in the manuscript differs from the original plans. The use of preregistrations is increasing across all fields of the social sciences (Nosek et al., 2022). The combination of a preregistration with a registered report effectively reduces questionable research practices such as altering hypotheses after the results are known, hiding negative results, and exploiting researcher degrees of freedom to obtain significant results (Soderberg et al., 2021).


Conclusion

The credibility crisis in the social sciences should lead us to redesign the industry of academic research and publications to raise the bar in quality control procedures. Enforcement of open science practices is the solution. Voluntary accountability mechanisms such as promises to uphold standards of good conduct and symbolic rewards such as badges depend on the intrinsic motivation of researchers. At the same time, tenure and promotion systems as well as award criteria in grant proposal competitions introduce extrinsic incentives that lead researchers to produce a high volume of output at the expense of quality. Universities, research funders and journals should redesign reward systems so that prestige depends solely on research quality, not quantity. While such reforms are under way, professional training of reviewers and artificial intelligence can facilitate the detection and deterrence of bad research practices.


References

ALLEA (2017). European Code of Conduct for Research Integrity. https://www.allea.org/wp-content/uploads/2017/05/ALLEA-European-Code-of-Conduct-for-Research-Integrity-2017.pdf

Altman, D. G. (1994). The scandal of poor medical research. British Medical Journal, 308(6924), 283-284. https://doi.org/10.1136/bmj.308.6924.283

Arel-Bundock, V., Briggs, R. C., Doucouliagos, H., Mendoza Aviña, M., & Stanley, T. D. (2022). Quantitative Political Science Research is Greatly Underpowered. I4R Discussion Paper Series, No. 6. http://hdl.handle.net/10419/265531

Bollen, K., Cacioppo, J.T., Kaplan, R.M., Krosnick, J.A., & Olds, J.A. (2015). Social, Behavioral, and Economic Sciences Perspectives on Robust and Reliable Science. Report of the Subcommittee on Replicability in Science Advisory Committee to the National Science Foundation Directorate for Social, Behavioral, and Economic Sciences. https://nsf.gov/sbe/AC_Materials/SBE_Robust_and_Reliable_Research_Report.pdf

Brembs, B. (2018). Prestigious Science Journals Struggle to Reach Even Average Reliability. Frontiers of Human Neuroscience, 12 (37): 1‐7. https://doi.org/10.3389/fnhum.2018.00037

Brembs, B. (2019). Reliable novelty: New should not trump true. PLoS Biology, 17(2), e3000117. https://doi.org/10.1371/journal.pbio.3000117

Breznau, N., Rinke, E. M., Wuttke, A., Nguyen, H. H., Adem, M., Adriaans, J., … & Van Assche, J. (2022). Observing many researchers using the same data and hypothesis reveals a hidden universe of uncertainty. Proceedings of the National Academy of Sciences, 119(44), e2203150119. https://doi.org/10.1073/pnas.2203150119

Brodeur, A., Cook, N. & Heyes, A. (2022). We Need to Talk about Mechanical Turk: What 22,989 Hypothesis Tests Tell Us about Publication Bias and p-Hacking in Online Experiments. IZA Discussion Paper No 15478. https://www.econstor.eu/bitstream/10419/265699/1/dp15478.pdf

Brodeur, A., Cook, N. & Neisser, C. (2022). P-Hacking, Data Type and Data-Sharing Policy, IZA Discussion Papers, No. 15586. http://hdl.handle.net/10419/265807

Brodeur, A., Lé, M., Sangnier, M., & Zylberberg, Y. (2016). Star wars: The empirics strike back. American Economic Journal: Applied Economics, 8(1), 1-32. https://doi.org/10.1257/app.20150044

Camerer, C. F., Dreber, A., Holzmeister, F., Ho, T. H., Huber, J., Johannesson, M., … & Wu, H. (2018). Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nature Human Behaviour, 2(9), 637-644. https://doi.org/10.1038/s41562-018-0399-z

Chambers, C.D., & Tzavella, L. (2022). The past, present, and future of Registered Reports. Nature Human Behavior, 6: 29-42. https://doi.org/10.1038/s41562-021-01193-7

Checco, A., Bracciale, L., Loreti, P., Pinfield, S., & Bianchi, G. (2021). AI-assisted peer review. Humanities and Social Sciences Communications, 8(1), 1-11. https://doi.org/10.1057/s41599-020-00703-8

Christensen, G., Dafoe, A., Miguel, E., Moore, D.A., & Rose, A.K. (2019). A study of the impact of data sharing on article citations using journal policies as a natural experiment. PLoS ONE 14(12): e0225883. https://doi.org/10.1371/journal.pone.0225883

Delios, A., Clemente, E. G., Wu, T., Tan, H., Wang, Y., Gordon, M., … & Uhlmann, E. L. (2022). Examining the generalizability of research findings from archival data. Proceedings of the National Academy of Sciences, 119(30), e2120377119. https://doi.org/10.1073/pnas.2120377119

Fanelli, D. (2012). Negative results are disappearing from most disciplines and countries. Scientometrics, 90(3), 891-904. https://doi.org/10.1007/s11192-011-0494-7

Franco, A., Malhotra, N., & Simonovits, G. (2014). Publication bias in the social sciences: Unlocking the file drawer. Science, 345(6203), 1502-1505. https://doi.org/10.1126/science.1255484

Friese, M., & Frankenbach, J. (2020). p-Hacking and publication bias interact to distort meta-analytic effect size estimates. Psychological Methods, 25(4), 456. https://doi.org/10.1037/met0000246

Gelman, A., & Loken, E. (2014). The statistical crisis in science: Data-dependent analysis—a “garden of forking paths”—explains why many statistically significant comparisons don’t hold up. American Scientist, 102(6), 460-465. https://www.jstor.org/stable/43707868

Gerber, A. S., & Malhotra, N. (2008a). Publication bias in empirical sociological research: Do arbitrary significance levels distort published results? Sociological Methods & Research, 37(1), 3-30. https://doi.org/10.1177/0049124108318973

Gerber, A., & Malhotra, N. (2008b). Do statistical reporting standards affect what is published? Publication bias in two leading political science journals. Quarterly Journal of Political Science, 3(3), 313-326. http://dx.doi.org/10.1561/100.00008024

Gopalakrishna, G., Ter Riet, G., Vink, G., Stoop, I., Wicherts, J. M., & Bouter, L. M. (2022a). Prevalence of questionable research practices, research misconduct and their potential explanatory factors: A survey among academic researchers in The Netherlands. PloS one, 17(2), e0263023. https://doi.org/10.1371/journal.pone.0263023

Gopalakrishna, G., Wicherts, J. M., Vink, G., Stoop, I., van den Akker, O. R., ter Riet, G., & Bouter, L. M. (2022b). Prevalence of responsible research practices among academics in The Netherlands. F1000Research, 11(471), 471. https://doi.org/10.12688/f1000research.110664.2

Grimes, D. R., Bauch, C. T., & Ioannidis, J. P. (2018). Modelling science trustworthiness under publish or perish pressure. Royal Society Open Science, 5(1), 171511. https://doi.org/10.1098/rsos.171511

Groves, R. M., Couper, M. P., Presser, S., Singer, E., Tourangeau, R., Acosta, G. P., & Nelson, L. (2006). Experiments in producing nonresponse bias. Public Opinion Quarterly, 70(5), 720-736. https://doi.org/10.1093/poq/nfl036

Hardwicke, T.E., Ioannidis, J.P.A. (2018). Mapping the universe of registered reports. Nature Human Behaviour, 2: 793–796. https://doi.org/10.1038/s41562-018-0444-y

Hardwicke, T.E., Wallach, J.D., Kidwell, M.C., Bendixen, T., Crüwell, S. & Ioannidis, J.P.A. (2020). An empirical assessment of transparency and reproducibility-related research practices in the social sciences (2014–2017). Royal Society Open Science, 7: 190806. https://doi.org/10.1098/rsos.190806

Heathers, J. A., Anaya, J., van der Zee, T., & Brown, N. J. (2018). Recovering data from summary statistics: Sample parameter reconstruction via iterative techniques (SPRITE). PeerJ Preprints, e26968v1. https://doi.org/10.7287/peerj.preprints.26968v1

Ioannidis, J., Stanley, T. D., & Doucouliagos, H. (2017). The Power of Bias in Economics Research. Economic Journal, 127(605): F236-F265. https://doi.org/10.1111/ecoj.12461

John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science, 23(5), 524-532. https://doi.org/10.1177/0956797611430953

Judge, G., & Schechter, L. (2009). Detecting problems in survey data using Benford’s Law. Journal of Human Resources, 44(1), 1-24. https://doi.org/10.3368/jhr.44.1.1

KNAW, NFU, NWO, TO2‐federatie, Vereniging Hogescholen & VSNU (2018). Netherlands Code of Conduct for Research Integrity. http://www.vsnu.nl/files/documents/Netherlands%20Code%20of%20Conduct%20for%20Research%20Integrity%202018.pdf

Krawczyk, M. & Reuben, E. (2012). (Un)Available upon Request: Field Experiment on Researchers’ Willingness to Share Supplementary Materials. Accountability in Research, 19: 175–186. https://doi.org/10.1080/08989621.2012.678688

Kvarven, A., Strømland, E., & Johannesson, M. (2020). Comparing meta-analyses and preregistered multiple-laboratory replication projects. Nature Human Behaviour, 4(4), 423-434. https://doi.org/10.1038/s41562-019-0787-z

Lakomý, M., Hlavová, R. & Machackova, H. (2019). Open Science and the Science-Society Relationship. Society, 56, 246–255. https://doi.org/10.1007/s12115-019-00361-w

Maxwell, S. E. (2004). The Persistence of Underpowered Studies in Psychological Research: Causes, Consequences, and Remedies. Psychological Methods, 9(2), 147–163. https://doi.org/10.1037/1082-989X.9.2.147

Menke, J., Roelandse, M., Ozyurt, B., Martone, M., & Bandrowski, A. (2020). The rigor and transparency index quality metric for assessing biological and medical science methods. iScience, 23(11): 101698. https://doi.org/10.1016/j.isci.2020.101698.

Menkveld, A. J., Dreber, A., Holzmeister, F., Huber, J., Johannesson, M., Kirchler, M., … & Weitzel, U. (2021). Non-standard errors. Working paper. https://dx.doi.org/10.2139/ssrn.3961574

Munafò, M. R., Nosek, B. A., Bishop, D. V., Button, K. S., Chambers, C. D., Percie du Sert, N., Simonsohn, U., Wagenmakers, E.-J., Ware, J. J. & Ioannidis, J. (2017). A manifesto for reproducible science. Nature Human Behaviour, 1(1), 1-9. https://doi.org/10.1038/s41562-016-0021

Nosek, B.A., Ebersole, C.R., DeHaven, A.C. & Mellor, D.T. (2018). The Preregistration Revolution. Proceedings of the National Academy of Sciences, 115(11): 2600-2606. http://www.pnas.org/cgi/doi/10.1073/pnas.1708274114

Nosek, B. A., Hardwicke, T. E., Moshontz, H., Allard, A., Corker, K. S., Dreber, A., … & Vazire, S. (2022). Replicability, robustness, and reproducibility in psychological science. Annual Review of Psychology, 73, 719-748. https://doi.org/10.1146/annurev-psych-020821-114157

Nosek, B. A., & Lakens, D. (2014). Registered reports: A method to increase the credibility of published results. Social Psychology, 45(3), 137–141. https://doi.org/10.1027/1864-9335/a000192

Nosek, B. A., Spies, J. R., & Motyl, M. (2012). Scientific Utopia: II. Restructuring Incentives and Practices to Promote Truth Over Publishability. Perspectives on Psychological Science, 7(6), 615–631. https://doi.org/10.1177/1745691612459058

Nuijten, M.B. & Polanin, J.R. (2020). “statcheck”: Automatically detect statistical reporting inconsistencies to increase reproducibility of meta‐analyses. Research Synthesis Methods, 11(5): 574–579. https://doi.org/10.1002/jrsm.1408

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251). https://doi.org/10.1126/science.aac4716

Protzko, J., Krosnick, J., Nelson, L. D., Nosek, B. A., Axt, J., Berent, M., … & Schooler, J. (2020). High replicability of newly-discovered social-behavioral findings is achievable. PsyArXiv preprint. https://psyarxiv.com/n2a9x/

Riedel, N., Kip, M., & Bobrov, E. (2020). ODDPub—a text‑mining algorithm to detect data sharing in biomedical publications. Data Science Journal, 19:42. http://doi.org/10.5334/dsj-2020-042 

Schäfer, T. & Schwarz, M.A. (2019). The Meaningfulness of Effect Sizes in Psychological Research: Differences Between Sub-Disciplines and the Impact of Potential Biases. Frontiers in Psychology, 10: 813. https://doi.org/10.3389/fpsyg.2019.00813

Scheel, A. M., Schijen, M. R., & Lakens, D. (2021). An excess of positive results: Comparing the standard Psychology literature with Registered Reports. Advances in Methods and Practices in Psychological Science, 4(2). https://doi.org/10.1177/25152459211007467  

Schroter, S., Black, N., Evans, S., Godlee, F., Osorio, L., & Smith, R. (2008). What errors do peer reviewers detect, and does training improve their ability to detect them? Journal of the Royal Society of Medicine, 101(10), 507-514. https://doi.org/10.1258/jrsm.2008.080062

Schulz, R., Barnett, A., Bernard, R., Brown, N. J., Byrne, J. A., Eckmann, P., … & Weissgerber, T. L. (2022). Is the future of peer review automated? BMC Research Notes, 15(1), 1-5. https://doi.org/10.1186/s13104-022-06080-6

Schweinsberg, M., Feldman, M., Staub, N., van den Akker, O. R., van Aert, R. C., Van Assen, M. A., … & Schulte-Mecklenbeck, M. (2021). Same data, different conclusions: Radical dispersion in empirical results when independent analysts operationalize and test the same hypothesis. Organizational Behavior and Human Decision Processes, 165, 228-249. https://doi.org/10.1016/j.obhdp.2021.02.003

Serghiou, S., Contopoulos-Ioannidis, D.G., Boyack, K.W., Riedel, N., Wallach, J.D., & Ioannidis, J.P.A. (2021). Assessment of transparency indicators across the biomedical literature: How open is open? PLoS Biology, 19(3): e3001107. https://doi.org/10.1371/journal.pbio.3001107

Silberzahn, R., Uhlmann, E. L., Martin, D. P., Anselmi, P., Aust, F., Awtrey, E., … & Nosek, B. A. (2018). Many analysts, one data set: Making transparent how variations in analytic choices affect results. Advances in Methods and Practices in Psychological Science, 1(3), 337-356. https://doi.org/10.1177/2515245917747646

Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant. Psychological Science, 1359-1366. https://doi.org/10.1177/0956797611417632

Simonsohn, U., Nelson, L.D. & Simmons, J.P. (2014a). P-Curve: A Key to the File Drawer. Journal of Experimental Psychology: General, 143 (2): 534–547. https://doi.org/10.1037/a0033242

Simonsohn, U., Nelson, L. D., & Simmons, J. P. (2014b). p-curve and effect size: Correcting for publication bias using only significant results. Perspectives on Psychological Science, 9(6), 666-681. https://doi.org/10.1177/1745691614553988

Smaldino, P. E., & McElreath, R. (2016). The natural selection of bad science. Royal Society Open Science, 3(9), 160384. https://doi.org/10.1098/rsos.160384

Smaldino, P. E., Turner, M. A., & Contreras Kallens, P. A. (2019). Open science and modified funding lotteries can impede the natural selection of bad science. Royal Society Open Science, 6(7), 190194. https://doi.org/10.1098/rsos.190194

Smith, R. (2006). Peer review: a flawed process at the heart of science and journals. Journal of the Royal Society of Medicine, 99: 178–182. https://dx.doi.org/10.1177/014107680609900414

Smith, R. (2010). Classical peer review: an empty gun. Breast Cancer Research, 12 (Suppl 4), S13. https://doi.org/10.1186/bcr2742

Soderberg, C. K., Errington, T. M., Schiavone, S. R., Bottesini, J., Thorn, F. S., Vazire, S., … & Nosek, B. A. (2021). Initial evidence of research quality of registered reports compared with the standard publishing model. Nature Human Behaviour, 5(8), 990-997. https://doi.org/10.1038/s41562-021-01142-4

Song, H., Markowitz, D. M., & Taylor, S. H. (2022). Trusting on the shoulders of open giants? Open science increases trust in science for the public and academics. Journal of Communication, 72(4), 497-510. https://doi.org/10.1093/joc/jqac017

Steneck, N. H. (2007). Introduction to the Responsible Conduct of Research. Washington, DC: Department of Health and Human Services, Office of Research Integrity. https://ori.hhs.gov/sites/default/files/2018-04/rcrintro.pdf

Stodden, V., Seiler, J. & Ma, Z. (2018). An empirical analysis of journal policy effectiveness for computational reproducibility. Proceedings of the National Academy of Sciences, 115(11): 2584-2589. https://doi.org/10.1073/pnas.1708290115

Tedersoo, L., Küngas, R., Oras, E., Köster, K., Eenmaa, H., Leijen, Ä., Pedaste, M., Raju, M., Astapova, A., Lukner, H., Kogermann, K. & Sepp, T. (2021). Data sharing practices and data availability upon request differ across scientific disciplines. Scientific Data, 8(1), 1-11. https://doi.org/10.1038/s41597-021-00981-0

Tourangeau, R., & Yan, T. (2007). Sensitive questions in surveys. Psychological Bulletin, 133(5), 859-883. https://doi.org/10.1037/0033-2909.133.5.859

Trisovic, A., Lau, M.K., Pasquier, T. & Crosas, M. (2022). A large-scale study on research code quality and execution. Scientific Data, 9, 60. https://doi.org/10.1038/s41597-022-01143-6

Tsai, A. C., Kohrt, B. A., Matthews, L. T., Betancourt, T. S., Lee, J. K., Papachristos, A. V., … & Dworkin, S. L. (2016). Promises and pitfalls of data sharing in qualitative research. Social Science & Medicine, 169, 191-198. https://doi.org/10.1016/j.socscimed.2016.08.004

Van Elk, M., Matzke, D., Gronau, Q., Guang, M., Vandekerckhove, J., & Wagenmakers, E. J. (2015). Meta-analyses are no substitute for registered replications: A skeptical perspective on religious priming. Frontiers in Psychology, 6: 1365. https://doi.org/10.3389/fpsyg.2015.01365

Vazire, S. (2019, March 14). The Credibility Revolution in Psychological Science. ZPID (Leibniz Institute for Psychology Information). https://doi.org/10.23668/psycharchives.2404.

Vlaeminck, S. (2021). Dawning of a New Age? Economics Journals’ Data Policies on the Test Bench. LIBER Quarterly, 31 (1): 1–29. https://doi.org/10.53377/lq.10940

Wicherts, J. M., Borsboom, D., Kats, J., & Molenaar, D. (2006). The poor availability of psychological research data for reanalysis. American Psychologist, 61(7), 726–728. https://doi.org/10.1037/0003-066X.61.7.726

Xie, Y., Wang, K., & Kong, Y. (2021). Prevalence of research misconduct and questionable research practices: a systematic review and meta-analysis. Science and Engineering Ethics, 27(4), 1-28. https://doi.org/10.1007/s11948-021-00314-9

Zavalis, E.A., & Ioannidis, J.P.A. (2022) A meta-epidemiological assessment of transparency indicators of infectious disease models. PLoS ONE 17(10): e0275380. https://doi.org/10.1371/journal.pone.0275380


Filed under academic journals, academic misconduct, data, experiments, fraud, incentives, methodology, open science, publications, regulation, research, research integrity, science, sociology, statistical analysis, survey research, writing

10 Things You Need to Know About Open Science

1. What is Open Science?

Open science is science as it should be: as open as possible. The current practice of open science is that scientists provide open access to publications, the data they analyzed, the code that produces the results, and the materials. Open science is no magic, no secrets, no hidden cards, no tricks; what you see is what you get. Fully open science is sharing everything, including research ideas, grant applications, reviews, funding decisions, failed experiments and null results.

2. Why should I preregister my research?

When you preregister your research, you put your ideas, expectations, hypotheses and your plans for data collection and analyses in an archive (e.g., on AsPredicted, https://aspredicted.org/ or on the Open Science Framework, https://help.osf.io/hc/en-us/articles/360019738834-Create-a-Preregistration) before you have executed the study. A preregistration allows you to say: “See, I told you so!” afterwards. Preregister your research if you have theories and methods you want to test, and if you want to make testable predictions about the results.

3. Won’t I get scooped when I post a preliminary draft of my ideas?
No, when you put your name and a date in the file, it will be obvious that you were the first person who came up with the ideas.

4. Where can I post the data I collect and the code for the analysis?

On Dataverse, https://dataverse.org/, the Open Science Framework, https://osf.io/, and on Zenodo, https://zenodo.org/. As a researcher you can use these platforms for free, and they are not owned by commercial enterprises. You keep control over the content of the repositories you create, and you can give them a DOI so others can cite your work.
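
If you prefer to script your deposits, the repositories also have APIs. Below is a minimal sketch of depositing a data file on Zenodo through its documented REST deposit API, using Python and the requests library. The access token, file name, and metadata are placeholders, and you should check the current Zenodo developer documentation before relying on the exact endpoints; the web interfaces of Zenodo, the Open Science Framework, and Dataverse work just as well if you want to avoid code.

```python
import requests

ACCESS_TOKEN = "YOUR-ZENODO-TOKEN"  # placeholder: create a personal token in your Zenodo account

# 1. Create an empty draft deposition
r = requests.post(
    "https://zenodo.org/api/deposit/depositions",
    params={"access_token": ACCESS_TOKEN},
    json={},
)
r.raise_for_status()
deposition_id = r.json()["id"]

# 2. Upload a data file to the draft
with open("data.csv", "rb") as fp:
    r = requests.post(
        f"https://zenodo.org/api/deposit/depositions/{deposition_id}/files",
        params={"access_token": ACCESS_TOKEN},
        data={"name": "data.csv"},
        files={"file": fp},
    )
r.raise_for_status()

# 3. Add minimal metadata; publishing the deposition (a separate call) mints the DOI
metadata = {
    "metadata": {
        "title": "Replication data for my study",   # placeholder
        "upload_type": "dataset",
        "description": "Data and analysis code.",   # placeholder
        "creators": [{"name": "Lastname, Firstname"}],
    }
}
r = requests.put(
    f"https://zenodo.org/api/deposit/depositions/{deposition_id}",
    params={"access_token": ACCESS_TOKEN},
    json=metadata,
)
r.raise_for_status()
```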

5. Won’t I look like a fool when others find mistakes in my code?
No, on the contrary: you will look like a good scientist when you correct your mistakes. See, for example, how the authors of a Nature paper retracted it and apologized for insufficient scientific rigour: https://retractionwatch.com/2021/03/08/authors-retract-nature-majorana-paper-apologize-for-insufficient-scientific-rigour/
You will look like a fool if you report results that nobody can reproduce and stubbornly persist in claiming support for them.

6. Does open science mean that I have to publish all of my data?

No, please do not publish all of your data! Your data probably contain details about individual persons that could identify them. Before you share the data, pseudonymize the participants and deidentify the data by removing or coarsening any details that could identify them.
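
As a minimal illustration of what this can look like for tabular survey data (a sketch assuming pandas; the file and column names are hypothetical): drop direct identifiers, replace internal IDs with arbitrary pseudonyms, and coarsen quasi-identifiers that could single people out.

```python
import pandas as pd

df = pd.read_csv("survey_raw.csv")  # hypothetical raw data file

# Drop direct identifiers (hypothetical column names)
df = df.drop(columns=["name", "email", "ip_address", "phone"])

# Replace the internal respondent ID with an arbitrary integer pseudonym
df["respondent"] = pd.factorize(df["respondent_id"])[0]
df = df.drop(columns=["respondent_id"])

# Coarsen quasi-identifiers that could identify individuals in combination
df["birth_year"] = (df["birth_year"] // 5) * 5        # five-year bins
df["postcode"] = df["postcode"].astype(str).str[:2]   # keep the region digits only

df.to_csv("survey_shareable.csv", index=False)
```

If you need to link back to the original records for follow-up waves, keep the key connecting pseudonyms to the original IDs in a separate, secured location.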

7. Why should I post a preprint of my work in progress?

If you post a preprint of your work in progress, alert others to it, and invite them to review it, you will get comments and suggestions that improve your work. You will also make your work citable before the paper is published. Read more about preprints here: https://plos.org/open-science/preprints/

8. Where should I submit my research paper?

Submit your research paper to a journal that doesn’t exploit authors and reviewers. Commercial publishers do not care about the quality of the research, only about making a profit by selling it back to you. A short history is here https://www.theguardian.com/science/2017/jun/27/profitable-business-scientific-publishing-bad-for-science.

9. How can I get a publication before I have done my study?
Submit it as a registered report to a journal that offers this format. Learn about registered reports here https://www.cos.io/initiatives/registered-reports and here https://doi.org/10.1027/1864-9335/a000192

10. Can I get sued when I post the published version of a paper on my website?

It has never happened to me, and I have posted accepted and published versions on my website https://renebekkers.wordpress.com/publications/ for more than 20 years now.


Filed under data, experiments, fraud, incentives, methodology, open science, statistical analysis, survey research

A Data Transparency Policy for Results Based on Experiments

pdf

Transparency is a key condition for robust and reliable knowledge, and the advancement of scholarship over time. Since January 1, 2020, I have been the Area Editor for Experiments submitted to Nonprofit & Voluntary Sector Quarterly (NVSQ), the leading journal for academic research in the interdisciplinary field of nonprofit research. In order to improve the transparency of research published in NVSQ, the journal is introducing a policy requiring authors of manuscripts reporting on data from experiments to provide, upon submission, access to the data and the code that produced the results reported. This will be a condition for the manuscript to proceed through the blind peer review process.

The policy will be implemented as a pilot for papers reporting results of experiments only. For manuscripts reporting on other types of data, the submission guidelines will not be changed at this time.

 

Rationale

This policy is a step forward in strengthening research in our field through greater transparency about research design, data collection and analysis. Greater transparency of data and analytic procedures will produce fairer, more constructive reviews and, ultimately, even higher quality articles published in NVSQ. Reviewers can only evaluate the methodologies and findings fully when authors describe the choices they made and provide the materials used in their study.

Sample composition and research design features can affect the results of experiments, as can sheer coincidence. To assist reviewers and readers in interpreting the research, it is important that authors describe relevant features of the research design, data collection, and analysis. Such details are also crucial to facilitate replication. NVSQ receives very few replications, and thus rarely publishes them, although we are open to doing so. Greater transparency will facilitate the ability to reinforce, or question, research results through replication (Peters, 1973; Smith, 1994; Helmig, Spraul & Tremp, 2012).

Greater transparency is also good for authors. Articles with open data appear to have a citation advantage: they are cited more frequently in subsequent research (Colavizza et al., 2020; Drachen et al., 2016). The evidence is not experimental: the higher citation rank of articles providing access to data may be a result of higher research quality. Regardless of whether the policy improves the quality of new research or attracts higher quality existing research – if higher quality research is the result, then that is exactly what we want.

Previously, the official policy of our publisher, SAGE, was that authors were ‘encouraged’ to make the data available. It is likely, though, that authors were not aware of this policy because it was not mentioned on the journal website. In any case, this voluntary policy clearly did not stimulate the provision of data, because data are available for only a small fraction of papers in the journal. Evidence indicates that a data sharing policy alone is ineffective without enforcement (Stodden, Seiler, & Ma, 2018; Christensen et al., 2019). Even when authors include a phrase in their article such as ‘data are available upon request,’ research shows that authors frequently do not comply with such requests (Wicherts et al., 2006; Krawczyk & Reuben, 2012). Therefore, we are making the provision of data a requirement for the assignment of reviewers.

 

Data Transparency Guidance for Manuscripts using Experiments

Authors submitting manuscripts to NVSQ in which they are reporting on results from experiments are kindly requested to provide a detailed description of the target sample and the way in which the participants were invited, informed, instructed, paid, and debriefed. Also, authors are requested to describe all decisions made and questions answered by the participants and provide access to the stimulus materials and questionnaires. Most importantly, authors are requested to make the data and code that produced the reported findings available to the editors and reviewers. Please make sure you do so anonymously, i.e. without identifying yourself as an author of the manuscript.

When you submit the data, please ensure that you are complying with the requirements of your institution’s Institutional Review Board or Ethics Review Committee, the privacy laws in your country such as the GDPR, and other regulations that may apply. Remove personal information from the data you provide (Ursin et al., 2019). For example, avoid logging IP and email addresses in online experiments, and remove any other personal information that may reveal the identities of participants.

The journal will not host a separate archive. Instead, deposit the data at a platform of your choice, such as Dataverse, GitHub, Zenodo, or the Open Science Framework. We accept data in Excel (.xls) or comma-separated (.csv) format, SPSS data (.sav, .por) with syntax (.sps), Stata data (.dta) with a do-file, and projects in R.

When authors have successfully submitted the data and code along with the paper, the Area Editor will verify whether the data and code submitted actually produce the results reported. If (and only if) this is the case, then the submission will be sent out to reviewers. This means that reviewers will not have to verify the computational reproducibility of the results. They will be able to check the integrity of the data and the robustness of the results reported.
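
In practice this check is straightforward: running the submitted code on the submitted data should regenerate the numbers in the tables. Purely as a hedged sketch of what a reproducible submission might look like (the file names, variables, and model below are hypothetical, and Python with statsmodels stands in for whatever software the authors actually used):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Read the deposited data exactly as submitted
df = pd.read_csv("experiment_data.csv")

# Table 1 of the paper: descriptive statistics
df[["donation", "treatment", "age", "female"]].describe().to_csv("table1_descriptives.csv")

# Table 2 of the paper: treatment effect estimate with robust standard errors
model = smf.ols("donation ~ treatment + age + female", data=df).fit(cov_type="HC1")
with open("table2_ols.txt", "w") as out:
    out.write(model.summary().as_text())
```

If the script writes out every reported number in this way, verification reduces to running one file and comparing its output to the manuscript.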

As we introduce the data availability policy, we will closely monitor the changes in the number and quality of submissions, and their scholarly impact, anticipating both collective and private benefits (Popkin, 2019). We have scored the data transparency of 20 experiments submitted in the first six months of 2020, using a checklist counting 49 different criteria. In 4 of these submissions some elements of the research were preregistered. The average transparency score was 38 percent. We anticipate that the new policy will improve transparency scores.

The policy takes effect for new submissions on July 1, 2020.

 

Background: Development of the Policy

The NVSQ Editorial Team has been working on policies for enhanced data and analytic transparency for several years, moving forward in a consultative manner.  We established a Working Group on Data Management and Access which provided valuable guidance in its 2018 report, including a preliminary set of transparency guidelines for research based on data from experiments and surveys, interviews and ethnography, and archival sources and social media. A wider discussion of data transparency criteria was held at the 2019 ARNOVA conference in San Diego, as reported here. Participants working with survey and experimental data frequently mentioned access to the data and code as a desirable practice for research to be published in NVSQ.

Eventually, separate sets of guidelines for each type of data will be created, recognizing that commonly accepted standards vary between communities of researchers (Malicki et al., 2019; Beugelsdijk, Van Witteloostuijn, & Meyer, 2020). Regardless of which criteria will be used, reviewers can only evaluate these criteria when authors describe the choices they made and provide the materials used in their study.

 

References

Beugelsdijk, S., Van Witteloostuijn, A. & Meyer, K.E. (2020). A new approach to data access and research transparency (DART). Journal of International Business Studies, https://link.springer.com/content/pdf/10.1057/s41267-020-00323-z.pdf

Christensen, G., Dafoe, A., Miguel, E., Moore, D.A., & Rose, A.K. (2019). A study of the impact of data sharing on article citations using journal policies as a natural experiment. PLoS ONE 14(12): e0225883. https://doi.org/10.1371/journal.pone.0225883

Colavizza, G., Hrynaszkiewicz, I., Staden, I., Whitaker, K., & McGillivray, B. (2020). The citation advantage of linking publications to research data. PLoS ONE 15(4): e0230416, https://doi.org/10.1371/journal.pone.0230416

Drachen, T.M., Ellegaard, O., Larsen, A.V., & Dorch, S.B.F. (2016). Sharing Data Increases Citations. Liber Quarterly, 26 (2): 67–82. https://doi.org/10.18352/lq.10149

Helmig, B., Spraul, K. & Tremp, K. (2012). Replication Studies in Nonprofit Research: A Generalization and Extension of Findings Regarding the Media Publicity of Nonprofit Organizations. Nonprofit and Voluntary Sector Quarterly, 41(3): 360–385. https://doi.org/10.1177%2F0899764011404081

Krawczyk, M. & Reuben, E. (2012). (Un)Available upon Request: Field Experiment on Researchers’ Willingness to Share Supplementary Materials. Accountability in Research, 19:3, 175-186, https://doi.org/10.1080/08989621.2012.678688

Malički, M., Aalbersberg, IJ.J., Bouter, L., & Ter Riet, G. (2019). Journals’ instructions to authors: A cross-sectional study across scientific disciplines. PLoS ONE, 14(9): e0222157. https://doi.org/10.1371/journal.pone.0222157

Peters, C. (1973). Research in the Field of Volunteers in Courts and Corrections: What Exists and What Is Needed. Journal of Voluntary Action Research, 2 (3): 121-134. https://doi.org/10.1177%2F089976407300200301

Popkin, G. (2019). Data sharing and how it can benefit your scientific career. Nature, 569: 445-447. https://www.nature.com/articles/d41586-019-01506-x

Smith, D.H. (1994). Determinants of Voluntary Association Participation and Volunteering: A Literature Review. Nonprofit and Voluntary Sector Quarterly, 23 (3): 243-263. https://doi.org/10.1177%2F089976409402300305

Stodden, V., Seiler, J. & Ma, Z. (2018). An empirical analysis of journal policy effectiveness for computational reproducibility. PNAS, 115(11): 2584-2589. https://doi.org/10.1073/pnas.1708290115

Ursin, G. et al., (2019), Sharing data safely while preserving privacy. The Lancet, 394: 1902. https://doi.org/10.1016/S0140-6736(19)32633-9

Wicherts, J.M., Borsboom, D., Kats, J., & Molenaar, D. (2006). The poor availability of psychological research data for reanalysis. American Psychologist, 61(7), 726-728. http://dx.doi.org/10.1037/0003-066X.61.7.726

Working Group on Data Management and Access (2018). A Data Availability Policy for NVSQ. April 15, 2018. https://renebekkers.files.wordpress.com/2020/06/18_04_15-nvsq-working-group-on-data.pdf


Filed under academic misconduct, data, experiments, fraud, methodology, open science, statistical analysis

How to review a paper

Including a Checklist for Hypothesis Testing Research Reports *

See https://osf.io/6cw7b/ for a pdf of this post

 

Academia critically relies on our efforts as peer reviewers to evaluate the quality of research that is published in journals. Reading the reviews of others, I have noticed that the quality varies considerably, and that some reviews are not helpful. The added value of a journal article above and beyond the original manuscript or a non-reviewed preprint is in the changes the authors made in response to the reviews. Through our reviews, we can help to improve the quality of the research. This memo provides guidance on how to review a paper, partly inspired by suggestions provided by Alexander (2005), Lee (1995) and the Committee on Publication Ethics (2017). To improve the quality of the peer review process, I suggest that you use the following guidelines. Some of the guidelines – particularly the criteria at the end of this post – are specific to the kind of research that I tend to review – hypothesis testing research reports relying on administrative data and surveys, sometimes with an experimental design. But let me start with guidelines that I believe make sense for all research.

Things to check before you accept the invitation
First, I encourage you to check whether the journal aligns with your vision of science. I find that a journal published by an exploitative publisher making a profit in the range of 30%-40% is not worth my time. A journal to which I have submitted my own work and which gave me good reviews is worth the number of reviews I received for my article. Reviewing a revised version of a paper does not count as a separate review.
Next, I check whether I am the right person to review the paper. I think it is a good principle to describe my disciplinary background and expertise in relation to the manuscript I am invited to review. Reviewers do not need to be experts in all respects. If you do not have useful expertise to improve the paper, politely decline.

Then I check whether I know the author(s). If I do, but I have not collaborated with them and am not currently collaborating or planning to do so, I describe how I know the author(s) and ask the editor whether it is appropriate for me to review the paper. If I have a conflict of interest, I notify the editor and politely decline. It is a good principle to let the editor know immediately if you are unable to review a paper, so the editor can start to look for someone else to review the paper. Your non-response means a delay for the authors and the editor.

Sometimes I get requests to review a paper that I have reviewed before, for a conference or another journal. In these cases I let the editor know and ask whether she would like to see the previous review. For the editor it is useful to know whether the current manuscript is the same as the version I reviewed earlier, or includes revisions.

Finally, I check whether the authors have made the data and code available. I have made it a requirement that authors have to fulfil before I accept an invitation to review their work. An exception can be made for data that would be illegal or dangerous to make available, such as datasets that contain identifying information that cannot be removed. In most cases, however, the authors can provide at least partial access to the data by excluding variables that contain personal information.

A paper that does not provide access to the data analyzed and the code used to produce the results in the paper is not worth my time. If the paper does not provide a link to the data and the analysis script, I ask the editor to ask the authors to provide the data and the code. I encourage you to do the same. Almost always the editor is willing to ask the authors to provide access. If the editor does not respond to such a request, that is a red flag to me: I decline future invitations from the journal. If the authors do not respond to the editor’s request, or are unwilling to provide access to the data and code, that is a red flag for the editor.

The tone of the review
When I write a review, I think of the ‘golden rule’: treat others as you would like to be treated. I write the review report that I would have liked to receive if I had been the author. I use the following principles:

  • Be honest but constructive. You are not at war. There is no need to burn a paper to the ground.
  • Avoid addressing the authors personally. Say: “the paper could benefit from…” instead of “the authors need”.
  • Stay close to the facts. Do not speculate about reasons why the authors have made certain choices beyond the arguments stated in the paper.
  • Take a developmental approach. Any paper will contain flaws and imperfections. Your job is to improve science by identifying problems and suggesting ways to repair them. Think with the authors about ways they can improve the paper in such a way that it benefits collective scholarship. After a quick glance at the paper, I determine whether I think the paper has the potential to be published, perhaps after revisions. If I think the paper is beyond repair, I explain this to the editor.
  • Try to see beyond bad writing style and mistakes in spelling. Also be mindful of disciplinary and cultural differences between the authors and yourself.

The substance of the advice
In my view, it is a good principle to begin the review report by describing your expertise and the way you reviewed the paper. If you searched the literature, checked the data and verified the results, or ran additional analyses, state this. It will help the editor to weigh the review.

Then give a brief overview of the paper. If the invitation asks you to provide a general recommendation, consider whether you’d like to give one. Typically, you are invited to recommend ‘reject’, ‘revise & resubmit’ – with major or minor revisions – or ‘accept’. Because the recommendation is the first thing the editor wants to know, it is convenient to state it early in the review.

When giving such a recommendation, I start from the assumption that the authors have invested a great deal of time in the paper and that they want to improve it. Also I consider the desk-rejection rate at the journal. If the editor sent the paper out for review, she probably thinks it has the potential to be published.

To get to the general recommendation, I list the strengths and the weaknesses of the paper. To ease the message you can use the sandwich principle: start with the strengths, then discuss the weaknesses, and conclude with an encouragement.

For authors and editors alike it is convenient to give actionable advice. For the weaknesses in the paper I suggest ways to repair them. I distinguish major issues such as not discussing alternative explanations from minor issues such as missing references and typos. It is convenient for both the editor and the authors to number your suggestions.

The strengths could be points that the authors are underselling. In that case, I identify them as strengths that the authors can emphasize more strongly.

It is handy to refer to issues with direct quotes and page numbers. To refer to the previous sentence: “As the paper states on page 3, [use] “direct quotes and page numbers””.

In 2016 I started signing my reviews. This is an accountability device: by exposing who I am to the authors of the paper I’m reviewing, I set higher standards for myself. I encourage you to think about this as an option, though I can imagine that you may not want to risk retribution as a graduate student or an early career researcher. Also, some editors do not appreciate signed reviews and may remove your identifying information.

How to organize the review work
Usually, I read a paper twice. First, I go over the paper superficially and quickly. I do not read it closely. This gives me a sense of where the authors are going. After the first superficial reading, I determine whether the paper is good enough to be revised and resubmitted, and if so, I provide more detailed comments. After the report is done, I revisit my initial recommendation.

The second time I go over the paper, I do a very close reading. Because the authors had a word limit, I assume that literally every word in the manuscript is absolutely necessary – the paper should have no repetitions. Some of the information may be in the supplementary information provided with the paper.

Below you find a checklist of things I look for in a paper. The checklist reflects the kind of research that I tend to review, which is typically testing a set of hypotheses based on theory and previous research with data from surveys, experiments, or archival sources. For other types of research – such as non-empirical papers, exploratory reports, and studies based on interviews or ethnographic material – the checklist is less appropriate. The checklist may also be helpful for authors preparing research reports.

I realize that this is an extensive set of criteria for reviews. It sets the bar pretty high. A review checking each of the criteria will take you at least three hours, but more likely between five and eight hours. As a reviewer, I do not always check all criteria myself. Some of the criteria do not necessarily have to be done by peer reviewers. For instance, some journals employ data editors who check whether data and code provided by authors produce the results reported.

I do hope that journals and editors can get to a consensus on a set of minimum criteria that the peer review process should cover, or at least provide clarity about the criteria that they do check.

After the review
If the authors have revised their paper, it is a good principle to avoid making new demands for the second round that you have not made before. Otherwise the revise and resubmit path can be very long.

 

References
Alexander, G.R. (2005). A Guide to Reviewing Manuscripts. Maternal and Child Health Journal, 9 (1): 113-117. https://doi.org/10.1007/s10995-005-2423-y
Committee on Publication Ethics Council (2017). Ethical guidelines for peer reviewers. https://publicationethics.org/files/Ethical_Guidelines_For_Peer_Reviewers_2.pdf
Lee, A.S. (1995). Reviewing a manuscript for publication. Journal of Operations Management, 13: 87-92. https://doi.org/10.1016/0272-6963(95)94762-W

 

Review checklist for hypothesis testing reports

Research question

  1. Is it clear from the beginning what the research question is? If it is in the title, that’s good. Stating it in the first part of the abstract is good too. Is it only stated at the end of the introduction section? In most cases that is too late.
  2. Is it clearly formulated? By the research question alone, can you tell what the paper is about?
  3. Does the research question align with what the paper actually does – or can do – to answer it?
  4. Is it important to know the answer to the research question for previous theory and methods?
  5. Does the paper address a question that is important from a societal or practical point of view?

 

Research design

  1. Does the research design align with the research question? If the question is descriptive, do the data actually allow for a representative and valid description? If the question is a causal question, do the data allow for causal inference? If not, ask the authors to report ‘associations’ rather than ‘effects’.
  2. Is the research design clearly described? Does the paper report all the steps taken to collect the data?
  3. Does the paper identify mediators of the alleged effect? Does the paper identify moderators as boundary conditions?
  4. Is the research design watertight? Does the study allow for alternative interpretations?
  5. Has the research design been preregistered? Does the paper refer to a public URL where the preregistration is posted? Does the preregistration include a statistical power analysis? Is the number of observations sufficient for statistical tests of hypotheses? Are deviations from the preregistered design reported?
  6. Has the experiment been approved by an Institutional Review Board or Ethics Review Board (IRB/ERB)? What is the IRB registration number?

 

Theory

  1. Does the paper identify multiple relevant theories?
  2. Does the theory section specify hypotheses? Have the hypotheses been formulated before the data were collected? Before the data were analyzed?
  3. Do hypotheses specify arguments why two variables are associated? Have alternative arguments been considered?
  4. Is the literature review complete? Does the paper cover the most relevant previous studies, also outside the discipline? Provide references to research that is not covered in the paper, but should definitely be cited.

 

Data & Methods

  1. Target group – Is it identified? If the target group is mankind, is the sample a good sample of mankind? Does it cover all relevant units?
  2. Sample – Does the paper identify the procedure used to obtain the sample from the target group? Is the sample a random sample? If not, has selective non-response been examined and dealt with, and have constraints on generality been acknowledged as a limitation?
  3. Number of observations – What is the statistical power of the analysis? Does the paper report a power analysis?
  4. Measures – Does the paper provide the complete topic list, questionnaire, instructions for participants? To what extent are the measures used valid? Reliable?
  5. Descriptive statistics – Does the paper provide a table of descriptive statistics (minimum, maximum, mean, standard deviation, number of observations) for all variables in the analyses? If not, ask for such a table.
  6. Outliers – Does the paper identify treatment of outliers, if any?
  7. Is the multi-level structure (e.g., persons in time and space) identified and taken into account in an appropriate manner in the analysis? Are standard errors clustered?
  8. Does the paper report statistical mediation analyses for all hypothesized explanation(s)? Do the mediation analyses evaluate multiple pathways, or just one?
  9. Do the data allow for testing additional explanations that are not reported in the paper?

 

Results

  1. Can the results be reproduced from the data and code provided by the authors?
  2. Are the results robust to different specifications?

Conclusion

  1. Does the paper give a clear answer to the research question posed in the introduction?
  2. Does the paper identify implications for the theories tested, and are they justified?
  3. Does the paper identify implications for practice, and are they justified given the evidence presented?

 

Discussion

  1. Does the paper revisit the limitations of the data and methods?
  2. Does the paper suggest future research to repair the limitations?

 

Meta

  1. Does the paper have an author contribution note? Is it clear who did what?
  2. Are all analyses reported, if they are not in the main text, are they available in an online appendix?
  3. Are references up to date? Does the reference list include a reference to the dataset analyzed, including a URL/DOI?

 

 

* This work is licensed under a Creative Commons Attribution 4.0 International License. Thanks to colleagues at the Center for Philanthropic Studies at Vrije Universiteit Amsterdam, in particular Pamala Wiepking, Arjen de Wit, Theo Schuyt and Claire van Teunenbroek, for insightful comments on the first version. Thanks to Robin Banks, Pat Danahey Janin, Rense Corten, David Reinstein, Eleanor Brilliant, Claire Routley, Margaret Harris, Brenda Bushouse, Craig Furneaux, Angela Eikenberry, Jennifer Dodge, and Tracey Coule for responses to the second draft. The current text is the fourth draft. The most recent version of this paper is available as a preprint at https://doi.org/10.31219/osf.io/7ug4w. Suggestions continue to be welcome at r.bekkers@vu.nl.


Filed under academic misconduct, data, experiments, fraud, helping, incentives, law, open science, sociology, survey research

Closing the Age of Competitive Science

In the prehistoric era of competitive science, researchers were like magicians: they earned a reputation for tricks that nobody could repeat and shared their secrets only with trusted disciples. In the new age of open science, researchers share by default, not only with peer reviewers and fellow researchers, but with the public at large. The transparency of open science reduces the temptation of private profit maximization and the collective inefficiency that information asymmetries create in competitive markets. In a seminar organized by the University Library at Vrije Universiteit Amsterdam on November 1, 2018, I discussed recent developments in open science and its implications for research careers and progress in knowledge discovery. The slides are posted here. The podcast is here.


Filed under academic misconduct, data, experiments, fraud, incentives, law, Netherlands, open science, statistical analysis, survey research, VU University

Tools for the Evaluation of the Quality of Experimental Research

pdf of this post

Experiments can have important advantages over other research designs. The most important advantage of experiments concerns internal validity. Random assignment to treatment reduces the attribution problem and increases the possibilities for causal inference. An additional advantage is that control over the conditions participants face reduces the heterogeneity of the treatment effects observed.

The extent to which these advantages are realized in the data depends on the design and execution of the experiment. Experiments have a higher quality if the sample size is larger and the theoretical concepts are measured more reliably and with higher validity. The sufficiency of the sample size can be checked with a power analysis. For most effect sizes in the social sciences, which are small (d = 0.2), a sample of about 1,300 participants is required to detect them at conventional significance levels (p < .05) with 95% power (see appendix). Also for a stronger effect size (d = 0.4) more than 300 participants are required. The reliability of normative scale measures can be judged with Cronbach’s alpha. A rule of thumb for unidimensional scales is that alpha should be at least .63 for a scale consisting of 4 items, .68 for 5 items, .72 for 6 items, .75 for 7 items, and so on. The validity of measures should be justified theoretically and can be checked with a manipulation check, which should reveal a sizeable and significant association with the treatment variables.
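
These numbers are easy to check yourself. The sketch below reproduces the sample size requirements with statsmodels, and shows that the alpha rule of thumb corresponds to the Spearman-Brown formula under the assumption (mine, for illustration, not a claim from the sources cited) of an average inter-item correlation of about .30.

```python
from statsmodels.stats.power import TTestIndPower

# Required sample size per group for a two-sided, two-group comparison
analysis = TTestIndPower()
for d in (0.2, 0.4):
    n_per_group = analysis.solve_power(effect_size=d, alpha=0.05, power=0.95)
    print(f"d = {d}: about {n_per_group:.0f} per group, {2 * n_per_group:.0f} in total")
# d = 0.2: roughly 650 per group (about 1,300 in total); d = 0.4: roughly 165 per group (over 300 in total)

# Cronbach's alpha implied by the Spearman-Brown formula for k items with
# average inter-item correlation r (r = .30 is an assumption for illustration)
def implied_alpha(k, r=0.30):
    return k * r / (1 + (k - 1) * r)

for k in (4, 5, 6, 7):
    print(f"{k} items: alpha = {implied_alpha(k):.2f}")
# 4 items: .63, 5 items: .68, 6 items: .72, 7 items: .75
```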

The advantages of experiments are reduced if assignment to treatment is non-random and treatment effects are confounded. In addition, a variety of other problems may endanger internal validity. Shadish, Cook & Campbell (2002) provide a useful list of such problems.

Also it should be noted that experiments can have important disadvantages. The most important disadvantage is that the external validity of the findings is limited to the participants in the setting in which their behavior was observed. This disadvantage can be avoided by creating more realistic decision situations, for instance in natural field experiments, and by recruiting (non-‘WEIRD’) samples of participants that are more representative of the target population. As Henrich, Heine & Norenzayan (2010) noted, results based on samples of participants in Western, Educated, Industrialized, Rich and Democratic (WEIRD) countries have limited validity in the discovery of universal laws of human cognition, emotion or behavior.

Recently, experimental research paradigms have received fierce criticism. Results of research often cannot be reproduced (Open Science Collaboration, 2015), and publication bias is ubiquitous (Ioannidis, 2005). It has become clear that there is a lot of undisclosed flexibility in all phases of the empirical cycle. While these problems have been discussed widely in communities of researchers conducting experiments, they are by no means limited to one particular methodology or mode of data collection. It is likely that they also occur in communities of researchers using survey or interview data.

In the positivist paradigm that dominates experimental research, the empirical cycle starts with the formulation of a research question. To answer the question, hypotheses are formulated based on established theories and previous research findings. Then the research is designed, data are collected, a predetermined analysis plan is executed, results are interpreted, the research report is written and submitted for peer review. After the usual round(s) of revisions, the findings are incorporated in the body of knowledge.

The validity and reliability of results from experiments can be compromised in two ways. The first is by juggling with the order of phases in the empirical cycle. Researchers can decide to amend their research questions and hypotheses after they have seen the results of their analyses. Kerr (1998) labeled the practice of reformulating hypotheses HARKing: Hypothesizing After Results are Known. Amending hypotheses is not a problem when the goal of the research is to develop theories to be tested later, as in grounded theory or exploratory analyses (e.g., data mining). But in hypothesis-testing research HARKing is a problem, because it increases the likelihood of publishing false positives. Chance findings are interpreted post hoc as confirmations of hypotheses that a priori are rather unlikely to be true. When these findings are published, they are unlikely to be reproducible by other researchers, creating research waste, and worse, reducing the reliability of published knowledge.

The second way the validity and reliability of results from experiments can be compromised is by misconduct and sloppy science within various stages of the empirical cycle (Simmons, Nelson & Simonsohn, 2011). The data collection and analysis phase as well as the reporting phase are most vulnerable to distortion by fraud, p-hacking and other questionable research practices (QRPs).

  • In the data collection phase, observations that (if kept) would lead to undesired conclusions or non-significant results can be altered or omitted. Also, fake observations can be added (fabricated).
  • In the analysis of data researchers can try alternative specifications of the variables, scale constructions, and regression models, searching for those that ‘work’ and choosing those that reach the desired conclusion.
  • In the reporting phase, things go wrong when the search for alternative specifications and the sensitivity of the results with respect to decisions in the data analysis phase is not disclosed.
  • In the peer review process, there can be pressure from editors and reviewers to cut reports of non-significant results, or to collect additional data supporting the hypotheses and the significant results reported in the literature.

The results of these QRPs are that null findings are less likely to be published, that published research is biased towards positive findings confirming the hypotheses, and that published findings are often not reproducible: when a replication attempt is made, the findings turn out to be less significant, less often positive, and smaller in effect size than originally reported (Open Science Collaboration, 2015).

Alarm bells, red flags and other warning signs

Some of the forms of misconduct mentioned above are very difficult to detect for reviewers and editors. When observations are fabricated or omitted from the analysis, only inside information, very sophisticated data detectives, or stupidity of the authors can help us. Many other forms of misconduct are also difficult to prove. While smoking guns are rare, we can look for clues. I have developed a checklist of warning signs and good practices that editors and reviewers can use to screen submissions (see below). The checklist uses terminology that is not specific to experiments, but applies to all forms of data. While a high number of warning signs in itself does not prove anything, it should alert reviewers and editors. There is no norm for the number of flags. The table below only mentions the warning signs; the paper version of this blog post also shows a column with the positive poles. Those who would like to count good practices and reward authors for a higher number can count gold stars rather than red flags. The checklist was developed independently of the checklist that Wicherts et al. (2016) recently published.

Warning signs

  • The power of the analysis is too low.
  • The results are too good to be true.
  • All hypotheses are confirmed.
  • P-values are just below critical thresholds (e.g., p<.05)
  • A groundbreaking result is reported but not replicated in another sample.
  • The data and code are not made available upon request.
  • The data are not made available upon article submission.
  • The code is not made available upon article submission.
  • Materials (manipulations, survey questions) are described superficially.
  • Descriptive statistics are not reported.
  • The hypotheses are tested in analyses with covariates and results without covariates are not disclosed.
  • The research is not preregistered.
  • No details of an IRB procedure are given.
  • Participant recruitment procedures are not described.
  • Exact details of time and location of the data collection are not described.
  • A power analysis is lacking.
  • Unusual / non-validated measures are used without justification.
  • Different dependent variables are analyzed in different studies within the same article without justification.
  • Variables are (log)transformed or recoded in unusual categories without justification.
  • Numbers of observations mentioned at different places in the article are inconsistent. Loss or addition of observations is not justified.
  • A one-sided test is reported when a two-sided test would be appropriate.
  • Test-statistics (p-values, F-values) reported are incorrect.

With the increasing number of retractions of articles reporting on experimental research published in scholarly journals, awareness of the fallibility of peer review as a quality control mechanism has grown. Communities of researchers employing experimental designs have formulated solutions to these problems. In the review and publication stage, the following solutions have been proposed.

  • Access to data and code. An increasing number of science funders require grantees to provide open access to the data they have collected and the code they have used. Likewise, authors are required to provide access to data and code at a growing number of journals, such as Science, Nature, and the American Journal of Political Science. Platforms such as Dataverse, the Open Science Framework and GitHub facilitate sharing of data and code. Some journals do not require access to data and code, but provide Open Science badges for articles that do provide access.
  • Pledges, such as the ‘21 word solution’, a statement designed by Simmons, Nelson and Simonsohn (2012) that authors can include in their paper to ensure they have not fudged the data: “We report how we determined our sample size, all data exclusions (if any), all manipulations, and all measures in the study.”
  • Full disclosure of methodological details of research submitted for publication, for instance through psychdisclosure.org is now required by major journals in psychology.
  • Apps such as statcheck (for text or files), p-curve, p-checker, and r-index can help editors and reviewers detect fishy business; the toy sketch below illustrates the core idea by recomputing a reported p-value from the reported test statistic. They also have the potential to improve research hygiene when researchers start using these apps to check their own work before they submit it for review.
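
As an illustration of that core idea, here is a toy recomputation of a reported p-value from a reported t-statistic and its degrees of freedom. The numbers are made up, and this is emphatically not the statcheck implementation, just the logic behind such consistency checks.

```python
from scipy import stats

# A made-up reported result: t(48) = 2.10, p = .02
t_value, df, reported_p = 2.10, 48, 0.02

recomputed_p = 2 * stats.t.sf(abs(t_value), df)   # two-sided p-value
print(f"recomputed p = {recomputed_p:.3f}")        # roughly .04, not .02

# Flag the result when the reported p-value does not follow from the test statistic
if abs(recomputed_p - reported_p) > 0.005:
    print("Inconsistent: the reported p-value does not match t and df")
```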

As these solutions become more commonly used we should see the quality of research go up. The number of red flags in research should decrease and the number of gold stars should increase. This requires not only that reviewers and editors use the checklist, but most importantly, that also researchers themselves use it.

The solutions above should be supplemented by better research practices before researchers submit their papers for review. In particular, two measures are worth mentioning:

  • Preregistration of research, for instance on AsPredicted.org. An increasing number of journals in psychology require research to be preregistered. Some journals guarantee publication of research regardless of its results after a round of peer review of the research design.
  • Increasing the statistical power of research is one of the most promising strategies to increase the quality of experimental research (Bakker, Van Dijk & Wicherts, 2012). In many fields and for many decades, published research has been underpowered, using samples of participants that are not large enough to detect the reported effect sizes. Using larger samples reduces the likelihood of both false positives and false negatives.

A variety of institutional designs have been proposed to encourage the use of the solutions mentioned above, including reducing the rewards for questionable research practices in researchers’ careers and in hiring and promotion decisions, rewarding researchers for good conduct through badges, the adoption of voluntary codes of conduct, and the socialization of students and senior staff through teaching and workshops. Research funders, journals, editors, authors, reviewers, universities, senior researchers and students all have a responsibility in these developments.

Update, December 28, 2020 – here’s a checklist of 50 questions on the quality of an experiment: https://vuass.eu.qualtrics.com/jfe/form/SV_8cgxZHPXPDCqzad. The result is a Quality Score ranging from 0 to 100.

References

Bakker, M., Van Dijk, A. & Wicherts, J. (2012). The Rules of the Game Called Psychological Science. Perspectives on Psychological Science, 7(6): 543–554. https://doi.org/10.1177%2F1745691612459060

Henrich, J., Heine, S.J., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33: 61 – 135. https://doi.org/10.1017/S0140525X0999152X

Ioannidis, J.P.A. (2005). Why Most Published Research Findings Are False. PLoS Medicine, 2(8): e124. http://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.0020124

Kerr, N.L. (1998). HARKing: Hypothesizing After Results are Known. Personality and Social Psychology Review, 2: 196-217. https://doi.org/10.1207%2Fs15327957pspr0203_4

Open Science Collaboration (2015). Estimating the Reproducibility of Psychological Science. Science, 349(6251): aac4716. http://www.sciencemag.org/content/349/6251/aac4716.full.html

Shadish, W.R., Cook, T.D., & Campbell, D.T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin.

Simmons, J.P., Nelson, L.D., & Simonsohn, U. (2011). False positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22: 1359–1366. https://doi.org/10.1177%2F0956797611417632

Simmons, J.P., Nelson, L.D. & Simonsohn, U. (2012). A 21 Word Solution. Available at SSRN: http://ssrn.com/abstract=2160588

Wicherts, J.M., Veldkamp, C.L., Augusteijn, H.E., Bakker, M., Van Aert, R.C. & Van Assen, M.A.L.M. (2016). Researcher degrees of freedom in planning, running, analyzing, and reporting psychological studies: A checklist to avoid p-hacking. Frontiers in Psychology, 7: 1832. http://journal.frontiersin.org/article/10.3389/fpsyg.2016.01832/abstract


Filed under academic misconduct, experiments, fraud, incentives, open science, psychology, survey research

Four Reasons Why We Are Converting to Open Science

The Center for Philanthropic Studies I am leading at VU Amsterdam is converting to Open Science.

Open Science offers four advantages to the scientific community, nonprofit organizations, and the public at large:

  1. Access: we make our work more easily accessible for everyone. Our research serves public goods, which are served best by open access.
  2. Efficiency: we make it easier for others to build on our work, which saves time.
  3. Quality: we enable others to check our work, find flaws and improve it.
  4. Innovation: ultimately, open science facilitates the production of knowledge.

What does the change mean in practice?

First, the source of funding for contract research we conduct will always be disclosed.

Second, data collection – interviews, surveys, experiments – will follow a prespecified protocol. This includes the number of observations foreseen, the questions to be asked, the measures to be included, the hypotheses to be tested, and the analyses to be conducted. New studies will preferably be preregistered.

Third, data collected and the code used to conduct the analyses will be made public, through the Open Science Framework for instance. Obviously, personal or sensitive data will not be made public.

Fourth, results of research will preferably be published in open access mode. This does not mean that we will publish only in Open Access journals. Research reports and papers for academic journals will be made available online in working paper archives, as a ‘preprint’ version, or in other ways.

 

December 16, 2015 update:

A fifth reason, following directly from #1 and #2, is that open science reduces the costs of science for society.

See this previous post for links to our Giving in the Netherlands Panel Survey data and questionnaires.

 

July 8, 2017 update:

A public use file of the Giving in the Netherlands Panel Survey and the user manual are posted at the Open Science Framework.


Filed under academic misconduct, Center for Philanthropic Studies, contract research, data, fraud, incentives, methodology, open science, regulation, survey research