A decade of metascience research on the social sciences has produced devastating results. While the movement towards open science is gaining momentum, awareness of the credibility crisis remains low among social scientists. Here are ten metascience insights on the credibility crisis, along with solutions for fighting it.
This is a blog version of the SocArXiv preprint at https://osf.io/preprints/socarxiv/rm4p8/
1. At least half of all researchers use questionable research practices
Research on research integrity has estimated the prevalence of integrity violations in many subfields of science, including the social and behavioral sciences. According to the best evidence to date from the Netherlands Survey of Research Integrity (Gopalakrishna et al., 2022a), half of researchers in the social and behavioral sciences (50.2%) reported having engaged in at least one questionable research practice in the past two years. The most common questionable research practices in the social and behavioral sciences are not submitting valid negative studies for publication (17.2%) and insufficient discussion of study flaws and limitations (17.2%). Two other frequently reported violations are inadequate note taking of the research process (14.4%) and selective citation of references to enhance findings or convictions (11%).
A smaller but still non-negligible proportion of researchers in the social and behavioral sciences in the Netherlands report having fabricated or falsified data in the past two years (5.7%). This proportion seems low, but it should be zero. The estimate implies that one out of every 17.5 researchers (1/0.057 ≈ 17.5) fabricated or falsified data.
These estimates are valid for researchers in the Netherlands who responded to a survey on research integrity. There are at least three reasons to suppose that they underestimate the prevalence among the global community of social scientists. One reason is that socially undesirable behaviors such as research misconduct and questionable research practices are underreported in surveys (John, Loewenstein & Prelec, 2012). A second reason is that the response rate to the survey was only 21%. Non-response is usually selective, and higher among those who have an interest in the study topic (Groves et al., 2006). Among the non-respondents, the proportion of researchers who engaged in violations of research integrity is likely to be higher than among respondents (Tourangeau & Yan, 2007). The third reason is that the survey was conducted in the Netherlands. A meta-analysis of studies on integrity violations found that estimates of the prevalence of violations are higher in low- and middle-income countries than in high-income countries such as the United States and the Netherlands (Xie, Wang & Kong, 2021). One audit of survey research found that datasets produced outside the US contained more fabricated observations (Judge & Schechter, 2009).
Codes of conduct for scientific researchers such as the guidelines of the United States Office of Research Integrity (ORI; Steneck, 2007), the Netherlands Code of Conduct for Research Integrity (KNAW et al., 2018) and the European Code of Conduct for Research Integrity (ALLEA, 2017) explicitly forbid not only fraud, fabrication and falsification of data, but also mild violations of integrity and questionable research practices. Clearly, the mere existence of a code of conduct is not enough to eradicate bad research. To design more effective quality control procedures, it is important to understand how researchers make decisions in practice.
2. Researcher degrees of freedom facilitate widely different conclusions
When researchers are allowed to keep their workflow private and only give access to the final results, it is difficult to detect data fabrication, falsification, and questionable research practices. Throughout the empirical research process, researchers have many degrees of freedom (Simmons, Nelson & Simonsohn, 2011): they can limit their samples to specific target groups, use different sampling strategies, use different modes of data collection, ask questions in particular ways, treat missing values in different ways, code variables in more or less fine-grained categories, add or omit covariates, and run different types of statistical tests and models. While some of these decisions are described in publications, many are not disclosed. Gelman & Loken (2014) compare these choices to a walk in a garden of forking paths: taking different turns, researchers follow different paths, see different things, and may end up at completely different exits.
Meta research projects of the ‘Many analysts, one dataset’ type, in which many researchers test the same hypothesis with the same dataset, demonstrate that researcher degrees of freedom easily lead to entirely different conclusions. In a recent study relying on international survey data, researchers were asked to estimate the association between immigration and public support for government provision of welfare (Breznau et al., 2022). 25% of estimates were significantly negative, 17% were significantly positive, and 58% had a confidence interval including 0. Moreover, the magnitude of the relationships varied strongly, with standardized effect sizes ranging from less than -0.2 to more than +0.2. Even more striking differences between estimates emerge from projects relying on observational data on discrimination (Silberzahn et al., 2018), group processes (Schweinsberg et al., 2021) and financial markets (Menkveld et al., 2021). Documenting all steps taken to obtain the estimates is the only way in which their validity can be evaluated.
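To make researcher degrees of freedom concrete, here is a minimal multiverse-style sketch in Python, in the spirit of the many-analysts projects but not their actual code. It crosses a few common analytic choices and fits one regression per path through the garden; the dataset and all column names (support_welfare, immigration, age, female, education, employed) are hypothetical.

```python
# A minimal multiverse sketch: cross a few analytic choices and fit one OLS
# model per path through the garden. All column names are hypothetical.
from itertools import product

import pandas as pd
import statsmodels.formula.api as smf

def multiverse(df):
    """Fit one model per combination of outcome, covariate set, and sample."""
    outcomes = ["support_welfare", "support_welfare_scaled"]
    covariate_sets = ["", " + age + female", " + age + female + education"]
    samples = {"all": df, "employed only": df[df["employed"] == 1]}

    rows = []
    for outcome, covs, (label, data) in product(outcomes, covariate_sets,
                                                samples.items()):
        fit = smf.ols(f"{outcome} ~ immigration{covs}", data=data).fit()
        rows.append({"outcome": outcome, "covariates": covs or "none",
                     "sample": label, "beta": fit.params["immigration"],
                     "p": fit.pvalues["immigration"]})
    return pd.DataFrame(rows)

# Two outcomes x three covariate sets x two samples = 12 paths
# from just a handful of seemingly innocuous choices.
```

Real many-analysts projects cross far more choices (measurement, estimator, weighting, treatment of missing data), so the number of defensible specifications quickly runs into the hundreds or thousands.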
3. Published research contains questionable research practices
Even though researchers are not required to document all decisions they make in the collection, analysis, and reporting of data, divergences from standards of good practice are apparent in the body of published research. Meta research in the past decade has identified many traces of questionable research practices. Here are three indicators.
A first indicator is a suspiciously high rate of success in supporting the hypotheses. Across all social sciences, the proportion of studies supporting the hypotheses has increased in the past decades to unreasonably high levels (Fanelli, 2012). In some subfields of psychology such as applied social psychology, up to 100% of studies support the hypotheses (Schäfer & Schwarz, 2019). The trick is nothing short of magical – it works every time.
A second indicator is the excess of test statistics just above cut-off values for statistical significance. Reported statistics cluster just above the critical value of 1.96 required for significance at the .05 level in sociology (Gerber & Malhotra, 2008a), political science (Gerber & Malhotra, 2008b), psychology (Simonsohn, Nelson & Simmons, 2014a), and economics (Brodeur, Lé, Sangnier & Zylberberg, 2016). The prevalence of p-hacking is particularly high in online experiments in marketing conducted through MTurk (Brodeur, Cook & Heyes, 2022).

A third indicator is the lack of statistical power to test hypotheses. Studies in psychology (Maxwell, 2004), economics (Ioannidis, Stanley & Doucouliagos, 2017) and political science (Arel-Bundock et al., 2022) tend to be underpowered.
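The clustering of test statistics just above 1.96 can be screened for with a caliper test of the kind Gerber & Malhotra used. Below is a minimal sketch, assuming a list of z-statistics harvested from published tables; the window width and the example values are illustrative.

```python
# Caliper test (after Gerber & Malhotra, 2008): compare how many z-statistics
# fall just above vs. just below the 1.96 significance threshold.
from scipy.stats import binomtest

def caliper_test(z_values, threshold=1.96, width=0.20):
    """Binomial test of whether z-statistics cluster just above threshold."""
    below = sum(threshold - width < z <= threshold for z in z_values)
    above = sum(threshold < z <= threshold + width for z in z_values)
    # Absent p-hacking, a result should be roughly as likely on either side.
    result = binomtest(above, n=above + below, p=0.5, alternative="greater")
    return above, below, result.pvalue

# Hypothetical z-statistics extracted from published tables:
zs = [1.97, 2.01, 1.99, 2.03, 1.88, 2.05, 1.98, 1.90, 2.02, 2.00]
above, below, p = caliper_test(zs)
print(f"{above} just above vs {below} just below the threshold (p = {p:.3f})")
```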
4. Publication bias and questionable research practices reduce the reliability of published research
Besides questionable research practices, a second explanation for the low replicability of published research is publication bias (Friese & Frankenbach, 2020; Smaldino & McElreath, 2016; Grimes, Bauch & Ioannidis, 2018). Negative or null findings are less likely to be published than positive findings, not only because researchers are less likely to write reports on negative results, but also because reviewers evaluate them less positively (Franco, Malhotra & Simonovits, 2014). Scarcity of resources and perverse incentive systems in academia create a premium for novelty and prolific publication (Smaldino & McElreath, 2016; Brembs, 2019). It is no wonder that researchers engage in questionable research practices to obtain positive results and get their research published.
Transparency does not guarantee quality; it enables a fair and independent assessment of quality (Vazire, 2019). Transparency is crucial for the detection of questionable research practices, fraud, fabrication, and plagiarism (Nosek, Spies & Motyl, 2012). Open science practices can improve the reliability of published research (Smaldino, Turner & Contreras Kallens, 2019). Studies with high statistical power, preregistration, and complete methodological transparency are more reliable and replicate well (Protzko et al., 2020). When studies are more replicable with the same methods and new data from the same target population, they are also more generalizable to other populations across time and place (Delios et al., 2022).
5. Closed science facilitates integrity violations
If the rate of questionable research practices is unacceptably high, why does that not change? The reason why research integrity violations continue to be so prevalent is that researchers are allowed to hide them. Compared to other industries, science has a particularly lax system of quality control. Before roads are built and new toys for kids are allowed to be sold, there are safety and health checks of the builders, their materials, their construction plans and manufacturing processes, and ultimately the safety of their products. But when we do science, there is much less of this. We merely ask volunteers to look at the final product. If we buy a car at an authorized dealer, there is a money-back guarantee. But reviewers of scientific papers do not even start the engine of the data and code to check whether the thing actually works. That is not good enough.
Researchers are not required to be fully transparent about all the choices they have made. As a result, violations of research integrity are rarely detected in the classical peer review process (Altman, 1994; Smith, 2006, 2010). Without extensive training, peer reviewers are bad at catching mistakes in manuscripts (Schroter et al., 2008). The current peer review system is far from a guarantee of flawless research. If a study is published after it went through peer review, that does not mean it is true. Even at the journals with the highest impact factors, the review process does not successfully keep out bad science (Brembs, 2018).
To enhance the reliability of the published record of research, the peer review process needs to change in the direction of more openness and transparency (Smith, 2010; Munafò et al., 2017). In addition, transparency requirements can deter questionable research practices. Violations of research integrity are like crime: when the probability of being detected is high enough, potential perpetrators will not engage in violations. Data from the Netherlands Survey of Research Integrity show that a higher perceived likelihood that a reviewer or collaborator will detect data fabrication is associated with a lower likelihood of engaging in questionable research practices (Gopalakrishna et al., 2022a) and a higher likelihood of engaging in responsible research practices such as sharing data and materials (Gopalakrishna et al., 2022b).
6. Roughly half of all published studies do not replicate
With half of researchers admitting that they engage in questionable research practices, it is no surprise that a large proportion of published findings claiming a general regularity in human behavior cannot be replicated with new data by new researchers. This conclusion holds not only for psychology (Open Science Collaboration, 2015), but also for other fields of the social and behavioral sciences, even for publications in the highest ranked journals, such as Nature and Science (Camerer et al., 2018). In psychology, 97% of 100 original studies reported significant effects, with an average standardized effect size of .40. In independent replications, only 36% produced significant effects, with an average standardized effect size of .20 (Open Science Collaboration, 2015). The independent replication project of studies published in Nature and Science found significant effects for 62% of original studies, with an average effect size of .25, again about half of the original effect size (Camerer et al., 2018).
A key difference between original studies and replications that explains why replications are much less likely to achieve significant results is that original studies are not preregistered. Registered reports in psychology achieve positive results in only 44% of hypothesis tests, while standard reports obtain positive results in 96% of tests (Scheel, Schijen & Lakens, 2021).
The finding that individual studies are likely to present overestimates also implies that meta-analyses of published findings are too positive (Friese & Frankenbach, 2020). The presence of publication bias, data fabrication and falsification, and questionable research practices in the body of peer-reviewed publications implies that statistical meta-analysis is fundamentally unfit to estimate the size of an effect based on previous research. The only way to obtain an accurate estimate of a published association is to conduct an independent, preregistered replication (Van Elk et al., 2015). Effect sizes reported in meta-analyses are two to three times as large as those in independent preregistered replications (Kvarven, Strømland & Johannesson, 2020).
These findings imply that the reliability of published research is low: for each pair of published studies, only one will replicate in the original direction, and for each published study that does replicate, the magnitude of the association is only half of the original. You cannot trust roughly half of all published research, so you will have to evaluate its quality yourself. A key indicator is whether it was preregistered. As a rule of thumb, divide the effect size reported in a study that was not preregistered by two, and the effect size from a meta-analysis by three.
7. Why ‘data available upon request’ is not enough
One way to detect data fabrication, falsification, and questionable research practices is through close inspection of research data. Fabricated and falsified data contain patterns that genuine research data do not (Judge & Schechter, 2009; Heathers et al., 2018).
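Here is a minimal sketch of a first-digit screen in the spirit of Judge & Schechter (2009), who used Benford's law to flag suspicious survey data. The income values are made up, and a real audit would use far larger samples and additional digit tests.

```python
# First-digit (Benford) screen: compare observed leading digits with the
# frequencies implied by Benford's law using a chi-squared test.
import math
from collections import Counter
from scipy.stats import chisquare

def benford_screen(values):
    """Chi-squared test of first significant digits against Benford's law."""
    digits = [int(str(abs(v)).lstrip("0.")[0]) for v in values if v != 0]
    counts = Counter(digits)
    observed = [counts.get(d, 0) for d in range(1, 10)]
    expected = [len(digits) * math.log10(1 + 1 / d) for d in range(1, 10)]
    return chisquare(observed, f_exp=expected)

# Hypothetical reported household incomes; real audits need large samples.
incomes = [1250, 980, 1430, 2100, 310, 1890, 760, 540, 1120, 2980]
stat, p = benford_screen(incomes)
print(f"chi2 = {stat:.2f}, p = {p:.3f} (low p suggests deviation from Benford)")
```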
However, researchers rarely provide access to research data. A recent estimate by Serghiou et al. (2021) showed that only 8% of 27,000 social science articles included in the PubMed database provided access to data. In a smaller sample of 250 publications in Scopus-indexed outlets from the period 2014-2017, Hardwicke et al. (2020) found that 7% provided access to data.
A journal's encouragement to share data is no guarantee that authors actually do so. In a study among researchers who published in Nature and Science, which both require authors to promise they will give access to research data, still only 40% of psychologists and social scientists complied with a request to access the data (Tedersoo et al., 2021). In a study among economists who had indicated in their publications that data and materials were available upon request, only 44% complied in practice (Krawczyk & Reuben, 2012). Among psychologists who had published in the top journals of the field and promised data and materials would be available upon request, only 26% complied (Wicherts et al., 2006).
In other words, the promise that “data are available upon request” usually means that the data are not made available. A policy relying on such promises is not strict enough to prevent the publication of manuscripts containing results based on questionable research practices and on fabricated and falsified data. Introducing a data-sharing policy by itself is also ineffective if it is not enforced (Stodden, Seiler & Ma, 2018; Christensen et al., 2019). Authors should not only be required to share data and code; a data editor should also verify the computational reproducibility of the data and code.
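What could such verification look like in practice? Here is a minimal sketch of the kind of check a data editor might script, assuming a replication package with an entry-point script and a machine-readable file of reported estimates; all paths and file names here are hypothetical placeholders.

```python
# Sketch of a computational reproducibility check: re-run the authors'
# analysis and compare key estimates with the values reported in the paper.
# All paths and file names are hypothetical placeholders.
import json
import subprocess

def verify(package_dir="replication_package", tolerance=1e-6):
    """Re-execute the analysis and report which estimates reproduce."""
    subprocess.run(["python", f"{package_dir}/run_analysis.py"], check=True)
    with open(f"{package_dir}/reported_results.json") as f:
        reported = json.load(f)
    with open(f"{package_dir}/output/results.json") as f:
        reproduced = json.load(f)
    for name, value in reported.items():
        match = abs(reproduced[name] - value) <= tolerance
        print(f"{name}: reported={value}, "
              f"reproduced={reproduced[name]}, match={match}")
```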
8. Artificial intelligence can support peer review
Just like artificial intelligence facilitates plagiarism detection, it can also support the peer review process by screening manuscripts for errors and for the presence of information about relevant indicators of research quality. One example of a useful tool is statcheck, which helps reviewers check the consistency between reported p-values and test statistics (http://statcheck.io; Nuijten & Polanin, 2020). Another example is the p-checker app, which quickly provides reviewers with relevant information about the evidentiary value of a set of experiments (https://shinyapps.org/apps/p-checker/; see Simonsohn, Nelson & Simmons, 2014a, 2014b).
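To illustrate the idea behind statcheck (a sketch of the principle, not the tool's actual algorithm), one can recompute the p-value implied by a reported test statistic and flag discrepancies beyond what rounding can explain:

```python
# statcheck-style consistency check (a sketch, not the actual tool):
# recompute the p-value implied by a reported t-statistic and compare it
# with the reported p-value, allowing for rounding to two decimals.
from scipy import stats

def check_t(t_value, df, reported_p, two_sided=True, rounding=0.005):
    """Return (recomputed p, whether it is consistent with the report)."""
    p = stats.t.sf(abs(t_value), df)
    if two_sided:
        p *= 2
    return p, abs(p - reported_p) <= rounding

# Example: a paper reports t(28) = 2.20, p = .04
p, consistent = check_t(2.20, 28, 0.04)
print(f"recomputed p = {p:.4f}, consistent with reported p: {consistent}")
```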
Advancements in natural language processing have enabled software engineers to build tools that automatically screen full texts of articles and extract information about ethics statements, randomization, sample sizes, sharing of data and code, and other indicators of research quality (Menke et al., 2020; Riedel, Kip & Bobrov, 2020; Serghiou et al., 2021; Zavalis & Ioannidis, 2022). Publishers should create an infrastructure in which new submissions are screened automatically and transparency indicators are reported. While peer review should not be automated altogether, artificial intelligence will certainly help improve peer review (Checco et al., 2021; Schulz et al., 2022).
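A toy illustration of this kind of screening follows; the patterns are illustrative stand-ins, not the actual rules of ODDPub or any other published tool.

```python
# Toy full-text screen for transparency indicators using regular expressions.
# The patterns below are illustrative, not the rules of any actual tool.
import re

INDICATORS = {
    "data_sharing": r"data (are|is) (publicly )?available|osf\.io|dataverse|zenodo",
    "code_sharing": r"code (is|are) available|github\.com|analysis scripts?",
    "preregistration": r"preregist(ered|ration)|registered report",
    "ethics": r"ethics (committee|approval)|institutional review board|\bIRB\b",
}

def screen_manuscript(full_text):
    """Return which transparency indicators are mentioned in the full text."""
    return {name: bool(re.search(pattern, full_text, flags=re.IGNORECASE))
            for name, pattern in INDICATORS.items()}

report = screen_manuscript("Data are available at https://osf.io/abc12/. "
                           "The study was approved by the ethics committee.")
# -> {'data_sharing': True, 'code_sharing': False,
#     'preregistration': False, 'ethics': True}
```

Production tools train and validate such classifiers on annotated corpora rather than relying on hand-written keyword lists, but the basic pipeline of extracting indicators from full text is the same.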
9. Introducing registered reports will improve the credibility of research
A registered report is a new submission format that effectively removes the evaluation of results from the review process (Nosek & Lakens, 2014; Chambers & Tzavella, 2022). Reviewers only evaluate the research design: the hypotheses, data collection and analysis plans. Authors receive feedback and may alter their plans in a revised version. Editors then decide whether to accept the study for publication. Only after authors receive the acceptance letter do they proceed to collect and analyze the data. This format blinds both reviewers and researchers to the results, and increases the likelihood that null findings and negative results are published. Journals across all fields of the social sciences are introducing registered reports (Hardwicke & Ioannidis, 2018). When this format becomes the standard for academic research publications, the reliability of published research will increase.
10. Replications should be encouraged and actively facilitated
One way to encourage replications is to invite authors to submit preregistered replication reports of published research. A preregistration is a document describing the hypotheses, data collection and analysis plans for a study before it is conducted (Nosek, Ebersole, DeHaven & Mellor, 2018). Public preregistrations enable reviewers to check whether authors changed the hypotheses, whether they report on all of them, and how the data analysis reported in manuscripts differs from the original plans. The use of preregistrations is increasing across all fields of the social sciences (Nosek et al., 2022). The combination of a preregistration with a registered report effectively reduces questionable research practices such as altering hypotheses after results are known, hiding negative results, and exploiting researcher degrees of freedom to obtain significant results (Soderberg et al., 2021).
Conclusion
The credibility crisis in the social sciences should lead us to redesign the industry of academic research and publications to raise the bar in quality control procedures. Enforcement of open science practices is the solution. Voluntary accountability mechanisms such as promises to uphold standards of good conduct and symbolic rewards such as badges depend on the intrinsic motivation of researchers. At the same time, tenure and promotion systems as well as award criteria in grant proposal competitions introduce extrinsic incentives that lead researchers to produce a high level of output at the expense of quality. Universities, research funders and journals should redesign reward systems so that prestige depends solely on research quality, not quantity. While such reforms are under way, professional training of reviewers and artificial intelligence can enhance the detection and deterrence of bad research practices.
References
ALLEA (2017). European Code of Conduct for Research Integrity. https://www.allea.org/wp-content/uploads/2017/05/ALLEA-European-Code-of-Conduct-for-Research-Integrity-2017.pdf
Altman, D. G. (1994). The scandal of poor medical research. British Medical Journal, 308(6924), 283-284. https://doi.org/10.1136/bmj.308.6924.283
Arel-Bundock, V., Briggs, R. C., Doucouliagos, H., Mendoza Aviña, M., & Stanley, T. D. (2022). Quantitative Political Science Research is Greatly Underpowered. I4R Discussion Paper Series, No. 6. http://hdl.handle.net/10419/265531
Bollen, K., Cacioppo, J.T., Kaplan, R.M., Krosnick, J.A., & Olds, J.A. (2015). Social, Behavioral, and Economic Sciences Perspectives on Robust and Reliable Science. Report of the Subcommittee on Replicability in Science Advisory Committee to the National Science Foundation Directorate for Social, Behavioral, and Economic Sciences. https://nsf.gov/sbe/AC_Materials/SBE_Robust_and_Reliable_Research_Report.pdf
Brembs, B. (2018). Prestigious Science Journals Struggle to Reach Even Average Reliability. Frontiers in Human Neuroscience, 12(37): 1-7. https://doi.org/10.3389/fnhum.2018.00037
Brembs, B. (2019). Reliable novelty: New should not trump true. PLoS Biology, 17(2), e3000117. https://doi.org/10.1371/journal.pbio.3000117
Breznau, N., Rinke, E. M., Wuttke, A., Nguyen, H. H., Adem, M., Adriaans, J., … & Van Assche, J. (2022). Observing many researchers using the same data and hypothesis reveals a hidden universe of uncertainty. Proceedings of the National Academy of Sciences, 119(44), e2203150119. https://doi.org/10.1073/pnas.2203150119
Brodeur, A., Cook, N. & Heyes, A. (2022). We Need to Talk about Mechanical Turk: What 22,989 Hypothesis Tests Tell Us about Publication Bias and p-Hacking in Online Experiments. IZA Discussion Paper No 15478. https://www.econstor.eu/bitstream/10419/265699/1/dp15478.pdf
Brodeur, A., Cook, N. & Neisser, C. (2022). P-Hacking, Data Type and Data-Sharing Policy, IZA Discussion Papers, No. 15586. http://hdl.handle.net/10419/265807
Brodeur, A., Lé, M., Sangnier, M., & Zylberberg, Y. (2016). Star wars: The empirics strike back. American Economic Journal: Applied Economics, 8(1), 1-32. https://doi.org/10.1257/app.20150044
Camerer, C. F., Dreber, A., Holzmeister, F., Ho, T. H., Huber, J., Johannesson, M., … & Wu, H. (2018). Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nature Human Behaviour, 2(9), 637-644. https://doi.org/10.1038/s41562-018-0399-z
Chambers, C.D., & Tzavella, L. (2022). The past, present, and future of Registered Reports. Nature Human Behaviour, 6: 29-42. https://doi.org/10.1038/s41562-021-01193-7
Checco, A., Bracciale, L., Loreti, P., Pinfield, S., & Bianchi, G. (2021). AI-assisted peer review. Humanities and Social Sciences Communications, 8(1), 1-11. https://doi.org/10.1057/s41599-020-00703-8
Christensen, G., Dafoe, A., Miguel, E., Moore, D.A., & Rose, A.K. (2019). A study of the impact of data sharing on article citations using journal policies as a natural experiment. PLoS ONE 14(12): e0225883. https://doi.org/10.1371/journal.pone.0225883
Delios, A., Clemente, E. G., Wu, T., Tan, H., Wang, Y., Gordon, M., … & Uhlmann, E. L. (2022). Examining the generalizability of research findings from archival data. Proceedings of the National Academy of Sciences, 119(30), e2120377119. https://doi.org/10.1073/pnas.2120377119
Fanelli, D. (2012). Negative results are disappearing from most disciplines and countries. Scientometrics, 90(3), 891-904. https://doi.org/10.1007/s11192-011-0494-7
Franco, A., Malhotra, N., & Simonovits, G. (2014). Publication bias in the social sciences: Unlocking the file drawer. Science, 345(6203), 1502-1505. https://doi.org/10.1126/science.1255484
Friese, M., & Frankenbach, J. (2020). p-Hacking and publication bias interact to distort meta-analytic effect size estimates. Psychological Methods, 25(4), 456. https://doi.org/10.1037/met0000246
Gelman, A., & Loken, E. (2014). The statistical crisis in science: Data-dependent analysis—a “garden of forking paths”—explains why many statistically significant comparisons don’t hold up. American Scientist, 102(6), 460-465. https://www.jstor.org/stable/43707868
Gerber, A. S., & Malhotra, N. (2008a). Publication bias in empirical sociological research: Do arbitrary significance levels distort published results? Sociological Methods & Research, 37(1), 3-30. https://doi.org/10.1177/0049124108318973
Gerber, A., & Malhotra, N. (2008b). Do statistical reporting standards affect what is published? Publication bias in two leading political science journals. Quarterly Journal of Political Science, 3(3), 313-326. http://dx.doi.org/10.1561/100.00008024
Gopalakrishna, G., Ter Riet, G., Vink, G., Stoop, I., Wicherts, J. M., & Bouter, L. M. (2022a). Prevalence of questionable research practices, research misconduct and their potential explanatory factors: A survey among academic researchers in The Netherlands. PloS one, 17(2), e0263023. https://doi.org/10.1371/journal.pone.0263023
Gopalakrishna, G., Wicherts, J. M., Vink, G., Stoop, I., van den Akker, O. R., ter Riet, G., & Bouter, L. M. (2022b). Prevalence of responsible research practices among academics in The Netherlands. F1000Research, 11(471), 471. https://doi.org/10.12688/f1000research.110664.2
Grimes, D. R., Bauch, C. T., & Ioannidis, J. P. (2018). Modelling science trustworthiness under publish or perish pressure. Royal Society Open Science, 5(1), 171511. https://doi.org/10.1098/rsos.171511
Groves, R. M., Couper, M. P., Presser, S., Singer, E., Tourangeau, R., Acosta, G. P., & Nelson, L. (2006). Experiments in producing nonresponse bias. Public Opinion Quarterly, 70(5), 720-736. https://doi.org/10.1093/poq/nfl036
Hardwicke, T.E., & Ioannidis, J.P.A. (2018). Mapping the universe of registered reports. Nature Human Behaviour, 2: 793–796. https://doi.org/10.1038/s41562-018-0444-y
Hardwicke, T.E., Wallach, J.D., Kidwell, M.C., Bendixen, T., Crüwell, S. & Ioannidis, J.P.A. (2020). An empirical assessment of transparency and reproducibility-related research practices in the social sciences (2014–2017). Royal Society Open Science, 7: 190806. https://doi.org/10.1098/rsos.190806
Heathers, J. A., Anaya, J., van der Zee, T., & Brown, N. J. (2018). Recovering data from summary statistics: Sample parameter reconstruction via iterative techniques (SPRITE). PeerJ Preprints, e26968v1. https://doi.org/10.7287/peerj.preprints.26968v1
Ioannidis, J., Stanley, T. D., & Doucouliagos, H. (2017). The Power of Bias in Economics Research. Economic Journal, 127(605): F236-F265. https://doi.org/10.1111/ecoj.12461
John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science, 23(5), 524-532. https://doi.org/10.1177/0956797611430953
Judge, G., & Schechter, L. (2009). Detecting problems in survey data using Benford’s Law. Journal of Human Resources, 44(1), 1-24. https://doi.org/10.3368/jhr.44.1.1
KNAW, NFU, NWO, TO2‐federatie, Vereniging Hogescholen & VSNU (2018). Netherlands Code of Conduct for Research Integrity. http://www.vsnu.nl/files/documents/Netherlands%20Code%20of%20Conduct%20for%20Research%20Integrity%202018.pdf
Krawczyk, M. & Reuben, E. (2012). (Un)Available upon Request: Field Experiment on Researchers’ Willingness to Share Supplementary Materials. Accountability in Research, 19: 175–186. https://doi.org/10.1080/08989621.2012.678688
Kvarven, A., Strømland, E., & Johannesson, M. (2020). Comparing meta-analyses and preregistered multiple-laboratory replication projects. Nature Human Behaviour, 4(4), 423-434. https://doi.org/10.1038/s41562-019-0787-z
Lakomý, M., Hlavová, R. & Machackova, H. (2019). Open Science and the Science-Society Relationship. Society, 56, 246–255. https://doi.org/10.1007/s12115-019-00361-w
Maxwell, S. E. (2004). The Persistence of Underpowered Studies in Psychological Research: Causes, Consequences, and Remedies. Psychological Methods, 9(2), 147–163. https://doi.org/10.1037/1082-989X.9.2.147
Menke, J., Roelandse, M., Ozyurt, B., Martone, M., & Bandrowski, A. (2020). The rigor and transparency index quality metric for assessing biological and medical science methods. iScience, 23(11): 101698. https://doi.org/10.1016/j.isci.2020.101698.
Menkveld, A. J., Dreber, A., Holzmeister, F., Huber, J., Johannesson, M., Kirchler, M., … & Weitzel, U. (2021). Non-standard errors. Working paper. https://dx.doi.org/10.2139/ssrn.3961574
Munafò, M. R., Nosek, B. A., Bishop, D. V., Button, K. S., Chambers, C. D., Percie du Sert, N., Simonsohn, U., Wagenmakers, E.-J., Ware, J. J. & Ioannidis, J. (2017). A manifesto for reproducible science. Nature Human Behaviour, 1(1), 1-9. https://doi.org/10.1038/s41562-016-0021
Nosek, B.A., Ebersole, C.R., DeHaven, A.C. & Mellor, D.T. (2018). The Preregistration Revolution. Proceedings of the National Academy of Sciences, 115(11): 2600-2606. http://www.pnas.org/cgi/doi/10.1073/pnas.1708274114
Nosek, B. A., Hardwicke, T. E., Moshontz, H., Allard, A., Corker, K. S., Dreber, A., … & Vazire, S. (2022). Replicability, robustness, and reproducibility in psychological science. Annual Review of Psychology, 73, 719-748. https://doi.org/10.1146/annurev-psych-020821-114157
Nosek, B. A., & Lakens, D. (2014). Registered reports: A method to increase the credibility of published results. Social Psychology, 45(3), 137–141. https://doi.org/10.1027/1864-9335/a000192
Nosek, B. A., Spies, J. R., & Motyl, M. (2012). Scientific Utopia: II. Restructuring Incentives and Practices to Promote Truth Over Publishability. Perspectives on Psychological Science, 7(6), 615–631. https://doi.org/10.1177/1745691612459058
Nuijten, M.B. & Polanin, J.R. (2020). “statcheck”: Automatically detect statistical reporting inconsistencies to increase reproducibility of meta‐analyses. Research Synthesis Methods, 11(5): 574–579. https://doi.org/10.1002/jrsm.1408
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716. https://doi.org/10.1126/science.aac4716
Protzko, J., Krosnick, J., Nelson, L. D., Nosek, B. A., Axt, J., Berent, M., … & Schooler, J. (2020). High replicability of newly-discovered social-behavioral findings is achievable. PsyArXiv preprint. https://psyarxiv.com/n2a9x/
Riedel, N., Kip, M., & Bobrov, E. (2020). ODDPub—a text‑mining algorithm to detect data sharing in biomedical publications. Data Science Journal, 19:42. http://doi.org/10.5334/dsj-2020-042
Schäfer, T. & Schwarz, M.A. (2019). The Meaningfulness of Effect Sizes in Psychological Research: Differences Between Sub-Disciplines and the Impact of Potential Biases. Frontiers in Psychology, 10:813. https://doi.org/10.3389/fpsyg.2019.00813
Scheel, A. M., Schijen, M. R., & Lakens, D. (2021). An excess of positive results: Comparing the standard Psychology literature with Registered Reports. Advances in Methods and Practices in Psychological Science, 4(2). https://doi.org/10.1177/25152459211007467
Schroter, S., Black, N., Evans, S., Godlee, F., Osorio, L., & Smith, R. (2008). What errors do peer reviewers detect, and does training improve their ability to detect them? Journal of the Royal Society of Medicine, 101(10), 507-514. https://doi.org/10.1258/jrsm.2008.080062
Schulz, R., Barnett, A., Bernard, R., Brown, N. J., Byrne, J. A., Eckmann, P., … & Weissgerber, T. L. (2022). Is the future of peer review automated? BMC Research Notes, 15(1), 1-5. https://doi.org/10.1186/s13104-022-06080-6
Schweinsberg, M., Feldman, M., Staub, N., van den Akker, O. R., van Aert, R. C., Van Assen, M. A., … & Schulte-Mecklenbeck, M. (2021). Same data, different conclusions: Radical dispersion in empirical results when independent analysts operationalize and test the same hypothesis. Organizational Behavior and Human Decision Processes, 165, 228-249. https://doi.org/10.1016/j.obhdp.2021.02.003
Serghiou, S., Contopoulos-Ioannidis, D.G., Boyack, K.W., Riedel, N., Wallach, J.D., & Ioannidis, J.P.A. (2021). Assessment of transparency indicators across the biomedical literature: How open is open? PLoS Biology, 19(3): e3001107. https://doi.org/10.1371/journal.pbio.3001107
Silberzahn, R., Uhlmann, E. L., Martin, D. P., Anselmi, P., Aust, F., Awtrey, E., … & Nosek, B. A. (2018). Many analysts, one data set: Making transparent how variations in analytic choices affect results. Advances in Methods and Practices in Psychological Science, 1(3), 337-356. https://doi.org/10.1177/2515245917747646
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant. Psychological Science, 22(11), 1359-1366. https://doi.org/10.1177/0956797611417632
Simonsohn, U., Nelson, L.D. & Simmons, J.P. (2014a). P-Curve: A Key to the File Drawer. Journal of Experimental Psychology: General, 143 (2): 534–547. https://doi.org/10.1037/a0033242
Simonsohn, U., Nelson, L. D., & Simmons, J. P. (2014b). p-curve and effect size: Correcting for publication bias using only significant results. Perspectives on Psychological Science, 9(6), 666-681. https://doi.org/10.1177/1745691614553988
Smaldino, P. E., & McElreath, R. (2016). The natural selection of bad science. Royal Society Open Science, 3(9), 160384. https://doi.org/10.1098/rsos.160384
Smaldino, P. E., Turner, M. A., & Contreras Kallens, P. A. (2019). Open science and modified funding lotteries can impede the natural selection of bad science. Royal Society Open Science, 6(7), 190194. https://doi.org/10.1098/rsos.190194
Smith, R. (2006). Peer review: a flawed process at the heart of science and journals. Journal of the Royal Society of Medicine, 99: 178–182. https://dx.doi.org/10.1177/014107680609900414
Smith, R. (2010). Classical peer review: an empty gun. Breast Cancer Research, 12 (Suppl 4), S13. https://doi.org/10.1186/bcr2742
Soderberg, C. K., Errington, T. M., Schiavone, S. R., Bottesini, J., Thorn, F. S., Vazire, S., … & Nosek, B. A. (2021). Initial evidence of research quality of registered reports compared with the standard publishing model. Nature Human Behaviour, 5(8), 990-997. https://doi.org/10.1038/s41562-021-01142-4
Song, H., Markowitz, D. M., & Taylor, S. H. (2022). Trusting on the shoulders of open giants? Open science increases trust in science for the public and academics. Journal of Communication, 72(4), 497-510. https://doi.org/10.1093/joc/jqac017
Steneck, N. H. (2007). Introduction to the Responsible Conduct of Research. Washington, DC: Department of Health and Human Services, Office of Research Integrity. https://ori.hhs.gov/sites/default/files/2018-04/rcrintro.pdf
Stodden, V., Seiler, J. & Ma, Z. (2018). An empirical analysis of journal policy effectiveness for computational reproducibility. Proceedings of the National Academy of Sciences, 115(11): 2584-2589. https://doi.org/10.1073/pnas.1708290115
Tedersoo, L., Küngas, R., Oras, E., Köster, K., Eenmaa, H., Leijen, Ä., Pedaste, M., Raju, M., Astapova, A., Lukner, H., Kogermann, K. & Sepp, T. (2021). Data sharing practices and data availability upon request differ across scientific disciplines. Scientific Data, 8(1), 1-11. https://doi.org/10.1038/s41597-021-00981-0
Tourangeau, R., & Yan, T. (2007). Sensitive questions in surveys. Psychological Bulletin, 133(5), 859-883. https://doi.org/10.1037/0033-2909.133.5.859
Trisovic, A., Lau, M.K., Pasquier, T. & Crosas, M. (2022). A large-scale study on research code quality and execution. Scientific Data, 9, 60. https://doi.org/10.1038/s41597-022-01143-6
Tsai, A. C., Kohrt, B. A., Matthews, L. T., Betancourt, T. S., Lee, J. K., Papachristos, A. V., … & Dworkin, S. L. (2016). Promises and pitfalls of data sharing in qualitative research. Social Science & Medicine, 169, 191-198. https://doi.org/10.1016/j.socscimed.2016.08.004
Van Elk, M., Matzke, D., Gronau, Q., Guang, M., Vandekerckhove, J., & Wagenmakers, E. J. (2015). Meta-analyses are no substitute for registered replications: A skeptical perspective on religious priming. Frontiers in Psychology, 6: 1365. https://doi.org/10.3389/fpsyg.2015.01365
Vazire, S. (2019, March 14). The Credibility Revolution in Psychological Science. ZPID (Leibniz Institute for Psychology Information). https://doi.org/10.23668/psycharchives.2404.
Vlaeminck, S. (2021). Dawning of a New Age? Economics Journals’ Data Policies on the Test Bench. LIBER Quarterly, 31(1): 1–29. https://doi.org/10.53377/lq.10940
Wicherts, J. M., Borsboom, D., Kats, J., & Molenaar, D. (2006). The poor availability of psychological research data for reanalysis. American Psychologist, 61(7), 726–728. https://doi.org/10.1037/0003-066X.61.7.726
Xie, Y., Wang, K., & Kong, Y. (2021). Prevalence of research misconduct and questionable research practices: a systematic review and meta-analysis. Science and Engineering Ethics, 27(4), 1-28. https://doi.org/10.1007/s11948-021-00314-9
Zavalis, E.A., & Ioannidis, J.P.A. (2022) A meta-epidemiological assessment of transparency indicators of infectious disease models. PLoS ONE 17(10): e0275380. https://doi.org/10.1371/journal.pone.0275380