
A Data Transparency Policy for Results Based on Experiments


Transparency is a key condition for robust and reliable knowledge, and the advancement of scholarship over time. Since January 1, 2020, I have been the Area Editor for Experiments submitted to Nonprofit & Voluntary Sector Quarterly (NVSQ), the leading journal for academic research in the interdisciplinary field of nonprofit research. In order to improve the transparency of research published in NVSQ, the journal is introducing a policy requiring authors of manuscripts reporting on data from experiments to provide, upon submission, access to the data and the code that produced the results reported. This will be a condition for the manuscript to proceed through the blind peer review process.

The policy will be implemented as a pilot for papers reporting results of experiments only. For manuscripts reporting on other types of data, the submission guidelines will not be changed at this time.

 

Rationale

This policy is a step forward in strengthening research in our field through greater transparency about research design, data collection and analysis. Greater transparency of data and analytic procedures will produce fairer, more constructive reviews and, ultimately, even higher quality articles published in NVSQ. Reviewers can only evaluate the methodologies and findings fully when authors describe the choices they made and provide the materials used in their study.

Sample composition and research design features can affect the results of experiments, as can sheer coincidence. To assist reviewers and readers in interpreting the research, it is important that authors describe relevant features of the research design, data collection, and analysis. Such details are also crucial to facilitate replication. NVSQ receives very few replications and thus rarely publishes them, although we are open to doing so. Greater transparency will facilitate the ability to reinforce, or question, research results through replication (Peters, 1973; Smith, 1994; Helmig, Spraul & Tremp, 2012).

Greater transparency is also good for authors. Articles with open data appear to have a citation advantage: they are cited more frequently in subsequent research (Colavizza et al., 2020; Drachen et al., 2016). The evidence is not experimental: the higher citation rank of articles providing access to data may be a result of higher research quality. But regardless of whether the policy improves the quality of new research or attracts higher quality existing research, if higher quality research is the result, then that is exactly what we want.

Previously, the official policy of our publisher, SAGE, was that authors were ‘encouraged’ to make the data available. It is likely, though, that authors were not aware of this policy because it was not mentioned on the journal website. In any case, this voluntary policy clearly did not stimulate the provision of data: data are available for only a small fraction of papers in the journal. Evidence indicates that a data sharing policy alone is ineffective without enforcement (Stodden, Seiler, & Ma, 2018; Christensen et al., 2019). Even when authors include a phrase in their article such as ‘data are available upon request,’ research shows that authors often do not comply with such requests (Wicherts et al., 2006; Krawczyk & Reuben, 2012). Therefore, we are making the provision of data a requirement for the assignment of reviewers.

 

Data Transparency Guidance for Manuscripts using Experiments

Authors submitting manuscripts to NVSQ in which they report results from experiments are kindly requested to provide a detailed description of the target sample and the way in which the participants were invited, informed, instructed, paid, and debriefed. Also, authors are requested to describe all decisions made and questions answered by the participants and to provide access to the stimulus materials and questionnaires. Most importantly, authors are requested to make the data and code that produced the reported findings available to the editors and reviewers. Please make sure you do so anonymously, i.e. without identifying yourself as an author of the manuscript.

When you submit the data, please ensure that you are complying with the requirements of your institution’s Institutional Review Board or Ethics Review Committee, the privacy laws in your country such as the GDPR, and other regulations that may apply. Remove personal information from the data you provide (Ursin et al., 2019). For example, avoid logging IP and email addresses in online experiments, and remove any other personal information that may reveal the identities of participants.
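For illustration, here is a minimal sketch of such a pre-deposit cleaning step in Python with pandas; the file names and the columns ip_address, email, name, and phone are hypothetical placeholders for whatever identifying fields your experiment logged:

```python
# A minimal sketch of stripping personal information before depositing data.
# The column and file names are hypothetical; adapt them to your own dataset.
import pandas as pd

raw = pd.read_csv("experiment_raw.csv")

# Drop direct identifiers that an online experiment may have logged.
identifying_columns = ["ip_address", "email", "name", "phone"]
public = raw.drop(columns=[c for c in identifying_columns if c in raw.columns])

# Replace any internal participant ID with an arbitrary pseudonymous code.
public["participant"] = range(1, len(public) + 1)

# Save the de-identified file; this is the version to deposit and share.
public.to_csv("experiment_public.csv", index=False)
```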

The journal will not host a separate archive. Instead, deposit the data at a platform of your choice, such as Dataverse, Github, Zenodo, or the Open Science Framework. We accept data in Excel (.xls, .csv), SPSS (.sav, .por) with syntax (.sps), data in Stata (.dta) with a do-file, and projects in R.

When authors have successfully submitted the data and code along with the paper, the Area Editor will verify whether the data and code submitted actually produce the results reported. If (and only if) this is the case, the submission will be sent out to reviewers. This means that reviewers will not have to verify the computational reproducibility of the results; instead, they will be able to focus on checking the integrity of the data and the robustness of the results reported.
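As an illustration only (the journal does not prescribe a particular procedure), such a verification can be as simple as re-running the submitted analysis script and comparing the recomputed statistics with the values reported in the manuscript, allowing for rounding. The file names and column layout below are assumptions:

```python
# Hypothetical sketch of a computational reproducibility check: re-run the
# deposited analysis and compare its output to the values in the manuscript.
import subprocess

import pandas as pd

# Re-run the authors' analysis script (the file name is a placeholder).
subprocess.run(["python", "analysis.py"], check=True)

# Assume the script writes its estimates to results.csv with the columns
# 'parameter' and 'estimate'; reported.csv holds the values in the manuscript.
computed = pd.read_csv("results.csv").set_index("parameter")["estimate"]
reported = pd.read_csv("reported.csv").set_index("parameter")["estimate"]

for name, value in reported.items():
    match = abs(computed[name] - value) < 0.005  # tolerance for rounding
    print(f"{name}: reported {value}, recomputed {computed[name]:.4f}",
          "OK" if match else "MISMATCH")
```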

As we introduce the data availability policy, we will closely monitor the changes in the number and quality of submissions, and their scholarly impact, anticipating both collective and private benefits (Popkin, 2019). We have scored the data transparency of 20 experiments submitted in the first six months of 2020, using a checklist counting 49 different criteria. In 4 of these submissions some elements of the research were preregistered. The average transparency was 38 percent. We anticipate that the new policy will improve transparency scores.

The policy takes effect for new submissions on July 1, 2020.

 

Background: Development of the Policy

The NVSQ Editorial Team has been working on policies for enhanced data and analytic transparency for several years, moving forward in a consultative manner.  We established a Working Group on Data Management and Access which provided valuable guidance in its 2018 report, including a preliminary set of transparency guidelines for research based on data from experiments and surveys, interviews and ethnography, and archival sources and social media. A wider discussion of data transparency criteria was held at the 2019 ARNOVA conference in San Diego, as reported here. Participants working with survey and experimental data frequently mentioned access to the data and code as a desirable practice for research to be published in NVSQ.

Eventually, separate sets of guidelines for each type of data will be created, recognizing that commonly accepted standards vary between communities of researchers (Malicki et al., 2019; Beugelsdijk, Van Witteloostuijn, & Meyer, 2020). Regardless of which criteria will be used, reviewers can only evaluate these criteria when authors describe the choices they made and provide the materials used in their study.

 

References

Beugelsdijk, S., Van Witteloostuijn, A. & Meyer, K.E. (2020). A new approach to data access and research transparency (DART). Journal of International Business Studies, https://link.springer.com/content/pdf/10.1057/s41267-020-00323-z.pdf

Christensen, G., Dafoe, A., Miguel, E., Moore, D.A., & Rose, A.K. (2019). A study of the impact of data sharing on article citations using journal policies as a natural experiment. PLoS ONE 14(12): e0225883. https://doi.org/10.1371/journal.pone.0225883

Colavizza, G., Hrynaszkiewicz, I., Staden, I., Whitaker, K., & McGillivray, B. (2020). The citation advantage of linking publications to research data. PLoS ONE 15(4): e0230416, https://doi.org/10.1371/journal.pone.0230416

Drachen, T.M., Ellegaard, O., Larsen, A.V., & Dorch, S.B.F. (2016). Sharing Data Increases Citations. Liber Quarterly, 26 (2): 67–82. https://doi.org/10.18352/lq.10149

Helmig, B., Spraul, K. & Tremp, K. (2012). Replication Studies in Nonprofit Research: A Generalization and Extension of Findings Regarding the Media Publicity of Nonprofit Organizations. Nonprofit and Voluntary Sector Quarterly, 41(3): 360–385. https://doi.org/10.1177%2F0899764011404081

Krawczyk, M. & Reuben, E. (2012). (Un)Available upon Request: Field Experiment on Researchers’ Willingness to Share Supplementary Materials. Accountability in Research, 19:3, 175-186, https://doi.org/10.1080/08989621.2012.678688

Malički, M., Aalbersberg, IJ.J., Bouter, L., & Ter Riet, G. (2019). Journals’ instructions to authors: A cross-sectional study across scientific disciplines. PLoS ONE, 14(9): e0222157. https://doi.org/10.1371/journal.pone.0222157

Peters, C. (1973). Research in the Field of Volunteers in Courts and Corrections: What Exists and What Is Needed. Journal of Voluntary Action Research, 2 (3): 121-134. https://doi.org/10.1177%2F089976407300200301

Popkin, G. (2019). Data sharing and how it can benefit your scientific career. Nature, 569: 445-447. https://www.nature.com/articles/d41586-019-01506-x

Smith, D.H. (1994). Determinants of Voluntary Association Participation and Volunteering: A Literature Review. Nonprofit and Voluntary Sector Quarterly, 23 (3): 243-263. https://doi.org/10.1177%2F089976409402300305

Stodden, V., Seiler, J. & Ma, Z. (2018). An empirical analysis of journal policy effectiveness for computational reproducibility. PNAS, 115(11): 2584-2589. https://doi.org/10.1073/pnas.1708290115

Ursin, G. et al. (2019). Sharing data safely while preserving privacy. The Lancet, 394: 1902. https://doi.org/10.1016/S0140-6736(19)32633-9

Wicherts, J.M., Borsboom, D., Kats, J., & Molenaar, D. (2006). The poor availability of psychological research data for reanalysis. American Psychologist, 61(7), 726-728. http://dx.doi.org/10.1037/0003-066X.61.7.726

Working Group on Data Management and Access (2018). A Data Availability Policy for NVSQ. April 15, 2018. https://renebekkers.files.wordpress.com/2020/06/18_04_15-nvsq-working-group-on-data.pdf



How to review a paper

Including a Checklist for Hypothesis Testing Research Reports *

See https://osf.io/6cw7b/ for a pdf of this post

 

Academia critically relies on our efforts as peer reviewers to evaluate the quality of research that is published in journals. Reading the reviews of others, I have noticed that the quality varies considerably, and that some reviews are not helpful. The added value of a journal article above and beyond the original manuscript or a non-reviewed preprint lies in the changes the authors made in response to the reviews. Through our reviews, we can help to improve the quality of the research. This memo provides guidance on how to review a paper, partly inspired by suggestions provided by Alexander (2005), Lee (1995) and the Committee on Publication Ethics (2017). To improve the quality of the peer review process, I suggest that you use the following guidelines. Some of the guidelines, particularly the criteria at the end of this post, are specific to the kind of research that I tend to review: hypothesis-testing research reports relying on administrative data and surveys, sometimes with an experimental design. But let me start with guidelines that I believe make sense for all research.

Things to check before you accept the invitation
First, I encourage you to check whether the journal aligns with your vision of science. I find that a journal published by an exploitative publisher making a profit in the range of 30%-40% is not worth my time. A journal to which I have submitted my own work and which gave me good reviews is worth the number of reviews I received for my article. Reviewing a revised version of a paper does not count as a separate review.
Next, I check whether I am the right person to review the paper. I think it is a good principle to describe my disciplinary background and expertise in relation to the manuscript I am invited to review. Reviewers do not need to be experts in all respects. If you do not have useful expertise to improve the paper, politely decline.

Then I check whether I know the author(s). If I do, and I have not collaborated with the author(s) and am not currently collaborating with them or planning to do so, I describe how I know the author(s) and ask the editor whether it is appropriate for me to review the paper. If I have a conflict of interest, I notify the editor and politely decline. It is a good principle to let the editor know immediately if you are unable to review a paper, so the editor can start to look for someone else to review it. Your non-response means a delay for the authors and the editor.

Sometimes I get requests to review a paper that I have reviewed before, for a conference or another journal. In these cases I let the editor know and ask whether she would like to see the previous review. For the editor it will be useful to know whether the current manuscript is the same as the version I reviewed before, or includes revisions.

Finally, I check whether the authors have made the data and code available. I have made this a requirement that authors have to fulfil before I accept an invitation to review their work. An exception can be made for data that would be illegal or dangerous to make available, such as datasets that contain identifying information that cannot be removed. In most cases, however, the authors can provide at least partial access to the data by excluding variables that contain personal information.

A paper that does not provide access to the data analyzed and the code used to produce the results is not worth my time. If the paper does not provide a link to the data and the analysis script, I ask the editor to ask the authors to provide the data and the code. I encourage you to do the same. Almost always the editor is willing to ask the authors to provide access. If the editor does not respond to my request, that is a red flag to me, and I decline future invitations from the journal. If the authors do not respond to the editor’s request, or are unwilling to provide access to the data and code, that is a red flag for the editor.

The tone of the review
When I write a review, I think of the ‘golden rule’: treat others as you would like to be treated. I write the review report that I would have liked to receive if I had been the author. I use the following principles:

  • Be honest but constructive. You are not at war. There is no need to burn a paper to the ground.
  • Avoid addressing the authors personally. Say: “the paper could benefit from…” instead of “the authors need”.
  • Stay close to the facts. Do not speculate about reasons why the authors have made certain choices beyond the arguments stated in the paper.
  • Take a developmental approach. Any paper will contain flaws and imperfections. Your job is to improve science by identifying problems and suggesting ways to repair them. Think with the authors about ways they can improve the paper in such a way that it benefits collective scholarship. After a quick glance at the paper, I determine whether I think the paper has the potential to be published, perhaps after revisions. If I think the paper is beyond repair, I explain this to the editor.
  • Try to see beyond bad writing style and mistakes in spelling. Also be mindful of disciplinary and cultural differences between the authors and yourself.

The substance of the advice
In my view, it is a good principle to begin the review report by describing your expertise and the way you reviewed the paper. If you searched the literature, checked the data and verified the results, or ran additional analyses, state this. It will allow the editor to judge how much weight to give to your review.

Then give a brief overview of the paper. If the invitation asks you to provide a general recommendation, consider whether you’d like to give one. Typically, you are invited to recommend ‘reject’, ‘revise & resubmit’ (with major or minor revisions), or ‘accept’. Because the recommendation is the first thing the editor wants to know, it is convenient to state it early in the review.

When giving such a recommendation, I start from the assumption that the authors have invested a great deal of time in the paper and that they want to improve it. Also I consider the desk-rejection rate at the journal. If the editor sent the paper out for review, she probably thinks it has the potential to be published.

To get to the general recommendation, I list the strengths and the weaknesses of the paper. To soften the message you can use the sandwich principle: start with the strengths, then discuss the weaknesses, and conclude with an encouragement.

For authors and editors alike it is convenient to give actionable advice. For the weaknesses in the paper I suggest ways to repair them. I distinguish major issues such as not discussing alternative explanations from minor issues such as missing references and typos. It is convenient for both the editor and the authors to number your suggestions.

The strengths could be points that the authors are underselling. In that case, I identify them as strengths that the authors can emphasize more strongly.

It is handy to refer to issues with direct quotes and page numbers. For example, to refer to the previous sentence: “As the paper states on page 3, it is handy to use ‘direct quotes and page numbers’.”

In 2016 I started to sign my reviews. This is an accountability device: by exposing who I am to the authors of the paper I’m reviewing, I set higher standards for myself. I encourage you to think about this as an option, though I can imagine that you may not want to risk retribution as a graduate student or an early career researcher. Also, some editors do not appreciate signed reviews and may remove your identifying information.

How to organize the review work
Usually, I read a paper twice. First, I go over the paper superficially and quickly; I do not read it closely. This gives me a sense of where the authors are going. After the first superficial reading, I determine whether the paper is good enough to be revised and resubmitted, and if so, I provide more detailed comments. After the report is done, I revisit my initial recommendation.

The second time I go over the paper, I do a very close reading. Because the authors had a word limit, I assume that every word in the manuscript is necessary: the paper should contain no repetitions. Some of the information may be in the supplementary information provided with the paper.

Below you find a checklist of things I look for in a paper. The checklist reflects the kind of research that I tend to review, which is typically testing a set of hypotheses based on theory and previous research with data from surveys, experiments, or archival sources. For other types of research – such as non-empirical papers, exploratory reports, and studies based on interviews or ethnographic material – the checklist is less appropriate. The checklist may also be helpful for authors preparing research reports.

I realize that this is an extensive set of criteria for reviews. It sets the bar pretty high. A review checking each of the criteria will take you at least three hours, but more likely between five and eight hours. As a reviewer, I do not always check all criteria myself. Some of the checks do not necessarily have to be done by peer reviewers. For instance, some journals employ data editors who check whether the data and code provided by authors produce the results reported.

I do hope that journals and editors can reach a consensus on a set of minimum criteria that the peer review process should cover, or at least provide clarity about the criteria that they do check.

After the review
If the authors have revised their paper, it is a good principle to avoid making new demands for the second round that you have not made before. Otherwise the revise and resubmit path can be very long.

 

References
Alexander, G.R. (2005). A Guide to Reviewing Manuscripts. Maternal and Child Health Journal, 9 (1): 113-117. https://doi.org/10.1007/s10995-005-2423-y
Committee on Publication Ethics Council (2017). Ethical guidelines for peer reviewers. https://publicationethics.org/files/Ethical_Guidelines_For_Peer_Reviewers_2.pdf
Lee, A.S. (1995). Reviewing a manuscript for publication. Journal of Operations Management, 13: 87-92. https://doi.org/10.1016/0272-6963(95)94762-W

 

Review checklist for hypothesis testing reports

Research question

  1. Is it clear from the beginning what the research question is? If it is in the title, that’s good. The first part of the abstract is a good place too. Is it only at the end of the introduction section? In most cases that is too late.
  2. Is it clearly formulated? By the research question alone, can you tell what the paper is about?
  3. Does the research question align with what the paper actually does – or can do – to answer it?
  4. Is it important for theory and methods to know the answer to the research question?
  5. Does the paper address a question that is important from a societal or practical point of view?

 

Research design

  1. Does the research design align with the research question? If the question is descriptive, do the data actually allow for a representative and valid description? If the question is a causal question, do the data allow for causal inference? If not, ask the authors to report ‘associations’ rather than ‘effects’.
  2. Is the research design clearly described? Does the paper report all the steps taken to collect the data?
  3. Does the paper identify mediators of the alleged effect? Does the paper identify moderators as boundary conditions?
  4. Is the research design watertight? Does the study allow for alternative interpretations?
  5. Has the research design been preregistered? Does the paper refer to a public URL where the preregistration is posted? Does the preregistration include a statistical power analysis? Is the number of observations sufficient for statistical tests of hypotheses? Are deviations from the preregistered design reported?
  6. Has the experiment been approved by an Institutional Review Board or Ethics Review Board (IRB/ERB)? What is the IRB registration number?

 

Theory

  1. Does the paper identify multiple relevant theories?
  2. Does the theory section specify hypotheses? Have the hypotheses been formulated before the data were collected? Before the data were analyzed?
  3. Do hypotheses specify arguments why two variables are associated? Have alternative arguments been considered?
  4. Is the literature review complete? Does the paper cover the most relevant previous studies, also outside the discipline? Provide references to research that is not covered in the paper, but should definitely be cited.

 

Data & Methods

  1. Target group – Is it identified? If the target group is humanity at large, is the sample a good sample of humanity? Does it cover all relevant units?
  2. Sample – Does the paper identify the procedure used to obtain the sample from the target group? Is the sample a random sample? If not, has selective non-response been examined and dealt with, and have constraints on generality been identified as a limitation?
  3. Number of observations – What is the statistical power of the analysis? Does the paper report a power analysis?
  4. Measures – Does the paper provide the complete topic list, questionnaire, instructions for participants? To what extent are the measures used valid? Reliable?
  5. Descriptive statistics – Does the paper provide a table of descriptive statistics (minimum, maximum, mean, standard deviation, number of observations) for all variables in the analyses? If not, ask for such a table.
  6. Outliers – Does the paper identify treatment of outliers, if any?
  7. Is the multi-level structure (e.g., persons in time and space) identified and taken into account in an appropriate manner in the analysis? Are standard errors clustered?
  8. Does the paper report statistical mediation analyses for all hypothesized explanation(s)? Do the mediation analyses evaluate multiple pathways, or just one?
  9. Do the data allow for testing additional explanations that are not reported in the paper?

 

Results

  1. Can the results be reproduced from the data and code provided by the authors?
  2. Are the results robust to different specifications?

Conclusion

  1. Does the paper give a clear answer to the research question posed in the introduction?
  2. Does the paper identify implications for the theories tested, and are they justified?
  3. Does the paper identify implications for practice, and are they justified given the evidence presented?

 

Discussion

  1. Does the paper revisit the limitations of the data and methods?
  2. Does the paper suggest future research to repair the limitations?

 

Meta

  1. Does the paper have an author contribution note? Is it clear who did what?
  2. Are all analyses reported, if they are not in the main text, are they available in an online appendix?
  3. Are references up to date? Does the reference list include a reference to the dataset analyzed, including a URL/DOI?

 

 

* This work is licensed under a Creative Commons Attribution 4.0 International License. Thanks to colleagues at the Center for Philanthropic Studies at Vrije Universiteit Amsterdam, in particular Pamala Wiepking, Arjen de Wit, Theo Schuyt and Claire van Teunenbroek, for insightful comments on the first version. Thanks to Robin Banks, Pat Danahey Janin, Rense Corten, David Reinstein, Eleanor Brilliant, Claire Routley, Margaret Harris, Brenda Bushouse, Craig Furneaux, Angela Eikenberry, Jennifer Dodge, and Tracey Coule for responses to the second draft. The current text is the fourth draft. The most recent version of this paper is available as a preprint at https://doi.org/10.31219/osf.io/7ug4w. Suggestions continue to be welcome at r.bekkers@vu.nl.



Closing the Age of Competitive Science

In the prehistoric era of competitive science, researchers were like magicians: they earned a reputation for tricks that nobody could repeat and shared their secrets only with trusted disciples. In the new age of open science, researchers share by default, not only with peer reviewers and fellow researchers, but with the public at large. The transparency of open science reduces the temptation of private profit maximization and the collective inefficiency of the information asymmetries inherent in competitive markets. In a seminar organized by the University Library at Vrije Universiteit Amsterdam on November 1, 2018, I discussed recent developments in open science and its implications for research careers and progress in knowledge discovery. The slides are posted here. The podcast is here.



Tools for the Evaluation of the Quality of Experimental Research


Experiments can have important advantages over other research designs. The most important advantage of experiments concerns internal validity. Random assignment to treatment reduces the attribution problem and increases the possibilities for causal inference. An additional advantage is that control over the experimental setting reduces the heterogeneity of the treatment effects observed.

The extent to which these advantages are realized in the data depends on the design and execution of the experiment. Experiments are of higher quality if the sample size is larger and the theoretical concepts are measured more reliably and with higher validity. The sufficiency of the sample size can be checked with a power analysis. For most effect sizes in the social sciences, which are small (d = 0.2), a sample of 1300 participants is required to detect them at conventional significance levels (p < .05) with 95% power (see appendix). Even for a stronger effect size (d = 0.4), more than 300 participants are required. The reliability of normative scale measures can be judged with Cronbach’s alpha. A rule of thumb for unidimensional scales is that alpha should be at least .63 for a scale consisting of 4 items, .68 for 5 items, .72 for 6 items, .75 for 7 items, and so on. The validity of measures should be justified theoretically and can be checked with a manipulation check, which should reveal a sizeable and significant association with the treatment variables.
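These numbers are easy to verify with standard tools. Below is a minimal sketch in Python, using statsmodels for the power calculation and a hand-rolled Cronbach’s alpha; the simulated 5-item scale is a placeholder for your own item scores:

```python
# Sketch: required sample size for a two-group comparison, and Cronbach's alpha.
import numpy as np
import pandas as pd
from statsmodels.stats.power import TTestIndPower

# Sample size per group to detect d = 0.2 and d = 0.4 at alpha = .05 (two-sided), 95% power.
n_small = TTestIndPower().solve_power(effect_size=0.2, alpha=0.05, power=0.95)
n_medium = TTestIndPower().solve_power(effect_size=0.4, alpha=0.05, power=0.95)
print(round(n_small) * 2, round(n_medium) * 2)  # roughly 1300, and a bit over 300, in total

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a data frame with one column per scale item."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)

# Placeholder example: a simulated 5-item scale measuring one latent trait.
rng = np.random.default_rng(1)
latent = rng.normal(size=500)
items = pd.DataFrame({f"item{i}": latent + rng.normal(size=500) for i in range(1, 6)})
print(round(cronbach_alpha(items), 2))  # well above the .68 rule of thumb for 5 items
```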

The advantages of experiments are reduced if assignment to treatment is non-random and treatment effects are confounded. In addition, a variety of other problems may endanger internal validity. Shadish, Cook & Campbell (2002) provide a useful list of such problems.

Also it should be noted that experiments can have important disadvantages. The most important disadvantage is that the external validity of the findings is limited to the participants in the setting in which their behavior was observed. This disadvantage can be avoided by creating more realistic decision situations, for instance in natural field experiments, and by recruiting (non-‘WEIRD’) samples of participants that are more representative of the target population. As Henrich, Heine & Norenzayan (2010) noted, results based on samples of participants in Western, Educated, Industrialized, Rich and Democratic (WEIRD) countries have limited validity in the discovery of universal laws of human cognition, emotion or behavior.

Recently, experimental research paradigms have received fierce criticism. Results of research often cannot be reproduced (Open Science Collaboration, 2015), and publication bias is ubiquitous (Ioannidis, 2005). It has become clear that there is a lot of undisclosed flexibility in all phases of the empirical cycle. While these problems have been discussed widely in communities of researchers conducting experiments, they are by no means limited to one particular methodology or mode of data collection. It is likely that they also occur in communities of researchers using survey or interview data.

In the positivist paradigm that dominates experimental research, the empirical cycle starts with the formulation of a research question. To answer the question, hypotheses are formulated based on established theories and previous research findings. Then the research is designed, data are collected, a predetermined analysis plan is executed, results are interpreted, the research report is written and submitted for peer review. After the usual round(s) of revisions, the findings are incorporated in the body of knowledge.

The validity and reliability of results from experiments can be compromised in two ways. The first is by juggling with the order of phases in the empirical cycle. Researchers can decide to amend their research questions and hypotheses after they have seen the results of their analyses. Kerr (1998) labeled the practice of reformulating hypotheses HARKing: Hypothesizing After Results are Known. Amending hypotheses is not a problem when the goal of the research is to develop theories to be tested later, as in grounded theory or exploratory analyses (e.g., data mining). But in hypothesis-testing research HARKing is a problem, because it increases the likelihood of publishing false positives. Chance findings are interpreted post hoc as confirmations of hypotheses that a priori are rather unlikely to be true. When these findings are published, they are unlikely to be reproducible by other researchers, creating research waste and, worse, reducing the reliability of published knowledge.

The second way the validity and reliability of results from experiments can be compromised is by misconduct and sloppy science within various stages of the empirical cycle (Simmons, Nelson & Simonsohn, 2011). The data collection and analysis phase as well as the reporting phase are most vulnerable to distortion by fraud, p-hacking and other questionable research practices (QRPs).

  • In the data collection phase, observations that (if kept) would lead to undesired conclusions or non-significant results can be altered or omitted. Also, fake observations can be added (fabricated).
  • In the analysis of data researchers can try alternative specifications of the variables, scale constructions, and regression models, searching for those that ‘work’ and choosing those that reach the desired conclusion.
  • In the reporting phase, things go wrong when the search for alternative specifications and the sensitivity of the results to decisions made in the data analysis phase are not disclosed.
  • In the peer review process, there can be pressure from editors and reviewers to cut reports of non-significant results, or to collect additional data supporting the hypotheses and the significant results reported in the literature.

The results of these QRPs are that null findings are less likely to be published, that published research is biased towards positive findings confirming the hypotheses, and that published findings are not reproducible: when a replication attempt is made, the findings turn out to be less significant, less often positive, and of a smaller effect size (Open Science Collaboration, 2015).

Alarm bells, red flags and other warning signs

Some of the forms of misconduct mentioned above are very difficult to detect for reviewers and editors. When observations are fabricated or omitted from the analysis, only inside information, very sophisticated data detectives, or sheer stupidity on the part of the authors can help us. Many other forms of misconduct are also difficult to prove. While smoking guns are rare, we can look for clues. I have developed a checklist of warning signs and good practices that editors and reviewers can use to screen submissions (see below). The checklist uses terminology that is not specific to experiments, but applies to all forms of data. While a high number of warning signs in itself does not prove anything, it should alert reviewers and editors. There is no norm for the number of flags. The table below only mentions the warning signs; the paper version of this blog post also shows a column with the positive poles. Those who would like to count good practices and reward authors for a higher number can count gold stars rather than red flags. The checklist was developed independently of the checklist that Wicherts et al. (2016) recently published.

Warning signs

  • The power of the analysis is too low.
  • The results are too good to be true.
  • All hypotheses are confirmed.
  • P-values are just below critical thresholds (e.g., p<.05)
  • A groundbreaking result is reported but not replicated in another sample.
  • The data and code are not made available upon request.
  • The data are not made available upon article submission.
  • The code is not made available upon article submission.
  • Materials (manipulations, survey questions) are described superficially.
  • Descriptive statistics are not reported.
  • The hypotheses are tested in analyses with covariates and results without covariates are not disclosed.
  • The research is not preregistered.
  • No details of an IRB procedure are given.
  • Participant recruitment procedures are not described.
  • Exact details of time and location of the data collection are not described.
  • A power analysis is lacking.
  • Unusual / non-validated measures are used without justification.
  • Different dependent variables are analyzed in different studies within the same article without justification.
  • Variables are (log)transformed or recoded in unusual categories without justification.
  • Numbers of observations mentioned at different places in the article are inconsistent. Loss or addition of observations is not justified.
  • A one-sided test is reported when a two-sided test would be appropriate.
  • Test-statistics (p-values, F-values) reported are incorrect.
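The last warning sign on this list can often be checked mechanically, in the spirit of apps such as Statcheck: recompute the p-value from the reported test statistic and degrees of freedom and compare it with the p-value in the paper. A minimal sketch in Python (the reported values are made-up examples):

```python
# Sketch of a consistency check on reported test statistics, in the spirit of
# Statcheck: does the reported p-value match the reported t-value and df?
from scipy import stats

# (t, df, reported p) triples; the numbers below are made-up examples.
reported_tests = [
    (2.10, 98, 0.038),
    (1.70, 45, 0.04),   # suspicious: the recomputed two-sided p is about .096
]

for t, df, p_reported in reported_tests:
    p_recomputed = 2 * stats.t.sf(abs(t), df)  # two-sided p-value
    flag = "OK" if abs(p_recomputed - p_reported) < 0.01 else "CHECK"
    print(f"t({df}) = {t}: reported p = {p_reported}, recomputed p = {p_recomputed:.3f} [{flag}]")
```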

With the increasing number of retractions of articles reporting on experimental research published in scholarly journals, awareness of the fallibility of peer review as a quality control mechanism has increased. Communities of researchers employing experimental designs have formulated solutions to these problems. In the review and publication stage, the following solutions have been proposed.

  • Access to data and code. An increasing number of science funders require grantees to provide open access to the data and the code that they have collected. Likewise, authors are required to provide access to data and code at a growing number of journals, such as Science, Nature, and the American Journal of Political Science. Platforms such as Dataverse, the Open Science Framework and Github facilitate sharing of data and code. Some journals do not require access to data and code, but provide Open Science badges for articles that do provide access.
  • Pledges, such as the ‘21 word solution’, a statement designed by Simmons, Nelson and Simonsohn (2012) that authors can include in their paper to affirm that they have not fudged the data: “We report how we determined our sample size, all data exclusions (if any), all manipulations, and all measures in the study.”
  • Full disclosure of methodological details of research submitted for publication, for instance through psychdisclosure.org, is now required by major journals in psychology.
  • Apps such as Statcheck, p-curve, p-checker, and r-index can help editors and reviewers detect fishy business. They also have the potential to improve research hygiene when researchers start using these apps to check their own work before they submit it for review.

As these solutions become more commonly used we should see the quality of research go up. The number of red flags in research should decrease and the number of gold stars should increase. This requires not only that reviewers and editors use the checklist, but most importantly, that also researchers themselves use it.

The solutions above should be supplemented by better research practices before researchers submit their papers for review. In particular, two measures are worth mentioning:

  • Preregistration of research, for instance on Aspredicted.org. An increasing number of journals in psychology require research to be preregistered. Some journals guarantee publication of research regardless of its results after a round of peer review of the research design.
  • Increasing the statistical power of research is one of the most promising strategies to increase the quality of experimental research (Bakker, Van Dijk & Wicherts, 2012). In many fields and for many decades, published research has been underpowered, using samples of participants that are not large enough to detect the reported effect sizes. Using larger samples reduces the likelihood of both false positives and false negatives.

A variety of institutional designs have been proposed to encourage the use of the solutions mentioned above, including reducing the career incentives, in hiring and promotion decisions, to use questionable research practices, rewarding researchers for good conduct through badges, the adoption of voluntary codes of conduct, and the socialization of students and senior staff through teaching and workshops. Research funders, journals, editors, authors, reviewers, universities, senior researchers and students all have a responsibility in these developments.

References

Bakker, M., Van Dijk, A. & Wicherts, J. (2012). The Rules of the Game Called Psychological Science. Perspectives on Psychological Science, 7(6): 543–554.

Henrich, J., Heine, S.J., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33: 61 – 135.

Ioannidis, J.P.A. (2005). Why Most Published Research Findings Are False. PLoS Medicine, 2(8): e124. http://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.0020124

Kerr, N.L. (1998). HARKing: Hypothesizing After Results are Known. Personality and Social Psychology Review, 2: 196-217.

Open Science Collaboration (2015). Estimating the Reproducibility of Psychological Science. Science, 349. http://www.sciencemag.org/content/349/6251/aac4716.full.html

Shadish, W.R., Cook, T.D., & Campbell, D.T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin.

Simmons, J.P., Nelson, L.D., & Simonsohn, U. (2011). False positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22: 1359–1366.

Simmons, J.P., Nelson, L.D. & Simonsohn, U. (2012). A 21 Word Solution. Available at SSRN: http://ssrn.com/abstract=2160588

Wicherts, J.M., Veldkamp, C.L., Augusteijn, H.E., Bakker, M., Van Aert, R.C. & Van Assen, M.A.L.M. (2016). Researcher degrees of freedom in planning, running, analyzing, and reporting psychological studies: A checklist to avoid p-hacking. Frontiers in Psychology, 7: 1832. http://journal.frontiersin.org/article/10.3389/fpsyg.2016.01832/abstract



Four Reasons Why We Are Converting to Open Science

The Center for Philanthropic Studies I am leading at VU Amsterdam is converting to Open Science.

Open Science offers four advantages to the scientific community, nonprofit organizations, and the public at large:

  1. Access: we make our work more easily accessible for everyone. Our research serves public goods, which are served best by open access.
  2. Efficiency: we make it easier for others to build on our work, which saves time.
  3. Quality: we enable others to check our work, find flaws and improve it.
  4. Innovation: ultimately, open science facilitates the production of knowledge.

What does the change mean in practice?

First, the source of funding for contract research we conduct will always be disclosed.

Second, data collection – interviews, surveys, experiments – will follow a prespecified protocol. This includes the number of observations foreseen, the questions to be asked, measures to be included, hypotheses to be tested, and analyses to be conducted. New studies will preferably be preregistered.

Third, data collected and the code used to conduct the analyses will be made public, through the Open Science Framework for instance. Obviously, personal or sensitive data will not be made public.

Fourth, results of research will preferably be published in open access mode. This does not mean that we will publish only in Open Access journals. Research reports and papers for academic journals will be made available online in working paper archives, as a ‘preprint’ version, or in other ways.

 

December 16, 2015 update:

A fifth reason, following directly from #1 and #2, is that open science reduces the costs of science for society.

See this previous post for links to our Giving in the Netherlands Panel Survey data and questionnaires.

 

July 8, 2017 update:

A public use file of the Giving in the Netherlands Panel Survey and the user manual are posted at the Open Science Framework.



What Does the CBF-Keur Tell You About Charities?

Het Financieel Dagblad devotes a long article to the meaning of the CBF-Keur (the seal of approval for charities issued by the Centraal Bureau Fondsenwerving), prompted by the question: “Where does my donated euro go?” According to the newspaper, the “seal of approval and accounting rules are no guarantee of meaningful spending.” Further on in the article, my name appears next to the claim that the CBF-Keur ‘does not rule out fraud or excessively high costs’ and even that it is ‘meaningless.’ It is true that the fact that a charity holds the CBF-Keur does not mean that the organization works perfectly. The seal does not make fraud impossible, nor does it always force organizations to spend the available funds as efficiently as possible. In the past, irregularities at several CBF-Keur holders have made the news, and for some organizations this led to the withdrawal of the seal.

But the CBF-Keur is not entirely ‘meaningless’ either, and that is not how I see it. The CBF-Keur does tell you something. Before an organization may carry the CBF-Keur, it has to go through an extensive procedure to meet requirements on financial reporting, independence of the board, costs of fundraising, and the formulation of policy plans. These are relevant criteria. They ensure that, as a donor, you can trust that the organization works in a professional manner. The CBF-Keur just does not say much about the efficiency of a charity’s spending. Many people think it does, as we found in research from 2009.

It is tricky territory. A guarantee is something you get on a product you buy in a store, which allows you to return it if it does not work or breaks down within a short time. Such guarantees are hard to give for donations to charities. A guarantee of this kind could only be given if the quality of the work of charitable organizations could be monitored and a minimum standard could be formulated for it. That seems impossible to me. The CBF-Keur is not like a driving license that you need before you may drive a car. The market for charities is freely accessible; everyone is allowed on the road. Some charities have a seal of approval, but it mainly tells you how much they paid for the petrol, what kind of car they drive, and who is behind the wheel. It says little about the number of accidents they have ever been involved in or caused, or about whether they are taking the shortest or the fastest route.

Last year the commissie-De Jong proposed to establish an authority for philanthropy, which would vet organizations before they are allowed to enter the market for charities. There would be a charity police force that could also monitor compliance with the rules and impose fines. That proposal was too expensive for the government. For the charities it was unattractive because they would have to comply with new rules. Moreover, it was not clear whether those new rules would actually reduce the number of accidents. At the moment it is not clear at all how well the drivers of charities know the road and how many accidents they cause. A better system would start with a measurement of the number of violations in charity traffic and a count of the number of drivers with and without a license. It would then be good to set up a driving school that everyone who wants to enter the market can attend, so that they can acquire the skills every driver should have. I hope the article in Het Financieel Dagblad leads to a discussion that makes this clear.

In the meantime, the CBF has responded with the assurance that it is working on guidelines for ‘reactive supervision of performance.’ The VFI, the branch association of charities, issued a response along the same lines. That is good news. But those new guidelines are still a long way off. Until then, the CBF issues seals of approval and the Dutch media (after Finland the freest in the world) occasionally publish a speed-camera photo of road abusers. That appears to be enough to let charity traffic regulate itself and to prevent the worst accidents. Because there are only a few of those.



Valkuilen in het nieuwe systeem van toezicht op goededoelenorganisaties

Deze bijdrage verscheen op 27 januari op Filanthropium.nl.
Dank aan Theo Schuyt voor commentaar op een eerdere versie van dit stuk en aan Sigrid Hemels en Frans Nijhof voor correcties van enkele feitelijke onjuistheden. PDF? Klik hier.

De contouren van het toezicht op goededoelenorganisaties in de toekomst worden zo langzamerhand duidelijk. Het nieuwe systeem is een compromis dat op termijn veel kan veranderen, maar net zo goed een faliekante mislukking kan worden.

Hoe ziet het nieuwe systeem eruit?
In opdracht van de Minister van Veiligheid en Justitie heeft de Commissie De Jong een voorstel gedaan voor een nieuw systeem. De commissie stelt voor een Autoriteit Filantropie op te richten die de fondsenwervende goededoelenorganisaties moet registreren. De autoriteit is een nieuw orgaan dat onder het ministerie van Veiligheid en Justitie valt, maar eigen wettelijke bevoegdheden krijgt. Burgers kunnen de registratie online raadplegen. Het uitgangspunt van het nieuwe systeem is een kostenbesparing. Geregistreerde goededoelenorganisaties hoeven geen keurmerk meer aan te vragen en krijgen automatisch toegang tot de markt voor fondsenwerving. Organisaties die geen fondsen werven zoals vermogensfondsen en organisaties die alleen onder hun leden fondsen werven zoals kerken hoeven zich niet te registreren. De autoriteit maakt de huidige registratie van Algemeen Nut Beogende Instellingen (ANBI’s) door de belastingdienst grotendeels overbodig.

Winnaars en verliezers
Het nieuwe systeem is een overwinning voor vijf partijen: de vermogensfondsen, de kerken, de bekende goededoelenorganisaties, het Ministerie van Veiligheid en Justitie, en de Belastingdienst. De meeste vermogensfondsen en de kerken winnen in het nieuwe systeem omdat zij niet door de registratie heen hoeven wanneer zij geen fondsen werven onder het publiek. Zij blijven als ANBI’s geclassicificeerd bij de belastingdienst. De bekende goededoelenorganisaties winnen in het systeem omdat zij invloed krijgen op de criteria die voor registratie zullen gaan gelden. Het Ministerie van Veiligheid en Justitie wint omdat zij volledige controle krijgt over goededoelenorganisaties. De Belastingdienst wint omdat zij afscheid kan nemen van een groot aantal werknemers die voor de registratie van goededoelenorganisaties zorgden.

De verliezers in het nieuwe systeem zijn de huidige toezichthouders op goededoelenorganisaties (waaronder het Centraal Bureau Fondsenwerving , CBF) en de kleinere goededoelenorganisaties. Het CBF verliest klanten omdat de nieuwe registratie gaat gelden als toegangsbewijs voor de Nederlandse markt voor fondsenwerving en daarmee het keurmerk van het CBF (en een aantal andere, minder bekende, keurmerken) overbodig maakt. De autoriteit filantropie krijgt de mogelijkheid overtreders te beboeten. De Belastingdienst heeft deze mogelijkheid in het huidige systeem niet, zij kan alleen de ANBI-status intrekken. Ook het CBF kan geen boetes innen, maar alleen het keurmerk intrekken.

De criteria waarop potentiële gevers goededoelenorganisaties kunnen gaan beoordelen zijn nog niet geformuleerd. Omdat de kleinere goededoelenorganisaties in Nederland niet of niet goed georganiseerd zijn is het lastig om hun belangen in de autoriteit filantropie een stem te geven. Het gevaar dreigt dat de grotere goededoelenorganisaties de overhand krijgen in de discussie over de regels. Ook is onduidelijk hoe streng de controle gaat worden. De belastingdienst gaat deze controle in ieder geval niet meer doen. De commissie stelt voor dat vooral voorafgaand aan de registratie controle plaatsvindt.

De winst- en verliesrekening voor de burger – als potentiële gever en belastingbetaler – is minder duidelijk. De kosten van de hele operatie zijn niet berekend. De commissie stelt voor dat alle organisaties die zich registreren om toegang te krijgen tot de Nederlandse markt voor fondsenwerving mee gaan betalen. Het ANBI-register telt momenteel zo’n 60.000 inschrijvingen; een deel betreft organisaties die zich in het nieuwe systeem niet meer hoeven te registreren (kerken, vermogensfondsen). Als er 20.000 registraties overblijven kan het nieuwe systeem voor de goededoelenorganisaties aanmerkelijk goedkoper worden. Op dit moment betalen 269 landelijk wervende goededoelenorganisaties voor het CBF-keurmerk. In het huidige systeem worden alle keurmerkhouders gecontroleerd. De autoriteit zal slechts steekproefsgewijs en bij klachten controles uitvoeren.

Gebrekkige probleemanalyse
Het advies vertrekt vanuit de probleemanalyse dat het vertrouwen in goededoelenorganisaties daalt door schandalen en affaires. Deze analyse is niet goed onderbouwd. De publieke verontwaardiging over de salariëring van (interim)managers zoals bij Plan Nederland en de Hartstichting in 2004 en het breken van (onmogelijke) beloften over gratis fondsenwerving zoals bij Alpe D’huZes vorig jaar bedreigen vooral de inkomsten van getroffen organisaties, niet de giften aan de goededoelensector als geheel. Het vertrouwen in goededoelenorganisaties daalt al tijden structureel, zo blijkt uit het Geven in Nederland onderzoek van de Vrije Universiteit en de peilingen van het Nederlands Donateurs Panel.

Vervolgens stelt het advies dat het doel van een nieuw systeem is om het vertrouwen in goededoelenorganisaties onder burgers te vergroten. Dat burgers in vertrouwen moeten kunnen geven door het nieuwe systeem lijkt een legitiem doel. Het is echter de vraag of overheid de imago- en communicatieproblemen van de goededoelensector op moet lossen. We zouden de sector daar immers ook zelf verantwoordelijk voor kunnen houden, zoals in de Verenigde Staten gebeurt. Voor het imago van de overheid en het vertrouwen in de politiek is het echter verstandig de controle op organisaties die fiscale voordelen krijgen waterdicht te maken, zodat er geen vragen komen over de doelmatigheid van de besteding van belastinggeld. Daarnaast is het vanuit de politieke keuze voor de participatiesamenleving verstandig meer inzicht te vragen in de prestaties van goededoelenorganisaties. Als burgers zelf meer verantwoordelijkheid krijgen voor het publiek welzijn via goededoelenorganisaties willen we wel kunnen zien of zij die verantwoordelijkheid inderdaad waarmaken. Dat zou via het register van de Autoriteit Filantropie kunnen.

Nieuw systeem zorgt niet automatisch voor meer vertrouwen
Het is echter de vraag of de burger door het nieuwe systeem ook inderdaad weer meer vertrouwen krijgt in goededoelenorganisaties. Het advies van de Commissie de Jong heeft veel details van het nieuwe systeem nog niet ingevuld. Vertrouwen drijft op de betrouwbaarheid van de controlerende instantie. Die organisatie moet onafhankelijk én streng zijn. Conflicterende belangen bedreigen het vertrouwen. Als de te controleren organisaties vertegenwoordigd zijn in de autoriteit of haar activiteiten kunnen beïnvloeden is zij niet onafhankelijk. Een gebrek aan controle is eveneens een risicofactor voor het publieksvertrouwen, vooral als er later problemen blijken te zijn. Het is belangrijk dat de autoriteit proactief handelt en niet slechts achteraf na gebleken onregelmatigheden een onderzoek instelt.

Blijkbaar is er iets mis met de huidige controle. De probleemanalyse van de commissie de Jong gaat ook op dit punt kort door de bocht. Het advies omschrijft niet hoe de controle op goededoelenorganisaties op dit moment plaatsvindt. De commissie analyseert evenmin wat de problemen zijn in het huidige systeem. Op dit moment gebeurt de controle op goededoelenorganisaties niet door de overheid. De belastingdienst registreert ‘Algemeen nut beogende instellingen’ (ANBI’s), maar controleert deze instellingen nauwelijks als ze eenmaal geregistreerd zijn.

Sterke en zwakke punten van het huidige systeem
In feite heeft de overheid de controle op goededoelenorganisaties nu uitbesteed aan een vrije markt van toezichthouders. Dit zijn organisaties zoals het CBF die keurmerken verstrekken. In theorie is dit een goed werkend systeem omdat de vrijwillige deelname een signaal van kwaliteit geeft aan potentiële donateurs. Goededoelenorganisaties kunnen ervoor kiezen om aan eisen te voldoen die aan deze keurmerken zijn verbonden. Organisaties die daarvoor kiezen willen en kunnen openheid geven; de organisaties die dat niet doen laden de verdenking op zich dat zij minder betrouwbaar zijn. Het systeem werkt als de toezichthouder onafhankelijk is, de controle streng, en de communicatie daarover effectief. Het CBF heeft in de afgelopen jaren echter verzuimd om de criteria scherp te hanteren en uit te leggen aan potentiële donateurs. Het CBF-Keur stelt bijvoorbeeld geen maximum aan de salarissen van medewerkers. Ook bij de onafhankelijkheid kunnen vragen worden gesteld. De grote goededoelenorganisaties zijn met twee afgevaardigden van de VFI vertegenwoordigd in het CBF, en zijn daarnaast in feite klanten die betalen voor de kosten van het systeem. Zij hebben er belang bij de eisen niet aan te scherpen omdat dan de kosten te hoog oplopen.

Two pitfalls
In the new system, both the independence of the authority and the probability of detection threaten to cause problems. The committee leaves it to the authority to determine which rules will be applied. But who will sit on that authority? The committee recommends that 'various stakeholders (sector, academia, government)' be represented on the board of the authority. It is unclear, however, which parties will actually take a seat in the authority, and in what balance of power. What is clear is that the authority will initially rely on the sector's capacity for self-regulation. The charitable organizations may thus propose the rules themselves. The committee then places the responsibility for establishing the rules with the government, and more specifically with the Minister of Security and Justice. The question is to what extent the minister will be susceptible to lobbying by charitable organizations.

The committee also proposes to carry out checks on the basis of risk analyses and to rely on complaints. That may prove to be sufficient in practice. However, the new system takes minimizing costs as its starting point, and these costs must moreover be borne by the charitable organizations being supervised. They thus acquire an interest in keeping the checks as light and superficial as possible. If there is insufficient oversight, as is the case in the United States, registered organizations will also turn out to be untrustworthy. That, of course, would be outright disastrous for trust.

The De Jong Committee furthermore proposes that all fundraising charitable organizations of any meaningful size be required to register. The new system offers no insight into the performance of endowed foundations and churches, because they remain under the tax authorities. They receive preferential treatment because they do not raise funds, or do so only among their members. This is a spurious argument: the public benefit criterion concerns not the origin of the funds but the performance. The activities of churches and endowed foundations must also serve the public benefit.

Mandatory registration is probably not an effective means of increasing public trust. A mandatory registration has no signaling function for potential donors. If all charitable organizations meet the requirements, are they then all equally trustworthy? That is not very likely. Either the bar in the new system is set so low that all organizations can clear it, or the bar is set high on paper while in practice the checks amount to nothing.

It would be far better to have the authority design a voluntary star-rating system, in which donors can see how professional an organization is from the number of stars awarded after independent review. Donors can then prefer more professional organizations, at least insofar as they are willing to pay for that. Raising money costs money, and spending money effectively does too. With a financial information leaflet, the Autoriteit Filantropie could make clear what the expected risks of private investments in charitable organizations are. In that way the market forces charitable organizations to compete on performance for public welfare. A truly independent authority that strictly monitors compliance with (optionally) strict or less strict rules is also possible within those contours, and given the societal significance of philanthropy in the Netherlands it seems important to me.

2 Comments

Filed under charitable organizations, corporate social responsibility, foundations, fraud, household giving, incentives, law, philanthropy, policy evaluation, politics, taxes, trends, trust

More But Not Less: a University Research and Education Reform Proposal

Yes, the incentive structure in the higher education and research industry should be reformed in order to reduce the inflation of academic degrees and research. That much is clear from the increasing number of cases of outright fraud and academic misconduct, from more subtle forms of data manipulation and p-hacking, and from the rising rates of (false) positive results in publications that they produce. It is also clear from the declining number of professors employed by universities to teach the rising number of students, up to the PhD level. Yes, the increasing numbers of peer-reviewed journal publications and academic degrees awarded imply that the productivity of academia has increased in the past decades. But the marginal returns on investment are now approaching zero, or perhaps even becoming negative. The recent Science in Transition position paper identifies the issues. So what should we do? It is not enough to diagnose the symptoms; it is time for reform. That will take years, and an international approach, as the chair of the board of Erasmus University Rotterdam, Pauline van der Meer-Mohr, said recently in a radio interview. Here are some ideas.

  1. Evaluate the quality of research rather than the quantity. Examine a proportion of publications through audits, screening them for results that are too good to be true, for statistical analysis and reporting errors, and for the availability of data and code for replication (a sketch of such an automated screen follows this list). Rankings of universities are often based in part on numbers of publications, so universities that want to climb in the rankings will promote or hire more productive researchers. Granting agencies and universities should reduce the influence of rankings and the current publication culture on promotion and granting decisions. Prohibit the payment of bonuses for publications (including those in specific high-impact journals).
  2. Evaluate the quality of education rather than the quantity. Examine a proportion of courses through mystery shoppers, screening them for tests that are too easy to pass, for the accuracy of grades for assignments, and for the availability of student guidelines in course manuals. Rankings of universities are often based on evaluations by enrolled students, so universities that want to climb in the rankings will please the students and the evaluators. Accreditation bodies should reduce the self-selection of evaluators for academic programs. Prohibit paying departments and universities for letting students pass.
  3. We can have our cake and eat it too. Let all students pass a course if the requirements for attendance and submission of assignments are met, but base grades on performance. This change puts students back in control and reduces the tendency among instructors to help students pass.
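
The kind of automated screening an audit under point 1 could use can be illustrated with a short sketch. This is my own illustration, not part of the proposal: it recomputes two-sided p-values from reported t-statistics and degrees of freedom and flags inconsistent reports. The `reported` list is hypothetical example data.

```python
from scipy import stats

# Hypothetical reported results: (t statistic, degrees of freedom, reported two-sided p-value)
reported = [
    (2.10, 48, 0.041),
    (1.85, 120, 0.021),  # inconsistent: the recomputed two-sided p is about .067
    (3.50, 30, 0.001),
]

def check(t, df, p_reported, tol=0.005):
    """Recompute the two-sided p-value from t and df; flag reports that deviate by more than tol."""
    p_recomputed = 2 * stats.t.sf(abs(t), df)
    return p_recomputed, abs(p_recomputed - p_reported) > tol

for t, df, p in reported:
    p_re, flagged = check(t, df, p)
    print(f"t({df}) = {t:.2f}: reported p = {p:.3f}, recomputed p = {p_re:.3f}"
          + ("  <-- check" if flagged else ""))
```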

Leave a comment

Filed under academic misconduct, fraud, incentives, politics

How Incentives Lead Us Astray in Academia

PDF of this post

The Teaching Trap

I did it again this week: I tried to teach students. Yes, it's my job, and I love it. But that's completely my own fault. If it were up to the incentives I encounter in the academic institution where I work, it would be far better not to spend time on teaching at all. For my career in academia, the thing that counts most heavily is how many publications in top journals I can realize. For some, this is even the only thing that counts: their promotion depends only on the number of publications. Last week, going home on the train, I overheard a young researcher from the medical school of our university say to a colleague: “I would be a sucker to spend time on teaching!”

I remember what I did when I was their age. I worked at another university in an era where excellent publications were not yet counted by the impact factors of journals. My dissertation supervisor asked me to teach a Sociology 101 class, and I spent all of my time on it. I loved it. I developed fun class assignments with creative methods. I gave weekly writing assignments to students and scribbled extensive comments in the margins of their essays. Students learned and wrote much better essays at the end of the course than at the beginning.

A few years later things started to change. We were told to ‘extensify’ teaching: spend less time as teachers, while keeping the students as busy as ever. I developed checklists for students (‘Does my essay have a title?’, ‘Is the reference list in alphabetical order and complete?’) and codes to grade essays with, ranging from ‘A. This sentence is not clear’ to ‘Z. Remember the difference between substance and significance: a p-value only tells you something about statistical significance, and not necessarily something about the effect size’. It was efficient for me, because grading was much faster using the codes, and it kept students busy, because they could figure out for themselves where they could improve their work. But it was less engaging for students, and they progressed less than they used to. The extensification was required because the department spent too much time on teaching relative to the compensation it received from the university. I realized then that the department and my university earn money from teaching. For every student who passes a course the department earns money from the university, because for every student who graduates the university earns money from the Ministry of Education.

This incentive structure is still in place, and it is completely destroying the quality of teaching and the value of a university diploma. As a professor I can save a lot of time by just letting students pass the courses I teach without trying to have the students learn anything: by not giving them feedback on their essays, by not having them write essays, by not having them do a retake after a failed exam, or even by grading their exams with at least a ‘passed’ mark without reading what they wrote.

[Image: ‘Allemaal een tien’]

The awareness that incentives lead us astray has become clearer to me ever since the ‘extensify’ movement dawned. The latest illustration came to me earlier this academic year when I talked to a group of people interested in doing dissertation work as external PhD candidates. The university earns a premium from the Ministry of Education for each PhD dissertation that is defended successfully. Back in the old days, way before I got into academia, a dissertation was an eloquent monograph. When I graduated, the dissertation had become a set of four connected articles introduced by a literature review and a conclusion and discussion chapter. Today, the dissertation is a compilation of three articles, of which one could be a literature review. The process of diploma inflation has worked its way up to the PhD level. The minimum level of quality required for dissertations has also declined. The procedures in place to check whether the research work by external PhD candidates conforms to minimum standards are weak. And why should they be strong, if stringent criteria lower the profit for universities?

The Rat Race in Research

Academic careers are evaluated and shaped primarily by the number of publications, the impact factors of the journals in which they are published, and the number of citations by other researchers. At higher ranks, the size and prestige of research grants start to count as well. The dominance of output evaluations not only works against the attention paid to teaching, but also has perverse effects on research itself. The goal of research these days is not so much to get closer to the truth but to get published as frequently as possible in the most prestigious journals. This is a classic example of the replacement of substantive by instrumental rationality, or the inversion of means and ends: an instrument becomes a goal in itself.[1] At some universities researchers can earn a salary bonus for each publication in a ‘top journal’. This leads to opportunistic behavior: salami tactics (thinly slicing the same research project into as many publications as possible), self-plagiarism (publishing the same or virtually the same research in different journals), self-citations, and even outright data fabrication.

What about the self-correcting power of science? Will reviewers not weed out the bad apples? Clearly not. The number of retractions in academic journals is increasing and not because reviewers are able to catch more cheaters. It is because colleagues and other bystanders witness misbehavior and are concerned about the reputation of science, or because they personally feel cheated or exploited. The recent high-profile cases of academic misbehavior as well as the growing number of retractions show it is surprisingly easy to engage in sloppy science. Because incentives lead us astray, it really comes down to our self-discipline and moral standards.

As an author of academic research articles, I have rarely encountered reviewers who doubted the validity of my analyses. Never did I encounter reviewers who asked for a more elaborate explanation of the procedures used or who wanted to see the data themselves. Only once did I receive such a request: a graduate student from another university asked me to provide the dataset and the code I used in an article. I do feel good about being able to provide the original data and the code, even though they were located on a computer that I had not used for three years and were stored with software that has received seven updates since that time. But why haven't I received such requests on other occasions?

As a reviewer, I recently tried to replicate analyses of a publicly available dataset reported in a paper. It was the first time I ever went to the trouble of locating the data, interpreting the description of the data handling in the manuscript and replicating the analyses. I arrived at different estimates and discovered several omissions and other mistakes in the analyses. Usually it is not even possible to replicate results because the data on which they are based are not publicly available. But they should be made available. Secret data are not permissible.[2] Next time I review an article I might ask: ‘Show, don’t tell’.

As an author, I have experienced how easy and tempting it is to engage in p-hacking: “exploiting (perhaps unconsciously) researcher degrees-of-freedom until p < .05”.[3] It is not really difficult to publish a paper with a fun finding from an experiment that was initially designed to test a hypothesis predicting another finding.[4] The hypothesis was not confirmed, and that result was less appealing than the fun finding. I adapted the title of the paper to reflect the fun finding, and people loved it.
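
To make the mechanism concrete, here is a minimal simulation sketch. It is my own illustration, unrelated to the study in note [4]: a researcher measures two outcomes under a true null effect, also re-runs each test after dropping the largest observation in one group, and reports whichever analysis looks best. The false positive rate ends up well above the nominal 5%.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2013)

def flexible_study(n=30):
    """Two groups, two outcome measures, no true effect; exploit analytic flexibility."""
    pvals = []
    for _ in range(2):                      # two dependent variables
        a, b = rng.normal(size=n), rng.normal(size=n)
        pvals.append(stats.ttest_ind(a, b).pvalue)
        # dropping the largest observation in one group as an extra researcher degree of freedom
        pvals.append(stats.ttest_ind(np.sort(a)[:-1], b).pvalue)
    return min(pvals)                       # report whichever analysis looks best

sims = 5000
fp_rate = sum(flexible_study() < 0.05 for _ in range(sims)) / sims
print(f"False positive rate with flexible analysis: {fp_rate:.3f}")  # well above the nominal 0.05
```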

The temptation to report fun findings and not to report rejections is enhanced by the behavior of reviewers and journal editors. On multiple occasions I encountered reviewers who did not like my findings when they led to rejections of hypotheses – usually hypotheses they had promulgated in their own previous research. The original publication of a surprising new finding is rarely followed by a null-finding. Still I try to publish null-findings, and increasingly so.[5] It may take a few years, and the article ends up in a B-journal.[6] But persistence is fertile. Recently a colleague took the lead in an article in which we replicate that null-finding using five different datasets.

In the field of criminology, it is considered a trivial fact that crime increases with its profitability and decreases with the risk of detection. Academic misbehavior is like crime: the more profitable it is, and the lower the risk of getting caught, the more attractive it becomes. The low detection risk and high profitability create strong incentives. There must be an iceberg of academic misbehavior. Shall we crack it under the waterline or let it hit a cruise ship full of tourists?


[1] In 1917, this was Max Weber’s criticism of capitalism in The Protestant Ethic and the Spirit of Capitalism.

[2] As Graham Greene wrote in Our Man in Havana: “With a secret remedy you don’t have to print the formula. And there is something about a secret which makes people believe… perhaps a relic of magic.”

[3] The description is from Uri Simonsohn, http://opim.wharton.upenn.edu/~uws/SPSP/post.pdf

[4] The title of the paper is ‘George Gives to Geology Jane: The Name Letter Effect and Incidental Similarity Cues in Fundraising’. It appeared in the International Journal of Nonprofit and Voluntary Sector Marketing, 15 (2): 172-180.

[5] On average, 55% of the coefficients reported in my own publications are not significant. The figure increased from 46% in 2005 to 63% in 2011.

[6] It took six years before the paper ‘Trust and Volunteering: Selection or Causation? Evidence from a Four Year Panel Study’ was eventually published in Political Behavior (32 (2): 225-247), after initial rejections at the American Political Science Review and the American Sociological Review.

3 Comments

Filed under academic misconduct, fraud, incentives, law, methodology, psychology

Risk factors for fraud and academic misconduct in the social sciences

This note (also available in pdf here) aims to feed the discussion about how to deal with fraud and other forms of academic misconduct in the wake of the Stapel and Smeesters affair and the publication of the report by the Schuyt Commission of the Royal Dutch Academy of Sciences (KNAW).

The recent fraud cases in psychology (the report of the Levelt committee that investigated the Stapel fraud is here: http://www.tilburguniversity.edu/nl/nieuws-en-agenda/finalreportLevelt.pdf; read more on Retraction Watch here) not only call the credibility of that particular field of science into question, but also damage the reputation of social science research more generally. The KNAW report urges universities to educate employees and students in academic honesty, but does not suggest implementing a specific policy to detect fraud and other forms of academic misconduct. The diversity in research practices between disciplines makes it difficult to impose a general policy to detect and deter misconduct. However, skeptics may view the reluctance of the KNAW to increase scrutiny as a way to cover up fraud and misconduct. Universities and science in general run a serious risk of losing their credibility in society if they do not deal with misconduct. With every new case that comes to light the public will ask: how is it possible that this case was not detected and prevented? Anticipating a large-scale national investigation, universities could screen their employees using a list of risk factors for fraud and misconduct. This screening exercise may give a rough sense of how prevalent and serious academic misconduct is at their institution. Below I give some suggestions for such risk factors, relying on research on academic misconduct.

At present it is unclear how prevalent and serious academic misconduct is at universities. It is difficult to obtain complete, valid and reliable estimates of the prevalence and severity of academic misconduct. Just as with crime outside the walls of academia, it is likely that there is a dark number of academic misconduct that does not come to light, because there are no victims or because the victims or other witnesses have no incentive to report misconduct, or an incentive not to report it. Relying on a survey among 435 European economists (a 17% response rate), Feld, Necker & Frey (2012) report that less than a quarter of all forms of academic misconduct is reported. There is no official registration of cases of academic misconduct. Cases of misconduct are sometimes covered by news media or by academics on blogs like Retraction Watch (http://retractionwatch.wordpress.com/). Using surveys, researchers have tried to estimate misconduct relying on self-reports and peer reports. In a 2008 Gallup survey among NIH grantees, 7.4% of the respondents reported suspected misconduct (Wells, 2008). Other surveys suggest a much higher incidence of misconduct. John, Loewenstein and Prelec (2012) conducted a study among psychologists with incentives for truth-telling and found that 36% admitted to having engaged in at least one ‘questionable research practice’, a much higher incidence than the 9.5% reported by Fanelli (2009). The research available shows that fraud is certainly not unique to experimental (social) psychology, as the high-profile cases of Stapel, Smeesters and Sanna (of the University of Michigan) might seem to suggest. Fraud occurs in many fields of science. Retraction Watch profiles the cases of Jan Hendrik Schön in nanotechnology, Marc Hauser in biology, Hwang Woo-suk in stem cell research, Jon Sudbø and Dipak Das in medicine, and many other researchers working in the medical and natural sciences.

What forms of misconduct should be distinguished? Below is a list of behaviors that are mentioned in discussions on academic dishonesty and the code of conduct of the Association of Universities in the Netherlands (VSNU).

  • Fabrication of data. Stapel fabricated ‘data’: he claimed data were collected in experiments while in fact no experiment was conducted and no data were collected. In less severe cases, researchers fabricate data points and add them to a real dataset. List et al. (2001) report that 4% of economists admit having fabricated data. A similar estimate emerges from the more recent survey by Feld, Necker & Frey (2012). John, Loewenstein & Prelec report that 1.7% of psychologists admit fabrication of data; from this number, however, they estimate the true prevalence to be around 9%.
  • Omission of data points. Smeesters admitted to having massaged datasets so that the hypotheses were confirmed, e.g. by fabricating and adding ‘data points’ that lowered the p-value and by omitting real data points that raised it. John, Loewenstein & Prelec report that 43.4% of psychologists admit this.
  • Invalid procedures for data handling. Errors in recoding, reporting or interpreting data, inspired by and leading to support for the hypotheses. Research by Bakker & Wicherts (2011) shows this is quite common in psychology: 18% of statistical results published in 2008 were incorrectly reported, commonly in the direction of the hypothesis favored by the author.
  • ‘Data snooping’: ending data collection before the target sample size is reached, as soon as a significant result is realized. This increases the likelihood of false positives or Type I errors (Strube, 2006; see the simulation sketch after this list). John, Loewenstein & Prelec report that 58% of psychologists admit this.
  • Cherry picking: not reporting on data that were collected because the results did not support the hypothesis. John, Loewenstein & Prelec report that 50% of psychologists admit this. Cherry picking results in the file drawer problem: the ‘unexpected’ results disappear into a drawer.
  • ‘Harking’: Hypothesizing After Results are Known (Kerr, 1998). In a paper, reporting an unexpected finding as having been predicted from the start. John, Loewenstein & Prelec report that 35% of psychologists admit this.
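
The sketch below illustrates why data snooping inflates the Type I error rate. It is my own simulation, assuming (hypothetically) that participants are added in batches of 10 per group up to a cap of 100, with a two-sample t-test after every batch; stopping at the first p < .05 pushes the error rate well above the nominal 5%.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def snooping_study(batch=10, max_n=100):
    """Add participants in batches under a true null effect; stop as soon as p < .05."""
    a = np.empty(0)
    b = np.empty(0)
    while a.size < max_n:
        a = np.concatenate([a, rng.normal(size=batch)])
        b = np.concatenate([b, rng.normal(size=batch)])
        if stats.ttest_ind(a, b).pvalue < 0.05:
            return True    # 'significant' result; data collection stops here
    return False           # target sample reached without a significant result

sims = 5000
rate = sum(snooping_study() for _ in range(sims)) / sims
print(f"Type I error rate with optional stopping: {rate:.3f}")  # well above the nominal 0.05
```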

All of the above forms of misconduct lead to artificially strong positive results that are difficult to replicate (Jha, 2012; Simmons, Nelson & Simonsohn, 2011). The positive publication bias is reinforced by high-impact journals that want ‘novel findings’ and refuse to publish (failed) replications. In addition to forms of misconduct that lead to positive publication bias, there are several other forms of misconduct:

  • Plagiarism. Cut & paste of text without quotation marks and/or proper references.
  • Double publication. Sending essentially the same manuscript to different journals without informing them and accepting simultaneous publication without cross-references. Ironically, Bruno Frey, the third author of the Feld, Necker & Frey (2012) paper cited above, has engaged in this form of misconduct on several occasions. The Frey case is documented extensively by Olaf Storbeck on his Economics Intelligence blog (http://economicsintelligence.com/2012/03/19/self-plagiarism-bruno-frey-gets-away-with-a-slap-on-the-wrist/).
  • Undeserved authorship. Listing someone as a co-author on a paper to which he or she did not contribute. List et al. lumped together undeserved authorship and the simultaneous submission of manuscripts to two journals, and report that 7 to 10% of economists have engaged in this behavior.
  • Not disclosing conflicts of interest (e.g., reviewing your own paper, a paper to which you contributed or a paper by a close colleague; sponsorship of the research by a party with interests in a certain outcome).
  • Not observing professional codes of conduct. Each academic discipline has its own code of conduct. The content of these codes varies widely. Being aware of the code is phase 1; knowledge of its content is phase 2; observing it is phase 3.

Trends in misconduct

The recent Stapel and Smeesters cases suggest that misconduct is increasing. While Giner-Sorolla (2012) argues that the problems so vividly put on the agenda in this ‘year of horrors’ (Wagenmakers, 2012) are not new at all, Steen (2011) shows that the number of retractions of papers from academic journals covered in PubMed has increased sharply in the past years. These are the cases that form the tip of the iceberg: cases in which journal editors considered the evidence for misconduct convincing enough. Whether the iceberg has in fact grown is not clear. Fanelli (2012) shows that negative results are disappearing from ISI journal articles in most disciplines and countries. Most troubling is that the proportion of positive results in journal articles from the Netherlands is higher than in many other countries (OR: 1.16, reference category: US). The proportion is also higher in the Social Sciences (OR: 2.14, reference category: Space Science) than in other disciplines, though not as high as in Neuroscience (OR: 3.16), Psychology (OR: 2.99) and Economics (OR: 2.65).

Characteristics of those who engage in misconduct

Little is known about the characteristics of those who engage in misconduct. List et al. (2001) find virtually no significant associations between self-reported misconduct and characteristics of economists. Stroebe, Postmes and Spears (2012) compared cases of academics caught for fraud and identified a set of common characteristics: the fraudsters were highly respected as researchers, published journal articles prolifically, advanced very quickly in their careers, and had perfect datasets. Nosek, Spies & Motyl (2012) vividly illustrate the social dilemma for young researchers trying to build a career on novel findings that they cannot replicate. Pretty much the same sketch emerges from an analysis of retracted publications in PubMed (Steen, 2011). While Stapel and Smeesters seem to have been isolated fraudsters, Steen (2011) finds that a fraudster whose PubMed publication has been retracted “more frequently publishes with at least one co-author who also has fraudulent publications”.

What can be done to reduce misconduct?

Nosek, Spies & Motyl (2012) and Stroebe (Hamel, 2012; Witlox, 2012) are skeptical about self-correction in science. At present, the benefits of misconduct are too high and the risk of getting caught is simply too low. The fraudsters lined up by Stroebe et al. were able to pass peer review procedures because the procedures were not stringent enough. Reviewers should be more aware of the possibility of fraud (Matías-Guiu & García-Ramos, 2010). Audits in which random samples of journal articles are drawn and scrutinized would be a solution, Stroebe et al. argue, because they increase the detection risk. Food scientist Katan proposed such an audit at a KNAW conference on data sharing in 2011 (KNAW, 2012, p. 47). However, audits are costly procedures. Another recommendation is to facilitate replication by making data public. This has also been the dominant response in academic psychology to the Stapel case (Wicherts, 2011). Researchers, however, are often unwilling or reluctant to share data (Wicherts, Borsboom, Kats & Molenaar, 2006). At present the incentives discourage researchers from sharing data: researchers save time by not making their data available (Firebaugh, 2007; Giner-Sorolla, 2012), and the costs required to make data available are often not budgeted. If research funders such as the Netherlands Organization for Scientific Research (NWO) impose a data sharing requirement, this will create an additional cost for researchers. This makes it improbable that scientists will adopt the solution without force. At present, reluctance to share data indicates lower quality of research (Wicherts, Bakker & Molenaar, 2011). While data sharing is desirable for replication purposes, it is not something that universities can impose, and it only works in the long run. Journal editors and reviewers could insist on data sharing, however. The same goes for the idea of requiring a power analysis for experiments (Ioannidis, 2005; Ioannidis & Trikalinos, 2007; Simmons, Nelson, & Simonsohn, 2011), the proposal to publish reviews alongside the articles (Mooneyham, Franklin, Mrazek, & Schooler, 2012), and various other ideas proposed by Nosek & Bar-Anan (2012), such as a completely open access data repository. An even longer term proposal is that researchers pre-register their studies and indicate in advance the analyses they intend to conduct (Wagenmakers, Wetzels, Borsboom, Van der Maas, & Kievit, 2012). Generally speaking, academic misconduct is likely to be more prevalent and more severe when the benefits of misconduct are higher, the costs are lower, and the detection risk is lower. Stroebe makes this point in two recent interviews (Hamel, 2012; Witlox, 2012). The increasing publication pressure in many sciences increases the benefit of misconduct (John, Loewenstein & Prelec, 2012). The lack of attention to detail from overburdened reviewers, from co-authors who are happy to score an additional publication, and from dissertation supervisors loaded with work reduces the detection risk. The rat race increases the likelihood that the isolated cases of Stapel and Smeesters will be the ones who were stupid enough not to organize into a pack.
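
For reference, the power analysis requirement mentioned above is cheap to implement. A minimal sketch using statsmodels, with a hypothetical effect size and design, would look like this:

```python
from statsmodels.stats.power import TTestIndPower

# Sample size needed per group to detect a hypothetical small-to-medium effect (Cohen's d = 0.3)
# with 80% power in a two-sided, two-sample t-test at alpha = .05.
n_per_group = TTestIndPower().solve_power(effect_size=0.3, alpha=0.05, power=0.8,
                                          alternative='two-sided')
print(f"Required sample size per group: {n_per_group:.0f}")
```

Requiring such a calculation in advance would make underpowered designs, and the flexible analyses they invite, easier to spot.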

Lifelong anesthesia researcher Mutch (2011: 784) advises the following remedies against misconduct: “good mentoring, appropriately trained and diligent support staff, blinded assessment of data, data review by all investigators, a consensus agreement on data interpretation, a vigorous and independent Research Office, effective internal and external committees to assess adherence to protocols, and strong departmental leadership.” Conversely, it is likely that in the absence of these conditions, there are more opportunities for misconduct. Given all of the above, I propose the following list of conditions that increase the potential for fraud and academic misconduct. The list can be used as a checklist or screening device for academic publications. Obviously, an article with a high score is not necessarily fraudulent, but it does warrant more detailed attention. I encourage universities, journal editors, and reviewers to use this list, and to suggest additions or modifications. It is by no means intended to be a definitive list, and replication is necessary.

Condition | Potential misconduct | Detection method
1. The researcher worked alone. Nobody else had or has access to the ‘data’. Co-authors were not involved in the ‘data collection’ and/or ‘data analysis’. | Data fabrication as well as less serious forms of misconduct. | Ask co-authors.
2. The ‘data’ were not collected by others, but by the researcher. | Data fabrication. | Ask co-authors and co-workers.
3. There are no witnesses of the ‘data’ collection. | Data fabrication. | Ask co-authors and co-workers.
4. The raw ‘data’ (documents, fieldwork notes, questionnaires, videos, electronic data files) are not available (anymore). They are reported confidential, missing, lost, or located on a previous computer. | Data fabrication. | Ask author and co-authors, check data archive.
5. The statistics are ‘too good to be true’. The p-values of statistical tests are more often just below .050 than would be expected based on chance (Krawczyk, 2008; Simonsohn, 2012). | Data fabrication, selective omission of data points, cherry picking, and harking. | Compare p-values to their expected distribution (a sketch of such a check follows this table).
6. The research only finds support for the hypotheses (Fanelli, 2012). | Cherry picking and harking. | Count the proportion of hypotheses supported.
7. There is no fieldwork report or lab log entry available. | Data fabrication. | Check data archive and lab log, ask author.
8. Data are provided, but the original code or a description of the procedures followed by the author is not available, or it is unclear for others how to replicate the research. | Data fabrication, cherry picking, harking. | Ask author.
9. Replication of the research is impossible with the available raw data and the procedures and analyses described by the author. | Cherry picking, harking. | Ask author.
10. Replication of the research is possible but yields no support for the original findings. | Cherry picking, harking. | Try to replicate the findings.
11. The research appeared in high impact journals. | Misconduct with higher benefits. | Check impact factor.
12. The author is early in his/her career. | Misconduct with higher benefits. | Check career stage.
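
Condition 5 can be checked mechanically. The sketch below is my own illustration, not a validated screening tool: a simple caliper test compares the number of reported p-values just below .05 with the number just above, using scipy's binomial test. The p-values listed are hypothetical.

```python
from scipy import stats

# Hypothetical p-values harvested from one author's publications.
p_values = [0.012, 0.049, 0.047, 0.048, 0.051, 0.046, 0.032, 0.049, 0.044, 0.058, 0.003, 0.048]

just_below = sum(0.045 <= p < 0.050 for p in p_values)   # caliper just below .05
just_above = sum(0.050 <= p < 0.055 for p in p_values)   # caliper just above .05

# If nothing special happens at the .05 threshold, a p-value landing in either caliper
# is roughly equally likely to fall in each bin; a large excess below .05 is suspicious.
result = stats.binomtest(just_below, just_below + just_above, p=0.5, alternative='greater')
print(f"{just_below} just below .05 vs. {just_above} just above; binomial test p = {result.pvalue:.3f}")
```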

December 17 update:
If you’re involved with academic journals as an editor or editorial board member, read this COPE discussion paper drafted by Elizabeth Wagner in April 2011: “How should editors respond to plagiarism?”
December 20 update:
In the latest issue of Mens & Maatschappij, Aafke Komter has written an article addressing a similar question. Well worth reading!
February 22 update:
The p-hacking debate is still raging. Read more about it here. Colleagues at the Department of Communication Science did a preliminary analysis of publications in their field and also found the blob just below .05.


2 Comments

Filed under academic misconduct, fraud, incentives, law, methodology, psychology