In the prehistoric era of competitive science, researchers were like magicians: they earned a reputation for tricks that nobody could repeat and shared their secrets only with trusted disciples. In the new age of open science, researchers share by default, not only with peer reviewers and fellow researchers, but with the public at large. The transparency of open science reduces the temptation of private profit maximization and the collective inefficiency that information asymmetries create in competitive markets. In a seminar organized by the University Library at Vrije Universiteit Amsterdam on November 1, 2018, I discussed recent developments in open science and its implications for research careers and progress in knowledge discovery. The slides are posted here. The podcast is here.
Experiments can have important advantages over other research designs. The most important advantage concerns internal validity: random assignment to treatment conditions reduces the attribution problem and strengthens causal inference. An additional advantage is that control over the experimental setting reduces the heterogeneity of the treatment effects observed.
The extent to which these advantages are realized in the data depends on the design and execution of the experiment. Experiments are of higher quality when the sample size is larger and the measures of the theoretical concepts are more reliable and more valid. The sufficiency of the sample size can be checked with a power analysis. Most effect sizes in the social sciences are small (d = 0.2); to detect such an effect at conventional significance levels (p < .05) with 95% power, a sample of about 1,300 participants is required (see appendix). Even for a stronger effect (d = 0.4), more than 300 participants are required. The reliability of normative scale measures can be judged with Cronbach’s alpha. A rule of thumb for unidimensional scales is that alpha should be at least .63 for a scale consisting of 4 items, .68 for 5 items, .72 for 6 items, .75 for 7 items, and so on. The validity of measures should be justified theoretically and can be checked with a manipulation check, which should reveal a sizeable and significant association with the treatment variables.
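These two rules of thumb can be reproduced with a short calculation. The sketch below uses the normal approximation for the two-sample power calculation (the exact t-based numbers differ by a participant or two per group), and makes explicit the assumption behind the alpha thresholds: a mean inter-item correlation of .3 plugged into the Spearman-Brown formula.

```python
# Sketch of the sample-size and reliability rules of thumb; stdlib only.
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.95):
    """Participants per group for a two-sample test of effect size d
    (normal approximation; slightly below the exact t-test numbers)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # 1.96 for alpha = .05
    z_power = z.inv_cdf(power)           # 1.645 for 95% power
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)

def alpha_floor(k, r=0.3):
    """Cronbach's alpha for k items with mean inter-item correlation r.
    r = .3 is an assumption that reproduces the thresholds in the text."""
    return k * r / (1 + (k - 1) * r)

print(2 * n_per_group(0.2))   # 1300 participants in total for d = 0.2
print(2 * n_per_group(0.4))   # 326: still more than 300 for d = 0.4
print([round(alpha_floor(k), 2) for k in (4, 5, 6, 7)])
# [0.63, 0.68, 0.72, 0.75]
```

The same numbers come out of standard power software; the point of the sketch is only to show where the thresholds come from.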
The advantages of experiments are reduced if assignment to treatment is non-random and treatment effects are confounded. In addition, a variety of other problems may endanger internal validity. Shadish, Cook & Campbell (2002) provide a useful list of such problems.
Experiments can also have important disadvantages. The most important is that the external validity of the findings is limited to the participants and the setting in which their behavior was observed. This disadvantage can be mitigated by creating more realistic decision situations, for instance in natural field experiments, and by recruiting (non-‘WEIRD’) samples of participants that are more representative of the target population. As Henrich, Heine & Norenzayan (2010) noted, results based on samples of participants from Western, Educated, Industrialized, Rich and Democratic (WEIRD) countries have limited validity for the discovery of universal laws of human cognition, emotion or behavior.
Recently, experimental research paradigms have received fierce criticism. Results often cannot be reproduced (Open Science Collaboration, 2015), and publication bias is ubiquitous (Ioannidis, 2005). It has become clear that there is a lot of undisclosed flexibility in all phases of the empirical cycle. While these problems have been discussed widely in communities of researchers conducting experiments, they are by no means limited to one particular methodology or mode of data collection. It is likely that they also occur in communities of researchers working with survey or interview data.
In the positivist paradigm that dominates experimental research, the empirical cycle starts with the formulation of a research question. To answer the question, hypotheses are formulated based on established theories and previous research findings. Then the research is designed, data are collected, a predetermined analysis plan is executed, results are interpreted, the research report is written and submitted for peer review. After the usual round(s) of revisions, the findings are incorporated in the body of knowledge.
The validity and reliability of results from experiments can be compromised in two ways. The first is by juggling with the order of phases in the empirical cycle. Researchers can decide to amend their research questions and hypotheses after they have seen the results of their analyses. Kerr (1998) labeled this practice of reformulating hypotheses HARKing: Hypothesizing After the Results are Known. Amending hypotheses is not a problem when the goal of the research is to develop theories to be tested later, as in grounded theory or exploratory analyses (e.g., data mining). But in hypothesis-testing research, HARKing is a problem, because it increases the likelihood of publishing false positives. Chance findings are interpreted post hoc as confirmations of hypotheses that a priori are rather unlikely to be true. When these findings are published, they are unlikely to be reproducible by other researchers, creating research waste and, worse, reducing the reliability of published knowledge.
The second way the validity and reliability of results from experiments can be compromised is by misconduct and sloppy science within various stages of the empirical cycle (Simmons, Nelson & Simonsohn, 2011). The data collection and analysis phase as well as the reporting phase are most vulnerable to distortion by fraud, p-hacking and other questionable research practices (QRPs).
- In the data collection phase, observations that (if kept) would lead to undesired conclusions or non-significant results can be altered or omitted. Also, fake observations can be added (fabricated).
- In the analysis of data researchers can try alternative specifications of the variables, scale constructions, and regression models, searching for those that ‘work’ and choosing those that reach the desired conclusion.
- In the reporting phase, things go wrong when the search for alternative specifications and the sensitivity of the results to decisions made in the data analysis phase are not disclosed.
- In the peer review process, there can be pressure from editors and reviewers to cut reports of non-significant results, or to collect additional data supporting the hypotheses and the significant results reported in the literature.
These QRPs have several results: null findings are less likely to be published; published research is biased towards positive findings that confirm the hypotheses; published findings are not reproducible; and when replication attempts are made, the published findings turn out to be less significant, less often positive, and smaller in effect size (Open Science Collaboration, 2015).
Alarm bells, red flags and other warning signs
Some of the forms of misconduct mentioned above are very difficult for reviewers and editors to detect. When observations are fabricated or omitted from the analysis, only inside information, very sophisticated data detectives, and stupidity of the authors can help us. Many other forms of misconduct are also difficult to prove. While smoking guns are rare, we can look for clues. I have developed a checklist of warning signs and good practices that editors and reviewers can use to screen submissions (see below). The checklist uses terminology that is not specific to experiments, but applies to all forms of data. While a high number of warning signs does not in itself prove anything, it should alert reviewers and editors. There is no norm for the number of flags. The table below only mentions the warning signs; the paper version of this blog post also shows a column with the positive poles. Those who would like to count good practices and reward authors for them can count gold stars rather than red flags. The checklist was developed independently of the checklist that Wicherts et al. (2016) recently published.
- The power of the analysis is too low.
- The results are too good to be true.
- All hypotheses are confirmed.
- P-values are just below critical thresholds (e.g., p < .05).
- A groundbreaking result is reported but not replicated in another sample.
- The data and code are not made available upon request.
- The data are not made available upon article submission.
- The code is not made available upon article submission.
- Materials (manipulations, survey questions) are described superficially.
- Descriptive statistics are not reported.
- The hypotheses are tested in analyses with covariates and results without covariates are not disclosed.
- The research is not preregistered.
- No details of an IRB procedure are given.
- Participant recruitment procedures are not described.
- Exact details of time and location of the data collection are not described.
- A power analysis is lacking.
- Unusual / non-validated measures are used without justification.
- Different dependent variables are analyzed in different studies within the same article without justification.
- Variables are (log)transformed or recoded in unusual categories without justification.
- Numbers of observations mentioned at different places in the article are inconsistent. Loss or addition of observations is not justified.
- A one-sided test is reported when a two-sided test would be appropriate.
- Reported test statistics (t-values, F-values) and p-values are incorrect or mutually inconsistent.
The growing number of retractions of articles reporting experimental research in scholarly journals has raised awareness of the fallibility of peer review as a quality control mechanism. Communities of researchers employing experimental designs have formulated solutions to these problems. For the review and publication stage, the following solutions have been proposed.
- Access to data and code. An increasing number of science funders require grantees to provide open access to the data and the code that they have collected. Likewise, authors are required to provide access to data and code at a growing number of journals, such as Science, Nature, and the American Journal of Political Science. Platforms such as Dataverse, the Open Science Framework and Github facilitate sharing of data and code. Some journals do not require access to data and code, but provide Open Science badges for articles that do provide access.
- Pledges, such as the ‘21 word solution’, a statement designed by Simmons, Nelson and Simonsohn (2012) that authors can include in their paper to attest that they have fully disclosed their procedures: “We report how we determined our sample size, all data exclusions (if any), all manipulations, and all measures in the study.”
- Full disclosure of methodological details of research submitted for publication, for instance through psychdisclosure.org, is now required by major journals in psychology.
- Apps such as Statcheck, p-curve, p-checker, and r-index can help editors and reviewers detect fishy business. They also have the potential to improve research hygiene when researchers start using these apps to check their own work before they submit it for review.
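The core of such a consistency check can be sketched in a few lines. This is a toy version with made-up numbers: it recomputes a two-sided p-value from a reported test statistic under a normal approximation (so it is only accurate for z-tests or t-tests with large degrees of freedom; statcheck itself works from the exact reported distributions), and flags reported p-values that deviate from the recomputed one.

```python
# Toy, statcheck-style consistency check between a reported test
# statistic and its reported p-value. Stdlib only; normal approximation.
from statistics import NormalDist

def p_from_stat(stat):
    """Two-sided p-value of a z (or large-df t) statistic."""
    return 2 * (1 - NormalDist().cdf(abs(stat)))

def inconsistent(stat, reported_p, tol=0.005):
    """Flag a reported p-value that deviates from the recomputed one
    by more than a rounding tolerance (tol is an assumed setting)."""
    return abs(p_from_stat(stat) - reported_p) > tol

print(round(p_from_stat(1.96), 3))   # 0.05
print(inconsistent(1.96, 0.05))      # False: report is consistent
print(inconsistent(1.96, 0.03))      # True: a red flag worth a closer look
```

A mismatch is not proof of misconduct, of course; rounding conventions and one-sided tests produce legitimate discrepancies, which is why such flags call for scrutiny rather than verdicts.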
As these solutions become more commonly used, we should see the quality of research go up. The number of red flags in research should decrease and the number of gold stars should increase. This requires not only that reviewers and editors use the checklist, but most importantly, that researchers themselves use it as well.
The solutions above should be supplemented by better research practices before researchers submit their papers for review. In particular, two measures are worth mentioning:
- Preregistration of research, for instance on Aspredicted.org. An increasing number of journals in psychology require research to be preregistered. Some journals guarantee publication of research regardless of its results after a round of peer review of the research design (a format known as registered reports).
- Increasing the statistical power of research is one of the most promising strategies to increase the quality of experimental research (Bakker, Van Dijk & Wicherts, 2012). In many fields and for many decades, published research has been underpowered, using samples of participants that are not large enough to detect the reported effect sizes. Using larger samples reduces the likelihood of both false positives and false negatives.
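The cost of underpowered designs can be illustrated with a small simulation. All numbers here are assumptions for the sake of the sketch: a true effect of d = 0.2 studied with only 50 participants per group and a two-sample z-test at p < .05, a design that analytically has only about 17% power.

```python
# Simulating many underpowered studies of a real but small effect.
import random
from statistics import NormalDist, mean

random.seed(1)
Z_CRIT = NormalDist().inv_cdf(0.975)   # 1.96 for a two-sided .05 test

def one_study(d=0.2, n=50):
    """True if a two-sample z-test on simulated data is significant."""
    treat = [random.gauss(d, 1) for _ in range(n)]
    control = [random.gauss(0, 1) for _ in range(n)]
    se = (2 / n) ** 0.5
    z = (mean(treat) - mean(control)) / se
    return abs(z) > Z_CRIT

hits = sum(one_study() for _ in range(2000)) / 2000
print(hits)   # the simulated power; analytically it is about 0.17
```

With a design like this, more than four out of five studies of a perfectly real effect come up empty, and the minority that do reach significance overestimate the effect size, which is exactly the literature-distorting pattern described above.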
A variety of institutional designs have been proposed to encourage the use of the solutions mentioned above, including removing the career, hiring and promotion incentives that reward questionable research practices, rewarding researchers for good conduct through badges, the adoption of voluntary codes of conduct, and socialization of students and senior staff through teaching and workshops. Research funders, journals, editors, authors, reviewers, universities, senior researchers and students all have a responsibility in these developments.
Bakker, M., Van Dijk, A. & Wicherts, J. (2012). The Rules of the Game Called Psychological Science. Perspectives on Psychological Science, 7(6): 543–554.
Henrich, J., Heine, S.J., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33: 61 – 135.
Ioannidis, J.P.A. (2005). Why Most Published Research Findings Are False. PLoS Medicine, 2(8): e124. http://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.0020124
Kerr, N.L. (1998). HARKing: Hypothesizing After Results are Known. Personality and Social Psychology Review, 2: 196-217.
Open Science Collaboration (2015). Estimating the Reproducibility of Psychological Science. Science, 349. http://www.sciencemag.org/content/349/6251/aac4716.full.html
Shadish, W.R., Cook, T.D., & Campbell, D.T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin.
Simmons, J.P., Nelson, L.D., & Simonsohn, U. (2011). False positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22: 1359–1366.
Simmons, J.P., Nelson, L.D. & Simonsohn, U. (2012). A 21 Word Solution. Available at SSRN: http://ssrn.com/abstract=2160588
Wicherts, J.M., Veldkamp, C.L., Augusteijn, H.E., Bakker, M., Van Aert, R.C. & Van Assen, M.A.L.M. (2016). Researcher degrees of freedom in planning, running, analyzing, and reporting psychological studies: A checklist to avoid p-hacking. Frontiers in Psychology, 7: 1832. http://journal.frontiersin.org/article/10.3389/fpsyg.2016.01832/abstract
The Center for Philanthropic Studies I am leading at VU Amsterdam is converting to Open Science.
Open Science offers four advantages to the scientific community, nonprofit organizations, and the public at large:
- Access: we make our work more easily accessible for everyone. Our research serves public goods, which are served best by open access.
- Efficiency: we make it easier for others to build on our work, which saves time.
- Quality: we enable others to check our work, find flaws and improve it.
- Innovation: ultimately, open science facilitates the production of knowledge.
What does the change mean in practice?
First, the source of funding for contract research we conduct will always be disclosed.
Second, data collection – interviews, surveys, experiments – will follow a prespecified protocol. This includes the number of observations foreseen, the questions to be asked, the measures to be included, the hypotheses to be tested, and the analyses to be conducted. New studies will preferably be preregistered.
Third, data collected and the code used to conduct the analyses will be made public, through the Open Science Framework for instance. Obviously, personal or sensitive data will not be made public.
Fourth, results of research will preferably be published in open access mode. This does not mean that we will publish only in Open Access journals. Research reports and papers for academic journals will be made available online in working paper archives, as ‘preprint’ versions, or in other ways.
December 16, 2015 update:
A fifth reason, following directly from #1 and #2, is that open science reduces the costs of science for society.
See this previous post for links to our Giving in the Netherlands Panel Survey data and questionnaires.
July 8, 2017 update:
Often I encounter academics who think that a high proportion of explained variance is the ideal outcome of a statistical analysis – that in regression analyses a high R2 is better than a low R2. In my view, the emphasis on a high R2 should be reduced. A high R2 should not be a goal in itself, because a higher R2 can easily be obtained with procedures that actually lower the external validity of the coefficients.
It is possible to increase the proportion of variance explained in regression analyses in several ways that do not in fact improve our ability to ‘understand’ the behavior we are seeking to ‘explain’ or ‘predict’. One way to increase the R2 is to remove anomalous observations, such as ‘outliers’, or to treat people who say they ‘don’t know’ as average respondents. Replacing missing data by mean scores or using multiple imputation procedures also often increases the R2. I have used these procedures in several papers myself, including some of my dissertation chapters.
But outliers can in fact be true values. I have seen quite a few that destroyed correlations and lowered R2s while being valid observations – for example, a widower donating a large amount of money to a charity after the death of his wife: a rare case of exceptional behavior for very specific reasons that seldom occur. In larger samples these outliers may become more frequent, affecting the R2 less strongly.
Also ‘Don’t Know’ respondents are often systematically different from the average respondent. Treating them as average respondents eliminates some of the real variance that would otherwise be hard to predict.
Finally, it is often possible to increase the proportion of variance explained by including more variables. This is particularly problematic when variables that are a result of the dependent variable are included as predictors. For instance, if network size is added to the prediction of volunteering, the R2 will increase. But a larger network not only increases volunteering; it is also a result of volunteering. Especially if the network questions refer to the present (do you know…) while the volunteering questions refer to the past (in the past year, have you…), it is dubious to ‘predict’ past volunteering with a measure of current network size.
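The outlier point can be illustrated with simulated data. Everything below is made up for the sketch: an ‘income’ variable predicts a ‘donation’ variable quite well, until one perfectly valid but extreme donor (the generous widower) is added, after which the R2 collapses – so deleting him would flatter the model while discarding a true observation.

```python
# One valid extreme observation can wipe out an otherwise decent R2.
import random

def r_squared(xs, ys):
    """R2 of the simple regression of ys on xs (squared correlation)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

random.seed(2)
income = [random.gauss(50, 10) for _ in range(200)]
donation = [0.5 * x + random.gauss(0, 5) for x in income]
r2_full = r_squared(income, donation)

# One widower with an average income donates an extraordinary amount:
income.append(50)
donation.append(500)
r2_with_outlier = r_squared(income, donation)

print(r2_full > r2_with_outlier)   # True: dropping the outlier raises R2
```

The honest response to such a case is to report the results with and without the outlier, not to quietly delete it for the sake of a higher R2.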
As a reviewer, I give authors reporting an R2 exceeding 40% a high level of scrutiny for dubious decisions in data handling and the inclusion of variables.
As a rule, R Squares tend to be higher at higher levels of aggregation, e.g. when analyzing cross-situational tendencies in behavior rather than specific behaviors in specific contexts; or when analyzing time-series data or macro-level data about countries rather than individuals. Why people do the things they do is often just very hard to predict, especially if you try to predict behavior in a specific case.
Academic misconduct figures prominently in the press this week: Peter Nijkamp, a well-known Dutch economist at VU University Amsterdam, supervised a dissertation in which self-plagiarism occurred, according to a ruling by an integrity committee of the National Association of Universities in the Netherlands. The complaint led two national newspapers to dig into Nijkamp’s work. NRC published an article by research journalist Frank van Kolfschooten, who took a small sample of Nijkamp’s publications and found 6 cases of plagiarism and 8 cases of self-plagiarism. Today De Volkskrant reports self-plagiarism in 60% of 115 articles co-authored by Nijkamp. VU University rector Frank van der Duyn Schouten said in a preliminary statement that he does not believe Nijkamp plagiarized on purpose, that the criteria for self-plagiarism have changed over the past decades, and that they are currently not clear. The university ordered a full investigation of Nijkamp’s publications.
Nijkamp’s profile on Google Scholar is polluted. It counts 28,860 citations, but includes papers written by others, like Zoltan Acs and Nobel-prize winner Daniel Kahneman. A Web of Knowledge author search yielded 3,638 citations of his 426 (co-authored) publications, 3,310 excluding self-citations. That’s 7.8 citations per article. His H-index is 29. Typically Nijkamp appears as a co-author on publications. He is the single author of only one of his top 10 most cited articles, ranking 10th, with 58 citations.
The Nijkamp case looks different from another prominent case of self-plagiarism in economics, that of Bruno Frey. Frey submitted nearly identical research papers to different journals. Nijkamp seems to have allowed his many co-authors to copy and paste sentences and sometimes entire paragraphs from other articles he co-authored – which can be classified as self-plagiarism.
January 15, 2014 update: Nijkamp responded in a letter posted here that there may have been some flaws and accidents, but that these are to be expected in what he calls “the beautiful industry of academic publishing”.
Yes, the incentive structure in the higher education and research industry should be reformed in order to reduce the inflation of academic degrees and research. That much is clear from the increasing number of cases of outright fraud and academic misconduct, including more subtle forms of data manipulation and p-hacking, and the rising rates of (false) positive publication bias that result. It is also clear from the declining number of professors employed by universities to teach the rising numbers of students, up to the PhD level. Yes, the increasing numbers of peer-reviewed journal publications and academic degrees awarded imply that the productivity of academia has increased in the past decades. But the marginal returns on investment are now approaching zero or perhaps even becoming negative. The recent Science in Transition position paper identifies the issues. So what should we do? It is not enough to diagnose the symptoms; it is time for reform. This will take years, and an international approach, as the chair of the board of Erasmus University Rotterdam, Pauline van der Meer-Mohr, said recently in a radio interview. Here are some ideas.
- Evaluate the quality of research rather than the quantity. Examine a proportion of publications through audits, screening them for results that are too good to be true, statistical analysis and reporting errors, and the availability of data and coding for replication. Rankings of universities are often based in part on numbers of publications. Universities that want to climb on the rankings will promote or hire more productive researchers. Granting agencies and universities should reduce the influence of rankings and the current publication culture on promotion and granting decisions. Prohibit the payment of bonuses for publications (including those in specific high-impact journals).
- Evaluate the quality of education rather than the quantity. Examine a proportion of courses through mystery shoppers, screening them for tests that are too easy to pass, accuracy of grades for assignments, and the availability of student guidelines in course manuals. Rankings of universities are often based on evaluations by course-enrolled students. Universities that want to climb on the rankings will please the students and the evaluators. Accreditation bodies should reduce the self-selection of evaluators for academic programs. Prohibit the payment of departments and universities for letting students pass.
- We can have our cake and eat it too. Let all students pass courses if the requirements for attendance at meetings and submission of assignments are met, but give grades based on performance. This change puts students back in control and reduces the tendency among instructors to help students pass.
The Teaching Trap
I did it again this week – I tried to teach students. Yes, it’s my job, and I love it. But that’s completely my own fault. If I followed the incentives I encounter in the academic institution where I work, it would be far better not to spend time on teaching at all. For my career in academia, the thing that counts most heavily is how many publications in top journals I can realize. For some, this is even the only thing that counts: their promotion depends only on the number of publications. Last week, going home on the train, I overheard a young researcher from the medical school of our university say to a colleague: “I would be a sucker to spend time on teaching!”
I remember what I did when I was their age. I worked at another university in an era where excellent publications were not yet counted by the impact factors of journals. My dissertation supervisor asked me to teach a Sociology 101 class, and I spent all of my time on it. I loved it. I developed fun class assignments with creative methods. I gave weekly writing assignments to students and scribbled extensive comments in the margins of their essays. Students learned and wrote much better essays at the end of the course than at the beginning.
A few years later things started to change. We were told to ‘extensify’ teaching: spend less time as teachers, while keeping the students as busy as ever. I developed checklists for students (‘Does my essay have a title?’ – ‘Is the reference list in alphabetical order and complete?’) and codes to grade essays with, ranging from ‘A. This sentence is not clear’ to ‘Z. Remember the difference between substance and significance: a p-value only tells you something about statistical significance, and not necessarily something about the effect size’. It was efficient for me – grading was much faster using the codes – and it kept students busy – they could figure out for themselves where to improve their work. It was less attractive for students, though, and they progressed less than they used to. The extensification was required because the department spent too much time on teaching relative to the compensation it received from the university. I realized then that the department and my university earn money with teaching: for every student that passes a course the department earns money from the university, because for every student that graduates the university earns money from the Ministry of Education.
This incentive structure is still in place, and it is completely destroying the quality of teaching and the value of a university diploma. As a professor I can save a lot of time by just letting students pass the courses I teach without trying to have the students learn anything: by not giving them feedback on their essays, by not having them write essays, by not having them do a retake after a failed exam, or even by grading their exams with at least a ‘passed’ mark without reading what they wrote.
The awareness that incentives lead us astray has grown on me ever since the ‘extensify’ movement dawned. The latest illustration came earlier this academic year, when I talked to a group of people interested in doing dissertation work as external PhD candidates. The university earns a premium from the Ministry of Education for each PhD dissertation that is defended successfully. Back in the old days, way before I got into academia, a dissertation was an eloquent monograph. When I graduated, the dissertation had become a set of four connected articles introduced by a literature review and closed by a conclusion and discussion chapter. Today, the dissertation is a compilation of three articles, of which one may be a literature review. The process of diploma inflation has worked its way up to the PhD level. The minimum level of quality required for dissertations has also declined. The procedures in place to check whether the research work of external PhD candidates conforms to minimum standards are weak. And why should they be strict, if stringent criteria lower the profit for universities?
The Rat Race in Research
Academic careers are evaluated and shaped primarily by the number of publications, the impact factors of the journals in which they appear, and the number of citations by other researchers. At higher ranks, the size and prestige of research grants start to count as well. The dominance of output evaluations not only works against the attention paid to teaching, but also has perverse effects on research itself. The goal of research these days is not so much to get closer to the truth as to get published as frequently as possible in the most prestigious journals. It is a classic example of the replacement of substantive by instrumental rationality, the inversion of means and ends: an instrument becomes a goal in itself. At some universities researchers can earn a salary bonus for each publication in a ‘top journal’. This leads to opportunistic behavior: salami tactics (slicing the same research project as thinly as possible into as many publications as possible), self-plagiarism (publishing the same or virtually the same research in different journals), self-citations, and even outright data fabrication.
What about the self-correcting power of science? Will reviewers not weed out the bad apples? Clearly not. The number of retractions in academic journals is increasing and not because reviewers are able to catch more cheaters. It is because colleagues and other bystanders witness misbehavior and are concerned about the reputation of science, or because they personally feel cheated or exploited. The recent high-profile cases of academic misbehavior as well as the growing number of retractions show it is surprisingly easy to engage in sloppy science. Because incentives lead us astray, it really comes down to our self-discipline and moral standards.
As an author of academic research articles, I have rarely encountered reviewers who doubted the validity of my analyses. Never have I encountered reviewers who asked for a more elaborate explanation of the procedures used or who wanted to see the data themselves. Only once did I receive a request, from a graduate student at another university, who asked me to provide the dataset and the code I used in an article. I feel good about being able to provide the original data and code, even though they were located on a computer I had not used for three years and were stored in software that has received 7 updates since that time. But why haven’t I received such requests on other occasions?
As a reviewer, I recently tried to replicate analyses of a publicly available dataset reported in a paper. It was the first time I ever went to the trouble of locating the data, interpreting the description of the data handling in the manuscript and replicating the analyses. I arrived at different estimates and discovered several omissions and other mistakes in the analyses. Usually it is not even possible to replicate results because the data on which they are based are not publicly available. But they should be made available. Secret data are not permissible. Next time I review an article I might ask: ‘Show, don’t tell’.
As an author, I have experienced how easy and tempting it is to engage in p-hacking: “exploiting (perhaps unconsciously) researcher degrees of freedom until p < .05”. It is not really difficult to publish a paper with a fun finding from an experiment that was initially designed to test a hypothesis predicting another finding. The hypothesis was not confirmed, and that result was less appealing than the fun finding. I adapted the title of the paper to reflect the fun finding, and people loved it.
The temptation to report fun findings and not to report rejections is reinforced by the behavior of reviewers and journal editors. On multiple occasions I have encountered reviewers who did not like my findings when they led to rejections of hypotheses – usually hypotheses they had promulgated in their own previous research. The original publication of a surprising new finding is rarely followed by a null finding. Still, I try to publish null findings, and increasingly so. It may take a few years, and the article may end up in a B-journal, but persistence pays off. Recently a colleague took the lead on an article in which we replicate that null finding using five different datasets.
In the field of criminology, it is considered a trivial fact that crime increases with its profitability and decreases with the risk of detection. Academic misbehavior is like crime: the more profitable it is, and the lower the risk of getting caught, the more attractive it becomes. The low detection risk and high profitability create strong incentives. There must be an iceberg of academic misbehavior. Shall we crack it under the waterline or let it hit a cruise ship full of tourists?
 In 1905, this was Max Weber’s criticism of capitalism in The Protestant Ethic and the Spirit of Capitalism.
 As Graham Greene wrote in Our Man in Havana: “With a secret remedy you don’t have to print the formula. And there is something about a secret which makes people believe… perhaps a relic of magic.”
 The title of the paper is ‘George Gives to Geology Jane: The Name Letter Effect and Incidental Similarity Cues in Fundraising’. It appeared in the International Journal of Nonprofit and Voluntary Sector Marketing, 15 (2): 172-180.
 On average, 55% of the coefficients reported in my own publications are not significant. The figure increased from 46% in 2005 to 63% in 2011.
 It took six years before the paper ‘Trust and Volunteering: Selection or Causation? Evidence from a Four Year Panel Study’ was eventually published in Political Behavior (32 (2): 225-247), after initial rejections at the American Political Science Review and the American Sociological Review.