Category Archives: psychology

Tools for the Evaluation of the Quality of Experimental Research

pdf of this post

Experiments can have important advantages over other research designs. The most important advantage concerns internal validity: random assignment to treatment reduces the attribution problem and increases the possibilities for causal inference. An additional advantage is that control over the experimental setting reduces the heterogeneity of observed treatment effects.

The extent to which these advantages are realized in the data depends on the design and execution of the experiment. Experiments are of higher quality when the sample size is larger and the theoretical concepts are measured more reliably and with higher validity. The sufficiency of the sample size can be checked with a power analysis. Most effect sizes in the social sciences are small (d = 0.2); to detect such an effect at conventional significance levels (p < .05) with 95% power, a sample of about 1,300 participants is required (see appendix). Even for a stronger effect size (d = 0.4), more than 300 participants are required. The reliability of normative scale measures can be judged with Cronbach’s alpha. A rule of thumb for unidimensional scales is that alpha should be at least .63 for a scale consisting of 4 items, .68 for 5 items, .72 for 6 items, .75 for 7 items, and so on. The validity of measures should be justified theoretically and can be checked with a manipulation check, which should reveal a sizeable and significant association with the treatment variables.
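
As an illustration, here is a minimal sketch (not part of the original post) of how these rules of thumb can be reproduced: the required sample sizes via a standard power analysis in statsmodels, and the Cronbach's alpha values via the Spearman-Brown formula, assuming an average inter-item correlation of .30.

```python
# Minimal sketch reproducing the rules of thumb above (illustration, not from the post).
from statsmodels.stats.power import TTestIndPower

# Required total sample size for a two-sided independent-samples t-test,
# alpha = .05, power = .95, for a small (d = 0.2) and a stronger (d = 0.4) effect.
power_analysis = TTestIndPower()
for d in (0.2, 0.4):
    n_per_group = power_analysis.solve_power(effect_size=d, alpha=0.05, power=0.95,
                                             alternative='two-sided')
    print(f"d = {d}: about {2 * n_per_group:.0f} participants in total")

# Expected Cronbach's alpha for a k-item scale with an average inter-item
# correlation r of .30 (Spearman-Brown prophecy): alpha = k*r / (1 + (k - 1)*r).
r = 0.30
for k in range(4, 8):
    print(f"{k} items: alpha = {k * r / (1 + (k - 1) * r):.2f}")
```

Run as-is, the first loop returns roughly 1,300 and 330 participants in total, and the second loop returns .63, .68, .72 and .75, matching the figures quoted above.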

The advantages of experiments are reduced if assignment to treatment is non-random and treatment effects are confounded. In addition, a variety of other problems may endanger internal validity. Shadish, Cook & Campbell (2002) provide a useful list of such problems.

It should also be noted that experiments can have important disadvantages. The most important disadvantage is that the external validity of the findings is limited to the participants in the setting in which their behavior was observed. This disadvantage can be mitigated by creating more realistic decision situations, for instance in natural field experiments, and by recruiting (non-‘WEIRD’) samples of participants that are more representative of the target population. As Henrich, Heine & Norenzayan (2010) noted, results based on samples of participants in Western, Educated, Industrialized, Rich and Democratic (WEIRD) countries have limited validity in the discovery of universal laws of human cognition, emotion or behavior.

Recently, experimental research paradigms have received fierce criticism. Results often cannot be reproduced (Open Science Collaboration, 2015), and publication bias is ubiquitous (Ioannidis, 2005). It has become clear that there is a lot of undisclosed flexibility in all phases of the empirical cycle. While these problems have been discussed widely in communities of researchers conducting experiments, they are by no means limited to one particular methodology or mode of data collection. It is likely that they also occur in communities of researchers using survey or interview data.

In the positivist paradigm that dominates experimental research, the empirical cycle starts with the formulation of a research question. To answer the question, hypotheses are formulated based on established theories and previous research findings. Then the research is designed, data are collected, a predetermined analysis plan is executed, results are interpreted, the research report is written and submitted for peer review. After the usual round(s) of revisions, the findings are incorporated in the body of knowledge.

The validity and reliability of results from experiments can be compromised in two ways. The first is by juggling with the order of phases in the empirical cycle. Researchers can decide to amend their research questions and hypotheses after they have seen the results of their analyses. Kerr (1998) labeled this practice HARKing: Hypothesizing After Results are Known. Amending hypotheses is not a problem when the goal of the research is to develop theories to be tested later, as in grounded theory or exploratory analyses (e.g., data mining). But in hypothesis-testing research HARKing is a problem, because it increases the likelihood of publishing false positives. Chance findings are interpreted post hoc as confirmations of hypotheses that a priori are rather unlikely to be true. When these findings are published, they are unlikely to be reproducible by other researchers, creating research waste and, worse, reducing the reliability of published knowledge.

The second way the validity and reliability of results from experiments can be compromised is by misconduct and sloppy science within various stages of the empirical cycle (Simmons, Nelson & Simonsohn, 2011). The data collection and analysis phase as well as the reporting phase are most vulnerable to distortion by fraud, p-hacking and other questionable research practices (QRPs).

  • In the data collection phase, observations that (if kept) would lead to undesired conclusions or non-significant results can be altered or omitted. Also, fake observations can be added (fabricated).
  • In the analysis of data researchers can try alternative specifications of the variables, scale constructions, and regression models, searching for those that ‘work’ and choosing those that reach the desired conclusion.
  • In the reporting phase, things go wrong when the search for alternative specifications and the sensitivity of the results to decisions made in the data analysis phase are not disclosed.
  • In the peer review process, there can be pressure from editors and reviewers to cut reports of non-significant results, or to collect additional data supporting the hypotheses and the significant results reported in the literature.

These QRPs result in null findings being less likely to be published and in published research being biased towards positive findings that confirm the hypotheses. As a consequence, published findings are not reproducible: when a replication attempt is made, the replicated effects are less often significant, less often positive, and smaller in size (Open Science Collaboration, 2015).

Alarm bells, red flags and other warning signs

Some of the forms of misconduct mentioned above are very difficult for reviewers and editors to detect. When observations are fabricated or omitted from the analysis, only inside information, very sophisticated data detectives or stupidity of the authors can help us. Many other forms of misconduct are also difficult to prove. While smoking guns are rare, we can look for clues. I have developed a checklist of warning signs and good practices that editors and reviewers can use to screen submissions (see below). The checklist uses terminology that is not specific to experiments, but applies to all forms of data. While a high number of warning signs in itself does not prove anything, it should alert reviewers and editors. There is no norm for the number of flags. The list below only mentions the warning signs; the paper version of this blog post also shows a column with the positive poles. Those who would like to count good practices and reward authors for a higher number can count gold stars rather than red flags. The checklist was developed independently of the checklist that Wicherts et al. (2016) recently published.

Warning signs

  • The power of the analysis is too low.
  • The results are too good to be true.
  • All hypotheses are confirmed.
  • P-values are just below critical thresholds (e.g., p < .05).
  • A groundbreaking result is reported but not replicated in another sample.
  • The data and code are not made available upon request.
  • The data are not made available upon article submission.
  • The code is not made available upon article submission.
  • Materials (manipulations, survey questions) are described superficially.
  • Descriptive statistics are not reported.
  • The hypotheses are tested in analyses with covariates and results without covariates are not disclosed.
  • The research is not preregistered.
  • No details of an IRB procedure are given.
  • Participant recruitment procedures are not described.
  • Exact details of time and location of the data collection are not described.
  • A power analysis is lacking.
  • Unusual / non-validated measures are used without justification.
  • Different dependent variables are analyzed in different studies within the same article without justification.
  • Variables are (log)transformed or recoded in unusual categories without justification.
  • Numbers of observations mentioned at different places in the article are inconsistent. Loss or addition of observations is not justified.
  • A one-sided test is reported when a two-sided test would be appropriate.
  • Test-statistics (p-values, F-values) reported are incorrect.

With the growing number of retractions of articles reporting experimental research in scholarly journals, awareness of the fallibility of peer review as a quality control mechanism has increased. Communities of researchers employing experimental designs have formulated solutions to these problems. In the review and publication stage, the following solutions have been proposed.

  • Access to data and code. An increasing number of science funders require grantees to provide open access to the data and the code that they have collected. Likewise, authors are required to provide access to data and code at a growing number of journals, such as Science, Nature, and the American Journal of Political Science. Platforms such as Dataverse, the Open Science Framework and Github facilitate sharing of data and code. Some journals do not require access to data and code, but provide Open Science badges for articles that do provide access.
  • Pledges, such as the ‘21 word solution’, a statement designed by Simmons, Nelson and Simonsohn (2012) that authors can include in their paper to affirm that they have not fudged the data: “We report how we determined our sample size, all data exclusions (if any), all manipulations, and all measures in the study.”
  • Full disclosure of methodological details of research submitted for publication, for instance through psychdisclosure.org, is now required by major journals in psychology.
  • Apps such as Statcheck, p-curve, p-checker, and r-index can help editors and reviewers detect fishy business. They also have the potential to improve research hygiene when researchers start using these apps to check their own work before they submit it for review.
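
To give a flavor of what these tools automate, here is a minimal sketch (my own illustration, not the actual Statcheck code) of the core consistency check: recompute the p-value implied by a reported test statistic and its degrees of freedom, and flag reported p-values that do not match.

```python
# Sketch of a Statcheck-style consistency check (illustration only).
from scipy import stats

def check_t_report(t_value, df, reported_p, two_sided=True, tol=0.01):
    """Recompute the p-value for a reported t-test and compare it with the reported one."""
    p = stats.t.sf(abs(t_value), df)
    if two_sided:
        p *= 2
    return p, abs(p - reported_p) <= tol

# Example: a paper reports t(48) = 2.10, p = .03.
recomputed, consistent = check_t_report(2.10, 48, 0.03)
print(f"recomputed p = {recomputed:.3f}, consistent with the reported value: {consistent}")
```

In this hypothetical example the recomputed p-value is about .04, so a reported p = .03 would be flagged for a closer look.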

As these solutions become more commonly used, we should see the quality of research go up. The number of red flags in research should decrease and the number of gold stars should increase. This requires not only that reviewers and editors use the checklist, but, most importantly, that researchers themselves use it as well.

The solutions above should be supplemented by better research practices before researchers submit their papers for review. In particular, two measures are worth mentioning:

  • Preregistration of research, for instance on Aspredicted.org. An increasing number of journals in psychology require research to be preregistered. Some journals guarantee publication of research regardless of its results after a round of peer review of the research design.
  • Increasing the statistical power of research is one of the most promising strategies to increase the quality of experimental research (Bakker, Van Dijk & Wicherts, 2012). In many fields and for many decades, published research has been underpowered, using samples of participants that are not large enough to detect the reported effect sizes. Using larger samples reduces the likelihood of both false positives and false negatives.

A variety of institutional designs have been proposed to encourage the use of the solutions mentioned above, including removing incentives for questionable research practices in researchers’ careers and in hiring and promotion decisions, rewarding researchers for good conduct through badges, the adoption of voluntary codes of conduct, and the socialization of students and senior staff through teaching and workshops. Research funders, journals, editors, authors, reviewers, universities, senior researchers and students all have a responsibility in these developments.

References

Bakker, M., Van Dijk, A. & Wicherts, J. (2012). The Rules of the Game Called Psychological Science. Perspectives on Psychological Science, 7(6): 543–554.

Henrich, J., Heine, S.J., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33: 61 – 135.

Ioannidis, J.P.A. (2005). Why Most Published Research Findings Are False. PLoS Medicine, 2(8): e124. http://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.0020124

Kerr, N.L. (1998). HARKing: Hypothesizing After Results are Known. Personality and Social Psychology Review, 2: 196-217.

Open Science Collaboration (2015). Estimating the Reproducibility of Psychological Science. Science, 349. http://www.sciencemag.org/content/349/6251/aac4716.full.html

Shadish, W.R., Cook, T.D., & Campbell, D.T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin.

Simmons, J.P., Nelson, L.D., & Simonsohn, U. (2011). False positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22: 1359–1366.

Simmons, J.P., Nelson, L.D. & Simonsohn, U. (2012). A 21 Word Solution. Available at SSRN: http://ssrn.com/abstract=2160588

Wicherts, J.M., Veldkamp, C.L., Augusteijn, H.E., Bakker, M., Van Aert, R.C. & Van Assen, M.A.L.M. (2016). Researcher degrees of freedom in planning, running, analyzing, and reporting psychological studies: A checklist to avoid p-hacking. Frontiers in Psychology, 7: 1832. http://journal.frontiersin.org/article/10.3389/fpsyg.2016.01832/abstract


Resilience and Philanthropy

This post in pdf

With the year 2020 on the horizon, the recently published work programme for Research & Innovation from the European Commission for the years 2016-2017 is organized around a limited set of Societal Challenges. Europe defined these challenges after a long process of lobbying and consultation with many stakeholders. Going through the list, I could not help thinking that something was missing. I do not mean that the list of challenges is the result of a political process and does not seem to reflect an underlying vision of Europe. I am thinking about the current refugee crisis. The stream of refugees arriving at the gates of Europe poses new challenges to Europe in many areas: humanitarian assistance, citizenship, poverty, inclusion, access to education, and jobs. The stream of refugees also raises important questions for philanthropy. How will Europe deal with these challenges? How resilient is Europe? Will governments, nonprofit organizations and citizens be able to deal with this challenge? In the definition of the Rockefeller Foundation, resilience is the capacity of individuals, communities and systems to survive, adapt, and grow in the face of stress and shocks, and even transform when conditions require it. I define resilience as the mobilization of resources for the improvement of welfare in the face of adversity.

Among refugees, who are seeking a better future for themselves and their children, we see resilience. Threatened by adversity in their home countries, they take grave risks by placing their fate in the hands of human traffickers and foreign police officers. They rely on each other and their inner strength, hoping that their future will be better than what they left behind. We see a lack of resilience in Europe. The continent was not ready for the large stream of refugees. Some member states pass the stream on to each other by closing their borders. Other national governments try to accommodate refugees seeking asylum, but face barriers in finding housing, and resistance from groups of citizens who oppose the accommodation of refugees in their communities. At the same time we see a willingness to help among other citizens, who offer assistance in the form of volunteer time, food and other goods. Perhaps the response of citizens is related to their own levels of resilience.

Resilience is not just the ability to withstand adversity or change by not changing at all. Resilience is not just sitting it out, or a strategy based on a rational computation of risks, the avoidance of risks, or flexibility and absorption of shocks. The resilient actor adapts to new situations and grows.  Neither is resilience an immutable trait of individuals, a matter of luck in the genetic lottery. Resilience has often been studied at the individual level in psychology. Resilience requires will power, perseverance, self-esteem, creativity, a proactive attitude, optimism, intrinsic motivation, inner strength, a long term orientation to the future, willingness to change for the better, risk-taking, using the force of your opponent, problem solving ability, and intelligence.

The questions for research on resilience require social scientists to study not only the responses of individual citizens, but also those of social systems: informal networks of citizens, social groups, nonprofit organizations, nations, and supra-national institutions. How are resilience-related traits related to philanthropy at the level of groups and systems? How can resilience among organizations be fostered? How do nonprofit organizations build and draw on the resilience of target groups? Resilience is a very useful concept to apply to each of the societal challenges of Europe. The classic welfare state was a system that created resilience for society as a whole, reducing the need for resilience among individual citizens. The modern activating welfare state requires resilience among citizens as a condition for support. Welfare state support becomes more like charity: we favor victims of natural disasters who try to make the best of their lives and welfare recipients who are actively seeking a job.

As nonprofit organizations try to respond to the refugee crisis, they are also facing adversity themselves. In the United Kingdom, fundraising practices by charities have recently come under attack. In the Dutch nonprofit sector, cuts in government funding to arts and culture organizations have been a major source of adversity in the past years. Further cuts have been announced for organizations in international relief and development. In our research at the Center for Philanthropic Studies at VU Amsterdam we have asked: how willing are Dutch citizens to increase private contributions to charities when the government lowers its financial support? Not very willing, our research shows. While some may have believed that citizens would compensate for lower income from government grants through increased donations, this has not happened. When the cuts to arts and culture organizations were announced, the minister for Education, Culture and Science said that cultural organizations should do more to raise funds from private sources and should rely less on government grants. The culture change in the cultural sector is taking place, slowly. Some organizations were not ready for this change and simply discontinued their activities. Most have decided to do with less, and to see what opportunities they have to increase fundraising income. Some have done well. On the whole, however, the increase in private contributions is marginal, and much less than the loss in government grants.

For nonprofit organizations, the refugee crisis poses a challenge, but also an opportunity to mobilize citizen support in an effective manner. By offering their support to the government, working together effectively, and channeling the willingness to volunteer, they can demonstrate the societal impact that nonprofit organizations may have. This would be a much needed demonstration at a time when trust in charitable organizations is low.


Five challenging questions on philanthropy

The recent success of the Ice Bucket Challenge for ALS across the world raises numerous questions on philanthropy. In this post I give some background information to answer five of these questions.

 

1. Where will it end?

It is hard to predict how much money will be raised for ALS through the Ice Bucket Challenge. Some two weeks after the campaign really took off, it has raised more than £100 million according to this UK source. The growth of donations to the ALS Association in the US now shows signs of slowing, suggesting that the campaign is losing energy.

[Figure: cumulative donations to the ALS Association in the US during the Ice Bucket Challenge]

Source: Tweet by Ethan O. Perlstein, August 29, 2014

If the S-shape in the graph above continues, total donations to the ALS Association in the US could reach $120 million.

[Figure: extrapolation of the S-shaped growth curve of donations]
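
How such a projection works can be sketched as follows (with made-up cumulative totals, not the actual campaign data): fit a logistic, S-shaped curve to the daily totals and read the projected ceiling off its upper asymptote.

```python
# Sketch of an S-curve extrapolation (made-up numbers, not the actual campaign data).
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, ceiling, rate, midpoint):
    """Cumulative total at day t under logistic growth."""
    return ceiling / (1 + np.exp(-rate * (t - midpoint)))

days = np.arange(1, 21)
totals = np.array([2, 3, 4, 6, 9, 13, 19, 27, 37, 48,                          # cumulative $ millions,
                   60, 71, 81, 89, 95, 100, 104, 107, 109, 110], dtype=float)  # illustrative only

(ceiling, rate, midpoint), _ = curve_fit(logistic, days, totals, p0=[120, 0.5, 10])
print(f"Projected ceiling of the campaign: about ${ceiling:.0f} million")
```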

 

2. Will other charities lose from the challenge?

It is often assumed that donors give from a fixed annual budget: a dollar donated to the ALS Association cannot go to other charities. From this perspective, the Ice Bucket Challenge would come at the expense of other charities. However, it is also possible that the campaign does not affect other charities. There are many examples of campaigns that have not decreased the amounts donated to other charities. In the Netherlands, the success of the Alpe d’Huzes bike rides against cancer increased the amounts donated to the Dutch Cancer Society, while other health charities on average do not seem to have lost. For the Cancer Society itself, the success of the bike ride did not come at the expense of regular fundraising campaigns either, until questions were asked about the ‘no overhead costs’ policy promoted by the organizers of the event.

There is also the possibility that people will donate more to health charities (or charities in general) because they become more aware of the need for donations. When I was nominated for the challenge by my wife, my response was to donate to the Rare Diseases Foundation (ZZF), a Dutch foundation supporting research on a variety of rare diseases. My best bet is that the Ice Bucket Challenge is a fortuitous fundraising event that does not come at the expense of donations to other charities.

 

3. Is the success of the Ice Bucket Challenge ‘fair’ given the relative rarity of ALS as a disease?

Looking at all deaths in the course of a year, ALS is a relatively rare cause of death, as US data from the CDC show. Fi Douglas made a comparison with amounts donated, showing that donations do not seem to be directed towards the most lethal diseases.

[Figure: deaths by disease in the US compared with amounts donated]

Source: Tweet by Fi Douglas, August 23, 2014

In a paper I published back in 2008, I compared donations to charities fighting groups of diseases with the number of deaths that these diseases cause. Giving to health charities in the Netherlands seems more needs-based. It should be noted that the relatively high donations to charities fighting diseases of the nervous system are not due to the Netherlands ALS association, but mainly to other health charities.

[Figure: fundraising income of Dutch health charities compared with deaths by disease group, 2008]

 

4. What is the effectiveness of donations to the ALS Association?

When people think about the effectiveness of donations, they often look for financial information about revenues and expenses. These numbers have limited value, but let’s look at them for what they are worth. According to its annual report, the Netherlands ALS association raised €6.5 million in 2013 and spent about €7 million on research, dipping into its endowment. The costs of fundraising approached €0.5 million, a relatively low proportion compared with the ALS Association in the US (ALSA). The ALSA annual report tells us the association spent $7 million on research in 2013, and $3.6 million on fundraising, having raised a total of $29 million. One could say fundraising in the US is less effective, more difficult, or simply more expensive than in the Netherlands.
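
A quick back-of-the-envelope check of what these figures imply (my own arithmetic, using only the numbers quoted above):

```python
# Fundraising cost ratios implied by the annual reports cited above.
nl_costs, nl_raised = 0.5, 6.5     # millions of euros, Netherlands ALS association (2013)
us_costs, us_raised = 3.6, 29.0    # millions of dollars, ALS Association, US (2013)

print(f"Netherlands: {nl_costs / nl_raised:.1%} of the amount raised spent on fundraising")
print(f"United States: {us_costs / us_raised:.1%} of the amount raised spent on fundraising")
```

That is roughly 8 cents per euro raised in the Netherlands against about 12 cents per dollar raised in the US.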

However, these numbers tell us nothing about the effectiveness of Ice Bucket Challenge donations. Their effectiveness depends completely on how the millions that are raised will be spent. From my limited knowledge of ALS, it seems that the development of treatments or drugs against the disease is not on the verge of a breakthrough. Even though it would be premature to expect an effective ALS treatment any time soon, the sheer size of the amounts donated now will enable researchers to make some big steps. Now that the stakes have been raised, donors may expect a well thought-through strategy from the ALS associations to spend the money in a responsible manner. The challenge for the ALS associations across the world is to manage donor expectations: to carefully communicate the uncertainty inherent in the development of medical innovations while avoiding disappointment and anger among donors expecting quick results.

Moreover, some have questioned the utility of health research charities relative to other charities, saying that there are more effective ways to spend donations. In the Netherlands this opinion was expressed by my colleague from Rotterdam, Kellie Liket, in one of the major national newspapers, De Volkskrant. Some of the responses to this op-ed piece have identified the same substitution logic that we saw above; a logic that can be questioned. More importantly, the opinion depends on the assumptions made about what counts towards the ‘effect’ of a donation. If we count lives saved per dollar contributed, medical research does not have a strong position in the debate. We can save many more lives by donating to improve health and living conditions in developing countries, where saving a life is much less expensive to begin with. The same $100 buys more health in a poorer country, all else being equal. But this is not the health of people we know, or the health of loved ones who have suffered from a disease. It is our greater empathy for people close to us that makes us donate more readily to certain causes than others.

 

5. Why should we give to a certain cause or organization?

Perhaps the most fundamental question raised by the Ice Bucket Challenge is a moral one. While research on philanthropy may show that we give out of compassion for people we know, there are many other reasons for people to give to charity. The joy of giving, aversion of guilt, being asked to give or seeing someone else give, the desire to obtain prestige, or simply an unexpected windfall or a ray of sunshine can motivate people to give. What we think of these circumstances and reasons is a different matter. The wisdom on the ethics of giving is much older than the 120 years of empirical research on philanthropy since Thorstein Veblen’s description of donations by the late 19th century New York elite as forms of conspicuous consumption. In the 12th century, Maimonides described eight levels of charity. Giving in response to a request is lower than anonymous giving; the highest form of giving would make recipients self-reliant and their dependence on charity disappear. Because of its largely public nature, the Ice Bucket Challenge can be placed on the lower rungs of Rambam’s Golden Ladder of Charity; but you can choose your favorite manner of donating in response to the challenge. And who knows: in the very long run, even your grudgingly accepted challenge and public donation may contribute to a cure for ALS – making victims of the disease less dependent on the charity of their loved ones.


Haiyan Typhoon Relief Donations: Research Insights

To address the needs of people affected by Super Typhoon Haiyan – locally known as Yolanda – which hit the Philippines on November 8, 2013, international relief organizations in the Netherlands are collectively raising funds on Monday, November 18, 2013. Commercial and public national TV and radio stations are working together in the fundraising campaign. In the past week many journalists have asked the question “Will the campaign be a success?” Because it is strange to give references to academic research papers in interviews, here are some studies that looked at determinants of giving to disaster relief campaigns.

Update, December 2, 2013:

When asked in a TV interview to make a prediction about the total amount raised, I replied that the Dutch would give between €50 and €60 million. That prediction was a ‘hunch’; it was not based on a calculation from data. It turned out to be much too optimistic. The total amount raised by November 25 is €30 million.

In retrospect, the declining donor confidence index could have prevented such an optimistic estimate. In almost every year since its inception in 2005, donor confidence increased in the final quarter. The year 2013 is as bad as the crisis year 2009: we see a decline in donor confidence. It may be even worse: in 2009 donor confidence declined along with consumer confidence. In 2013, however, donor confidence declined in the final quarter despite an increase in consumer confidence.
[Figure: donor confidence index, 2005-2013]

 


Lunch Talk: “Generalized Trust Through Civic Engagement? Evidence from Five National Panel Studies”

Does civic engagement breed trust? According to a popular version of social capital theory, civic engagement should produce generalized trust among citizens. In a new paper accepted for publication in Political Psychology, Erik van Ingen (Tilburg University) and I put this theory to the test by examining the causal connection between civic engagement and generalized trust, using multiple methods and multiple (prospective) panel datasets. We found participants to be more trusting, but this was most likely caused by selection effects: the causal effects of civic engagement on trust were very small or non-significant. In the cases where small causal effects were found, they turned out not to last. We found no differences across types of organizations and only minor variations across countries.

At the PARIS colloquium of the Department of Sociology at VU University on November 12, 2013 (Room Z531, 13.00-14.00), I will not just be talking about the earlier paper published in Political Behavior and the new paper forthcoming in Political Psychology (here is the prepublication version). In addition to a substantive story about a research project, there is also a story about the process of getting a paper accepted with a null-finding that goes against received wisdom. That story is quite informative about the publication factory that we are all in.


How Incentives Lead Us Astray in Academia

PDF of this post

The Teaching Trap

I did it again this week – I tried to teach students. Yes, it’s my job, and I love it. But that’s completely my own fault. If it were up to the incentives I encounter in the academic institution where I work, it would be far better not to spend time on teaching at all. For my career in academia, the thing that counts most heavily is how many publications in top journals I can realize. For some, this is even the only thing that counts: their promotion depends only on the number of publications. Last week, going home on the train, I overheard a young researcher from the medical school of our university saying to a colleague: “I would be a sucker to spend time on teaching!”

I remember what I did when I was their age. I worked at another university in an era when excellent publications were not yet counted by the impact factors of journals. My dissertation supervisor asked me to teach a Sociology 101 class, and I spent all of my time on it. I loved it. I developed fun class assignments with creative methods. I gave weekly writing assignments to students and scribbled extensive comments in the margins of their essays. Students learned and wrote much better essays at the end of the course than at the beginning.

A few years later things started to change. We were told to ‘extensify’ teaching: spend less time as teachers, while keeping the students as busy as ever. I developed checklists for students (‘Does my essay have a title?’ – ‘Is the reference list in alphabetical order and complete?’) and codes to grade essays with, ranging from ‘A. This sentence is not clear’ to ‘Z. Remember the difference between substance and significance: a p-value only tells you something about statistical significance, and not necessarily something about the effect size’. It was efficient for me – grading was much faster using the codes – and kept students busy – they could figure out for themselves where they could improve their work. It was less attractive for students, though, and they progressed less than they used to. The extensification was required because the department spent too much time on teaching relative to the compensation it received from the university. I realized then that the department and my university earn money with teaching. For every student that passes a course the department earns money from the university, because for every student that graduates the university earns money from the Ministry of Education.

This incentive structure is still in place, and it is completely destroying the quality of teaching and the value of a university diploma. As a professor I can save a lot of time by just letting students pass the courses I teach without trying to have the students learn anything: by not giving them feedback on their essays, by not having them write essays, by not having them do a retake after a failed exam, or even by grading their exams with at least a ‘passed’ mark without reading what they wrote.

[Image: ‘Allemaal een tien’ (‘Everyone gets a ten’)]

The awareness that incentives lead us astray has become clearer to me ever since the time the ‘extensify’ movement dawned. The latest illustration came to me earlier this academic year when I talked to a group of people interested in doing dissertation work as external PhD candidates. The university earns a premium from the Ministry of Education for each PhD dissertation that is defended successfully. Back in the old days, way before I got into academia, a dissertation was an eloquent monograph. When I graduated, the dissertation had become a set of four connected articles, introduced by a literature review and followed by a conclusion and discussion chapter. Today, the dissertation is a compilation of three articles, of which one could be a literature review. The process of diploma inflation has worked its way up to the PhD level. The minimum level of quality required for dissertations has also declined. The procedures in place to check whether the research work by external PhD candidates conforms to minimum standards are weak. And why should they be strong, if stringent criteria lower the profit for universities?

The Rat Race in Research

Academic careers are evaluated and shaped primarily by the number of publications, the impact factors of the journals in which they are published, and the number of citations by other researchers. At higher ranks the size and prestige of research grants starts to count as well. The dominance of output evaluations not only works against the attention paid to teaching, but also has perverse effects on research itself. The goal of research these days is not so much to get closer to the truth but to get published as frequently as possible in the most prestigious journals. A classic example of the replacement of substantive with instrumental rationality or the inversion between means and ends: an instrument becomes a goal in itself.[1] At some universities researchers can earn a salary bonus for each publication in a ‘top journal’. This leads to opportunistic behavior: salami tactics (thinly slicing the same research project in as many publications as possible), self-plagiarism (publishing the same or virtually the same research in different journals), self-citations, and even outright data fabrication.

What about the self-correcting power of science? Will reviewers not weed out the bad apples? Clearly not. The number of retractions in academic journals is increasing, and not because reviewers are able to catch more cheaters. It is because colleagues and other bystanders witness misbehavior and are concerned about the reputation of science, or because they personally feel cheated or exploited. The recent high-profile cases of academic misbehavior, as well as the growing number of retractions, show that it is surprisingly easy to engage in sloppy science. Because incentives lead us astray, it really comes down to our self-discipline and moral standards.

As an author of academic research articles, I have rarely encountered reviewers who doubted the validity of my analyses. Never did I encounter reviewers who asked for a more elaborate explanation of the procedures used or who wanted to see the data themselves. Only once did I receive a request, from a graduate student at another university, to provide the dataset and the code I used in an article. I do feel good about being able to provide the original data and the code, even though they were located on a computer that I had not used for three years and were stored with software that has received 7 updates since that time. But why haven’t I received such requests on other occasions?

As a reviewer, I recently tried to replicate analyses of a publicly available dataset reported in a paper. It was the first time I ever went to the trouble of locating the data, interpreting the description of the data handling in the manuscript and replicating the analyses. I arrived at different estimates and discovered several omissions and other mistakes in the analyses. Usually it is not even possible to replicate results because the data on which they are based are not publicly available. But they should be made available. Secret data are not permissible.[2] Next time I review an article I might ask: ‘Show, don’t tell’.

As an author, I have experienced how easy and tempting it is to engage in p-hacking: “exploiting – perhaps unconsciously – researcher degrees-of-freedom until p<.05”.[3] It is not really difficult to publish a paper with a fun finding from an experiment that was initially designed to test a hypothesis predicting another finding.[4] The hypothesis was not confirmed, and that result was less appealing than the fun finding. I adapted the title of the paper to reflect the fun finding, and people loved it.
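
A small simulation (my own illustration, not the study in footnote 4) shows why such flexibility matters: when there is no true effect at all, reporting whichever of three alternative outcome measures ‘works’ pushes the false positive rate well above the nominal 5%.

```python
# Simulation: undisclosed flexibility (three outcomes to choose from) inflates false positives.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps, hits = 50, 10_000, 0

for _ in range(reps):
    group = rng.integers(0, 2, n)             # random 'treatment' assignment
    outcomes = rng.normal(size=(n, 3))        # three outcome measures, all pure noise
    pvals = [stats.ttest_ind(y[group == 1], y[group == 0]).pvalue for y in outcomes.T]
    if min(pvals) < 0.05:                     # report whichever outcome 'worked'
        hits += 1

print(f"False positive rate with three outcomes to choose from: {hits / reps:.1%}")
```

With three independent outcomes to choose from, the simulated false positive rate comes out around 14% rather than the nominal 5%.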

The temptation to report fun findings and not to report rejections is enhanced by the behavior of reviewers and journal editors. On multiple occasions I encountered reviewers who did not like my findings when they led to rejections of hypotheses – usually hypotheses they had promulgated in their own previous research. The original publication of a surprising new finding is rarely followed by a published null-finding. Still I try to publish null-findings, and increasingly so.[5] It may take a few years, and the article may end up in a B-journal.[6] But persistence is fertile. Recently a colleague took the lead on an article in which we replicate that null-finding using five different datasets.

In the field of criminology, it is considered a trivial fact that crime increases with its profitability and decreases with the risk of detection. Academic misbehavior is like crime: the more profitable it is, and the lower the risk of getting caught, the more attractive it becomes. The low detection risk and high profitability create strong incentives. There must be an iceberg of academic misbehavior. Shall we crack it under the waterline or let it hit a cruise ship full of tourists?


[1] This was Max Weber’s criticism of capitalism in The Protestant Ethic and the Spirit of Capitalism (1905).

[2] As Graham Greene wrote in Our Man in Havana: “With a secret remedy you don’t have to print the formula. And there is something about a secret which makes people believe… perhaps a relic of magic.”

[3] The description is from Uri Simonsohn, http://opim.wharton.upenn.edu/~uws/SPSP/post.pdf

[4] The title of the paper is ‘George Gives to Geology Jane: The Name Letter Effect and Incidental Similarity Cues in Fundraising’. It appeared in the International Journal of Nonprofit and Voluntary Sector Marketing, 15 (2): 172-180.

[5] On average, 55% of the coefficients reported in my own publications are not significant. The figure increased from 46% in 2005 to 63% in 2011.

[6] It took six years before the paper ‘Trust and Volunteering: Selection or Causation? Evidence from a Four Year Panel Study’ was eventually published in Political Behavior (32 (2): 225-247), after initial rejections at the American Political Science Review and the American Sociological Review.


Frequently Unanswered Questions (FUQ)

Dear journalists, before we embark on a journey along all too familiar landscapes, please read this.

Q (Question) 1. Mr. Bekkers, you study ‘giving to charities’. How do you know whether a donation to a charity is well spent?

  • U (Unanswer) 1. Well, I don’t, actually. My research is indeed about giving to charities, but I do not study how charities spend the funds they raise. I can tell you that donors say they care about how charities spend their money. In fact, this is often an excuse. People who complain about the inefficiency of charities are typically those who would never donate money in a million years, regardless of any evidence showing that donations are efficient.

Q2. Mr. Bekkers, what is the reason why people give to charity?

  • U2. There is not one reason; there are eight different types of reasons, also called ‘mechanisms’: buttons you can push to create more giving. You can read more about them here. You said you wanted fewer reasons? Well, I can give you a list of four: egoism, altruism, collectivism, and principlism. Oh no, there are only three types of reasons: emotions, cognitions, and things we are not aware of. Wait, there are only two reasons: truly altruistic reasons and disguised egoism.

Q3. Speaking of altruism, isn’t all seemingly altruistic behavior in the end somewhat egoistic?

  • U3. Yes, you’re probably right. I would say about 95% of all giving (just a ball park figure) is motivated by non-altruistic concerns, like being asked, knowing someone who suffered from a problem, knowing someone who benefited,  benefiting oneself, getting tax breaks and deductions, social pressure to comply with requests for donations, feeling good about giving, having an impact on others, feeling in power, paternalism, having found a cookie or something else that cheered you up, or letting the wife decide about charities to keep her busy and save the marriage.

Q4. Sorry, what I meant to ask is this: does true altruism exist at all?

  • U4. No, probably not, but we don’t know. Nobody has ever come up with a convincing experiment that rules out all non-altruistic motives for giving. Many people have tried, but they have been unsuccessful. It is hard to eliminate all emotions, cognitions, and awareness of the donor about the consequences of the donation.

Q5. I mean, isn’t all giving in the end also about helping ourselves, like when you’re feeling good about giving?

  • U5. That could be right, we can’t rule out the ‘warm glow’ without blowing out the candle. But if you would only be interested in feeling good, then having a chocolate bar might be a lot cheaper.

Q6. Why do people volunteer?

  • A1. See U2 above. In many respects, giving money is like giving time.

Q7. Are you a generous man yourself? What do you give to charity?

  • U6. I am not at liberty to answer this question.

Q8. How much do we give in the Netherlands?

  • A2. Read all about the numbers in our Giving in the Netherlands volume, published biennially. A summary in English is here. The latest estimates are for 2011. On April 23, 2015, we will publish new estimates (for 2013).

Q9. Is it true that the Dutch are a very generous population?

Q10. Is altruism part of human nature?

  • U8. I will answer this question with the only decent scientific answer a scientist can ever give: “Well, it depends”. In this case, it all depends on what you call ‘altruism’ (and ‘human nature’, of course). If you view spontaneous and repeated helping of humans and conspecifics in the absence of rewards as altruism, then chimpanzees are altruistic; if you view cooperation in order to maintain mating access to single females against other males as altruism, bottlenose dolphins are altruistic; and if you view promoting the chances of survival of your genes as altruism, even maize plants can be altruistic.

Hat tips to Roel van Geene and Melissa Brown

Update: 16 July 2014
