Category Archives: experiments

Global Giving: Open Grant Proposal

Here’s an unusual thing for you to read: I am posting a brief description of a grant proposal that I will submit for the ‘vici’-competition of the Netherlands Organization for Scientific Research 2019 later this year. You can download the “pre-proposal” here. It is called “Global Giving”. With the study I aim to describe and explain philanthropy in a large number of countries across the world. I invite you to review the “pre-proposal” and suggest improvements; please use the comments box below, or write to me directly.

You may have heard the story that university researchers these days spend a lot of their time writing grant proposals for funding competitions. Also you may have heard the story that chances of success in such competitions are getting smaller and smaller. These stories are all true. But the story you seldom hear is how such competitions actually work: they are a source of stress, frustration, burnouts and depression, and a complete waste of the precious time of the smartest people in the world. Recently, Gross and Bergstrom found that “the effort researchers waste in writing proposals may be comparable to the total scientific value of the research that the funding supports”.

Remember the last time you saw the announcement of prize winners in a research grant competition? I have not heard a single voice in the choir of the many near-winners speak up: “Hey, I did not get a grant!” It is almost as if everybody wins all the time. It is not common in academia to be open about failures to win. How many of the vitae you have seen recently contain a list of failures? This is a grave distortion of reality. Fewer than one in ten applications is successful. This means that for each winning proposal there are at least nine proposals that did not get funding. I want you to know how much time is wasted by this procedure. So here I will be sharing my experiences with the upcoming ‘vici’-competition.

First let me tell you about the funny name of the competition. The name ‘vici’ derives from the famous Latin phrase attributed to the Roman general Julius Caesar, ‘veni, vidi, vici’, which he allegedly used to describe a swift victory. The translation is: “I came, I saw, I conquered”. The Netherlands Organization for Scientific Research (‘Nederlandse organisatie voor Wetenschappelijk Onderzoek’, NWO) thought it fitting to use these names as titles of their personal grant schemes. The so-called ‘talent schemes’ are very much about the personal qualities of the applicant. The scheme heralds heroes. The fascination with talent goes against the very nature of science, where the value of an idea, method or result is not measured by the personality of the author, but by its validity and reliability. That is why peer review is often double blind and evaluators do not know who wrote the research report or proposal.

Yet in the talent scheme, the personality of the applicant is very important. The fascination with talent creates Matthew effects, first described in 1968 by Robert K. Merton. The name ‘Matthew effect’ derives from the biblical phrase “For to him who has will more be given” (Matthew 13:12; see also Mark 4:25). Simply stated: success breeds success. Recently, this effect has been documented in the talent scheme by Thijs Bol, Matthijs de Vaan and Arnout van de Rijt. When two applicants are equally good but one – by mere chance – receives a grant and the other does not, talent is ascribed to the ‘winner’ and not to the ‘loser’. The ‘winner’ then gets a tremendously higher chance of receiving future grants.

As a member of committees for the ‘veni’ competition I have seen how this works in practice. Applicants received scores for the quality of their proposal from expert reviewers before we interviewed them. When we had minimal differences between the expert reviewer scores of candidates – differing only in the second decimal – personal characteristics of the researchers such as their self-confidence and manner of speaking during the interview often made the difference between ‘winners’ and ‘losers’. Ultimately, such minute differences add up to dramatically higher chances to be a full professor 10 years later, as the analysis in Figure 4 of the Bol, De Vaan & Van de Rijt paper shows.

[Figure: Figure 4 from the Bol, De Vaan & Van de Rijt paper]

My career is in this graph. In 2005, I won a ‘veni’-grant, the early career grant that the Figure above is about. The grant gave me a lot of freedom for research and I enjoyed it tremendously. I am pretty certain that the freedom that the grant gave me paved the way for the full professorship that I was recently awarded, thirteen years later. But back then, the size of the grant did not feel right. I felt sorry for those who did not make it. I knew I was privileged, and the research money I obtained was more than I needed. It would be much better to reduce the size of grants, so that a larger number of researchers can be funded. Yet the scheme is there, and it is a rare opportunity for researchers in the Netherlands to get funding for their own ideas.

This is my third and final application for a vici-grant. The rules for submission of proposals in this competition limit the number of attempts to three. Why am I going public with this final attempt?

The Open Science Revolution

You will have heard about open science. Most likely you will associate it with the struggle to publish research articles without paywalls, the exploitation of government funded scientists by commercial publishers, and perhaps even with Plan S. You may also associate open science with the struggle to get researchers to publish the data and the code they used to get to their results. Perhaps you have heard about open peer review of research publications. But most likely you will not have heard about open grant review. This is because it rarely happens. I am not the first to publish my proposal; the Open Grants repository currently contains 160 grant proposals. These proposals were shared after the competitions had run. The RIO Journal published 52 grant proposals. This is only a fraction of all grant proposals being created, submitted and reviewed. The many advantages of open science are not limited to funded research; they also apply to research ideas and proposals. By publishing my grant proposal before the competition, the expert reviews, the recommendations of the committee, my responses and experiences with the review process, I am opening up the procedure of grant review as much as possible.

Stages in the NWO Talent Scheme Grant Review Procedure

Each round of this competition takes almost a year, and proceeds in eight stages:

  1. Pre-application – March 26, 2019 <– this is where we are now
  2. Non-binding advice from committee: submit full proposal, or not – Summer 2019
  3. Full proposal – end of August 2019
  4. Expert reviews – October 2019
  5. Rebuttal to criticism in expert reviews – end of October 2019
  6. Selection for interview – November 2019
  7. Interview – January or February 2020
  8. Grant, or not – March 2020

If you’re curious to learn how this application procedure works in practice, check back in a few weeks. Your comments and suggestions on the ideas above and the pre-proposal are most welcome!

Filed under altruism, charitable organizations, data, economics, empathy, experiments, fundraising, happiness, helping, household giving, incentives, methodology, open science, organ donation, philanthropy, politics, principle of care, psychology, regression analysis, regulation, sociology, statistical analysis, survey research, taxes, trends, trust, volunteering, wealth

Uncertain Future for Giving in the Netherlands Panel Survey

By Barbara Gouwenberg and René Bekkers

At the Center for Philanthropic Studies we have been working hard to secure funding for three rounds of the Giving in the Netherlands Study, including the Giving in the Netherlands Panel Survey, for the years 2020-2026. During the previous round of the research, the Ministry of Justice and Security said that it would no longer fund the study on its own, because the research is important not only for the government but also for the philanthropic sector. The national government no longer sees itself as the sole funder of the research.

The ministry does think the research is important and is prepared to commit funding for the research in the form of a 1:1 matching subsidy to contributions received by VU Amsterdam from other funders. To strengthen the societal relevance and commitment for the Giving in the Netherlands study the Center has engaged in a dialogue with relevant stakeholders, including the council of foundations, the association of fundraising organizations, and several endowed foundations and fundraising charities in the Netherlands. The goal of these talks was to get science and practice closer together. From these talks we have gained three important general insights:

  • The Giving in the Netherlands study contributes to the visibility of philanthropy in the Netherlands. This is important for the legitimacy of an autonomous and growing sector.
  • It is important to engage in a conversation with relevant stakeholders before the fieldwork for a next round starts, in order to align the research more strongly with practice.
  • After the analyses have been completed, communication with relevant stakeholders about the results should be improved. Stakeholders desire more conversations about the application of insights from the research in practice.

The Center includes these issues in the plans for the upcoming three editions. VU Amsterdam has been engaged in conversations with branch organizations and individual foundations in the philanthropic sector for a long time, in order to build a sustainable financial model for the future of the research. However, at the moment we have not yet secured the funds to continue the research. That is why we did not collect data for the 2018 wave of the Giving in the Netherlands Panel Survey. As a result, we will not publish estimates for the size and composition of philanthropy in the Netherlands in spring 2019. We do hope that after this gap year we can restart the research next year, with a publication of new estimates in 2020.

Your ideas and support are very welcome at r.bekkers@vu.nl.

Filed under Center for Philanthropic Studies, charitable organizations, contract research, data, experiments, foundations, fundraising, household giving, methodology, Netherlands, philanthropy, policy evaluation, statistical analysis, survey research

Closing the Age of Competitive Science

In the prehistoric era of competitive science, researchers were like magicians: they earned a reputation for tricks that nobody could repeat and shared their secrets only with trusted disciples. In the new age of open science, researchers share by default, not only with peer reviewers and fellow researchers, but with the public at large. The transparency of open science reduces the temptation of private profit maximization and the collective inefficiency in information asymmetries inherent in competitive markets. In a seminar organized by the University Library at Vrije Universiteit Amsterdam on November 1, 2018, I discussed recent developments in open science and its implications for research careers and progress in knowledge discovery. The slides are posted here. The podcast is here.

Filed under academic misconduct, data, experiments, fraud, incentives, law, Netherlands, open science, statistical analysis, survey research, VU University

Multiple comparisons in a regression framework

Gordon Feld posted a comparison of results from a repeated measures ANOVA with paired samples t-tests.

I wondered how these results would look in a regression framework in Stata. For those of you who want to replicate this: I used the data provided by Gordon. The do-file is here. Because WordPress does not accept .do files you will have to rename the file from .docx to .do to make it work. The Stata commands are below, all in block quotes. The output is given in images. In the explanatory notes, commands are italicized, and variables are underlined.

A pdf of this post is here.

First let’s examine the data. You will have to insert the local path where you stored the data.

. import delimited "ANOVA_blog_data.csv", clear

. pwcorr before_treatment after_treatment before_placebo after_placebo

These commands get us the following table of correlations:

There are some differences in mean values, from 98.8 before treatment to 105.0 after treatment. Mean values for the placebo measures are 100.8 before and 100.2 after. Across all measures, the average is 101.2035.

Let’s replicate the t-test for the treatment effect.

The increase in IQ after the treatment is 6.13144 (SE = 2.134277), which is significant in this one-sample paired t-test (p = .006). Now let’s do the t-test for the placebo conditions.

The change in IQ after the placebo is -.6398003 (SE = 1.978064), a small decrease that is not significant (p = .7477).

The question is whether we have taken sufficient account of the nesting of the data.

We have four measures per participant: one before the treatment, one after, one before the placebo, and one after.

In other words, we have 50 participants and 200 measures.

To get the data into the nested structure, we have to reshape them.

The data are now in a wide format: one row per participant, IQ measures in different columns.

But we want a long format: 4 rows per participant, IQ in just one column.

To get this done we first assign a number to each participant.

. gen id = _n

We now have a variable id with a unique number for each of the 50 participants.
The Stata command for reshaping data requires the data to be set up in such a way that variables measuring the same construct have the same name.
We have 4 measures of IQ, so the new variables will be called iq1, iq2, iq3 and iq4.

. rename (before_treatment after_treatment before_placebo after_placebo) (iq1 iq2 iq3 iq4)

Now we can reshape the data. The command below assigns a new variable ‘mIQ’ to identify the 4 consecutive measures of IQ.

. reshape long iq, i(id) j(mIQ)

Here’s the result.

We now have 200 lines of data. Each line is an observation of IQ, numbered 1 to 4 on the new variable mIQ for each participant. The variable mIQ indicates the order of the IQ measurements.
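For readers who do not use Stata, the wide-to-long reshape can be sketched in plain Python. This is only an illustration: the two rows of scores are made up, and the function name reshape_long is hypothetical. Only the variable names id, mIQ and iq1 to iq4 come from the commands above.

```python
# Hypothetical wide-format rows; the scores are made up for illustration.
wide = [
    {"id": 1, "iq1": 98.0, "iq2": 104.0, "iq3": 101.0, "iq4": 100.0},
    {"id": 2, "iq1": 95.0, "iq2": 103.0, "iq3": 99.0, "iq4": 100.0},
]


def reshape_long(rows, stub="iq", n_measures=4):
    """Turn one row per participant into one row per participant and measure."""
    long_rows = []
    for row in rows:
        for m in range(1, n_measures + 1):
            # The suffix of the wide variable name becomes the value of mIQ.
            long_rows.append({"id": row["id"], "mIQ": m, stub: row[f"{stub}{m}"]})
    return long_rows


long_data = reshape_long(wide)
for row in long_data:
    print(row)
```

With 2 participants this gives 8 rows; with the 50 participants in the post it would give the 200 rows mentioned above.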

Now we identify the structure of the two experiments. The first two measures in the data are for the treatment pre- and post-measures. The indicator variable has to exist before it can be replaced, so it is generated as missing first.

. gen treatment = .

. replace treatment = 1 if mIQ < 3
(100 real changes made)

. replace treatment = 0 if mIQ > 2
(100 real changes made)

Observations 3 and 4 are for the placebo pre- and post-measures. Again the indicator is generated as missing first.

. gen placebo = .

. replace placebo = 0 if mIQ < 3
(100 real changes made)

. replace placebo = 1 if mIQ > 2
(100 real changes made)

. tab treatment placebo

We have 100 observations in each of the experiments.

OK, we’re ready for the regressions now. Let’s first conduct an OLS regression to quantify the changes within participants in the treatment and placebo conditions.

. reg iq mIQ if treatment == 1

The regression shows that the treatment increased IQ by 6.13144 points, but with an SE of 3.863229 the change is not significant (p = .116). The effect estimate is correct, but the SE is too large and hence the p-value is too high as well.

. reg iq mIQ if placebo == 1


The placebo regression shows the familiar decline of .6398003, but with an SE of 3.6291, which is too high (p = .860). The SE and p-values are incorrect because OLS does not take the nested structure of the data into account.
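Why the OLS standard error is too large can be illustrated with a small calculation. The sketch below uses made-up scores for five participants, not the data from the post. It compares the SE of a paired t-test, which is based on the within-person differences, with the SE of a pooled OLS regression on a before/after dummy, which treats all observations as independent.

```python
import math
from statistics import mean, stdev, variance

# Made-up scores for five participants; not the data used in the post.
before = [90.0, 95.0, 100.0, 105.0, 110.0]
after = [93.0, 99.0, 102.0, 109.0, 112.0]
n = len(before)

# Both approaches give the same effect estimate: the mean change.
diffs = [a - b for a, b in zip(after, before)]
effect = mean(diffs)

# Paired t-test: the SE is based on the within-person differences only.
se_paired = stdev(diffs) / math.sqrt(n)

# Pooled OLS on a before/after dummy: the SE is based on the pooled
# variance of the scores, as if the 2n observations were independent.
pooled_var = (variance(before) + variance(after)) / 2
se_ols = math.sqrt(pooled_var * 2 / n)

print(effect)               # 3.0
print(round(se_paired, 3))  # 0.447
print(round(se_ols, 3))     # 4.919
```

Because the before and after scores of the same person are strongly correlated in this example, most of the variance in the scores is between persons. The paired approach removes that variance, while pooled OLS leaves it in the error term.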

With the xtset command we identify the nesting of the data: measures of IQ (mIQ) are nested within participants (id).

. xtset id mIQ

First we run an ‘empty model’ – no predictors are included.

. xtreg iq

Here’s the result:

Two variables in the output are worth commenting on.

  1. The constant (_cons) is the average across all measures, 101.2033. This is very close to the average we have seen before.
  2. The rho is the intraclass correlation – the average correlation of the 4 IQ measures within individuals. It is .7213, which seems right.
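To get a feeling for what an intraclass correlation measures, here is a minimal sketch in Python of the classic one-way ANOVA estimator. This is an assumption for illustration: xtreg derives rho from its own random-effects variance components, so the two numbers need not be identical on real data. The function name icc_oneway and the toy scores are hypothetical.

```python
from statistics import mean


def icc_oneway(groups):
    """One-way ANOVA estimate of the intraclass correlation.

    `groups` holds one list of repeated measures per participant,
    all of the same length k.
    """
    n = len(groups)
    k = len(groups[0])
    grand = mean(x for g in groups for x in g)
    # Mean squares between and within participants.
    ms_between = k * sum((mean(g) - grand) ** 2 for g in groups) / (n - 1)
    ms_within = sum((x - mean(g)) ** 2 for g in groups for x in g) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Toy data: three participants with two measures each.
toy = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(round(icc_oneway(toy), 3))  # 0.882
```

A value close to 1 means that most of the variance is between participants, so repeated measures of the same person are far from independent.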

Now let’s replicate the t-test results in a regression framework.

. xtreg iq mIQ if treatment == 1

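In the output below we see the 100 observations in 50 groups (individuals). We obtain the same effect estimate of the treatment as before (6.13144) and the correct SE of 2.134277, but the p-value is too small (p = .004). The reason is that the random-effects model reports a z-statistic, which is evaluated against the normal distribution rather than the t-distribution with 49 degrees of freedom that the paired t-test uses.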

Let’s fix this. We put fixed effects on the participants by adding , fe at the end of the xtreg command:

. xtreg iq mIQ if treatment == 1, fe

We now get the accurate p-value (0.006):

Let’s run the same regression for the placebo conditions.

. xtreg iq mIQ if placebo == 1, fe


The placebo effect is the familiar -.6398003, SE = 1.978064, now with the accurate p-value of .748.
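The p-values reported above can be checked by hand from the estimates and standard errors. The sketch below, in plain Python, evaluates the same test statistic against the standard normal distribution and against Student's t with 49 degrees of freedom (100 observations minus 50 participants minus 1 slope). The function names are hypothetical, and the t-distribution is integrated numerically because the Python standard library has no t-distribution.

```python
import math
from statistics import NormalDist


def p_normal(estimate, se):
    """Two-sided p-value from a standard normal reference distribution."""
    z = abs(estimate / se)
    return 2 * (1 - NormalDist().cdf(z))


def p_student(estimate, se, df, steps=20000):
    """Two-sided p-value from Student's t, by midpoint integration of the density."""
    t = abs(estimate / se)
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    width = t / steps
    # Area under the density between 0 and |t|.
    area = width * sum(
        const * (1 + ((i + 0.5) * width) ** 2 / df) ** (-(df + 1) / 2)
        for i in range(steps)
    )
    return 1 - 2 * area


# Treatment: estimate and SE as reported in the post.
print(round(p_normal(6.13144, 2.134277), 3))          # 0.004
print(round(p_student(6.13144, 2.134277, 49), 3))     # 0.006
# Placebo: estimate and SE as reported in the post.
print(round(p_student(-0.6398003, 1.978064, 49), 3))  # 0.748
```

The estimate and the SE are identical in both cases; only the reference distribution differs. With 49 degrees of freedom the t-distribution has heavier tails than the normal distribution, which explains the difference between .004 and .006.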

Filed under data, experiments, methodology, regression, regression analysis, statistical analysis, survey research

Research internship @VU Amsterdam

Social influences on prosocial behaviors and their consequences

While self-interest and prosocial behavior are often pitted against each other, it is clear that much charitable giving and volunteering for good causes is motivated by non-altruistic concerns (Bekkers & Wiepking, 2011). Helping others by giving and volunteering feels good (Dunn, Aknin & Norton, 2008). What is the contribution of such helping behaviors to happiness?

The effect of helping behavior on happiness is easily overestimated using cross-sectional data (Aknin et al., 2013). Experiments provide the best way to eradicate selection bias in causal estimates. Monozygotic twins provide a nice natural experiment to investigate unique environmental influences on prosocial behavior and its consequences for happiness, health, and trust. Any differences within twin pairs cannot be due to additive genetic effects or shared environmental effects. Previous research has investigated environmental influences of the level of education and religion on giving and volunteering (Bekkers, Posthuma and Van Lange, 2017), but no study has investigated the effects of helping behavior on important outcomes such as trust, health, and happiness.

The Midlife in the United States (MIDUS) and the German Twinlife surveys provide rich datasets including measures of health, life satisfaction, and social integration, in addition to demographic and socioeconomic characteristics and measures of helping behavior through nonprofit organizations (giving and volunteering) and in informal social relationships (providing financial and practical assistance to friends and family).

In the absence of natural experiments, longitudinal panel data are required to ascertain the chronology in acts of giving and their correlates. The same holds for the alleged effects of volunteering on trust (Van Ingen & Bekkers, 2015) and health (De Wit, Bekkers, Karamat Ali, & Verkaik, 2015). Since the mid-1990s, a growing number of panel studies have collected data on volunteering and charitable giving and their alleged consequences, such as the German Socio-Economic Panel (GSOEP), the British Household Panel Survey (BHPS) / Understanding Society, the Swiss Household Panel (SHP), the Household, Income and Labour Dynamics in Australia survey (HILDA), the General Social Survey (GSS) in the US, and in the Netherlands the Longitudinal Internet Studies for the Social sciences (LISS) and the Giving in the Netherlands Panel Survey (GINPS).

Under my supervision, students can write a paper on social influences of education, religion and/or helping behavior in the form of volunteering, giving, and informal financial and social support on outcomes such as health, life satisfaction, and trust, using either longitudinal panel survey data or data on twins. Students who are interested in writing such a paper are invited to present their research questions and research design via e-mail to r.bekkers@vu.nl.

René Bekkers, Center for Philanthropic Studies, Faculty of Social Sciences, Vrije Universiteit Amsterdam

References

Aknin, L. B., Barrington-Leigh, C. P., Dunn, E. W., Helliwell, J. F., Burns, J., Biswas-Diener, R., … Norton, M. I. (2013). Prosocial spending and well-being: Cross-cultural evidence for a psychological universal. Journal of Personality and Social Psychology, 104(4), 635–652. https://doi.org/10.1037/a0031578

Bekkers, R., Posthuma, D. & Van Lange, P.A.M. (2017). The Pursuit of Differences in Prosociality Among Identical Twins: Religion Matters, Education Does Not. https://osf.io/ujhpm/ 

Bekkers, R., & Wiepking, P. (2011). A Literature Review of Empirical Studies of Philanthropy: Eight Mechanisms That Drive Charitable Giving. Nonprofit and Voluntary Sector Quarterly, 40: https://doi.org/10.1177/0899764010380927

De Wit, A., Bekkers, R., Karamat Ali, D., & Verkaik, D. (2015). Welfare impacts of participation. Deliverable 3.3 of the project: “Impact of the Third Sector as Social Innovation” (ITSSOIN), European Commission – 7th Framework Programme, Brussels: European Commission, DG Research. http://itssoin.eu/site/wp-content/uploads/2015/09/ITSSOIN_D3_3_The-Impact-of-Participation.pdf

Dunn, E. W., Aknin, L. B., & Norton, M. I. (2008). Spending Money on Others Promotes Happiness. Science, 319(5870): 1687–1688. https://doi.org/10.1126/science.1150952

Van Ingen, E. & Bekkers, R. (2015). Trust Through Civic Engagement? Evidence From Five National Panel Studies. Political Psychology, 36 (3): 277-294. https://renebekkers.files.wordpress.com/2015/05/vaningen_bekkers_15.pdf

Filed under altruism, Center for Philanthropic Studies, data, experiments, happiness, helping, household giving, Netherlands, philanthropy, psychology, regression analysis, survey research, trust, volunteering

Tools for the Evaluation of the Quality of Experimental Research

pdf of this post

Experiments can have important advantages over other research designs. The most important advantage of experiments concerns internal validity. Random assignment to treatment reduces the attribution problem and increases the possibilities for causal inference. An additional advantage is that control over participants reduces heterogeneity of treatment effects observed.

The extent to which these advantages are realized in the data depends on the design and execution of the experiment. Experiments have a higher quality if the sample size is larger and the theoretical concepts are measured more reliably and with higher validity. The sufficiency of the sample size can be checked with a power analysis. For most effect sizes in the social sciences, which are small (d = 0.2), a sample of 1300 participants is required to detect the effect at conventional significance levels (p < .05) with 95% power (see appendix). Even for a stronger effect size (d = 0.4), more than 300 participants are required. The reliability of normative scale measures can be judged with Cronbach’s alpha. A rule of thumb for unidimensional scales is that alpha should be at least .63 for a scale consisting of 4 items, .68 for 5 items, .72 for 6 items, .75 for 7 items, and so on. The validity of measures should be justified theoretically and can be checked with a manipulation check, which should reveal a sizeable and significant association with the treatment variables.
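The numbers in this paragraph can be reproduced with a few lines of Python. The sketch rests on assumptions that are not stated in the text: a two-group comparison of means with equal group sizes, a two-sided test, and the normal approximation for the power calculation. For the reliability rule of thumb it assumes the standardized alpha formula with a mean inter-item correlation of .30, which is the value that happens to reproduce the thresholds listed above. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def total_n_two_groups(d, alpha=0.05, power=0.95):
    """Approximate total N for a two-group comparison of means (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    per_group = math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)
    return 2 * per_group


def standardized_alpha(k, mean_r):
    """Cronbach's alpha for k items with mean inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


print(total_n_two_groups(0.2))  # 1300
print(total_n_two_groups(0.4))  # 326

# Assumed mean inter-item correlation of .30.
for k in (4, 5, 6, 7):
    print(k, round(standardized_alpha(k, 0.30), 2))  # 0.63, 0.68, 0.72, 0.75
```

Dedicated power software that uses the t-distribution will give slightly larger sample sizes than this normal approximation.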

The advantages of experiments are reduced if assignment to treatment is non-random and treatment effects are confounded. In addition, a variety of other problems may endanger internal validity. Shadish, Cook & Campbell (2002) provide a useful list of such problems.

Also it should be noted that experiments can have important disadvantages. The most important disadvantage is that the external validity of the findings is limited to the participants in the setting in which their behavior was observed. This disadvantage can be avoided by creating more realistic decision situations, for instance in natural field experiments, and by recruiting (non-‘WEIRD’) samples of participants that are more representative of the target population. As Henrich, Heine & Norenzayan (2010) noted, results based on samples of participants in Western, Educated, Industrialized, Rich and Democratic (WEIRD) countries have limited validity in the discovery of universal laws of human cognition, emotion or behavior.

Recently, experimental research paradigms have received fierce criticism. Results of research often cannot be reproduced (Open Science Collaboration, 2015), and publication bias is ubiquitous (Ioannidis, 2005). It has become clear that there is a lot of undisclosed flexibility in all phases of the empirical cycle. While these problems have been discussed widely in communities of researchers conducting experiments, they are by no means limited to one particular methodology or mode of data collection. It is likely that they also occur in communities of researchers using survey or interview data.

In the positivist paradigm that dominates experimental research, the empirical cycle starts with the formulation of a research question. To answer the question, hypotheses are formulated based on established theories and previous research findings. Then the research is designed, data are collected, a predetermined analysis plan is executed, results are interpreted, the research report is written and submitted for peer review. After the usual round(s) of revisions, the findings are incorporated in the body of knowledge.

The validity and reliability of results from experiments can be compromised in two ways. The first is by juggling with the order of phases in the empirical cycle. Researchers can decide to amend their research questions and hypotheses after they have seen the results of their analyses. Kerr (1998) labeled the practice of reformulating hypotheses HARKing: Hypothesizing After the Results are Known. Amending hypotheses is not a problem when the goal of the research is to develop theories to be tested later, as in grounded theory or exploratory analyses (e.g., data mining). But in hypothesis-testing research HARKing is a problem, because it increases the likelihood of publishing false positives. Chance findings are interpreted post hoc as confirmations of hypotheses that a priori are rather unlikely to be true. When these findings are published, they are unlikely to be reproducible by other researchers, creating research waste, and worse, reducing the reliability of published knowledge.

The second way the validity and reliability of results from experiments can be compromised is by misconduct and sloppy science within various stages of the empirical cycle (Simmons, Nelson & Simonsohn, 2011). The data collection and analysis phase as well as the reporting phase are most vulnerable to distortion by fraud, p-hacking and other questionable research practices (QRPs).

  • In the data collection phase, observations that (if kept) would lead to undesired conclusions or non-significant results can be altered or omitted. Also, fake observations can be added (fabricated).
  • In the analysis of data researchers can try alternative specifications of the variables, scale constructions, and regression models, searching for those that ‘work’ and choosing those that reach the desired conclusion.
  • In the reporting phase, things go wrong when the search for alternative specifications and the sensitivity of the results with respect to decisions in the data analysis phase is not disclosed.
  • In the peer review process, there can be pressure from editors and reviewers to cut reports of non-significant results, or to collect additional data supporting the hypotheses and the significant results reported in the literature.

The result of these QRPs is that null findings are less likely to be published, that published research is biased towards positive findings that confirm the hypotheses, and that published findings are often not reproducible. When a replication attempt is made, published findings are found to be less significant, less often positive, and of a lower effect size (Open Science Collaboration, 2015).
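A small calculation shows why trying alternative specifications is so harmful. The sketch below assumes that the tests are independent and that all null hypotheses are true. Alternative specifications of the same data are usually correlated, so the numbers are an illustration rather than an exact description of any real analysis. The function name is hypothetical.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests of true nulls."""
    return 1 - (1 - alpha) ** n_tests


for n_tests in (1, 5, 20):
    print(n_tests, round(familywise_error_rate(n_tests), 3))
# 1 0.05
# 5 0.226
# 20 0.642
```

A researcher who tries five independent specifications and reports the one that ‘works’ has a chance of more than one in five of reporting a false positive, even though each single test is conducted at the 5% level.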

Alarm bells, red flags and other warning signs

Some of the forms of misconduct mentioned above are very difficult to detect for reviewers and editors. When observations are fabricated or omitted from the analysis, only inside information, very sophisticated data detectives and stupidity of the authors can help us. Also many other forms of misconduct are difficult to prove. While smoking guns are rare, we can look for clues. I have developed a checklist of warning signs and good practices that editors and reviewers can use to screen submissions (see below). The checklist uses terminology that is not specific to experiments, but applies to all forms of data. While a high number of warning signs in itself does not prove anything, it should alert reviewers and editors. There is no norm for the number of flags. The table below only mentions the warning signs; the paper version of this blog post also shows a column with the positive poles. Those who would like to count good practices and reward authors for a higher number can count gold stars rather than red flags. The checklist was developed independently of the checklist that Wicherts et al. (2016) recently published.

Warning signs

  • The power of the analysis is too low.
  • The results are too good to be true.
  • All hypotheses are confirmed.
  • P-values are just below critical thresholds (e.g., p < .05).
  • A groundbreaking result is reported but not replicated in another sample.
  • The data and code are not made available upon request.
  • The data are not made available upon article submission.
  • The code is not made available upon article submission.
  • Materials (manipulations, survey questions) are described superficially.
  • Descriptive statistics are not reported.
  • The hypotheses are tested in analyses with covariates and results without covariates are not disclosed.
  • The research is not preregistered.
  • No details of an IRB procedure are given.
  • Participant recruitment procedures are not described.
  • Exact details of time and location of the data collection are not described.
  • A power analysis is lacking.
  • Unusual / non-validated measures are used without justification.
  • Different dependent variables are analyzed in different studies within the same article without justification.
  • Variables are (log)transformed or recoded in unusual categories without justification.
  • Numbers of observations mentioned at different places in the article are inconsistent. Loss or addition of observations is not justified.
  • A one-sided test is reported when a two-sided test would be appropriate.
  • Test-statistics (p-values, F-values) reported are incorrect.

With the increasing number of retractions of articles reporting on experimental research published in scholarly journals, the awareness of the fallibility of peer review as a quality control mechanism has increased. Communities of researchers employing experimental designs have formulated solutions to these problems. In the review and publication stage, the following solutions have been proposed.

  • Access to data and code. An increasing number of science funders require grantees to provide open access to the data and the code that they have collected. Likewise, authors are required to provide access to data and code at a growing number of journals, such as Science, Nature, and the American Journal of Political Science. Platforms such as Dataverse, the Open Science Framework and Github facilitate sharing of data and code. Some journals do not require access to data and code, but provide Open Science badges for articles that do provide access.
  • Pledges, such as the ‘21 word solution’, a statement designed by Simmons, Nelson and Simonsohn (2012) that authors can include in their paper to ensure they have not fudged the data: “We report how we determined our sample size, all data exclusions (if any), all manipulations, and all measures in the study.”
  • Full disclosure of methodological details of research submitted for publication, for instance through psychdisclosure.org, is now required by major journals in psychology.
  • Apps such as Statcheck, p-curve, p-checker, and r-index can help editors and reviewers detect fishy business. They also have the potential to improve research hygiene when researchers start using these apps to check their own work before they submit it for review.
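To give a flavour of what such apps do, here is a minimal sketch of a consistency check between a reported test statistic and a reported p-value. It is not the code of Statcheck or any of the other apps: it handles only z statistics so that it runs on the Python standard library alone, and the function names and the rounding tolerance are my own assumptions for illustration.

```python
import math


def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a standard normal (z) test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def one_sided_p_from_z(z: float) -> float:
    """Upper-tail (one-sided) p-value for a standard normal test statistic."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def is_consistent(z: float, reported_p: float, decimals: int = 2) -> bool:
    """True if the reported two-sided p-value agrees with the recomputed one,
    allowing for rounding to the given number of decimals (an assumption)."""
    tolerance = 0.5 * 10 ** (-decimals) + 1e-12
    return abs(two_sided_p_from_z(z) - reported_p) <= tolerance


# A paper reporting z = 1.96 with p = .05 passes the check.
print(is_consistent(1.96, 0.05))
# The same z with p = .03 does not; a value near the one-sided p
# would hint that a one-sided test was reported.
print(is_consistent(1.96, 0.03), round(one_sided_p_from_z(1.96), 3))
```

A real check would also have to cover t, F and chi-square statistics with their degrees of freedom, which requires a statistics library.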

As these solutions become more commonly used we should see the quality of research go up. The number of red flags in research should decrease and the number of gold stars should increase. This requires not only that reviewers and editors use the checklist but, most importantly, that researchers themselves use it as well.

The solutions above should be supplemented by better research practices before researchers submit their papers for review. In particular, two measures are worth mentioning:

  • Preregistration of research, for instance on Aspredicted.org. An increasing number of journals in psychology require research to be preregistered. Some journals guarantee publication of research regardless of its results after a round of peer review of the research design.
  • Increasing the statistical power of research is one of the most promising strategies to increase the quality of experimental research (Bakker, Van Dijk & Wicherts, 2012). In many fields and for many decades, published research has been underpowered, using samples of participants that are not large enough to reliably detect the reported effect sizes. Using larger samples reduces the likelihood of both false positives and false negatives.
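What a power analysis involves can be sketched in a few lines. The sketch below uses the normal approximation for a two-sided comparison of two group means, which slightly understates the sample size that an exact t-test calculation would give. The effect size (Cohen's d = 0.5), the 80% power target and the function names are assumptions chosen for illustration, not values taken from any study discussed here.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def approx_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-group comparison of means.

    Uses the normal approximation and ignores the negligible probability
    of rejecting in the wrong direction.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    noncentrality = d * math.sqrt(n_per_group / 2)
    return _STD_NORMAL.cdf(noncentrality - z_crit)


def approx_n_per_group(d: float, power: float = 0.80, alpha: float = 0.05) -> int:
    """Approximate participants needed per group to reach the target power."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * ((z_crit + z_power) / d) ** 2)


# Assumed medium effect size (d = 0.5), purely for illustration.
print(approx_n_per_group(0.5))
print(round(approx_power(0.5, 20), 2))
```

Running the same functions for a smaller assumed effect shows how quickly the required sample grows: halving d roughly quadruples the number of participants needed.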

A variety of institutional designs have been proposed to encourage the use of the solutions mentioned above. These include reducing the incentives for questionable research practices in researchers' careers and in hiring and promotion decisions, rewarding researchers for good conduct through badges, the adoption of voluntary codes of conduct, and socialization of students and senior staff through teaching and workshops. Research funders, journals, editors, authors, reviewers, universities, senior researchers and students all have a responsibility in these developments.

References

Bakker, M., Van Dijk, A. & Wicherts, J. (2012). The Rules of the Game Called Psychological Science. Perspectives on Psychological Science, 7(6): 543–554.

Henrich, J., Heine, S.J., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33: 61–135.

Ioannidis, J.P.A. (2005). Why Most Published Research Findings Are False. PLoS Medicine, 2(8): e124. http://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.0020124

Kerr, N.L. (1998). HARKing: Hypothesizing After the Results are Known. Personality and Social Psychology Review, 2: 196–217.

Open Science Collaboration (2015). Estimating the Reproducibility of Psychological Science. Science, 349. http://www.sciencemag.org/content/349/6251/aac4716.full.html

Shadish, W.R., Cook, T.D., & Campbell, D.T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin.

Simmons, J.P., Nelson, L.D., & Simonsohn, U. (2011). False positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22: 1359–1366.

Simmons, J.P., Nelson, L.D. & Simonsohn, U. (2012). A 21 Word Solution. Available at SSRN: http://ssrn.com/abstract=2160588

Wicherts, J.M., Veldkamp, C.L., Augusteijn, H.E., Bakker, M., Van Aert, R.C. & Van Assen, M.L.A.M. (2016). Researcher degrees of freedom in planning, running, analyzing, and reporting psychological studies: A checklist to avoid p-hacking. Frontiers in Psychology, 7: 1832. http://journal.frontiersin.org/article/10.3389/fpsyg.2016.01832/abstract


The Fishy Business of Philanthropy

Breaking news today: the essential amino acid L-Tryptophan (TRP) makes people generous! Three psychologists at the University of Leiden, Laura Steenbergen, Roberta Sellaro, and Lorenza Colzato, secretly gave 16 participants in an experiment a dose of TRP, dissolved in a glass of orange juice. The 16 other participants in the study drank plain orange juice, without TRP. The psychologists did not write where the experiment was conducted, but describe the participants as 28 female and 4 male students in southern Europe – which is likely to be Italy, given the names of the second and third authors. Next, the participants were kept busy for 30 minutes with an ‘attentional blink task that requires the detection of two targets in a rapid visual on-screen presentation’. After they had completed the task, they were given a reward of €10. Then the participants were given an opportunity to donate to four charities: Unicef, Amnesty International, Greenpeace, and World Wildlife Fund. And behold the wonders of L-Tryptophan: the 0.8 grams of TRP more than doubled the amount donated from €0.47 (yes, that is less than five percent of the €10 earned) to €1.00. Even though the amount donated is small, the increase due to TRP is huge: +112%.
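For readers who want to check the arithmetic, here is a small sketch using the rounded group means quoted above. The variable and function names are mine; the unrounded means are not given here, so a small difference from the reported percentage is presumably a matter of rounding.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, expressed in percent."""
    return (new - old) / old * 100


endowment = 10.00     # euros earned by each participant
control_mean = 0.47   # mean donation in euros, plain orange juice
trp_mean = 1.00       # mean donation in euros, orange juice with TRP

share_donated = control_mean / endowment * 100      # share of the reward given away
increase = percent_change(control_mean, trp_mean)   # relative increase due to TRP

print(f"Control group donated {share_donated:.1f}% of the reward")
print(f"Increase with TRP: +{increase:.1f}%")
```

With the rounded means, the control group gave away just under five percent of the reward, and the relative increase comes out slightly above the reported figure, well within what rounding of the two means can explain.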

Why is this good to know? Why does tryptophan increase generosity? Steenbergen, Sellaro and Colzato reasoned that TRP influences synthesis of the neurotransmitter serotonin (called 5-HT), which has been found to be associated with charitable giving in several economic experiments. The participants in the experiment were not tested for serotonin levels, but the results are consistent with these previous experiments. The new experiment takes us one step further into the biology of charity, by showing that the intake of food enriched with tryptophan makes female students in Italy more generous to charity.

Tryptophan is an essential amino acid, commonly found in protein-rich foods such as chocolate, eggs, milk, poultry, fish, and spinach. Rense Corten, a former colleague of mine, asked on Twitter: how much spinach would the participants have had to digest to obtain a TRP intake that would make them give an additional €1 to charity? Just for fun I computed this: it is about 438 grams of spinach. That is less than the 1161 grams of chocolate it would take to generate the same dose of TRP as the participants got in their orange juice.

The fairly low level of giving in the experiment is somewhat surprising given the overall level of charitable giving in Italy. According to the Gallup World Poll some 62% of Italians made donations to charity in 2011, ranking the country 14th in the world. But wait – Italians eat quite some fish, don’t they? If there is a lot of tryptophan in fish, Italians should be more generous than inhabitants of other countries that consume less fish. Indeed the annual fish consumption per capita in Italy (some 25 kilograms, ranking the country 14th in the world) is much higher than in the Czech Republic (10 kilograms; rank: 50), and the Czech population is less likely to give to charity (31%, rank: 30).

Of course this comparison of just two countries in Europe is not representative of any part of the world. And yes, it is cherry-picked: an initial comparison with Austria (14 kilograms of fish per year, much less than in Italy) did not yield a result in the same direction (69% give, more than in Italy). But lining up all countries in the world for which there are data on fish consumption and engagement in charity does yield a positive correlation between the two. Here is the Excel file including the data. The relationship is modest (r = .30), but still: we now know that inhabitants of countries that consume more fish per capita are somewhat more likely to give to charity.
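To see how much the choice of countries matters, here is a minimal sketch that computes a Pearson correlation using only the three countries mentioned above (Italy, the Czech Republic and Austria). It does not use the full data file, so it cannot reproduce the r = .30 reported for all countries; the function names are my own.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# (fish consumption in kg per capita per year, % giving to charity),
# as quoted in the text above.
countries = {
    "Italy": (25, 62),
    "Czech Republic": (10, 31),
    "Austria": (14, 69),
}


def r_for(names):
    """Correlation between fish consumption and giving for the named countries."""
    fish = [countries[name][0] for name in names]
    giving = [countries[name][1] for name in names]
    return pearson_r(fish, giving)


print(round(r_for(["Italy", "Czech Republic"]), 2))  # two points: perfectly positive
print(round(r_for(["Italy", "Austria"]), 2))         # two points: perfectly negative
print(round(r_for(list(countries)), 2))              # all three countries
```

With only two countries the correlation is always perfectly positive or perfectly negative, so the direction of the result depends entirely on which pair is picked.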

[Figure: fish consumption and giving to charities]
