
A Data Transparency Policy for Results Based on Experiments


Transparency is a key condition for robust and reliable knowledge, and the advancement of scholarship over time. Since January 1, 2020, I have been the Area Editor for Experiments submitted to Nonprofit & Voluntary Sector Quarterly (NVSQ), the leading journal for academic research in the interdisciplinary field of nonprofit research. In order to improve the transparency of research published in NVSQ, the journal is introducing a policy requiring authors of manuscripts reporting on data from experiments to provide, upon submission, access to the data and the code that produced the results reported. This will be a condition for the manuscript to proceed through the blind peer review process.

The policy will be implemented as a pilot for papers reporting results of experiments only. For manuscripts reporting on other types of data, the submission guidelines will not be changed at this time.



This policy is a step toward strengthening research in our field through greater transparency about research design, data collection and analysis. Greater transparency of data and analytic procedures will produce fairer, more constructive reviews and, ultimately, even higher quality articles published in NVSQ. Reviewers can only evaluate the methodologies and findings fully when authors describe the choices they made and provide the materials used in their study.

Sample composition and research design features can affect the results of experiments, as can sheer coincidence. To assist reviewers and readers in interpreting the research, it is important that authors describe relevant features of the research design, data collection, and analysis. Such details are also crucial to facilitate replication. NVSQ receives very few replications, and thus rarely publishes them, although we are open to doing so. Greater transparency will make it easier to reinforce, or question, research results through replication (Peters, 1973; Smith, 1994; Helmig, Spraul & Tremp, 2012).

Greater transparency is also good for authors. Articles with open data appear to have a citation advantage: they are cited more frequently in subsequent research (Colavizza et al., 2020; Drachen et al., 2016). The evidence is not experimental: the higher citation rank of articles providing access to data may be a result of higher research quality. Regardless of whether the policy improves the quality of new research or attracts higher quality existing research – if higher quality research is the result, then that is exactly what we want.

Previously, the official policy of our publisher, SAGE, was that authors were ‘encouraged’ to make the data available. It is likely though that authors were not aware of this policy because it was not mentioned on the journal website. In any case, this voluntary policy clearly did not stimulate the provision of data because data are available for only a small fraction of papers in the journal. Evidence indicates that a data sharing policy alone is ineffective without enforcement (Stodden, Seiler, & Ma, 2018; Christensen et al., 2019). Even when authors include a phrase in their article such as ‘data are available upon request,’ research shows that this does not mean that authors comply with such requests (Wicherts et al., 2006; Krawczyk & Reuben, 2012). Therefore, we are making the provision of data a requirement for the assignment of reviewers.


Data Transparency Guidance for Manuscripts using Experiments

Authors submitting manuscripts to NVSQ in which they are reporting on results from experiments are kindly requested to provide a detailed description of the target sample and the way in which the participants were invited, informed, instructed, paid, and debriefed. Also, authors are requested to describe all decisions made and questions answered by the participants and provide access to the stimulus materials and questionnaires. Most importantly, authors are requested to make the data and code that produced the reported findings available to the editors and reviewers. Please make sure you do so anonymously, i.e. without identifying yourself as an author of the manuscript.

When you submit the data, please ensure that you are complying with the requirements of your institution’s Institutional Review Board or Ethics Review Committee, the privacy laws in your country such as the GDPR, and other regulations that may apply. Remove personal information from the data you provide (Ursin et al., 2019). For example, avoid logging IP and email addresses in online experiments, as well as any other personal information that may reveal the identities of participants.
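As a minimal sketch of this step, assume the raw export is a CSV file and that the identifying columns are called `ip_address` and `email`. Both names are hypothetical; your survey platform may use others. The Python snippet below drops such columns before the file is deposited. Note that dropping columns does not by itself guarantee anonymity, since combinations of other variables may still identify participants.

```python
import csv
import io

# Hypothetical column names; adjust to whatever your platform exports.
IDENTIFYING_COLUMNS = {"ip_address", "email"}


def strip_identifiers(raw_csv, drop=IDENTIFYING_COLUMNS):
    """Return the CSV text with the identifying columns removed."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    kept = [name for name in reader.fieldnames if name not in drop]
    out = io.StringIO()
    # extrasaction="ignore" silently skips the dropped columns in each row.
    writer = csv.DictWriter(out, fieldnames=kept, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Made-up example rows, not real participant data.
raw = (
    "id,ip_address,email,donation\n"
    "1,192.0.2.1,a@example.org,5\n"
    "2,192.0.2.7,b@example.org,0\n"
)
cleaned = strip_identifiers(raw)
print(cleaned)
```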

The journal will not host a separate archive. Instead, deposit the data at a platform of your choice, such as Dataverse, GitHub, Zenodo, or the Open Science Framework. We accept data in Excel (.xls, .csv), SPSS (.sav, .por) with syntax (.sps), data in Stata (.dta) with a do-file, and projects in R.

When authors have successfully submitted the data and code along with the paper, the Area Editor will verify whether the data and code submitted actually produce the results reported. If (and only if) this is the case, then the submission will be sent out to reviewers. This means that reviewers will not have to verify the computational reproducibility of the results. They will be able to check the integrity of the data and the robustness of the results reported.

As we introduce the data availability policy, we will closely monitor the changes in the number and quality of submissions, and their scholarly impact, anticipating both collective and private benefits (Popkin, 2019). We have scored the data transparency of 20 experiments submitted in the first six months of 2020, using a checklist counting 49 different criteria. In 4 of these submissions some elements of the research were preregistered. The average transparency was 38 percent. We anticipate that the new policy will improve transparency scores.
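To illustrate how such a score could be computed, here is a small Python sketch. It assumes that a submission's transparency score is the percentage of checklist criteria it meets, which the description above suggests but does not spell out. The four criteria names and the two submissions are invented for the example; the actual checklist has 49 criteria.

```python
def transparency_score(checklist):
    """Percentage of checklist criteria that a submission meets."""
    # True counts as 1 and False as 0 when summed.
    return 100 * sum(checklist.values()) / len(checklist)


def average_score(submissions):
    """Mean transparency score across submissions."""
    return sum(transparency_score(s) for s in submissions) / len(submissions)


# Invented criteria and submissions, for illustration only.
submissions = [
    {"data_shared": True, "code_shared": False,
     "preregistered": False, "materials_shared": True},
    {"data_shared": False, "code_shared": False,
     "preregistered": False, "materials_shared": True},
]
scores = [transparency_score(s) for s in submissions]
mean_score = average_score(submissions)
print(scores, mean_score)
```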

The policy takes effect for new submissions on July 1, 2020.


Background: Development of the Policy

The NVSQ Editorial Team has been working on policies for enhanced data and analytic transparency for several years, moving forward in a consultative manner. We established a Working Group on Data Management and Access, which provided valuable guidance in its 2018 report, including a preliminary set of transparency guidelines for research based on data from experiments and surveys, interviews and ethnography, and archival sources and social media. A wider discussion of data transparency criteria was held at the 2019 ARNOVA conference in San Diego, as reported here. Participants working with survey and experimental data frequently mentioned access to the data and code as a desirable practice for research to be published in NVSQ.

Eventually, separate sets of guidelines for each type of data will be created, recognizing that commonly accepted standards vary between communities of researchers (Malički et al., 2019; Beugelsdijk, Van Witteloostuijn, & Meyer, 2020). Regardless of which criteria will be used, reviewers can only evaluate these criteria when authors describe the choices they made and provide the materials used in their study.



Beugelsdijk, S., Van Witteloostuijn, A. & Meyer, K.E. (2020). A new approach to data access and research transparency (DART). Journal of International Business Studies.

Christensen, G., Dafoe, A., Miguel, E., Moore, D.A., & Rose, A.K. (2019). A study of the impact of data sharing on article citations using journal policies as a natural experiment. PLoS ONE 14(12): e0225883.

Colavizza, G., Hrynaszkiewicz, I., Staden, I., Whitaker, K., & McGillivray, B. (2020). The citation advantage of linking publications to research data. PLoS ONE 15(4): e0230416.

Drachen, T.M., Ellegaard, O., Larsen, A.V., & Dorch, S.B.F. (2016). Sharing Data Increases Citations. Liber Quarterly, 26 (2): 67–82.

Helmig, B., Spraul, K. & Tremp, K. (2012). Replication Studies in Nonprofit Research: A Generalization and Extension of Findings Regarding the Media Publicity of Nonprofit Organizations. Nonprofit and Voluntary Sector Quarterly, 41(3): 360–385.

Krawczyk, M. & Reuben, E. (2012). (Un)Available upon Request: Field Experiment on Researchers’ Willingness to Share Supplementary Materials. Accountability in Research, 19(3): 175-186.

Malički, M., Aalbersberg, IJ.J., Bouter, L., & Ter Riet, G. (2019). Journals’ instructions to authors: A cross-sectional study across scientific disciplines. PLoS ONE, 14(9): e0222157.

Peters, C. (1973). Research in the Field of Volunteers in Courts and Corrections: What Exists and What Is Needed. Journal of Voluntary Action Research, 2 (3): 121-134.

Popkin, G. (2019). Data sharing and how it can benefit your scientific career. Nature, 569: 445-447.

Smith, D.H. (1994). Determinants of Voluntary Association Participation and Volunteering: A Literature Review. Nonprofit and Voluntary Sector Quarterly, 23 (3): 243-263.

Stodden, V., Seiler, J. & Ma, Z. (2018). An empirical analysis of journal policy effectiveness for computational reproducibility. PNAS, 115(11): 2584-2589.

Ursin, G. et al. (2019). Sharing data safely while preserving privacy. The Lancet, 394: 1902.

Wicherts, J.M., Borsboom, D., Kats, J., & Molenaar, D. (2006). The poor availability of psychological research data for reanalysis. American Psychologist, 61(7), 726-728.

Working Group on Data Management and Access (2018). A Data Availability Policy for NVSQ. April 15, 2018.


Filed under academic misconduct, data, experiments, fraud, methodology, open science, statistical analysis

A Conversation About Data Transparency

The integrity of the research process serves as the foundation for excellence in research on nonprofit and voluntary action. While transparency does not guarantee credibility, it guarantees you will get the credibility you deserve. Therefore we are developing criteria for transparency standards with regard to the reporting of methods and data.

We started this important conversation at the 48th ARNOVA Conference in San Diego, on Friday, November 22, 2019. In the session, we held a workshop to survey which characteristics of data and methods transparency help reviewers evaluate research and help researchers use past work as building blocks for future research.

This session was well attended and very interactive. After a short introduction by the editors of NVSQ, the leading journal in the field, we split up into three groups of researchers who work with the same type of data: one group for data from interviews, one for survey data, and one for administrative data such as 990s. In each group we first took 10 minutes for ourselves, formulating criteria for transparency that allow readers to assess the quality of research. All participants received colored sticky notes, and wrote down one idea per note: laudable indicators on green notes, and bad signals on red notes.


Next, we put the notes on the wall and grouped them. Each cluster received a name on a yellow note. Finally, we shared the results of the small group sessions with the larger group.


Though the different types of data to some extent have their own quality indicators, there were striking parallels in the match between theory and research design, ethics, sampling, measures, analysis, coding, interpretation, and write-up of results. After the workshop, we collected the notes. I’ve summarized the results in a report about the workshop. In a nutshell, all groups distinguished five clusters of criteria:

  • A. Meta-criteria: transparency about the research process and the data collection in particular;
  • B. Before data collection: research design and sampling;
  • C. Characteristics of the data as presented: response, reliability, validity;
  • D. Decisions about data collected: analysis and causal inference;
  • E. Write-up: interpretation of and confidence in results presented.


Here is the full report about the workshop. Do you have suggestions about the report? Let me know!


Filed under data, experiments, methodology, open science, survey research

Global Giving: Open Grant Proposal

Here’s an unusual thing for you to read: I am posting a brief description of a grant proposal that I will submit for the ‘vici’-competition of the Netherlands Organization for Scientific Research 2019 later this year. You can download the “pre-proposal” here. It is called “Global Giving”. With the study I aim to describe and explain philanthropy in a large number of countries across the world. I invite you to review the “pre-proposal” and suggest improvements; please use the comments box below, or write to me directly.

You may have heard the story that university researchers these days spend a lot of their time writing grant proposals for funding competitions. You may also have heard the story that chances of success in such competitions are getting smaller and smaller. These stories are all true. But the story you seldom hear is how such competitions actually work: they are a source of stress, frustration, burnout and depression, and a complete waste of the precious time of the smartest people in the world. Recently, Gross and Bergstrom found that “the effort researchers waste in writing proposals may be comparable to the total scientific value of the research that the funding supports”.

Remember the last time you saw the announcement of prize winners in a research grant competition? I have not heard a single voice in the choir of the many near-winners speak up: “Hey, I did not get a grant!” It is almost as if everybody wins all the time. It is not common in academia to be open about failures to win. How many of the vitae you have seen recently contain a list of failures? This is a grave distortion of reality. Fewer than one in ten applications is successful. This means that for each winning proposal there are at least nine proposals that did not get funding. I want you to know how much time is wasted by this procedure. So here I will be sharing my experiences with the upcoming ‘vici’-competition.


First let me tell you about the funny name of the competition. The name ‘vici’ derives from the Roman general Julius Caesar’s famous phrase in Latin: ‘veni, vidi, vici’, which he allegedly used to describe a swift victory. The translation is: “I came, I saw, I conquered”. The Netherlands Organization for Scientific Research (‘Nederlandse organisatie voor Wetenschappelijk Onderzoek’, NWO) thought it fitting to use these words as titles of their personal grant schemes. The so-called ‘talent schemes’ are very much about the personal qualities of the applicant. The scheme heralds heroes. The fascination with talent goes against the very nature of science, where the value of an idea, method or result is not measured by the personality of the author, but by its validity and reliability. That is why peer review is often double blind and evaluators do not know who wrote the research report or proposal.


Yet in the talent scheme, the personality of the applicant is very important. The fascination with talent creates Matthew effects, first described in 1968 by Robert K. Merton. The name ‘Matthew effect’ derives from the biblical phrase “For to him who has will more be given” (Matthew 13:12; the same phrase appears in Mark 4:25). Simply stated: success breeds success. Recently, this effect has been documented in the talent scheme by Thijs Bol, Matthijs de Vaan and Arnout van de Rijt. When two applicants are equally good but one – by mere chance – receives a grant and the other does not, the ‘winner’ is credited with talent and the ‘loser’ is not. The ‘winner’ then gets a tremendously higher chance of receiving future grants.

As a member of committees for the ‘veni’ competition I have seen how this works in practice. Applicants received scores for the quality of their proposal from expert reviewers before we interviewed them. When we had minimal differences between the expert reviewer scores of candidates – differing only in the second decimal – personal characteristics of the researchers such as their self-confidence and manner of speaking during the interview often made the difference between ‘winners’ and ‘losers’. Ultimately, such minute differences add up to dramatically higher chances of being a full professor 10 years later, as the analysis in Figure 4 of the Bol, De Vaan & Van de Rijt paper shows.


My career is in this graph. In 2005, I won a ‘veni’-grant, the early career grant that the Figure above is about. The grant gave me a lot of freedom for research and I enjoyed it tremendously. I am pretty certain that the freedom that the grant gave me paved the way for the full professorship that I was recently awarded, thirteen years later. But back then, the size of the grant did not feel right. I felt sorry for those who did not make it. I knew I was privileged, and the research money I obtained was more than I needed. It would be much better to reduce the size of grants, so that a larger number of researchers can be funded. Yet the scheme is there, and it is a rare opportunity for researchers in the Netherlands to get funding for their own ideas.

This is my third and final application for a vici-grant. The rules for submission of proposals in this competition limit the number of attempts to three. Why am I going public with this final attempt?

The Open Science Revolution

You will have heard about open science. Most likely you will associate it with the struggle to publish research articles without paywalls, the exploitation of government-funded scientists by commercial publishers, and perhaps even with Plan S. You may also associate open science with the struggle to get researchers to publish the data and the code they used to get to their results. Perhaps you have heard about open peer review of research publications. But most likely you will not have heard about open grant review. This is because it rarely happens. I am not the first to publish my proposal; the Open Grants repository currently contains 160 grant proposals. These proposals were shared after the competitions had run. The RIO Journal published 52 grant proposals. This is only a fraction of all grant proposals being created, submitted and reviewed. The many advantages of open science are not limited to funded research; they also apply to research ideas and proposals. By publishing my grant proposal before the competition, the expert reviews, the recommendations of the committee, my responses and experiences with the review process, I am opening up the procedure of grant review as much as possible.

Stages in the NWO Talent Scheme Grant Review Procedure

Each round of this competition takes almost a year, and proceeds in eight stages:

  1. Pre-application – March 26, 2019 <– this is where we are now
  2. Non-binding advice from committee: submit full proposal, or not – Summer 2019
  3. Full proposal – end of August 2019
  4. Expert reviews – October 2019
  5. Rebuttal to criticism in expert reviews – end of October 2019
  6. Selection for interview – November 2019
  7. Interview – January or February 2020
  8. Grant, or not – March 2020

If you’re curious to learn how this application procedure works in practice, check back in a few weeks. Your comments and suggestions on the ideas above and the pre-proposal are most welcome!


Filed under altruism, charitable organizations, data, economics, empathy, experiments, fundraising, happiness, helping, household giving, incentives, methodology, open science, organ donation, philanthropy, politics, principle of care, psychology, regression analysis, regulation, sociology, statistical analysis, survey research, taxes, trends, trust, volunteering, wealth

Uncertain Future for Giving in the Netherlands Panel Survey

By Barbara Gouwenberg and René Bekkers

At the Center for Philanthropic Studies we have been working hard to secure funding for three rounds of the Giving in the Netherlands Study, including the Giving in the Netherlands Panel Survey, for the years 2020-2026. During the previous round of the research, the Ministry of Justice and Security said that it would no longer fund the study on its own, because the research is important not only for the government but also for the philanthropic sector. The national government no longer sees itself as the sole funder of the research.

The ministry does think the research is important and is prepared to commit funding for the research in the form of a 1:1 matching subsidy to contributions received by VU Amsterdam from other funders. To strengthen the societal relevance of, and commitment to, the Giving in the Netherlands study, the Center has engaged in a dialogue with relevant stakeholders, including the council of foundations, the association of fundraising organizations, and several endowed foundations and fundraising charities in the Netherlands. The goal of these talks was to bring science and practice closer together. From these talks we have gained three important general insights:

  • The Giving in the Netherlands study contributes to the visibility of philanthropy in the Netherlands. This is important for the legitimacy of an autonomous and growing sector.
  • It is important to engage in a conversation with relevant stakeholders before the fieldwork for a next round starts, in order to align the research more strongly with practice.
  • After the analyses have been completed, communication with relevant stakeholders about the results should be improved. Stakeholders desire more conversations about the application of insights from the research in practice.

The center includes these issues in the plans for the upcoming three editions. VU Amsterdam has been engaged in conversations with branch organizations and individual foundations in the philanthropic sector for a long time, in order to build a sustainable financial model for the future of the research. However, at the moment we do not have the funds together to continue the research. That is why we did not collect data for the 2018 wave of the Giving in the Netherlands Panel Survey. As a result, we will not publish estimates for the size and composition of philanthropy in the Netherlands in spring 2019. We do hope that after this gap year we can restart the research next year, with a publication of new estimates in 2020.

Your ideas and support are very welcome at


Filed under Center for Philanthropic Studies, charitable organizations, contract research, data, experiments, foundations, fundraising, household giving, methodology, Netherlands, philanthropy, policy evaluation, statistical analysis, survey research

Giving in the Netherlands Study in Jeopardy

By Barbara Gouwenberg, from the newsletter of the Philanthropic Studies working group at VU (December 2018)

The Center for Philanthropic Studies is currently working flat out to secure funding for the Giving in the Netherlands study for the coming 6 years (3 editions). When Giving in the Netherlands 2017 was being set up in mid-2015, the Ministry of Justice and Security (J&V) indicated that the study would no longer be funded by the government alone, its main argument being that the research is important for both the government and the philanthropic sector. The government no longer sees itself as solely responsible for funding the research.

The Ministry of J&V does want to commit itself structurally to Giving in the Netherlands for a longer period, and offers 1:1 matching for financial contributions that VU receives from the sector.

To strengthen the societal relevance of, and commitment to, the Giving in the Netherlands study, VU has sought a dialogue with various relevant target groups over the past months. The goal: bringing science and practice closer together.

In addition to specific insights, this round of talks has yielded three important general insights:

  • ‘Giving in the Netherlands’ contributes to the visibility of civic initiative in the Netherlands. This is important for the legitimacy of an independent and rapidly growing sector.
  • Communication with relevant target groups before the start of the research should be improved, with the aim of aligning the content more closely with practice and policy.
  • Communication about research results to relevant target groups should be improved. This concerns the practical applicability of the research: the translation of the research results into practice.

The researchers will take these points for improvement into account in their plan of approach for the next three editions. VU has been in talks with the branch organizations and individual foundations for some time, in order to arrive at a sustainable funding model for the future. At this moment, however, the continuation of the research is not yet guaranteed. That means that unfortunately there will be no Giving in the Netherlands 2019, and therefore no presentation of new research results at the Day of Philanthropy, as you have come to expect from us. We do express our hope, however, that we can start on a Giving in the Netherlands 2020 very soon!


Filed under Center for Philanthropic Studies, charitable organizations, contract research, data, foundations, fundraising, household giving, methodology, Netherlands, open science, philanthropy, statistical analysis, survey research, trends, VU University

Multiple comparisons in a regression framework

Gordon Feld posted a comparison of results from a repeated measures ANOVA with paired samples t-tests.

Using Stata, I wondered how these results would look in a regression framework. For those of you who want to replicate this: I used the data provided by Gordon. The do-file is here. Because WordPress does not accept .do files, you will have to rename the file from .docx to .do to make it work. The Stata commands are below, all in block quotes. The output is given in images. In the explanatory notes, commands are italicized, and variables are underlined.

A pdf of this post is here.

First let’s examine the data. You will have to insert your local path at which you have stored the data.

. import delimited "ANOVA_blog_data.csv", clear

. pwcorr before_treatment after_treatment before_placebo after_placebo

These commands get us the following table of correlations:

There are some differences in mean values, from 98.8 before treatment to 105.0 after treatment. Mean values for the placebo measures are 100.8 before and 100.2 after. Across all measures, the average is 101.2035.

Let’s replicate the t-test for the treatment effect.

The increase in IQ after the treatment is 6.13144 (SE = 2.134277), which is significant in this paired t-test (p = .006). Now let’s do the t-test for the placebo conditions.

The change in IQ after the placebo is -.6398003 (SE = 1.978064), which is not significant (p = .7477).
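For readers without Stata, the arithmetic behind a paired t-test can be sketched in a few lines of Python. This is an illustration under my own assumptions, not the command used for the post: the five pairs of scores are made up, and only the last line uses numbers from the output reported above.

```python
import math
import statistics


def paired_t(before, after):
    """Return the mean difference, its standard error, and the t statistic."""
    diffs = [a - b for a, b in zip(after, before)]
    mean_diff = statistics.mean(diffs)
    # The SE is based on the spread of the within-person differences.
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff, se, mean_diff / se


# Made-up scores for five participants (not the blog data).
before = [98, 102, 95, 110, 100]
after = [104, 105, 99, 118, 103]
mean_diff, se, t = paired_t(before, after)
print(mean_diff, se, t)

# The t statistic implied by the treatment estimates reported above.
t_reported = 6.13144 / 2.134277
print(round(t_reported, 2))
```

The test statistic is simply the estimate divided by its standard error; with 50 participants it is compared to a t distribution with 49 degrees of freedom.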

The question is whether we have taken sufficient account of the nesting of the data.

We have four measures per participant: one before the treatment, one after, one before the placebo, and one after.

In other words, we have 50 participants and 200 measures.

To get the data into the nested structure, we have to reshape them.

The data are now in a wide format: one row per participant, IQ measures in different columns.

But we want a long format: 4 rows per participant, IQ in just one column.

To get this done we first assign a number to each participant.

. gen id = _n

We now have a variable id with a unique number for each of the 50 participants.
The Stata command for reshaping data requires the data to be set up in such a way that variables measuring the same construct have the same name.
We have 4 measures of IQ, so the new variables will be called iq1, iq2, iq3 and iq4.

. rename (before_treatment after_treatment before_placebo after_placebo) (iq1 iq2 iq3 iq4)

Now we can reshape the data. The command below assigns a new variable ‘mIQ’ to identify the 4 consecutive measures of IQ.

. reshape long iq, i(id) j(mIQ)

Here’s the result.

We now have 200 lines of data. Each one is an observation of IQ, numbered 1 to 4 on the new variable mIQ for each participant. The variable mIQ indicates the order of the IQ measurements.
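The same wide-to-long transformation can be sketched in plain Python, which may help to see what the reshape step does. The function name `reshape_long` and the two example participants are my own inventions; the column names follow the data file.

```python
def reshape_long(wide_rows, columns):
    """Turn one row per participant into one row per measurement."""
    long_rows = []
    for pid, row in enumerate(wide_rows, start=1):    # like: gen id = _n
        for m, name in enumerate(columns, start=1):   # like: j(mIQ)
            long_rows.append({"id": pid, "mIQ": m, "iq": row[name]})
    return long_rows


columns = ["before_treatment", "after_treatment",
           "before_placebo", "after_placebo"]

# Two made-up participants in wide format (not the blog data).
wide = [
    {"before_treatment": 98.0, "after_treatment": 104.0,
     "before_placebo": 101.0, "after_placebo": 100.0},
    {"before_treatment": 102.0, "after_treatment": 107.0,
     "before_placebo": 99.0, "after_placebo": 100.5},
]
long_rows = reshape_long(wide, columns)
for row in long_rows:
    print(row)
```

Each participant contributes four rows, so 50 participants yield the 200 rows mentioned above.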

Now we identify the structure of the two experiments. The first two measures in the data are for the treatment pre- and post-measures.

. gen treatment = .
. replace treatment = 1 if mIQ < 3
(100 real changes made)
. replace treatment = 0 if mIQ > 2
(100 real changes made)

Observations 3 and 4 are for the placebo pre- and post-measures.

. gen placebo = .
. replace placebo = 0 if mIQ < 3
(100 real changes made)
. replace placebo = 1 if mIQ > 2
(100 real changes made)

. tab treatment placebo

We have 100 observations in each of the experiments.

OK, we’re ready for the regressions now. Let’s first conduct an OLS to quantify the changes within participants in the treatment and placebo conditions.

. reg iq mIQ if treatment == 1

The regression shows that the treatment increased IQ by 6.13144 points, but with an SE of 3.863229 the change is not significant (p = .116). The effect estimate is correct, but the SE is too large and hence the p-value is too high as well.

. reg iq mIQ if placebo == 1

The placebo regression shows the familiar decline of .6398003, but with an SE of 3.6291, which is too high (p = .860). The SE and p-values are incorrect because OLS does not take the nested structure of the data into account.
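Why the OLS standard error is too large can be shown with a small Python sketch. With a two-level predictor and equally sized groups, the OLS standard error of the difference is sqrt(2 × pooled variance / n), which ignores that the two measures come from the same people. The paired standard error uses the variance of the within-person differences, which is smaller whenever the two measures are positively correlated. The five pairs of scores below are made up.

```python
import math
import statistics


def ols_se(before, after):
    """SE of the before/after difference when all rows are treated as independent."""
    n = len(before)
    pooled_var = (statistics.variance(before) + statistics.variance(after)) / 2
    return math.sqrt(pooled_var * 2 / n)


def paired_se(before, after):
    """SE of the mean within-person difference."""
    diffs = [a - b for a, b in zip(after, before)]
    return statistics.stdev(diffs) / math.sqrt(len(diffs))


# Made-up, strongly correlated scores for five participants.
before = [90, 95, 100, 105, 110]
after = [97, 100, 107, 110, 117]
se_ols = ols_se(before, after)
se_paired = paired_se(before, after)
print(se_ols, se_paired)
```

In this toy example the pooled standard error is about ten times the paired one, because nearly all of the variance in the scores is between participants rather than within them.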

With the xtset command we identify the nesting of the data: measures of IQ (mIQ) are nested within participants (id).

. xtset id mIQ

First we run an ‘empty model’ – no predictors are included.

. xtreg iq

Here’s the result:

Two statistics in the output are worth commenting on.

  1. The constant (_cons) is the average across all measures, 101.2033. This is very close to the average we have seen before.
  2. The rho is the intraclass correlation – the average correlation of the 4 IQ measures within individuals. It is .7213, which seems right.
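To give a feel for what an intraclass correlation measures, here is a Python sketch that uses the classical one-way ANOVA estimator for balanced data. This is a stand-in of my own choosing: xtreg derives rho from its estimated variance components, so its value need not match this estimator exactly. The example data are made up.

```python
import statistics


def icc_oneway(groups):
    """One-way ANOVA estimate of the intraclass correlation (balanced data)."""
    n = len(groups)        # number of participants
    k = len(groups[0])     # measures per participant
    grand = statistics.mean([x for g in groups for x in g])
    means = [statistics.mean(g) for g in groups]
    # Mean squares between and within participants.
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# All variation is between participants: the ICC is 1.
stable = [[100, 100, 100, 100], [110, 110, 110, 110], [90, 90, 90, 90]]
# All variation is within participants: the estimate hits its lower bound.
noisy = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
print(icc_oneway(stable), icc_oneway(noisy))
```

The closer the estimate is to 1, the more of the variance in IQ is due to stable differences between participants rather than fluctuations across the four measurements.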

Now let’s replicate the t-test results in a regression framework.

. xtreg iq mIQ if treatment == 1

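In the output below we see the 100 observations in 50 groups (individuals). We obtain the same effect estimate of the treatment as before (6.13144) and the correct SE of 2.134277, but the p-value is too small (p = .004). This is because the default random-effects model evaluates the coefficient with a z-test, based on the normal distribution, rather than with a t-test with 49 degrees of freedom.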

Let’s fix this. We put fixed effects on the participants by adding , fe at the end of the xtreg command:

. xtreg iq mIQ if treatment == 1, fe

We now get the accurate p-value (0.006):

Let’s run the same regression for the placebo conditions.

. xtreg iq mIQ if placebo == 1, fe

The placebo effect is the familiar -.6398003, SE = 1.978064, now with the accurate p-value of .748.
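The difference between the two p-values for the treatment effect (.004 and .006) can be checked by hand. The Python sketch below takes the estimate and SE reported above and computes a two-sided p-value once from the standard normal distribution and once from a t distribution with 49 degrees of freedom. The t tail is obtained by numerical integration with Simpson's rule, a shortcut of mine to stay within the standard library; Stata computes it differently.

```python
import math


def p_normal(stat):
    """Two-sided p-value from the standard normal distribution."""
    return math.erfc(abs(stat) / math.sqrt(2))


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def p_student(stat, df, steps=2000):
    """Two-sided p-value from Student's t, via Simpson's rule on [0, |stat|]."""
    b = abs(stat)
    h = b / steps                      # steps must be even
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * total * h / 3


t_treatment = 6.13144 / 2.134277
p_z = p_normal(t_treatment)
p_t = p_student(t_treatment, df=49)
p_t_placebo = p_student(-0.6398003 / 1.978064, df=49)
print(round(p_z, 3), round(p_t, 3), round(p_t_placebo, 3))
```

The normal distribution has thinner tails than the t distribution, so with the same test statistic it always returns the smaller p-value. With 49 degrees of freedom the difference is small, but visible in the third decimal.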


Filed under data, experiments, methodology, regression, regression analysis, statistical analysis, survey research

Introducing Mega-analysis

How to find truth in an ocean of correlations – with breakers, still waters, tidal waves, and undercurrents? In the old age of responsible research and publication, we would collect estimates reported in previous research, and compute a correlation across correlations. Those days are long gone.

In the age of rat race research and publication it became increasingly difficult to do a meta-analysis. It is a frustrating experience for anyone who has conducted one: endless searches on the Web of Science and Google Scholar to collect all published research, input the estimates in a database, find that a lot of fields are blank, email authors for zero-order correlations and other statistics they had failed to report in their publications and get very little response.

Meta-analysis is not only a frustrating experience, it is also a bad idea when results that authors do not like do not get published. A host of techniques have been developed to find and correct publication bias, but the problem that we do not know the results that do not get reported is not solved easily.

As we enter the age of open science, we do not have to rely any longer on the far from perfect cooperation from colleagues who have moved to a different university, left academia, died, or think you’re trying to prove them wrong and destroy their career – and yours in retribution. We can simply download all the raw data and analyze them.

Enter mega-analysis: include all the data points relevant for a certain hypothesis, cluster them by original publication, date, country, or any potentially relevant property of the research design, and add the substantive predictors you find documented in the literature. The results reveal not only the underlying correlations between substantive variables, but also the differences between studies, periods, countries and design properties that affect these correlations.
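To make the pooling step concrete, here is a minimal sketch in Python. The three studies and all their numbers are invented for illustration. Centering each variable on its study mean is only the simplest way to account for clustering by study (it amounts to putting fixed effects on the studies); a real mega-analysis would more likely use a multilevel model with predictors at the study level.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def centered(values):
    """Subtract the mean, so that only within-study variation remains."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Hypothetical raw data: (x, y) pairs per study. The studies differ in the
# overall level of y, e.g. because of question wording or country.
studies = {
    "study_A": [(1, 2), (2, 3), (3, 5), (4, 4)],
    "study_B": [(1, 6), (2, 7), (3, 7), (4, 9)],
    "study_C": [(1, 11), (2, 10), (3, 12), (4, 13)],
}

pooled_x, pooled_y, within_x, within_y = [], [], [], []
per_study_r = {}
for name, rows in studies.items():
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    per_study_r[name] = pearson(xs, ys)  # what a meta-analysis would collect
    pooled_x += xs                       # raw pooling, ignoring the study
    pooled_y += ys
    within_x += centered(xs)             # pooling with study fixed effects
    within_y += centered(ys)

naive_r = pearson(pooled_x, pooled_y)
within_r = pearson(within_x, within_y)

print("per study:", {k: round(r, 2) for k, r in per_study_r.items()})
print("pooled, ignoring studies:", round(naive_r, 2))
print("pooled, centered by study:", round(within_r, 2))
```

In this toy example the naive pooled correlation is much weaker than the within-study correlation, because the level differences between studies add variance in y that x cannot account for. Differences of this kind are what a mega-analysis can model explicitly.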

The method itself is not new. In epidemiology, Steinberg et al. (1997) labeled it ‘meta-analysis of individual patient data’. In human genetics, genome-wide association studies (GWAS) by large international consortia are common examples of mega-analysis.

Mega-analysis includes the file-drawer of papers that never saw the light of day after they were put in. It also includes the universe of papers that have never been written because the results were unpublishable.

If meta-analysis gives you an estimate for the universe of published research, mega-analysis can be used to detect just how unique that universe is in the Milky Way. My prediction would be that correlations in published research are mostly further from zero than the same correlations in a mega-analysis.
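The prediction can be illustrated with a small simulation, sketched here in Python. Everything in it is an assumption made for the sake of the example: a true correlation of .10, 300 studies of 50 observations each, and a crude publication rule under which only studies with a significant correlation get published. The 'meta-analysis' is simply the unweighted mean of the published correlations, which is cruder than what a real meta-analysis would do.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


random.seed(2020)      # arbitrary seed, for reproducibility only
TRUE_R = 0.10          # assumed true correlation in the population
N_STUDIES = 300        # assumed number of studies conducted
N_PER_STUDY = 50       # assumed sample size per study
T_CRITICAL = 2.01      # approximate two-sided 5% threshold for df = 48

all_x, all_y, published_r = [], [], []
for _ in range(N_STUDIES):
    xs = [random.gauss(0, 1) for _ in range(N_PER_STUDY)]
    ys = [TRUE_R * x + sqrt(1 - TRUE_R ** 2) * random.gauss(0, 1) for x in xs]
    r = pearson(xs, ys)
    t = r * sqrt(N_PER_STUDY - 2) / sqrt(1 - r ** 2)
    if abs(t) > T_CRITICAL:   # only 'significant' results leave the file drawer
        published_r.append(r)
    all_x += xs               # the mega-analysis uses every observation
    all_y += ys

meta_r = sum(published_r) / len(published_r)
mega_r = pearson(all_x, all_y)

print("studies published:", len(published_r), "of", N_STUDIES)
print("mean published correlation:", round(meta_r, 2))
print("correlation in pooled raw data:", round(mega_r, 2))
```

Under these assumptions the mean of the published correlations lies well above the assumed true value, while the correlation in the pooled raw data stays close to it.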

Mega-analysis holds great promise for the social sciences. Samples for population surveys are large, which enables optimal learning from variations in sampling procedures, data collection mode, and questionnaire design. It is time for a Global Social Science Consortium that pools all of its data. As an illustration, I have started a project on the Open Science Framework that mega-analyzes generalized social trust. It is a public project: anyone can contribute. We have reached the mark of 1 million observations.

The idea behind mega-analysis originated from two different projects. In the first project, Erik van Ingen and I analyzed the effects of volunteering on trust, to check if results from an analysis of the Giving in the Netherlands Panel Survey (Van Ingen & Bekkers, 2015) would replicate with data from other panel studies. We found essentially the same results in five panel studies, although subtle differences emerged in the quantitative estimates. In the second project, with Arjen de Wit and colleagues from the Center for Philanthropic Studies at VU Amsterdam, we analyzed the effects of volunteering on well-being as part of the EC-FP7 funded ITSSOIN study. We collected 845,733 survey responses from 154,970 different respondents in six panel studies, spanning 30 years (De Wit, Bekkers, Karamat Ali & Verkaik, 2015). We found that volunteering is associated with a 1% increase in well-being.

In these projects, the data from different studies were analyzed separately. I realized that we could learn much more if the data are pooled in one single analysis: a mega-analysis.


De Wit, A., Bekkers, R., Karamat Ali, D., & Verkaik, D. (2015). Welfare impacts of participation. Deliverable 3.3 of the project: “Impact of the Third Sector as Social Innovation” (ITSSOIN), European Commission – 7th Framework Programme, Brussels: European Commission, DG Research.

Van Ingen, E. & Bekkers, R. (2015). Trust Through Civic Engagement? Evidence From Five National Panel Studies. Political Psychology, 36 (3): 277-294.

Steinberg, K.K., Smith, S.J., Stroup, D.F., Olkin, I., Lee, N.C., Williamson, G.D. & Thacker, S.B. (1997). Comparison of Effect Estimates from a Meta-Analysis of Summary Data from Published Studies and from a Meta-Analysis Using Individual Patient Data for Ovarian Cancer Studies. American Journal of Epidemiology, 145: 917-925.


Filed under data, methodology, open science, regression analysis, survey research, trends, trust, volunteering

Four Reasons Why We Are Converting to Open Science

The Center for Philanthropic Studies I am leading at VU Amsterdam is converting to Open Science.

Open Science offers four advantages to the scientific community, nonprofit organizations, and the public at large:

  1. Access: we make our work more easily accessible for everyone. Our research serves public goods, which are served best by open access.
  2. Efficiency: we make it easier for others to build on our work, which saves time.
  3. Quality: we enable others to check our work, find flaws and improve it.
  4. Innovation: ultimately, open science facilitates the production of knowledge.

What does the change mean in practice?

First, the source of funding for contract research we conduct will always be disclosed.

Second, data collection – interviews, surveys, experiments – will follow a prespecified protocol. This includes the number of observations foreseen, the questions to be asked, measures to be included, hypotheses to be tested, and analyses to be conducted. New studies will preferably be preregistered.

Third, data collected and the code used to conduct the analyses will be made public, through the Open Science Framework for instance. Obviously, personal or sensitive data will not be made public.

Fourth, results of research will preferably be published in open access mode. This does not mean that we will publish only in Open Access journals. Research reports and papers for academic journals will be made available online in working paper archives, as a ‘preprint’ version, or in other ways.


December 16, 2015 update:

A fifth reason, following directly from #1 and #2, is that open science reduces the costs of science for society.

See this previous post for links to our Giving in the Netherlands Panel Survey data and questionnaires.


July 8, 2017 update:

A public use file of the Giving in the Netherlands Panel Survey and the user manual are posted at the Open Science Framework.


Filed under academic misconduct, Center for Philanthropic Studies, contract research, data, fraud, incentives, methodology, open science, regulation, survey research

The Fishy Business of Philanthropy

Update, December 6, 2019: the paper discussed below reports an implausibly large effect size, and is co-authored by a researcher who has been investigated for research misconduct. The report does not mention this particular paper.

Breaking news today: the essential amino acid L-Tryptophan (TRP) makes people generous! Three psychologists at the University of Leiden, Laura Steenbergen, Roberta Sellaro, and Lorenza Colzato, report that 16 participants in an experiment were secretly given a dose of TRP, dissolved in a glass of orange juice. The 16 other participants in the study drank plain orange juice, without TRP. The psychologists did not write where the experiment was conducted, but describe the participants as 28 female and 4 male students in southern Europe – which is likely to be Italy, given the names of the second and third authors. Next, the participants were kept busy for 30 minutes with an ‘attentional blink task that requires the detection of two targets in a rapid visual on-screen presentation’. After they had completed the task, they were given a reward of €10. Then the participants were given an opportunity to donate to four charities: Unicef, Amnesty International, Greenpeace, and World Wildlife Fund. And behold the wonders of L-Tryptophan: the 0.8 grams of TRP more than doubled the amount donated from €0.47 (yes, that is less than five percent of the €10 earned) to €1.00. Even though the amount donated is small, the reported increase due to TRP is huge: +112%.

Why is this good to know? Why would tryptophan increase generosity? Steenbergen, Sellaro and Colzato reasoned that TRP influences synthesis of the neurotransmitter serotonin (called 5-HT), which has been found to be associated with charitable giving in several economic experiments. The participants in the experiment were not tested for serotonin levels, but the results seem consistent with these previous experiments. The new experiment takes us one step further into the biology of charity, by showing that the intake of food enriched with tryptophan makes female students in Italy more generous to charity.

Tryptophan is an essential amino acid, commonly found in protein-rich foods such as chocolate, eggs, milk, poultry, fish, and spinach. Rense Corten, a former colleague of mine, asked on Twitter how much spinach the participants would have had to digest to obtain a TRP intake that would make them give an additional €1 to charity. Just for fun I computed this: it is about 438 grams of spinach. That is less than the 1161 grams of chocolate it would take to generate the same dose of TRP as the participants got in their orange juice.

The fairly low level of giving in the experiment is somewhat surprising given the overall level of charitable giving in Italy. According to the Gallup World Poll some 62% of Italians made donations to charity in 2011, ranking the country 14th in the world. But wait – Italians eat quite some fish, don’t they? If there is a lot of tryptophan in fish, Italians should be more generous than inhabitants of other countries that consume less fish. Indeed the annual fish consumption per capita in Italy (some 25 kilograms, ranking the country 14th in the world) is much higher than in the Czech Republic (10 kilograms; rank: 50), and the Czech population is less likely to give to charity (31%, rank: 30).

Of course this comparison of just two countries in Europe is not representative of any part of the world. And yes, it is cherry-picked: an initial comparison with the landlocked neighboring country of Austria (14 kilograms of fish per year, much less than in Italy) did not yield a result in the same direction. In Austria, 69% give, a bit more than in Italy. But lining up all countries in the world for which there are data on fish consumption and engagement in charity does yield a positive correlation between the two. Note that a low rank indicates a high proportion of the population engaging in charity and a high consumption of fish. Here is the Excel file including the data. The relationship is modest (r = .30), but still: we now know that inhabitants of countries that consume more fish per capita are somewhat more likely to give to charity.



Filed under experiments, household giving, methodology, philanthropy

Why a high R Square is not necessarily better

Often I encounter academics thinking that a high proportion of explained variance is the ideal outcome of a statistical analysis. The idea is that in regression analyses a high R Square is better than a low R Square. In my view, the emphasis on a high R2 should be reduced. A high R2 should not be a goal in itself. The reason is that a higher R2 can easily be obtained by using procedures that actually lower the external validity of coefficients.

It is possible to increase the proportion of variance explained in regression analyses in several ways that do not in fact improve our ability to ‘understand’ the behavior we are seeking to ‘explain’ or ‘predict’. One way to increase the R2 is to remove anomalous observations, such as ‘outliers’, or to treat people who say they ‘don’t know’ like the average respondent. Replacing missing data by mean scores or using multiple imputation procedures often increases the R Square. I have used this procedure in several papers myself, including some of my dissertation chapters.

But in fact outliers can be true values. I have seen quite a few of them that destroyed correlations and lowered R squares while being valid observations. E.g., a widower donating a large amount of money to a charity after the death of his wife. A rare case of exceptional behavior for very specific reasons that seldom occur. In larger samples these outliers may become more frequent, affecting the R2 less strongly.
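A small sketch in Python shows how strongly a single valid observation can move the R2 in a small sample. The numbers are invented: ten ordinary respondents whose donations rise almost linearly with income, plus one exceptional but genuine gift.

```python
from math import sqrt


def r_squared(xs, ys):
    """R Square of a simple regression of y on x (squared correlation)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy / sqrt(sxx * syy)) ** 2


# Hypothetical data: (income, amount donated), in arbitrary units.
regular = [(1, 1.2), (2, 1.9), (3, 3.2), (4, 3.8), (5, 5.1),
           (6, 6.1), (7, 6.8), (8, 8.2), (9, 9.1), (10, 9.9)]
# A valid but exceptional observation: a large one-off gift at a low income.
widower = (3, 40.0)

with_all = regular + [widower]

r2_without = r_squared([x for x, _ in regular], [y for _, y in regular])
r2_with = r_squared([x for x, _ in with_all], [y for _, y in with_all])

print("R2 with the exceptional donor removed:", round(r2_without, 3))
print("R2 with all valid observations:", round(r2_with, 3))
```

Dropping the exceptional donor produces a far higher R2, but the resulting model describes a population in which such gifts do not occur.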

Also ‘Don’t Know’ respondents are often systematically different from the average respondent. Treating them as average respondents eliminates some of the real variance that would otherwise be hard to predict.

Finally, it is often possible to increase the proportion of variance explained by including more variables. This is particularly problematic if variables that are the result of the dependent variable are included as predictors. For instance if network size is added to the prediction of volunteering the R Square will increase. But a larger network not only increases volunteering; it is also a result of volunteering. Especially if the network questions refer to the present (do you know…) while the volunteering questions refer to the past (in the past year, have you…) it is dubious to ‘predict’ volunteering in the past by a measure of current network size.
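The network example can be mimicked in a simulation, sketched here in Python. The data-generating process is entirely assumed: an 'education' score has a modest effect on volunteering, and network size is generated purely as a consequence of volunteering. The R2 for two predictors is computed from the three pairwise correlations with the standard formula.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def r2_two_predictors(r_y1, r_y2, r_12):
    """R Square of y regressed on two predictors, from pairwise correlations."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


random.seed(7)   # arbitrary seed, for reproducibility only
N = 2000         # assumed number of respondents

# Assumed causal order: education -> volunteering -> network size.
education = [random.gauss(0, 1) for _ in range(N)]
volunteering = [0.3 * e + random.gauss(0, 1) for e in education]
network = [0.8 * v + random.gauss(0, 0.5) for v in volunteering]

r2_education = pearson(volunteering, education) ** 2
r2_both = r2_two_predictors(
    pearson(volunteering, education),
    pearson(volunteering, network),
    pearson(education, network),
)

print("R2, education only:", round(r2_education, 2))
print("R2, education plus network size:", round(r2_both, 2))
```

The R2 jumps once network size is added, even though in this simulation network size has no causal effect on volunteering at all.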

As a reviewer, I give authors reporting an R2 exceeding 40% a treatment of high-level scrutiny for dubious decisions in data handling and inclusion of variables.

As a rule, R Squares tend to be higher at higher levels of aggregation, e.g. when analyzing cross-situational tendencies in behavior rather than specific behaviors in specific contexts; or when analyzing time-series data or macro-level data about countries rather than individuals. Why people do the things they do is often just very hard to predict, especially if you try to predict behavior in a specific case.
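The aggregation effect is easy to reproduce in a simulation, sketched here in Python with assumed numbers: 30 countries with 100 respondents each. The effect of x on y is identical for every individual. Only the level of analysis differs.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


random.seed(11)        # arbitrary seed, for reproducibility only
N_COUNTRIES = 30       # assumed number of countries
N_PER_COUNTRY = 100    # assumed number of respondents per country

micro_x, micro_y, macro_x, macro_y = [], [], [], []
for _ in range(N_COUNTRIES):
    country_level = random.gauss(0, 1)   # countries differ in their mean x
    xs = [country_level + random.gauss(0, 1) for _ in range(N_PER_COUNTRY)]
    # Same slope of 1 for everyone, plus a lot of individual-level noise.
    ys = [x + random.gauss(0, 3) for x in xs]
    micro_x += xs
    micro_y += ys
    macro_x.append(sum(xs) / N_PER_COUNTRY)   # country means
    macro_y.append(sum(ys) / N_PER_COUNTRY)

r2_micro = pearson(micro_x, micro_y) ** 2
r2_macro = pearson(macro_x, macro_y) ** 2

print("R2 across individuals:", round(r2_micro, 2))
print("R2 across country means:", round(r2_macro, 2))
```

Averaging within countries removes most of the idiosyncratic variation in individual behavior, so the country-level R2 is much higher while the underlying relationship is exactly the same.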


Filed under academic misconduct, data, methodology, regression analysis, survey research