Five Reasons Why Social Science is So Hard 

1. No Laws 

All we have is probabilities. 

2. All Experts 

The knowledge we have is continuously contested. The objects of study think they know why they do what they do. 

3. Zillions of Variables 

Everything is connected, and potentially a cause – like a bowl of well-tossed spaghetti. 

4. Many Levels of Action 

Nations, organizations, networks, individuals, and time all have different dynamics.

5. Imprecise Measures 

Few instruments have near perfect validity and reliability. 

Conclusion
 

Social science is not as easy as rocket science. It is way more complicated.


Tools for the Evaluation of the Quality of Experimental Research

pdf of this post

Experiments can have important advantages over other research designs. The most important advantage concerns internal validity: random assignment to treatment reduces the attribution problem and strengthens causal inference. An additional advantage is that control over the conditions participants are exposed to reduces the heterogeneity of observed treatment effects.

The extent to which these advantages are realized in the data depends on the design and execution of the experiment. Experiments are of higher quality when the sample size is larger and the theoretical concepts are measured more reliably and more validly. The sufficiency of the sample size can be checked with a power analysis. For the small effect sizes typical in the social sciences (d = 0.2), a sample of about 1,300 participants is required to detect them at conventional significance levels (p < .05) with 95% power (see appendix). Even for a stronger effect size (d = 0.4), more than 300 participants are required. The reliability of normative scale measures can be judged with Cronbach’s alpha. A rule of thumb for unidimensional scales is that alpha should be at least .63 for a scale consisting of 4 items, .68 for 5 items, .72 for 6 items, .75 for 7 items, and so on. The validity of measures should be justified theoretically and can be checked with a manipulation check, which should reveal a sizeable and significant association with the treatment variables.
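
To make these rules of thumb concrete, here is a minimal sketch in Python (assuming the numpy and statsmodels packages and a two-sided, two-sample t-test; the simulated item scores are purely illustrative) that reproduces the sample size figures and computes Cronbach’s alpha for a set of items:

```python
import numpy as np
from statsmodels.stats.power import TTestIndPower

# Sample size needed per group for a two-sample t-test,
# alpha = .05 (two-sided), power = .95.
analysis = TTestIndPower()
for d in (0.2, 0.4):
    n_per_group = analysis.solve_power(effect_size=d, alpha=0.05,
                                       power=0.95, alternative='two-sided')
    print(f"d = {d}: about {2 * int(np.ceil(n_per_group))} participants in total")
# d = 0.2 -> roughly 1,300 participants; d = 0.4 -> a bit over 300.

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents x k_items) matrix of item scores."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)

# Illustrative use with simulated scores on a 4-item scale:
rng = np.random.default_rng(1)
latent = rng.normal(size=(500, 1))
scores = latent + rng.normal(scale=1.4, size=(500, 4))
print(f"alpha = {cronbach_alpha(scores):.2f}")
```

Under these assumptions, the first loop reproduces the sample sizes mentioned above, and the alpha of the simulated 4-item scale can be compared with the .63 rule of thumb.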

The advantages of experiments are reduced if assignment to treatment is non-random and treatment effects are confounded. In addition, a variety of other problems may endanger internal validity. Shadish, Cook & Campbell (2002) provide a useful list of such problems.

It should also be noted that experiments can have important disadvantages. The most important disadvantage is that the external validity of the findings is limited to the participants and the setting in which their behavior was observed. This disadvantage can be mitigated by creating more realistic decision situations, for instance in natural field experiments, and by recruiting (non-‘WEIRD’) samples of participants that are more representative of the target population. As Henrich, Heine & Norenzayan (2010) noted, results based on samples of participants in Western, Educated, Industrialized, Rich and Democratic (WEIRD) countries have limited validity in the discovery of universal laws of human cognition, emotion or behavior.

Recently, experimental research paradigms have received fierce criticism. Results often cannot be reproduced (Open Science Collaboration, 2015), and publication bias is ubiquitous (Ioannidis, 2005). It has become clear that there is a lot of undisclosed flexibility in all phases of the empirical cycle. While these problems have been discussed widely in communities of researchers conducting experiments, they are by no means limited to one particular methodology or mode of data collection. It is likely that they also occur in communities of researchers using survey or interview data.

In the positivist paradigm that dominates experimental research, the empirical cycle starts with the formulation of a research question. To answer the question, hypotheses are formulated based on established theories and previous research findings. Then the research is designed, data are collected, a predetermined analysis plan is executed, results are interpreted, the research report is written and submitted for peer review. After the usual round(s) of revisions, the findings are incorporated in the body of knowledge.

The validity and reliability of results from experiments can be compromised in two ways. The first is by juggling with the order of phases in the empirical cycle. Researchers can decide to amend their research questions and hypotheses after they have seen the results of their analyses. Kerr (1998) labeled the practice of reformulating hypotheses HARKing: Hypothesizing After the Results are Known. Amending hypotheses is not a problem when the goal of the research is to develop theories to be tested later, as in grounded theory or exploratory analyses (e.g., data mining). But in hypothesis-testing research HARKing is a problem, because it increases the likelihood of publishing false positives. Chance findings are interpreted post hoc as confirmations of hypotheses that a priori are rather unlikely to be true. When these findings are published, they are unlikely to be reproducible by other researchers, creating research waste and, worse, reducing the reliability of published knowledge.

The second way the validity and reliability of results from experiments can be compromised is by misconduct and sloppy science within various stages of the empirical cycle (Simmons, Nelson & Simonsohn, 2011). The data collection and analysis phase as well as the reporting phase are most vulnerable to distortion by fraud, p-hacking and other questionable research practices (QRPs).

  • In the data collection phase, observations that (if kept) would lead to undesired conclusions or non-significant results can be altered or omitted. Also, fake observations can be added (fabricated).
  • In the data analysis phase, researchers can try alternative specifications of the variables, scale constructions, and regression models, searching for those that ‘work’ and choosing those that reach the desired conclusion.
  • In the reporting phase, things go wrong when the search for alternative specifications and the sensitivity of the results to decisions made in the data analysis phase are not disclosed.
  • In the peer review process, there can be pressure from editors and reviewers to cut reports of non-significant results, or to collect additional data supporting the hypotheses and the significant results reported in the literature.

The result of these QRPs is that null findings are less likely to be published and that published research is biased towards positive findings confirming the hypotheses. Published findings are not reproducible, and when a replication attempt is made, the replicated effects turn out to be less significant, less often positive, and smaller than originally reported (Open Science Collaboration, 2015).

Alarm bells, red flags and other warning signs

Some of the forms of misconduct mentioned above are very difficult for reviewers and editors to detect. When observations are fabricated or omitted from the analysis, only inside information, very sophisticated data detectives, or sheer stupidity on the part of the authors can help us. Many other forms of misconduct are also difficult to prove. While smoking guns are rare, we can look for clues. I have developed a checklist of warning signs and good practices that editors and reviewers can use to screen submissions (see below). The checklist uses terminology that is not specific to experiments and applies to all forms of data. While a high number of warning signs does not in itself prove anything, it should alert reviewers and editors. There is no norm for the number of flags. The table below only mentions the warning signs; the paper version of this blog post also shows a column with the positive poles. Those who would like to count good practices and reward authors for them can count gold stars rather than red flags. The checklist was developed independently of the checklist that Wicherts et al. (2016) recently published.

Warning signs

  • The power of the analysis is too low.
  • The results are too good to be true.
  • All hypotheses are confirmed.
  • P-values are just below critical thresholds (e.g., p < .05).
  • A groundbreaking result is reported but not replicated in another sample.
  • The data and code are not made available upon request.
  • The data are not made available upon article submission.
  • The code is not made available upon article submission.
  • Materials (manipulations, survey questions) are described superficially.
  • Descriptive statistics are not reported.
  • The hypotheses are tested in analyses with covariates and results without covariates are not disclosed.
  • The research is not preregistered.
  • No details of an IRB procedure are given.
  • Participant recruitment procedures are not described.
  • Exact details of time and location of the data collection are not described.
  • A power analysis is lacking.
  • Unusual / non-validated measures are used without justification.
  • Different dependent variables are analyzed in different studies within the same article without justification.
  • Variables are (log)transformed or recoded in unusual categories without justification.
  • Numbers of observations mentioned at different places in the article are inconsistent. Loss or addition of observations is not justified.
  • A one-sided test is reported when a two-sided test would be appropriate.
  • Test-statistics (p-values, F-values) reported are incorrect.

With the increasing number of retractions of articles reporting experimental research in scholarly journals, awareness of the fallibility of peer review as a quality control mechanism has grown. Communities of researchers employing experimental designs have formulated solutions to these problems. For the review and publication stage, the following solutions have been proposed.

  • Access to data and code. An increasing number of science funders require grantees to provide open access to the data they have collected and the code they have used. Likewise, a growing number of journals, such as Science, Nature, and the American Journal of Political Science, require authors to provide access to data and code. Platforms such as Dataverse, the Open Science Framework and Github facilitate sharing of data and code. Some journals do not require access to data and code, but award Open Science badges to articles that do provide access.
  • Pledges, such as the ‘21 word solution’, a statement designed by Simmons, Nelson and Simonsohn (2012) that authors can include in their paper to affirm that they have fully disclosed their methods: “We report how we determined our sample size, all data exclusions (if any), all manipulations, and all measures in the study.”
  • Full disclosure of the methodological details of research submitted for publication, for instance through org, is now required by major journals in psychology.
  • Apps such as Statcheck, p-curve, p-checker, and r-index can help editors and reviewers detect fishy business. They also have the potential to improve research hygiene when researchers start using these apps to check their own work before they submit it for review.
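
As an illustration of the kind of arithmetic such tools automate, the sketch below recomputes the two-sided p-value implied by a reported t-statistic and its degrees of freedom and compares it with the p-value the authors report. This is not the statcheck package itself, just a minimal reimplementation of the underlying consistency check, with made-up reported values:

```python
from scipy.stats import t


def check_t_report(t_value: float, df: int, reported_p: float,
                   tolerance: float = 0.005) -> bool:
    """Flag reported p-values that deviate from the p-value implied by t(df)."""
    recomputed_p = 2 * t.sf(abs(t_value), df)  # two-sided p-value
    consistent = abs(recomputed_p - reported_p) <= tolerance
    verdict = "ok" if consistent else "flag"
    print(f"t({df}) = {t_value}: reported p = {reported_p}, "
          f"recomputed p = {recomputed_p:.3f} -> {verdict}")
    return consistent


# Hypothetical reported results:
check_t_report(t_value=2.10, df=28, reported_p=0.045)  # consistent
check_t_report(t_value=1.70, df=28, reported_p=0.040)  # recomputed p is ~.10 -> flag
```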

As these solutions become more commonly used, we should see the quality of research go up. The number of red flags in research should decrease and the number of gold stars should increase. This requires not only that reviewers and editors use the checklist, but, most importantly, that researchers themselves use it as well.

The solutions above should be supplemented by better research practices before researchers submit their papers for review. In particular, two measures are worth mentioning:

  • Preregistration of research, for instance on org. An increasing number of journals in psychology require research to be preregistered. Some journals guarantee publication of research regardless of its results after a round of peer review of the research design.
  • Increasing the statistical power of research is one of the most promising strategies to increase the quality of experimental research (Bakker, Van Dijk & Wicherts, 2012). In many fields and for many decades, published research has been underpowered, using samples of participants that are too small to detect the reported effect sizes (see the sketch below). Using larger samples reduces the likelihood of both false positives and false negatives.
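
A back-of-the-envelope sketch (assuming, as above, a two-sided two-sample t-test and the small effect size of d = 0.2) shows how little power typical sample sizes buy, and why so much published research has been underpowered:

```python
from statsmodels.stats.power import TTestIndPower

# Achieved power of a two-sample t-test (alpha = .05, two-sided)
# for a small effect (d = 0.2) at several per-group sample sizes.
analysis = TTestIndPower()
for n_per_group in (25, 50, 100, 325, 650):
    achieved = analysis.power(effect_size=0.2, nobs1=n_per_group,
                              alpha=0.05, ratio=1.0, alternative='two-sided')
    print(f"n = {n_per_group:>3} per group: power = {achieved:.2f}")
# With 50 participants per group, the chance of detecting d = 0.2 is
# only about one in six; roughly 650 per group are needed to reach .95.
```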

A variety of institutional designs have been proposed to encourage the use of the solutions mentioned above, including reducing the incentives for questionable research practices in researchers’ careers and in hiring and promotion decisions, rewarding researchers for good conduct through badges, the adoption of voluntary codes of conduct, and the socialization of students and senior staff through teaching and workshops. Research funders, journals, editors, authors, reviewers, universities, senior researchers and students all have a responsibility in these developments.

References

Bakker, M., Van Dijk, A. & Wicherts, J. (2012). The Rules of the Game Called Psychological Science. Perspectives on Psychological Science, 7(6): 543–554.

Henrich, J., Heine, S.J., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33: 61 – 135.

Ioannidis, J.P.A. (2005). Why Most Published Research Findings Are False. PLoS Medicine, 2(8): e124. http://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.0020124

Kerr, N.L. (1998). HARKing: Hypothesizing After the Results are Known. Personality and Social Psychology Review, 2 (3): 196-217.

Open Science Collaboration (2015). Estimating the Reproducibility of Psychological Science. Science, 349. http://www.sciencemag.org/content/349/6251/aac4716.full.html

Shadish, W.R., Cook, T.D., & Campbell, D.T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin.

Simmons, J.P., Nelson, L.D., & Simonsohn, U. (2011). False positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22: 1359–1366.

Simmons, J.P., Nelson, L.D. & Simonsohn, U. (2012). A 21 Word Solution. Available at SSRN: http://ssrn.com/abstract=2160588

Wicherts, J.M., Veldkamp, C.L., Augusteijn, H.E., Bakker, M., Van Aert, R.C.M. & Van Assen, M.A.L.M. (2016). Researcher degrees of freedom in planning, running, analyzing, and reporting psychological studies: A checklist to avoid p-hacking. Frontiers in Psychology, 7: 1832. http://journal.frontiersin.org/article/10.3389/fpsyg.2016.01832/abstract


Introducing Mega-analysis

How to find truth in an ocean of correlations – with breakers, still waters, tidal waves, and undercurrents? In the old age of responsible research and publication, we would collect estimates reported in previous research, and compute a correlation across correlations. Those days are long gone.

In the age of rat race research and publication, it became increasingly difficult to do a meta-analysis. It is a frustrating experience for anyone who has conducted one: endless searches on the Web of Science and Google Scholar to collect all published research, inputting the estimates in a database, finding that a lot of fields are blank, emailing authors for zero-order correlations and other statistics they failed to report in their publications, and getting very little response.

Meta-analysis is not only a frustrating experience, it is also a bad idea when results that authors do not like do not get published. A host of techniques have been developed to detect and correct for publication bias, but the problem that we do not know the results that never get reported is not solved easily.

As we enter the age of open science, we no longer have to rely on the far from perfect cooperation of colleagues who have moved to a different university, left academia, died, or think you’re trying to prove them wrong and destroy their career. We can simply download all the raw data and analyze them.

Enter mega-analysis: include all the data points relevant for a certain hypothesis, cluster them by original publication, date, country, or any other potentially relevant property of the research design, and add the substantive predictors documented in the literature. The results reveal not only the underlying correlations between the substantive variables, but also the differences between studies, periods, countries and design properties that affect these correlations.
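
In practice this amounts to a multilevel model estimated on the pooled person-level records. The sketch below is a minimal illustration of that idea, assuming pandas and statsmodels; the file names, variable names, and model specification are hypothetical and not the actual Open Science Framework project code:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Pool person-level records from several studies into one data frame,
# keeping track of which study each observation came from.
study_files = {          # hypothetical file names
    "study_a": "study_a.csv",
    "study_b": "study_b.csv",
    "study_c": "study_c.csv",
}
frames = []
for study_id, path in study_files.items():
    df = pd.read_csv(path)
    df["study"] = study_id
    frames.append(df)
pooled = pd.concat(frames, ignore_index=True)

# A random intercept per study absorbs between-study differences;
# design properties (here: survey mode) enter as fixed effects.
model = smf.mixedlm("trust ~ volunteering + age + C(mode)",
                    data=pooled, groups=pooled["study"])
result = model.fit()
print(result.summary())
```

Interactions between the substantive predictors and study-level properties would then show how the correlations of interest vary across studies, periods, countries and designs.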

The method itself is not new. In epidemiology, Steinberg et al. (1997) labeled it ‘meta-analysis of individual patient data’. In human genetics, genome-wide association studies (GWAS) by large international consortia are common examples of mega-analysis.

Mega-analysis includes the file drawer of papers that never saw the light of day after they were put away. It also includes the universe of papers that were never written because the results were unpublishable.

If meta-analysis gives you an estimate for the universe of published research, mega-analysis can be used to detect just how unique that universe is in the Milky Way. My prediction would be that correlations in published research are mostly further from zero than the same correlations in a mega-analysis.

Mega-analysis holds great promise for the social sciences. Samples for population surveys are large, which enables optimal learning from variations in sampling procedures, data collection mode, and questionnaire design. It is time for a Global Social Science Consortium that pools all of its data. As an illustration, I have started a project on the Open Science Framework that mega-analyzes generalized social trust. It is a public project: anyone can contribute. We have reached the mark of 1 million observations.

The idea behind mega-analysis originated from two different projects. In the first project, Erik van Ingen and I analyzed the effects of volunteering on trust, to check whether results from an analysis of the Giving in the Netherlands Panel Survey (Van Ingen & Bekkers, 2015) would replicate with data from other panel studies. We found essentially the same results in five panel studies, although subtle differences emerged in the quantitative estimates. In the second project, with Arjen de Wit and colleagues from the Center for Philanthropic Studies at VU Amsterdam, we analyzed the effects of volunteering on well-being as part of the EC-FP7 funded ITSSOIN study. We collected 845,733 survey responses from 154,970 different respondents in six panel studies, spanning 30 years (De Wit, Bekkers, Karamat Ali & Verkaik, 2015). We found that volunteering is associated with a 1% increase in well-being.

In these projects, the data from different studies were analyzed separately. I realized that we could learn much more if the data are pooled in one single analysis: a mega-analysis.

References

De Wit, A., Bekkers, R., Karamat Ali, D., & Verkaik, D. (2015). Welfare impacts of participation. Deliverable 3.3 of the project: “Impact of the Third Sector as Social Innovation” (ITSSOIN), European Commission – 7th Framework Programme, Brussels: European Commission, DG Research.

Van Ingen, E. & Bekkers, R. (2015). Trust Through Civic Engagement? Evidence From Five National Panel Studies. Political Psychology, 36 (3): 277-294.

Steinberg, K.K., Smith, S.J., Stroup, D.F., Olkin, I., Lee, N.C., Williamson, G.D. & Thacker, S.B. (1997). Comparison of Effect Estimates from a Meta-Analysis of Summary Data from Published Studies and from a Meta-Analysis Using Individual Patient Data for Ovarian Cancer Studies. American Journal of Epidemiology, 145: 917-925.


Found: Student Assistant for the Giving in the Netherlands Study

The Center for Philanthropic Studies (werkgroep Filantropische Studies) at the Faculty of Social Sciences of Vrije Universiteit Amsterdam is the center of expertise for research on philanthropy in the Netherlands. The group studies questions such as: Why do people voluntarily give money to charitable causes? Why do people volunteer? How much money circulates in the philanthropic sector? For the Giving in the Netherlands (Geven in Nederland) study, the group has hired Suzanne Felix as a research assistant.

 

Tasks

Giving in the Netherlands is one of the group’s most important research projects. Since 1995, the giving behavior of households, individuals, foundations, corporations, and charity lotteries has been surveyed every two years and combined into a macro-economic overview. The Center for Philanthropic Studies publishes the results biennially in the book ‘Geven in Nederland’. Felix contributes to the research on bequests and on giving by endowed foundations and households.
Update: September 3, 2016


Brief guide to understand fMRI studies

RQ: Which regions of the brain are active when task X is performed?

Results: Activity in some regions Y is higher than in others.


Has the Cultural Sector Made the Shift Towards Entrepreneurship?

Presentation of the report ‘Culturele instellingen in Nederland’ (Cultural Institutions in the Netherlands)

Center for Philanthropic Studies, Vrije Universiteit Amsterdam

 

Friday, June 10, 2016, Theater Griffioen, Uilenstede 106, 1183 AM Amstelveen

 

In 2012, the Geefwet (Giving Act) was introduced, including a multiplier that increased the tax deductibility of donations to cultural institutions. In addition, cultural institutions were given more room to generate their own income from commercial activities. At the same time, many institutions were confronted with budget cuts and with calls for more entrepreneurship. How have Dutch individuals and companies with a heart for culture responded to the increased deductibility of donations to culture? Have they indeed started to give more? And how have cultural institutions responded to the budget cuts on the one hand and the multiplier on the other? What kinds of institutions have managed to make the shift towards entrepreneurship, and what kinds have not?

These questions were central to a study that the Center for Philanthropic Studies carried out at the request of the Ministry of Education, Culture and Science (OCW) on the effects of the Geefwet on income generation by cultural institutions. The study provides insight into the current state of the cultural sector in this respect and the extent to which the Geefwet contributes to strengthening the cultural sector by stimulating donations to culture.

You are warmly invited to a symposium at which the researchers will present the results to the cultural sector. You can register here.

 


Program

15.30    Registration

16.00    Presentation of the research by Prof. René Bekkers

16.30    Annabelle Birnie, Drents Museum

16.45    Marielle Hendriks, Boekmanstichting

17.00    Drinks

 

 

Location

Theater Griffioen, Uilenstede 106, 1183 AM, Amstelveen

Directions – click here

 

 

More information

More information about the study can be found at www.cultuursector.nl


Philanthropy: from Charity to Prosocial Investment

Contribution to the March 2016 edition of the European Research Network on Philanthropy (ERNOP) newsletter. PDF version here.

Philanthropy can take many forms. It ranges from the student who showed up at my doorstep with a collection tin to raise small contributions for legal assistance to the poor, to the recent announcement by Facebook co-founder Mark Zuckerberg and his wife Priscilla Chan of the establishment of a $42 billion charitable foundation. The media focused on the question of why Zuckerberg and Chan would put 99% of their wealth in a foundation. The legal form of the foundation allowed Zuckerberg to keep control over the shares without having to pay taxes. Leaving aside for the moment the difficult question of what motivation the legal form reveals, my point is that the face of philanthropy is changing.

Entrepreneurial forms of philanthropy, manifesting a strategic investment orientation, are becoming more visible. We see them in social impact bonds, in social enterprises, in venture philanthropy and in the investments of foundations in the development of new drugs and treatments. A reliable count of the prevalence of such prosocial investments is not available, but 2015 was certainly a memorable year: the first Ebola vaccine was produced in a lab funded by the Wellcome Trust and polio was eradicated from Africa through coordinated efforts supported by a coalition of the WHO, Unicef, the Rotary International Foundation, and the Bill and Melinda Gates Foundation.

Of course there are limitations to philanthropy. Some problems are just too big to handle, even for the wealthiest foundations on earth using the most innovative forms of investment. The refugee crisis continues to challenge the resilience of Europe. NGOs are delivering relief aid in the most difficult circumstances. But these efforts are band-aids as long as political leaders struggle to gather the willpower to solve the crisis together.

The Zuckerberg/Chan announcement revived previous critiques of philanthrocapitalism. Isn’t it dangerous to have so much money in so few hands? Can we rely on wealthy foundations to invest in socially responsible ways? Foundations are the freest institutions on earth and can take risks that governments cannot afford. But the track records of the corporations that gave rise to the current foundation fortunes are not immaculate; they include monopolizing markets and evading taxes. Wealthy foundations can have a significant impact on society and influence public policy, limiting the influence of governments. It is political will that enables the existence and facilitates the fortunes of wealthy foundations. Ultimately, the activities of foundations are enabled by the realization that the interests of the people should not be harmed. Hence the talk about the importance of giving back to society.

The sociologist Alvin Gouldner is famous for his 1960 article ‘The Norm of Reciprocity’, which describes how reciprocity works. He also wrote a second classic, much less known: ‘The Importance of Something for Nothing.’ In this follow-up (1973), he stresses the norm of beneficence: “This norm requires men to give others such help as they need. Rather than making help contingent upon past benefits received or future benefits expected, the norm of beneficence calls upon men to aid others without thought of what they have done or what they can do for them, and solely in terms of a need imputed to the potential recipient.” In a series of studies I co-authored with Mark Ottoni-Wilhelm, an economist from the Lilly Family School of Philanthropy at Indiana University, we call this norm ‘the principle of care’.

With this quote I return to the question about motivation. The letter to their daughter in which Zuckerberg and Chan announced their foundation reveals noble concerns for the future of mankind. It is not their child’s need that motivated them, but the needs of the world into which she was born. This is the genesis of true philanthropy. Pretty much like the awareness of need that the law student demonstrated at my doorstep.

References

Bekkers, R. & Ottoni-Wilhelm, M. (2016). Principle of Care and Giving to Help People in Need. European Journal of Personality.  

Gouldner, A.W. (1960). The Norm of Reciprocity: A Preliminary Statement. American Sociological Review, 25 (2): 161-178. http://www.jstor.org/stable/2092623

Gouldner, A.W. (1973). The Importance of Something for Nothing. In: Gouldner, A.W. (Ed.). For Sociology, Harmondsworth: Penguin.

Wilhelm, M.O., & Bekkers, R. (2010). Helping Behavior, Dispositional Empathic Concern, and the Principle of Care. Social Psychology Quarterly, 73 (1): 11-32.
