Category Archives: survey research

Why a high R Square is not necessarily better

Often I encounter academics thinking that a high proportion of explained variance is the ideal outcome of a statistical analysis. The idea is that in regression analyses a high R Square is better than a low R Square. In my view, the emphasis on a high R2 should be reduced. A high R2 should not be a goal in itself. The reason is that a higher R2 can easily be obtained by using procedures that actually lower the external validity of coefficients.

It is possible to increase the proportion of variance explained in regression analyses in several ways that do not in fact improve our ability to ‘understand’ the behavior we are seeking to ‘explain’ or ‘predict’. One way to increase the R2 is to remove anomalous observations, such as ‘outliers’, or to take people who say they ‘don’t know’ and treat them like the average respondent. Replacing missing data by mean scores or using multiple imputation procedures often increases the R2. I have used this procedure in several papers myself, including some of my dissertation chapters.

But in fact outliers can be true values. I have seen quite a few of them that destroyed correlations and lowered R Squares while being valid observations. One example is a widower donating a large amount of money to a charity after the death of his wife: a rare case of exceptional behavior for very specific reasons that seldom occur. In larger samples these outliers may become more frequent, affecting the R2 less strongly.
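To make this concrete, here is a minimal sketch with made-up numbers (the `income` and `gift` values are purely hypothetical, not taken from any survey). It computes the R2 of a simple regression with and without a single valid but exceptional gift.

```python
def r_squared(x, y):
    """R2 of a simple linear regression of y on x (the squared Pearson correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical data: 20 donors whose gifts rise with income, plus some noise.
income = list(range(1, 21))
gift = [2 * inc + (3 if inc % 2 == 0 else -3) for inc in income]

# One valid but exceptional observation: a very large one-off gift
# from a donor with a modest income.
income_full = income + [5]
gift_full = gift + [200]

r2_full = r_squared(income_full, gift_full)   # all valid observations
r2_trimmed = r_squared(income, gift)          # 'outlier' removed

print("R2 with the exceptional gift:   ", round(r2_full, 3))
print("R2 without the exceptional gift:", round(r2_trimmed, 3))
```

Dropping the exceptional gift makes the model look far better, but only because a real piece of behavior has been deleted from the data.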

Also ‘Don’t Know’ respondents are often systematically different from the average respondent. Treating them as average respondents eliminates some of the real variance that would otherwise be hard to predict.

Finally, it is often possible to increase the proportion of variance explained by including more variables. This is particularly problematic if variables that are the result of the dependent variable are included as predictors. For instance, if network size is added to the prediction of volunteering, the R Square will increase. But a larger network not only increases volunteering; it is also a result of volunteering. Especially if the network questions refer to the present (do you know…) while the volunteering questions refer to the past (in the past year, have you…), it is dubious to ‘predict’ volunteering in the past by a measure of current network size.
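A small simulation can illustrate the point. This is a sketch under stated assumptions: the variable `education`, the effect sizes and the sample size are all invented for illustration, and network size is constructed purely as a consequence of volunteering, so by design it has no causal effect on volunteering at all.

```python
import random


def r_squared(y, predictors):
    """R2 of an OLS regression of y on the predictor columns (intercept included)."""
    n, k = len(y), len(predictors)
    mean_y = sum(y) / n
    yc = [v - mean_y for v in y]
    cols = []
    for col in predictors:
        m = sum(col) / n
        cols.append([v - m for v in col])
    # Normal equations on centered data: (X'X) coef = X'y
    a = [[sum(p * q for p, q in zip(cols[i], cols[j])) for j in range(k)]
         for i in range(k)]
    xty = [sum(p * q for p, q in zip(cols[i], yc)) for i in range(k)]
    b = xty[:]
    # Gaussian elimination (no pivoting needed: X'X is positive definite here)
    for i in range(k):
        for r in range(i + 1, k):
            f = a[r][i] / a[i][i]
            for c in range(i, k):
                a[r][c] -= f * a[i][c]
            b[r] -= f * b[i]
    coef = [0.0] * k
    for i in reversed(range(k)):
        s = b[i] - sum(a[i][c] * coef[c] for c in range(i + 1, k))
        coef[i] = s / a[i][i]
    explained = sum(c * v for c, v in zip(coef, xty))
    total = sum(v * v for v in yc)
    return explained / total


rng = random.Random(1)  # fixed seed so the sketch is reproducible
n = 2000

# Hypothetical data-generating process (an assumption, not an estimate):
education = [rng.gauss(0, 1) for _ in range(n)]
volunteering = [0.3 * e + rng.gauss(0, 1) for e in education]
# Network size is a RESULT of volunteering, not a cause of it.
network = [v + rng.gauss(0, 1) for v in volunteering]

r2_base = r_squared(volunteering, [education])
r2_with_network = r_squared(volunteering, [education, network])

print("R2, education only:        ", round(r2_base, 3))
print("R2, education plus network:", round(r2_with_network, 3))
```

The R2 rises sharply once network size is added, even though in this constructed example nothing about volunteering has actually been explained.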

As a reviewer, I give authors reporting an R2 exceeding 40% a treatment of high-level scrutiny for dubious decisions in data handling and inclusion of variables.

As a rule, R Squares tend to be higher at higher levels of aggregation, e.g. when analyzing cross-situational tendencies in behavior rather than specific behaviors in specific contexts; or when analyzing time-series data or macro-level data about countries rather than individuals. Why people do the things they do is often just very hard to predict, especially if you try to predict behavior in a specific case.
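The aggregation effect can be sketched with simulated data as well. The setup below is hypothetical: 50 groups (think of countries) of 100 individuals each, a group-level predictor, and a lot of individual-level noise. The same relationship is then analyzed once at the individual level and once using group means.

```python
import random


def r_squared(x, y):
    """R2 of a simple linear regression of y on x (the squared Pearson correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


rng = random.Random(7)  # fixed seed so the sketch is reproducible
n_groups, group_size = 50, 100

x_individual, y_individual = [], []
x_group, y_group_mean = [], []

for _ in range(n_groups):
    x = rng.gauss(0, 1)  # hypothetical group-level predictor
    # Individual behavior: same underlying effect, plus large idiosyncratic noise.
    ys = [x + rng.gauss(0, 3) for _ in range(group_size)]
    x_individual.extend([x] * group_size)
    y_individual.extend(ys)
    x_group.append(x)
    y_group_mean.append(sum(ys) / group_size)

r2_individual = r_squared(x_individual, y_individual)
r2_group = r_squared(x_group, y_group_mean)

print("R2 at the individual level:", round(r2_individual, 3))
print("R2 at the group level:     ", round(r2_group, 3))
```

Averaging within groups cancels out most of the individual noise, so the group-level R2 is much higher although the underlying effect is identical.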

Filed under academic misconduct, data, methodology, regression analysis, survey research

Lunch Talk: “Generalized Trust Through Civic Engagement? Evidence from Five National Panel Studies”

Does civic engagement breed trust? According to a popular version of social capital theory, civic engagement should produce generalized trust among citizens. In a new paper accepted for publication in Political Psychology, Erik van Ingen (Tilburg University) and I put this theory to the test by examining the causal connection between civic engagement and generalized trust using multiple methods and multiple (prospective) panel datasets. We found participants to be more trusting. This was most likely caused by selection effects: the causal effects of civic engagement on trust were very small or non-significant. In the cases where small causal effects were found, they turned out not to last. We found no differences across types of organizations and only minor variations across countries.

At the PARIS colloquium of the Department of Sociology at VU University on November 12, 2013 (Room Z531, 13.00-14.00), I will not just be talking about this paper published in Political Behavior and about the new paper forthcoming in Political Psychology (here is the prepublication version). In addition to a substantive story about a research project there is also a story about the process of getting a paper accepted with a null-finding that goes against received wisdom. This story is quite informative about the publication factory that we are all in.

Filed under data, psychology, survey research, trust, volunteering

Update: Giving in the Netherlands Panel Survey User Manual

A new version of the User Manual for the Giving in the Netherlands Panel Survey is now available: version 2.2.

The GINPS12 questionnaire is here (in Dutch).

Filed under data, empathy, experiments, helping, household giving, methodology, philanthropy, principle of care, survey research, trends, trust, volunteering, wealth

You are welcome to use our data

Update: June 26, 2020

“Can I please use your data on giving and volunteering?” Yes you can! In fact, you are very welcome to use the data we have collected at the Center for Philanthropic Studies. The data from the Giving in the Netherlands Panel Study (GINPS) on households are currently being used by students in Amsterdam, Rotterdam and Utrecht in statistics tutorials, by students in Amsterdam for Master Thesis projects, and by PhD candidates and established researchers around the world for academic research. The panel design allows for dynamic analyses of giving and volunteering, answering questions like:

  • How does volunteering affect the size and composition of social networks?
  • Are giving and volunteering substitutes or complements?
  • How does household giving change as people age?

To get access to the data, here’s what you will need to do.

  1. Go to the Open Science Framework page for the Giving in the Netherlands Panel Survey. You will find a Public Use File there and a user manual describing the variables available. Page 17: The Public Use File covers the panel data from 2002 to 2019, including variables on household giving, volunteering, age, gender, marital status, level of education, province, household income, income from wealth, home ownership, religious affiliation and attendance. The file does not include the immigrant samples, the HNW samples, the oversample of Protestants or the oversample of respondents from an earlier survey for OC&W, nor does it include data from the extra wave conducted in 2015. Weighting variables are provided for each year.
  2. Check whether the variables you need are included in the Public Use File. If not, send a request to the data manager at cfs [at], describing the goal of your research and the variables required.
  3. Copy me at r.bekkers [at] and I will get back to you.

Note that if you just need aggregate statistics on giving and volunteering you will not need access to the micro-level data. You can probably find the data you need in our biennial ‘Giving in the Netherlands’ book. A summary in English of the 2015 edition is here. The English summary of the most recent 2020 edition is here.

User manual: all waves, 2002-2019 (in English)

Original questionnaires:

The data on corporate social responsibility and corporate philanthropy are less well documented, but also available to researchers.


Filed under corporate social responsibility, data, experiments, household giving, methodology, survey research, volunteering