What Oxfam cancellations tell us about donor motivation

What can we learn from the drop in donations to Oxfam after the child abuse news broke? In the UK, about 7,000 donors cancelled, in the Netherlands 1,700, and in Hong Kong 715.

First, it does not tell us much about what makes people give. Most donors have continued to give. The 7,000 who cancelled in the UK represent 3.5% of income in the UK. The 1,700 donors who cancelled in the Netherlands are 0.5% of all donors. This means that defaults save lives. The default is to do nothing and continue to give. We’re seeing only a small fraction go.

But for those discontinuing their gifts in protest, we could say that their reactions tell us why they were giving in the first place.

If they gave to Oxfam for altruistic reasons, they will find other charities to give to. They may find it hard to trust Oxfam now, and other charities named in the media.

There is the ‘one bad apple spoils the entire basket’ idea that donors will find faults with other charities as well once one gets bad publicity.

We’ll have to see how much that idea is worth. In previous episodes in the Netherlands, bad publicity about one charity usually did not spill over to other charities. In the Netherlands and Hong Kong it seems altogether more puzzling why donors stopped giving, as the abuse – as far as we know – did not involve the Netherlands or Hong Kong branch.

In my view the cancellations are a result of empathic anger. The more you care about children, the more angry you will be. While empathy has been heralded as an important factor in altruism, it also has a non-altruistic side. The emotion of anger itself and the cancellation may be viewed and communicated as a sign of caring. But it is not effective helping.

There is also a role for public relations. It may be that the abuse corrected an image that charity workers are holy superhumans. A charity that ‘paints itself as whiter than white’ reinforces that image. In times of PR crises like these such an image boomerangs donors away. If donors reckon with the possibility that a charity may attract bad apples as workers, they realize that one bad apple is not evidence of a disease, but of lax quality control.



Filed under Uncategorized

What is normal?

Does the average Dutch person really give 559 euros per year to charities, as Arnon Grunberg wrote yesterday on the front page of de Volkskrant?

No, that is unlikely. Grunberg referred to a figure mentioned in the HUMAN television program ‘Hoe normaal ben jij?’ (‘How normal are you?’).

The figure is off for two reasons.

1. The amount is much higher than what other research on philanthropy finds. The HUMAN figure comes from a study that is probably not representative of all Dutch people. HUMAN gives no information about the poll it conducted, but it is likely to be a so-called convenience sample: anyone can take part on the site. Those who do are almost never representative of the Dutch population.

The standard study of philanthropy, Giving in the Netherlands (Geven in Nederland, GIN), has been conducted by the Vrije Universiteit Amsterdam since 1995. It gives a traceable, representative picture. On average, households give 341 euros, according to the latest edition of the GIN study from 2017.

2. The figure is an average, and that is not what is normal. If you compute the arithmetic mean across all Dutch households, you do not get a good view of what the typical Dutch person gives. Half of all Dutch households give less than 60 euros, according to GIN. The mean is strongly influenced by a small number of households that give a great deal. You can use the chart to see how normal you are: if you give between €150 and €200 per year, you are in the third quartile, the group of about a quarter of the population that gives more than half of the Dutch. The most generous quarter of the Dutch often gives more than €1,000.
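The gap between the mean and the median can be illustrated with a small sketch. The donation amounts below are invented for illustration only; they are not taken from GIN or from the HUMAN poll. They are chosen to be skewed, with one household giving far more than the rest.

```python
from statistics import mean, median

# Hypothetical annual donations (in euros) for ten households.
# These values are made up for illustration; they are NOT GIN data.
donations = [0, 10, 25, 40, 50, 60, 120, 180, 400, 2500]

average = mean(donations)    # pulled upward by the single large donor
typical = median(donations)  # the household in the middle of the distribution

# Share of households that give less than the arithmetic mean.
share_below_average = sum(d < average for d in donations) / len(donations)

print(f"mean: {average:.1f}")
print(f"median: {typical:.1f}")
print(f"share giving less than the mean: {share_below_average:.0%}")
```

In a skewed distribution like this one, most households give less than the mean, which is why the median is the better description of the typical giver.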




Filed under Center for Philanthropic Studies, household giving, survey research, Uncategorized

Research internship @VU Amsterdam

Social influences on prosocial behaviors and their consequences

While self-interest and prosocial behavior are often pitted against each other, it is clear that much charitable giving and volunteering for good causes is motivated by non-altruistic concerns (Bekkers & Wiepking, 2011). Helping others by giving and volunteering feels good (Dunn, Aknin & Norton, 2008). What is the contribution of such helping behaviors to happiness?

The effect of helping behavior on happiness is easily overestimated using cross-sectional data (Aknin et al., 2013). Experiments provide the best way to eradicate selection bias in causal estimates. Monozygotic twins provide a nice natural experiment to investigate unique environmental influences on prosocial behavior and its consequences for happiness, health, and trust. Any differences within twin pairs cannot be due to additive genetic effects or shared environmental effects. Previous research has investigated environmental influences of the level of education and religion on giving and volunteering (Bekkers, Posthuma and Van Lange, 2017), but no study has investigated the effects of helping behavior on important outcomes such as trust, health, and happiness.

The Midlife in the United States (MIDUS) and the German Twinlife surveys provide rich datasets including measures of health, life satisfaction, and social integration, in addition to demographic and socioeconomic characteristics and measures of helping behavior through nonprofit organizations (giving and volunteering) and in informal social relationships (providing financial and practical assistance to friends and family).

In the absence of natural experiments, longitudinal panel data are required to ascertain the chronology in acts of giving and their correlates. The same holds for the alleged effects of volunteering on trust (Van Ingen & Bekkers, 2015) and health (De Wit, Bekkers, Karamat Ali, & Verkaik, 2015). Since the mid-1990s, a growing number of panel studies have collected data on volunteering and charitable giving and their alleged consequences, such as the German Socio-Economic Panel (GSOEP), the British Household Panel Survey (BHPS) / Understanding Society, the Swiss Household Panel (SHP), the Household, Income, Labour Dynamics in Australia survey (HILDA), the General Social Survey (GSS) in the US, and in the Netherlands the Longitudinal Internet Studies for the Social sciences (LISS) and the Giving in the Netherlands Panel Survey (GINPS).

Under my supervision, students can write a paper on social influences of education, religion and/or helping behavior in the form of volunteering, giving, and informal financial and social support on outcomes such as health, life satisfaction, and trust, using either longitudinal panel survey data or data on twins. Students who are interested in writing such a paper are invited to present their research questions and research design via e-mail to r.bekkers@vu.nl.

René Bekkers, Center for Philanthropic Studies, Faculty of Social Sciences, Vrije Universiteit Amsterdam


Aknin, L. B., Barrington-Leigh, C. P., Dunn, E. W., Helliwell, J. F., Burns, J., Biswas-Diener, R., … Norton, M. I. (2013). Prosocial spending and well-being: Cross-cultural evidence for a psychological universal. Journal of Personality and Social Psychology, 104(4), 635–652. https://doi.org/10.1037/a0031578

Bekkers, R., Posthuma, D. & Van Lange, P.A.M. (2017). The Pursuit of Differences in Prosociality Among Identical Twins: Religion Matters, Education Does Not. https://osf.io/ujhpm/ 

Bekkers, R., & Wiepking, P. (2011). A Literature Review of Empirical Studies of Philanthropy: Eight Mechanisms That Drive Charitable Giving. Nonprofit and Voluntary Sector Quarterly, 40: https://doi.org/10.1177/0899764010380927

De Wit, A., Bekkers, R., Karamat Ali, D., & Verkaik, D. (2015). Welfare impacts of participation. Deliverable 3.3 of the project: “Impact of the Third Sector as Social Innovation” (ITSSOIN), European Commission – 7th Framework Programme, Brussels: European Commission, DG Research. http://itssoin.eu/site/wp-content/uploads/2015/09/ITSSOIN_D3_3_The-Impact-of-Participation.pdf

Dunn, E. W., Aknin, L. B., & Norton, M. I. (2008). Spending Money on Others Promotes Happiness. Science, 319(5870): 1687–1688. https://doi.org/10.1126/science.1150952

Van Ingen, E. & Bekkers, R. (2015). Trust Through Civic Engagement? Evidence From Five National Panel Studies. Political Psychology, 36 (3): 277-294. https://renebekkers.files.wordpress.com/2015/05/vaningen_bekkers_15.pdf


Filed under altruism, Center for Philanthropic Studies, data, experiments, happiness, helping, household giving, Netherlands, philanthropy, psychology, regression analysis, survey research, trust, volunteering

How not to solve the research competition crisis

Scientists across the globe spend a substantial part of their time writing research proposals for competitive grant schemes. Usually, less than one in seven proposals gets funded. Moreover, the level of competition and the waste of time invested in research proposals that do not receive funding are increasing.

The most important funder of science in the Netherlands, the Netherlands Organization for Scientific Research (NWO), is painfully aware of the research competition crisis. On April 4, 2017, more than one hundred of the nation’s scientists gathered in a conference to come up with solutions for the crisis. I was one of them.

The conference made clear that the key problem is that we have too many good candidates and high quality research proposals that cannot be funded with the current budget. Without an increase in the budget for research funding, however, that problem is unlikely to go away.


Stan Gielen, the new director of NWO, opened the conference. Because the universities and NWO lack bargaining power in the government that determines the budget for NWO, he asked the scientists at the conference to think about ‘streamlining procedures’. In roundtable discussions, researchers talked about questions like: “How can the time it takes between a final ranking in a grant competition and the announcement of the result to applicants be reduced?”

Many proposals came up during the meeting. The more radical proposals were to discontinue funding for NWO altogether and to reallocate funding back to the universities, to give a larger number of smaller grants, to allocate funding through lotteries among top-rated applications, and the idea by Scheffer to give researchers voting rights on funding allocations. I left the meeting with an increased sense of urgency but with little hope for a solution. Gielen concluded the meeting with the promise to initiate conversations with the ministry for Education, Culture and Science about the results of the conference and to report back within six months.

Yesterday, NWO presented its proposals. None of the ideas above made it. Instead, a set of measures was announced that is unlikely to increase chances of funding. The press release does not say why ineffective measures were favored over effective measures.

Two of the proposals by NWO shift work to the universities, giving them responsibility in pre-evaluations of proposals. At the Vrije Universiteit Amsterdam we already make quite an investment in such pre-evaluations, but not all universities do so. The universities are also told to use an instrument to reduce the number of proposals: the financial guarantee. This proposal, too, is akin to a measure we already had in place, the obligatory budget check. The financial guarantee is an additional hurdle applicants have to clear.

The proposal to give non-funded but top-rated ERC proposals a second chance at NWO reduces some of the work for applicants, but does not increase chances for funding.

A final proposal is to ask applicants to work together with other applicants with related ideas. It may be a good idea for other reasons, but does not increase chances for funding.


Now what?

One of the causes of the problem that funding chances are declining is the reward that universities get for graduations of PhD candidates (‘promotiepremie’). This reward keeps up the supply of good researchers. PhD candidates are prepared and motivated for careers in science. But these careers are increasingly hard to get into. As long as the dissertation defense reward is in place, one long term solution is to change the curriculum in graduate schools, orienting them to non-academic careers.

Another long-term solution is to diversify funding sources for science. In the previous cabinets, the ministry of Economic Affairs has co-controlled funding allocations to what were labeled ‘topsectors’. Evaluations of this policy have been predominantly negative. One of the problems is that the total budget for science was not increased, but the available budget was partly reallocated for applied research in energy, water, logistics etcetera. It is unclear how the new government thinks about this, but it seems a safe bet not to have much hope for creative ideas from this side. But there is hope for a private sector solution.

There is a huge amount of wealth in the Netherlands that investment bankers are trying to invest responsibly. As a result of increases in wealth, the number of private foundations established that support research and innovation has increased strongly in the past two decades. These foundations are experimenting with new financial instruments like impact investing and venture philanthropy. The current infrastructure and education at universities, however, is totally unfit to tap into this potential of wealth. Which graduate program offers a course in creating a business case for investments in research?





Filed under Europe, incentives, policy evaluation, politics, VU University

Buying Time Promotes Happiness

In a new paper, we used data from the Giving in the Netherlands Panel Survey to examine the relationship between spending money to outsource household tasks and happiness. The key result is that those who do spend money in this way are happier. The paper was published in PNAS and is freely available through the open access option. The paper is lead-authored by Ashley Whillans (Harvard Business School), and co-authored by Elizabeth Dunn (University of British Columbia), Paul Smeets (Maastricht University) and Michael Norton (Harvard Business School). All study data and study materials are available through the OSF (https://osf.io/vr9pa/). Hypotheses for the analyses were preregistered here.




Filed under happiness, time use, wealth

Twenty Years of Generosity in the Netherlands

Paper – ARNOVA 2017 Presentation – Materials at Open Science Framework

In the past two decades, philanthropy in the Netherlands has gained significant attention, from the general public, from policy makers, as well as from academics. Research on philanthropy in the Netherlands has documented a substantial increase in amounts donated to charitable causes since data on giving in the Netherlands have become available in the mid-1990s (Bekkers, Gouwenberg & Schuyt, 2017). What has remained unclear, however, is how philanthropy has developed in relation to the growth of the economy at large and the growth of consumer expenditure. For the first time, we bring together all the data on philanthropy available from eleven editions of the Giving in the Netherlands survey among households (n = 16,344), to answer the research question: how can trends in generosity in the Netherlands in the past 20 years be explained?


The Giving in the Netherlands Panel Survey

One of the strengths of the GINPS is the availability of data on prosocial values and attitudes towards charitable causes. In 2002, the Giving in the Netherlands survey among households was transformed from a cross-sectional to a longitudinal design (Bekkers, Boonstoppel & De Wit, 2017). The GIN Panel Survey has been used primarily to answer questions on the development of these values and attitudes in relation to changes in volunteering activities (Bekkers, 2012; Van Ingen & Bekkers, 2015; Bekkers & Bowman, 2009). Here we use the GINPS in a different way. First we describe trends in generosity, i.e. amounts donated as a proportion of income. Then we seek to explain these trends, focusing on prosocial values and attitudes towards charitable causes.


How generous are the Dutch?

Vis-à-vis the rich history of charity and philanthropy in the Netherlands (Van Leeuwen, 2012), the current state of giving is rather poor. On average, charitable donations per household in 2015 amounted to €180 per year or 0.4% of household income. The median gift is €50 (De Wit & Bekkers, 2017). Over the past fifteen years, the trend in generosity has been downward: giving as a proportion of income has declined slowly but steadily since 1999 (Bekkers, De Wit & Wiepking, 2017). By 2015, giving as a proportion of income had declined by one-fifth from its peak in 1999 (see Figure 1).


Figure 1: Household giving as a proportion of consumer expenditure (Source: Bekkers, De Wit & Wiepking, 2017)


Why has generosity of households in the Netherlands declined?

The first explanation is declining religiosity. Because giving is encouraged by religious communities, the decline of church affiliation and practice has reduced charitable giving, as in the US (Wilhelm, Rooney & Tempel, 2007). The disappearance of religiosity from Dutch society has reduced charitable giving because the non-religious have become more numerous. The decline in religiosity explains about 40% of the decline in generosity we observe in the period 2001-2015. In Figure 2 we see a similar decline in generosity to religion (the red line) as to other organizations (the blue line).


Figure 2: Household giving to religion (red) and to other causes (blue) as a proportion of household income (Source: Bekkers, De Wit & Wiepking, 2017)


We also find that those who are still religious have become much more generous. Figure 3 shows that the amounts donated by Protestants (the green line) have almost doubled in the past 20 years. The amounts donated by Catholics (the red line) have also doubled, but are much lower. The non-religious have not increased their giving at all in the past 20 years. However, the increasing generosity of the religious has not been able to turn the tide.


Figure 3: Household giving by non-religious (blue), Catholics (red) and Protestants (green) in Euros (Source: Bekkers, De Wit & Wiepking, 2017)

The second explanation is that prosocial values have declined. Because generosity depends on empathic concern and moral values such as the principle of care (Bekkers & Ottoni-Wilhelm, 2016), the loss of such prosocial values has reduced generosity. Prosocial values have lost support, and the loss of prosociality explains about 15% of the decline in generosity. The loss of prosocial values itself, however, is closely connected to the disappearance of religion. About two thirds of the decline in empathic concern and three quarters of the decline in altruistic values is explained by the reduction of religiosity.

In addition, we see that prosocial values have also declined among the religious. Figure 4 shows that altruistic values have declined not only for the non-religious (blue), but also for Catholics (red) and Protestants (green).


Figure 4: Altruistic values among the non-religious (blue), Catholics (red) and Protestants (green) (Source: Giving in the Netherlands Panel Survey, 2002-2014).

Figure 5 shows a similar development for generalized social trust.


Figure 5: Generalized social trust among the non-religious (blue), Catholics (red) and Protestants (green)  (Source: Giving in the Netherlands Panel Survey, 2002-2016).

Speaking of trust: as donations to charitable causes rely on a foundation of charitable confidence, it may be argued that the decline of charitable confidence is responsible for the decline in generosity (O’Neill, 2009). However, we find that the decline in generosity is not strongly related to the decline in charitable confidence, once changes in religiosity and prosocial values are taken into account. This finding indicates that the decline in charitable confidence is a sign of a broader process of declining prosociality.


What do our findings imply?

What do these findings mean for theories and research on philanthropy and for the practice of fundraising?

First, our research clearly demonstrates the utility of including questions on prosocial values in surveys on philanthropy, as they have predictive power not only for generosity and changes therein over time, but also explain relations of religiosity with generosity.

Second, our findings illustrate the need to develop distinctive theories on generosity. Predictors of levels of giving measured in euros can be quite different from predictors of generosity as a proportion of income.

For the practice of fundraising, our research suggests that the strategies and propositions of charitable causes need modification. Traditionally, fundraising organizations have appealed to empathic concern for recipients and prosocial values such as duty. As these have become less prevalent, propositions appealing to social impact with modest returns on investment may prove more effective.

Also fundraising campaigns in the past have been targeted primarily at loyal donors. This strategy has proven effective and religious donors have shown resilience in their increasing financial commitment to charitable causes. But this is not a feasible long term strategy as the size of this group is getting smaller. A new strategy is required to commit new generations of donors.




Bekkers, R. (2012). Trust and Volunteering: Selection or Causation? Evidence from a Four Year Panel Study. Political Behavior, 34 (2): 225-247.

Bekkers, R., Boonstoppel, E. & De Wit, A. (2017). Giving in the Netherlands Panel Survey – User Manual, Version 2.6. Center for Philanthropic Studies, VU Amsterdam.

Bekkers, R. & Bowman, W. (2009). The Relationship Between Confidence in Charitable Organizations and Volunteering Revisited. Nonprofit and Voluntary Sector Quarterly, 38 (5): 884-897.

Bekkers, R., De Wit, A. & Wiepking, P. (2017). Jubileumspecial: Twintig jaar Geven in Nederland. In: Bekkers, R. Schuyt, T.N.M., & Gouwenberg, B.M. (Eds.). Geven in Nederland 2017: Giften, Sponsoring, Legaten en Vrijwilligerswerk. Amsterdam: Lenthe Publishers.

Bekkers, R. & Ottoni-Wilhelm, M. (2016). Principle of Care and Giving to Help People in Need. European Journal of Personality, 30(3): 240-257.

Bekkers, R., Schuyt, T.N.M., & Gouwenberg, B.M. (Eds.) (2017). Geven in Nederland 2017: Giften, Sponsoring, Legaten en Vrijwilligerswerk. Amsterdam: Lenthe Publishers.

De Wit, A. & Bekkers, R. (2017). Geven door huishoudens. In: Bekkers, R., Schuyt, T.N.M., & Gouwenberg, B.M. (Eds.). Geven in Nederland 2017: Giften, Sponsoring, Legaten en Vrijwilligerswerk. Amsterdam: Lenthe Publishers.

O’Neill, M. (2009). Public Confidence in Charitable Nonprofits. Nonprofit and Voluntary Sector Quarterly, 38: 237–269.

Van Ingen, E. & Bekkers, R. (2015). Trust Through Civic Engagement? Evidence From Five National Panel Studies. Political Psychology, 36 (3): 277-294.

Wilhelm, M.O., Rooney, P.M. and Tempel, E.R. (2007). Changes in religious giving reflect changes in involvement: age and cohort effects in religious giving, secular giving, and attendance. Journal for the Scientific Study of Religion, 46 (2): 217–32.

Van Leeuwen, M. (2012). Giving in early modern history: philanthropy in Amsterdam in the Golden Age. Continuity & Change, 27(2): 301-343.


Filed under Center for Philanthropic Studies, data, household giving, Netherlands, survey research, trends

Hunting Game: Targeting the Big Five

Do not use the personality items included in the World Values Survey. That is the recommendation of Steven Ludeke and Erik Gahner Larsen in a recent paper published in the journal Personality and Individual Differences. The journal is owned by Elsevier so the official publication is paywalled. Still I am writing about it because the message of the paper is extremely important. Ludeke and Gahner Larsen formulate their recommendation a little more subtly: “we suggest it is thus hard to justify the use of this data in future research.”

What went wrong here? Join me in a hunting game, targeting the Big Five.

The World Values Survey (WVS) is the largest non-commercial survey in the world. It is frequently used in social science research. The most recent edition contained a short, 10-item measure of personality characteristics (BFI-10), validated in a well-cited paper by Rammstedt and John in the Journal of Research in Personality. The inclusion of the BFI-10 enables researchers to study how the Big Five personality traits are related to political participation, happiness, education, and health, among many other things.

So what is wrong with the personality data in the WVS? Ludeke and Gahner Larsen found that the pairs of adjectives designed to measure the five personality traits Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism are not correlated as expected. To measure openness, for instance, the survey asked participants to indicate agreement with the statement “I see myself as someone who: has few artistic interests” and “I see myself as someone who: has an active imagination”. One would expect a negative relation between the responses to the two statements. However, the correlation between the two items across all countries is positive, r = .164. This correlation is not strong, but in the wrong direction. Similar discrepancies were found between items designed to measure the four other dimensions of personality.

The BFI-10 included in the WVS is this set of statements (an r indicates a reverse-scored item):

I see myself as someone who:

  • is reserved (E1r)
  • is generally trusting (A1)
  • tends to be lazy (C1r)
  • is relaxed, handles stress well (N1r)
  • has few artistic interests (O1r)
  • is outgoing, sociable (E2)
  • tends to find fault with others (A2r)
  • does a thorough job (C2)
  • gets nervous easily (N2)
  • has an active imagination (O2)
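Reverse scoring and the check on item pairs can be sketched as follows. This is a minimal illustration, not the procedure used by Ludeke and Gahner Larsen: the 1–5 response scale is an assumption, the responses are invented, and the helper names `reverse_score` and `pearson` are hypothetical.

```python
from math import sqrt

# Assumed response scale: 1 (disagree strongly) to 5 (agree strongly).
SCALE_MIN, SCALE_MAX = 1, 5


def reverse_score(value):
    """Flip a response on the assumed scale: 1 <-> 5, 2 <-> 4, 3 stays 3."""
    return SCALE_MIN + SCALE_MAX - value


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented responses from six participants who answer consistently.
few_artistic_interests = [1, 2, 2, 4, 5, 4]  # O1r, reverse-keyed item
active_imagination = [5, 4, 5, 2, 1, 3]      # O2

# Before recoding, a well-behaved item pair correlates negatively.
r_raw = pearson(few_artistic_interests, active_imagination)

# After reverse scoring O1r, the same pair correlates positively.
o1_recoded = [reverse_score(v) for v in few_artistic_interests]
r_recoded = pearson(o1_recoded, active_imagination)

print(f"raw items: r = {r_raw:.3f}")
print(f"after reverse scoring O1r: r = {r_recoded:.3f}")
```

Reverse scoring only flips the sign of the correlation, so a pair that correlates positively before recoding will correlate negatively afterwards, which is the wrong direction for two items meant to measure the same trait.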

In a factor analysis of the 10 items, we would expect to find the five dimensions. However, that is not the result of an exploratory factor analysis applying the conventional criterion of an eigenvalue > 1. In this analysis and all following analyses negative items are reverse scored. Including all countries, a three-factor solution emerges that is very difficult to interpret. Multiple items show high loadings on multiple factors. Removing these one by one, as is usually done in inventories with large numbers of items, we are left with a two-factor solution. If a five-factor solution is forced, we obtain the following component matrix. This is a mess.



Component loadings, forced five-factor solution, all countries (the matrix is incomplete; the surviving values are listed per item in their original order):

O1 not artistic (r): -.054 .105 -.049
O2 active imagination: .162 -.031 .197
C1 lazy (r): -.004 .836 -.045
C2 thorough: .425 .231 .078
E1 reserved (r): -.825 -.022 -.183
E2 outgoing: .097 -.004 -.105 -.068
A1 trusting: .722 .003 -.160 -.137
A2 fault with others (r): .079 .614 -.259
N1 relaxed (r): -.461 -.377 .235 .534
N2 nervous: .188 .133 -.291 .770


So what is wrong with these data?

Upon closer inspection, Ludeke and Gahner Larsen found that the correlations were markedly different across countries. Bahrain is a clear outlier. The weakly positive correlation between the two openness items (O1r and O2) is due in part to the inclusion of data from Bahrain. Without this country, the correlation is only .135. Still positive, but not as strong. The data for Bahrain are not only strange for openness, but also for other factors. In the table below I have computed the correlations among recoded items for the five dimensions.

Without Bahrain, the correlations are still strange, but a little less strange.


With Bahrain: .238 -.207 -.036
Without Bahrain: .275 -.181 -.009


What is wrong with the data for Bahrain? The patterns of responses for cases from Bahrain, it turns out, are surprisingly often a series of ten exactly the same values, such as 1111111111 or 5555555555. I routinely check data from surveys for such patterns. While it is impossible to prove this, serial response patterns suggest fabrication of data. Participants and/or interviewers skipping questions may follow such patterns. Almost half of all the cases from Bahrain follow such a pattern. Other countries with a relatively high proportion of serial pattern responses are South Africa, Singapore, and China. The two countries for which the BFI-10 behaves close to what previous research has reported, the Netherlands and Germany, have a very low occurrence of serial pattern responses.
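A check for serial response patterns can be as simple as the sketch below. The respondent identifiers and answers are invented, and `is_serial_pattern` is a hypothetical helper, not code from the WVS or from the paper. It assumes complete responses; missing values would need separate handling.

```python
def is_serial_pattern(responses):
    """Return True when every answer in the series is identical,
    for example ten 1s or ten 5s in a row."""
    return len(responses) > 1 and len(set(responses)) == 1


# Invented answers to the ten BFI-10 items for four respondents.
respondents = {
    "r1": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    "r2": [2, 4, 3, 2, 5, 4, 2, 4, 3, 4],
    "r3": [5, 5, 5, 5, 5, 5, 5, 5, 5, 5],
    "r4": [3, 3, 3, 3, 3, 3, 3, 3, 3, 2],
}

# Collect the respondents whose ten answers are all the same value.
flagged = [rid for rid, answers in respondents.items()
           if is_serial_pattern(answers)]
share_flagged = len(flagged) / len(respondents)

print("serial patterns:", flagged)
print(f"share of cases flagged: {share_flagged:.0%}")
```

Running the same check per country gives the proportion of serial pattern responses that can then be compared across countries.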

Number of serial pattern responses by country (the table values are missing; the only surviving row label is South Africa)
Even without the data for Bahrain and the serial responses from all other countries, however, the factor structure is err…not what one would expect. Still a mess.



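Component loadings, forced five-factor solution, without Bahrain and without serial pattern responses (the matrix is incomplete; the surviving values are listed per item in their original order):

O1 not artistic (r): -.094 -.040 .086 -.031 .968
O2 active imagination: .150 -.046 .158 -.130
C1 lazy (r): .023 .815 -.017 .146
C2 thorough: .410 .241 .050 .088
E1 reserved (r): -.828 -.033 -.158 -.058
E2 outgoing: .771 .070 -.001 -.140
A1 trusting: .192 .710 .022 -.190
A2 fault with others (r): -.405 .080 .628 -.230
N1 relaxed (r): -.421 -.352 .218 .592
N2 nervous: .192 .133 -.315 .750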


Only for Germany and the Netherlands is the factor structure somewhat in line with previous research. Here is the solution for the two countries combined. In both countries, the two statements for agreeableness do not correlate as expected. Also the second statement for conscientiousness (thorough) has a cross-loading with one of the agreeableness items (trusting).



Component loadings, Germany and the Netherlands combined (the matrix is incomplete; the surviving values are listed per item in their original order):

O1 not artistic (r): -.056 .842 .120
O2 active imagination: .050 .729 -.140 .173
C1 lazy (r): -.083 -.040 .865 -.087
C2 thorough: .053 .057 .627 .440
E1 reserved (r): -.113 .130 .032 -.219
E2 outgoing: .732 -.166 .126 .166
A1 trusting: -.008 -.100 .042 .049
A2 fault with others (r): -.657 -.272 .090 .177
N1 relaxed (r): .012 .804 -.002 .116
N2 nervous: -.052 .835 -.006 -.160


This leaves us with three possibilities.

One possibility was raised by Christopher Soto on Twitter: acquiescence bias could be driving the results. In a study using data from another multi-country survey in the International Social Survey Program (ISSP), Rammstedt, Kemper & Borg subtracted each respondent’s mean response across all BFI-10 items from his or her score on each single item. Doing this, however, does not clear the sky. Looking again at the correlations for the pairs of items measuring the same constructs, we see that they are not ‘better’ in the second row. In contrast, they are less positive.






First row: .286 -.166 .001
Second row: .078 -.235 -.107 .049
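The correction for acquiescence described above, subtracting each respondent’s own mean across all ten items from each item score, can be sketched as follows. The responses are invented and `center_within_person` is a hypothetical helper. Applying the correction to the raw responses, before reverse scoring, is an assumption of this sketch.

```python
def center_within_person(responses):
    """Subtract the respondent's own mean across all items from each item."""
    person_mean = sum(responses) / len(responses)
    return [value - person_mean for value in responses]


# Invented raw answers from a respondent who agrees with almost everything,
# including pairs of statements that contradict each other.
yea_sayer = [5, 5, 4, 5, 5, 4, 5, 5, 5, 4]

centered = center_within_person(yea_sayer)

# After centering, the scores sum to zero for every respondent, so a general
# tendency to agree no longer inflates all items at once.
print([round(value, 1) for value in centered])
```

The centered scores express each answer relative to the respondent’s own average level of agreement, which removes a uniform yea-saying tendency but cannot repair items that mean different things to different respondents.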

Also the factor structure of the adjusted scores is not anything like the ‘regular’ five-factor structure. Still a mess.



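Component loadings for the adjusted scores (the item labels and part of the matrix are missing; the surviving values are listed row by row in their original order):

-.025 .096 -.117
.190 -.319 .034
.469 .617 -.460
.681 -.005 .050
-.846 .080 -.250
.029 .034 .045
.285 .026 .821
-.246 .555 .274
-.345 -.223 -.345
.043 -.854 -.031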


The second possibility is that things went wrong in the translation of the questionnaire. The same adjectives or statements may mean different things in different countries or languages, which makes them useless as operationalizations of the same underlying construct. It will require a detailed study of the translations to see if anything went wrong. The questionnaires are available at the World Values Survey website. The Dutch questionnaire is good. I looked at a few other languages. The Spanish questionnaire for Ecuador also seems right. “Me veo como alguien que…… es confiable” is quite close to “I see myself as someone who is… generally trusting”. My Spanish is not very good though. Rene Gempp wrote on Twitter that the BFI-10 is a Likert-type scale, but the Spanish translation asks about the frequency, and one of the options, “para nada frecuentemente” is *very* confusing in Spanish.

I am not sure about your fluency in Kinyarwanda, the language spoken in Rwanda, but the backtranslation of the questionnaire into English does not give me much confidence. Apparently, “…wizera muri rusange” is the translation of “is generally trusting”. The backtranslation is “…believe in congregation”.


The third possibility is that personality structure may indeed be different in different countries. This would be the most problematic one.

Data from the 2010 AmericasBarometer Study, conducted by the Latin American Public Opinion Project (LAPOP), support this interpretation. The survey included a different short form of the Big Five, the TIPI, developed by Gosling, Rentfrow, and Swann. A recent study by Weinschenk published in Social Science Quarterly shows that personality scores based on the TIPI are hardly related to turnout in elections in the Americas. This result may be logical in countries where voting is mandatory, such as Brazil. But the more disconcerting methodological problem is that the Big Five are not reliably measured with pairs of statements in most of the countries included in the survey. Here are the correlations between the pairs of items for each of the five dimensions, taken from the supplementary online materials of the Weinschenk paper.


The graphs show that the TIPI items only work well in the US and Canada, the two ‘WEIRD’ countries in the study. In Brazil, to take one example, the correlations are below .10 for extraversion, agreeableness and conscientiousness, and below .25 for emotional stability and openness.
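Checking this in your own data takes only a few lines. The sketch below computes the correlation between the two items of one dimension separately for each country. The column names (`country`, `e1`, `e2`) and the rows are toy values for illustration, not WVS or LAPOP data, and I assume that negatively worded items have already been recoded.

```python
from collections import defaultdict
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long lists (needs variance in both)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pair_correlation_by_country(rows, item_a, item_b):
    """Correlate two items of the same dimension within each country."""
    grouped = defaultdict(lambda: ([], []))
    for row in rows:
        xs, ys = grouped[row["country"]]
        xs.append(row[item_a])
        ys.append(row[item_b])
    return {country: pearson(xs, ys) for country, (xs, ys) in grouped.items()}


# Toy rows: in country A the two items move together, in country B they do not.
rows = [
    {"country": "A", "e1": 1, "e2": 1},
    {"country": "A", "e1": 2, "e2": 2},
    {"country": "A", "e1": 3, "e2": 3},
    {"country": "B", "e1": 1, "e2": 3},
    {"country": "B", "e1": 2, "e2": 2},
    {"country": "B", "e1": 3, "e2": 1},
]
result = pair_correlation_by_country(rows, "e1", "e2")
print(result)
```

A pair of items that measures the same construct should correlate clearly positively in every country. Correlations near zero, or negative ones as in toy country B, are the warning sign discussed here.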

Back to the WVS case, which raises important questions about the peer review process. Two journal articles based on the WVS (here and here) were able to pass peer review because neither the reviewers nor the editors asked questions about the reliability of the items being used. Neither did the authors check, apparently. Obviously, researchers should check the reliability of measures they use in an analysis. In case authors fail to check this, reviewers and editors should ask. Weinschenk reported the low correlations in the online supplementary materials, but did not report reliability coefficients in the paper.
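One simple check that authors and reviewers could use for two-item measures is the Spearman-Brown formula, which translates the correlation between two items into the reliability of the two-item scale. This is my own illustration, not a calculation taken from the Weinschenk paper or the WVS documentation, and the function name is made up.

```python
def spearman_brown(r, k=2):
    """Reliability of a scale of k parallel items with inter-item correlation r."""
    return k * r / (1 + (k - 1) * r)


# Upper bounds on the item correlations mentioned for Brazil.
for r in (0.10, 0.25):
    print(f"r = {r:.2f} -> two-item reliability = {spearman_brown(r):.2f}")
```

With correlations below .10 the implied reliability stays below about .18, and with correlations below .25 it stays below .40, far from the values usually considered acceptable.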

The good thing is that because the WVS is in the public domain, these problems came to light relatively quickly. Of course, they could have been avoided if the WVS had scrutinized the reliability of the measure before putting the data online, if the authors of the papers using the data had checked the reliability of the items or if the reviewers and editors had asked the right questions. Another good thing is that the people (volunteers?) at the WVS Twitter account have been frank in tweeting about the problems found in the data.

Summing up:

  1. We still do not know why the BFI-10 measure of the Big Five personality does not perform as in previous research.
  2. It is probably not due to acquiescence bias. Translations may be problematic for some countries.
  3. Do not use the WVS BFI-10 data from countries other than Germany and the Netherlands.
  4. Treat the WVS data from Bahrain and with great caution, and to be on the safe side, just exclude it from your analyses.
  5. The reliability of short Big Five measures is very low in non-WEIRD countries.

The code for the analyses reported in this blog is posted at the Open Science Framework.

Update 22 March 2017. The factor loadings in the table with the results of the analysis of attenuated scores have been updated. The table displayed previously was based on a division of the original scores by the total agreement scores. Rammstedt et al. subtracted the total agreement scores from the original scores. The results of the new analysis are close to the previous ones and still confusing. The code on the OSF has been updated. Also a clarification was added that the negative items used in the factor analyses were all recoded such that they scored positively (HT to Christopher Soto).

