Category Archives: survey research

Multiple comparisons in a regression framework

Gordon Feld posted a comparison of results from a repeated measures ANOVA with paired samples t-tests.

Using Stata, I wondered how these results would look in a regression framework. For those of you who want to replicate this: I used the data provided by Gordon. The do-file is here. Because WordPress does not accept .do files, you will have to rename the file from .docx to .do to make it work. The Stata commands are below, all in block quotes. The output is given in images. In the explanatory notes, commands are italicized and variables are underlined.

A pdf of this post is here.

First let’s examine the data. You will have to insert the local path where you have stored the data.

. import delimited "ANOVA_blog_data.csv", clear

. pwcorr before_treatment after_treatment before_placebo after_placebo

These commands get us the following table of correlations:

There are some differences in mean values, from 98.8 before treatment to 105.0 after treatment. Mean values for the placebo measures are 100.8 before and 100.2 after. Across all measures, the average is 101.2035.

Let’s replicate the t-test for the treatment effect.

The increase in IQ after the treatment is 6.13144 (SE = 2.134277), which is significant in this one-sample paired t-test (p = .006). Now let’s do the t-test for the placebo conditions.
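For readers without Stata, the arithmetic behind a paired t-test like this can be sketched in plain Python. The before/after scores below are made up for illustration; only the procedure mirrors the test above.

```python
# Paired t-test arithmetic, mirroring what Stata's ttest does,
# on made-up before/after scores (not the blog data).
import math

def paired_t(before, after):
    """Return (mean difference, SE, t) for a paired-samples t-test."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)  # sample variance
    se = math.sqrt(var_d / n)
    return mean_d, se, mean_d / se

before = [98, 101, 95, 104, 99, 97, 103, 100]
after = [104, 103, 101, 110, 102, 99, 108, 105]
mean_d, se, t = paired_t(before, after)
print(f"diff = {mean_d:.3f}, SE = {se:.3f}, t = {t:.2f}")
```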

The decrease in IQ after the placebo is -.6398003 (SE = 1.978064), which is not significant (p = .7477).

The question is whether we have taken sufficient account of the nesting of the data.

We have four measures per participant: one before the treatment, one after, one before the placebo, and one after.

In other words, we have 50 participants and 200 measures.

To get the data into the nested structure, we have to reshape them.

The data are now in a wide format: one row per participant, IQ measures in different columns.

But we want a long format: 4 rows per participant, IQ in just one column.

To get this done we first assign a number to each participant.

. gen id = _n

We now have a variable id with a unique number for each of the 50 participants.
The Stata command for reshaping data requires the data to be set up in such a way that variables measuring the same construct have the same name.
We have 4 measures of IQ, so the new variables will be called iq1, iq2, iq3 and iq4.

. rename (before_treatment after_treatment before_placebo after_placebo) (iq1 iq2 iq3 iq4)

Now we can reshape the data. The command below creates a new variable mIQ that identifies the 4 consecutive measures of IQ.

. reshape long iq, i(id) j(mIQ)

Here’s the result.

We now have 200 lines of data, each one is an observation of IQ, numbered 1 to 4 on the new variable mIQ for each participant. The variable mIQ indicates the order of the IQ measurements.
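The wide-to-long reshape that Stata performs here can be sketched in plain Python. The rows and IQ values below are hypothetical; only the mechanics mirror reshape long iq, i(id) j(mIQ).

```python
# Wide-to-long reshape in plain Python, mimicking Stata's
# `reshape long iq, i(id) j(mIQ)`. Rows and values are hypothetical.
wide_rows = [
    {"id": 1, "iq1": 98, "iq2": 105, "iq3": 100, "iq4": 99},
    {"id": 2, "iq1": 101, "iq2": 104, "iq3": 102, "iq4": 101},
]

def reshape_long(rows, stub="iq", j="mIQ"):
    """Turn one row per participant into one row per measure."""
    long_rows = []
    for row in rows:
        for m in (1, 2, 3, 4):  # the four repeated measures
            long_rows.append({"id": row["id"], j: m, stub: row[f"{stub}{m}"]})
    return long_rows

long_rows = reshape_long(wide_rows)
print(len(long_rows))  # 2 participants x 4 measures = 8 rows
```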

Now we identify the structure of the two experiments. The first two measures in the data are for the treatment pre- and post-measures.

. gen treatment = .

. replace treatment = 1 if mIQ < 3
(100 real changes made)

. replace treatment = 0 if mIQ > 2
(100 real changes made)

Observations 3 and 4 are for the placebo pre- and post-measures.

. gen placebo = .

. replace placebo = 0 if mIQ < 3
(100 real changes made)

. replace placebo = 1 if mIQ > 2
(100 real changes made)

. tab treatment placebo

We have 100 observations in each of the experiments.

OK, we’re ready for the regressions now. Let’s first conduct an OLS regression to quantify the changes within participants in the treatment and placebo conditions.

. reg iq mIQ if treatment == 1

The regression shows that the treatment increased IQ by 6.13144 points, but with an SE of 3.863229 the change is not significant (p = .116). The effect estimate is correct, but the SE is too large and hence the p-value is too high as well.

. reg iq mIQ if placebo == 1


The placebo regression shows the familiar decline of .6398003, but with an SE of 3.6291, which is too high (p = .860). The SE and p-values are incorrect because OLS does not take the nested structure of the data into account.

With the xtset command we identify the nesting of the data: measures of IQ (mIQ) are nested within participants (id).

. xtset id mIQ

First we run an ‘empty model’: no predictors are included.

. xtreg iq

Here’s the result:

Two variables in the output are worth commenting on.

  1. The constant (_cons) is the average across all measures, 101.2033. This is very close to the average we have seen before.
  2. The rho is the intraclass correlation: the average correlation of the 4 IQ measures within individuals. It is .7213, which is high, as the strong correlations between the four measures lead us to expect.
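As a sketch of how rho is formed: xtreg decomposes the variance into a between-participant component (sigma_u) and a within-participant component (sigma_e), and rho is the between share of the total. The sigma values below are invented for illustration, not the output of the model above.

```python
# How xtreg's rho is formed: the between-participant share of the
# total variance. The sigma values here are invented, not taken from
# the model output above.
sigma_u = 12.0  # SD of participant-level effects (between)
sigma_e = 7.0   # SD of measure-level residuals (within)
rho = sigma_u ** 2 / (sigma_u ** 2 + sigma_e ** 2)
print(round(rho, 4))
```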

Now let’s replicate the t-test results in a regression framework.

. xtreg iq mIQ if treatment == 1

In the output below we see the 100 observations in 50 groups (individuals). We obtain the same effect estimate of the treatment as before (6.13144) and the correct SE of 2.134277, but the p-value is too small (p = .004).

Let’s fix this. We put fixed effects on the participants by adding , fe at the end of the xtreg command:

. xtreg iq mIQ if treatment == 1, fe

We now get the accurate p-value (0.006):
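The , fe option corresponds to the within transformation: demean each participant's scores and run OLS on the demeaned data. A toy sketch in Python, with two hypothetical participants:

```python
# The within (fixed-effects) estimator: demean iq and mIQ per
# participant, then regress. Two hypothetical participants, each
# measured at mIQ = 1 and mIQ = 2.
data = [
    (1, 1, 95.0), (1, 2, 103.0),  # (id, mIQ, iq)
    (2, 1, 110.0), (2, 2, 114.0),
]

ids = {i for i, _, _ in data}
means = {
    i: (
        sum(m for j, m, _ in data if j == i) / 2,  # mean mIQ for id i
        sum(y for j, _, y in data if j == i) / 2,  # mean iq for id i
    )
    for i in ids
}
demeaned = [(m - means[i][0], y - means[i][1]) for i, m, y in data]

# OLS slope through the origin on the demeaned data
beta = sum(x * y for x, y in demeaned) / sum(x * x for x, _ in demeaned)
print(beta)
```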

Let’s run the same regression for the placebo conditions.

. xtreg iq mIQ if placebo == 1, fe


The placebo effect is the familiar -.6398003, SE = 1.978064, now with the accurate p-value of .748.


Filed under data, experiments, methodology, regression, regression analysis, statistical analysis, survey research

The force of everyday philanthropy

Public debates on philanthropy link charitable giving to wealth. In the media we hear a lot about the giving behavior of billionaires: about the giving pledge, the charitable foundations of the wealthy, how the causes they support align with their business interests, and how they relate to government programs. Yes, the billions of tech giants go a long way. Imagine a world without support from foundations created by the wealthy. But we hear a lot less about the everyday philanthropy of people like you and me. The media rarely report on everyday acts of generosity. Yet the force of philanthropy lies not only in its focus and mass, but also in its breadth and popularity.

It is one of the common remarks I hear when family, friends and colleagues return from holidays in ‘developing countries’ like Moldova, Myanmar or Morocco: “the people there have nothing, but they are so kind and generous!” The kindness and generosity that we witness as tourists are manifestations of prosociality, the very same spirit that is the ultimate foundation of everyday philanthropy. Within our own nations, too, we find that most people give to charity. Why are people in Europe so strongly engaged in philanthropy?

The answer is trust

In Europe we are much more likely than people in other parts of the world to think that most people can be trusted. It is this faith in humanity that is crucial for philanthropy. We can see this in a comparison of countries within Europe. The figure combines data from the World Giving Index reports of CAF from 2010-2017 on the proportion of the population giving to charity with data from the Global Trust Research Consortium on generalized social trust. The figure shows that citizens of more trusting countries in Europe are much more likely to give to charities (you can get the data here, and the code is here). The correlation is .52, which is strong.

Trust_Giving_EU
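The correlation between country-level trust and giving is an ordinary Pearson r, which can be sketched in a few lines of Python. The five country values below are invented, not the CAF/GTRC data.

```python
# Pearson correlation between country-level trust and giving.
# The five country values are invented, not the CAF/GTRC data.
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

trust = [0.68, 0.60, 0.45, 0.30, 0.25]   # share saying most people can be trusted
giving = [0.80, 0.74, 0.55, 0.45, 0.50]  # share giving to charity
print(round(pearson_r(trust, giving), 2))
```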

Egalité et fraternité

One of the reasons why citizens in more trusting countries are more likely to give to charity is that trust is lower in more unequal countries. Combining the data on trust with data from the OECD on income inequality (GINI) reveals a substantial negative correlation of -.37. The larger the differences in income and wealth in a country become, the lower the level of trust that people have in each other. As the wealth of the rich increases, the poor get increasingly envious, and the rich feel an increasing urge to protect their wealth. In such a context, conspiracy theories thrive and institutions that should be impartial and fair to all are trusted less. The criticism that wealthy donors face also stems from this foundation: those concerned with equality and fairness fear the elite power of philanthropy. Et voila: here is the case why it is in the best interest of foundations to reduce inequality.


Filed under data, Europe, household giving, philanthropy, survey research, trust, wealth

What is normal?

Does the average Dutch person really give 559 euros per year to charitable causes, as Arnon Grunberg wrote yesterday on the front page of the Volkskrant?

No, that is unlikely. Grunberg referred to a figure mentioned in the HUMAN television program ‘Hoe normaal ben jij?’ (‘How normal are you?’).

The figure is wrong for two reasons.

1. The amount is much higher than other research on philanthropy has found. Human’s figure comes from a survey that is probably not representative of the Dutch population. Human provides no information about the poll that was conducted, but it is most likely a convenience sample: anyone can participate on the site, and those who do are almost never representative of the Dutch population.

The benchmark study of philanthropy, Geven in Nederland (GIN, ‘Giving in the Netherlands’), has been conducted by the Vrije Universiteit Amsterdam since 1995. It provides a transparent and representative picture. According to the latest edition of the GIN study, from 2017, households give 341 euros on average.

2. The figure is an average, and the average is not ‘normal’. The arithmetic mean across all Dutch households does not show what the typical Dutch household gives: half of all Dutch households give less than 60 euros, according to GIN. The mean is strongly inflated by a small number of households that give very large amounts. You can use the graph to see how normal you are: if you give between €150 and €200 per year, you belong to the third quartile, the group of roughly a quarter of the population that gives more than half of the Dutch do. The top quarter of givers often give more than €1,000.
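The gap between mean and median is easy to reproduce: a few very large gifts pull the mean far above the typical amount. A minimal Python sketch with invented donation amounts:

```python
# Why the mean misleads here: a few very large gifts pull it far
# above what the typical household gives. Amounts are invented.
from statistics import mean, median

gifts = [0, 10, 25, 40, 50, 60, 150, 300, 500, 5000]
print(mean(gifts), median(gifts))  # mean far above the median
```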

GIN17_kwartielen


Filed under Center for Philanthropic Studies, household giving, survey research, Uncategorized

Research internship @VU Amsterdam

Social influences on prosocial behaviors and their consequences

While self-interest and prosocial behavior are often pitted against each other, it is clear that much charitable giving and volunteering for good causes is motivated by non-altruistic concerns (Bekkers & Wiepking, 2011). Helping others by giving and volunteering feels good (Dunn, Aknin & Norton, 2008). What is the contribution of such helping behaviors to happiness?

The effect of helping behavior on happiness is easily overestimated using cross-sectional data (Aknin et al., 2013). Experiments provide the best way to eliminate selection bias in causal estimates. Monozygotic twins provide a nice natural experiment to investigate unique environmental influences on prosocial behavior and its consequences for happiness, health, and trust: any differences within twin pairs cannot be due to additive genetic effects or shared environmental effects. Previous research has investigated environmental influences of education and religion on giving and volunteering (Bekkers, Posthuma and Van Lange, 2017), but no study has investigated the effects of helping behavior on important outcomes such as trust, health, and happiness.

The Midlife in the United States (MIDUS) and the German Twinlife surveys provide rich datasets including measures of health, life satisfaction, and social integration, in addition to demographic and socioeconomic characteristics and measures of helping behavior through nonprofit organizations (giving and volunteering) and in informal social relationships (providing financial and practical assistance to friends and family).

In the absence of natural experiments, longitudinal panel data are required to ascertain the chronology in acts of giving and their correlates. The same holds for the alleged effects of volunteering on trust (Van Ingen & Bekkers, 2015) and health (De Wit, Bekkers, Karamat Ali, & Verkaik, 2015). Since the mid-1990s, a growing number of panel studies have collected data on volunteering and charitable giving and their alleged consequences, such as the German Socio-Economic Panel (GSOEP), the British Household Panel Survey (BHPS) / Understanding Society, the Swiss Household Panel (SHP), the Household, Income, Labour Dynamics in Australia survey (HILDA), the General Social Survey (GSS) in the US, and in the Netherlands the Longitudinal Internet Studies for the Social sciences (LISS) and the Giving in the Netherlands Panel Survey (GINPS).

Under my supervision, students can write a paper on social influences of education, religion and/or helping behavior in the form of volunteering, giving, and informal financial and social support on outcomes such as health, life satisfaction, and trust, using either longitudinal panel survey data or data on twins. Students who are interested in writing such a paper are invited to present their research questions and research design via e-mail to r.bekkers@vu.nl.

René Bekkers, Center for Philanthropic Studies, Faculty of Social Sciences, Vrije Universiteit Amsterdam

References

Aknin, L. B., Barrington-Leigh, C. P., Dunn, E. W., Helliwell, J. F., Burns, J., Biswas-Diener, R., … Norton, M. I. (2013). Prosocial spending and well-being: Cross-cultural evidence for a psychological universal. Journal of Personality and Social Psychology, 104(4), 635–652. https://doi.org/10.1037/a0031578

Bekkers, R., Posthuma, D. & Van Lange, P.A.M. (2017). The Pursuit of Differences in Prosociality Among Identical Twins: Religion Matters, Education Does Not. https://osf.io/ujhpm/ 

Bekkers, R., & Wiepking, P. (2011). A Literature Review of Empirical Studies of Philanthropy: Eight Mechanisms That Drive Charitable Giving. Nonprofit and Voluntary Sector Quarterly, 40: https://doi.org/10.1177/0899764010380927

De Wit, A., Bekkers, R., Karamat Ali, D., & Verkaik, D. (2015). Welfare impacts of participation. Deliverable 3.3 of the project: “Impact of the Third Sector as Social Innovation” (ITSSOIN), European Commission – 7th Framework Programme, Brussels: European Commission, DG Research. http://itssoin.eu/site/wp-content/uploads/2015/09/ITSSOIN_D3_3_The-Impact-of-Participation.pdf

Dunn, E. W., Aknin, L. B., & Norton, M. I. (2008). Spending Money on Others Promotes Happiness. Science, 319(5870): 1687–1688. https://doi.org/10.1126/science.1150952

Van Ingen, E. & Bekkers, R. (2015). Trust Through Civic Engagement? Evidence From Five National Panel Studies. Political Psychology, 36 (3): 277-294. https://renebekkers.files.wordpress.com/2015/05/vaningen_bekkers_15.pdf


Filed under altruism, Center for Philanthropic Studies, data, experiments, happiness, helping, household giving, Netherlands, philanthropy, psychology, regression analysis, survey research, trust, volunteering

Twenty Years of Generosity in the Netherlands

Paper – ARNOVA 2017 Presentation – Materials at Open Science Framework

In the past two decades, philanthropy in the Netherlands has gained significant attention from the general public, policy makers, and academics. Research on philanthropy in the Netherlands has documented a substantial increase in amounts donated to charitable causes since data on giving in the Netherlands became available in the mid-1990s (Bekkers, Gouwenberg & Schuyt, 2017). What has remained unclear, however, is how philanthropy has developed in relation to the growth of the economy at large and the growth of consumer expenditure. For the first time, we bring together all the data on philanthropy available from eleven editions of the Giving in the Netherlands survey among households (n = 16,344) to answer the research question: how can trends in generosity in the Netherlands in the past 20 years be explained?

 

The Giving in the Netherlands Panel Survey

One of the strengths of the GINPS is the availability of data on prosocial values and attitudes towards charitable causes. In 2002, the Giving in the Netherlands survey among households was transformed from a cross-sectional to a longitudinal design (Bekkers, Boonstoppel & De Wit, 2017). The GIN Panel Survey has been used primarily to answer questions on the development of these values and attitudes in relation to changes in volunteering activities (Bekkers, 2012; Van Ingen & Bekkers, 2015; Bowman & Bekkers, 2009). Here we use the GINPS in a different way. First we describe trends in generosity, i.e. amounts donated as a proportion of income. Then we seek to explain these trends, focusing on prosocial values and attitudes towards charitable causes.

 

How generous are the Dutch?

Vis-à-vis the rich history of charity and philanthropy in the Netherlands (Van Leeuwen, 2012), the current state of giving is rather poor. On average, charitable donations per household in 2015 amounted to €180 per year, or 0.4% of household income. The median gift is €50 (De Wit & Bekkers, 2017). In the past fifteen years, the trend in generosity has been downward: giving as a proportion of income has declined slowly but steadily since 1999. By 2015, it had declined by one-fifth from its 1999 peak (Bekkers, De Wit & Wiepking, 2017; see Figure 1).

GIV_CEX

Figure 1: Household giving as a proportion of consumer expenditure (Source: Bekkers, De Wit & Wiepking, 2017)

 

Why has generosity of households in the Netherlands declined?

The first explanation is declining religiosity. Because giving is encouraged by religious communities, the decline of church affiliation and practice has reduced charitable giving, as in the US (Wilhelm, Rooney & Tempel, 2007): as the non-religious have become more numerous, total giving has fallen. The decline in religiosity explains about 40% of the decline in generosity we observe in the period 2001-2015. In Figure 2 we see a similar decline in generosity to religion (the red line) as to other organizations (the blue line).

REL_NREL

Figure 2: Household giving to religion (red) and to other causes (blue) as a proportion of household income (Source: Bekkers, De Wit & Wiepking, 2017)

 

We also find that those who are still religious have become much more generous. Figure 3 shows that the amounts donated by Protestants (the green line) have almost doubled in the past 20 years. The amounts donated by Catholics (the red line) have also doubled, but are much lower. The non-religious have not increased their giving at all in the past 20 years. However, the increasing generosity of the religious has not been able to turn the tide.

REL_DEN

Figure 3: Household giving by non-religious (blue), Catholics (red) and Protestants (green) in Euros (Source: Bekkers, De Wit & Wiepking, 2017)

The second explanation is that prosocial values have declined. Because generosity depends on empathic concern and moral values such as the principle of care (Bekkers & Ottoni-Wilhelm, 2016), the loss of such prosocial values has reduced generosity. Prosocial values have lost support, and the loss of prosociality explains about 15% of the decline in generosity. The loss of prosocial values itself, however, is closely connected to the disappearance of religion. About two thirds of the decline in empathic concern and three quarters of the decline in altruistic values is explained by the reduction of religiosity.

In addition, we see that prosocial values have also declined among the religious. Figure 4 shows that altruistic values have declined not only for the non-religious (blue), but also for Catholics (red) and Protestants (green).

REL_AV

Figure 4: Altruistic values among the non-religious (blue), Catholics (red) and Protestants (green) (Source: Giving in the Netherlands Panel Survey, 2002-2014).

Figure 5 shows a similar development for generalized social trust.

REL_TRUST

Figure 5: Generalized social trust among the non-religious (blue), Catholics (red) and Protestants (green)  (Source: Giving in the Netherlands Panel Survey, 2002-2016).

Speaking of trust: as donations to charitable causes rely on a foundation of charitable confidence, it may be argued that the decline of charitable confidence is responsible for the decline in generosity (O’Neill, 2009). However, we find that the decline in generosity is not strongly related to the decline in charitable confidence, once changes in religiosity and prosocial values are taken into account. This finding indicates that the decline in charitable confidence is a sign of a broader process of declining prosociality.

 

What do our findings imply?

What do these findings mean for theories and research on philanthropy and for the practice of fundraising?

First, our research clearly demonstrates the utility of including questions on prosocial values in surveys on philanthropy: they have predictive power not only for generosity and changes therein over time, but also help explain the relation between religiosity and generosity.

Second, our findings illustrate the need to develop distinctive theories on generosity. Predictors of levels of giving measured in euros can be quite different from predictors of generosity as a proportion of income.

For the practice of fundraising, our research suggests that the strategies and propositions of charitable causes need modification. Traditionally, fundraising organizations have appealed to empathic concern for recipients and prosocial values such as duty. As these have become less prevalent, propositions appealing to social impact with modest returns on investment may prove more effective.

Fundraising campaigns in the past have also been targeted primarily at loyal donors. This strategy has proven effective, and religious donors have shown resilience in their increasing financial commitment to charitable causes. But it is not a feasible long-term strategy, as the size of this group is shrinking. A new strategy is required to engage new generations of donors.

 

 

References

Bekkers, R. (2012). Trust and Volunteering: Selection or Causation? Evidence from a Four Year Panel Study. Political Behavior, 32 (2): 225-247.

Bekkers, R., Boonstoppel, E. & De Wit, A. (2017). Giving in the Netherlands Panel Survey – User Manual, Version 2.6. Center for Philanthropic Studies, VU Amsterdam.

Bekkers, R. & Bowman, W. (2009). The Relationship Between Confidence in Charitable Organizations and Volunteering Revisited. Nonprofit and Voluntary Sector Quarterly, 38 (5): 884-897.

Bekkers, R., De Wit, A. & Wiepking, P. (2017). Jubileumspecial: Twintig jaar Geven in Nederland. In: Bekkers, R. Schuyt, T.N.M., & Gouwenberg, B.M. (Eds.). Geven in Nederland 2017: Giften, Sponsoring, Legaten en Vrijwilligerswerk. Amsterdam: Lenthe Publishers.

Bekkers, R. & Ottoni-Wilhelm, M. (2016). Principle of Care and Giving to Help People in Need. European Journal of Personality, 30(3): 240-257.

Bekkers, R., Schuyt, T.N.M., & Gouwenberg, B.M. (Eds.) (2017). Geven in Nederland 2017: Giften, Sponsoring, Legaten en Vrijwilligerswerk. Amsterdam: Lenthe Publishers.

De Wit, A. & Bekkers, R. (2017). Geven door huishoudens. In: Bekkers, R., Schuyt, T.N.M., & Gouwenberg, B.M. (Eds.). Geven in Nederland 2017: Giften, Sponsoring, Legaten en Vrijwilligerswerk. Amsterdam: Lenthe Publishers.

O’Neill, M. (2009). Public Confidence in Charitable Nonprofits. Nonprofit and Voluntary Sector Quarterly, 38: 237–269.

Van Ingen, E. & Bekkers, R. (2015). Trust Through Civic Engagement? Evidence From Five National Panel Studies. Political Psychology, 36 (3): 277-294.

Wilhelm, M.O., Rooney, P.M. and Tempel, E.R. (2007). Changes in religious giving reflect changes in involvement: age and cohort effects in religious giving, secular giving, and attendance. Journal for the Scientific Study of Religion, 46 (2): 217–32.

Van Leeuwen, M. (2012). Giving in early modern history: philanthropy in Amsterdam in the Golden Age. Continuity & Change, 27(2): 301-343.


Filed under Center for Philanthropic Studies, data, household giving, Netherlands, survey research, trends

Hunting Game: Targeting the Big Five

Do not use the personality items included in the World Values Survey. That is the recommendation of Steven Ludeke and Erik Gahner Larsen in a recent paper published in the journal Personality and Individual Differences. The journal is owned by Elsevier, so the official publication is paywalled. Still, I am writing about it because the message of the paper is extremely important. Ludeke and Gahner Larsen formulate their recommendation a little more subtly: “we suggest it is thus hard to justify the use of this data in future research.”

What went wrong here? Join me in a hunting game, targeting the Big Five.

The World Values Survey (WVS) is the largest non-commercial survey in the world and is frequently used in social science research. The most recent edition contained a short, 10-item measure of personality characteristics (BFI-10), validated in a well-cited paper by Rammstedt and John in the Journal of Research in Personality. The inclusion of the BFI-10 enables researchers to study how the Big Five personality traits are related to political participation, happiness, education, and health, among many other things.

So what is wrong with the personality data in the WVS? Ludeke and Gahner Larsen found that the pairs of adjectives designed to measure the five personality traits (Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism) are not correlated as expected. To measure openness, for instance, the survey asked participants to indicate agreement with the statements “I see myself as someone who: has few artistic interests” and “I see myself as someone who: has an active imagination”. One would expect a negative relation between the responses to the two statements. However, the correlation between the two items across all countries is positive, r = .164. This correlation is not strong, but it is in the wrong direction. Similar discrepancies were found between items designed to measure the four other dimensions of personality.

The BFI-10 included in the WVS is this set of statements (an r indicates a reverse-scored item):

I see myself as someone who:

  • is reserved (E1r)
  • is generally trusting (A1)
  • tends to be lazy (C1r)
  • is relaxed, handles stress well (N1r)
  • has few artistic interests (O1r)
  • is outgoing, sociable (E2)
  • tends to find fault with others (A2r)
  • does a thorough job (C2)
  • gets nervous easily (N2)
  • has an active imagination (O2)
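Before any analysis, the reverse-keyed items (marked r above) are recoded so that high scores always indicate more of the trait. A minimal sketch, assuming the 5-point response scale of the BFI-10; the responses below are invented:

```python
# Reverse-scoring the items marked (r), assuming the BFI-10's
# 5-point agreement scale: a reversed score r becomes 6 - r, so that
# high scores always mean more of the trait. Responses are invented.
def reverse(score, scale_max=5):
    return scale_max + 1 - score

responses = {"E1r": 2, "A1": 4, "C1r": 1}  # item -> raw answer
recoded = {k: reverse(v) if k.endswith("r") else v for k, v in responses.items()}
print(recoded)
```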

In a factor analysis of the 10 items, we would expect to find the five dimensions. However, that is not the result of an exploratory factor analysis applying the conventional criterion of an eigenvalue > 1. In this analysis and all following analyses, negative items are reverse scored. Including all countries, a three-factor solution emerges that is very difficult to interpret. Multiple items show high loadings on multiple factors. Removing these one by one, as is usually done in inventories with large numbers of items, we are left with a two-factor solution. If a five-factor solution is forced, we obtain the following component matrix. This is a mess.

Component                     1      2      3      4      5
O1 not artistic (r)        -.116  -.054   .105  -.049   .961
O2 active imagination       .687   .162  -.031   .197  -.140
C1 lazy (r)                 .249  -.004   .836  -.045   .159
C2 thorough                 .640   .425   .231   .078   .071
E1 reserved (r)            -.110  -.825  -.022  -.183  -.047
E2 outgoing                 .781   .097  -.004  -.105  -.068
A1 trusting                 .210   .722   .003  -.160  -.137
A2 fault with others (r)   -.430   .079   .614  -.259  -.051
N1 relaxed (r)             -.461  -.377   .235   .534   .144
N2 nervous                  .188   .133  -.291   .770  -.112

So what is wrong with these data?

Upon closer inspection, Ludeke and Gahner Larsen found that the correlations were markedly different across countries. Bahrain is a clear outlier. The weakly positive correlation between the two openness items is due in part to the inclusion of data from Bahrain. Without this country, the correlation is only .135: still positive, but not as strong. The data for Bahrain are not only strange for openness, but also for the other factors. In the table below I have computed the correlations among the recoded item pairs for each of the five dimensions.

Without Bahrain, the correlations are still strange, but a little less strange.

                    O      C      E      A      N
With Bahrain     -.164   .238  -.207  -.036   .008
Without Bahrain  -.135   .275  -.181  -.009   .044

What is wrong with the data for Bahrain? The patterns of responses for cases from Bahrain, it turns out, are surprisingly often a series of ten identical values, such as 1111111111 or 5555555555. I routinely check data from surveys for such patterns. While it is impossible to prove, serial response patterns suggest fabrication of data. Participants and/or interviewers skipping questions may also produce such patterns. Almost half of all the cases from Bahrain follow such a pattern. Other countries with a relatively high proportion of serial pattern responses are South Africa, Singapore, and China. The two countries for which the BFI-10 behaves close to what previous research has reported, the Netherlands and Germany, have a very low occurrence of serial pattern responses.
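A check for such straight-line responding can be sketched in a few lines of Python; the flagging rule here (all ten answers identical) is my reading of the pattern described above, and the example cases are invented:

```python
# Flagging straight-line responses: cases whose ten BFI answers are
# all identical. The flagging rule and example cases are my own
# illustration of the check described in the text.
def is_serial(responses):
    """True if every answer in the list is the same value."""
    return len(set(responses)) == 1

cases = [
    [1] * 10,                        # 1111111111 -> flagged
    [5] * 10,                        # 5555555555 -> flagged
    [3, 4, 2, 5, 3, 4, 2, 1, 3, 4],  # varied -> not flagged
]
share = sum(is_serial(c) for c in cases) / len(cases)
print(f"{share:.0%} of cases flagged")
```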

               Number of serial pattern responses       %
Bahrain                       598                   49.83%
South Africa                  250                    7.08%
Singapore                     108                    5.48%
China                          52                    2.26%
Netherlands                     8                    0.42%
Germany                         2                    0.10%

Even without the data for Bahrain and the serial responses from all other countries, however, the factor structure is err…not what one would expect. Still a mess.

Component                     1      2      3      4      5
O1 not artistic (r)        -.094  -.040   .086  -.031   .968
O2 active imagination       .691   .150  -.046   .158  -.130
C1 lazy (r)                 .297   .023   .815  -.017   .146
C2 thorough                 .637   .410   .241   .050   .088
E1 reserved (r)            -.098  -.828  -.033  -.158  -.058
E2 outgoing                 .771   .070  -.001  -.140  -.052
A1 trusting                 .192   .710   .022  -.190  -.133
A2 fault with others (r)   -.405   .080   .628  -.230  -.048
N1 relaxed (r)             -.421  -.352   .218   .592   .123
N2 nervous                  .192   .133  -.315   .750  -.104

Only for Germany and the Netherlands is the factor structure somewhat in line with previous research. Here is the solution for the two countries combined. In both countries, the two statements for agreeableness do not correlate as expected. Also, the second statement for conscientiousness (thorough) has a cross-loading on one of the agreeableness items (trusting).

Component                     1      2      3      4      5
O1 not artistic (r)        -.047  -.056   .842   .120  -.089
O2 active imagination       .208   .050   .729  -.140   .173
C1 lazy (r)                 .061  -.083  -.040   .865  -.087
C2 thorough                -.064   .053   .057   .627   .440
E1 reserved (r)             .715  -.113   .130   .032  -.219
E2 outgoing                 .732  -.166   .126   .166   .210
A1 trusting                -.008  -.100   .042   .049   .853
A2 fault with others (r)   -.657  -.272   .090   .177  -.001
N1 relaxed (r)              .012   .804  -.002   .116  -.259
N2 nervous                 -.052   .835  -.006  -.160   .117

This leaves us with three possibilities.

One possibility was raised by Christopher Soto on Twitter: acquiescence bias could be driving the results. In a study using data from another multi-country survey, the International Social Survey Programme (ISSP), Rammstedt, Kemper & Borg subtracted each respondent’s mean response across all BFI-10 items from his or her score on each single item. Doing this, however, does not clear the sky. Looking again at the correlations for the pairs of items measuring the same constructs, we see that the adjusted correlations in the second row are not ‘better’. On the contrary, they are less positive.
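The adjustment amounts to within-person centering: subtract a respondent's mean over all ten items from each item, removing a uniform agree-with-everything tendency. A minimal Python sketch with an invented yea-sayer:

```python
# Within-person centering, as in the acquiescence adjustment:
# subtract each respondent's mean across all ten items from every
# single item. The raw answers below are an invented yea-sayer.
def within_person_center(scores):
    m = sum(scores) / len(scores)
    return [s - m for s in scores]

raw = [4, 5, 4, 5, 4, 5, 4, 5, 4, 5]
adjusted = within_person_center(raw)
print(sum(adjusted))  # centered scores sum to (numerically) zero
```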

               O      C      E      A      N
Unadjusted   -.122   .286  -.166   .001   .053
Attenuated   -.310   .078  -.235  -.107   .049

The factor structure of the attenuated scores is not anything like the ‘regular’ five-factor structure either. Still a mess.

Component     1      2      3      4      5
O1a        -.192  -.025   .096  -.117  -.957
O2a         .509   .190  -.319   .034   .269
C1a        -.133   .469   .617  -.460   .174
C2a         .351   .681  -.005   .050   .071
E1a        -.043  -.846   .080  -.250   .017
E2a         .823   .029   .034   .045   .114
A1a         .086   .285   .026   .821   .148
A2a        -.497  -.246   .555   .274   .067
N1a        -.598  -.345  -.223  -.345  -.047
N2a        -.123   .043  -.854  -.031   .178

The second possibility is that things went wrong in the translation of the questionnaire. The same adjectives or statements may mean different things in different countries or languages, which makes them useless as operationalizations of the same underlying construct. It will require a detailed study of the translations to see where things went wrong. The questionnaires are available at the World Values Survey website. The Dutch questionnaire is good. I looked at a few other languages. The Spanish questionnaire for Ecuador also seems right: “Me veo como alguien que…… es confiable” is quite close to “I see myself as someone who is… generally trusting”. My Spanish is not very good, though. Rene Gempp wrote on Twitter that the BFI-10 is a Likert-type scale, but the Spanish translation asks about frequency, and one of the options, “para nada frecuentemente”, is *very* confusing in Spanish.

I am not sure about your fluency in Kinyarwanda, the language spoken in Rwanda, but the back-translation of the questionnaire into English does not give me much confidence. Apparently, “…wizera muri rusange” is the translation of “is generally trusting”. The back-translation is “…believe in congregation”.

[Figure: English back-translation of the Rwandan questionnaire]

The third possibility is that personality structure may indeed be different in different countries. This would be the most problematic one.

Data from the 2010 AmericasBarometer Study, conducted by the Latin American Public Opinion Project (LAPOP), support this interpretation. The survey included a different short form of the Big Five, the TIPI, developed by Gosling, Rentfrow, and Swann. A recent study by Weinschenk published in Social Science Quarterly shows that personality scores based on the TIPI are hardly related to turnout in elections in the Americas. This result may be logical in countries where voting is mandatory, such as Brazil. But the more disconcerting methodological problem is that the Big Five are not reliably measured with pairs of statements in most of the countries included in the survey. Here are the correlations between the pairs of items for each of the five dimensions, taken from the supplementary online materials of the Weinschenk paper.

[Figure: correlations between TIPI item pairs per Big Five dimension, by country (LAPOP 2010)]

The graphs show that the TIPI items only work well in the US and Canada – the two ‘WEIRD’ countries in the study. In Brazil, to take one example, the correlations are below .10 for extraversion, agreeableness, and conscientiousness, and below .25 for emotional stability and openness.

Back to the WVS case, which raises important questions about the peer review process. Two journal articles based on the WVS data (here and here) passed peer review because neither the reviewers nor the editors asked questions about the reliability of the items used. Apparently, the authors did not check either. Obviously, researchers should check the reliability of the measures they use in an analysis; when authors fail to do so, reviewers and editors should ask. Weinschenk reported the low correlations in the online supplementary materials, but did not report reliability coefficients in the paper.
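Such a check is cheap for a two-item scale: the inter-item correlation, stepped up with the Spearman-Brown formula, gives the scale reliability. A quick sketch in Python (the correlations below are illustrative values of the order reported for Brazil, not exact figures from the paper):

```python
def spearman_brown(r, k=2):
    """Reliability of a k-item scale given the typical inter-item
    correlation r (Spearman-Brown prophecy formula)."""
    return k * r / (1 + (k - 1) * r)

# Illustrative inter-item correlations for two-item Big Five scales
for r in (0.05, 0.10, 0.25):
    print(f"r = {r:.2f}  ->  two-item reliability = {spearman_brown(r):.2f}")
```

Even an inter-item correlation of .25 yields a two-item reliability of only .40, far below the conventional .70 threshold.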

The good thing is that because the WVS is in the public domain, these problems came to light relatively quickly. Of course, they could have been avoided if the WVS had scrutinized the reliability of the measure before putting the data online, if the authors of the papers using the data had checked the reliability of the items, or if the reviewers and editors had asked the right questions. Another good thing is that the people (volunteers?) behind the WVS twitter account have been frank in tweeting about the problems found in the data.

Summing up:

  1. We still do not know why the BFI-10 measure of Big Five personality does not perform as in previous research.
  2. It is probably not due to acquiescence bias. Translations may be problematic for some countries.
  3. Do not use the WVS BFI-10 data from countries other than Germany and the Netherlands.
  4. Treat the WVS data from Bahrain with great caution, and to be on the safe side, just exclude them from your analyses.
  5. The reliability of short Big Five measures is very low in non-WEIRD countries.

The code for the analyses reported in this blog is posted at the Open Science Framework.

Update 22 March 2017. The factor loadings in the table with the results of the analysis of attenuated scores have been updated. The table displayed previously was based on a division of the original scores by the total agreement scores; Rammstedt et al. subtracted the original scores from the total agreement scores. The results of the new analysis are close to the previous ones and still confusing. The code on the OSF has been updated. Also, a clarification was added that the negative items used in the factor analyses were all recoded such that they score positively (HT to Christopher Soto).


Filed under personality, survey research

Five Reasons Why Social Science is So Hard 

1. No Laws

All we have is probabilities.

2. All Experts

The knowledge we have is continuously contested. The objects of study think they know why they do what they do.

3. Zillions of Variables

Everything is connected, and potentially a cause – like a bowl of well-tossed spaghetti.

4. Many Levels of Action

Nations, organizations, networks, individuals, and time all have different dynamics.

5. Imprecise Measures

Few instruments have near-perfect validity and reliability.

Conclusion

Social science is not as easy as rocket science. It is way more complicated.


Filed under survey research