University of Southern California Dissertations and Theses
Beyond revealed preferences: how gender, relative socioeconomic status, and social norms drive happiness and behavior
(USC Thesis Other)
1 Introduction

Women report higher life satisfaction¹ than men; much higher. In my data, all else equal, just being a woman increases life satisfaction reports more than moving more than a decile higher in the income distribution.² Where other studies have failed to explain the gender happiness gap well, this paper explains it entirely, and even reverses it by about 50%. Using anchoring vignettes as well as self-reports from the Gallup World Poll, I show that the gap is due to men and women using different response scales. Specifically, women use lower thresholds for each life satisfaction rating, so that for the same “objective” life satisfaction level, they give a higher rating. This has large implications for policy, and for life satisfaction research as a whole.

Interest in measures of subjective well-being is increasing. In 2009, Joseph Stiglitz, Amartya Sen, and Jean-Paul Fitoussi published the Report by the Commission on the Measurement of Economic Performance and Social Progress (Stiglitz et al., 2009). It found that GDP alone was a poor measure of human progress and welfare, and suggested incorporating other relevant measures – sustainability and measures of well-being, including life satisfaction. Layard (2005) proposes the utilitarian ideal of getting the greatest happiness for the greatest number of people as the “ultimate” policy goal, rejecting GDP as an inadequate measure of welfare, and Helliwell et al. (2015) promote evaluating policies in terms of their impact on happiness, such as having a “critical level of extra happiness” per dollar to justify expenditures.

In order to use life satisfaction as a central indicator of social progress, we must be confident that it is measuring what it purports to; thus methodological studies are needed, to validate the use of life satisfaction surveys and to find where they yield systematically biased results.
(This line of research should include qualitative studies as well, to identify inconsistencies between what subjects are reporting and how academics are interpreting it.)

At the same time, women’s rights and well-being are central policy concerns, for international organizations (e.g. the U.N. Entity for Gender Equality and the Empowerment of Women) as well as for governments and other agencies around the globe (e.g. the White House Council on Women and Girls). These agencies attempt to improve life for women, who are often granted fewer rights, get worse representation, experience discrimination, and are more frequent victims of violence (UN Women, 2015).

In life satisfaction studies, research consistently finds that education, income, and health are strong predictors of happiness; women tend to be less educated, have lower incomes, and have worse self-reported health. Yet, on average women report higher life satisfaction than men. Although this pattern does not exist everywhere, it is true in much of the world, and of the world on average, as we will see later. Why do women report greater happiness when they are objectively worse off in so many ways?

In this paper, I reveal gender differences in life satisfaction as one area where we can now see that our international survey responses have been biased. When women and men use the same scale, women are less happy on average (although, as we will see, the marginal effect of being female is much smaller but still positive). And the method described here to show this bias, the use of anchoring vignettes, can be used similarly to root out other biases, to adjust self-reports to give them more validity, and thus make them a better basis for policy decisions.

¹ Though there are some differences, I will follow much of the literature and use “life satisfaction,” “subjective well-being,” and “happiness” interchangeably; in all cases, I am referring to satisfaction with one’s life as a whole.
² In the U.S., that corresponds to an increase in annual per-capita income of between $5,000 and $10,000.

Anchoring vignettes have been used to study different response scales in many applications, particularly health, but also political efficacy, affect, corruption, job satisfaction, life satisfaction, and more. By asking respondents to rate hypothetical vignette characters on the same scale as their self-reports, we “anchor” their scales, making their responses comparable across heterogeneous groups.

Although I focus here on gender, which has a very different effect in an unadjusted than in a vignette-adjusted model, another contribution of this paper is showing how consistent the effects of other variables are in both cases. Education, marital status, employment status, urban/rural, self-reported health, and income remain significant determinants of life satisfaction, with coefficients and marginal effects of similar magnitudes, whether I use a vignette-adjusted model or not.

1.1 The gender happiness gap and anchoring vignettes in the literature

Women are frequently found to report higher happiness levels than men (Helliwell et al., 2015), despite having higher incidence of disorders such as depression and anxiety (Nolen-Hoeksema and Rusting, 1999). Often, the finding is noted in a single line in the text or coefficient on a table meant to examine some other variable’s impact. The gap has been well documented, particularly in developed countries, using several data sources such as past rounds of the Gallup World Poll, the World Values Survey, and others (Graham and Chattopadhyay, 2012; Zweig, 2014; Stevenson and Wolfers, 2009).

Few studies have attempted to explain the gap; among those that have, evidence is mixed. Similar to this paper, Arrossa and Gandelman (2016) use several major data sources and find that the marginal effect of being female is much larger than it appears, given observed characteristics.
Depending on the study, the gap is increasing with GDP (Graham and Chattopadhyay, 2012) or decreasing with GDP (Lima, 2011); decreasing over time (Stevenson and Wolfers, 2009) or not changing (Herbst, 2011). Some find trends in the gap according to religion, whether the country has a Communist past, and occupational patterns (Meisenberg and Woodley, 2015), or with age (Plagnol and Easterlin, 2008; Easterlin, 2003); others find no patterns at all (Zweig, 2014).

Some studies have found evidence that life satisfaction declines among women following improvements in gender equality (Graham and Chattopadhyay, 2012; Stevenson and Wolfers, 2009). Stevenson and Wolfers (2009) propose that socioeconomic changes, changes in what the measures capture due to large social shifts (e.g. women previously considered happiness with their home lives, and now consider happiness with their home and professional lives), or changes in reference group (e.g. women may be unsatisfied with a wonderful home if they feel they have failed professionally) could explain the decline. In a similar vein, other authors have proposed that life satisfaction self-reports capture more than just true life satisfaction; they incorporate things like aspirations (Zweig, 2014; Plagnol and Easterlin, 2008) and optimism (Arrossa and Gandelman, 2016).

Considering different domains of life, having different reference groups, different aspirations, or different levels of optimism could all lead to men and women using different response scales when answering life satisfaction self-reports. By using an anchoring vignette-adjusted model, I can identify scale differences, and estimate to what extent they are driven by gender.

Anchoring vignettes allow researchers to consider not only an individual’s self-report, but also the scale she uses to report it, by asking her to rate hypothetical vignette subjects on the same scale as her self-report.
By using the same vignettes for many people, we can see how rating scales systematically differ with individual characteristics such as gender, education level, or country.

Anchoring vignettes are commonly used when self-reports are subjective, and an objective measure is unavailable or would be too difficult to obtain (King et al., 2004). Kapteyn et al. (2007) used anchoring vignettes about work disability, an objective measure of which is difficult or impossible to obtain, and found that people in the Netherlands were more likely to say someone was too disabled to work than people in the United States, given the same objective description of disability. Much research has looked at cross-country differences in health, e.g. Bago d’Uva et al. (2008), King et al. (2004), Grol-Prokopczyk et al. (2011). For example, Molina (2016) considers self-reports of health such as shortness of breath and memory loss, which are possible but costly to measure, to examine cross-country differences. (She also finds that gender gaps in health self-reports are reduced once responses are adjusted for vignettes.) Other domains include political efficacy (King et al., 2004), job satisfaction (Kristensen and Johansson, 2008), drinking behavior (Van Soest et al., 2011), and corruption (Grzymala-Busse, 2007).

Little work on life satisfaction using anchoring vignettes has been published thus far. In one paper, the authors study the relationship between income and life satisfaction in the United States and the Netherlands using surveys from each country (Kapteyn et al., 2010); in another, the authors use the Survey of Health, Ageing and Retirement in Europe (SHARE) to see how response styles change with age (Angelini et al., 2012).

This paper proceeds as follows. Section 2 describes my data. Section 3 establishes the gender life satisfaction reporting gap and its magnitude.
Section 4 considers three pieces of evidence that women may not actually be happier than men, prompting Section 5, which describes intuitively and then formally the vignette-adjustment model. Section 6 estimates the vignette-adjusted model and compares it with the unadjusted model, for the global sample and separately by country, and looks for trends in the gender life satisfaction gap by region, GDP per capita, gender equality, and religion. Finally, Section 7 discusses the results and concludes.

2 Data³

Since 2005, the Gallup World Poll has continually surveyed residents in over 150 countries, interviewing about 1,000 randomly sampled individuals in each country. World Poll questions measure opinions about national institutions, corruption, youth development, community basics, diversity, optimism, violence, religiosity, and other topics. The World Poll questionnaire is translated into the major languages of each country. The translation process starts with an English, French, or Spanish version, depending on the region. A translator proficient in both original and target languages translates the survey into the target language. A second translator reviews the language version against the original version and recommends refinements.

With some exceptions, all samples are probability-based and nationally representative of the resident population aged 15 and older. The coverage area is the entire country including rural areas, and the sampling frame represents the entire civilian, non-institutionalized, aged 15 and older population of each country. Exceptions include areas where safety of interviewing staff is threatened, scarcely populated islands, and areas interviewers can reach only by foot, animal, or small boat.

³ Much of this description is also in another (coauthored) paper of mine using the same dataset, not yet published, “Life Satisfaction Within and Across Countries: Societal Capital and Relative Income.”
Specifically, sampling in the Central African Republic, Democratic Republic of the Congo, Lebanon, Pakistan, India, Syria, Azerbaijan, Georgia, Morocco, Myanmar (Burma), Chad, Madagascar, Moldova, and Sudan was affected by security; some of these as well as Canada, China, Laos, and small parts of Japan had non-representative sampling of some geographic regions. In Arab countries (Bahrain, Kuwait, Saudi Arabia), sampling was of citizens (including Arab expatriates) and those who could complete the survey in Arabic or English; in the United Arab Emirates, all non-Arabs were excluded, i.e. more than half of the population. In the Philippines, urban areas were over-sampled. Israel excludes East Jerusalem (Gallup reports Palestinian Territories separately).⁴ Running the analysis without these sampling-compromised countries would seriously restrict my available data. I therefore present the results including these countries; still the main results remain broadly the same if I remove them (see appendix).

Telephone surveys are used in countries where telephone coverage represents at least 80% of the population or is the customary survey methodology. In Central and Eastern Europe and most of the developing world, an area frame design is used for face-to-face interviewing. In some countries, over-samples are collected in major cities or areas of special interest. In some large countries, such as China and Russia, samples of at least 2,000 are collected.

Gallup has created a worldwide data set with standardized income and education data. To make education comparable across countries, education descriptions are recoded into one of three relevant categories: “Elementary”: completed elementary education or less (up to eight years of basic education); “Secondary”: completed some education beyond elementary education (9 to 15 years of education); “Tertiary”: completed four years of education beyond “high school” and/or received a four-year college degree.
Similarly, annual household income in international dollars is calculated using the Individual Consumption Expenditure corrected for the Household PPP ratio from the World Bank. These PPP-corrected values correlate strongly (r=0.94) with the World Bank estimate of per-capita GDP (PPP-corrected). The result is a household income measure that is comparable across all respondents, countries, and local and global regions.

Response rates are calculated according to AAPOR Standard Definitions (Callegaro and Disogra, 2008), and reported figures include completed and partial interviews, refusals, non-contacts, and unknown households. Gallup World Poll response rates vary by mode of survey and region. Response rates in Sub-Saharan Africa are higher than in other world regions, ranging from a high of 96% in Sierra Leone to a low of 54% in Nigeria, with an average response rate of 80%. Average response rates for the Middle East, Asia, South America and former Soviet Union countries are 63%, 56%, 43%, and 50%, respectively.

As part of a National Institute on Aging supported project,⁵ Gallup added a module on the international comparison of well-being to surveys conducted in 109 countries during 2011-2014. These countries provide the sample for the current paper. Eighteen countries were interviewed in 2011, 39 in 2012, 26 in 2013, and 26 in 2014.

Most countries have approximately 1,000 observations, with the exceptions of Russia (1,500), India (5,000), China (4,500), Germany (3,000), United Kingdom (3,000) and Haiti (500), for a total of about 120,000 observations.

⁴ http://www.gallup.com/services/177797/country-data-set-details.aspx
⁵ Awarded to Arie Kapteyn, James P. Smith, and Arthur van Soest.

My primary interest is in the answers to the following question (the so-called Cantril ladder): “Please imagine a ladder with steps numbered from zero at the bottom to 10 at the top.
The top of the ladder represents the best possible life for you and the bottom of the ladder represents the worst possible life for you. On which step of the ladder would you say you personally feel you stand at this time?”

In addition to a self-report, subjects were asked to rate the life satisfaction of each person in a set of six vignettes. The interviewer randomly asked about one of two possible vignette sets, set A or set B.

Although respondents used a 0-10 scale, because there are relatively few responses at the top and particularly the bottom of the scale, I combine ratings into a five-point scale: ratings 0-2 are recoded as 1, ratings 3-4 are recoded as 2, ratings 5-6 are recoded as 3, ratings 7-8 are recoded as 4, and ratings 9-10 are recoded as 5. I do this for both self-reports and vignette ratings.

As discussed in Section 5.3.2, the B set of vignettes fails tests for one of the crucial assumptions of vignette adjustment models (vignette equivalence), so I restrict my analysis to respondents who answered the A set. Of those, approximately 200 observations are dropped because respondents did not answer at least two vignette questions.⁶ After dropping observations that are missing an essential variable (life satisfaction, household income, etc.), I end up with 45,731 observations from 103 nations. Summary statistics for those respondents are given in Table 1. About 15% of the sample (18,956 observations) was missing household income, the most commonly missing variable. Georgia, Singapore, and Ecuador were missing data on unemployment. Egypt and Lebanon were missing data on health. Thus my sample contains 45,731 respondents, all of whom answered at least two A-set vignette questions, as well as providing all other key variables.

3 Do women report higher life satisfaction?

In short, based on the self-reports, yes. As noted, the existing literature certainly says so. Here I’ll show it using the new 2011-2014 Gallup World Poll data.
As shown in Table 1, the overall female minus male reported life satisfaction gap is 0.030, significant at the 1% level.

Next I show the effect of being female on life satisfaction reporting at the global and then regional level. Table 2 shows the results of ordered probit regressions of life satisfaction on personal characteristics: column 1 includes just an indicator for female, age, and age squared,⁷ while column 2 adds an indicator for urban, marital status indicators, employment status indicators, education level indicators, whether they have health problems,⁸ and log of equivalized income. The reference categories are single, employed full time for an employer, and low education (elementary or less). To get log equivalized household income, I took reported household income and weighted each member according to OECD weights (1 for the first adult, 0.5 for each additional adult, and 0.3 for each child). This may overstate purchasing power of women, because income is reported at the household level. Both columns include country fixed effects.

⁶ Results are similar if I only use respondents that answered all six vignette questions, although it reduces the sample by about 1,000 observations.
⁷ Higher order polynomial terms were tested and found insignificant.
⁸ The health measure is the yes/no answer to the question “Do you have any health problems that prevent you from doing any of the things people your age normally can do?”

The coefficients in both columns fit well with the existing literature. Life satisfaction follows a U-shaped function with age, with minimums at 52 and 53 years of age respectively. Because this is a single cross-section, it is not possible to disentangle age effects from cohort effects: it may be that other factors affecting each generation yield this pattern. This pattern may also be different for women and men, as found by Plagnol and Easterlin (2008).
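Two of the variable constructions described above, the five-point recoding of ladder ratings from Section 2 and the OECD equivalization of household income, can be sketched as follows. This is a minimal illustration rather than the code used for the estimates: the function names are hypothetical, and dividing household income by the sum of the member weights is an assumption about how the weights are applied (it is the usual convention for equivalence scales).

```python
import math


def recode_ladder(rating: int) -> int:
    """Collapse a 0-10 ladder rating into the five-point scale.

    0-2 -> 1, 3-4 -> 2, 5-6 -> 3, 7-8 -> 4, 9-10 -> 5.
    """
    if not 0 <= rating <= 10:
        raise ValueError("ladder ratings must lie between 0 and 10")
    if rating <= 2:
        return 1
    # From 3 upward, each pair of adjacent ratings shares one category.
    return (rating - 1) // 2 + 1


def log_equivalized_income(household_income: float, adults: int, children: int) -> float:
    """Log of household income per equivalent adult.

    Weights follow the text: 1 for the first adult, 0.5 for each
    additional adult, 0.3 for each child. Dividing by the summed
    weights is an assumption made for this sketch.
    """
    if adults < 1 or children < 0:
        raise ValueError("need at least one adult and no negative counts")
    if household_income <= 0:
        raise ValueError("income must be positive to take logs")
    scale = 1.0 + 0.5 * (adults - 1) + 0.3 * children
    return math.log(household_income / scale)
```

For example, a household of two adults and two children has an equivalence scale of 2.1, so a household income of 21,000 is treated as 10,000 per equivalent adult before taking logs.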
Being married increases life satisfaction (compared with being single), while being separated, divorced, or widowed reduces it. Working less than one would like (unemployed or underemployed) leads to lower life satisfaction. Education and income increase life satisfaction, and having health problems decreases it.

The effect of being female is positive in both models. In the first column, the coefficient on the female indicator is 0.028, significant at the 5% level, showing that unconditionally, women report higher happiness than men. In the second column, adding the other characteristics increases the coefficient on the female indicator – women are worse off in most of those characteristics, so in order to maintain the positive gap in life satisfaction, there must be a large positive effect of being female.

Table 3 shows the marginal effects of changing individual characteristics. The first column shows estimated life satisfaction self-reports with that characteristic, holding all other characteristics constant, and the second shows the estimated change in life satisfaction reports versus the listed comparison. The marginal effect of changing only gender, from male to female, appears in the last row. Because it holds all other variables constant, the marginal effect is much larger than the raw gap, at 0.076.

The third column compares that row’s change with the gender effect. For example, moving from the 30th percentile of one’s country’s equivalized income distribution to the 40th percentile increases life satisfaction reports by 0.047, which is 0.66 times as large an effect as the gender effect. Moving up a decile in the equivalized income distribution increases life satisfaction reports by about 0.5 to 1 times as much as the gender effect. In the U.S., moving up a decile is equivalent to increasing equivalized income by $5,000 (10th to 20th percentile) to $14,000 (80th to 90th percentile).
Other characteristic changes have even larger effects than gender – compared to not having health problems, having them on average reduces life satisfaction by 0.215, nearly three times as much as the gender effect, and unemployment (versus working full time for an employer) is similar. Still, even the largest magnitude change is less than three times the gender effect.

Table 4 reports the coefficients on female dummies in ordered probit regressions of life satisfaction on the same variables included in Table 2 at the regional level, as well as the marginal gender effect. All female coefficients are positive, and all except the one for Latin America & the Caribbean are significant at the 10% level. Marginal effects range from 0.045 in the transition economies to 0.160 in Australia/New Zealand/United States/Canada. Even the smallest marginal effect reported is quite large, given the comparisons in Table 3.

4 Why should women be happier? Perhaps they are not.

Despite the strong findings of the previous section that women report higher life satisfaction, there are many reasons for women to be less satisfied with their lives than men. In this section, I will provide evidence using vignettes that women tend to rate the same objective life circumstances more highly than men do. I will also review gender differences in responses to affect questions, which ask respondents about their feelings during the previous day. These responses will indicate that women are more likely to report all negative affects, and less likely to report some positive affects. Finally, I will attempt to decompose the gap in life satisfaction responses into the portion explained by differential characteristics between women and men (e.g. education), by how they differentially value those characteristics (e.g. how education influences life satisfaction for women vs. for men), and the gap explained by neither – that is, the portion of that gap that is just due to one’s gender.
As it turns out, gender alone must explain a large gap.

4.1 Vignette ratings

In addition to asking respondents on which step of the ladder they currently stand, Gallup also asked them to state on which step of the ladder six hypothetical people stand. The exact vignettes were as follows:

A1: Think of a female who is 40 years old and happily married with a good family life. Her monthly family income is about [median income]. She has severe back pain, which keeps her awake at night. On which step of the ladder do you think this person stands?

A2: Think of a male who is 50 years old and divorced. He has a daughter with whom he has a good relationship. He has a secure job that pays about [twice median income] per month. He has no serious health problems. On which step of the ladder do you think this person stands?

A3: Think of a male who is 25 years old and single without many friends. He makes about [half median income] per month. He feels he has little control over his job and worries about losing it. He has no health problems but feels stressed sometimes. On which step of the ladder do you think this person stands?

A4: Think of a female who is 35 years old and married, with no children. Her monthly family income is about [median income]. Her work is a bit dull sometimes, but it is a very secure job. On which step of the ladder do you think this person stands?

A5: Think of a female who is a 70-year-old widow. She receives about [half median income] in income each month. She has many friends. Lately, she suffers from back pain, which makes housework painful. On which step of the ladder do you think this person stands?

A6: Think of a male who is 60 years old. He is single but has many friends his age. He no longer works but is comfortable with his decision to stop working. He receives about [twice median income] in income each month. He is very physically active. On which step of the ladder do you think this person stands?
The income levels (median, half the median, or twice the median) were filled in with the appropriate value for the respondent’s country. Every respondent saw the same six vignettes; they were not randomized by e.g. gender or income.

Table 5 considers gender differences in ratings for each vignette. Women rate every single vignette higher, significantly so. This implies that women tend to give higher ratings for the same life circumstances; this is true whether they are rating vignettes of men or women. And, the differences between men’s and women’s vignette ratings – 0.03, 0.05, 0.04, 0.03, 0.02, 0.06 – are similar in magnitude to the difference between women’s and men’s self-reports, 0.03. The reporting gap was particularly strong for male vignette characters (average: 0.050, compared with 0.027 for female characters).

4.2 Feelings day-to-day

Beyond the Cantril ladder questions, respondents were also asked about their feelings during the previous day. Specifically they were asked, “Did you experience the following feelings during a lot of the day yesterday?” followed by several questions (e.g., “How about enjoyment?” and “How about worry?”). In all, about 99% of the sample was asked if they experienced physical pain, worry, sadness, stress, anger, enjoyment, whether they felt well rested, whether they were treated with respect, whether they learned something, and whether they smiled or laughed. These questions were asked in all countries, though a small number of respondents in many countries were not asked them.

Experiential well-being, as measured by day-to-day experiences of feelings such as these, is not the same as evaluative well-being, as measured by Cantril’s ladder. There is modest correlation between experienced well-being and evaluative well-being (ρ = 0.3 to 0.5), but in factor analyses, evaluative measures generally load on different factors than positive or negative experiential well-being (Kahneman and Deaton, 2010).
Table 6 compares men’s and women’s responses to these questions. Women are more likely to experience negative feelings. For positive feelings, women were more likely to feel respected⁹ and to smile/laugh, while men were more likely to feel well rested and to have learned something; neither men nor women were more likely to experience enjoyment. Overall, women were more likely to say they experienced all negative feelings, while men were more likely to say they experienced some positive ones.

⁹ The question did not specify by whom they felt respected; there may be gender differences in whose respect came to mind when answering this question (e.g. spouse, children, coworkers, neighbors, etc.).

It seems that although women report having higher life satisfaction than men on average, their day-to-day lives are not more enjoyable. In itself, this does not contradict the finding that women report higher overall life satisfaction – any parent (or graduate student) understands taking on extra stress or worry because a child (or dissertation!) brings a deeper joy. But it is one piece of motivating evidence.

4.3 Decomposition analysis

Perhaps women are happier despite being objectively disadvantaged, because they subjectively value their own characteristics more than men do. For example, perhaps women value being married more than men do, while men value being educated more than women. In that case, we may see that women’s life satisfaction ratings are high if many are married, even if they are not highly educated. In this section I show that this is not the case, and that in fact, women’s self-reports are much higher than they should be, given their characteristics and how they value them.

To address this, I attempt to decompose the observed differences, using a two-fold Blinder-Oaxaca decomposition (Blinder 1973, Oaxaca 1973), in the gap between women’s and men’s life satisfaction ratings. This method is commonly used to decompose gender gaps in wages, to determine how much is due to men having more desirable attributes such as additional work experience, and how much is due to rewarding those attributes less when women have them. Rather than looking at the difference between men’s and women’s wages, I decompose the difference between women’s and men’s life satisfaction. To do so, I report the portion of the gap attributable to differences in observed characteristics (e.g. education level, health) and the portion attributable to differences in how those characteristics translate into life satisfaction (i.e. the differences in coefficients on those observed characteristics). The portion explained by differences in characteristics is often called the explained portion, while the portion explained by differences in coefficients is called the unexplained portion.¹⁰

For a linear model, the two-fold Blinder-Oaxaca decomposition splits the observed difference in the mean as follows:

\[
LS_f - LS_m = [E(X_f) - E(X_m)]'\beta^* + [E(X_f)'(\beta_f - \beta^*) + E(X_m)'(\beta^* - \beta_m)] \tag{1}
\]

where $X_f$ and $X_m$ are female and male characteristics respectively, $\beta_f$ and $\beta_m$ are female and male coefficients on those characteristics in separate regressions, and $\beta^*$ is the vector of coefficients from a pooled regression.

The first term on the right hand side is the portion of the mean difference attributable to differences in characteristics, while the second term is the portion attributable to differences in coefficients. Thus the first term will capture whether women are happier because they have some advantage in their life circumstances (e.g. if they have higher incomes) and the second will capture whether women are happier because they differentially value the characteristics they have (e.g. if education improves life satisfaction more for women than for men).

Several of the included observed characteristics are categorical (e.g. high, medium, or low education).
The original decomposition proposed by Oaxaca (1973) and Blinder (1973) gave different estimates for the impacts of the individual characteristics and coefficients depending on which category was the reference category. Yun (2005) improved the method by incorporating normalized regression into the model, essentially requiring that the coefficients on the categorical dummy variables sum to zero. Normalizing the regression coefficients on categorical variables does not change the overall explained and unexplained portions, nor the coefficients on other variables (Yun, 2005). I normalize the coefficients on all of the included categorical variables (marital status, employment status, education level). Standard errors are clustered at the country level.

Results of the linear decomposition are shown in Table 7. (Linear models are generally good approximations for ordered probit models, at least in life satisfaction surveys (Ferrer-i-Carbonell and Frijters, 2004).)¹¹ The total gap in self-reports is 0.030. The explained portion of the gap is -0.076, and the total unexplained portion is 0.105; that can further be broken down into the unexplained portion due to coefficients on included variables, 0.025, and the constant term, 0.080, which measures the remaining unexplained gap (the portion due to being female).

¹⁰ Because there is no reason to presume that the “true” coefficients are from a regression using only males or only females, the reference coefficients are from a pooled regression including both groups, as proposed by Oaxaca and Ransom (1994).

¹¹ In a nonlinear (ordered probit) model, (1) is replaced by the similar equation:

\[
LS_f - LS_m = [E_{\beta^*}(Y_{if} \mid X_{if}) - E_{\beta^*}(Y_{im} \mid X_{im})] + [(E_{\beta_f}(Y_{if} \mid X_{if}) - E_{\beta^*}(Y_{if} \mid X_{if})) + (E_{\beta^*}(Y_{im} \mid X_{im}) - E_{\beta_m}(Y_{im} \mid X_{im}))] \tag{2}
\]

The results are similar: the gap to be explained, 0.023 (100%), the explained portion, -0.056 (-244%), and the unexplained portion, 0.079 (+344%), tell the same story as the linear case.
In other words, women’s characteristics tend to be worse, so the contribution of these differences to the overall gap is actually negative; the gap left to explain increases once I account for them. Women are more likely to be separated, divorced, or widowed, less likely to work full time for an employer or be self-employed, more likely to have low levels of education, more likely to report health problems, and have lower incomes. There is no characteristic here that explains the gap; all significant characteristics have negative contributions, increasing it further.

On the unexplained side, women have a different curvature of their age functions, which leads to higher life satisfaction for them; that is, their life satisfaction reports evolve somewhat differently over the life cycle than men’s, which explains some of the gap. Women also mind less being separated, unemployed, or out of the workforce. On the other hand, women benefit less from working full time for an employer. Overall, differences in how women and men translate their characteristics into life satisfaction account for only a small part of the gap (0.025).

The final gap, the effect of being female, all else equal, is 0.080, roughly 2.7 times the raw gap of 0.030. Women have worse observable characteristics, which widens the gap. And although women do value some characteristics differently than men do, it is not nearly enough to close the gap.

Interpreting the constant term in the unexplained portion as the value of “group membership” for women requires that I assume that everything else relevant to life satisfaction determination is included in the model. Of course, that is not true; there are many important variables that determine one’s happiness, and it is impossible to include them all. The large group membership bonus does imply that the most consistently found determinants of happiness (employment status, education, health, income) point to reasons that women should be less happy than men, not more.
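The two-fold decomposition in equation (1) can be illustrated with a short sketch. It is not the estimation code behind Table 7: the data are synthetic, the single regressor `income` and all parameter values are hypothetical, there is no normalization of categorical variables and no clustering of standard errors, and the pooled regression omits a group indicator. It only shows that, with an intercept in each regression, the explained and unexplained terms add up to the raw gap.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y = X b + e (X already contains a constant)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def twofold_decomposition(X_f, y_f, X_m, y_m):
    """Two-fold Blinder-Oaxaca decomposition with pooled reference coefficients."""
    b_f = ols(X_f, y_f)                      # female-only regression
    b_m = ols(X_m, y_m)                      # male-only regression
    b_star = ols(np.vstack([X_f, X_m]),      # pooled regression (reference coefficients)
                 np.concatenate([y_f, y_m]))
    xbar_f, xbar_m = X_f.mean(axis=0), X_m.mean(axis=0)
    explained = (xbar_f - xbar_m) @ b_star   # differences in characteristics
    unexplained = xbar_f @ (b_f - b_star) + xbar_m @ (b_star - b_m)  # differences in coefficients
    gap = y_f.mean() - y_m.mean()
    return gap, explained, unexplained

# Synthetic illustration (hypothetical numbers): women have lower mean income
# but a higher intercept, mimicking "worse characteristics, higher reports".
rng = np.random.default_rng(0)

def make_group(income_mean, intercept, n=5000):
    income = rng.normal(income_mean, 1.0, n)
    X = np.column_stack([np.ones(n), income])
    y = intercept + 0.5 * income + rng.normal(0.0, 1.0, n)
    return X, y

X_f, y_f = make_group(income_mean=-0.2, intercept=0.3)
X_m, y_m = make_group(income_mean=0.0, intercept=0.0)
gap, explained, unexplained = twofold_decomposition(X_f, y_f, X_m, y_m)
print(f"gap={gap:.3f} explained={explained:.3f} unexplained={unexplained:.3f}")
```

In this toy setup the explained term is negative and the unexplained term is positive, the same sign pattern as the estimates reported above (-0.076 explained, 0.105 unexplained): observed characteristics alone would predict lower, not higher, reports for women.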
This suggests that something else is at play.

5 Anchoring vignette-adjusted model

5.1 Intuition

Previous studies of life satisfaction ask each respondent for their self-report, and find correlates with these ratings. But in comparing life satisfaction ratings across individuals (in this case, between men and women), we are assuming that they are using comparable scales when rating their own lives. This is not obvious. As noted, women rated every vignette subject more highly than men did. This raises the question: is the scale men are using systematically different from women’s?

Consider Figure 1. Imagine that men actually have higher average life satisfaction than women, as indicated by men’s distribution being shifted right of women’s, but that they are less generous in their ratings, as indicated by the right-shifted thresholds. Now imagine a hypothetical person with true life satisfaction at the dashed line. On the men’s scale, that’s a 2 out of 5. On the women’s scale, because the thresholds are shifted left, it’s a 4 out of 5. If men and women are using these different response scales in their self-evaluations, it will appear that women are happier, when in fact men are.

To account for this, I want to put men’s and women’s responses on the same scale, by evaluating everyone’s self-reports using the same set of thresholds, in this case, the men’s. To do so, Gallup asked every individual to rate the same sets of vignettes. Because the vignettes shown to men and women are the same, they have the same “true” underlying life satisfaction level. If men and women give systematically different ratings, it must be because of differential item functioning, or DIF, otherwise known as differing response scales (King et al., 2004).

Once men and women are using the same adjusted response scale, I can reevaluate the life satisfaction gender gap by simulating women’s life satisfaction ratings when using men’s scales. This analysis depends on two key assumptions.
First, response consistency means that individuals are using the same scale to rate their own life satisfaction as the vignette subjects’. King et al. (2004) and Van Soest et al. (2011) have provided good evidence that response consistency holds in other domains. Unlike some more controversial behavioral survey questions (e.g. regarding alcohol consumption), in which respondents may feel social pressure to change their self-reports but not vignette ratings, there is less incentive to misreport their own life satisfaction.

Second, vignette equivalence means that men and women interpret the vignettes in the same way. Because my vignettes include incomes based on the median income, this assumes that people rate their life satisfaction in part based on their income relative to their country’s median; Kapteyn et al. (forthcoming) provide evidence that this relative measure is appropriate.

5.2 Formal model

To do the vignette adjustments formally, I use a HOPIT (hierarchical ordered probit) model. HOPIT operates similarly to ordered probit, except that in addition to affecting the underlying latent variable for life satisfaction, individual characteristics can also affect the thresholds between response levels.

Each respondent answers the life satisfaction question for themselves, as well as for six hypothetical individuals. $Y_{ri}$ is respondent $i$’s self-report, and it is a function of their characteristics as well as an error term $\epsilon_{ri}$, which is normally distributed with zero mean and is independent of $X_i$. As in an ordered probit model, $Y^*_{ri}$ is a latent variable such that $Y_{ri} = j$ if $Y^*_{ri}$ is above the threshold $\tau_i^{j-1}$ and at or below the threshold $\tau_i^j$:

$$Y^*_{ri} = X_i\beta + \epsilon_{ri}, \quad \epsilon_{ri} \sim N(0, \sigma_r^2) \tag{3}$$

$$Y_{ri} = j \quad \text{if } \tau_i^{j-1} < Y^*_{ri} \le \tau_i^j \tag{4}$$

What makes this model different from an ordered probit model is how the thresholds are determined.
They are also functions of the respondent’s characteristics, as well as an idiosyncratic error $u_i$, which is independent of $\epsilon_{ri}$ and $X_i$:

$$\tau_i^0 = -\infty, \quad \tau_i^5 = \infty, \quad \tau_i^1 = \gamma^1 X_i + u_i, \quad \tau_i^j = \tau_i^{j-1} + \exp(\gamma^j X_i), \; j = 2, 3, 4 \tag{5}$$

The individual thresholds $\tau_i^j$ represent DIF.

To make self-evaluations comparable, take one respondent’s scale as the benchmark scale. That respondent has characteristics $X_i = X(B)$, and thus has thresholds $\tau_B^j$. I can now compare all other individuals using thresholds $\tau_B^j$. Because the latent variable $Y^*_{ri}$ is not affected by the thresholds, this does not imply a new level of the latent variable. But it does imply a new rating, $Y_{ri}$, for which it is possible to simulate the adjusted distribution.

Using self-reports alone, not all parameters are identified; namely, only the difference between $\beta$ and $\gamma^1$ can be determined. This is the problem in my simple example: men and women are using different scales. To identify the two separately, I use vignette ratings.

Each respondent rated a set of six vignettes. $Y_{li}$ is the rating given by $i$ to vignette $l$:

$$Y^*_{li} = \theta_l + \epsilon_{li} \tag{6}$$

$$Y_{li} = j \quad \text{if } \tau_i^{j-1} < Y^*_{li} \le \tau_i^j, \quad j = 1, \ldots, 5 \tag{7}$$

with $\epsilon_{li} \sim N(0, \sigma^2)$, where the $\epsilon_{li}$ are independent of one another, of $\epsilon_{ri}$, and of $X_i$, and where $\theta_l$ is the coefficient on a dummy variable indicating vignette $l$. Notice that the equation for $Y^*_{li}$ does not include any personal characteristics $X_i$, in line with the assumption of vignette equivalence. The assumption of response consistency means that the $\tau_i^j$ here are the same as those used with $Y_{ri}$. With this, I can identify $\beta$, $\gamma^1, \ldots, \gamma^4$, $\theta_1, \ldots, \theta_6$ up to a normalization of scale and location.

5.3 Testing vignette equivalence

To check vignette equivalence, I perform two tests. First, I investigate whether each individual’s vignette ordering matches the global ordering; if each individual is interpreting the vignettes in the same way, they should order them the same way (with some error). For this I follow Murray et al.
(2003) by calculating adjusted Spearman rank order correlations between individual vignette orderings and the global ordering, as explained below. Second, following D’Uva et al. (2011), I test whether individual characteristics significantly determine vignette evaluations. If the vignettes are interpreted the same way by everyone, then the latent vignette rating $Y^*_{li}$ should be determined entirely by the vignette fixed effect, and any difference in the observed $Y_{li}$ should come from differing thresholds $\tau_i^j$, which vary by individual characteristics (e.g. female vs. male).

In the first test, the B vignettes underperform relative to the A vignettes. In the second test, the B vignettes fail entirely, and thus those respondents that received the B set of vignettes, roughly half, are removed from my sample.

5.3.1 Vignette ordering

First, I quantify to what extent respondents deviated from the global vignette ordering in their individual orderings. The “global” vignette ordering is defined as the ordering when all respondents’ ratings are averaged. (The global ordering is the same if I define it as the mode ordering.) Following Murray et al. (2003), I define the benefit-of-the-doubt (Spearman) rank order correlation coefficient (BDROCC) as the Spearman correlation with ties resolved in favor of the overall (global) ordering. Each respondent rated the vignettes on the same 0-10 scale, and thus they could give the same rating to multiple vignettes. In finding which orderings are consistent with the global ordering, I resolve ties as if they matched the global order. In the extreme case, if an individual rated every vignette the same way, their ordering would be treated as perfectly consistent with the global ordering. (In practice, well under 1% of individuals did this in the A set, and about 1% did this in the B set.) A high BDROCC (near 1) means the individual’s ordering is close to the global ordering.

Table 8 summarizes the BDROCC (see the appendix for a full list of every country) for both sets of vignettes.
Four individual countries were especially problematic: Chad (median BDROCC = -0.337 for A set), Palestinian Territories (median BDROCC = -0.143 for B set), Japan (median BDROCC = 0.086 for B set), and United Arab Emirates (median BDROCC = 0.029 for B set). Chad is removed from all A-set analysis that follows.

Correlations are higher in the A set. The median BDROCC for the B set is just 0.543, compared with 0.829 for the A set. While the majority of respondents (59%) were very close to the global ordering in the A set (perfect match, one single-rank inversion, two single-rank inversions, or one double-rank inversion), only about a third were very close in the B set. Looking back at Table 5, there is less variation in the average ratings for the B set than the A set; the range of averages is 3.95-6.60 for the A set and just 3.95-5.14 for the B set.

This could reflect any number of factors, including how the vignettes were written (they may not be different enough), how the individuals interpreted the questions (individuals may emphasize different characteristics while developing their rankings, so that a vignette with a high income but bad health may be highly ranked by one person and poorly ranked by another), or how the survey was administered (the surveyors may have made mistakes in asking the questions, Gallup may have made a mistake in the materials they used, or the data may have been recorded incorrectly). For whatever reason, the people rating the B set were not as consistent, and thus vignette equivalence may not hold for them. There is no theoretical cutoff for how high the BDROCC must be for vignette equivalence to be assumed. Clearly, however, the B set is much less consistent than the A set.
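As a rough illustration of the BDROCC described in this subsection, the sketch below ranks a respondent’s vignette ratings, breaks ties in the direction of the global ordering, and then applies the usual Spearman formula for untied ranks. It follows the verbal description given here; the exact procedure in Murray et al. (2003) may differ in its details, and the function names and example ratings are hypothetical.

```python
import numpy as np

def global_ranks_from_means(all_ratings):
    """Rank vignettes (1 = lowest) by mean rating across respondents.

    all_ratings: array of shape (respondents, vignettes).
    Assumes no exact ties among the vignette means.
    """
    means = np.asarray(all_ratings, dtype=float).mean(axis=0)
    return means.argsort().argsort() + 1

def bdrocc(ratings, global_rank):
    """Spearman correlation with ties resolved in favor of the global ordering."""
    ratings = np.asarray(ratings, dtype=float)
    global_rank = np.asarray(global_rank)
    n = len(ratings)
    # np.lexsort treats the LAST key as primary: sort by rating, and among
    # equal ratings follow the global rank (the "benefit of the doubt").
    order = np.lexsort((global_rank, ratings))
    own_rank = np.empty(n, dtype=int)
    own_rank[order] = np.arange(1, n + 1)
    d = own_rank - global_rank
    # Spearman formula for ranks without ties (ties were resolved above).
    return 1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1))

# Hypothetical ratings: 3 respondents, 4 vignettes.
sample = np.array([[2, 4, 6, 8],
                   [3, 3, 7, 9],
                   [5, 4, 6, 6]])
g = global_ranks_from_means(sample)
scores = [bdrocc(row, g) for row in sample]
print(g, scores)
```

A respondent who gives every vignette the same rating receives a BDROCC of 1 under this rule, which is why the share of such respondents is worth reporting alongside the medians.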
5.3.2 The effect of gender on vignette ratings

As a second check of vignette equivalence, I consider a slightly modified version of the model, replacing equation (6) with the following:

$$Y^*_{1i} = \theta_1 + \epsilon_{1i}, \qquad Y^*_{li} = \theta_l + \lambda_l X_i + \epsilon_{li}, \quad l \ne 1 \tag{8}$$

where $l$ counts $1, \ldots, 12$ when both sets of vignettes are included and $1, \ldots, 6$ when only using one set. For identification, I must omit $\lambda_l X_i$ from one vignette equation. If vignettes are equivalent, then the coefficients $\lambda_l$ should all be equal to zero.

Table 9 shows coefficients on female in (8). Using the pooled sample (columns 1 and 2), the A vignettes do not indicate a vignette equivalence violation, but the B vignettes do. In the separate samples, for both the full model and the female-and-age-only model, the pattern holds: female does seem to influence interpretation of the vignettes in the B set, but not the A set. Because of the evidence here that the B vignettes violate the vignette equivalence assumption, I focus my analysis on the sample that answered the A vignettes.

6 The gender gap with vignette adjustments

To see how much of the gender gap is due to DIF, I compare results for ordered probit regressions (no vignette adjustments) to results for HOPIT regressions (with vignette adjustments). Likelihood ratio tests show that the HOPIT model is preferred in both cases.

Table 10 shows coefficients in HOPIT regressions of the same specifications as Table 2: the first column regresses life satisfaction on a female indicator, age, and age$^2$; the second column adds marital status, employment status, education level, health problems, and log equivalized household income. Both columns include country fixed effects.

Each specification in Table 10 has two sets of coefficients: the left two columns are coefficients affecting the life satisfaction rating, and the right two columns are coefficients affecting $\tau^1$, the threshold between the lowest and second-lowest response categories.
Here I only report the first threshold equation; see the appendix for coefficient estimates on all four response thresholds. A positive coefficient in the life satisfaction equation means increasing that variable increases life satisfaction. A positive coefficient on $\tau^1$ means that increasing that variable moves the threshold toward higher values of $Y^*_i$, i.e., it makes a respondent with $Y^*_i$ just above the threshold change their response to the lowest response category. A negative coefficient on $\tau^1$ means the opposite.

The first specification shows a striking difference in the effect of “female” in the ordered probit vs. HOPIT regressions. Because this specification includes only female and age as regressors, it shows the overall impact of being female, including all gender differences in characteristics. The second model includes additional individual characteristics, so that it isolates the impact of being female holding those factors constant.

In specification (1), Table 2 shows that women have significantly higher life satisfaction, while the HOPIT model in Table 10 shows that there is essentially no effect. The coefficient on $\tau^1$ explains the difference: women have lower $\tau^1$ thresholds than men. As a result, it takes less for them to move up to a higher response category (or equivalently, it takes a worse situation for them to report a lower response category).

The next specification tells a similar story. The coefficient on female in Table 2 is positive and strongly significant (0.087). In Table 10, the coefficient on female is still significant, but it is much smaller (0.043). Once again, we see that the coefficient on female in the $\tau^1$ equation is negative, implying that women use more generous thresholds.

The rest of the coefficients in the HOPIT life satisfaction equation match the results of the ordered probit regressions in Table 2 in their signs, significance levels, and for the most part, even in magnitude.
This tells us that these effects are not due simply to DIF, and that, for example, education really increases life satisfaction – it isn’t that more educated people are just more generous in their self-reports.

Looking at the coefficients in the $\tau^1$ equation of Table 10, being female reduces the first threshold, as expected. Being more educated reduces the first threshold, while having health problems increases it, as does being divorced.

Table 11 shows the marginal effects of changing individual characteristics in the HOPIT equation for life satisfaction alongside the ordered probit marginal effects from Table 3. The marginal effects are similar for many variables, though there are some differences. Moving up a decile in the equivalized income distribution has a somewhat larger effect in the vignette-adjusted model, as does moving from low education to high education. The benefit of marriage is doubled, but the effects of being divorced or widowed are somewhat smaller. The harm from being unemployed or underemployed is even larger.

In the last row, the marginal effect of gender is more than twice as large without vignette adjustment as with it. The ordered probit model overestimates the effect of being female because it fails to consider that women use more generous response scales as well. The HOPIT model in Table 11 isolates the improvement in life satisfaction for women, holding their response thresholds constant. Although being female still improves life satisfaction, all else equal, the improvement is smaller in level as well as compared with other possible characteristic changes.

Although the marginal effect of being female is still positive, because women are worse off in their other characteristics, their overall life satisfaction is lower than men’s if they use the same scale as men’s. Table 12 shows simulations of life satisfaction ratings for each specification shown in Table 10.
The top panel shows estimates using each respondent’s own gender’s thresholds: women’s estimated life satisfaction, men’s estimated life satisfaction, the gap, and the simulated distribution of women’s and men’s responses. The bottom panel shows the same calculations, but adjusts women’s thresholds to match men’s (that is, it sets the coefficient on “female” to zero in the equations for $\tau^1$ through $\tau^4$) [12]. The last line shows the percentage of the top panel gap that is eliminated in the bottom panel.

In both models, evaluating women’s self-reports using men’s thresholds completely reverses the gap, by about 150%. If women used the same response scale as men, they would have lower life satisfaction than men; the gap would be about half as large as it is now, in the other direction. Using men’s thresholds, women are more likely to put themselves in the highest response category and less likely to put themselves in the lowest response category.

At the global level, women on average report higher life satisfaction than men. At the individual country level, there is heterogeneity. Table 13 shows simulations of the gender gap using respondents’ own thresholds ($Gap_{own}$) and using men’s thresholds ($Gap_m$), as in Table 12, but for models run at the individual country level. It also shows the number of men and women in the sample ($N_m$ and $N_f$), the size of the scale adjustment (Avg. adj.), and the marginal effect of changing from male to female (Marg. eff.). To measure the size of the scale adjustment, I use the average adjustment in women’s life satisfaction when women in that country move from their own thresholds to men’s thresholds: for each female observation, I take their expected life satisfaction rating using men’s thresholds minus their expected life satisfaction rating using their own thresholds. A negative number means their self-reports are lower with men’s thresholds; a positive number means the opposite.
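The comparison between own thresholds and men’s thresholds can be sketched for a single pair of hypothetical respondents. The code below builds thresholds as in equation (5) with the idiosyncratic term $u_i$ set to zero, and computes the expected rating on a 1-5 scale from the latent model in equations (3) and (4). All coefficient values are invented for illustration and are not estimates from Table 10; the only feature borrowed from the text is that the female coefficient in the first threshold equation is negative.

```python
from math import erf, exp, log, sqrt

def norm_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def thresholds(x, gammas):
    """tau^1 = gamma^1 . x and tau^j = tau^(j-1) + exp(gamma^j . x), j = 2, 3, 4.

    The idiosyncratic term u_i is set to zero in this sketch.
    """
    dot = lambda g: sum(gi * xi for gi, xi in zip(g, x))
    taus = [dot(gammas[0])]
    for g in gammas[1:]:
        taus.append(taus[-1] + exp(dot(g)))
    return taus

def expected_rating(latent_mean, taus, sigma=1.0):
    """E[Y] on a 1..5 scale: 1 + sum_j P(Y* > tau^j) = 5 - sum_j Phi((tau^j - mean) / sigma)."""
    return (len(taus) + 1) - sum(norm_cdf((t - latent_mean) / sigma) for t in taus)

# x = (constant, female). Hypothetical coefficients: being female lowers tau^1
# by 0.3 and leaves the spacing between thresholds (0.7) unchanged.
gammas = [(-1.0, -0.3), (log(0.7), 0.0), (log(0.7), 0.0), (log(0.7), 0.0)]
taus_women = thresholds((1.0, 1.0), gammas)
taus_men = thresholds((1.0, 0.0), gammas)   # "female" set to zero

latent_woman, latent_man = -0.1, 0.0        # hypothetical latent means

gap_own = expected_rating(latent_woman, taus_women) - expected_rating(latent_man, taus_men)
gap_men_scale = expected_rating(latent_woman, taus_men) - expected_rating(latent_man, taus_men)
adjustment = gap_men_scale - gap_own        # change in the woman's expected rating
print(gap_own, gap_men_scale, adjustment)
```

With these invented numbers the woman’s expected rating is higher than the man’s on her own scale and lower on his, so the sign of the gap flips. Averaging the analogue of `adjustment` over all female observations in a country gives a quantity of the same kind as the scale adjustment (Avg. adj.) described above.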
Specifically, this is the average life satisfaction adjustment for women when I set female equal to zero in the equations for $\tau^1$ through $\tau^4$. (Computing $Gap_m - Gap_{own}$ yields the same value.)

Footnote 12: This method does not account for the fact that women also have other characteristics that are different from men’s (e.g. men’s higher education levels should shift their $\tau^1$ threshold left, as shown in Table 10). Overall, the results in the bottom panel are similar if I set all coefficients in the threshold equations to men’s values.

Overall, the individual country results are similar to the global results, with some key differences. When estimated separately by country (and weighting each country equally), the average gender life satisfaction gap is smaller when respondents use their own thresholds, 0.020, and the gap reversal is much stronger, all the way to -0.026 (a 230% decrease). The overall marginal effect, 0.038, is somewhat larger than in the global results. The next section investigates patterns in $Gap_{own}$ and $Gap_m$, using the estimates from these individual country regressions.

6.1 Patterns in the gender gap

To compare with the existing literature, here I consider trends in the gender gap when women use their own thresholds and when they use men’s thresholds, to see if they are associated with geographical region, GDP per capita, gender equality, or religion. All values reported here are based on the simulations reported in Table 13.

Not all countries and individuals are included in the following tables. First, people who were not asked (or did not answer) their religion are not included in the religion table. Many countries are missing at least one component in the women’s rights and status table, and some are missing GDP per capita [13]. And, because these tables are all based on the adjustment at the national level, two countries that had a problem with fitting the HOPIT model are also excluded from all tables [14].

First, regionally.
Table 14 shows the estimated gender gap when women use their own and men’s thresholds respectively, in $Gap_{own}$ and $Gap_m$, and reports the average adjustment for women when switching from their own to men’s thresholds. Significance stars reflect t-tests comparing men’s and women’s average simulated life satisfaction. Every region saw a decrease in the gender gap – the largest is in the Middle East and North Africa, where the gap fell from 0.02 to -0.12. The smallest adjustments were in East Asia and Sub-Saharan Africa. $Gap_{own}$ is most negative in Latin America and the Caribbean, East Asia, and Sub-Saharan Africa, where men have higher relative life satisfaction on average; on the other side, South Asia and Australia/New Zealand/Canada/United States have the largest positive $Gap_{own}$. $Gap_m$ remains most positive in Australia/New Zealand/Canada/United States, though at a much smaller magnitude, and the Middle East and North Africa has the most negative gap, a very large -0.120.

Evidence in the literature has been mixed about the relationship between the gender gap and GDP per capita. Table 15 shows results of regressions of $Gap_{own}$ (column 1) and $Gap_m$ (column 2) on log GDP per capita and log GDP per capita squared, while Figure 2 shows the same regressions graphically with 95% confidence intervals. The gap when women use their own thresholds has no significant relationship with GDP per capita. When women use men’s thresholds, there is somewhat of an inverse-U shaped relationship, with the remaining gap being at its highest among middle-income countries, implying that men’s average happiness advantage is greatest among low- and high-income countries. However, the coefficients are only marginally significant, and graphically, the inverse-U shape does not appear to be a strong relationship.

Table 16 considers the effects of some measures of women’s rights and status.
Data on the percent of parliament made up of women comes from the World Bank Development Indicators, while the other measures come from the OECD Development Centre’s Social Institutions & Gender Index (SIGI). SIGI rates countries’ gender equality on five components: discriminatory family code (e.g. inheritance rights of widows/daughters), restricted physical integrity (e.g. laws on rape, prevalence of female genital mutilation), son bias, restricted resources and assets (e.g. women’s access to financial services), and restricted civil liberties (e.g. political representation, freedom of movement for women); some measures are missing for some countries. SIGI also creates an overall composite score including all five components. In all SIGI measures, which range from zero to one, a low value indicates better gender equality.

Table 16 regresses $Gap_{own}$ and $Gap_m$ on each of these measures separately at the country level. Most measures indicate that having better gender equality reduces $Gap_{own}$, toward higher men’s life satisfaction relative to women’s; son bias has the largest effect in both $Gap_{own}$ and $Gap_m$. A discriminatory family code and women’s restricted access to resources and assets are also somewhat predictive of $Gap_m$.

Footnote 13: Argentina, Myanmar, Syria, Taiwan, Yemen.

Footnote 14: Afghanistan and Senegal; these will be updated in a future draft.

Finally, I consider religion, both as reported at the individual level (Table 17) and as the “main religion” of each country (Table 18). Individual religion was reported by most respondents; for completeness I show results from those that did not answer. Main religion data comes from the Pew Research Center’s Global Religious Landscape survey. A country has a main religion if at least 50% of the country identifies with that religion. “Unaffiliated” means not affiliated with any religion, while “no main religion” means that no religion has at least 50% in that country.
In Table 17, $Gap_{own}$ is the most negative among Jewish respondents, and $Gap_m$ is even more negative. Christians, the largest reporting group, have a significantly positive $Gap_{own}$ and an insignificantly negative $Gap_m$. Besides the relatively small number of Jewish respondents (who are mostly located in Israel), the largest adjustments are among Hindus and Muslims, who move from having a significantly positive $Gap_{own}$ to a significantly negative $Gap_m$. The only group that maintains a positive gap under men’s thresholds ($Gap_m$) is the secular/non-religious. They, along with Christians and those who did not answer the question, had the smallest magnitude adjustments.

Considering each country’s main religion, in Table 18, the findings are similar. Israel, the only Jewish-majority country, has similar $Gap_{own}$ and $Gap_m$ as in the individual religion table. Similarly, most Hindus in Table 17 are in India and Nepal, the two Hindu-majority countries. The main differences between Tables 17 and 18 are in religions that are less concentrated nationally: Christians, Buddhists, and “others.” Individual Christians have a significantly positive $Gap_{own}$ and no significant $Gap_m$, while Christian-majority countries have no significant $Gap_{own}$ and a significantly negative $Gap_m$; Buddhists show a similar pattern. In both cases, the average adjustment is similar in both tables.

7 Discussion & conclusion

7.1 Discussion

I agree with Stiglitz et al. (2009) – GDP isn’t an adequate measure of human welfare, and improvements in GDP per capita are not the best way to evaluate development. Life satisfaction surveys capture dimensions of well-being that income alone cannot. Indeed, the kind of well-being this paper focuses on, evaluative well-being, is just one of three main kinds recently studied (the other two are experiential – how one feels at any moment – and eudemonic – how fulfilling and meaningful one’s life is). All are vital to understanding human welfare.
Yet, simply asking people how satisfied with their lives they are may not show us their underlying “true” life satisfaction. There are good reasons that GDP has been used for so long, despite most people (including the general public, not just social scientists) agreeing that it is a flawed metric: it is relatively consistently measured, and relatively easy to interpret. To begin considering a subjective well-being measure when examining potential policy decisions, we need to thoroughly validate that measure. Asking the incredibly diverse world population the same life satisfaction question and comparing their responses is not enough.

The World Happiness Report summarizes findings related to happiness around the world, and describes how happiness surveys can be used to evaluate progress. In a chapter of the 2015 report, titled “How to Make Policy When Happiness is the Goal,” Richard Layard and Gus O’Donnell propose assessing policy options by determining a critical happiness per dollar of cost, and enacting policies that have the highest happiness per dollar above that threshold (Layard and O’Donnell, 2015). To apply this methodology, we need valid happiness measures. Consider Graham and Chattopadhyay (2012)’s finding that women’s happiness actually decreased following improvements in gender equality. It may be that their response scales changed based on their new life experiences, or they may have become less happy for any number of reasons. In any case, improvements in gender equality should not be avoided because of this. Using only self-reports as a basis can lead to promoting policies that encourage higher happiness reporting, rather than genuine improvements in well-being.
When we compare raw life satisfaction self-reports across heterogeneous groups, we are making many assumptions, such as: that it is possible to distill one’s life satisfaction into a single digit; that subjects are interpreting the question in the same way; that they are considering the same domains of life when responding; that they are using the response scale in the same way. Research in life satisfaction progresses not only by finding empirical (and theoretical) associations between happiness and other factors, but also by examining and testing the above assumptions and others.

In one of the pioneering studies in life satisfaction, Cantril (1965) helped to validate this field of research with a qualitative study in 14 countries, asking respondents to name, without a provided list of possible responses, what constituted the best life to them. He found that people list the same factors regardless of where they are from: financial concerns, family, health, work, and other personal concerns. The consistency of their responses has helped to justify cross-country comparisons of life satisfaction self-reports, by evidencing that they are measuring the same thing.

De Neve et al. (2013) further demonstrated the usefulness of subjective well-being data, finding that lagged happiness predicts future health, productivity, and social outcomes. Stevenson and Wolfers (2009) found that research comparing across countries that use different survey methods (question ordering, response coding changes, changes in exact question text, etc.) can lead to bias, and suggested an alternative method. Several authors (e.g. Diener et al., 2013) have discussed within-individual reliability of life satisfaction ratings over relatively short periods.

This chapter is an attempt to test one assumption listed above, that respondents are using the response scale in the same way when rating their own life satisfaction.
It focuses on the often-observed life satisfaction gender gap as a case in which observed conditions (lower incomes, less education, worse self-reported health, increased divorce/widowhood; higher incidence of depression, anxiety, and attempted suicide; restricted rights, representation, and autonomy; and so on) conflict with life satisfaction self-reports. Of course, counter-intuitive findings need not be cause for alarm in empirical research – many such results in economics make perfect sense once the explanation is found. In the present case, the explanation may be simply that women are more satisfied with their lives than men, in spite of their disadvantages, for cultural or biological reasons. My results do indicate that there is a positive marginal gender effect for women, but that there is also a gender scale bias. Table 12 shows that, with vignette adjustment, the female-male life satisfaction gender gap reverses by about 150%: when women’s life satisfaction is estimated on men’s response scale, women are less satisfied than men on average.

Yet, the model I use to test the above assumption imposes key assumptions of its own: vignette equivalence and response consistency. Although I statistically test the vignette equivalence assumption, no statistical test can answer for sure, based on the information collected, whether all subjects were interpreting the anchoring vignettes in the same way. The vignettes are only a few lines each, and women may “fill in the gaps” more optimistically, imagining the character’s life as more full and fulfilling.

Similarly, the assumption of response consistency may be wrong. As Deaton (2011) noted regarding the use of anchoring vignettes on disability, respondents may have systematic differences in their ability to empathize with vignette characters. Men may be “tougher” in rating the lives of others than they are in rating themselves. Women may be more able to imagine themselves in the vignette characters’ positions.
Research has indicated that women are more empathetic than men (e.g. Macaskill et al., 2002), although it is not clear that more empathy should lead to strictly higher vignette ratings. The question of what is actually being recorded in life satisfaction self-reports is not closed, and future research should continue to confront it. An open-ended, qualitative study similar to that of Cantril (1965), asking respondents what they have in mind while rating the vignettes – what they imagined, whether they compared the vignette subject to someone in their own lives or themselves, whether/how they filled in the gaps to imagine a full-fledged individual to rate – may shed light on how subjects rate vignette characters. As suggested by Deaton (2011), having individuals rate the life satisfaction of vignette characters with the same attributes as themselves can help test the assumption of response consistency. Using panel data with individual fixed effects makes response scale differences less likely to cause bias, though it is limited in two key ways: it does not allow researchers to evaluate the impact of time-invariant conditions such as gender, and we do not know whether response scale use changes over time. If scale use is changing over time, and over long periods it certainly may, the same source of bias can arise. If a woman says that she is an 8 on a 0-10 scale, and a man says that he is a 6, should we not take them at their word? In cases where there are objective standards, such as in political efficacy (King et al., 2004) or alcohol consumption (Van Soest et al., 2011), vignettes have generally done a good job in explaining differences in responses across countries, social groups, or individuals. One can interpret this as different people having different response scales (“a lot” does not mean the same everywhere) or different standards (I am more easily satisfied with my political influence because I don’t expect much).
These two alternative interpretations are generally impossible to disentangle. This is certainly the case when no objective measure exists (as in the case of life satisfaction). One interpretation of my findings is that women use different scales; the other one is that women have different standards. I find that women rate vignettes higher than men do. Under the response consistency assumption, this means that if they themselves were in the situation of the vignette person they would rate their own satisfaction higher than men would. Whether this means that they only use different words to describe their situation or that they would be genuinely happier than men in the same situation is impossible to know. My results do not challenge many other cross-sectional conclusions – indeed it is reassuring that the systematic differences in response scale use are relatively small beyond gender. Many cross-sectional studies control for gender and even allow for heterogeneous effects. With adequate controls, my results support much of the past research. Still, this article provides evidence of a case in which we should not solely rely on life satisfaction self-reports for cross-sectional comparisons. Policymakers (and the academics who influence them) should think carefully about what their subjective well-being measures are measuring before attempting to use them for policy.

7.2 Conclusion

Although there is heterogeneity, on a global scale, women report being more satisfied with their lives than men – considering self-reports only, the marginal effect of being female rather than male is equivalent to moving up one’s country’s income distribution by more than a decile. Yet my analysis suggests that much of the gap is due to men and women using the life satisfaction response scale differently, that the true marginal gender effect is much smaller, and that on average, women have lower life satisfaction than men do.
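To make "using the response scale differently" concrete, the following is a minimal Python sketch, not the estimation code behind this chapter. It assumes a five-point rating produced by comparing an unobserved (latent) satisfaction level against four cut points; the cut-point values, the size of the shift, and the function name `rating` are all hypothetical and chosen only for illustration.

```python
from bisect import bisect_left


def rating(latent, thresholds):
    """Map a latent satisfaction level to a rating from 1 to 5.

    `thresholds` holds the four increasing cut points between the five
    response categories. The rating is 1 plus the number of cut points
    that lie strictly below the latent level.
    """
    return 1 + bisect_left(thresholds, latent)


# Hypothetical cut points (illustrative values, not estimates from the text).
baseline_thresholds = [-1.5, -0.5, 0.5, 1.5]

# Hypothetical respondent whose cut points all sit slightly lower.
shift = 0.2
lower_thresholds = [t - shift for t in baseline_thresholds]

# Both respondents have the same underlying satisfaction level.
latent = 0.4

rating_baseline = rating(latent, baseline_thresholds)
rating_lower = rating(latent, lower_thresholds)

print(rating_baseline, rating_lower)
```

With these illustrative numbers, the respondent with lower cut points reports a higher category for the same latent level, which is the pattern a model with fixed, common thresholds would misread as higher satisfaction.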
I use an anchoring vignette-adjusted model. In that model, individual characteristics can affect not only an individual’s life satisfaction, but also the thresholds between ratings on the response scales, moving them up (so that the same underlying life satisfaction level results in a lower rating) or down (so that the same underlying life satisfaction level results in a higher rating). I find that the marginal effect of gender is still positive, but much smaller: less than half as large as the unadjusted model estimates. And by simulating each woman’s life satisfaction self-report using her own thresholds as well as men’s thresholds, I find that putting men and women on the same scale completely reverses the gap: in the end, women are less happy, and the reversed gender gap is about half as large as the original. The reason the unadjusted model exaggerates the gender effect is that it cannot distinguish between an increase in life satisfaction and a lowering of response thresholds. There is a positive effect of being female on life satisfaction, but women also give a higher rating for the same underlying life satisfaction, because their response thresholds are lower. My results are consistent with several previous papers’ findings on the gender gap in life satisfaction. Arrossa and Gandelman (2016) conducted an analysis similar to my Oaxaca-Blinder decomposition and reached the same conclusion: the life satisfaction gap should be larger given women’s observed characteristics and their preferences over those characteristics. Indeed, the authors concluded that women must be more optimistic than men in their self-reports, which is precisely what my results show. Stevenson and Wolfers (2009) found that women’s happiness was declining, in absolute terms and relative to men’s, in the United States and Western Europe over the last several decades. It may be that this is true; it may also be explained, at least in part, by women’s response scales converging with men’s.
As other gender gaps have closed or narrowed, gender differences in happiness self-reports have likely narrowed as well. It is also worth noting, to the credit of existing life satisfaction research, that other variables (marital status, employment status, education, health, income) influence life satisfaction in the same direction and in most cases with similar magnitudes in vignette-adjusted and unadjusted models. These results appear to be quite robust across many contexts. Unfortunately, the scale adjustments identified in this study are not enough to solve the problem of scale differences going forward; we cannot simply ask for life satisfaction ratings and then scale women’s responses down a little. Scale differences clearly vary geographically, as demonstrated here, and they likely vary over time as well. It is not trivial to add anchoring vignettes to a survey – respondents incur an effort cost for each additional question, and survey designers have the difficult task of creating vignettes that will result in both response consistency and vignette equivalence. As shown in Section 5.3.2, vignette equivalence is not always easy to achieve. Here I’ve shown that looking only at self-reports leads to a misunderstanding of gender differences. Any comparison between groups – residents of different countries, people with different ethnic backgrounds, etc. – may suffer from a similar bias. Unfortunately, we cannot know what scale differences exist without including anchoring vignettes in surveys. Even with excellent vignettes, we cannot distinguish whether they are capturing the fact that heterogeneous groups use the scale differently, or whether they have different standards, as discussed in Section 7.1. Continuing research into both the determinants and measurement of subjective well-being is crucial before such measures are used in policymaking.

References

Angelini, V., Cavapozzi, D., Corazzini, L., and Paccagnella, O. (2012).
Age, Health and Life Satisfaction Among Older Europeans. Social Indicators Research, 105(2):293–308.
Arrossa, M. L. and Gandelman, N. (2016). Happiness Decomposition: Female Optimism. Journal of Happiness Studies, 17(2):731–756.
Bago d’Uva, T., van Doorslaer, E., Lindeboom, M., and O’Donnell, O. (2008). Does reporting heterogeneity bias the measurement of health disparities? Health Economics, 17(3):351–375.
Blinder, A. S. (1973). Wage Discrimination: Reduced Form and Structural Estimates. Journal of Human Resources, 8(4):436–455.
Callegaro, M. and Disogra, C. (2008). Computing response metrics for online panels. Public Opinion Quarterly, 72(5):1008–1032.
Cantril, H. (1965). The Pattern of Human Concerns. Rutgers University Press, New Brunswick.
De Neve, J.-E., Diener, E., Tay, L., and Xuereb, C. (2013). The objective benefits of subjective well-being. CEP Discussion Paper No 1236, (1236):1–35.
Deaton, A. (2011). Comment on “Work Disability, Work, and Justification Bias in Europe and the United States”. In Wise, D. A., editor, Explorations in the Economics of Aging, pages 312–314. University of Chicago Press, Chicago.
Diener, E., Inglehart, R., and Tay, L. (2013). Theory and Validity of Life Satisfaction Scales. Social Indicators Research, 112(3):497–527.
D’Uva, T. B., Lindeboom, M., O’Donnell, O., and van Doorslaer, E. (2011). Slipping Anchor? Testing the Vignettes Approach to Identification and Correction of Reporting Heterogeneity. The Journal of Human Resources, 46(4):875–906.
Easterlin, R. A. (2003). Happiness of Women and Men in Later Life: Nature Determinants and Prospects. In Advances in Quality-of-Life Theory and Research, volume 20, pages 13–26. Springer Netherlands.
Ferrer-i-Carbonell, A. and Frijters, P. (2004). How Important is Methodology for the Estimate of the Determinants of Happiness? The Economic Journal, 114(497):641–659.
Graham, C. and Chattopadhyay, S. (2012). Human Capital and Economic Opportunity: Gender and Well-Being around the World: Some Insights from the Economics of Happiness. Brookings Institution.
Grol-Prokopczyk, H., Freese, J., and Hauser, R. M. (2011). Using Anchoring Vignettes to Assess Group Differences in General Self-Rated Health. Journal of Health and Social Behavior, 52(2):246–261.
Grzymala-Busse, A. (2007). Party competition and state exploitation in post-communist democracies. Cambridge University Press.
Helliwell, J., Layard, R., and Sachs, J., editors (2015). 2015 World Happiness Report. United Nations, New York.
Herbst, C. M. (2011). ‘Paradoxical’ decline? Another look at the relative reduction in female happiness. Journal of Economic Psychology, 32(5):773–788.
Kahneman, D. and Deaton, A. (2010). High income improves evaluation of life but not emotional well-being. Proceedings of the National Academy of Sciences, 107(38):16489–16493.
Kapteyn, A., Smith, J. P., and Van Soest, A. (2007). Vignettes and Self-Report of Work Disability in the United States and the Netherlands. American Economic Review, 97(1):461–473.
Kapteyn, A., Smith, J. P., and Van Soest, A. (2010). Life Satisfaction. In Diener, E., Kahneman, D., and Helliwell, J., editors, International Differences in Well-Being, chapter 4, pages 70–104. Oxford University Press, Oxford, UK.
King, G., Murray, C. J. L., Salomon, J. A., and Tandon, A. (2004). Enhancing the Validity and Cross-Cultural Comparability of Measurement in Survey Research. American Political Science Review, 98(1):191–207.
Kristensen, N. and Johansson, E. (2008). New evidence on cross-country differences in job satisfaction using anchoring vignettes. Labour Economics, 15(1):96–117.
Layard, R. (2005). Rethinking Public Economics: The Implications of Rivalry and Habit. Economics and Happiness: Framing the Analysis, March.
Layard, R. and O’Donnell, G. (2015). How to Make Policy when Happiness is the Goal. In 2015 World Happiness Report, chapter 4, pages 76–85. United Nations.
Lima, S. V. (2011). A Cross-Country Investigation of the Determinants of the Happiness Gender Gap. Mimeo.
Macaskill, A., Maltby, J., and Day, L. (2002). Forgiveness of Self and Others and Emotional Empathy. The Journal of Social Psychology, 142(5):663–665.
Meisenberg, G. and Woodley, M. A. (2015). Gender Differences in Subjective Well-Being and Their Relationships with Gender Equality. Journal of Happiness Studies, 16(6):1539–1555.
Molina, T. (2016). Reporting Heterogeneity and Health Disparities Across Gender and Education Levels: Evidence From Four Countries. Demography, 53(2):295–323.
Murray, C. J., Ozaltin, E., Tandon, A., Salomon, J. A., Sadana, R., and Chatterji, S. (2003). Empirical Evaluation of the Anchoring Vignette Approach in Health Surveys. In Health Systems Performance Assessment: Debates, Methods and Empiricism, pages 369–399.
Nolen-Hoeksema, S. and Rusting, C. L. (1999). Gender differences in well-being. In Well-being: The foundations of hedonic psychology, pages 330–350.
Oaxaca, R. (1973). Male-Female Wage Differentials in Urban Labor Markets. International Economic Review, 14(3):693–709.
Oaxaca, R. L. and Ransom, M. R. (1994). On discrimination and the decomposition of wage differentials. Journal of Econometrics, 61(1):5–21.
Plagnol, A. C. and Easterlin, R. A. (2008). Aspirations, attainments, and satisfaction: Life cycle differences between American women and men. Journal of Happiness Studies, 9(4):601–619.
Stevenson, B. and Wolfers, J. (2009). The Paradox Of Declining Female Happiness. American Economic Journal: Economic Policy, 1(2):190–228.
Stiglitz, J., Sen, A., and Fitoussi, J. P. (2009). Report by Stiglitz Commission on the Measurement of Economic Performance and Social Progress.
UN Women (2015). Progress of the world’s women 2015-2016: Transforming economies, realizing rights. Technical report, New York.
Van Soest, A., Delaney, L., Harmon, C., Kapteyn, A., and Smith, J. P. (2011).
Validating the use of anchoring vignettes for the correction of response scale differences in subjective questions. Journal of the Royal Statistical Society. Series A: Statistics in Society, 174(3):575–595.
Yun, M. S. (2005). A simple solution to the identification problem in detailed wage decompositions. Economic Inquiry, 43(4):766–772.
Zweig, J. S. (2014). Are Women Happier than Men? Evidence from the Gallup World Poll. Journal of Happiness Studies, 16(2):515–541.

Tables and figures

Table 1: Summary statistics
Variable | Overall | Female | Male | Difference | Std. Err.
Life satisfaction | 2.966 | 2.980 | 2.950 | 0.030*** | 0.010
Equivalized household income | 7,058.70 | 6,397.51 | 7,838.88 | -1,441.37*** | 115.93
Primary education or less | 34.8% | 37.6% | 31.4% | 6.2%*** | 0.4%
Secondary education | 50.7% | 49.1% | 52.7% | -3.6%*** | 0.5%
Tertiary education & up | 14.5% | 13.3% | 15.9% | -2.5%*** | 0.3%
Single/never married | 27.2% | 23.6% | 31.4% | -7.8%*** | 0.4%
Married/domestic partner | 60.3% | 59.6% | 61.1% | -1.6%*** | 0.5%
Separated/divorced/widowed | 12.5% | 16.9% | 7.5% | 9.4%*** | 0.3%
Health problems | 23.9% | 25.9% | 21.4% | 4.5%*** | 0.4%
Unemployed | 6.7% | 6.8% | 6.6% | 0.2% | 0.2%
N | 45,731 | 24,753 | 20,978 | 3,775 | -

Table 2: Ordered probit regression
Variable | (1) | (2)
Female | 0.028** (0.014) | 0.087*** (0.015)
Age | -0.012*** (0.002) | -0.015*** (0.002)
Age squared | 0.000*** (0.000) | 0.000*** (0.000)
Urban | | 0.054*** (0.021)
Married | | 0.050*** (0.019)
Separated | | -0.120*** (0.036)
Divorced | | -0.191*** (0.034)
Widowed | | -0.127*** (0.032)
Domestic partner | | -0.042 (0.032)
Employed full time for self | | 0.008 (0.018)
Employed part time do not want full time | | 0.029 (0.032)
Unemployed | | -0.210*** (0.030)
Employed part time want full time | | -0.080*** (0.029)
Out of workforce | | -0.006 (0.024)
Secondary education | | 0.147*** (0.021)
Tertiary education | | 0.302*** (0.029)
Health problems | | -0.246*** (0.032)
Log equivalized household income | | 0.236*** (0.015)
Observations | 45,731 | 45,731
Pseudo R-squared | 0.082 | 0.108
Ordered probit regression of life satisfaction on listed variables as well as country fixed effects (103 countries
included).

Table 3: Ordered probit marginal effects
Characteristic | Comparison | Life sat. | Change | x(gender eff.)
Income at 10th percentile | - | 2.746 | - | -
Income at 20th percentile | 10th percentile | 2.826 | 0.080 | 1.05
Income at 30th percentile | 20th percentile | 2.881 | 0.055 | 0.72
Income at 40th percentile | 30th percentile | 2.927 | 0.047 | 0.61
Income at 50th percentile | 40th percentile | 2.970 | 0.043 | 0.57
Income at 60th percentile | 50th percentile | 3.011 | 0.041 | 0.54
Income at 70th percentile | 60th percentile | 3.056 | 0.045 | 0.59
Income at 80th percentile | 70th percentile | 3.110 | 0.053 | 0.70
Income at 90th percentile | 80th percentile | 3.183 | 0.074 | 0.97
Low education | - | 2.862 | - | -
Medium education | Low | 2.992 | 0.129 | 1.70
High education | Low | 3.127 | 0.264 | 3.47
No health problems | - | 3.017 | - | -
Health problems | No health problems | 2.802 | -0.215 | -2.83
Single | - | 2.960 | - | -
Married | Single | 3.003 | 0.043 | 0.57
Separated | Single | 2.855 | -0.104 | -1.37
Divorced | Single | 2.793 | -0.167 | -2.19
Widowed | Single | 2.849 | -0.111 | -1.45
Domestic partner | Single | 2.923 | -0.036 | -0.48
Employed full time for employer | - | 2.982 | - | -
Employed full time for self | Employed full time for emp. | 2.989 | 0.007 | 0.09
Employed part time, don’t want full time | Employed full time for emp. | 3.008 | 0.026 | 0.34
Unemployed | Employed full time for emp. | 2.799 | -0.183 | -2.40
Employed part time, want full time | Employed full time for emp. | 2.912 | -0.070 | -0.92
Out of the workforce | Employed full time for emp. | 2.977 | -0.005 | -0.07
Rural | - | 2.947 | - | -
Urban | Rural | 2.994 | 0.047 | 0.62
Male | - | 2.925 | - | -
Female | Male | 3.001 | 0.076 | 1.00
Full sample (N = 45,731). Marginal effects of changing various characteristics, based on ordered probit regression coefficients in Table 2, column 2. Income percentiles are from respondent’s own country. “Life sat.” shows estimated life satisfaction with that characteristic, all other characteristics held constant. “x(gender eff.)” compares the magnitude of that marginal effect with the gender effect in the last row.
Low education is elementary or less, medium education is secondary up to 3 years tertiary, high education is 4 years tertiary or more.

Table 4: Ordered probit regressions, by region
Region | N | Coef. | Std. err. | p-value | Marg. eff.
Transition economies | 7,683 | 0.052** | 0.026 | 0.041 | 0.045
Latin America & the Caribbean | 4,791 | 0.052 | 0.034 | 0.119 | 0.049
East Asia | 3,551 | 0.066* | 0.038 | 0.082 | 0.054
Sub-Saharan Africa | 8,677 | 0.075*** | 0.024 | 0.002 | 0.066
Central America | 2,503 | 0.077* | 0.045 | 0.089 | 0.080
Southeast Asia | 3,459 | 0.104*** | 0.039 | 0.007 | 0.089
Western Europe | 3,215 | 0.115*** | 0.039 | 0.004 | 0.091
South Asia | 4,579 | 0.153*** | 0.040 | 0.000 | 0.132
Aus/NZ/US/Can | 1,655 | 0.213*** | 0.056 | 0.000 | 0.160
Middle East & North Africa | 5,618 | 0.163*** | 0.032 | 0.000 | 0.161
Ordered probit regressions of life satisfaction on female, age, age squared, urban indicator, marital status indicators, employment status indicators, education level indicators, health problems indicator, and log equivalized household income. Coefficients shown are those on the female indicator.

Table 5: Vignette ratings, by gender
Vignette | M/F | N | F avg | M avg | Difference | p-value | Result
Vignette A1 | F | 45,411 | 2.524 | 2.493 | 0.031*** | 0.000 | Women higher
Vignette A2 | M | 45,518 | 3.474 | 3.422 | 0.052*** | 0.000 | Women higher
Vignette A3 | M | 45,435 | 2.325 | 2.286 | 0.039*** | 0.000 | Women higher
Vignette A4 | F | 45,454 | 2.915 | 2.886 | 0.029*** | 0.002 | Women higher
Vignette A5 | F | 45,426 | 2.273 | 2.251 | 0.022*** | 0.002 | Women higher
Vignette A6 | M | 45,403 | 3.572 | 3.513 | 0.059*** | 0.000 | Women higher
t-tests comparing men’s and women’s average ratings. Result column is based on significance at the 10% level.
Table 6: Feelings yesterday, by gender
Feeling | N | F avg | M avg | Diff | p-value | Result
Pain | 45,542 | 0.315 | 0.274 | 0.040*** | 0.000 | Women more
Worry | 45,401 | 0.374 | 0.341 | 0.033*** | 0.000 | Women more
Sadness | 45,369 | 0.238 | 0.193 | 0.046*** | 0.000 | Women more
Stress | 45,300 | 0.323 | 0.312 | 0.012*** | 0.004 | Women more
Anger | 45,362 | 0.199 | 0.190 | 0.008** | 0.013 | Women more
Enjoyment | 44,770 | 0.716 | 0.719 | -0.003 | 0.771 | Neither
Well rested | 45,415 | 0.680 | 0.693 | -0.013*** | 0.001 | Men more
Treated with respect | 44,808 | 0.870 | 0.863 | 0.007** | 0.014 | Women more
Learned something | 44,808 | 0.502 | 0.528 | -0.026*** | 0.000 | Men more
Smiled or laughed | 44,935 | 0.739 | 0.725 | 0.015*** | 0.000 | Women more
Yes/no responses to “Did you experience the following feelings during a lot of yesterday? How about ...?”

Table 7: Linear Blinder-Oaxaca decomposition of gender life satisfaction gap
Summary | Value | % of overall gap
Women average | 2.980 |
Men average | 2.950 |
Observations | 45,731 |
Overall gap | 0.030 | 100%
Explained | -0.076 | -253%
Total unexplained | 0.105 | 350%
Observed coefficients | 0.025 | 83%
Group membership | 0.080 | 267%

Variable | Explained | % of gap | Unexplained | % of gap
Urban | 0.001 (0.001) | 3% | 0.016 (0.010) | 53%
Age | 0.005 (0.004) | 17% | 0.140** (0.065) | 467%
Age squared | -0.003 (0.004) | -10% | -0.103** (0.041) | -343%
Single | -0.004** (0.002) | -13% | -0.001 (0.010) | -3%
Married | -0.001 (0.001) | -3% | -0.010 (0.015) | -33%
Separated | 0.000 (0.000) | 0% | 0.003** (0.001) | 10%
Divorced | -0.003*** (0.001) | -10% | -0.001 (0.002) | -3%
Widowed | -0.010*** (0.002) | -33% | -0.002 (0.002) | -7%
Domestic partner | 0.002* (0.001) | 7% | -0.001 (0.002) | -3%
Employed full time for employer | -0.009*** (0.003) | -30% | -0.023*** (0.007) | -77%
Unemployed | -0.000 (0.001) | 0% | 0.005** (0.002) | 17%
Self employed | -0.004** (0.002) | -13% | -0.000 (0.005) | 0%
Employed part time (don’t want full time) | 0.000 (0.000) | 0% | -0.002 (0.003) | -7%
Employed part time (want full time) | 0.000 (0.000) | 0% | -0.002 (0.002) | -7%
Out of the workforce | 0.005 (0.004) | 17% | 0.020** (0.008) | 67%
Elementary education or less | -0.009*** (0.002) | -30% | -0.008 (0.007) | -27%
Secondary education | -0.001* (0.000) | -3% | 0.008 (0.007) | 27%
Tertiary education | -0.003*** (0.001) | -10% | 0.001 (0.003) | 3%
Health problems | -0.010*** (0.002) | -33% | -0.009 (0.007) | -30%
Log equivalized household income | -0.031*** (0.011) | -103% | -0.005 (0.087) | -17%
Constant (group membership) | | | 0.080 (0.093) | 267%
Decomposes the life satisfaction gap into the “explained” (characteristics) portion and the “unexplained” (coefficients) portions. The explained column shows the amount of the gap (value and percentage of the overall gap) explained by men and women’s different characteristics – a negative value means the gap increases. For example, women have lower education levels, so they should have lower life satisfaction than men. The unexplained column shows the amount of the gap explained by how men and women differentially value those characteristics. For example, women get a smaller reduction from being unemployed than men do. The last line, group membership, is the remaining gap that is attributable only to gender.

Figure 1: Illustration of differential item functioning (DIF). Adapted from a similar figure in Kapteyn et al. (2007).

Table 8: Benefit of the Doubt (Spearman) Rank Order Correlation Coefficient
Statistic | Vignette set A | Vignette set B
Median BDROCC | 0.829 | 0.543
Mean BDROCC | 0.650 | 0.463
% perfect | 13.9% | 5.4%
% near perfect | 58.5% | 32.8%
Near perfect means above a correlation value that allows at most one double-rank inversion (includes one single-rank inversion and two single-rank inversions).

Table 9: Vignette equivalence testing: effect of gender on vignette ratings
Each cell shows Coef. (Std. err.).
Vignette | Both vignette sets, female & age only | Both vignette sets, all variables | Vignette set A only, female & age only | Vignette set A only, all variables
Vignette A1 | - | - | - | -
Vignette A2 | 0.003 (0.013) | 0.012 (0.016) | 0.002 (0.015) | 0.014 (0.017)
Vignette A3 | 0.021 (0.013) | 0.005 (0.015) | 0.024 (0.014) | 0.006 (0.016)
Vignette A4 | -0.001 (0.012) | -0.001 (0.014) | -0.002 (0.012) | -0.001 (0.015)
Vignette A5 | 0.004 (0.013) | 0.000 (0.014) | 0.005 (0.014) | 0.001 (0.015)
Vignette A6 | 0.003 (0.015) | 0.021 (0.017) | 0.002 (0.016) | 0.024 (0.018)
Vignette B1 | 0.002 (0.014) | 0.009 (0.015) | - | -
Vignette B2 | 0.038** (0.017) | 0.031 (0.019) | - | -
Vignette B3 | 0.026* (0.015) | 0.006 (0.017) | - | -
Vignette B4 | -0.032** (0.015) | -0.043** (0.017) | - | -
Vignette B5 | -0.045*** (0.015) | -0.064*** (0.016) | - | -
Vignette B6 | -0.014 (0.014) | -0.033** (0.017) | - | -

Table 10: HOPIT regression
Variable | Life satisfaction equation (1) | Life satisfaction equation (2) | τ1 equation (1) | τ1 equation (2)
Female | -0.019 (0.015) | 0.043*** (0.015) | -0.024** (0.012) | -0.032** (0.013)
Age | -0.006*** (0.002) | -0.012*** (0.002) | 0.006*** (0.001) | 0.005*** (0.001)
Age squared | 0.000 (0.000) | 0.000*** (0.000) | -0.000*** (0.000) | -0.000*** (0.000)
Urban | | 0.057*** (0.020) | | 0.020 (0.020)
Married | | 0.103*** (0.025) | | 0.031 (0.022)
Separated | | -0.124*** (0.042) | | 0.035 (0.054)
Divorced | | -0.158*** (0.042) | | 0.062* (0.035)
Widowed | | -0.107*** (0.037) | | 0.041 (0.031)
Domestic partner | | -0.015 (0.039) | | 0.046 (0.031)
Employed full time for self | | -0.001 (0.022) | | 0.006 (0.024)
Employed part time do not want full time | | 0.017 (0.028) | | -0.034 (0.027)
Unemployed | | -0.236*** (0.032) | | 0.014 (0.024)
Employed part time want full time | | -0.113*** (0.026) | | -0.007 (0.030)
Out of workforce | | -0.014 (0.023) | | -0.011 (0.017)
Secondary education | | 0.158*** (0.021) | | -0.042** (0.017)
Tertiary education | | 0.376*** (0.034) | | -0.088*** (0.025)
Health problems | | -0.238*** (0.029) | | 0.075*** (0.018)
Log equivalized household income | | 0.284*** (0.017) | | 0.015 (0.012)
Observations | 45,731 | 45,731 | 45,731 | 45,731
HOPIT regressions of life satisfaction on listed variables as well as country fixed effects (103 countries included).
The left two columns show the coefficients on the listed variables in the life satisfaction equation, the right two columns show the coefficients on the listed variables in the equation for τ1, the threshold between the lowest response category and the second-lowest. See appendix for the other threshold equations.

Table 11: HOPIT vs. ordered probit marginal effects
Characteristic | Comparison | HOPIT (vignette-adjusted): LS | Change | x(gen. eff.) | Ordered probit (unadjusted): LS | Change | x(gen. eff.)
Income at 10th percentile | - | 2.706 | - | - | 2.746 | - | -
Income at 20th percentile | 10th pctile | 2.799 | 0.093 | 2.54 | 2.826 | 0.080 | 1.05
Income at 30th percentile | 20th pctile | 2.862 | 0.064 | 1.75 | 2.881 | 0.055 | 0.72
Income at 40th percentile | 30th pctile | 2.917 | 0.054 | 1.48 | 2.927 | 0.047 | 0.61
Income at 50th percentile | 40th pctile | 2.967 | 0.051 | 1.39 | 2.970 | 0.043 | 0.57
Income at 60th percentile | 50th pctile | 3.015 | 0.048 | 1.31 | 3.011 | 0.041 | 0.54
Income at 70th percentile | 60th pctile | 3.068 | 0.053 | 1.45 | 3.056 | 0.045 | 0.59
Income at 80th percentile | 70th pctile | 3.131 | 0.063 | 1.73 | 3.110 | 0.053 | 0.70
Income at 90th percentile | 80th pctile | 3.218 | 0.087 | 2.39 | 3.183 | 0.074 | 0.97
Low education | - | 2.847 | - | - | 2.862 | - | -
Medium education | Low | 2.981 | 0.134 | 3.67 | 2.992 | 0.129 | 1.70
High education | Low | 3.166 | 0.320 | 8.76 | 3.127 | 0.264 | 3.47
No health problems | - | 3.008 | - | - | 3.017 | - | -
Health problems | No health probs | 2.807 | -0.201 | -5.51 | 2.802 | -0.215 | -2.83
Single | - | 2.926 | - | - | 2.960 | - | -
Married | Single | 3.013 | 0.087 | 2.38 | 3.003 | 0.043 | 0.57
Separated | Single | 2.822 | -0.104 | -2.86 | 2.855 | -0.104 | -1.37
Divorced | Single | 2.794 | -0.132 | -3.63 | 2.793 | -0.167 | -2.19
Widowed | Single | 2.837 | -0.090 | -2.45 | 2.849 | -0.111 | -1.45
Domestic partner | Single | 2.913 | -0.013 | -0.35 | 2.923 | -0.036 | -0.48
Employed FT for employer | - | 2.983 | - | - | 2.982 | - | -
Employed FT for self | Employed FT for emp. | 2.982 | -0.001 | -0.03 | 2.989 | 0.007 | 0.09
Employed PT, don’t want FT | Employed FT for emp. | 2.998 | 0.015 | 0.40 | 3.008 | 0.026 | 0.34
Unemployed | Employed FT for emp. | 2.784 | -0.199 | -5.45 | 2.799 | -0.183 | -2.40
Employed PT, want FT | Employed FT for emp. | 2.888 | -0.095 | -2.60 | 2.912 | -0.070 | -0.92
Out of the workforce | Employed FT for emp. | 2.971 | -0.012 | -0.32 | 2.977 | -0.005 | -0.07
Rural | - | 2.939 | - | - | 2.947 | - | -
Urban | Rural | 2.987 | 0.048 | 1.31 | 2.994 | 0.047 | 0.62
Male | - | 2.939 | - | - | 2.925 | - | -
Female | Male | 2.976 | 0.036 | 1.00 | 3.001 | 0.076 | 1.00
Full sample (N = 45,731). Marginal effects of changing various characteristics, based on HOPIT regression coefficients in Table 10 column 2 and on ordered probit coefficients in Table 2 column 2. Income percentiles are from respondent’s own country. “LS” shows estimated life satisfaction with that characteristic, all other characteristics held constant. “x(gen. eff.)” compares the magnitude of that marginal effect with the gender effect in the last row. Low education is elementary or less, medium education is secondary up to 3 years tertiary, high education is 4 years tertiary or more.

Table 12: Simulated life satisfaction with own scales and men’s scales, global sample
Row | (1) | (2)
Own response thresholds | |
Female average | 2.972 | 2.972
Male average | 2.945 | 2.945
Female - male gap | 0.027*** | 0.027***
Female Pr(Life sat. = 1) | 9.0% | 9.1%
Female Pr(Life sat. = 2) | 24.7% | 24.4%
Female Pr(Life sat. = 3) | 34.9% | 35.0%
Female Pr(Life sat. = 4) | 22.8% | 23.3%
Female Pr(Life sat. = 5) | 8.6% | 8.2%
Male Pr(Life sat. = 1) | 9.2% | 9.3%
Male Pr(Life sat. = 2) | 25.5% | 25.2%
Male Pr(Life sat. = 3) | 34.6% | 34.8%
Male Pr(Life sat. = 4) | 22.7% | 23.2%
Male Pr(Life sat. = 5) | 7.9% | 7.5%
Men’s response thresholds | |
Female average | 2.933 | 2.931
Male average | 2.945 | 2.945
Female - male gap | -0.012*** | -0.013***
Female Pr(Life sat. = 1) | 9.3% | 9.6%
Female Pr(Life sat. = 2) | 25.7% | 25.3%
Female Pr(Life sat. = 3) | 35.0% | 34.9%
Female Pr(Life sat. = 4) | 22.5% | 22.9%
Female Pr(Life sat. = 5) | 7.6% | 7.4%
% of gap explained | 145% | 149%
Simulated based on coefficients in a global HOPIT model. Specifications match the columns in Table 10. Significance stars indicate the results of t-tests of average life satisfaction by gender in each panel.
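The “% of gap explained” row in Table 12 can be approximated from the two reported gaps. The short Python sketch below assumes the share is computed as the change in the gap divided by the original gap; that formula and the function name `share_of_gap_explained` are assumptions for illustration, not something the table note states. Because the inputs are the rounded gaps printed in the table, the results match the reported 145% and 149% only to within about a percentage point.

```python
def share_of_gap_explained(gap_own, gap_common):
    """Share of the original gap removed by moving to a common scale.

    gap_own:    female - male gap when each group uses its own thresholds
    gap_common: female - male gap when both groups use men's thresholds
    A value above 1 means the gap changes sign (it is more than explained).
    """
    return (gap_own - gap_common) / gap_own


# Rounded gaps as printed in Table 12, columns (1) and (2).
share_col1 = share_of_gap_explained(0.027, -0.012)
share_col2 = share_of_gap_explained(0.027, -0.013)

print(f"column 1: {share_col1:.1%}")
print(f"column 2: {share_col2:.1%}")
```

A share above 100% is what “reversing” the gap means here: the adjustment removes the entire positive gap and leaves a negative one.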
Table 13: Simulated life satisfaction with own thresholds and men’s thresholds, by country

Country                     N_m     N_f    Gap_own     Gap_m       Avg. adj.  Marg. eff.
Albania                     189     229    0.181***   -0.033      -0.213      0.043
Argentina                   131     281   -0.128***   -0.131***   -0.003     -0.080
Armenia                     152     251    0.069*      0.102**     0.033      0.111
Australia                   194     224    0.104***    0.054**    -0.050      0.042
Austria                     222     207   -0.047*     -0.008       0.040      0.036
Azerbaijan                  196     168    0.086**    -0.013      -0.099      0.096
Bahrain                     339     175   -0.065***   -0.171***   -0.106     -0.160
Bangladesh                  222     233    0.169***    0.157***   -0.012      0.243
Belarus                     120     267   -0.329***   -0.308***    0.020     -0.150
Benin                       217     167    0.160***    0.051*     -0.109      0.132
Bolivia                     184     267   -0.020      -0.070***   -0.051      0.069
Bosnia & Herzegovina        169     197    0.003      -0.049      -0.052      0.057
Botswana                    175     184   -0.209***   -0.275***   -0.067     -0.074
Brazil                      176     299   -0.082***   -0.089***   -0.007     -0.038
Bulgaria                    146     280    0.070**     0.119***    0.049      0.092
Cambodia                    150     329   -0.218***   -0.191***    0.027     -0.101
Cameroon                    247     173    0.150***   -0.019      -0.168      0.024
Canada                      213     192    0.146***    0.024      -0.122      0.063
Central Afr. Republic       209     260   -0.027       0.026       0.053      0.056
Chile                       203     227   -0.114***   -0.191***   -0.077     -0.114
China                       804    1141   -0.093***   -0.037**     0.056      0.011
Colombia                    129     294   -0.166***   -0.213***   -0.048     -0.143
Costa Rica                  153     186    0.208**     0.097***   -0.110      0.155
Croatia                     181     162   -0.003***    0.001***    0.004     -0.142
Czech Republic              173     189   -0.098      -0.157      -0.059     -0.205
Dem. Rep. of the Congo      220     137   -0.062**    -0.090***   -0.028      0.032
Dominican Republic          163     281    0.030      -0.014      -0.044     -0.012
El Salvador                 176     217   -0.060*     -0.044       0.015     -0.011
Ethiopia                    146     340    0.072**     0.075**     0.003      0.069
France                      157     206   -0.077**    -0.174***   -0.098     -0.065
Germany                     335     345   -0.024      -0.210***   -0.186     -0.114
Ghana                        91     109   -0.057       0.095*      0.153      0.218
Greece                      164     184    0.133***    0.104**    -0.029      0.174
Guatemala                   222     207   -0.198***   -0.191***    0.008     -0.053
Haiti                        70      67   -0.179***   -0.275***   -0.095     -0.233
Honduras                    216     246    0.093***    0.011      -0.082      0.037
Hungary                     147     225   -0.168***   -0.294***   -0.127     -0.048
India                      1182    1161    0.032**    -0.080***   -0.111      0.051
Indonesia                   221     204   -0.030       0.024       0.054      0.021
Iran                        203     194    0.246***    0.260***    0.015      0.161
Iraq                        258     193    0.146***    0.145***    0.000      0.208
Israel                      206     202   -0.258***   -0.344***   -0.085     -0.132
Italy                        86      98   -0.082      -0.076       0.005     -0.050
Japan                       173     168    0.040      -0.110***   -0.150     -0.069
Jordan                      206     281    0.329***    0.083***   -0.247      0.226
Kazakhstan                  164     212   -0.040*      0.067**     0.107      0.028
Kenya                       208     242    0.046*      0.083***    0.038      0.215
Kuwait                      312     197    0.320***   -0.005      -0.325      0.058
Kyrgyzstan                  202     244   -0.036*     -0.021       0.014     -0.012
Laos                        139     221   -0.109***   -0.116***   -0.006     -0.017
Liberia                      82      76    0.007       0.093       0.086      0.015
Macedonia                   213     212    0.154***    0.175***    0.020      0.198
Madagascar                  168     330   -0.019      -0.018       0.002     -0.035
Malawi                      157     335   -0.053*     -0.180***   -0.128     -0.176
Malaysia                    279     198    0.197***    0.134***   -0.063      0.227
Mauritania                  207     146   -0.163***   -0.192***   -0.029     -0.030
Mexico                      202     207   -0.080**    -0.096***   -0.015     -0.078
Moldova                     174     228   -0.080***    0.016       0.096      0.074
Mongolia                    226     248    0.229***    0.131***   -0.098      0.161
Morocco                     110     108    0.097*      0.107**     0.010      0.285
Myanmar                     201     276    0.025      -0.143***   -0.168     -0.075
Namibia                     130     186    0.109**    -0.007      -0.116      0.220
Nepal                       170     232    0.097***    0.103***    0.005      0.040
New Zealand                 166     242   -0.041      -0.130***   -0.090     -0.068
Nicaragua                   231     184   -0.161***   -0.155***    0.006     -0.081
Nigeria                     223     141    0.002       0.038       0.036      0.127
Pakistan                    278     157    0.438***    0.707***    0.270      0.592
Palestine                   123     334    0.331***    0.223***   -0.108      0.242
Panama                      202     263    0.083**     0.119***    0.036      0.193
Paraguay                    171     271    0.083**     0.058*     -0.025      0.123
Peru                        197     255   -0.092***   -0.085***    0.007     -0.014
Philippines                 188     276    0.219***    0.109***   -0.110     -0.015
Poland                      148     193    0.018      -0.029      -0.047      0.091
Portugal                    188     236    0.004      -0.191***   -0.195     -0.138
Russia                      160     362    0.076**     0.026      -0.049      0.074
Rwanda                      190     236   -0.043      -0.018       0.025      0.080
Saudi Arabia                319     140    0.355***   -0.050**    -0.405     -0.059
Slovakia                    161     240    0.116**     0.049      -0.066      0.077
Slovenia                    209     260    0.066*     -0.005      -0.071      0.084
South Africa                233     259    0.050*      0.118***    0.069      0.169
South Korea                 180     223   -0.083**    -0.224***   -0.141     -0.097
Spain                       177     233    0.062**    -0.051**    -0.113      0.057
Sri Lanka                   191     278    0.144***    0.017      -0.127     -0.033
Sudan                       202     119    0.176***    0.045      -0.130      0.144
Syria                       219     241   -0.092***   -0.069***    0.023     -0.095
Taiwan                      173     215    0.045       0.041      -0.004      0.178
Tajikistan                  189     246   -0.046*     -0.060**    -0.014      0.037
Tanzania                    210     280   -0.031       0.012       0.044      0.057
Thailand                    160     330    0.166***    0.142***   -0.023      0.040
Turkey                      183     271   -0.056**    -0.019       0.036     -0.038
Uganda                      134     160    0.004       0.012       0.008     -0.006
United Arab Emirates        191     208    0.054***   -0.193***   -0.247     -0.238
United Kingdom              171     206    0.123***    0.077**    -0.046      0.085
United States               219     205    0.205***    0.116***   -0.089      0.224
Uruguay                     135     239    0.085***    0.074**    -0.011      0.070
Uzbekistan                  183     242    0.027      -0.088***   -0.115      0.063
Venezuela                   123     219    0.157***    0.160***    0.003      0.250
Vietnam                     122     165   -0.026      -0.034      -0.008      0.042
Yemen                       214     191    0.212***   -0.276***   -0.488     -0.292
Zambia                      205     242    0.039       0.093**     0.053      0.186
Zimbabwe                    183     239    0.127***    0.143***    0.016      0.265
Overall                  20,431  24,346    0.020***   -0.026***   -0.046      0.038

Simulated based on coefficients in HOPIT models run separately for each country, with the same specification as in Table 10. N_m and N_f are observation counts for men and women respectively. Gap_own is the gender life satisfaction gap when men and women use their own thresholds; Gap_m is the gender life satisfaction gap when both men and women use men’s thresholds. Avg. adj.
is the average adjustment in women’s life satisfaction when women in that country move from their own thresholds to men’s thresholds. Marg. eff. is the marginal gender effect in that country.

Table 14: Gap_own and Gap_m by region

                                       N    Gap_own     Gap_m       Avg. adj.
Latin America & the Caribbean      4,791   -0.027**    -0.053***   -0.027***
East Asia                          3,551   -0.026**    -0.035**    -0.009***
Sub-Saharan Africa                 8,198   -0.013*     -0.023**    -0.009***
Western Europe                     3,215    0.000      -0.095***   -0.095***
Transition countries               7,683    0.000      -0.029***   -0.029***
Southeast Asia                     3,459    0.005      -0.035**    -0.040***
Central America                    2,503    0.018      -0.001      -0.019***
Middle East & North Africa         5,618    0.022      -0.120***   -0.142***
South Asia                         4,104    0.080***    0.020*     -0.060***
Australia/New Zealand/Canada/USA   1,655    0.108***    0.022      -0.086***

Significance stars indicate the results of t-tests of Gap_own and Gap_m by gender.

Table 15: Gap_own, Gap_m, and GDP per capita

                               (1)         (2)
Log GDP per capita             0.003       0.322*
                              (0.186)     (0.191)
Log GDP per capita squared     0.001      -0.019*
                              (0.010)     (0.011)
Constant                      -0.039      -1.328
                              (0.821)     (0.844)
Observations                   96          96
R-squared                      0.010       0.055

OLS regressions of Gap_own (column 1) and Gap_m (column 2).

Figure 2: Relationship between life satisfaction gender gaps and GDP per capita

Quadratic regressions of Gap_own (left) and Gap_m (right) on log GDP per capita. See Table 15.

Table 16: Gap_own and Gap_m by various measures of gender equality

                                           N      Gap_own     Gap_m
                                                  (1)         (2)
Percent of parliament made up of women     99    -0.002*      0.000
                                                 (0.001)     (0.001)
SIGI: Discriminatory family code           99     0.155***    0.108*
                                                 (0.056)     (0.06)
SIGI: Restricted physical integrity        79     0.041       0.015
                                                 (0.07)      (0.076)
SIGI: Son bias                             82     0.211***    0.211***
                                                 (0.057)     (0.064)
SIGI: Restricted resources and assets      100    0.109**     0.113**
                                                 (0.051)     (0.054)
SIGI: Restricted civil liberties           100    0.070      -0.024
                                                 (0.054)     (0.058)
SIGI: Overall                              72     0.260**     0.123
                                                 (0.111)     (0.124)

Linear regressions of Gap_own and Gap_m on measures of gender equality; see Section 6.1.
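The relationship among the gap and adjustment columns of Tables 13 and 14 can be checked with a few lines of arithmetic. The sketch below assumes (our reading of the table notes, not a statement from the text) that men's simulated satisfaction is the same in both scenarios, so that the change in the gap equals the average adjustment in women's satisfaction up to rounding. The function names are hypothetical; the numbers are copied from the tables.

```python
# Rows copied from Tables 13 and 14: (label, Gap_own, Gap_m, Avg. adj.)
ROWS = [
    ("Albania", 0.181, -0.033, -0.213),
    ("Argentina", -0.128, -0.131, -0.003),
    ("Kuwait", 0.320, -0.005, -0.325),
    ("Yemen", 0.212, -0.276, -0.488),
    ("Overall", 0.020, -0.026, -0.046),
    ("Western Europe", 0.000, -0.095, -0.095),
]


def implied_adjustment(gap_own, gap_m):
    """Change in the female-male gap when women switch to men's thresholds.

    Assumption: men's simulated satisfaction does not change, so this
    difference should match the reported Avg. adj. up to rounding.
    """
    return gap_m - gap_own


def max_discrepancy(rows):
    """Largest absolute difference between implied and reported adjustment."""
    return max(abs(implied_adjustment(own, m) - adj) for _, own, m, adj in rows)


if __name__ == "__main__":
    for label, own, m, adj in ROWS:
        implied = implied_adjustment(own, m)
        print(f"{label:15s} implied={implied:+.3f} reported={adj:+.3f}")
```

For the rows shown, the implied and reported adjustments differ by at most about one unit in the third decimal, which is what rounding of the published figures would produce.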
Table 17: Gap_own and Gap_m by respondent’s religion

                    N    Gap_own     Gap_m       Avg. adj.
Judaism           373   -0.272***   -0.355***   -0.082
Blank           2,927   -0.142***   -0.113***    0.029
Buddhism        2,877    0.022      -0.038**    -0.060
Christianity   21,755    0.023***   -0.008      -0.030
Hinduism        2,433    0.030*     -0.057***   -0.087
Other           2,901    0.040**    -0.033**    -0.072
Islam          10,849    0.067***   -0.014*     -0.082
Secular           662    0.105**     0.070*     -0.035

Significance stars indicate the results of t-tests of Gap_own and Gap_m by gender.

Table 18: Scale adjustment by country’s main religion

                    N    Gap_own     Gap_m       Avg. adj.
Judaism           408   -0.258***   -0.344***   -0.085
Unaffiliated    2,648   -0.103***   -0.085***    0.018
Christianity   24,645    0.008      -0.024***   -0.032
Hinduism        2,745    0.028*     -0.064***   -0.092
NoMain          1,442    0.045**     0.006      -0.038
Islam          10,140    0.071***   -0.014*     -0.086
Buddhism        2,749    0.078***    0.015      -0.063

A religion in a country is the “main” religion if more than 50% of the country identifies with that religion. Significance stars indicate the results of t-tests of Gap_own and Gap_m by gender.

Appendix

Table A1: Benefit of the doubt Spearman rank order correlation coefficients (BDROCCs), by country, for each set of vignettes. See Section 5.3.1.

Table A2: Coefficients in the remaining threshold equations not reported in Table 10.

Tables A3 and A4: Replicate Table 10 (HOPIT model coefficients) and Table 12 (simulated life satisfaction with own and men’s scales) with all non-representative samples removed, as discussed in the Data section.

Table A1: Benefit of the doubt Spearman rank order correlation coefficients

Columns for each vignette set: Country, Median BDROCC, % perfect, % near perfect.

A vignette set                                  B vignette set
Country             BDROCC   % perf.  % near    Country             BDROCC   % perf.  % near
Chad                -0.371     8.5%    29.4%    Palestinian Terr.   -0.086     0.4%     2.7%
Czech Republic       0.257    15.4%    66.6%    UAE                  0.029     0.2%     7.7%
Austria              0.429    20.7%    79.1%    Japan                0.086     0.8%     8.4%
Cameroon             0.486    14.5%    49.8%    Chad                 0.314     4.3%    23.1%
Japan                0.543     6.6%    61.2%    Syria                0.314     1.8%    14.6%
Vietnam              0.543     7.2%    32.6%    Bangladesh           0.371     2.8%    21.0%
Jordan               0.543    13.9%    46.5%    El Salvador          0.371     4.3%    24.1%
Colombia             0.543    21.6%    80.7%    Kuwait               0.371     4.5%    22.3%
New Zealand          0.600    10.1%    68.2%    Liberia              0.371     6.3%    26.3%
Central Afr. Rep.    0.600     9.6%    43.3%    South Korea          0.371     3.2%    24.2%
Myanmar              0.600    24.7%    75.3%    Vietnam              0.371     6.0%    23.9%
Liberia              0.600    11.7%    39.0%    Afghanistan          0.429     1.3%    17.9%
Bolivia              0.600    11.6%    67.5%    Bahrain              0.429     3.9%    24.4%
India                0.600    19.4%    50.4%    Canada               0.429     4.3%    23.0%
Mongolia             0.600    10.9%    59.7%    Central Afr. Rep.    0.429     1.8%    27.9%
Nicaragua            0.600    10.3%    46.8%    Iraq                 0.429     4.7%    23.7%
Germany              0.657     1.8%    69.5%    Israel               0.429     2.0%    20.5%
Bangladesh           0.657     7.9%    39.7%    Mexico               0.429     5.7%    28.3%
Brazil               0.657    12.2%    59.8%    Morocco              0.429     4.0%    24.0%
Greece               0.657    22.8%    76.8%    Saudi Arabia         0.429     5.5%    25.1%
Palestinian Terr.    0.657     2.5%    10.4%    Spain                0.429     3.7%    26.2%
Honduras             0.657    10.8%    44.0%    Thailand             0.429     3.4%    24.2%
France               0.657    21.1%    79.1%    United Kingdom       0.429     3.1%    24.6%
Indonesia            0.657    16.0%    65.9%    Yemen                0.429     3.0%    24.1%
Thailand             0.714    11.2%    57.3%    Uganda               0.457     4.3%    24.7%
Malawi               0.714    19.0%    60.0%    Australia            0.486     3.1%    25.1%
Uganda               0.714     9.5%    43.5%    Germany              0.486     1.9%    23.2%
Cambodia             0.714    12.6%    54.2%    Ghana                0.486     3.5%    28.8%
Croatia              0.714    15.9%    68.0%    Haiti                0.486     7.4%    29.7%
UAE                  0.771     1.0%     7.2%    Kenya                0.486     5.0%    31.0%
Australia            0.771     9.6%    69.3%    Macedonia            0.486     5.0%    28.0%
United Kingdom       0.771     9.8%    72.3%    Panama               0.486     5.4%    27.8%
United States        0.771    15.0%    79.1%    United States        0.486     4.5%    27.8%
Italy                0.771    17.4%    79.8%    Zambia               0.486     3.2%    26.5%
Belarus              0.771    19.8%    71.0%    Armenia              0.543     2.9%    26.4%
Kenya                0.771    11.8%    40.4%    Botswana             0.543     3.2%    29.6%
Iran                 0.771    14.8%    61.9%    Brazil               0.543     3.7%    30.0%
Uzbekistan           0.771    14.1%    43.4%    Czech Republic       0.543     2.7%    32.1%
Ethiopia             0.771    14.5%    60.8%    Dominican Republic   0.543     5.4%    32.6%
Malaysia             0.771    14.1%    64.0%    Guatemala            0.543     5.1%    35.8%
Kuwait               0.771     8.1%    51.7%    Honduras             0.543     5.9%    30.8%
Afghanistan          0.771    11.0%    47.2%    Iran                 0.543     5.5%    30.0%
Mexico               0.771     7.3%    41.3%    Jordan               0.543     4.0%    26.3%
Saudi Arabia         0.771    13.1%    52.2%    Mauritania           0.543     6.5%    33.3%
Mauritania           0.771    11.8%    40.7%    Nepal                0.543     2.2%    27.4%
Slovakia             0.771    16.4%    70.8%    New Zealand          0.543     2.9%    32.6%
Congo (Kinshasa)     0.771    22.9%    61.0%    Nigeria              0.543     2.6%    27.8%
Kyrgyzstan           0.771    17.6%    66.1%    Pakistan             0.543     2.9%    26.3%
Madagascar           0.771    13.8%    56.3%    Philippines          0.543     8.2%    38.6%
Canada               0.829    16.7%    75.1%    Portugal             0.543     7.2%    34.2%
Spain                0.829    16.6%    79.9%    Senegal              0.543     6.8%    31.0%
Russia               0.829    12.8%    58.8%    Slovenia             0.543     5.1%    34.5%
Syria                0.829     2.2%    20.2%    Sri Lanka            0.543     6.8%    31.9%
Yemen                0.829    13.1%    51.8%    Taiwan               0.543     7.1%    34.4%
Panama               0.829     6.6%    38.4%    Tanzania             0.543     6.2%    33.7%
Kazakhstan           0.829    13.5%    55.9%    Uzbekistan           0.543     5.0%    33.3%
Zimbabwe             0.829    19.6%    72.9%    Venezuela            0.543     6.7%    33.7%
Azerbaijan           0.829    22.1%    62.3%    Albania              0.600     6.6%    36.2%
Bulgaria             0.829    25.0%    81.4%    Argentina            0.600     4.9%    37.4%
South Africa         0.829    19.8%    62.2%    Bolivia              0.600     6.3%    38.0%
Iraq                 0.829    11.4%    47.5%    Cambodia             0.600     3.7%    32.2%
Israel               0.829     2.9%    37.8%    Cameroon             0.600     5.1%    36.8%
Morocco              0.829     6.8%    37.1%    Costa Rica           0.600     6.2%    35.0%
Haiti                0.829    13.2%    35.1%    Ethiopia             0.600     5.8%    32.7%
Armenia              0.829    15.8%    54.4%    France               0.600     6.3%    34.8%
Botswana             0.829     9.9%    46.5%    India                0.600     9.3%    36.9%
Dominican Republic   0.829    14.3%    65.4%    Indonesia            0.600     8.0%    35.5%
Pakistan             0.829     6.7%    40.9%    Italy                0.600     6.7%    35.8%
Philippines          0.829    14.3%    60.1%    Kazakhstan           0.600     3.4%    32.8%
Portugal             0.829    13.2%    74.6%    Malawi               0.600     5.4%    36.4%
Senegal              0.829    11.2%    49.0%    Moldova              0.600     6.0%    35.0%
Sri Lanka            0.829    10.4%    54.2%    Mongolia             0.600     5.5%    40.0%
Albania              0.829    20.2%    72.5%    Namibia              0.600     6.5%    36.0%
Costa Rica           0.829    21.3%    72.0%    Nicaragua            0.600     3.1%    36.1%
Namibia              0.829    10.1%    51.3%    Poland               0.600     7.9%    36.9%
Poland               0.829    16.1%    73.7%    Rwanda               0.600     9.0%    40.4%
Rwanda               0.829    15.8%    40.9%    Slovakia             0.600     5.0%    35.3%
Sudan                0.829    15.2%    62.0%    Sudan                0.600     3.8%    30.4%
Tajikistan           0.829    10.2%    34.9%    Tajikistan           0.600     6.5%    35.6%
Uruguay              0.829    15.5%    70.9%    Uruguay              0.600     5.7%    37.6%
Benin                0.829    22.9%    63.7%    Zimbabwe             0.600     1.4%    33.9%
Bosnia Herzegovina   0.829    18.8%    75.9%    Benin                0.629     6.1%    38.0%
Paraguay             0.829    19.0%    60.0%    Austria              0.657     7.9%    40.3%
Macedonia            0.857    10.2%    53.2%    Azerbaijan           0.657     6.8%    35.5%
South Korea          0.886     5.9%    48.9%    Belarus              0.657    10.9%    43.0%
Slovenia             0.886     8.8%    73.0%    Bosnia Herzegovina   0.657     4.4%    42.3%
Bahrain              0.886     9.3%    54.2%    Bulgaria             0.657     6.2%    45.9%
Hungary              0.886    20.6%    74.5%    China                0.657     7.2%    39.0%
El Salvador          0.886     8.9%    38.1%    Colombia             0.657     7.3%    39.9%
Ghana                0.886    13.8%    51.1%    Congo (Kinshasa)     0.657     6.4%    40.7%
Zambia               0.886    11.0%    49.3%    Greece               0.657     4.9%    35.9%
Taiwan               0.886    14.8%    73.8%    Kyrgyzstan           0.657     5.7%    43.6%
China                0.886    19.5%    60.3%    Madagascar           0.657     4.4%    37.3%
Nepal                0.886    13.2%    54.6%    Malaysia             0.657     8.7%    37.8%
Nigeria              0.886    12.0%    52.1%    Myanmar              0.657     8.0%    44.3%
Tanzania             0.886    15.6%    49.5%    Paraguay             0.657    10.3%    41.6%
Venezuela            0.886    14.4%    62.3%    Peru                 0.657     6.8%    44.5%
Argentina            0.886    16.1%    73.9%    Russia               0.657     6.8%    35.3%
Moldova              0.886    11.7%    57.3%    South Africa         0.657     4.6%    43.0%
Peru                 0.886    18.1%    81.7%    Turkey               0.657     8.7%    42.9%
Turkey               0.886    12.4%    53.4%    Chile                0.714     6.4%    45.0%
Laos                 0.886    21.3%    74.8%    Croatia              0.714     7.0%    46.9%
Guatemala            0.943    12.5%    60.1%    Hungary              0.714     6.8%    46.6%
Chile                0.943    20.4%    74.3%    Laos                 0.771     8.3%    51.1%

Near perfect means above a correlation value that allows at most one double-rank inversion (includes one single-rank inversion and two single-rank inversions).

Table A2: Coefficients in equations for thresholds past τ_1 in HOPIT specifications

                                          ln(τ_2 − τ_1)         ln(τ_3 − τ_2)         ln(τ_4 − τ_3)
                                          (1)        (2)        (1)        (2)        (1)        (2)
Female                                    -0.016**   -0.014     -0.003      0.002     -0.032***  -0.022**
                                          (0.008)    (0.009)    (0.008)    (0.008)    (0.008)    (0.009)
Age                                       -0.000     -0.000      0.000      0.000     -0.002     -0.001
                                          (0.001)    (0.001)    (0.001)    (0.001)    (0.001)    (0.001)
Age^2                                     -0.000      0.000     -0.000      0.000     -0.000     -0.000
                                          (0.000)    (0.000)    (0.000)    (0.000)    (0.000)    (0.000)
Urban                                                -0.017                -0.005                 0.013
                                                     (0.017)               (0.011)               (0.014)
Married                                               0.015                 0.012                -0.006
                                                     (0.012)               (0.012)               (0.012)
Separated                                            -0.022                -0.007                -0.053
                                                     (0.032)               (0.023)               (0.034)
Divorced                                              0.008                -0.028                -0.065***
                                                     (0.023)               (0.018)               (0.023)
Widowed                                              -0.016                -0.002                -0.019
                                                     (0.021)               (0.018)               (0.022)
Domestic partner                                      0.004                -0.011                -0.045*
                                                     (0.019)               (0.024)               (0.023)
Employed full time for self                           0.000                -0.022*               -0.010
                                                     (0.015)               (0.012)               (0.017)
Employed part time do not want full time              0.029                -0.019                 0.017
                                                     (0.020)               (0.015)               (0.018)
Unemployed                                           -0.016                -0.013                -0.031
                                                     (0.021)               (0.016)               (0.021)
Employed part time want full time                     0.002                -0.026*               -0.030
                                                     (0.019)               (0.015)               (0.021)
Out of workforce                                      0.017                -0.009                 0.008
                                                     (0.017)               (0.009)               (0.012)
Secondary education                                   0.014                 0.046***              0.072***
                                                     (0.012)               (0.010)               (0.015)
Tertiary education                                    0.075***              0.056***              0.165***
                                                     (0.017)               (0.014)               (0.023)
Health problems                                      -0.045***             -0.028***             -0.035***
                                                     (0.010)               (0.011)               (0.010)
Log equivalized household income                      0.007                 0.003                 0.054***
                                                     (0.006)               (0.006)               (0.008)

Effects of
individual characteristics on the remaining thresholds. Note that while the coefficients in the right columns of Table 10 are for the first threshold, τ_1, the coefficients here are for ln(τ_j − τ_{j−1}). See Section 5.

Table A3: HOPIT regression with non-representative sample removed, all equations

                                          Life sat.             τ_1                   ln(τ_2 − τ_1)         ln(τ_3 − τ_2)         ln(τ_4 − τ_3)
                                          (1)        (2)        (1)        (2)        (1)        (2)        (1)        (2)        (1)        (2)
Female                                     0.047***  -0.025     -0.020     -0.015     -0.022**   -0.026***  -0.000     -0.005     -0.025**   -0.034***
                                          (0.017)    (0.016)    (0.015)    (0.014)    (0.010)    (0.010)    (0.009)    (0.009)    (0.010)    (0.010)
Age                                       -0.013***  -0.006***   0.005***   0.008***  -0.001     -0.001     -0.000     -0.000     -0.001     -0.002
                                          (0.002)    (0.002)    (0.002)    (0.001)    (0.001)    (0.001)    (0.001)    (0.001)    (0.001)    (0.001)
Age^2                                      0.000***  -0.000     -0.000***  -0.000***   0.000      0.000      0.000      0.000     -0.000     -0.000
                                          (0.000)    (0.000)    (0.000)    (0.000)    (0.000)    (0.000)    (0.000)    (0.000)    (0.000)    (0.000)
Urban                                                 0.033                 0.013                -0.003                -0.016                 0.008
                                                     (0.025)               (0.025)               (0.016)               (0.013)               (0.013)
Married                                               0.121***              0.034                 0.002                 0.020                -0.007
                                                     (0.022)               (0.023)               (0.013)               (0.014)               (0.014)
Separated                                            -0.085**               0.050                -0.032                -0.009                -0.049
                                                     (0.037)               (0.060)               (0.036)               (0.025)               (0.036)
Divorced                                             -0.158***              0.058                -0.017                -0.016                -0.059**
                                                     (0.046)               (0.040)               (0.025)               (0.020)               (0.025)
Widowed                                              -0.076*                0.035                -0.019                 0.020                -0.009
                                                     (0.039)               (0.031)               (0.024)               (0.020)               (0.021)
Domestic partner                                     -0.019                 0.031                 0.008                -0.009                -0.039
                                                     (0.040)               (0.032)               (0.022)               (0.027)               (0.025)
Employed full time for self                          -0.032                 0.024                -0.026                -0.021                -0.002
                                                     (0.026)               (0.029)               (0.020)               (0.014)               (0.020)
Employed part time do not want full time              0.017                -0.004                 0.004                -0.032*                0.032*
                                                     (0.030)               (0.031)               (0.018)               (0.017)               (0.017)
Unemployed                                           -0.255***              0.025                -0.034*               -0.021                -0.016
                                                     (0.038)               (0.028)               (0.019)               (0.018)               (0.023)
Employed part time want full time                    -0.156***              0.005                -0.013                -0.031                -0.011
                                                     (0.029)               (0.031)               (0.017)               (0.019)               (0.022)
Out of workforce                                     -0.046*               -0.012                 0.000                -0.015                 0.017
                                                     (0.025)               (0.022)               (0.013)               (0.011)               (0.014)
Secondary education                                   0.140***             -0.051**               0.016                 0.042***              0.080***
                                                     (0.024)               (0.021)               (0.011)               (0.013)               (0.018)
Tertiary education                                    0.359***             -0.106***              0.084***              0.046***              0.178***
                                                     (0.036)               (0.028)               (0.018)               (0.017)               (0.024)
Health problems                                      -0.257***              0.084***             -0.046***             -0.033**              -0.035***
                                                     (0.027)               (0.021)               (0.012)               (0.014)               (0.012)
Log equivalized household income                      0.300***              0.027**               0.008                 0.001                 0.053***
                                                     (0.014)               (0.013)               (0.008)               (0.008)               (0.008)
Observations                              33,991     33,991     33,991     33,991     33,991     33,991     33,991     33,991     33,991     33,991

Duplication of Table 10 with non-representative samples removed. The left two columns show the coefficients on the listed variables in the life satisfaction equation; the remaining columns show the coefficients on the listed variables in the threshold equations. The first of these is τ_1; the following are for ln(τ_j − τ_{j−1}). See Section 5.

Table A4: Simulated life satisfaction with own scales and men’s scales, non-representative samples removed

                              (1)        (2)
Own response thresholds
Female average                3.008      3.007
Male average                  2.968      2.968
Female - male gap             0.04***    0.039***
Female Pr(Life sat. = 1)      8.8%       9.0%
Female Pr(Life sat. = 2)      23.6%      23.1%
Female Pr(Life sat. = 3)      34.7%      34.8%
Female Pr(Life sat. = 4)      23.8%      24.5%
Female Pr(Life sat. = 5)      9.1%       8.7%
Male Pr(Life sat. = 1)        9.1%       9.1%
Male Pr(Life sat. = 2)        24.8%      24.4%
Male Pr(Life sat. = 3)        34.4%      34.6%
Male Pr(Life sat. = 4)        23.6%      24.1%
Male Pr(Life sat. = 5)        8.1%       7.7%
Men’s response thresholds
Female average                2.967      2.970
Male average                  2.968      2.968
Female - male gap             -0.001     0.002
Female Pr(Life sat. = 1)      9.0%       9.2%
Female Pr(Life sat. = 2)      24.7%      24.1%
Female Pr(Life sat. = 3)      34.8%      34.8%
Female Pr(Life sat. = 4)      23.6%      24.1%
Female Pr(Life sat. = 5)      7.9%       7.7%
% of gap explained            103%       96%

Duplication of Table 12 with all non-representative samples removed, as discussed in the Data section. Significance stars indicate the results of t-tests of average life satisfaction by gender in each panel.

1 Introduction

When economists study subjective well-being, there is a natural tendency to concentrate on the role of income.
Indeed the bulk of papers on subjective well-being (SWB from now on) in economics are about the role of money and the question of whether money buys happiness (or life satisfaction; for the purpose of this paper we will use “happiness” and “life satisfaction” interchangeably) and if so, how. A large part of the debate is about whether the effect of income is relative (Can I afford to buy more goods than others in my reference group?) or absolute (My SWB is driven by the absolute level of consumption I can afford). To frame this debate, it is important to realize that many factors beyond money may affect SWB. As we will show, ignoring these other factors introduces an omitted variable bias that may seriously affect inference about the role of income in producing SWB.

The most powerful studies that have addressed the role of money in SWB are based on international comparisons, starting with Richard Easterlin’s seminal 1974 paper (Easterlin, 1974). In that paper, Easterlin concludes that income contributes to self-reported happiness mainly through a person’s position in the income distribution rather than through its absolute level, invoking Duesenberry’s relative income hypothesis (Duesenberry, 1949). Since this early work, the positive relation between income and SWB in within-country cross-sections has been documented in many studies using a variety of new data sources covering well over a hundred countries. Such a positive relation between income and SWB within countries is consistent with both a relative and an absolute income hypothesis. If SWB were completely determined by one’s relative position in the income distribution of a country, then one would not expect to find a relation between average SWB in a country and average income in a country (often operationalized by GDP per capita). Indeed Easterlin (1974) did not find a significant relation between GDP per capita and average SWB in the countries he had data on.
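The cross-country implication of a pure relative income hypothesis can be made explicit with a stylized sketch. The two specifications below are illustrative assumptions, not the models estimated in this paper: s_{ic} denotes the SWB of person i in country c, y_{ic} is income, and bars denote averages within country c.

```latex
% Stylized assumption 1: only relative income matters
s_{ic} = \alpha + \beta \left( \ln y_{ic} - \overline{\ln y}_{c} \right) + \varepsilon_{ic}
\quad\Longrightarrow\quad
\bar{s}_{c} = \alpha + \bar{\varepsilon}_{c}

% Stylized assumption 2: only absolute income matters
s_{ic} = \alpha + \gamma \ln y_{ic} + \varepsilon_{ic}
\quad\Longrightarrow\quad
\bar{s}_{c} = \alpha + \gamma \, \overline{\ln y}_{c} + \bar{\varepsilon}_{c}
```

Within a country, both specifications imply a positive slope of SWB in log income (β or γ), so within-country cross-sections cannot tell them apart. Only the second implies that country averages of SWB rise with average log income, which is why the cross-country evidence carries so much weight in this debate.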
Since then, vast new datasets have become available, and by now a highly significant positive relation between average SWB and GDP per capita has been established in several papers and is widely acknowledged (Deaton, 2008; Stevenson and Wolfers, 2008; Easterlin, 2015). There is considerable debate, however, about what this positive relation implies. It has been argued that this finding supports the absolute income hypothesis (Stevenson and Wolfers, 2008). In contrast, Easterlin argues in a series of papers that the positive cross-sectional correlation between GDP per capita and average SWB reflects factors that are correlated with GDP per capita and have a positive effect on SWB, while GDP itself has no long-term positive effect on average SWB. We discuss this in more detail below.

This paper will replicate the finding of a positive relation between average SWB and GDP per capita, based on newly available data. However, we will ask what it is that GDP buys (or that is correlated with GDP) that makes people more satisfied in more developed economies, in line with the notion that individual SWB is a function of many more variables than just money. We will analyze determinants of SWB within and across countries and in particular investigate the role of what we will call “societal goods”: goods that make life better, such as higher life expectancy, good healthcare provisions, less corruption, freedom, etc. Our findings point to three main conclusions:

1. The relation between GDP and average SWB primarily runs through the societal goods it buys.
2. Within a country, relative income matters most for SWB, while the null that absolute income does not matter cannot be rejected with the data at hand.
3. Previous analyses excluding societal goods suffer from an omitted variable bias that biases the coefficient on relative income downward and the coefficient on absolute income upward.
Thus in the end, we largely return to where Richard Easterlin started: it appears that primarily relative income matters for SWB. Economic growth by itself does not improve the human lot, but a high GDP per capita can improve SWB if spent on a better society.

2 Literature review

In his 1974 paper, Easterlin pulled together a number of different datasets and documented some striking facts. First of all, within countries there is a clear positive relation between income and self-reported happiness. In thirty surveys across eighteen countries this pattern is found without exception. Next, he considers averages of life satisfaction (measured on a 0-10 Cantril scale, as explained in further detail in the next section) across 14 countries and finds only a very weak relation with GDP per capita. A second set of nine datasets collected by Gallup (using a three-point scale: “Very Happy,” “Fairly Happy,” “Not very Happy”) also shows only a weak relation. In both datasets, the U.S. stands out as happiest, but the happiness rankings do not match the GDP rankings. For instance, in the 14-country dataset, Cuba is a close second even though its GNP per capita is less than one fifth of that in the U.S. In the nine-country dataset, Great Britain is second behind the U.S., but France, whose GNP per capita ranks fourth in this dataset, ranks eighth in terms of the percentage of respondents reporting to be very happy.

A third source of information used by Easterlin (1974) is a time series covering the U.S. between 1946 and 1970. The micro-data collected by the American Institute for Public Opinion (AIPO, the predecessor of the Gallup organization) show an increase in the percentage reporting being very happy from 39% in April 1946 to 53% in March 1957, and then a decrease to 43% in December 1970. Over the same period real per capita income increased by more than 60%.
Over the years the quality and quantity of data available to study the relation between income and SWB have improved tremendously. Two datasets stand out: the World Values Survey (WVS) and the Gallup World Poll (GWP). The first wave of WVS was fielded in the period 1981-1984. In total six waves have been fielded so far, with the latest wave (2010-2014) covering 57 countries. [1] The GWP was initiated in 2005. The first wave covered some 130 countries and was mainly conducted in 2006. Since then, it has expanded to several waves covering over 160 countries, estimated to include 99% of the world’s adult population. Respondents in each country are interviewed semi-annually, annually, or bi-annually. [2] Both WVS and GWP contain life satisfaction questions. WVS asks, “Taking all things together, would you say you are very happy, quite happy, not very happy, or not at all happy?” GWP uses Cantril’s ladder for evaluative life satisfaction (our data also use this measure; see the data section for details), including a prospective evaluation of life satisfaction in five years, as well as several affect questions (i.e., whether respondents experienced happiness, stress, pain, etc. during much of the previous day).

[1] www.worldvaluessurvey.org/
[2] http://www.gallup.com/178667/gallup-world-poll-work.aspx

Two important studies published in 2008 establish a highly significant and approximately linear relation between the logarithm of GDP per capita and average SWB. Using the GWP, Deaton (2008) found that in a cross-section, log GDP per capita is highly predictive of average life satisfaction. The most extensive analysis of the relation between income and SWB is provided by Stevenson and Wolfers (2008). They consider the relation between SWB and income within countries and the relation between SWB and GDP per capita across countries, bringing together a large number of datasets, including the original datasets used by Easterlin (1974).
Their general approach for the analysis of cross-country differences is to first use the micro data to run ordered probits on country (or country-year) fixed effects, and then use the estimated dummies as SWB averages per country (or country-year). SWB questions vary across datasets in the number of response categories employed, and this procedure aims to make the SWB responses more comparable. Stevenson and Wolfers generally find strong positive relations between their measures of country averages of SWB and the logarithm of GDP per capita, similar to the findings by Deaton (2008). In some cases with only a limited number of country observations, the relation is not statistically significant, but in larger datasets like WVS (1999-2004) or GWP (2006), the relation between SWB and log GDP per capita is statistically significant and generally appears to be linear.

Earlier studies appeared to find a positive association between GDP per capita and average SWB for lower levels of GDP, which then would flatten once GDP per capita reached a level of about $20,000 (Layard, 2003, 2011). The interpretation of this finding was that income contributes to happiness by providing the basic necessities of life, but once those basic necessities are provided for, further economic growth would not increase average SWB. Both Stevenson and Wolfers (2008) and Deaton (2008) investigate this possibility by splitting the sample of countries by GDP per capita (with a threshold of $15,000 in the case of Stevenson and Wolfers; $12,000 and $20,000 in the case of Deaton). Neither study finds any evidence for this hypothesized ceiling effect, at least in terms of log GDP per capita. In line with this, Stevenson and Wolfers (2013) conclude that there is no evidence of satiation, i.e., SWB does not reach a plateau at a higher level of GDP per capita, at least not at the GDP levels currently observed around the world.
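The split-sample logic behind these satiation checks can be illustrated with a minimal sketch. It is not the procedure used by Stevenson and Wolfers or Deaton: the data are synthetic, the function names are hypothetical, and the slope 0.35 and intercept 2.0 are arbitrary. The idea is simply that, under satiation, the slope of average SWB on log GDP per capita should be close to zero among countries above the threshold.

```python
import math


def ols_slope(xs, ys):
    """Slope of a simple least-squares regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def split_slopes(gdp, swb, threshold):
    """Slopes of SWB on log GDP for countries below and at/above threshold."""
    low = [(math.log(g), s) for g, s in zip(gdp, swb) if g < threshold]
    high = [(math.log(g), s) for g, s in zip(gdp, swb) if g >= threshold]
    return ols_slope(*zip(*low)), ols_slope(*zip(*high))


# Synthetic GDP per capita levels (illustrative, not real countries).
GDP = [1_000, 2_000, 5_000, 10_000, 20_000, 40_000, 60_000]
THRESHOLD = 15_000  # the split point mentioned for Stevenson and Wolfers

# Scenario A: SWB is linear in log GDP everywhere (no satiation).
SWB_LINEAR = [2.0 + 0.35 * math.log(g) for g in GDP]
# Scenario B: SWB stops rising once GDP reaches the threshold (satiation).
SWB_SATIATED = [2.0 + 0.35 * math.log(min(g, THRESHOLD)) for g in GDP]

if __name__ == "__main__":
    print("no satiation:", split_slopes(GDP, SWB_LINEAR, THRESHOLD))
    print("satiation:   ", split_slopes(GDP, SWB_SATIATED, THRESHOLD))
```

In scenario A both subsamples recover the same slope, while in scenario B the slope among richer countries collapses to zero. The studies cited above report a pattern resembling scenario A.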
In view of the strong cross-country relation between average SWB and GDP per capita, an obvious question is whether within-country income growth raises average SWB. [3] One of the key observations in Easterlin’s 1974 paper was the much stronger relation between SWB and income within a country than between average SWB and GDP per capita across countries. On the other hand, using various data sources (the GWP and the General Social Survey (1972-2006) for the U.S.), Stevenson and Wolfers (2008) find that the slope of SWB with log income is about the same within countries and across countries. Using four waves of WVS data and considering various specifications relating SWB to log GDP per capita, their panel fixed effects estimates show a slope coefficient of about 0.3, similar to the slope coefficients they find in the cross-country and within-country regressions.

[3] Obviously if growth is persistent, high-growth countries will sooner or later be high-GDP countries, and those with stagnating growth will become low-GDP countries in comparison.

Several related papers (Sacks et al., 2010, 2012, 2013) expand on the work by Stevenson and Wolfers (2008) by adding new data and by placing more emphasis on the effect of economic growth. Their conclusions remain essentially the same.

Proto and Rustichini (2013) appear to use the same WVS data but follow a different procedure. These authors also use four waves of the WVS and include country fixed effects. Rather than using GDP per capita as an explanatory variable directly, countries are divided into 15 quantiles and SWB is regressed on 14 quantile dummies. They reproduce the common finding that SWB increases monotonically with GDP per capita. However, when including country fixed effects they find a curvilinear relation with the maximum occurring around the ninth or tenth quantile.
Since the introduction of country fixed effects implies that quantile dummies are identified from countries changing quantiles between periods, this potentially suggests that economic growth does not improve SWB beyond a certain GDP level. It is, however, difficult to interpret what this means, as it is not GDP itself that changes between periods, but rather its ranking across countries.

As noted before, Easterlin (1974) found essentially no effect of income growth on the percentage of individuals reporting to be very happy or fairly happy in the U.S. for the period 1946-1970, when GNP per capita grew by more than 60%. In follow-up work, Easterlin et al. (2010) argue that one should look at long-term trends in income over periods of at least ten years. Looking at a sample of 37 countries, they find that average life satisfaction tends to follow the business cycle (going down during the downswing and going up during recovery), but that there is no long-term effect. Easterlin (2015) looks at very long time horizons (12 to 34 years) and similarly finds no long-run effect. Sacks et al. (2013) revisit the same data as well as a number of other datasets, and their conclusion remains that economic growth raises average well-being in a country over the long run as well.

The results obtained on the basis of the various cross-country datasets depend heavily on the treatment of sometimes non-representative samples, changes in question format over time, differences in question format across datasets, and particular economic events, such as the transition from a centrally planned economy to a market economy in the former Soviet countries. Easterlin (2017) discusses the Sacks et al. (2013) findings alongside several other papers, and concludes that the paradox still stands in that there is no long-term relation between economic growth and average SWB.
He also returns to the finding in his original paper, mentioned above, that between 1946 and 1970 average happiness hardly increased, while per capita income rose by more than 60%. Since 1972 the General Social Survey has asked respondents for their self-reported happiness. The percentage “very happy” has remained essentially unchanged over the period 1972-2014. Combining both these findings, he concludes that over about seven decades SWB has hardly changed, despite the fact that over this period GDP per capita more than tripled.

This paper is not concerned with the time series relation between SWB and economic growth, but rather with the cross-sectional relationship between individual SWB and both micro and macro variables. In principle cross sections and time series may tell very different stories (Easterlin, 2013a), although we will return to the time series evidence in the concluding section. Our findings in this paper confirm that average life satisfaction is significantly higher in countries with higher GDP. This comes with an important proviso, however: it appears that this is not the effect of higher material standards of living per se, but rather due to a more generous supply of societal goods in wealthier countries.

This conclusion relates to the observation this article started with: SWB is a function of more than income. By concentrating on income, one runs the risk of introducing omitted variable bias. Indeed, in his 1974 article Easterlin cites various studies by Cantril (1965) that list the most important domains for people when rating their life satisfaction or happiness. These include health, standard of living, family, leisure, and work. Easterlin (2006) relates overall happiness to satisfaction with family, finances, jobs, and health. Similarly, Kapteyn et al. (2009) note that life satisfaction can be explained as a function of the satisfaction with four domains: job/daily activities, health, relations with friends and family, and income.
Layard (2011) lists seven factors affecting happiness: (1) family relationships; (2) financial situation; (3) work; (4) community and friends; (5) health; (6) personal freedom; (7) personal values. Van Praag et al. (2003) and Ferrer-i-Carbonell and Van Praag (2002) use data from the German Socio-Economic Panel and relate SWB to satisfaction with six domains: job, financial, house, health, leisure, and environment. Dolan et al. (2008) review the economic literature and identify a large number of factors that have been found to influence SWB. These include gender, health, personality, education, work, unemployment, hours worked, commuting, caring, community involvement and volunteering, exercise, religious activities, trust, political persuasion, religion, marriage and intimate relationships, children, family and friends, and several macro factors such as income inequality, inflation, unemployment rates, welfare system, democracy, climate and the natural environment, safety, and urbanization. The evidence on the importance of several of these factors is mixed, and some of them lie primarily within the personal sphere and are not obviously much influenced by policy, income, or societal circumstances at large. But many are: unemployment (and unemployment insurance) can be affected by policy, workplace regulations can affect the nature of work and how work affects SWB, environmental policies can impact health, etc. In typical economic analyses most non-monetary factors are ignored, introducing the risk of omitted variable bias. If the omitted variables correlate positively with GDP per capita or PPP-adjusted individual or family income, then their omission will bias the effects of GDP per capita and absolute income upward. This is precisely what our analysis shows. Once societal goods are introduced, the estimated effects of absolute income and GDP per capita lose statistical significance and become quantitatively small.
At the same time, the estimated effect of relative income becomes highly statistically significant and quantitatively important.

3 Data

Since 2005, GWP has continually surveyed residents in over 150 countries, sampling about 1,000 (or more, in some countries) individuals in each country. World Poll questions measure opinions about national institutions, corruption, youth development, community basics, diversity, optimism, violence, religiosity, and other topics. The World Poll questionnaire is translated into the major languages of each country. The translation process starts with an English, French, or Spanish version, depending on the region. A translator proficient in both the original and target languages translates the survey into the target language. A second translator reviews the language version against the original version and recommends refinements. The World Poll includes key indicators such as perceptions of law and order, food and shelter, work quality, personal health, citizen engagement, and well-being (i.e., life satisfaction). These indexes are validated at both the respondent and country level, correlating in predictable ways with objective indicators such as personal income, GDP, education, and life expectancy. With some exceptions, all samples are probability-based and nationally representative of the resident population aged 15 and older. The coverage area is the entire country including rural areas, and the sampling frame represents the entire civilian, non-institutionalized population aged 15 and older of each country. Exceptions include areas where the safety of interviewing staff is threatened, scarcely populated islands, and areas interviewers can reach only by foot, animal, or small boat.
Specifically, sampling in the Central African Republic, Democratic Republic of the Congo, Lebanon, Pakistan, India, Syria, Azerbaijan, Georgia, Morocco, Myanmar (Burma), Chad, Madagascar, Moldova, and Sudan was affected by security; some of these, as well as Canada, China, Laos, and small parts of Japan, had non-representative sampling of some geographic regions. In Arab countries (Bahrain, Kuwait, Saudi Arabia), sampling was of citizens (including Arab expatriates) and those who could complete the survey in Arabic or English; in the United Arab Emirates, all non-Arabs were excluded, i.e. more than half of the population. In the Philippines, urban areas were over-sampled. Israel excludes East Jerusalem (Gallup reports Palestinian Territories separately, although we do not use it in our analysis due to missing data).[4] Some of these countries have been dropped from our analyses due to missing information on some of the variables needed for our analysis.[5] We generally present results based on all countries in our database with non-missing information, but have also redone all analyses excluding the problem countries above. It turns out results are nearly identical. Appendix B shows the main results of our empirical analyses when we exclude the problem countries. Telephone surveys are used in countries where telephone coverage represents at least 80% of the population or is the customary survey methodology. In Central and Eastern Europe and most of the developing world, an area frame design is used for face-to-face interviewing. In some countries, over-samples are collected in major cities or areas of special interest. In some large countries, such as China and Russia, samples of at least 2,000 are collected. Gallup has created a worldwide data set with standardized income and education data.
To make education comparable across countries, education descriptions are recoded into one of three relevant categories: “Elementary”: completed elementary education or less (up to eight years of basic education); “Secondary”: completed some education beyond elementary education (9 to 15 years of education); “Tertiary”: completed four years of education beyond “high school” and/or received a four-year college degree. Similarly, annual household income in international dollars is calculated using the World Bank’s individual consumption PPP conversion factor. These PPP-corrected values correlate strongly (r=0.94) with the World Bank estimate of per-capita GDP (PPP-corrected). The result is a household income measure that is comparable across all respondents, countries, and local and global regions. Response rates are calculated according to AAPOR Standard Definitions (Callegaro and DiSogra, 2008), and reported figures include completed and partial interviews, refusals, non-contacts, and unknown households. Gallup World Poll response rates vary by mode of survey and region. As part of a project of the authors of this paper supported by the National Institute on Aging, Gallup added a module on the international comparison of well-being to surveys in 109 countries conducted during 2011-2014.[6] These countries provide the sample for the current paper. Eighteen countries were interviewed in 2011, 39 in 2012, 26 in 2013, and 26 in 2014. Country selection ensured key countries were present in the sample (e.g.
the United States, China, India), that the diversity of the world was represented, and that the interviews were of the highest quality. Most countries have approximately 1,000 observations, with the exceptions of Russia (1,500), India (5,000), China (4,500), Germany (3,000), United Kingdom (3,000) and Haiti (500), for a total of about 120,000 observations. Appendix F summarizes all variables used in this paper, while Appendix A presents the exact wording of all survey questions, and the sources and definition of all country-level aggregate variables used in the analysis.

[4] http://www.gallup.com/services/177797/country-data-set-details.aspx
[5] Argentina, Myanmar, and Yemen are missing GDP; Ecuador, Georgia, and Singapore are missing employment status; Zimbabwe is missing health expenditures; Bosnia and Herzegovina and United Arab Emirates are missing education expenditures; Egypt and Lebanon are missing self-reported health; Palestinian Territories is missing health and education expenditures, as well as the Corruption Perception index; Syria is missing GDP and education expenditures; Taiwan is missing all macro-level variables except the Corruption Perception index.
[6] Most of this module consists of anchoring vignettes but these are not used in the current paper.

Our primary interest is in the answers to the following question (the so-called Cantril ladder): “Please imagine a ladder with steps numbered from zero at the bottom to 10 at the top. The top of the ladder represents the best possible life for you and the bottom of the ladder represents the worst possible life for you. On which step of the ladder would you say you personally feel you stand at this time?” We will refer to the answers to this question as (self-reported) life satisfaction.
The aim of this paper is to explain variation of self-reported life satisfaction within and across countries on the basis of both individual- and household-level variables, as well as on the basis of a number of variables at the aggregate level for each country. The exact wording of the individual-level questions is given in Appendix A. Summary statistics shown in Appendix E (Tables E1 through E5) show that Western countries (including Western Europe and other wealthy, English-speaking countries) tend to be most satisfied (as measured by the average response on the 0-10 ladder scale), while Sub-Saharan African countries are the least satisfied, followed by South Asian countries. There are, of course, exceptions: for instance, Greece’s ongoing debt crisis places Greece in the bottom half of the life satisfaction ranking. Canada, Israel, Australia, the United States, and New Zealand are the most satisfied in the sample, on average. These nations have high GDPs (though not the highest), are well educated, and have high life expectancies (though not the highest). Syria, Chad, Afghanistan, Rwanda, and the Central African Republic are the least satisfied; they are all plagued by conflict, have relatively low GDPs, and (excepting relatively well-educated Syria) are among the least educated in the sample. There is a substantial gap between the highest and lowest average SWB, ranging from 3.32 (Syria) to 7.49 (Canada). Generally, more educated and more urban countries have higher life satisfaction. Although at the individual level marriage is associated with higher life satisfaction, the percentage of married respondents is somewhat higher in less satisfied countries. Unemployment and underemployment are more prevalent in less satisfied countries, as are self-reported health problems.
Looking at macro variables, more satisfied countries generally have higher GDP, higher life expectancy, greater expenditures on health and education, more years of schooling, and better environmental health. They are also generally perceived as being less corrupt. The macro variables show considerable variation; health expenditures in the Democratic Republic of the Congo are just $25.34 per capita (in 2011 dollars), while in the United States they are $8,845.18 per capita. Average years of schooling range from 1.5 in Chad to 12.1 in the United States.

For the variables at the country level we primarily draw on (1) the World Bank Open Data Indicators, from which we take data on population, GDP,[7] education expenditures, health expenditures, life expectancy at birth, and energy use per capita (kt of oil equivalent), (2) the United Nations Development Programme, from which we take data on average years of schooling (by people ages 25 and older), (3) Transparency International, from which we get an index of national corruption, (4) the Environmental Performance Index, from which we take the Environmental Health Index,[8] and (5) the Stockholm International Peace Research Institute, from which we get military expenditures per capita (including some estimates). The Environmental Health Index includes child mortality, measures of air quality (the percentage of households using solid fuels as primary cooking fuel, exposure to PM2.5 pollutants), and the proportion of each country’s population with access to improved water and sanitation. Thus this index measures how the environment in each country affects human health. Summary statistics for these variables appear in Table E5.

Data for education expenditures in 2014 have not yet been released for two countries surveyed in 2014, Saudi Arabia and Yemen. They will be added as soon as they are available; in the meantime, we linearly extrapolate. Results are comparable when, instead of extrapolating linearly, we use 2013 values in place of 2014 values.

After dropping observations that are missing an essential variable (life satisfaction, household income, etc.), we end up with about 87,000 observations from 96 nations. About 15% of the sample (18,956 observations) was missing household income, the most commonly missing variable. Georgia, Singapore, and Ecuador were missing data on unemployment; Taiwan was missing World Bank indicators; Palestinian Territories was missing health expenditures; Myanmar, Argentina, and Syria were missing GDP.

[7] An alternative source of information for GDP per capita would be the Penn World Tables (PWT). However, PWT is not yet updated with recent years’ data. For the countries for which PWT data are available we have compared these with World Bank data and found them to be very similar.
[8] The index is produced by the Yale Center for Environmental Law & Policy (YCELP) and the Yale Data-Driven Environmental Solutions Group at Yale University (Data-Driven Yale) and the Center for International Earth Science Information Network (CIESIN) at Columbia University, in collaboration with the Samuel Family Foundation, McCall MacBain Foundation, and the World Economic Forum.

4 Results using aggregate data only

To set the stage, it is useful to first replicate results obtained by Stevenson and Wolfers (2008) and Deaton (2008). Our sample of countries is somewhat smaller and the data refer to a later period (2006 in the case of Stevenson and Wolfers and Deaton, 2011-2014 in our case). Among other things, our period falls in the later part of the financial crisis, which may have affected countries differently. Table 1 presents our estimates together with the estimates presented in Deaton’s Table 1 (Deaton, 2008, p.58). The results are qualitatively similar, with a highly significant effect of the logarithm of per capita GDP on the country’s mean life satisfaction. Splitting the sample into three subsamples in the same way as Deaton has done (GDP per capita less than $12,000; GDP per capita between $12,000 and $20,000; GDP per capita greater than or equal to $20,000) does not alter that conclusion, and indeed in our case the coefficient of log GDP per capita increases monotonically as we move from the lowest to the highest GDP per capita group. In the Deaton results the coefficient goes up from the lowest income category to the next highest, but decreases when the sample is restricted to countries with GDP per capita of at least $20,000 (but with a large standard error).

As noted earlier, Stevenson and Wolfers (2008) run ordered probits of self-reported life satisfaction on country fixed effects and use the estimated dummies as SWB averages per country. We repeat their procedure for our data and compare the results to their outcomes as reported in Table 1 of Stevenson and Wolfers (2008); see Table 2. Our estimates are very close to theirs. Figure 1 copies Deaton’s Figure 2, while Figure 2 presents similar graphs for our data. Figure 3 copies Figure 4 in the paper by Stevenson and Wolfers (2008), while Figure 4 shows a similar figure based on our data. In all cases, the patterns we find are qualitatively similar to what Deaton (2008) and Stevenson and Wolfers (2008) found with the earlier data. The remainder of the paper is devoted to a more in-depth exploration of the effect of both aggregate variables (such as GDP or life expectancy) and household and individual-level variables on life satisfaction.

5 The effect of contemporaneous micro and macro variables

We present the main results in a number of tables. Tables 3 and 4 report results of six variants, which differ by the specification of macro and household income variables.
In all specifications we include a number of individual-level variables. The results for most of these are presented in Table 3, with the exception of household income and age of the respondent (which will be discussed below). How the specifications differ becomes clear in Table 4, which presents the estimated coefficients for the effects of the macro and household income variables. The point of the results in Table 3 is to show that estimates vary little across specifications so that we can discuss the effects of these variables without much regard for which specification the estimates represent. The results presented in Table 3 replicate results often found in the literature. Life satisfaction is higher for females, urban dwellers, respondents with higher education, and married respondents. Divorced or widowed respondents or domestic partners report lower life satisfaction. Unemployed or underemployed respondents report lower life satisfaction. Being out of the workforce does not appear to affect life satisfaction substantially, perhaps because it includes a mix of students, retired people, stay-at-home parents, the disabled, etc. Respondents with health problems[9] are generally less satisfied with their lives. Estimates are consistent across all six specifications, with a couple of expected exceptions. The coefficients on education are lower once household income is added (after column 2), presumably because part of why high education leads to high life satisfaction is through its effect on income. Similarly, the coefficients on indicators for being employed full-time for self, unemployed, employed part time while wanting full time, and out of the workforce all decrease once household income is added, as these employment statuses affect life satisfaction in part through their effect on income. Age has been specified by quadratic functions that may vary across countries.
The reason for this is that in many countries the relation between life satisfaction and age has been found to be U-shaped (Blanchflower and Oswald, 2008), but with regional variations.[10] The estimates of the age functions are insensitive to the particular specification of the macro or household income variables. Once other characteristics have been controlled for, the majority of countries follow a U-shape in age. The major regional exception is Latin America & the Caribbean, where about half of the countries show age functions that are inverse-U-shaped, with the rest a mixture of downward-sloping, upward-sloping, or U-shaped functions. In Sub-Saharan Africa, 16 out of 21 countries have U-shaped age functions, with the rest a mix of inverse-U and downward-sloping functions. The age functions in all other regions are U-shaped with only a few local exceptions (e.g. Israel is downward-sloping while the rest of the Middle East is U-shaped). There is substantial variation in the magnitudes of the coefficients, and thus the curvature of the shapes. Some are relatively flat, while others are much more pronounced.

[9] The health measure is the yes/no answer to the question “Do you have any health problems that prevent you from doing any of the things people your age normally can do?”
[10] As Easterlin (2006) has pointed out, the U-shape may appear after controlling for many variables, but for instance health deteriorates with age and older people may have lower incomes or be more likely to be alone, so that in effect SWB may fall with age if we do not control for these other variables.

Our key results are contained in Table 4, which shows the results for various specifications of the macro and household income variables. All macro variables with the exception of years of schooling (2011), the corruption perception index (2014), and the environmental health index (2014) are taken from the year of the interview.
For example, respondents in Turkey were interviewed in 2012, so GDP, life expectancy, etc. are measured in 2012. Recall that these specifications control for the micro variables reported in Table 3 and for country-specific age functions.

Column 1 shows the effects of GDP per capita. The coefficient of log-GDP per capita is very similar to what was shown in Table 1. Thus, the addition of individual demographics does not have an appreciable effect on the relation between GDP and life satisfaction.

The second column adds a number of aggregate variables that can be seen as indicators of societal development. For simplicity of terminology we will refer to these as societal capital or societal goods. The coefficient on GDP is essentially zero once societal goods are included in the model. Of the added variables in column 2, only life expectancy, the corruption perception index, and the environmental health index show a statistically significant effect.[11] The first column of Table 5 presents the outcomes of a number of joint tests for the aggregate variables appearing in column 2 of Table 4. It appears that the aggregate variables that are individually insignificant in column 2 are also insignificant when tested jointly in different combinations.

The third column of Table 4 removes societal goods and adds two household-level income variables. The first one is the logarithm of equivalized household income, that is, household income divided by weighted household size using OECD weights: 1 for the first adult, 0.7 for additional adults, and 0.5 for each child younger than 15. (Results are comparable using the modified OECD equivalence scale weights of 1/0.5/0.3.) Household income is measured before taxes, in international dollars, i.e. self-reported household income corrected for purchasing power parity differences.[12] We will refer to this as “absolute income.” The second income variable included in column 3 is the logarithm of relative income, defined as the log of equivalized household income minus the country sample mean of this variable. If we assume the income distribution in a country to be approximately lognormal, then we can interpret this variable as the logarithm of the ratio of own household income to the median household income in a country (after correcting for variations in household size). We refer to this variable as “relative income.” The relative income variable has a significantly negative effect in column 3, while absolute income has a significantly positive effect.

Column 4 includes absolute income and societal goods. Although its coefficient is much smaller than in column 3 (0.442 in column 3 vs. 0.228 in column 4), absolute income still has a significantly positive coefficient. The coefficients on the societal goods, as well as the results of joint tests reported in Table 5, are comparable to column 2.

Columns 5 and 6 once again include societal goods. Column 5 also includes absolute and relative income, while column 6 additionally includes GDP. Results are nearly identical in columns 5 and 6. Comparing the results with column 3, we see that adding societal goods renders absolute income small and insignificant, while relative income is now positive and significant. Life expectancy, the corruption perception index, and the environmental health index remain significant, and results of joint tests in Table 5 are comparable as well.

[11] Note that corruption has an unexpected sign. The corruption index is reverse coded: a higher value implies less perceived corruption. This unexpected pattern is not seen in the raw data: the correlation between the perceived corruption index and life satisfaction equals .28 (the correlation with GDP per capita equals .33).
[12] http://www.gallup.com/poll/166358/new-measures-global-income-gallup-world-poll.aspx
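The two household income variables defined above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, argument names, and the toy incomes are hypothetical, and the default weights are the OECD weights quoted in the text (1 for the first adult, 0.7 for each additional adult, 0.5 for each child younger than 15).

```python
import math

def equivalized_income(household_income, n_adults, n_children,
                       w_adult=0.7, w_child=0.5):
    """Household income divided by weighted household size.

    The first adult has weight 1; w_adult and w_child are the weights for
    additional adults and for children younger than 15 (hypothetical names).
    Assumes n_adults >= 1.
    """
    weight = 1.0 + w_adult * (n_adults - 1) + w_child * n_children
    return household_income / weight

def log_relative_income(equivalized_incomes):
    """Log equivalized income minus the country sample mean of log income."""
    logs = [math.log(v) for v in equivalized_incomes]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

# Toy country sample: (PPP household income, adults, children under 15).
households = [(22000.0, 2, 1), (9000.0, 1, 0), (40000.0, 3, 2)]
equiv = [equivalized_income(*h) for h in households]
relative = log_relative_income(equiv)
print(equiv)
print(relative)
```

Passing `w_adult=0.5, w_child=0.3` gives the modified OECD scale mentioned in the text. By construction the relative income values average to zero within a country, so this variable carries only within-country variation.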
A remarkable feature of the specifications considered in columns 2, 3, and 4 is that without the inclusion of societal capital (column 2) the effect of absolute income is highly significant while the effect of relative income is insignificant and negligible. When we add the societal goods, but exclude relative income (column 3) the effect of absolute income remains highly significant. When finally in column 4 we also add relative income, the effect of absolute income becomes essentially zero. To get a better understanding of why this is, consider a somewhat simplified version of the model we are estimating:

$$y_{ic} = x_{ic}'\beta_1 + (x_{ic} - \bar{x}_c)'\beta_2 + v_c'\beta_3 + w_c'\beta_4 + \varepsilon_{ic} \qquad (1)$$

where $y_{ic}$ is life satisfaction of respondent $i$ in country $c$; $x_{ic}$ contains individual-level variables (such as log-absolute income), while $\bar{x}_c$ is the mean of individual-level variables in country $c$; $v_c$ contains aggregate variables for country $c$ (such as log-GDP per capita); $w_c$ contains the societal capital variables. Obviously, (1) can be written as:

$$y_{ic} = x_{ic}'\lambda_1 + \bar{x}_c'\lambda_2 + v_c'\beta_3 + w_c'\beta_4 + \varepsilon_{ic} \qquad (2)$$

where $\lambda_1 = \beta_1 + \beta_2$ and $\lambda_2 = -\beta_2$. We show in Appendix C that when estimating (2) by OLS the omission of $w_c$ does not affect the estimate of $\lambda_1$ (the effects of individual and household level variables), but it does affect the other parameters. The bias in $\lambda_2$ (the effects of the country means of the individual variables) and $\beta_3$ (the effect of GDP per capita) is equal to the regression of $w_c'\beta_4$ on $\bar{x}_c$ and $v_c$. Although the result is exact for OLS in model (2), it is only approximate for the estimates considered in Table 4, for two reasons. First of all the estimates in Table 4 are based on ordered probit and not linear regression. Secondly, the formulation in (2) assumes that we include country means of all individual-level variables, while in Table 4 average log-income is the only included country mean. We return to the second point below.
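The OLS property stated for model (2) can be checked numerically. The sketch below uses simulated data only: the sample sizes, the coefficient values, and all variable names are hypothetical choices for illustration, not estimates from the paper, and it uses linear regression rather than the ordered probit behind Table 4.

```python
import numpy as np

rng = np.random.default_rng(0)
n_countries, n_per = 40, 200
country = np.repeat(np.arange(n_countries), n_per)

# Hypothetical country-level data: v_c (think log GDP per capita) and a
# societal good w_c that is positively correlated with v_c.
v = rng.normal(size=n_countries)
w = 0.8 * v + rng.normal(size=n_countries)

# Individual log income x_ic and its country sample mean.
x = v[country] + rng.normal(size=country.size)
xbar = np.bincount(country, weights=x) / n_per

# Outcome from (1) with beta1 = 0, beta2 = 0.25, beta3 = 0, beta4 = 0.5.
y = 0.25 * (x - xbar[country]) + 0.5 * w[country] + rng.normal(size=x.size)

def ols(columns, target):
    """OLS coefficients of target on an intercept plus the given columns."""
    design = np.column_stack([np.ones(target.size), *columns])
    return np.linalg.lstsq(design, target, rcond=None)[0]

included = [x, xbar[country], v[country]]
b_long = ols(included + [w[country]], y)   # model (2) including w_c
b_short = ols(included, y)                 # model (2) with w_c omitted
gamma = ols(included, w[country])          # w_c regressed on the included regressors

lambda1_gap = b_short[1] - b_long[1]       # change in the estimate of lambda_1
implied_bias = gamma * b_long[-1]          # omitted-variable bias term
print("lambda_1 with / without w_c:", b_long[1], b_short[1])
print("bias in (lambda_2, beta_3):", implied_bias[2:4])
```

Because the within-country deviation of `x` from its country sample mean is orthogonal to every country-level regressor, the estimate of `lambda_1` is the same in both regressions up to floating-point error, while the coefficients on the country mean and on `v` shift by the auxiliary-regression term.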
Despite these differences, the approximation is quite good, as shown in the top panel of Table 6 (the bottom panel will be discussed below). The first column shows the estimates of the effects of log-GDP, log-absolute income and the country mean of log-absolute income[13] when we omit the societal goods. The second column is identical to the sixth column of Table 4, except for the reparameterization, and presents the estimates when the societal goods are included. The third column shows the estimates of the bias, as derived in Appendix C, while the last column subtracts the estimated bias in column 3 from the estimated coefficient in column 1. Comparing columns 2 and 4 shows that the estimates are close. And indeed the coefficient of absolute income is not affected. The most striking aspect of column 3 is the large positive bias for the coefficient of mean log-income, which renders the coefficient in column 1 small and statistically insignificant.

[13] We still control for all the other variables in Table 3, but do not show them here.

A natural extension is to include country means of all individual-level variables. The results for that specification are presented in Table 7. In the bottom panel of Table 7, we once again only report the results for log-GDP, log-absolute income and country means of log-absolute income. Qualitatively, the results are similar, with an even larger negative effect of mean log-absolute income in columns 5 and 6. The implied effect of absolute income is almost exactly equal to zero, while relative income has a highly significant coefficient of .225. Although not the primary focus of this analysis, we note that the estimates of the coefficients of the individual-level variables in Table 7 are very similar to those in Table 3, with the exception of marital status, which shows that relative to being single, other marital statuses have a negative impact. That is not true for the aggregate variables.
Average years of schooling as calculated by the UN Development Programme now gets a significant positive coefficient and the coefficients on other societal goods are insignificant. Of more interest are the sign and size of the coefficients on the country means of the individual-level variables. We only discuss coefficients that are significantly different from zero at the five percent level in the sixth column (the full specification). We find that life satisfaction is higher in countries with a higher share of people with secondary education, while it is lower in countries with lower employment (the coefficients on the country means of unemployment, employed part time but wants full time, and out of the workforce are all negative). In Table 8, we consider two alternative formulations for relative income. As currently defined, relative income is log household income minus the country mean of log household income. As a first alternative, we consider log household income relative to the log of the country mean of household income. As a second alternative, we calculate each respondent’s income rank by fitting a lognormal distribution function to the equivalized incomes in each country and finding the cumulative distribution function value at each observed income. This is an estimate of a household’s rank in the national income distribution. The same individual characteristics from Table 3 are included in Table 8, and have coefficients very similar to those reported in that table. Columns 1, 2, and 3 replicate columns 3, 5, and 6 of Table 4. Columns 4 through 6 replicate those columns replacing the mean of log income with the log of mean income, and columns 7 through 9 replace it with income rank. The results with the alternative relative income definition closely match the original results. Relative income is significantly negative in the specification without societal goods (column 4), but significantly positive when societal goods are included (column 5), and remains so when GDP is added (column 6).
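The income-rank alternative described above (fit a lognormal distribution to the equivalized incomes in a country, then evaluate its cumulative distribution function at each observed income) might be sketched as follows. The function name and the toy incomes are hypothetical, and fitting by the mean and population standard deviation of log income is an assumption; the text does not say which estimator was used.

```python
import math
from statistics import NormalDist, fmean, pstdev

def lognormal_income_ranks(equivalized_incomes):
    """Estimated rank in [0, 1] of each income within one country.

    A lognormal for income is a normal for log income, so the fitted CDF is
    a normal CDF evaluated at log income. Requires at least two distinct
    income values so that the fitted standard deviation is positive.
    """
    logs = [math.log(v) for v in equivalized_incomes]
    fitted = NormalDist(mu=fmean(logs), sigma=pstdev(logs))
    return [fitted.cdf(value) for value in logs]

# Toy country sample of equivalized incomes (international dollars).
ranks = lognormal_income_ranks([1000.0, 2000.0, 4000.0])
print(ranks)
```

An income equal to the fitted median (the exponential of the mean log income) gets a rank of one half, which is the sense in which the log relative income measure compares a household with the country median under lognormality.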
As before, whenever societal goods are included, the effect of absolute household income becomes completely insignificant. The results with income rank are qualitatively similar, but not as sharp. When using income rank as the relative income measure, absolute income remains significant in the specification including societal goods (columns 8 and 9), although its coefficient declines markedly compared to column 7. Since the coefficient of the rank variable cannot be compared directly with the coefficient of log household income, we ran a simulation in which we considered the effect of moving from the 25th percentile to the 50th percentile. Such a change also implies a change in household income, which will be different in different countries. The simulation shows that on average, moving from the 25th percentile of the income distribution to the 50th percentile results in an increase in life satisfaction of about 0.25. On average, 80% of the increase is because of the change in rank and 20% is because of the change in absolute income. Thus although the effect of absolute income is still significant with this alternative definition of “relative income,” we still find that the change in relative income is much more important than the change in absolute income, reinforcing the main results in Table 4. Table 9 once again replicates Table 4, this time using two placebo variables in place of the societal goods: log military expenditures per capita and energy use per capita (measured as kilotons of oil equivalent). They are correlated with GDP, and are alternatives that additional income could be spent on. Energy use includes all sectors of the economy, and is generally rising with industrialization. Military expenditures data (which include estimates) come from the Stockholm International Peace Research Institute (SIPRI), while energy use comes from the World Bank Indicators.
Countries entirely missing either placebo variable are dropped from Table 9,[14] and not-yet-published values from 2013 and 2014 are replaced with linear extrapolations. Although not shown, the individual-level variables included in Table 3 are also included here, and have similar coefficients to those in that table, as one would expect in view of the above cited result on the effect of omitted aggregate variables. Results in columns 1 and 3 of Table 9 are similar to those reported in Table 4, with slight differences due to a somewhat smaller sample. Column 2 in Table 4 shows the coefficient of GDP to be zero and insignificant, while in Table 9 it remains strongly significant, and actually increases somewhat. Comparing columns 3 and 5, adding the placebo variables had little effect on the coefficients of absolute and relative income (although both increased in magnitude). This is in contrast to Table 4, in which adding societal goods rendered absolute income insignificant and relative income significantly positive. In column 6 of Table 4, adding log GDP had little effect on the coefficients of absolute and relative income; in Table 9, however, GDP is once again significantly positive, while relative income is insignificant. The effects of absolute income and GDP are thus not mitigated by including these variables, which are themselves insignificant (and negative). It is not the case that adding in just any spending variable related to GDP per capita has this effect on the coefficient of GDP; it is not spending as such that improves life satisfaction, but what it is spent on.

6 Discussion

As we have noted in the introduction, numerous papers have found that SWB is affected by many variables, of which income is only one. This paper explores the biases in estimated effects of absolute income and GDP per capita caused by ignoring societal goods in international comparisons.
Our starting point has been to replicate the most prominent recent studies on our new data, to make sure that our findings are not caused by our data somehow being different. The results have been quite striking: not only do we find that the effect of GDP per capita works through the societal capital it buys (or with which it is correlated), we also find that once we include societal capital goods, the within-country effect of income on life satisfaction becomes purely relative. Thus this paper returns to one of the main themes of Easterlin’s 1974 paper. It does appear that within countries the effect of income on life satisfaction is due to the relative position it buys. However, this only becomes visible when one accounts for societal goods.

Footnote 14: Afghanistan, Central African Republic, Chad, Costa Rica, Haiti, Laos, Liberia, Madagascar, Malawi, Mauritania, Panama, Rwanda, Sudan, Uganda, and Uzbekistan are all missing either military expenditures or energy use.

There is a vast literature on the relative nature of satisfaction with consumption or income (Conti et al., 2006). Frank (2012) discusses evolutionary factors that would suggest that relative ranking should matter. For example, during a famine the strongest individuals would survive; the strongest individuals would be most likely to find a mate, etc. Studies of primates have found that when individuals move up in the social hierarchy, their serotonin levels increase. Serotonin is related to feelings of happiness and well-being (van Vugt and Tybur, 2015).

The most direct evidence that the subjective well-being of humans is influenced by how well others are doing in an individual’s reference group (roughly defined as the others the individual compares herself with) is provided in laboratory settings. For instance, Fliessbach et al. (2007) use functional magnetic resonance imaging (fMRI) to measure brain activity of subjects who have to perform an estimation task and are rewarded according to their performance.
Subjects are informed not only about their own payments but also about the payments of other subjects. It is shown that neurophysiological activity responds strongly to relative payments.

Outside the laboratory, establishing the effect of relative comparisons requires the definition of reference groups, i.e., groups of people one compares oneself to (Dahlin et al., 2014). Reference groups have traditionally been defined a priori, e.g., by using individuals’ characteristics to define groups, for instance based on education, gender, or age (McBride, 2001). Similarly, coworkers or individuals in the same profession have been used as reference groups to explain the impact of individuals’ rank in the wage distribution on their satisfaction with the pay they receive (Brown et al., 2008). The most commonly made assumption is that individuals mainly compare themselves to others in the same geographical area (Blanchflower and Oswald, 2004; Luttmer, 2004; Ferrer-i-Carbonell, 2005; Barrington-Leigh and Helliwell, 2008; Clark et al., 2008). The definition of reference groups based solely on geography leads to complications, since, for instance, neighbors’ incomes are likely to be related to the quality of public and private goods in an area (access to parks, better stores, etc.).

Using the Gallup Healthways Well-being Index survey, Deaton and Stone (2013) regress SWB on log income (and average SWB on average income) at increasingly high levels of aggregation in the U.S. (individual, zip code, metro area, state). They note that if relative income is all that is important, then the coefficient on income should decline rapidly as we move to higher levels of aggregation. Instead, they find that a regression of average SWB on average log income at the zip-code level yields a higher coefficient than when using individual-level data.
When moving to larger geographic units the coefficient declines, but by only about one fifth compared to the individual-level coefficient estimate. They control for age, sex, and race, but not societal goods. Also using the Gallup Healthways Well-being Index survey, Ifcher et al. (2016) find that neighbors’ incomes positively impact SWB in the U.S. at the local (zip code) level, but that at higher levels of aggregation (metro area), the effect is negative. This is true for several measures of well-being, including Cantril’s ladder.

Yet studies that try to elicit directly what reference groups respondents use find little evidence that geography is important. For instance, Goerke and Pannenberg (2013) use pretest modules of the German Socio-Economic Panel for the years 2008-2010, which contain questions about the importance of different groups for income comparisons. Their sample is restricted to employed respondents aged 17 to 65. They find that only colleagues at work, other people with the same occupation, and friends matter. Dahlin et al. (2014) also find very little support for the notion that reference groups would be primarily formed on the basis of geographical proximity.

In the current context, we can be relatively agnostic about the exact reference group individuals compare themselves to, as long as we assume these comparison groups are basically contained within one’s own country. But obviously we do assume that comparison to the median person in a country is a good approximation of people’s reference groups.

The goal of this paper has not been to point out which government policies would be particularly beneficial for life satisfaction. There exists a small literature aiming to do just that. Helliwell and Huang (2008) use data from the World Values Surveys and the European Values Surveys in combination with data on indexes of government quality (Kaufmann et al., 2009) to study the contribution of government quality, relative to that of GDP per capita, to SWB across countries (see footnote 15). They find that if the quality of government indicators and GDP per capita are both included in regression models, GDP per capita becomes insignificant, similar to our findings. One of the channels they identify through which government affects SWB is health, just as in the current paper. Oishi et al. (2012) use the GWP (2007) and find that progressivity of the tax system is positively correlated with SWB across countries, whereas the overall tax rate is not. The mechanism for this appears to be that respondents in countries with a progressive tax system are more satisfied with the public goods in that country. At the same time, in countries where government spending as a percentage of GDP is high, satisfaction with public goods tends to be lower. Easterlin (2013b) notes that happiness appears to be higher in countries with full employment and safety net policies. The latter are presumably correlated with progressivity of income tax schedules.

The idea that societal goods may be an important contributor to the average well-being in a country is probably non-controversial, and indeed is at the heart of many policy debates. The notion that a higher GDP buys better societal capital has been observed before as well. For instance, Diener et al. (2010), using the first wave of the GWP, note that the correlation between SWB and GDP per capita may be explained by the fact that “some nations have societal or public goods that, when it comes to well-being, are more important than individual income” (p. 54). Our findings go further, however, as they suggest that societal capital may be the primary pathway to improved well-being of the citizenry. None of this implies that individual incomes do not matter. Financial incentives are a strong driver of effort and innovation, precisely the inputs needed to sustain the GDP growth that buys the societal goods.
Footnote 15: Helliwell (2003) foreshadowed a substantial part of their results.

References

Barrington-Leigh, C. P. and Helliwell, J. F. (2008). Empathy and emulation: Life satisfaction and the urban geography of comparison groups. Working paper, National Bureau of Economic Research.
Blanchflower, D. G. and Oswald, A. J. (2004). Well-being over time in Britain and the USA. Journal of Public Economics, 88(7):1359–1386.
Blanchflower, D. G. and Oswald, A. J. (2008). Is well-being U-shaped over the life cycle? Social Science & Medicine, 66(8):1733–1749.
Brown, G. D., Gardner, J., Oswald, A. J., and Qian, J. (2008). Does wage rank affect employees’ well-being? Industrial Relations: A Journal of Economy and Society, 47(3):355–389.
Callegaro, M. and DiSogra, C. (2008). Computing response metrics for online panels. Public Opinion Quarterly, 72(5):1008–1032.
Cantril, H. (1965). Pattern of human concerns.
Clark, A. E., Frijters, P., and Shields, M. A. (2008). Relative income, happiness, and utility: An explanation for the Easterlin paradox and other puzzles. Journal of Economic Literature, 46(1):95–144.
Conti, R. M., Berndt, E. R., and Frank, R. G. (2006). Early retirement and public disability insurance applications: Exploring the impact of depression. Working paper, National Bureau of Economic Research.
Dahlin, M. A., Kapteyn, A., and Tassot, C. (2014). Who are the Joneses? Working paper 2014-004, CESR-Schaeffer Working Paper Series.
Deaton, A. (2008). Income, health, and well-being around the world: Evidence from the Gallup World Poll. The Journal of Economic Perspectives, 22(2):53–72.
Deaton, A. and Stone, A. A. (2013). Two happiness puzzles. The American Economic Review, 103(3):591–597.
Diener, E., Ng, W., Harter, J., and Arora, R. (2010). Wealth and happiness across the world: Material prosperity predicts life evaluation, whereas psychosocial prosperity predicts positive feeling. Journal of Personality and Social Psychology, 99(1):52.
Dolan, P., Peasgood, T., and White, M. (2008). Do we really know what makes us happy? A review of the economic literature on the factors associated with subjective well-being. Journal of Economic Psychology, 29(1):94–122.
Duesenberry, J. S. (1949). Income, saving, and the theory of consumer behavior.
Easterlin, R. A. (1974). Does economic growth improve the human lot? Some empirical evidence. Nations and Households in Economic Growth, 89:89–125.
Easterlin, R. A. (2006). Life cycle happiness and its sources: Intersections of psychology, economics, and demography. Journal of Economic Psychology, 27(4):463–482.
Easterlin, R. A. (2013a). Cross-sections are history. Population and Development Review, 38(s1):302–308.
Easterlin, R. A. (2013b). Happiness, growth, and public policy. Economic Inquiry, 51(1):1–15.
Easterlin, R. A. (2015). Happiness and economic growth – the evidence, pages 283–299. Springer.
Easterlin, R. A. (2017). Paradox lost? Working paper.
Easterlin, R. A., McVey, L. A., Switek, M., Sawangfa, O., and Zweig, J. S. (2010). The happiness-income paradox revisited. Proceedings of the National Academy of Sciences, 107(52):22463–22468.
Ferrer-i-Carbonell, A. (2005). Income and well-being: An empirical analysis of the comparison income effect. Journal of Public Economics, 89(5):997–1019.
Ferrer-i-Carbonell, A. and Van Praag, B. (2002). The subjective costs of health losses due to chronic diseases. An alternative model for monetary appraisal. Health Economics, 11(8):709–722.
Fliessbach, K., Weber, B., Trautner, P., Dohmen, T., Sunde, U., Elger, C. E., and Falk, A. (2007). Social comparison affects reward-related brain activity in the human ventral striatum. Science, 318(5854):1305–1308.
Frank, R. H. (2012). The Easterlin paradox revisited. Emotion, 12(6):1188.
Goerke, L. and Pannenberg, M. (2013). Direct evidence on income comparisons and subjective well-being.
Helliwell, J. F. and Huang, H. (2008). How’s your government? International evidence linking good government and well-being. British Journal of Political Science, 38(4):595–619.
Ifcher, J., Zarghamee, H., and Graham, C. (2016). Income inequality and well-being in the US: Evidence of geographic-scale- and measure-dependence. Working paper.
Kapteyn, A., Smith, J. P., and Van Soest, A. (2009). Life satisfaction, pages 70–104. Oxford University Press.
Kaufmann, D., Kraay, A., and Mastruzzi, M. (2009). Governance matters VIII: Aggregate and individual governance indicators, 1996-2008.
Layard, R. (2003). Has social science a clue?: What is happiness? Are we getting happier. Lionel Robbins memorial lecture series, pages 03–05.
Layard, R. (2011). Happiness: Lessons from a new science. Penguin UK.
Luttmer, E. F. (2004). Neighbors as negatives: Relative earnings and well-being. Report, National Bureau of Economic Research.
McBride, M. (2001). Relative-income effects on subjective well-being in the cross-section. Journal of Economic Behavior & Organization, 45(3):251–278.
Oishi, S., Schimmack, U., and Diener, E. (2012). Progressive taxation and the subjective well-being of nations. Psychological Science, 23(1):86–92.
Proto, E. and Rustichini, A. (2013). A reassessment of the relationship between GDP and life satisfaction. PLoS ONE, 8(11):e79358.
Sacks, D. W., Stevenson, B., and Wolfers, J. (2010). Subjective well-being, income, economic development and growth. In Development Challenges in a Post-Crisis World. World Bank, Washington, D.C.
Sacks, D. W., Stevenson, B., and Wolfers, J. (2012). The new stylized facts about income and subjective well-being. Emotion, 12(6):1181.
Sacks, D. W., Stevenson, B., and Wolfers, J. (2013). Growth in income and subjective well-being over time. Unpublished raw data.
Stevenson, B. and Wolfers, J. (2008). Economic growth and subjective well-being: Reassessing the Easterlin paradox. Working paper, National Bureau of Economic Research.
Stevenson, B. and Wolfers, J. (2013). Subjective well-being and income: Is there any evidence of satiation? The American Economic Review, 103(3):598–604.
Van Praag, B. M., Frijters, P., and Ferrer-i-Carbonell, A. (2003). The anatomy of subjective well-being. Journal of Economic Behavior & Organization, 51(1):29–49.
van Vugt, M. and Tybur, J. M. (2015). The evolutionary foundations of hierarchy: Status, dominance, prestige, and leadership. Handbook of evolutionary psychology.

Appendices

A Data sources and full survey questions list

Data sources:
Gallup World Poll: life satisfaction, gender, urban/rural, education, marital status, employment status, health problems, household composition, household income.
World Bank Open Data Indicators: gross domestic product per capita, life expectancy, health expenditures per capita, education expenditures per capita, under-5 mortality, energy use (kg of oil equivalent) per capita.
United Nations Development Programme: average years of schooling.
Transparency International: corruption perception index.
Stockholm International Peace Research Institute: military expenditures per capita.

Gallup survey questions: See Table A1.

B Main results with specific reduced samples

This appendix replicates Table 4 using somewhat reduced samples: first, with all non-representative samples removed, and second, with the limited sample from the placebo tests in Table 9. See Tables B1 and B2.

C The effect of omitting societal variables on the estimated effects of GDP and relative income

Consider the following model:

$$y = X\beta_1 + (I_n \otimes E)X\beta_2 + (V \otimes \iota)\beta_3 + (W \otimes \iota)\beta_4 + \varepsilon \tag{3}$$

Observations are ordered by country. There are $n$ countries. For simplicity we assume that all countries have an equal number of observations, $m$. This simplifies notation but does not affect the analysis in any fundamental way. $\iota$ is a vector containing $m$ ones, and $E = \frac{1}{m}\iota\iota'$. Note that $E$ is idempotent: $EE = E$. Furthermore, $E\iota = \iota$. For later purposes it is useful to define another idempotent matrix, $M = I - E$. Obviously $M\iota = 0$. The error term $\varepsilon$ has expectation zero.
The interpretation of the variables in (3) is that $y$ is a vector of answers to the life satisfaction question; $X$ contains the absolute income values. The expression $(I_n \otimes E)X$ creates a vector of country means. $(V \otimes \iota)$ contains variables that are constant within a country, such as GDP per capita. $(W \otimes \iota)$ contains societal good variables, which are also constant within countries. The analysis below assumes that (3) is the correct model and explores what happens to the estimates of $\beta_1$, $\beta_2$, and $\beta_3$ if we leave out the societal goods $(W \otimes \iota)$. For simplicity the analysis is done in a linear regression context, rather than ordered probit. Since linear regressions give qualitatively very similar results to ordered probits, it is unlikely that this will have a major effect on the results. (A discussion for the probit model is provided by Yatchew and Griliches, 1985.)

In the interest of notational simplicity, we define a number of new variables:

$$
\begin{aligned}
\Phi &\equiv I_n \otimes E, \qquad \Psi \equiv I_n \otimes M, \qquad \text{so } \Phi + \Psi = I_{nm},\\
P_v &\equiv V(V'V)^{-1}V', \qquad \Pi_v \equiv P_v \otimes E, \qquad \Sigma_v \equiv (I_n - P_v) \otimes E,\\
S_x &\equiv \Phi - \Phi X (X'\Phi X)^{-1} X'\Phi, \qquad Q \equiv (V' \otimes \iota')\, S_x\, (V \otimes \iota)
\end{aligned}
\tag{4}
$$

We want to study the effect of omitting the term $(W \otimes \iota)\beta_4$ from (3).
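The averaging and demeaning operators defined in (3) and (4) can be illustrated numerically. The following is a minimal sketch, not part of the original analysis: the dimensions (3 countries, 4 respondents each) and the toy income vector `x` are hypothetical, chosen only to show that $\Phi$ replaces each observation by its country mean and $\Psi$ by its deviation from that mean.

```python
import numpy as np

n, m = 3, 4  # hypothetical sizes: 3 countries, 4 respondents per country

iota = np.ones((m, 1))
E = iota @ iota.T / m          # within-country averaging matrix
M = np.eye(m) - E              # within-country demeaning matrix
Phi = np.kron(np.eye(n), E)    # maps each observation to its country mean
Psi = np.kron(np.eye(n), M)    # maps each observation to its deviation from the mean

# Properties stated in the text.
assert np.allclose(E @ E, E)                  # E is idempotent
assert np.allclose(E @ iota, iota)            # E iota = iota
assert np.allclose(M @ iota, 0.0)             # M iota = 0
assert np.allclose(Phi + Psi, np.eye(n * m))  # Phi + Psi = I_nm

# Toy "income" vector, ordered by country as the appendix assumes.
x = np.arange(n * m, dtype=float).reshape(-1, 1)
country_means = (Phi @ x).reshape(n, m)
deviations = (Psi @ x).reshape(n, m)
print(country_means)
print(deviations)
```

Each row of `country_means` is constant within a country, and each row of `deviations` sums to zero, which is why any variable of the form $(W \otimes \iota)$ is annihilated by $\Psi$.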
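Define:

$$Z \equiv \begin{bmatrix} X & \Phi X & V \otimes \iota \end{bmatrix} \tag{5}$$

The bias implied by omitting $(W \otimes \iota)\beta_4$ equals

$$\mathrm{E}\begin{bmatrix} b_1 - \beta_1 \\ b_2 - \beta_2 \\ b_3 - \beta_3 \end{bmatrix} = (Z'Z)^{-1} Z' (W \otimes \iota)\beta_4 \tag{6}$$

So we proceed by evaluating the expression on the right-hand side of (6):

$$Z'Z = \begin{bmatrix} X'X & X'\Phi X & X'(V \otimes \iota) \\ X'\Phi X & X'\Phi X & X'(V \otimes \iota) \\ (V' \otimes \iota')X & (V' \otimes \iota')X & m\,V'V \end{bmatrix} \tag{7}$$

One can verify by direct multiplication that

$$(Z'Z)^{-1} = \begin{bmatrix}
(X'\Psi X)^{-1} & -(X'\Psi X)^{-1} & 0 \\
-(X'\Psi X)^{-1} & (X'\Phi X)^{-1}X'X(X'\Psi X)^{-1} + (X'\Phi X)^{-1}X'(V \otimes \iota)Q^{-1}(V' \otimes \iota')X(X'\Phi X)^{-1} & -(X'\Phi X)^{-1}X'(V \otimes \iota)Q^{-1} \\
0 & -Q^{-1}(V' \otimes \iota')X(X'\Phi X)^{-1} & Q^{-1}
\end{bmatrix}$$

$$(Z'Z)^{-1}Z' = \begin{bmatrix}
(X'\Psi X)^{-1}X'\Psi \\
(X'\Phi X)^{-1}X'\Phi - (X'\Psi X)^{-1}X'\Psi - (X'\Phi X)^{-1}X'(V \otimes \iota)Q^{-1}(V' \otimes \iota')S_x \\
Q^{-1}(V' \otimes \iota')S_x
\end{bmatrix}$$

$$(Z'Z)^{-1}Z'(W \otimes \iota)\beta_4 = \begin{bmatrix}
0 \\
(X'\Phi X)^{-1}\left[X'\Phi - X'(V \otimes \iota)Q^{-1}(V' \otimes \iota')S_x\right](W \otimes \iota)\beta_4 \\
Q^{-1}(V' \otimes \iota')S_x(W \otimes \iota)\beta_4
\end{bmatrix} \tag{8}$$

where we have used the fact that $M\iota = 0$, so that $\Psi(W \otimes \iota) = 0$. The right-hand side is the regression of $(W \otimes \iota)\beta_4$ on the aggregate variables. To see this, notice that

$$\begin{bmatrix} X'\Phi X & X'\Phi(V \otimes \iota) \\ (V' \otimes \iota')\Phi X & m\,V'V \end{bmatrix}^{-1} = \begin{bmatrix}
(X'\Sigma_v X)^{-1} & -(X'\Sigma_v X)^{-1}X'(V \otimes \iota)\frac{1}{m}(V'V)^{-1} \\
-\frac{1}{m}(V'V)^{-1}(V' \otimes \iota')X(X'\Sigma_v X)^{-1} & Q^{-1}
\end{bmatrix}$$

Hence,

$$\begin{aligned}
\begin{bmatrix} X'\Phi X & X'\Phi(V \otimes \iota) \\ (V' \otimes \iota')\Phi X & m\,V'V \end{bmatrix}^{-1}\begin{bmatrix} X'\Phi \\ V' \otimes \iota' \end{bmatrix}
&= \begin{bmatrix}
(X'\Sigma_v X)^{-1}X'\Phi - (X'\Sigma_v X)^{-1}X'(V \otimes \iota)\frac{1}{m}(V'V)^{-1}(V' \otimes \iota') \\
-\frac{1}{m}(V'V)^{-1}(V' \otimes \iota')X(X'\Sigma_v X)^{-1}X'\Phi + Q^{-1}(V' \otimes \iota')
\end{bmatrix} \\
&= \begin{bmatrix}
(X'\Sigma_v X)^{-1}(X'\Phi - X'\Pi_v) \\
-\frac{1}{m}(V'V)^{-1}(V' \otimes \iota')X(X'\Sigma_v X)^{-1}X'\Phi + Q^{-1}(V' \otimes \iota')
\end{bmatrix}
\end{aligned} \tag{9}$$

It turns out that the last two elements in (8) can be obtained by postmultiplying (9) by $(W \otimes \iota)\beta_4$, but it is a little tedious to show it. It can be verified by direct multiplication that

$$Q^{-1} = \frac{1}{m}(V'V)^{-1} + \frac{1}{m}\left[(V'V)^{-1}V' \otimes \iota'\right]X(X'\Sigma_v X)^{-1}X'\,\frac{1}{m}\left[V(V'V)^{-1} \otimes \iota\right] \tag{10}$$

Inserting this into the bottom element of the last vector on the right-hand side of (8) yields the bottom element of (9). Similar manipulations with the next-to-last element of (8) yield the top element of (9).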
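The omitted-variable bias in (6) can also be checked numerically against the closed form that this derivation arrives at, namely a zero bias for $b_1$, $(X'\Sigma_v X)^{-1}X'\Sigma_v(W \otimes \iota)\beta_4$ for $b_2$, and $Q^{-1}(V' \otimes \iota')S_x(W \otimes \iota)\beta_4$ for $b_3$. The sketch below is only an illustration under assumptions: the dimensions, the random draws for `X`, `V`, `W`, and `beta4`, and all variable names are hypothetical and are not taken from the data used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 5          # hypothetical: 6 countries, 5 respondents each
k, p, q = 1, 2, 2    # hypothetical column counts of X, V, W

iota = np.ones((m, 1))
E = iota @ iota.T / m
Phi = np.kron(np.eye(n), E)

# Simulated regressors, ordered by country.
X = rng.normal(size=(n * m, k))      # individual-level income
V = rng.normal(size=(n, p))          # included country-level variables
W = rng.normal(size=(n, q))          # omitted societal goods
beta4 = rng.normal(size=(q, 1))

V_iota = np.kron(V, iota)            # V (x) iota: repeat each country row m times
omitted = np.kron(W, iota) @ beta4   # (W (x) iota) beta_4

# Direct evaluation of equation (6): (Z'Z)^{-1} Z' (W (x) iota) beta_4.
Z = np.hstack([X, Phi @ X, V_iota])
bias_direct = np.linalg.solve(Z.T @ Z, Z.T @ omitted)

# Closed form in terms of Sigma_v, S_x and Q from (4).
P_v = V @ np.linalg.solve(V.T @ V, V.T)
Sigma_v = np.kron(np.eye(n) - P_v, E)
S_x = Phi - Phi @ X @ np.linalg.solve(X.T @ Phi @ X, X.T @ Phi)
Q = V_iota.T @ S_x @ V_iota

bias_b1 = np.zeros((k, 1))
bias_b2 = np.linalg.solve(X.T @ Sigma_v @ X, X.T @ Sigma_v @ omitted)
bias_b3 = np.linalg.solve(Q, V_iota.T @ S_x @ omitted)
bias_closed = np.vstack([bias_b1, bias_b2, bias_b3])

print(np.hstack([bias_direct, bias_closed]))
formulas_agree = np.allclose(bias_direct, bias_closed)
```

In this simulated setup the coefficient on individual income is unaffected by the omission, while the coefficients on the country mean of income and on the country-level variables absorb the omitted societal goods.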
So this establishes that the bias caused by omission of $(W \otimes \iota)\beta_4$ equals

$$\mathrm{E}\begin{bmatrix} b_1 - \beta_1 \\ b_2 - \beta_2 \\ b_3 - \beta_3 \end{bmatrix} = \begin{bmatrix}
0 \\
(X'\Sigma_v X)^{-1}X'\Sigma_v(W \otimes \iota)\beta_4 \\
Q^{-1}(V' \otimes \iota')S_x(W \otimes \iota)\beta_4
\end{bmatrix} \tag{11}$$

which is obtained by regressing the omitted term on the aggregate variables.

D Main results using ordinary least squares regressions

See Table D1.

E Summary statistics

See Tables E1 through E5.

Tables and figures

Table 1: Cross-country regressions of average life satisfaction on the log of per capita GDP

| | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| Restriction | None | y < 12,000 | y ≥ 12,000 | y ≥ 20,000 |
| Our data | | | | |
| Log GDP per capita in survey year | 0.716*** | 0.533*** | 1.034*** | 1.331*** |
| | (0.0598) | (0.0991) | (0.251) | (0.301) |
| Constant | -1.169** | 0.317 | -4.366* | -7.517** |
| | (0.550) | (0.827) | (2.557) | (3.139) |
| Observations | 105 | 59 | 46 | 30 |
| R-squared | 0.581 | 0.336 | 0.278 | 0.411 |
| Deaton (2008) | | | | |
| Log GDP per capita PPP 03 | 0.838*** | 0.690*** | 1.625*** | 0.384 |
| | (0.051) | (0.082) | (0.312) | (0.782) |
| Observations | 123 | 85 | 38 | 25 |
| R-squared | 0.694 | 0.458 | 0.43 | 0.01 |

Deaton (2008) GDP from the Penn World Table; our GDP from the World Bank Development Indicators. Deaton (2008) does not report the constant term.

Table 2: Cross-country regressions of average life satisfaction on the log of per capita GDP

| | (1) | (2) | (3) | (4) | (5) |
|---|---|---|---|---|---|
| Estimation | Ordered probit regressions, micro data | Ordered probit regressions, micro data | OLS regressions, macro data | OLS regressions, macro data | OLS regressions, macro data |
| Specification | Without controls | With controls | All countries | GDP per capita > $15,000 | GDP per capita > $15,000 |
| Our data | 0.345*** | 0.371*** | 0.355*** | 0.624*** | 0.316*** |
| | (0.003) | (0.003) | (0.030) | (0.127) | (0.051) |
| Stevenson & Wolfers (2008) | 0.396*** | 0.422*** | 0.418*** | 1.076*** | 0.348*** |
| | (0.023) | (0.023) | (0.022) | (0.211) | (0.037) |

Stevenson & Wolfers use the first (2006) round of the Gallup World Poll; we use the most recent round. Both use the same Cantril ladder question as the dependent variable. Controls in column (2) include a quartic function of age, interacted with sex, and indicators for missing gender and sex (note that our data is not missing sex for any respondents).
This table replicates the first row of Stevenson & Wolfers (2008) Table 1, which also uses Gallup World Poll data; however, our results are in line with all results in that table, from various data sources.

Figure 1: The relationship between log GDP and life satisfaction from Deaton (2008)
Circle diameter reflects the country’s population in 2003. The dashed line shows the average life satisfaction at each GDP level. The upper and lower lines show the same average, but for ages 15-25 and ages 60+ respectively. GDP data come from the Penn World Table.

Figure 2: The relationship between log GDP and life satisfaction, our data
Circle diameter reflects the country’s population in 2011. The line shows the average life satisfaction at each GDP level.

Figure 3: Life satisfaction and real GDP per capita from Stevenson & Wolfers (2008)
Dashed line is from an ordinary least squares regression; dotted line is the fit from a local regression estimator (lowess).

Figure 4: Life satisfaction and real GDP per capita, our data
Red line is from an ordinary least squares regression; green line is the fit from a local regression estimator (lowess).

Table 3: Effects of basic characteristics on life satisfaction

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| Female | 0.090*** | 0.089*** | 0.096*** | 0.097*** | 0.098*** | 0.098*** |
| | (0.014) | (0.013) | (0.014) | (0.013) | (0.013) | (0.013) |
| Urban | 0.123*** | 0.120*** | 0.055** | 0.051** | 0.051** | 0.050** |
| | (0.022) | (0.022) | (0.021) | (0.021) | (0.021) | (0.021) |
| Secondary up to 3 yrs tertiary education | 0.258*** | 0.260*** | 0.164*** | 0.164*** | 0.162*** | 0.161*** |
| | (0.020) | (0.019) | (0.019) | (0.018) | (0.018) | (0.018) |
| 4 yrs tertiary education or more | 0.501*** | 0.501*** | 0.317*** | 0.317*** | 0.314*** | 0.313*** |
| | (0.030) | (0.030) | (0.026) | (0.027) | (0.026) | (0.026) |
| Married | 0.040*** | 0.037** | 0.051*** | 0.049*** | 0.049*** | 0.049*** |
| | (0.015) | (0.015) | (0.014) | (0.014) | (0.014) | (0.014) |
| Separated | -0.167*** | -0.170*** | -0.163*** | -0.166*** | -0.165*** | -0.166*** |
| | (0.028) | (0.028) | (0.028) | (0.028) | (0.028) | (0.028) |
| Divorced | -0.163*** | -0.165*** | -0.151*** | -0.154*** | -0.155*** | -0.155*** |
| | (0.028) | (0.028) | (0.027) | (0.027) | (0.027) | (0.027) |
| Widowed | -0.141*** | -0.150*** | -0.140*** | -0.146*** | -0.146*** | -0.145*** |
| | (0.022) | (0.022) | (0.022) | (0.022) | (0.022) | (0.022) |
| Domestic partner | -0.043** | -0.055*** | -0.029 | -0.044** | -0.046** | -0.046** |
| | (0.020) | (0.020) | (0.022) | (0.022) | (0.021) | (0.021) |
| Employed full time for self | -0.053*** | -0.050*** | -0.015 | -0.012 | -0.011 | -0.011 |
| | (0.016) | (0.016) | (0.017) | (0.017) | (0.017) | (0.017) |
| Employed part time do not want full time | -0.020 | -0.018 | 0.025 | 0.028 | 0.029 | 0.030 |
| | (0.028) | (0.028) | (0.030) | (0.029) | (0.030) | (0.029) |
| Unemployed | -0.323*** | -0.318*** | -0.219*** | -0.219*** | -0.219*** | -0.219*** |
| | (0.026) | (0.026) | (0.024) | (0.024) | (0.024) | (0.024) |
| Employed part time want full time | -0.137*** | -0.135*** | -0.074*** | -0.071*** | -0.071*** | -0.070*** |
| | (0.024) | (0.025) | (0.025) | (0.025) | (0.025) | (0.025) |
| Out of workforce | -0.061*** | -0.060*** | -0.003 | -0.005 | -0.004 | -0.005 |
| | (0.018) | (0.018) | (0.018) | (0.018) | (0.018) | (0.018) |
| Health problems that prevent normal activities | -0.259*** | -0.257*** | -0.236*** | -0.235*** | -0.234*** | -0.234*** |
| | (0.028) | (0.028) | (0.027) | (0.027) | (0.027) | (0.027) |
| Survey year=2012 | -0.256*** | -0.213*** | -0.274** | -0.233*** | -0.211** | -0.220*** |
| | (0.093) | (0.075) | (0.107) | (0.078) | (0.084) | (0.081) |
| Survey year=2013 | -0.401*** | -0.397*** | -0.364*** | -0.401*** | -0.393*** | -0.404*** |
| | (0.110) | (0.090) | (0.141) | (0.097) | (0.095) | (0.095) |
| Survey year=2014 | -0.420*** | -0.337*** | -0.520*** | -0.402*** | -0.356*** | -0.357*** |
| | (0.089) | (0.068) | (0.105) | (0.077) | (0.080) | (0.078) |
| Observations | 86819 | 86819 | 86819 | 86819 | 86819 | 86819 |
| Pseudo R-squared | 0.066 | 0.067 | 0.073 | 0.074 | 0.074 | 0.074 |

Ordered probit. Reference categories: elementary education or less; single/never married; employed full time for employer; survey year 2011. Also includes country-specific age and age-squared terms. See Table 4 for differences in specifications.

Table 4: Effects of GDP per capita, household income, and public goods

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| Log GDP | 0.380*** | 0.084 | | | | 0.104 |
| | (0.031) | (0.086) | | | | (0.097) |
| Log household income | | | 0.442*** | 0.228*** | 0.059 | 0.037 |
| | | | (0.035) | (0.013) | (0.077) | (0.082) |
| Log of relative income | | | -0.211*** | | 0.174** | 0.196** |
| | | | (0.037) | | (0.078) | (0.082) |
| Life expectancy | | 0.027*** | | 0.023*** | 0.026*** | 0.025*** |
| | | (0.006) | | (0.007) | (0.007) | (0.007) |
| Log of health expenditures | | -0.002 | | -0.022 | 0.041 | -0.013 |
| | | (0.101) | | (0.095) | (0.097) | (0.108) |
| Log of education expenditures | | 0.050 | | 0.044 | 0.067 | 0.047 |
| | | (0.070) | | (0.075) | (0.073) | (0.074) |
| Average years of schooling | | 0.001 | | -0.008 | 0.001 | 0.000 |
| | | (0.018) | | (0.019) | (0.019) | (0.018) |
| Corruption perception index | | -0.005** | | -0.006*** | -0.005** | -0.005** |
| | | (0.002) | | (0.002) | (0.002) | (0.002) |
| Environmental health index | | 0.008** | | 0.008** | 0.008** | 0.009** |
| | | (0.004) | | (0.004) | (0.003) | (0.004) |
| Observations | 86819 | 86819 | 86819 | 86819 | 86819 | 86819 |
| Pseudo R-squared | 0.066 | 0.067 | 0.073 | 0.074 | 0.074 | 0.074 |

Ordered probit. The year of the observed variable is the survey year (e.g. log GDP in 2012 for a respondent surveyed in 2012). 2014 education expenditures for Saudi Arabia and Yemen are not yet published; here we use a linear extrapolation. Coefficients on individual characteristics are given in Table 3.
Table 5: Joint tests of significance for GDP and public goods

| | Column 2 | Column 4 | Column 5 | Column 6 |
|---|---|---|---|---|
| All societal development goods | 0.000 | 0.000 | 0.000 | 0.000 |
| Health expenditures, education expenditures, average years of schooling, corruption perception index | 0.153 | 0.093 | 0.114 | 0.329 |
| Life expectancy and environmental health index | 0.000 | 0.000 | 0.000 | 0.000 |
| Life expectancy, health expenditures, and environmental health index | 0.000 | 0.000 | 0.000 | 0.000 |
| Health expenditures and education expenditures | 0.749 | 0.779 | 0.211 | 0.811 |
| Education expenditures and average years of schooling | 0.773 | 0.744 | 0.654 | 0.822 |

All values are p-values of joint tests of significance. Column numbers refer to the columns in Table 4.

Table 6: Estimated bias from omitting societal goods

| | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| Without micro means | | | | |
| Log GDP | 0.267*** | 0.104 | 0.164*** | 0.103 |
| Log household income | 0.232*** | 0.233*** | 0.000 | 0.232 |
| Mean of log household income | -0.042 | -0.196** | 0.153*** | -0.195 |
| With micro means | | | | |
| Log GDP | 0.179*** | 0.162* | 0.017*** | 0.162 |
| Log household income | 0.234*** | 0.234*** | -0.001*** | 0.234 |
| Mean of log household income | -0.166** | -0.282*** | 0.116*** | -0.282 |

(1) gives the estimated coefficients with societal goods excluded, (2) gives the estimated coefficients with the societal goods added, (3) estimates the bias for that variable per Appendix C, and (4) is (1) minus (3) (that is, the biased coefficient minus the estimated bias). All models include the control variables in Table 3. The top panel excludes the country means of all individual-level variables; the bottom panel includes them.
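The arithmetic behind Table 6 can be checked directly from the reported figures. The short sketch below recomputes column (4) as column (1) minus column (3) and compares the result with column (2). The numbers are copied from the table with significance stars dropped; the tolerance `TOL` is an assumption meant only to allow for rounding to three decimals.

```python
# Rows of Table 6 as (col1, col2, col3, col4); significance stars dropped.
table6 = {
    "without means / Log GDP": (0.267, 0.104, 0.164, 0.103),
    "without means / Log household income": (0.232, 0.233, 0.000, 0.232),
    "without means / Mean of log household income": (-0.042, -0.196, 0.153, -0.195),
    "with means / Log GDP": (0.179, 0.162, 0.017, 0.162),
    "with means / Log household income": (0.234, 0.234, -0.001, 0.234),
    "with means / Mean of log household income": (-0.166, -0.282, 0.116, -0.282),
}

TOL = 0.0015  # assumed allowance for three-decimal rounding in the table


def gaps(row):
    """Return (|(1) - (3) - (4)|, |(4) - (2)|) for one table row."""
    c1, c2, c3, c4 = row
    return abs((c1 - c3) - c4), abs(c4 - c2)


results = {name: gaps(row) for name, row in table6.items()}
for name, (identity_gap, match_gap) in results.items():
    print(f"{name}: |(1)-(3)-(4)| = {identity_gap:.3f}, |(4)-(2)| = {match_gap:.3f}")

all_consistent = all(g1 <= TOL and g2 <= TOL for g1, g2 in results.values())
```

Under this tolerance every row satisfies both checks, which is the sense in which the bias formula of Appendix C accounts for the difference between the coefficients estimated with and without societal goods.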
Table 7: Main results including the means of all micro variables

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| Female | 0.077*** | 0.077*** | 0.089*** | 0.088*** | 0.089*** | 0.089*** |
| | (0.013) | (0.013) | (0.013) | (0.013) | (0.013) | (0.013) |
| Female mean | 1.381*** | 1.026* | 1.415*** | 1.198** | 1.050** | 1.031* |
| | (0.436) | (0.531) | (0.449) | (0.489) | (0.519) | (0.548) |
| Urban | 0.118*** | 0.118*** | 0.048** | 0.056*** | 0.048** | 0.048** |
| | (0.021) | (0.021) | (0.020) | (0.020) | (0.020) | (0.020) |
| Urban mean | 0.157 | 0.180 | 0.290 | 0.160 | 0.247 | 0.256 |
| | (0.188) | (0.184) | (0.189) | (0.190) | (0.176) | (0.185) |
| Medium education | 0.244*** | 0.245*** | 0.151*** | 0.163*** | 0.152*** | 0.152*** |
| | (0.019) | (0.019) | (0.018) | (0.018) | (0.018) | (0.018) |
| Medium education mean | 0.344* | 0.488** | 0.411** | 0.518** | 0.595*** | 0.592*** |
| | (0.207) | (0.226) | (0.209) | (0.247) | (0.225) | (0.226) |
| High education | 0.476*** | 0.478*** | 0.296*** | 0.319*** | 0.298*** | 0.298*** |
| | (0.030) | (0.029) | (0.025) | (0.026) | (0.025) | (0.025) |
| High education mean | -0.251 | -0.221 | -0.214 | -0.239 | -0.035 | -0.042 |
| | (0.355) | (0.338) | (0.373) | (0.346) | (0.343) | (0.348) |
| Married | -0.050*** | -0.051*** | -0.053*** | -0.053*** | -0.053*** | -0.053*** |
| | (0.014) | (0.014) | (0.014) | (0.014) | (0.014) | (0.014) |
| Married mean | 0.291 | 0.417 | 0.096 | 0.198 | 0.417 | 0.425 |
| | (0.355) | (0.372) | (0.353) | (0.403) | (0.375) | (0.377) |
| Separated | -0.252*** | -0.253*** | -0.263*** | -0.262*** | -0.264*** | -0.264*** |
| | (0.027) | (0.027) | (0.027) | (0.027) | (0.027) | (0.027) |
| Separated mean | 1.904 | 2.821** | 2.102 | 3.206* | 2.845* | 2.860* |
| | (1.544) | (1.439) | (1.547) | (1.653) | (1.485) | (1.484) |
| Divorced | -0.254*** | -0.255*** | -0.261*** | -0.261*** | -0.262*** | -0.262*** |
| | (0.029) | (0.029) | (0.028) | (0.029) | (0.029) | (0.029) |
| Divorced mean | 2.268** | 2.188** | 1.176 | 1.142 | 2.251** | 2.236** |
| | (1.051) | (1.004) | (1.183) | (1.103) | (1.137) | (1.128) |
| Widowed | -0.213*** | -0.214*** | -0.238*** | -0.236*** | -0.239*** | -0.239*** |
| | (0.027) | (0.027) | (0.027) | (0.027) | (0.027) | (0.027) |
| Widowed mean | -2.745*** | -2.774*** | -2.923*** | -2.918*** | -2.775*** | -2.785*** |
| | (1.019) | (0.887) | (1.071) | (0.944) | (0.906) | (0.905) |
| Domestic partner | -0.120*** | -0.120*** | -0.115*** | -0.116*** | -0.116*** | -0.116*** |
| | (0.021) | (0.021) | (0.021) | (0.021) | (0.021) | (0.021) |
| Domestic partner mean | 1.200* | 1.150* | 1.010 | 0.967 | 1.154* | 1.158* |
| | (0.644) | (0.621) | (0.723) | (0.687) | (0.635) | (0.637) |
| Employed full time for self | -0.049*** | -0.049*** | -0.015 | -0.019 | -0.015 | -0.015 |
| | (0.015) | (0.015) | (0.016) | (0.016) | (0.016) | (0.016) |
| Employed full time for self mean | -0.762 | -0.445 | -0.868 | -0.434 | -0.488 | -0.485 |
| | (0.566) | (0.506) | (0.570) | (0.543) | (0.512) | (0.513) |
| Employed part time do not want full time | -0.004 | -0.004 | 0.039 | 0.034 | 0.039 | 0.039 |
| | (0.028) | (0.028) | (0.029) | (0.028) | (0.029) | (0.029) |
| Employed part time do not want full time mean | 0.255 | 0.635 | -0.162 | 0.363 | 0.619 | 0.594 |
| | (0.766) | (0.704) | (0.742) | (0.706) | (0.708) | (0.724) |
| Unemployed | -0.283*** | -0.284*** | -0.188*** | -0.200*** | -0.189*** | -0.189*** |
| | (0.025) | (0.025) | (0.023) | (0.024) | (0.023) | (0.023) |
| Unemployed mean | -2.912*** | -2.427*** | -3.166*** | -2.614*** | -2.550*** | -2.569*** |
| | (0.770) | (0.816) | (0.803) | (0.829) | (0.815) | (0.838) |
| Employed part time want full time | -0.124*** | -0.124*** | -0.062** | -0.070*** | -0.063** | -0.063** |
| | (0.024) | (0.024) | (0.026) | (0.026) | (0.026) | (0.026) |
| Employed part time want full time mean | -1.088 | -1.592* | -1.382 | -1.272 | -1.664* | -1.690** |
| | (0.873) | (0.825) | (0.856) | (0.852) | (0.865) | (0.860) |
| Out of the workforce | -0.019 | -0.019 | 0.028 | 0.022 | 0.028 | 0.028 |
| | (0.017) | (0.017) | (0.018) | (0.018) | (0.018) | (0.018) |
| Out of the workforce mean | -0.424 | -0.333 | -0.592 | -0.221 | -0.385 | -0.390 |
| | (0.461) | (0.448) | (0.446) | (0.478) | (0.465) | (0.460) |
| Health problems that prevent normal activities | -0.270*** | -0.271*** | -0.253*** | -0.256*** | -0.254*** | -0.254*** |
| | (0.028) | (0.028) | (0.028) | (0.028) | (0.028) | (0.028) |
| Health problems mean | -1.239*** | -0.642 | -1.031** | -0.534 | -0.693 | -0.684 |
| | (0.426) | (0.434) | (0.450) | (0.445) | (0.447) | (0.446) |
| Log GDP | 0.165*** | -0.018 | | | | -0.017 |
| | (0.042) | (0.089) | | | | (0.090) |
| Log of equivalized household income | | | 0.181*** | 0.198*** | -0.003 | -0.001 |
| | | | (0.045) | (0.014) | (0.068) | (0.067) |
| Log of equivalized household income mean | | | 0.043 | | 0.228*** | 0.226*** |
| | | | (0.047) | | (0.067) | (0.067) |
| Life expectancy | | 0.011 | | 0.006 | 0.011 | 0.011 |
| | | (0.007) | | (0.008) | (0.008) | (0.008) |
| Log of health expenditures | | -0.013 | | -0.093 | -0.022 | -0.014 |
| | | (0.089) | | (0.078) | (0.082) | (0.093) |
| Log of education expenditures | | 0.152** | | 0.134** | 0.150** | 0.154** |
| | | (0.068) | | (0.061) | (0.063) | (0.070) |
| Average years of schooling | | -0.023 | | -0.029 | -0.024 | -0.023 |
| | | (0.023) | | (0.022) | (0.022) | (0.023) |
| Corruption perception index | | 0.001 | | 0.001 | 0.002 | 0.001 |
| | | (0.002) | | (0.002) | (0.002) | (0.002) |
| Environmental health index | | -0.002 | | -0.001 | -0.002 | -0.002 |
| | | (0.003) | | (0.003) | (0.003) | (0.003) |
| Observations | 86819 | 86819 | 86819 | 86819 | 86819 | 86819 |
| Pseudo R-squared | 0.054 | 0.056 | 0.061 | 0.062 | 0.063 | 0.063 |

Ordered probit. Not shown: country-specific age and age-squared variables, year fixed effects.

Table 8: Alternative formulations of relative income

| | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9) |
|---|---|---|---|---|---|---|---|---|---|
| Log GDP | | | 0.104 | | | 0.093 | | | 0.094 |
| | | | (0.097) | | | (0.097) | | | (0.090) |
| Log household income | 0.442*** | 0.059 | 0.037 | 0.451*** | 0.051 | 0.032 | 0.312*** | 0.077*** | 0.073*** |
| | (0.035) | (0.077) | (0.082) | (0.038) | (0.086) | (0.091) | (0.024) | (0.025) | (0.025) |
| Relative income | -0.211*** | 0.174** | 0.196** | | | | | | |
| | (0.037) | (0.078) | (0.082) | | | | | | |
| Alternative relative income | | | | -0.219*** | 0.181** | 0.200** | | | |
| | | | | (0.040) | (0.085) | (0.090) | | | |
| Income rank | | | | | | | -0.194** | 0.518*** | 0.530*** |
| | | | | | | | (0.083) | (0.079) | (0.078) |
| Life expectancy | | 0.026*** | 0.025*** | | 0.024*** | 0.023*** | | 0.026*** | 0.025*** |
| | | (0.007) | (0.007) | | (0.007) | (0.007) | | (0.007) | (0.007) |
| Log of health expenditures | | 0.041 | -0.013 | | 0.055 | 0.007 | | 0.032 | -0.023 |
| | | (0.097) | (0.108) | | (0.102) | (0.111) | | (0.092) | (0.107) |
| Log of education expenditures | | 0.067 | 0.047 | | 0.065 | 0.046 | | 0.067 | 0.046 |
| | | (0.073) | (0.074) | | (0.075) | (0.076) | | (0.073) | (0.075) |
| Average years of schooling | | 0.001 | 0.000 | | -0.001 | -0.002 | | -0.000 | -0.002 |
| | | (0.019) | (0.018) | | (0.018) | (0.018) | | (0.019) | (0.018) |
| Corruption perception index | | -0.005** | -0.005** | | -0.005** | -0.004* | | -0.005** | -0.005** |
| | | (0.002) | (0.002) | | (0.002) | (0.002) | | (0.002) | (0.002) |
| Environmental health index | | 0.008** | 0.009** | | 0.008** | 0.009** | | 0.008** | 0.009** |
| | | (0.003) | (0.004) | | (0.003) | (0.004) | | (0.003) | (0.004) |
| Observations | 86819 | 86819 | 86819 | 86819 | 86819 | 86819 | 86819 | 86819 | 86819 |
| Pseudo R-squared | 0.073 | 0.074 | 0.074 | 0.073 | 0.074 | 0.074 | 0.072 | 0.075 | 0.075 |

Ordered probit.
The year of the observed variable is the survey year (e.g. we use log GDP in 2012 for a respondent surveyed in 2012). 2014 education expenditures for Saudi Arabia and Yemen are not yet published; here we use a linear extrapolation. Coefficients on individual characteristics are comparable to those shown in Table 3.

Table 9: Placebo testing with military expenditures and energy use per capita
(1) (2) (3) (4) (5) (6)
Log GDP 0.340*** 0.364*** 0.234** (0.037) (0.095) (0.103)
Log household income 0.391*** 0.250*** 0.459*** 0.337*** (0.039) (0.014) (0.071) (0.087)
Log of relative income -0.148*** -0.216*** -0.094 (0.039) (0.072) (0.086)
Log of military expenditures -0.026 0.066 -0.017 -0.075 (0.065) (0.043) (0.047) (0.058)
Energy use 0.000 0.000 -0.000 -0.000 (0.000) (0.000) (0.000) (0.000)
Observations 75354 75354 75354 75354 75354 75354
Pseudo R-squared 0.060 0.060 0.068 0.068 0.068 0.069
Ordered probit. Not shown: all individual characteristics in Table 3 were controlled for in these specifications as well, and had nearly identical coefficients as in Table 3. 2014 education expenditures for Saudi Arabia and Yemen are not yet published; here we use a linear extrapolation. Military expenditures per capita come from the Stockholm International Peace Research Institute (SIPRI) (including estimates); energy use per capita comes from World Bank Indicators.

Table A1: Gallup survey questions
Variable / Question text / How variable is coded
Life Satisfaction. Question: Please imagine a ladder with steps numbered from zero at the bottom to ten at the top. Suppose we say that the top of the ladder represents the best possible life for you, and the bottom of the ladder represents the worst possible life for you. On which step of the ladder would you say you personally feel you stand at this time, assuming that the higher the step the better you feel about your life, and the lower the step the worse you feel about it? Which step comes closest to the way you feel?
Coded: 0 to 10
Gender. Question: Gender (not asked). Coded: 0 if male, 1 if female.
Urban/rural. Question: Respondent lives in: A rural area or on a farm, In a small town or village, In a large city, In the suburb of a large city. Coded: 0 if in the first two categories, 1 if in the last two.
Education. Question: Completed elementary education or less (up to 8 years of basic education), Secondary education and some education beyond secondary education (9-15 years of education), Completed four years of education beyond high school and/or received a 4-year college degree. Coded: Three separate variables, 1 if education is in that range, 0 otherwise.
Marital status. Question: What is your current marital status? Single/Never been married, Married, Separated, Divorced, Widowed, Domestic partner. Coded: Six separate variables, 1 if currently in that marital status, 0 otherwise.
Employment status. Question: The following questions were used to categorize each person into one of the following: employed full time for employer, employed full time for self, employed part time do not want full time, unemployed, employed part time want full time, out of workforce. Coded: Six separate variables, 1 if currently in that employment status, 0 otherwise. Questions used:
Thinking about your WORK SITUATION over the past 30 days, have you worked for an employer for any pay? (This could be for one or more employers.)
In general, over the past 30 days, did you work for an employer (this could be one or more employers) for 30 hours per week or more OR for less than 30 hours per week?
Again thinking about the last 30 days, were you self-employed? This means earning money or supporting yourself or a family by working for yourself (freelancing) or working for your own or your family’s business. Self-employment also includes fishing, doing farm work or raising livestock for either your own or your family’s farm or ranch.
In a typical week (7 days), do you work 30 hours or more OR less than 30 hours as a self-employed individual?
In the past 30 days, have you actively been looking for employment?
(If necessary, read:) Actively looking means applying for jobs, searching for jobs, etc.
If you were offered a job that required you to work 30 hours per week or more for which you would receive pay from an employer, would you take the job?
Are you a full-time student?
Are you retired or disabled?
Are you doing full-time housework or caring for children or others, but are not paid for it?
Would you have taken a job for pay from an employer if one were offered that required you to work 30 hours or more per week?
Would you have taken a job for pay from an employer if one were offered that required you to work less than 30 hours per week?
Health problems. Question: Do you have any health problems that prevent you from doing any of the things people your age normally can do? Coded: 0 if no, 1 if yes.
Household composition. Question: How many children under 15 years of age are now living in your household? Coded: 0 to 99+.
Household composition. Question: Including yourself, how many people who are residents of this country, age 15 or over, currently live in this household? Coded: 0 to 99+.
Household income. Question: “What is your total monthly* household income in [local currency], before taxes? Please include income from wages and salaries, remittances from family members living elsewhere, farming, and all other sources. Again, please provide your total monthly household income.” (*In Organization for Economic Co-operation and Development countries where appropriate, the question asked about annual rather than monthly income.)
Coded: In local currency

Table B1: All non-representative samples removed (as indicated in the Data section)
(1) (2) (3) (4) (5) (6)
Log GDP 0.385*** 0.053 0.093 (0.035) (0.104) (0.113)
Log household income 0.439*** 0.222*** 0.033 0.022 (0.041) (0.012) (0.099) (0.099)
Log of relative income -0.213*** 0.194** 0.205** (0.042) (0.099) (0.099)
Life expectancy 0.027*** 0.022** 0.026*** 0.025** (0.009) (0.009) (0.010) (0.010)
Log of health expenditures 0.053 0.015 0.095 0.046 (0.111) (0.110) (0.108) (0.118)
Log of education expenditures 0.068 0.063 0.079 0.060 (0.077) (0.078) (0.077) (0.082)
Average years of schooling -0.009 -0.022 -0.005 -0.009 (0.022) (0.023) (0.023) (0.023)
Corruption perception index -0.006** -0.007*** -0.006** -0.005** (0.002) (0.002) (0.003) (0.003)
Environmental health index 0.007 0.008 0.007 0.008 (0.005) (0.006) (0.005) (0.006)
Observations 65101 65101 65101 65101 65101 65101
Pseudo R-squared 0.067 0.068 0.074 0.075 0.075 0.075
Ordered probit. Not shown: all individual characteristics in Table 3 were controlled for in these specifications as well, and had nearly identical coefficients as in Table 3. Dropped countries with any nonrepresentative samples: Azerbaijan, Bahrain, Canada, Central African Republic, Chad, China, Democratic Republic of the Congo, India, Japan, Kuwait, Laos, Madagascar, Moldova, Morocco, Pakistan, Philippines, Saudi Arabia, Sudan. This table includes 77 countries.

Table B2: With countries missing either placebo variable (energy use, military expenditures) removed
(1) (2) (3) (4) (5) (6)
Log GDP 0.340*** 0.017 0.020 (0.037) (0.115)
Log household income 0.391*** 0.241*** 0.133 0.130 (0.039) (0.014) (0.086) (0.086)
Log of relative income -0.148*** 0.111 0.114 (0.039) (0.085) (0.085)
Life expectancy 0.021*** 0.012 0.015* 0.015* (0.008) (0.008) (0.009) (0.009)
Log of health expenditures 0.143 0.083 0.119 0.105 (0.137) (0.103) (0.111) (0.147)
Log of education expenditures -0.020 -0.057 -0.037 -0.040 (0.080) (0.087) (0.086) (0.086)
Average years of schooling -0.011 -0.029 -0.020 -0.020 (0.020) (0.021) (0.021) (0.021)
Corruption perception index -0.004* -0.005* -0.004* -0.004 (0.002) (0.002) (0.002) (0.002)
Environmental health index 0.008** 0.012*** 0.011*** 0.011*** (0.004) (0.004) (0.004) (0.004)
Observations 75354 75354 75354 75354 75354 75354
Pseudo R-squared 0.060 0.061 0.068 0.069 0.069 0.069
Ordered probit. Not shown: all individual characteristics in Table 3 were controlled for in these specifications as well, and had nearly identical coefficients as in Table 3. Dropped countries missing either or both placebo variables: energy use (Afghanistan, Laos, Liberia, Madagascar, Malawi, Mauritania, Rwanda, Uganda), military expenditures (Costa Rica, Haiti, Panama, Sudan, Uzbekistan), both (Central African Republic, Chad). This table includes 80 countries.

Table D1: Main results using ordinary least squares
(1) (2) (3) (4) (5) (6)
Female 0.173*** 0.171*** 0.183*** 0.184*** 0.185*** 0.185*** (0.027) (0.026) (0.026) (0.026) (0.026) (0.026)
Urban 0.236*** 0.231*** 0.103** 0.095** 0.095** 0.093** (0.044) (0.042) (0.041) (0.040) (0.040) (0.040)
Secondary up to 3 yrs tertiary education 0.499*** 0.502*** 0.310*** 0.309*** 0.305*** 0.304*** (0.034) (0.034) (0.032) (0.031) (0.031) (0.031)
4 yrs tertiary education or more 0.968*** 0.966*** 0.601*** 0.600*** 0.594*** 0.593*** (0.053) (0.054) (0.046) (0.047) (0.047) (0.046)
Married 0.077*** 0.070** 0.096*** 0.092*** 0.092*** 0.092*** (0.029) (0.029) (0.027) (0.027) (0.027) (0.027)
Separated -0.323*** -0.329*** -0.311*** -0.315*** -0.314*** -0.314*** (0.055) (0.055) (0.054) (0.054) (0.054) (0.054)
Divorced -0.318*** -0.321*** -0.290*** -0.294*** -0.296*** -0.297*** (0.054) (0.054) (0.050) (0.051) (0.051) (0.051)
Widowed -0.275*** -0.291*** -0.267*** -0.278*** -0.277*** -0.277*** (0.041) (0.042) (0.040) (0.041) (0.041) (0.041)
Domestic partner -0.086** -0.109*** -0.057 -0.085** -0.089** -0.089** (0.038) (0.038) (0.041) (0.040) (0.039) (0.039)
Employed full time for self -0.101*** -0.095*** -0.027 -0.022 -0.020 -0.020 (0.031) (0.030) (0.033) (0.032) (0.032) (0.032)
Employed part time do not want full time -0.041 -0.036 0.046 0.051 0.055 0.055 (0.053) (0.054) (0.055) (0.055) (0.055) (0.055)
Unemployed -0.625*** -0.613*** -0.415*** -0.415*** -0.414*** -0.415*** (0.051) (0.050) (0.045) (0.045) (0.045) (0.045)
Employed part time want full time -0.267*** -0.263*** -0.141*** -0.137*** -0.136*** -0.134*** (0.047) (0.048) (0.048) (0.048) (0.048) (0.048)
Out of workforce -0.117*** -0.116*** -0.005 -0.009 -0.008 -0.009 (0.035) (0.034) (0.034) (0.033) (0.033) (0.033)
Health problems that prevent normal activities -0.501*** -0.498*** -0.449*** -0.446*** -0.445*** -0.445*** (0.056) (0.055) (0.053) (0.053) (0.053) (0.053)
Survey year=2012 -0.485*** -0.400*** -0.514** -0.436*** -0.394** -0.410*** (0.176) (0.143) (0.199) (0.146) (0.157) (0.151)
Survey year=2013 -0.767*** -0.755*** -0.686** -0.752*** -0.737*** -0.757*** (0.206) (0.170) (0.263) (0.181) (0.177) (0.178)
Survey year=2014 -0.796*** -0.634*** -0.976*** -0.749*** -0.664*** -0.665*** (0.170) (0.129) (0.197) (0.142) (0.147) (0.145)
Log GDP 0.728*** 0.154 0.185 (0.056) (0.165) (0.182)
Log household income 0.836*** 0.431*** 0.120 0.081 (0.062) (0.023) (0.145) (0.154)
Log of relative income -0.398*** 0.320** 0.359** (0.068) (0.147) (0.154)
Life expectancy 0.053*** 0.043*** 0.049*** 0.048*** (0.012) (0.012) (0.013) (0.013)
Log of health expenditures -0.003 -0.044 0.072 -0.025 (0.190) (0.174) (0.180) (0.199)
Log of education expenditures 0.098 0.083 0.127 0.090 (0.132) (0.138) (0.135) (0.138)
Average years of schooling 0.003 -0.016 0.002 -0.000 (0.035) (0.037) (0.035) (0.035)
Corruption perception index -0.010** -0.011*** -0.010** -0.009** (0.004) (0.004) (0.004) (0.004)
Environmental health index 0.014** 0.016** 0.016** 0.017** (0.007) (0.007) (0.007) (0.007)
Constant -0.652 -0.213 -0.481 -0.954 -0.074 -0.643 (0.506) (0.856) (0.519) (0.624) (0.737) (0.886)
Observations 86819 86819 86819 86819 86819 86819
R-squared 0.252 0.255 0.275 0.278 0.279 0.279
Ordinary least squares. Not shown: country-specific age and age-squared variables.

Table E1: Selected summary variables, by region
Region                      Life satisfaction  Female %  Low edu %  Medium edu %  High edu %  GDP per capita  Equivalized income  Life expectancy
Aus/NZ/US/Can                            7.35     53.8%       6.5%         55.7%       37.8%       42,012.42           26,870.74             80.8
Western Europe                           6.25     54.8%      17.5%         61.7%       20.8%       34,703.77           18,915.02             81.3
Latin America & Caribbean                6.07     59.5%      36.7%         53.9%        9.4%       12,006.09            4,308.98             73.4
Central America                          6.00     54.4%      45.5%         44.6%       10.0%        9,260.72            2,948.63             74.6
East Asia                                5.70     55.5%      25.7%         49.4%       24.9%       22,157.57           13,036.69             77.0
Middle East & North Africa               5.39     48.5%      26.8%         51.2%       22.0%       27,109.58            9,823.03             74.2
Southeast Asia                           5.38     56.9%      44.3%         43.9%       11.8%       18,063.02            4,056.05             71.3
Transition economies                     5.33     58.3%      18.7%         63.0%       18.3%       15,219.05            6,205.27             73.8
South Asia                               4.44     53.1%      60.0%         34.8%        5.2%        4,372.79            1,441.52             67.8
Sub-Saharan Africa                       4.25     50.5%      47.1%         47.5%        5.4%        3,578.10            1,834.03             59.5
Equivalized income is household income adjusted using OECD weights: total household income divided by 1 + 0.7 × (# additional adults) + 0.5 × (# children).
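As an illustration of the equivalence scale in the note to Table E1, here is a minimal Python sketch. It assumes the note means that total household income is divided by the scale 1 + 0.7 × (additional adults) + 0.5 × (children); the function name, argument names, and the example household are hypothetical and are not taken from the thesis.

```python
def equivalized_income(household_income, n_adults, n_children):
    """Return household income divided by an OECD-style equivalence scale.

    Assumption (not confirmed by the source): the first adult has weight 1,
    each additional adult 0.7, and each child 0.5.
    """
    if n_adults < 1:
        raise ValueError("household must contain at least one adult")
    if n_children < 0:
        raise ValueError("number of children cannot be negative")
    scale = 1.0 + 0.7 * (n_adults - 1) + 0.5 * n_children
    return household_income / scale


if __name__ == "__main__":
    # Hypothetical household: two adults, two children, total income 27,000.
    # The scale is 1 + 0.7 + 2 * 0.5 = 2.7.
    print(equivalized_income(27_000, n_adults=2, n_children=2))
```

Under this reading, a single adult with no children has a scale of 1, so their equivalized income equals their household income, and larger households are scaled down less than proportionally to their size.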
100 Table E2: Summary statistics: N, female, urban, education, health Country N Life satis- faction Female % Urban % Low edu % Medium edu % High edu % Health problems % Afghanistan 1000 3.57 50.0 18.0 64.0 29.3 6.7 23.9 Albania 999 4.88 61.2 48.0 51.3 35.7 13.0 38.8 Argentina 1000 6.80 62.1 86.0 28.9 65.7 5.4 23.6 Armenia 1000 4.21 60.3 42.4 14.0 64.5 21.5 43.4 Australia 1002 7.43 58.1 75.4 4.9 59.3 35.8 20.4 Austria 1000 7.23 51.3 38.6 9.2 74.9 15.9 17.2 Azerbaijan 1000 5.55 51.9 36.4 11.1 74.8 14.1 20.1 Bahrain 1002 6.61 37.9 61.7 5.4 43.8 50.8 19.8 Bangladesh 1000 4.74 55.7 12.8 53.3 44.3 2.4 28.2 Belarus 1052 5.76 62.5 50.9 10.6 62.4 27.0 37.4 Benin 1000 3.83 49.0 9.6 74.0 25.5 0.5 29.2 Bolivia 1000 5.72 58.0 52.1 37.3 53.3 9.4 34.2 Bosnia Herzegovina 1001 5.19 59.3 24.0 32.7 55.7 11.6 36.5 Botswana 1000 4.27 55.2 11.2 26.4 59.5 14.1 29.6 Brazil 1042 7.01 65.1 52.5 51.4 43.8 4.8 26.5 Bulgaria 1000 4.03 60.7 50.7 22.8 55.7 21.5 37.2 Cambodia 1000 3.97 67.3 10.4 84.2 12.6 3.2 42.0 Cameroon 1000 4.35 51.1 31.2 51.1 47.1 1.8 26.4 Canada 1002 7.49 49.0 59.8 8.8 52.4 38.9 17.3 Central African Republic 1000 3.73 47.8 19.3 76.4 22.9 0.7 23.6 Chad 1000 3.52 41.4 12.8 71.4 28.3 0.3 26.4 Chile 1009 6.55 55.3 70.6 31.0 53.8 15.2 26.9 China 4256 5.27 54.9 34.8 61.5 30.0 8.5 18.1 Colombia 1000 6.44 65.3 59.6 29.9 57.2 12.9 21.8 Congo Kinshasa 1000 4.71 44.5 24.0 18.4 66.9 14.7 19.1 Costa Rica 1000 7.07 54.7 44.8 36.7 49.1 14.2 20.7 Croatia 1000 5.57 50.4 32.0 13.2 69.3 17.5 20.2 Czech Republic 1001 6.53 56.7 41.9 12.4 76.0 11.5 25.6 Dominican Republic 1000 5.20 62.3 59.1 67.4 15.9 16.7 24.6 Ecuador 1003 5.62 62.1 51.4 33.6 58.9 7.5 30.0 Egypt 1077 4.30 44.0 40.5 38.9 47.7 13.4 - El Salvador 1000 5.83 55.9 44.8 56.7 36.4 6.9 19.4 Ethiopia 1000 4.59 60.8 7.2 59.8 40.2 0.0 23.3 France 1003 6.60 58.0 35.9 5.7 69.6 24.8 15.7 Georgia 1000 4.21 55.7 35.0 11.8 52.7 35.5 45.2 Germany 3033 6.70 53.7 35.7 1.8 67.3 30.9 24.8 101 Ghana 1000 4.00 48.1 26.7 41.0 53.8 5.1 21.2 Greece 1003 
4.70 52.5 52.2 30.9 55.5 13.6 23.3 Guatemala 1000 5.84 52.4 23.2 54.7 38.0 7.3 21.4 Haiti 504 3.92 53.4 22.0 33.5 64.1 2.4 34.6 Honduras 1000 4.67 55.1 40.8 59.0 38.0 3.0 22.3 Hungary 1019 4.86 58.5 36.2 30.1 53.0 16.9 34.4 India 5000 4.65 45.6 24.0 58.2 31.7 10.1 17.4 Indonesia 1000 5.29 54.6 25.2 36.2 59.7 4.1 23.6 Iran 1000 5.19 48.7 76.2 10.9 57.7 31.3 18.0 Iraq 1003 4.98 48.1 71.0 36.0 43.8 20.2 30.2 Israel 1000 7.44 49.6 77.2 3.2 68.1 28.6 19.4 Italy 1004 6.16 58.6 31.6 36.0 48.5 15.5 13.5 Japan 1000 6.25 54.5 51.3 10.5 61.8 27.7 16.1 Jordan 1000 5.34 49.6 84.8 16.5 66.7 16.8 18.1 Kazakhstan 1000 5.84 62.0 38.6 12.0 62.2 25.8 29.8 Kenya 1000 3.81 47.2 10.4 38.3 57.9 3.8 19.3 Kuwait 1008 6.58 42.9 99.8 2.9 46.3 50.8 16.3 Kyrgyzstan 1000 5.16 59.4 20.4 16.2 64.1 19.6 32.1 Laos 1000 4.91 57.1 28.3 57.7 34.8 7.5 34.5 Lebanon 1012 4.50 52.7 66.9 22.4 55.0 22.5 - Liberia 1000 4.61 52.8 32.0 57.5 39.8 2.7 31.6 Macedonia 1025 4.73 47.1 59.2 21.9 61.8 16.2 23.3 Madagascar 1008 3.81 56.0 23.0 74.5 25.2 0.3 31.5 Malawi 1000 3.94 59.2 10.4 56.0 43.1 0.9 34.1 Malaysia 1000 5.76 41.5 48.7 12.0 56.5 31.6 14.1 Mauritania 1000 4.75 41.3 32.8 41.9 49.7 8.3 19.8 Mexico 1000 7.23 48.4 54.5 23.8 68.5 7.8 14.6 Moldova 1000 5.85 57.8 21.0 20.6 62.0 17.4 27.6 Mongolia 1000 4.77 57.0 45.6 27.4 51.8 20.8 36.0 Morocco 1007 5.13 54.6 33.3 72.5 22.8 4.7 29.4 Myanmar 1020 4.12 62.7 29.4 71.1 23.7 5.2 31.1 Namibia 1000 4.48 61.5 22.4 44.0 50.8 5.2 29.9 Nepal 1000 4.00 59.7 7.0 80.9 16.2 2.8 33.9 New Zealand 1008 7.24 58.2 63.9 10.9 59.0 30.1 17.4 Nicaragua 1000 5.74 52.5 44.0 40.7 46.1 13.2 21.9 Nigeria 1000 5.17 42.5 28.1 22.1 74.7 3.3 15.4 Pakistan 1008 5.15 50.0 42.7 68.3 25.2 6.5 32.6 Palestine 1000 4.85 57.5 73.0 23.9 61.5 14.6 27.0 Panama 1001 6.85 55.6 66.3 25.1 59.7 15.2 15.6 Paraguay 1000 5.55 57.1 50.1 50.9 36.3 12.8 26.9 Peru 1000 5.82 56.8 69.4 21.1 69.2 9.6 24.6 Philippines 1000 5.03 55.2 51.3 29.3 59.1 11.6 34.5 Poland 1000 5.73 61.6 33.6 16.0 64.1 19.8 37.0 Portugal 1007 
5.32 54.9 29.5 30.1 46.1 23.8 19.7 Russia 1500 5.74 66.5 55.8 11.0 57.2 31.8 33.4 Rwanda 1000 3.57 51.4 8.8 62.6 37.2 0.2 17.9 102 Saudi Arabia 1017 6.27 33.3 82.4 9.5 64.0 26.5 12.9 Senegal 1000 4.49 49.1 39.2 53.8 43.3 2.9 18.5 Singapore 1000 7.06 50.4 100.0 19.7 59.6 20.8 15.1 Slovakia 1000 5.99 56.5 27.7 11.8 74.5 13.7 32.3 Slovenia 1017 5.76 56.2 33.8 8.4 88.4 3.3 27.3 South Africa 1000 4.60 53.3 35.2 19.2 73.8 6.9 20.2 South Korea 1000 5.87 55.5 87.3 8.8 56.6 34.6 22.0 Spain 1001 6.41 55.6 41.2 17.4 78.8 3.8 14.0 Sri Lanka 1030 4.52 57.8 14.8 35.3 62.3 2.4 32.4 Sudan 1000 4.21 41.7 50.0 33.4 37.6 29.1 24.8 Syria 1025 3.32 49.3 30.2 58.4 35.4 6.2 10.8 Taiwan 1000 6.36 55.4 63.6 20.3 46.8 32.8 11.8 Tajikistan 1000 4.96 59.2 14.6 25.4 57.1 17.5 16.8 Tanzania 1008 3.89 50.6 11.9 69.0 29.7 1.3 25.2 Thailand 1000 6.39 66.8 27.8 53.8 36.3 9.9 22.1 Turkey 1000 5.30 51.9 73.1 27.2 63.5 9.3 19.8 Uganda 1000 3.86 48.9 4.8 46.1 53.4 0.5 37.8 United Arab Emirates 1012 7.05 57.0 99.0 8.8 61.4 29.8 10.9 United Kingdom 3075 6.88 54.1 33.2 9.1 53.1 37.8 20.7 United States 1019 7.26 50.0 52.0 1.5 52.1 46.4 19.4 Uruguay 1000 6.53 62.0 81.7 44.4 45.9 9.8 25.6 Uzbekistan 1000 6.01 62.0 18.6 20.6 69.7 9.7 30.9 Venezuela 1000 6.50 65.4 63.1 24.5 68.2 7.3 23.4 Vietnam 1000 5.84 56.7 21.9 35.3 52.8 11.9 22.8 Yemen 1000 3.94 50.0 22.0 65.0 30.3 4.7 23.6 Zambia 1000 4.69 54.2 11.2 31.0 58.5 10.5 20.8 Zimbabwe 1000 4.81 53.2 27.8 15.2 74.2 10.5 21.4 Table E3: Summary statistics: marital status Country Life Satis- faction Single/ never married Married Separated Divorced Widowed Domestic partners Afghanistan 3.57 34.8 63.3 0.1 0.0 1.8 0.0 Albania 4.88 25.1 62.7 0.6 0.9 10.2 0.5 Argentina 6.80 28.7 31.0 7.5 2.5 12.7 17.6 Armenia 4.21 22.4 59.5 2.0 2.5 12.1 1.4 Australia 7.43 21.4 51.9 2.2 7.3 10.8 6.4 Austria 7.23 23.0 54.5 3.0 6.4 6.5 6.5 103 Azerbaijan 5.55 28.8 61.0 1.6 3.1 5.5 0.0 Bahrain 6.61 21.7 73.3 0.0 2.3 2.7 0.0 Bangladesh 4.74 13.1 81.3 0.3 0.4 4.9 0.0 Belarus 5.76 17.4 50.2 
3.4 11.4 15.4 2.1 Benin 3.83 31.4 54.1 2.2 0.7 3.7 7.9 Bolivia 5.72 30.3 42.7 3.0 2.7 8.1 13.2 Bosnia Herzegovina 5.19 21.2 59.5 1.7 2.3 15.0 0.3 Botswana 4.27 62.8 16.1 1.3 1.8 5.5 12.5 Brazil 7.01 26.6 42.6 6.7 3.0 9.7 11.3 Bulgaria 4.03 15.6 49.9 1.9 6.0 21.4 5.1 Cambodia 3.97 19.1 69.0 3.0 8.7 0.1 0.1 Cameroon 4.35 39.5 44.3 1.8 1.9 7.1 5.4 Canada 7.49 25.8 48.7 4.5 7.2 6.0 7.9 Central African Rep 3.73 36.2 20.4 2.0 1.2 5.8 34.3 Chad 3.52 29.6 58.5 3.7 2.2 5.4 0.6 Chile 6.55 33.4 43.7 6.0 2.0 7.5 7.5 China 5.27 15.5 78.1 0.4 1.0 4.7 0.4 Colombia 6.44 31.4 27.3 7.7 1.9 8.0 23.7 Congo Kinshasa 4.71 41.7 44.4 3.5 1.9 4.1 4.4 Costa Rica 7.07 36.6 37.2 2.5 5.6 4.2 13.9 Croatia 5.57 32.6 50.9 4.2 1.8 7.5 3.0 Czech Republic 6.53 21.4 52.5 0.8 11.5 9.9 3.9 Dominican Republic 5.20 26.2 16.5 15.3 2.8 6.5 32.7 Ecuador 5.62 23.9 43.9 7.3 2.5 8.7 13.8 Egypt 4.30 23.5 67.6 0.1 2.3 6.5 0.0 El Salvador 5.83 40.6 37.4 2.1 1.0 4.2 14.7 Ethiopia 4.59 30.3 53.2 2.8 7.5 6.0 0.2 France 6.60 20.3 47.2 2.8 7.2 9.1 13.4 Georgia 4.21 22.4 52.6 3.7 4.1 16.7 0.4 Germany 6.70 22.8 49.5 1.5 9.1 11.9 5.2 Ghana 4.00 45.7 43.8 2.9 2.0 3.8 1.8 Greece 4.70 25.6 52.6 2.1 5.1 14.2 0.4 Guatemala 5.84 34.7 40.7 1.7 0.6 4.4 17.9 Haiti 3.92 55.6 27.5 8.9 1.7 6.3 0.0 Honduras 4.67 35.6 25.8 4.1 0.9 5.4 28.2 Hungary 4.86 18.8 37.5 2.3 10.0 23.4 8.1 India 4.65 23.5 70.5 0.4 0.4 5.0 0.1 Indonesia 5.29 19.2 71.6 0.9 1.9 6.4 0.0 Iran 5.19 29.2 68.9 0.4 0.5 0.8 0.2 Iraq 4.98 24.2 70.9 0.9 1.9 2.1 0.0 Israel 7.44 25.9 61.2 0.8 6.6 5.5 0.0 Italy 6.16 21.3 64.5 3.1 1.8 7.3 2.0 104 Japan 6.25 18.3 67.1 0.3 5.3 8.9 0.0 Jordan 5.34 33.8 60.7 0.0 1.6 3.9 0.0 Kazakhstan 5.84 21.0 55.6 1.4 10.2 9.9 1.8 Kenya 3.81 37.8 50.9 4.0 1.4 5.6 0.3 Kuwait 6.58 21.7 74.1 0.0 2.5 1.7 0.0 Kyrgyzstan 5.16 23.9 56.0 4.8 5.9 8.7 0.7 Laos 4.91 15.9 75.1 0.1 1.5 7.2 0.2 Lebanon 4.50 35.9 57.6 0.0 1.7 4.8 0.0 Liberia 4.61 51.0 35.0 1.3 1.9 2.2 8.6 Macedonia 4.73 16.4 70.0 0.5 1.8 9.9 1.5 Madagascar 3.81 22.5 54.3 0.5 7.6 6.4 8.8 Malawi 
3.94 25.3 61.9 1.8 3.5 6.6 0.9 Malaysia 5.76 44.1 53.0 0.5 1.2 1.2 0.0 Mauritania 4.75 44.2 44.3 2.0 4.8 4.6 0.1 Mexico 7.23 26.3 51.0 3.3 2.5 6.5 10.4 Moldova 5.85 20.4 55.7 3.4 5.6 12.7 2.2 Mongolia 4.77 21.8 58.3 2.3 2.5 12.0 3.1 Morocco 5.13 29.0 61.9 0.0 2.5 6.7 0.0 Myanmar 4.12 20.7 66.5 1.7 1.0 10.2 0.0 Namibia 4.48 60.5 23.4 2.2 1.6 7.1 5.1 Nepal 4.00 16.1 75.3 0.2 0.2 8.1 0.1 New Zealand 7.24 27.3 46.8 2.5 6.2 5.5 11.8 Nicaragua 5.74 35.6 33.1 3.3 1.2 3.2 23.6 Nigeria 5.17 49.3 44.1 1.9 0.9 3.5 0.2 Pakistan 5.15 28.5 68.2 0.3 0.2 2.8 0.0 Palestine 4.85 33.2 58.4 0.8 1.4 6.2 0.0 Panama 6.85 35.7 28.3 9.2 1.9 2.9 22.0 Paraguay 5.55 33.5 46.7 4.3 0.7 7.5 7.3 Peru 5.82 33.7 32.0 6.7 1.0 4.7 21.9 Philippines 5.03 24.4 55.9 3.3 0.0 8.8 7.6 Poland 5.73 23.4 54.4 1.2 4.2 15.4 1.3 Portugal 5.32 25.1 55.5 1.7 8.7 5.2 3.8 Russia 5.74 20.9 45.3 2.8 12.6 15.6 2.9 Rwanda 3.57 37.2 46.3 2.9 2.3 11.0 0.3 Saudi Arabia 6.27 34.5 62.4 0.4 1.7 1.1 0.0 Senegal 4.49 38.6 53.1 0.6 2.4 5.1 0.2 Singapore 7.06 31.8 60.5 0.5 3.1 4.0 0.1 Slovakia 5.99 22.2 50.6 0.3 10.3 13.7 2.9 Slovenia 5.76 18.7 48.0 0.5 4.1 10.8 17.8 South Africa 4.60 55.6 27.2 2.7 3.7 7.5 3.3 South Korea 5.87 37.0 54.6 0.8 1.1 6.2 0.3 Spain 6.41 26.4 57.8 3.1 4.0 4.8 3.8 Sri Lanka 4.52 15.0 76.9 1.3 0.1 6.4 0.3 105 Sudan 4.21 30.4 63.5 1.3 1.4 3.3 0.1 Syria 3.32 27.8 54.9 0.0 10.1 7.2 0.0 Taiwan 6.36 36.5 60.5 0.0 1.0 2.0 0.0 Tajikistan 4.96 26.6 63.6 1.9 2.3 5.6 0.0 Tanzania 3.89 36.6 50.3 4.1 1.7 6.0 1.3 Thailand 6.39 19.1 68.8 2.2 1.9 5.9 2.1 Turkey 5.30 32.4 59.9 0.5 2.9 3.9 0.3 Uganda 3.86 30.5 53.6 6.5 1.9 6.2 1.3 United Arab Emirates 7.05 33.8 62.0 0.0 2.7 1.5 0.0 United Kingdom 6.88 19.4 48.3 3.4 10.6 10.6 7.8 United States 7.26 22.4 56.0 2.1 8.2 9.9 1.4 Uruguay 6.53 23.2 37.0 5.0 9.2 13.5 12.0 Uzbekistan 6.01 22.5 66.8 1.9 2.6 6.2 0.0 Venezuela 6.50 42.2 31.4 2.5 3.8 5.9 14.2 Vietnam 5.84 17.7 73.2 0.2 1.7 7.1 0.0 Yemen 3.94 22.0 73.1 0.2 1.5 3.2 0.0 Zambia 4.69 40.6 40.7 5.8 4.2 8.1 0.6 Zimbabwe 4.81 
34.2 44.9 5.2 6.1 9.1 0.4 Table E4: Summary statistics: employment status Country Life Satis- faction Employed FT for employer Employed FT for self Employed PT, does not want FT Unem- ployed Employed PT, wants FT Out of the workforce Afghanistan 3.57 16.7 18.9 4.8 13.7 3.2 42.7 Albania 4.88 13.8 14.6 5.8 11.2 9.6 44.9 Argentina 6.80 29.3 12.5 7.5 4.4 8.2 38.1 Armenia 4.21 15.0 7.5 7.5 12.4 5.9 51.7 Australia 7.43 28.1 9.2 15.2 3.7 5.8 38.0 Austria 7.23 44.0 13.7 6.4 4.2 8.3 23.4 Azerbaijan 5.55 30.4 11.3 11.7 7.1 6.4 33.1 Bahrain 6.61 52.2 6.4 5.3 6.0 4.2 25.9 Bangladesh 4.74 11.7 24.7 2.1 4.1 2.0 55.4 Belarus 5.76 46.6 5.8 14.9 1.3 3.8 27.6 Benin 3.83 9.6 38.0 12.1 6.6 15.9 17.8 Bolivia 5.72 17.7 20.7 11.6 4.7 7.9 37.4 Bosnia Herzegovina 5.19 19.7 7.7 5.0 4.3 4.4 58.9 Botswana 4.27 15.9 8.3 16.8 13.7 14.6 30.7 Brazil 7.01 27.5 11.0 15.5 5.0 1.9 39.0 Bulgaria 4.03 34.2 5.5 3.4 6.9 2.6 47.4 106 Cambodia 3.97 12.1 28.7 6.5 5.5 10.4 36.8 Cameroon 4.35 10.1 20.8 17.5 6.7 11.2 33.7 Canada 7.49 44.3 4.9 8.2 5.6 5.8 31.2 Central African Rep 3.73 5.3 33.1 16.6 6.4 10.3 28.3 Chad 3.52 9.1 30.0 15.7 5.4 16.8 23.0 Chile 6.55 31.1 7.5 5.5 5.9 3.6 46.4 China 5.27 25.8 28.3 6.5 3.0 7.0 29.4 Colombia 6.44 21.0 17.1 6.6 9.9 8.0 37.4 Congo Kinshasa 4.71 14.7 17.3 11.8 9.7 13.0 33.4 Costa Rica 7.07 29.7 9.1 4.8 7.2 6.4 42.8 Croatia 5.57 43.2 3.6 5.0 8.2 5.3 34.7 Czech Republic 6.53 41.4 4.5 5.4 4.5 2.4 41.9 Dominican Republic 5.20 21.5 9.2 5.0 12.4 12.5 39.4 Ecuador 5.62 - - - - - - Egypt 4.30 25.2 12.2 3.2 3.9 4.8 50.8 El Salvador 5.83 22.0 6.6 4.5 10.2 4.3 52.4 Ethiopia 4.59 18.4 29.8 3.2 5.0 15.6 28.0 France 6.60 40.8 3.7 3.7 5.4 2.6 43.9 Georgia 4.21 - - - - - - Germany 6.70 31.0 5.7 11.3 2.7 2.1 47.1 Ghana 4.00 10.3 25.2 11.2 13.5 13.9 25.9 Greece 4.70 23.5 8.8 2.9 13.1 5.0 46.8 Guatemala 5.84 21.2 15.0 3.8 12.0 6.3 41.7 Haiti 3.92 4.8 3.6 5.0 22.4 24.4 39.9 Honduras 4.67 19.7 7.8 2.3 10.6 7.5 52.1 Hungary 4.86 31.0 3.7 3.1 5.0 2.1 55.1 India 4.65 23.7 19.6 3.9 4.7 3.1 45.0 
Indonesia 5.29 19.7 27.8 7.4 4.3 7.9 32.9 Iran 5.19 11.1 22.2 4.7 12.1 4.6 45.3 Iraq 4.98 18.7 12.0 6.4 13.2 10.7 39.1 Israel 7.44 54.6 5.7 7.1 4.9 3.9 23.8 Italy 6.16 27.9 8.1 5.1 6.2 5.1 47.7 Japan 6.25 36.2 6.1 11.5 2.0 4.9 39.3 Jordan 5.34 26.6 10.6 1.0 5.8 1.1 54.9 Kazakhstan 5.84 34.6 7.9 10.3 3.8 9.7 33.7 Kenya 3.81 26.7 20.1 11.5 16.1 8.5 17.1 Kuwait 6.58 53.6 3.7 6.3 4.4 5.7 26.5 Kyrgyzstan 5.16 18.1 20.6 9.6 2.8 10.0 38.9 Laos 4.91 15.0 35.1 6.3 2.3 8.3 33.0 Lebanon 4.50 26.3 22.0 3.6 4.7 4.7 38.6 Liberia 4.61 4.0 11.4 17.6 9.6 13.1 44.3 Macedonia 4.73 28.9 3.2 5.5 11.4 7.5 43.5 Madagascar 3.81 10.7 41.7 15.6 3.2 14.9 14.0 107 Malawi 3.94 13.4 18.2 10.6 11.3 15.4 31.1 Malaysia 5.76 37.4 17.5 6.7 4.8 6.2 27.4 Mauritania 4.75 14.6 6.7 15.5 13.4 7.5 42.3 Mexico 7.23 34.0 8.7 10.1 5.3 5.5 36.4 Moldova 5.85 35.6 6.6 8.3 5.5 6.2 37.8 Mongolia 4.77 27.8 21.8 2.4 7.2 1.1 39.7 Morocco 5.13 11.8 9.8 4.7 11.3 6.5 55.9 Myanmar 4.12 14.4 36.3 5.7 2.1 9.6 32.0 Namibia 4.48 12.9 6.5 9.2 11.1 17.7 42.6 Nepal 4.00 9.0 37.2 9.2 3.2 4.4 37.0 New Zealand 7.24 40.8 6.1 12.7 6.2 8.2 26.1 Nicaragua 5.74 22.0 17.0 5.5 7.9 7.5 40.1 Nigeria 5.17 11.5 24.1 21.4 7.4 11.2 24.4 Pakistan 5.15 24.6 14.9 3.0 5.3 1.6 50.7 Palestine 4.85 17.0 6.4 2.7 10.6 4.7 58.6 Panama 6.85 32.4 3.6 12.3 4.2 6.2 41.4 Paraguay 5.55 23.8 24.8 6.8 7.0 2.7 34.9 Peru 5.82 24.6 19.9 8.1 6.2 8.8 32.4 Philippines 5.03 26.5 18.6 5.0 5.3 9.8 34.8 Poland 5.73 38.3 5.7 3.9 6.1 2.0 44.0 Portugal 5.32 44.7 10.7 3.8 8.8 5.3 26.7 Russia 5.74 45.9 4.2 11.7 2.7 2.4 33.1 Rwanda 3.57 13.4 19.2 11.8 10.8 14.9 29.9 Saudi Arabia 6.27 38.7 6.2 5.6 11.2 8.3 30.0 Senegal 4.49 14.2 9.8 6.6 11.0 14.8 43.6 Singapore 7.06 - - - - - - Slovakia 5.99 42.1 4.8 3.4 5.7 1.7 42.3 Slovenia 5.76 40.0 6.1 7.3 3.9 6.4 36.3 South Africa 4.60 27.2 4.4 6.4 26.2 7.9 27.9 South Korea 5.87 28.7 7.4 7.3 4.5 3.0 49.1 Spain 6.41 33.9 5.9 3.1 15.6 6.0 35.6 Sri Lanka 4.52 19.5 18.3 6.2 3.7 5.9 46.4 Sudan 4.21 18.0 12.9 5.0 8.2 8.5 47.4 Syria 3.32 15.8 7.0 
7.0 2.0 3.9 64.3 Taiwan 6.36 32.3 8.2 5.8 1.8 3.5 48.4 Tajikistan 4.96 13.9 7.1 11.3 8.4 12.6 46.7 Tanzania 3.89 12.9 29.7 20.3 6.4 10.4 20.2 Thailand 6.39 21.9 38.0 8.3 1.8 2.0 28.0 Turkey 5.30 26.5 5.2 6.0 4.0 1.7 56.6 Uganda 3.86 16.0 33.1 18.5 5.0 11.7 15.7 United Arab Emirates 7.05 45.3 2.7 4.3 6.9 2.5 38.3 108 United Kingdom 6.88 35.9 5.4 10.8 3.9 4.0 40.1 United States 7.26 40.8 5.0 7.3 3.8 6.7 36.4 Uruguay 6.53 24.2 7.3 4.8 6.9 4.6 52.2 Uzbekistan 6.01 22.1 17.8 14.3 2.3 9.4 34.1 Venezuela 6.50 24.1 9.5 6.0 10.6 3.5 46.3 Vietnam 5.84 13.3 40.9 8.8 3.3 6.2 27.5 Yemen 3.94 7.9 10.6 6.0 10.6 9.6 55.3 Zambia 4.69 15.8 17.8 11.7 15.4 10.5 28.8 Zimbabwe 4.81 21.1 19.5 11.3 12.4 9.6 26.1 109 Table E5: Summary statistics: macro variables Country Life Satis- faction GDP Equiv- alized income Life ex- pectancy Health exp. Edu exp. Years schooling Corruption percep- tion Env. health index Afghanistan 3.57 1876.19 1308.22 60.03 161.22 10.67 3.2 12 34.6 Albania 4.88 10136.02 3170.61 77.72 572.26 133.93 9.3 33 72.6 Argentina 6.80 - 7042.14 75.65 1341.67 751.45 9.8 34 86.8 Armenia 4.21 7267.98 2287.71 74.45 332.49 101.08 10.8 37 74.6 Australia 7.43 42829.71 27015.96 82.20 4191.09 3222.84 12.8 80 99.4 Austria 7.23 43908.13 27416.83 80.84 4956.93 2919.54 10.8 72 92.2 Azerbaijan 5.55 16593.19 3559.32 70.69 956.65 191.73 11.2 29 59.4 Bahrain 6.61 41931.51 22230.94 76.55 1900.21 600.00 9.4 49 83.2 Bangladesh 4.74 2714.77 1927.89 70.86 85.16 16.77 5.1 25 30.4 Belarus 5.76 16906.92 7002.06 71.97 866.94 315.19 11.5 31 81.0 Benin 3.83 1761.57 814.05 58.93 80.90 35.70 3.2 39 34.5 Bolivia 5.72 5598.57 3462.32 66.94 271.77 146.97 9.2 35 54.0 Bosnia Herzegovina 5.19 9490.14 5403.30 76.43 922.45 - 8.3 39 77.6 Botswana 4.27 15359.06 4360.23 64.50 888.08 625.02 8.8 63 62.0 Brazil 7.01 14830.90 5922.86 73.55 1311.95 718.01 7.2 43 72.2 Bulgaria 4.03 16022.11 5679.31 74.47 1212.52 262.30 10.6 43 86.6 Cambodia 3.97 2795.17 2159.63 67.33 208.62 14.46 5.8 21 42.7 Cameroon 4.35 2835.57 1463.96 
55.50 146.90 39.34 5.9 27 37.6 Canada 7.49 41865.05 25644.38 81.24 4609.73 2590.79 12.3 81 97.9 Central African Rep 3.73 893.68 659.96 48.35 34.63 5.84 3.5 24 31.1 Chad 3.52 2081.72 950.51 51.60 75.07 18.62 1.5 22 29.6 Chile 6.55 20266.04 6598.45 80.59 1482.47 564.71 9.8 73 89.4 China 5.27 11016.99 5808.55 75.20 577.85 112.50 7.5 36 42.7 Colombia 6.44 11496.53 4242.87 73.45 740.95 227.50 7.1 37 66.0 Congo Kinshasa 4.71 640.59 1322.21 57.85 25.34 8.83 3.1 22 23.9 Costa Rica 7.07 13900.09 6191.93 79.23 1369.04 718.11 8.4 54 82.6 110 Croatia 5.57 20033.09 11261.92 77.33 1557.15 566.95 11.0 48 83.4 Czech Republic 6.53 28148.20 12733.18 78.28 1981.84 775.19 12.3 51 90.6 Dominican Republic 5.20 11375.62 3385.67 72.95 524.11 110.40 7.4 32 69.1 Ecuador 5.62 9926.95 3627.23 75.23 577.42 210.76 7.6 33 73.4 Egypt 4.30 10071.21 2469.62 70.53 520.29 121.06 6.4 37 69.5 El Salvador 5.83 7717.70 2056.17 72.23 507.65 120.85 6.5 39 65.6 Ethiopia 4.59 1233.95 1054.59 62.79 60.69 13.59 2.4 33 35.2 France 6.60 37224.19 19302.22 81.97 4213.23 2120.11 11.1 69 96.5 Georgia 4.21 6930.29 2713.74 74.08 696.99 63.52 12.1 52 73.1 Germany 6.70 43035.05 28871.95 80.89 4634.90 1921.94 12.9 79 92.8 Ghana 4.00 3894.00 2457.09 61.31 233.24 168.28 7.0 48 40.3 Greece 4.70 24060.78 8773.78 80.63 2512.67 702.69 10.2 43 91.2 Guatemala 5.84 6962.81 1456.01 71.49 466.59 94.46 5.6 32 60.6 Haiti 3.92 1652.17 1061.46 62.77 203.90 12.49 4.9 19 37.1 Honduras 4.67 4548.31 1490.16 72.76 425.29 79.06 5.5 29 57.5 Hungary 4.86 22821.38 7154.18 75.27 1839.01 589.29 11.3 54 89.0 India 4.65 4861.06 1392.27 67.29 195.57 44.41 4.4 38 33.2 Indonesia 5.29 9282.71 1269.93 68.52 273.35 114.98 7.5 34 55.7 Iran 5.19 16023.15 8323.71 75.13 1414.50 224.36 7.8 27 76.1 Iraq 4.98 15123.58 2384.58 69.47 695.35 316.05 5.6 16 58.0 Israel 7.44 30574.25 15677.19 81.66 2200.84 1772.11 12.5 60 92.6 Italy 6.16 34795.67 14835.12 82.24 3153.18 1426.06 10.1 43 81.5 Japan 6.25 34315.80 21440.58 82.59 3458.30 1527.35 11.5 76 94.7 Jordan 5.34 
11404.74 5461.24 73.90 760.63 218.59 9.9 49 77.0
Kazakhstan 5.84 22469.68 4229.84 70.45 1023.48 379.38 10.4 29 75.4
Kenya 3.81 2747.39 1235.33 60.95 101.38 73.76 6.3 25 38.4
Kuwait 6.58 74181.33 17869.17 74.46 2374.56 1673.98 7.2 44 78.0
Kyrgyzstan 5.16 2869.84 2105.88 70.00 208.58 80.34 9.3 27 63.5
Laos 4.91 4498.09 2641.16 65.25 85.64 14.87 4.6 25 34.5
Lebanon 4.50 16573.54 9480.28 79.85 1149.89 175.28 7.9 27 85.9
Liberia 4.61 802.05 767.94 60.84 101.05 14.38 3.9 37 38.0
Macedonia 4.73 11569.20 4412.50 75.03 796.93 159.97 8.2 45 78.3
Madagascar 3.81 1373.19 1067.43 65.10 70.05 12.55 5.2 28 37.2
Malawi 3.94 758.58 884.87 58.50 79.64 17.74 4.2 33 39.3
Malaysia 5.76 23418.83 7259.93 74.57 938.29 589.64 9.5 52 87.7
Mauritania 4.75 3731.90 4187.46 63.05 152.73 46.05 3.7 30 37.3
Mexico 7.23 16136.28 5339.63 76.35 1061.88 485.17 8.5 35 70.0
Moldova 5.85 4753.55 3336.86 68.93 609.07 196.44 9.8 35 66.9
Mongolia 4.77 11396.42 4923.53 69.51 603.17 211.96 8.3 39 55.1
Morocco 5.13 7076.13 3357.42 73.71 438.24 157.06 4.4 39 63.4
Myanmar 4.12 - 1171.51 65.65 36.66 7.58 4.0 21 41.4
Namibia 4.48 9497.79 4056.24 64.81 757.68 456.07 6.2 49 55.5
Nepal 4.00 2114.78 1179.83 68.82 118.42 29.02 3.2 29 31.7
New Zealand 7.24 32805.74 22200.90 81.16 3290.84 2842.69 12.5 91 87.9
Nicaragua 5.74 4532.84 1997.57 74.51 381.76 74.44 5.8 28 59.9
Nigeria 5.17 5309.52 1565.98 52.11 183.68 22.17 5.2 27 34.1
Pakistan 5.15 4380.24 1233.60 65.72 122.44 21.90 4.7 29 38.8
Palestine 4.85 4301.68 3385.91 73.39 - - 8.9 -
Panama 6.85 17902.56 4499.96 77.24 1299.44 281.87 9.4 37 68.6
Paraguay 5.55 7504.68 2815.54 72.49 665.61 174.11 7.7 24 59.0
Peru 5.82 10378.57 2994.62 73.84 497.57 116.22 8.9 38 55.9
Philippines 5.03 6041.78 1330.12 68.01 270.53 74.77 8.9 38 60.6
Poland 5.73 23175.01 8693.24 76.85 1550.72 611.10 11.8 61 76.3
Portugal 5.32 26184.10 13407.19 80.37 2493.17 1144.77 8.2 63 98.2
Russia 5.74 23299.19 10114.44 70.37 1528.87 482.01 11.7 27 74.2
Rwanda 3.57 1483.35 1067.71 62.80 157.79 26.35 3.3 49 38.7
Saudi Arabia 6.27 49536.99 25093.88 74.34 1371.50 1778.53 8.8 49 87.5
Senegal 4.49 2225.83 1628.15 66.44 97.46 54.49 4.5 43 44.8
Singapore 7.06 78958.09 13106.99 82.70 3941.36 1502.25 10.2 84 99.4
Slovakia 5.99 26470.61 10944.93 76.41 2228.31 706.18 11.6 50 87.9
Slovenia 5.76 28153.03 14469.65 80.43 2572.73 1292.21 11.9 58 91.4
South Africa 4.60 12374.53 4664.52 56.10 1090.63 480.13 9.9 44 59.3
South Korea 5.87 31901.07 18375.92 81.21 2244.01 1161.14 11.8 55 81.7
Spain 6.41 31657.14 13852.79 82.43 2925.31 1263.97 9.6 60 97.6
Sri Lanka 4.52 10289.69 1607.32 74.24 304.14 44.74 10.8 38 67.5
Sudan 4.21 3882.25 2217.99 63.50 201.70 37.62 3.1 11 36.3
Syria 3.32 - 3954.13 74.71 167.99 93.94 6.6 20 67.1
Taiwan 6.36 - 14634.88 - - - - 61 90.4
Tajikistan 4.96 2567.04 1785.89 69.63 188.27 41.77 9.9 23 49.9
Tanzania 3.89 2335.96 1378.08 64.29 125.98 40.93 5.1 31 35.9
Thailand 6.39 14597.18 4875.28 74.07 629.55 198.44 7.3 38 71.2
Turkey 5.30 18032.27 6198.91 74.86 970.60 274.28 7.6 45 73.9
Uganda 3.86 1665.13 816.82 57.77 145.81 19.06 5.4 26 37.3
United Arab Emirates 7.05 57594.13 20304.44 76.85 1770.74 9.1 70 88.3
United Kingdom 6.88 36765.05 24860.31 80.90 3234.72 2183.47 12.3 78 98.6
United States 7.26 50549.19 32621.72 78.74 8845.18 2451.82 12.9 74 92.7
Uruguay 6.53 17904.82 6025.87 76.54 1438.07 607.80 8.4 73 90.0
Uzbekistan 6.01 4704.53 3046.83 68.10 305.62 168.35 10.0 18 67.7
Venezuela 6.50 17001.91 3498.07 73.79 800.96 609.73 8.6 19 72.8
Vietnam 5.84 4912.32 2689.92 75.61 291.87 90.54 5.5 31 52.4
Yemen 3.94 3717.53 1154.02 63.84 182.46 60.60 2.5 19 43.8
Zambia 4.69 3724.53 2475.80 60.11 208.28 22.14 6.5 38 36.2
Zimbabwe 4.81 1684.23 1085.82 55.63 - 20.09 7.2 21 46.4

GDP, health expenditures, and education expenditures are per capita. Years schooling is the average years of schooling completed by age 25. A low value for the Corruption Perception Index indicates a higher perception of corruption in that country.
1 Introduction

Unlike Homo Economicus, real people consider more than their expected monetary outcome when making choices; they value the outcomes and opinions of other people, even strangers, as well as their own self-image. The risk of embarrassment and their own moral code, not just economic loss, contribute to their honesty. People caught violating the expectations of others can lose social status. Offices sell snacks on the “honor system,” with an unmonitored box collecting cash, knowing that most people will give the right amount. Few people leave taxis without paying their fare, and most even add a tip. This kind of cooperation is widespread; social systems would break down if these expectations weren’t met. However, there are cases where social stigma does not currently create strong disincentives even in the face of illegality, such as digital piracy. In that case, the expected value of pirating is larger than the expected value of paying, and many people choose not to pay.

We designed a novel experiment to investigate the relationship between emotions and behavior in a particular setting, specifically, one in which some subjects are able to “steal” from others. In lab settings, subjects are normally discouraged from knowing each other at all, by keeping them anonymous in the game. Here, we introduce subjects’ valuations of each other as relevant factors. When deciding whether to steal from another subject, they can anticipate the negative emotions they will feel, particularly if they are caught. How much negative emotion is felt depends on the individual; we use skin conductance responses (SCRs) as a proxy for the existence and intensity of the emotions. We find that introducing a social stigma has a strong, yet highly heterogeneous, negative effect on stealing behavior. See section 2 for a detailed description of the experiment.
Traditionally, economics assumes that agents will take the action that maximizes their own expected utility, taking into consideration the monetary and nonmonetary costs and benefits. Becker (1968) modeled crime and punishment, considering fines as well as the opportunity cost of imprisonment (for the individual and for society). More recently, economists have become interested in modeling those nonmonetary costs and benefits, including other-regarding preferences, emotional responses, and moral beliefs, as they pertain to decision making.

Experiments in and out of laboratories have found that people do not act solely to maximize their monetary outcomes. In dictator games, people would maximize their payoffs by giving nothing, yet they give on average nearly 30% of the pie (Engel, 2011). Haley and Fessler (2005) found that subtle cues, such as an image of stylized eyes, or the wordless sounds of other players, led to more prosocial behavior. They conclude that “(a) many types of input relevant to questions of anonymity and observability influence prosocial behavior, and (b) individuals differ with regard to their sensitivity to various types of such input.” Bateson et al. (2006) found that contributions to an “honesty box,” collecting money for drinks on the honor system, tripled when an image of eyes was displayed. Even when people are not observed (and thus there is no reputation formation), simply being reminded of observation influences them. Haley and Fessler argue that economic models fail to predict this due to “a failure to adequately consider individual decision-making processes, rather than overall patterns of behavior.”

When a social norm is violated, the violator may feel guilty or ashamed. If this negative feeling is relived when recalling the act, it reduces the positive payoff of the violation, making it less personally profitable.
Thus these negative emotions encourage adherence to social norms, encourage prosocial behavior, and discourage repeating the same violations (Eisenberg (1986); Tangney (1995); others). Barr (2001) found that in rural Zimbabwe, shame induced by criticism led to greater cooperation in a public goods game. Charness and Dufwenberg (2006) find that nonbinding preplay communication leads to more cooperation in a trust game, to avoid feeling “guilty.” de Hooge et al. (2008) find that inducing shame leads to more prosocial subsequent behavior among both prosocial and proself individuals, as long as the shame they experienced is relevant to the subsequent actions. Several studies of public goods games have found that allowing punishment and reward schemes leads to higher contributions, and that in general, negative incentives lead to higher contributions than positive incentives (Fehr and Gächter (2000); Sefton et al. (2007); Sutter et al. (2010)).

This paper’s experiment is structurally equivalent to a classic dictator game, but with a limited strategy space and with sanctions added. The splitter may choose to share $7 of a $20 pie, or $0. The potential sanction in part 1 is a $1 fine, while the potential sanction in part 2 is the same $1 fine, plus having their picture displayed to other participants and labeled a thief. Unlike some games with sanctions (e.g., public goods games in Sefton et al. (2007)), the sanctions are not imposed by other players. Instead, they are imposed randomly, with a 60% chance in both part 1 and part 2. Krupka and Weber (2009) found that framing a dictator game as giving or taking up to $5, rather than splitting $10, leads more people to share evenly. As Camerer and Fehr (2002) point out, giving in dictator games depends heavily on many variables, particularly anonymity of the recipient. Social isolation leads to lower offers, by separating the splitters from social norms (Hoffman et al., 1996).
In our case, there are only four possible recipients, and the rounds are repeated many times, meaning each pairing is likely to occur multiple times. Since the splitters and recipients are in the room together, after waiting in line together, etc., it seems likely that the social distance is relatively low. However, we still find that sharing $7 is relatively uncommon, even when the splitter faces the additional consequence of having their picture displayed.

Schram and Charness (2015), citing Elster (2009), distinguish a social norm (expected behavior or lack of behavior that is observed by others and enforced somehow) from a moral norm (internal feelings which do not require a social element). In our case, only moral norms are at first relevant for our subjects (when decisions are anonymous), and social norms become relevant later on (when antisocial decisions are randomly revealed). There is no communication or observation of others’ choices among the decision makers, so that a social norm is difficult to measure, and is thus likely “homegrown,” i.e., determined by the individual’s experiences and expectations outside the lab.

1.1 Physiological data & skin conductance responses

In this experiment, we measure skin conductance, which is the electrical conductivity of the skin. When a subject becomes emotionally aroused, they sweat very slightly more, leading to higher conductivity. Although skin conductance has been studied as a measure of emotional arousal for well over 100 years (Dawson et al., 2007), economists are just beginning to incorporate it into our experimental work.

Coricelli et al. (2010) found that, in a game involving tax evasion, the SCRs of evaders were higher than those of subjects reporting honestly. In that experiment, one random under-reporter caught by an audit had their photo shown to other participants.
However, and unlike in this study, underreporting in that game actually benefitted other participants, because the lowest income reporters had a higher chance of audit. The collected “tax” revenue was not distributed to other participants. In our study, having one’s picture displayed is unambiguously associated with withholding payment from another player.

SCRs have also been studied with regard to “fairness.” Joffily et al. (2014) studied public goods games with and without punishment. They found that high contributors reported positive feelings and lower SCRs, that players grew angry and had higher SCRs when others did not contribute, and that the angrier and more emotionally aroused they were, the more likely they were to punish and the harsher the punishment. They also found that being punished led to negative feelings and increased SCRs. In an ultimatum game against other humans (with prescribed behavior) and computers, subjects had higher SCRs in response to less-than-even offers from humans (but not computers) (Van’t Wout et al., 2006).

An exciting avenue of SCR research attempts to predict subject behavior from SCR data. Bechara (1997) measured SCR while subjects played the Iowa Gambling Task, in which subjects chose cards from decks with different risk and reward levels, without knowing anything about each deck’s characteristics. They found that SCRs were higher when picking from the riskier deck even before subjects could articulate the differences between the decks. Using neural data (rather than SCR data), Smith et al. (2014) develop models that predict subject choices reasonably well for out-of-sample individuals.

2 Experimental design

The experiment took place in the Los Angeles Behavioral Economics Laboratory (LABEL) at the University of Southern California and was programmed and conducted with the software z-Tree (Fischbacher, 2007). In the experiment, 208 subjects were recruited via the Online Recruitment System for Economic Experiments (ORSEE, see ?).
In the recruitment email, they were notified, “In this experiment, we may also record physiological measures (such as heart beat and skin conductance). These are non-invasive procedures. Please note that you cannot participate in the experiment if you do not accept these procedures.” Each session had 8 subjects, split randomly into 2 even groups: Consumers and Producers.

2.1 Non-choice measures

During the experiment, we collected psychophysiological data from Consumers using the Biopac MP150 and TEL100C physiological system. Specifically, we measured skin conductance responses (SCRs), spikes in the electrical conductivity of the skin driven by minute changes in moisture, as brought on by physiological arousal. SCRs are a proxy for emotional arousal, and indicate that the subject is emotionally excited, but do not indicate what kind of excitement: it could be negative (embarrassment), positive (elation), neutral (surprise), etc. SCRs have been widely used in psychology and are considered a reliable measure of emotional arousal (Boucsein, 2012). Details of the recording procedure and technical aspects of the analysis are presented in Appendix A1.

We used a novel effort task, which was time-effective, unambiguously effortful, and individual-specific, also using the Biopac system. Privately, each Producer’s grip strength was calibrated (without explanation of its purpose for the experiment) by telling them to squeeze a hand grip as hard as they could for 5 seconds. Three seconds into the task, they were reminded, “squeeze as hard as you can” to ensure maximum effort. An individual grip threshold was then calculated as 50% of the average grip during the five-second period. This threshold was not reported to the subjects, but was recorded for use in the experiment.
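As a rough illustration of the calibration rule just described, the following Python sketch computes an individual threshold as 50% of the mean grip force over the calibration window. The function name, the sample values, and the force units are assumptions made for illustration only; the calibration in the experiment was done with the Biopac system, whose actual computation may differ.

```python
from statistics import mean


def grip_threshold(samples, fraction=0.5):
    """Return `fraction` of the mean grip force recorded during calibration.

    `samples` holds force readings from the 5-second calibration squeeze.
    The 50% default follows the rule described in the text; the names,
    units, and sampling scheme are hypothetical.
    """
    if not samples:
        raise ValueError("calibration needs at least one force reading")
    return fraction * mean(samples)


# Made-up calibration trace: one reading per second, arbitrary force units.
calibration = [30.0, 34.0, 36.0, 35.0, 33.0]
threshold = grip_threshold(calibration)
print(f"individual threshold: {threshold:.1f}")
```

Because the threshold scales with each Producer's own calibration mean, a stronger subject faces a proportionally higher bar.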
Unlike some other common effort inducement methods such as math operations (Niederle and Vesterlund, 2007), anagrams (Charness and Villeval, 2009), decoding tasks (Charness et al., 2014) or mazes (Gneezy et al., 2003), the task difficulty was calibrated individually, making it closer to equally difficult for all subjects. It was physically (rather than cognitively) effortful and made intuitive sense to Producer participants, who were told during the experiment that they were physically “creating” a good. Consumers also squeezed a (disconnected) hand grip to see for themselves the task the Producers would have to complete, and to be convinced of the effort and physical difficulty of the task. Finally, each Consumer’s photo was taken, head only, against a neutral backdrop with a neutral expression.

2.2 Procedures

Once all Producers were calibrated and all Consumers had their photos taken, all 8 subjects were brought into the room and seated at computers, which were assigned so that all subjects in a role were seated in the same row. Dividers separated individuals so that all screens were private. Instructions were read aloud with screenshots on a projector. The basics of skin conductance response (SCR) measurement were explained, and each Consumer’s left index and middle fingers were cleaned and then had electrodes applied and attached. We also applied a long strip of gentle tape across their left forearms and reminded them not to move that hand during the experiment.

During part 1, each Producer was told to squeeze their hand grip as hard as they could. After 7 seconds above their calibrated personal grip threshold they were moved to a screen informing them that they had created a digital good worth $20 to an anonymous Consumer they had been randomly matched with. If they did not exceed the threshold for 7 seconds, the software remained on the same screen telling them to squeeze as hard as they could.
Producers never received feedback regarding either their threshold or their progress toward creating the good; they knew only that they needed to exceed a threshold for 7 seconds. The task was tiring and took, on average, 45 seconds.

Once all Producers had completed the task, each Consumer had two possible choices. First, they could pay $7 to the Producer they were matched with, leaving them a net payoff of $13. Alternatively, they could steal the good, that is, pay $0. If they chose to steal, there was a 40% chance they would not be caught (in which case they would obtain a net payoff of $20) and a 60% chance they would be caught and fined a nominal $1 (in which case they would obtain a net payoff of $19, since they would still keep the good). The fine was returned to the experimenter (not the Producer). There was no other consequence to stealing. Payoffs and fines were chosen in a way that an individual with no other-regarding or self-image concerns would unambiguously prefer to steal the good. At the same time, we used loaded language in the instructions (“stealing,” “caught,” etc.) to elicit an emotional response that the physiological system could record.

Once all Consumers had made their choice, subjects saw their result screens. Consumers saw their own photo (regardless of their choice), their choice that period and, if they stole, whether they were charged the fine. Producers saw whether the Consumer they were matched with had stolen or paid for the good, with no photos. After everyone viewed their results and their final payoffs for the period, they proceeded to the next period, where Producers had to squeeze the hand grip to create a new digital good. Part 1 lasted 10 periods.

After those 10 periods, subjects were instructed that they were starting a new part of the experiment.
In part 2, the rules and consequences were identical, except for one change: any Consumer caught stealing would additionally have their photo displayed to all Producers, along with text indicating that they had been caught stealing. Consumers could not see which other Consumers had been caught, only whether they had been caught themselves. Producers could see photos of all Consumers caught that period. Part 2 lasted 20 periods. We hypothesized that showing a photo when caught would (weakly) increase the emotional load of stealing and affect their behavior with some probability. This constituted the within-subject treatment effect of the study.

After part 2, electrodes were removed from the Consumers and all subjects filled out a survey with questions about gender, age, the strategy employed in the experiment, and their general attitude towards various forms of illegal activities (illegal media downloading, purchase of stolen goods, etc.). Subjects were paid their earnings in one randomly selected period from part 1 or part 2 plus a $5 show-up fee.

The experiment adhered to the standard techniques in experimental economics, with a comprehension quiz before each part and a practice period before part 1. Subjects were not told in advance how many periods were in each part, to avoid last-period effects. Sessions lasted on average about 75 minutes, and the earnings of Consumers and Producers, including the $5 show-up fee, averaged $20.33 and $7.70, respectively. Every Producer created the good every period, although it often took multiple tries, and up to 4 minutes at the longest. One session was ended prematurely because one Producer refused to continue. Subjects in that session were all paid the show-up fee and their data were removed from the analysis. This left us with data from 200 subjects, 100 Consumers and 100 Producers, collected in 25 sessions with 8 subjects each.
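To make the monetary incentives in the design concrete, the short Python sketch below computes the Consumer's certain payoff from paying and the expected payoff from stealing, using the dollar amounts and the 60% detection probability stated in the text. The constant and function names are hypothetical, and the calculation deliberately ignores any nonmonetary cost such as guilt or the photo display in part 2.

```python
GOOD_VALUE = 20.0  # value of the digital good to the Consumer, in dollars
PRICE = 7.0        # payment to the Producer if the Consumer pays
FINE = 1.0         # fine charged when a stealing Consumer is caught
P_CAUGHT = 0.6     # probability of being caught after stealing


def payoff_if_paying():
    """Certain net payoff from paying for the good."""
    return GOOD_VALUE - PRICE


def expected_payoff_if_stealing():
    """Expected net payoff from stealing, ignoring any nonmonetary cost."""
    caught = GOOD_VALUE - FINE  # the Consumer keeps the good but pays the fine
    not_caught = GOOD_VALUE
    return P_CAUGHT * caught + (1 - P_CAUGHT) * not_caught


def monetary_gain_from_stealing():
    """Expected dollar gain from stealing rather than paying."""
    return expected_payoff_if_stealing() - payoff_if_paying()


print(f"pay:   {payoff_if_paying():.2f}")
print(f"steal: {expected_payoff_if_stealing():.2f}")
print(f"gain:  {monetary_gain_from_stealing():.2f}")
```

Even the worst monetary outcome from stealing ($19 when caught) exceeds the $13 from paying, so the ranking does not depend on risk attitudes; under these numbers, only nonmonetary costs can make paying attractive.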
3 Descriptive analysis

3.1 Behavior

The entire analysis in the paper is performed on the 100 subjects who acted as Consumers. The role of Producers in our experiment was limited to “creating” the good and serving as real person counterparts to Consumers who could pay or steal from them.

We first analyzed the propensity of Consumers to steal the good over the course of the experiment and found that, as expected, adding the picture of subjects who were caught had a significant effect on behavior. Indeed, the percentage of observations in which Consumers stole dropped from 78% in part 1 to 58% in part 2. Figure 1 represents the time series of behavior over the 30 periods (10 in part 1 and 20 in part 2) of the experiment. We performed a Wald test of structural break with unknown break date and found the existence of a break in the first period of part 2 (p-value = 0.000). We also computed the trend in each part separately and found that the slopes within each part were not significantly different from zero. Overall, there was strong evidence of change in the aggregate rates of stealing between parts and no evidence of change within parts.

Figure 2 shows the cumulative distribution function (CDF) of the percentage of stealing by subjects in each part. The CDF plot indicates that for any stealing percentage, there are more subjects stealing that amount or less in part 2 than in part 1; in other words, the distribution of stealing in part 1 first-order stochastically dominates the distribution in part 2. This is clear visually, and is confirmed by both Kolmogorov-Smirnov and chi-squared independence tests (p = 0.001 and p = 0.000 respectively).

Our data also exhibited a large degree of heterogeneity in behavior. For each part, we categorized each subject as “Always” [A] or “Never” [N] if they made two or fewer deviations from always stealing and never stealing respectively, and we categorized them as “Sometimes” [S] otherwise. 1 Table 1 summarizes the proportion of subjects in each category and their average stealing rates, where the first and second letter in the brackets refer to their group in parts 1 and 2 of the experiment respectively. Two-thirds of the subjects behaved stably over the course of the experiment. Of these, many always stole [AA], some always paid [NN], and some others had an interior (and, as it turned out, quite similar) stealing frequency in both parts [SS]. 2 The other subjects exhibited a treatment effect. Of these, all but one decreased their stealing frequency in part 2 ([AN], [AS], [SN]), with t-tests on the difference between means being statistically significant (p-value = 0.000). The remaining individual increased his stealing frequency in part 2. We will exclude this subject from the rest of the analysis, as he is a clear outlier in our sample. Notice that all the [S] groups exhibit average stealing probabilities close to 0.5 with relatively low dispersion. This behavioral focal point is all the more interesting in that it does not result in equal (or even close) payoffs for Consumer and Producer.

Figure 3 depicts the time series of choices among the subjects who reacted to the treatment by significantly changing their behavior ([AN], [AS], [SN] groups). Not surprisingly, the difference across parts is exaggerated in this subsample (stealing probabilities of 82% and 23% in parts 1 and 2, respectively). It is almost entirely driven by less stealing in part 2, as opposed to more stealing in part 1. Indeed, a two-sided t-test comparison of differences in the average stealing probabilities of subjects with and without a treatment effect reveals statistically significant differences in part 2 (p-value = 0.000) but not in part 1 (p-value = 0.142).

Table 2 shows stealing behavior by type following all 3 possible round outcomes: stole and caught in the last round, stole and not caught in the last round, and paid last round.
Subjects classified as [S] are actually more likely to steal in the round after they are caught in part 1 (t-test p-value = 0.041), while [S] subjects in part 2 are less likely to steal after being caught (t-test p-value = 0.002). 3 This indicates that being caught and nominally fined in part 1 carried little or no consequence for those deciding round-by-round whether to steal or pay (rather than following a blanket rule, such as “always pay”), but was a disincentive in part 2, when the social cost of being caught increased. At the same time, [S] subjects in both parts were most likely to steal this period if they paid last period, resulting in frequent switching behavior.

Finally, we found an interesting gender effect: 42% of female subjects reacted to the treatment compared to 26% of male subjects. In particular, 9 of the 10 individuals with the most severe reaction to the treatment ([AN] group) were female.

1 We allowed 20% deviations in part 1 and 10% in part 2 because the first 1 or 2 decisions in the game may not be representative of the subject’s overall behavior. Small variations in the percentage cutoffs did not significantly affect any results.
2 Most [SS] subjects changed their stealing behavior minimally; the largest change was an increase of 30 percentage points from part 1 to part 2.
3 Note that only 11 of the 22 [S] subjects in part 1 remained [S]-types; the rest became type [N]. Similarly, 14 of the 25 [S] subjects in part 2 were type [A] in part 1.

3.2 Physiology

Next we analyze arousal in conjunction with two events, decision and feedback. Skin conductance data were analyzed using SCRalyze (Bach et al., 2010); see Appendix A1 for details on data recording and analysis. Decision refers to skin conductance responses (SCRs) recorded while the Consumer is deciding to steal or to pay (on average, 4.8 seconds).
Feedback refers to SCRs recorded immediately after the subject has received information on the consequence of his choice (caught/not caught/paid). 4 We consider three measures of arousal that have been previously reported in the SCR literature: amplitude, latency, and dispersion. For flexible duration events, SCRalyze estimates latency, dispersion, and amplitude. For fixed events, only amplitude is estimated, and canonical parameters of latency and dispersion are assumed. These measures helped us capture how aroused the person was (amplitude), how long it took to respond to the stimulus (latency) and how much time passed before skin conductance was back to the baseline level (dispersion).

For each individual, we also collected non-event-related SCR data to compute a “lability index.” This index is obtained by counting the number of non-event-related SCRs during a fixed amount of time, in our case, 2 minutes before the start of the experiment (again, see Appendix A1 for further details on the lability index and classification of subjects). According to prior literature, lability (the tendency to have many non-event-related SCRs) is relatively consistent within individuals and, interestingly, is associated with some personality traits (Dawson et al., 2007). In particular, subjects with more non-event-related SCRs (“labiles”) tend to be calm and less emotionally demonstrative. On the other hand, subjects with fewer SCRs (“stabiles”) tend to be emotionally expressive and impulsive (Crider, 1993).

3.2.1 Physiological responses in part 1 versus part 2

First we look at aggregate differences in SCR between parts 1 and 2 to study the physiological response to the treatment (not showing vs. showing the picture when a subject is caught). We examine the overall highest amplitude of each individual in Figure 4, to determine the event most likely to trigger arousal. Most subjects had their highest amplitude in part 2.
Of the 41 subjects who were most aroused in part 1, 21 had their highest amplitude in the first 2 periods, indicating that they may have been still adjusting to the experiment.

Looking separately at the highest amplitude of each individual in each part, we find a similar pattern. The majority of subjects (82 subjects in part 1 and 81 subjects in part 2) were more aroused at decision time than at feedback time. This may be surprising since no information was obtained at decision time (subjects received new information only at feedback time, and only if they stole the good). As discussed in section 3.2.2, this may be due to the fact that arousal amplitude measures only magnitude, not valence; on seeing their feedback after stealing, subjects may be surprised, relieved, upset, or any combination. Within decision time, subjects were most aroused when they ultimately decided to steal. This is illustrated in Figure 5.

4 The feedback screen only contains new information (caught/not caught) in periods in which the subject stole the good.

The findings suggest that even though subjects who were more aroused in one part were also more aroused in the other, the treatment affected physiological responses. Interestingly, both latency and dispersion decreased significantly between part 1 and part 2, possibly due to a habituation effect: responses were faster and the time to reach the peak of arousal was shorter in part 2.

As noted in section 3.1, there was a large heterogeneity in behavior. We conjectured that this would map into heterogeneity in arousal. To investigate this possibility, we computed the average amplitude, latency and dispersion at decision time for each group of subjects. The results are presented in Table 3. We can see marked differences in amplitude both across groups and across parts.
First, among subjects who keep stealing at the same rates across parts ([AA], [SS]), the amplitude of arousal is increased in part 2 (p = 0.051), suggesting that showing the picture of subjects who are caught makes individuals more anxious, even when their behavior remains the same. Second and interestingly, many of the subjects who exhibited a treatment effect were among the more aroused in part 1, suggesting that an underlying emotional trait or state may have triggered their response to the treatment. Among those who always stole in part 1 ([AA], [AS], [AN]), those who exhibited a treatment effect ([AS], [AN]) had higher amplitudes in part 1 than those who did not ([AA]) (p = 0.000). Third, the aggregate pattern in latency and dispersion noted previously (quicker response and shorter time to reach a peak in part 2) persisted at the group level. However, and perhaps because of sample sizes, there were no significant differences between the latency and dispersion measures across groups. Therefore, from now on (and unless otherwise stated) we will focus the physiological analysis on the most relevant variable, namely amplitude.

3.2.2 Physiological responses by choice

To further investigate the relationship between choice and arousal, we show in Table 4 the differences in amplitudes by type of decision. Arousal at decision was (weakly) higher when the subject stole than when they paid. In particular, arousal among [SS] subjects significantly increased when they stole and knew that the picture would be shown if they were caught. This is consistent with the previous evidence regarding the distribution of the highest arousal levels.

We also briefly investigated arousal at feedback. The main results are reported in Table 5. Amplitudes at feedback were rather similar across all groups in both parts and, confirming previous results, arousal levels were significantly lower at feedback than at decision.
Surprisingly, we found no systematic patterns in the differences of arousal when caught and not caught. There are two possible explanations. First, arousal levels are small overall, hence relatively uninformative; subjects may have anticipated the consequences and thus had mild emotional reactions. Second, there might be confounding effects: higher arousal when caught may reflect anxiety about the picture that will be shown, whereas higher arousal when not caught may reflect relief, and surprise can confound in either case. It would be interesting to design a set of experiments to disentangle these effects.

Last, we report the average lability scores across groups in Table 6. The lability score is the count of nonspecific SCRs during a 2-minute period prior to the start of the experiment. Following standard procedure, we classified subjects as labiles if their score was above the median, stabiles if their score was below the median, and unclassified if their score was exactly at the median. We found that subjects with a constant behavior across parts had significantly higher lability scores than those who reduced stealing when the picture was shown (p-value = 0.000). This fits with the personality traits described above: labiles are calm and less emotionally demonstrative, so they react less to the treatment. Note, however, that differences in lability scores were typically small relative to standard errors.

Taken together, the results suggest that the most interesting relationship between decision-making and physiology occurred at decision time and that individual differences were best predicted by amplitude measures.

3.3 Summary

The results in this section can be summarized as follows. We found two distinct groups of subjects. For a large group, showing their picture had little to no impact on their decision but it increased their arousal. These subjects felt the emotional load of the picture, although not strongly enough to affect their behavior.
These were typically the more deliberative, less excitable individuals (labiles). For the remaining subjects, the threat of the picture substantially reduced their stealing rate. Interestingly, these subjects were more aroused in the first part (when they stole more frequently) than in the second, and also more aroused than the other subjects in the first part, suggesting that their high emotional reaction may have caused their decision to change behavior; most of their arousal change between parts can likely be attributed to the change in their behavior. Indeed, and not surprisingly, amplitude measures were higher when subjects stole (part 1) than when they paid (part 2). These were typically the more assertive and impulsive individuals (stabiles).

Finally, subjects did not seem to learn about their preferences throughout the game. Their choices were rather consistent within each part. In particular, we found little evidence of a sustained decrease in stealing within a given part. The physiological reaction to feedback was also modest. This suggests that choices resulted from stable preferences and that consequences were anticipated at the time of decision.

4 Predicting behavior with psychophysiology

To what extent can the treatment effect observed in part 2 be predicted by the physiological measures collected in part 1 and/or by non-event-related physiological measures (the lability score)? Said differently, is the observation of emotional responses of an individual in a given environment useful to predict behavior in an environment where emotions are likely to be exacerbated? As indicated before, the subjects who substantially decreased their stealing rate when the picture was shown ([AN], [AS], [SN]) were significantly more aroused in part 1 at decision time than those who kept their behavior constant ([AA], [SS], [NN]), with mean arousals of 0.90 and 0.46 respectively (p-value = 0.030).
To further investigate the relationship between amplitude in part 1 and the effect of the picture, we ran ordinary least squares regressions (see footnote 5), where the dependent variable indicated whether the subject exhibited a treatment effect and the independent variables were average amplitude at decision in part 1, average amplitude at feedback in part 1, and the lability index. Results are presented in the first 3 columns of Table 7. In a pooled regression of all subjects, the change in behavior is positively related to arousal at decision but not related to arousal at feedback, and is negatively related to lability. As one might expect, the R-squared values are low, indicating that most of what explains the treatment effect is not captured by skin conductance alone; still, emotional responses during part 1 (and before, in the case of lability) do help to predict behavior in part 2.

Footnote 5: Probit regression results are extremely similar to the linear probability model in this case.

Recall that amplitude was not uniformly high among all subjects who reduced their stealing rate. We therefore split our sample between subjects who always stole in part 1 ([AA], [AN] and [AS], hereafter [A1]) and those who sometimes stole in part 1 ([SS] and [SN], hereafter [S1]) and ran the same regressions separately in each of these two subsamples. The results are presented in the last 6 columns of Table 7. We find that the previous result was driven by subjects who always stole in part 1: in the [A1] subsample, subjects with a low arousal amplitude were predicted to continue stealing in part 2 while those with large arousals were predicted to stop stealing. By contrast, among the subjects who did steal frequently but not always in part 1 ([S1]), amplitude was not a good predictor of the treatment effect. For these subjects, the lability scores were better predictors of the treatment effect: labiles were less affected by the treatment than stabiles.
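As a rough illustration of the linear probability model described above, the following minimal sketch fits ordinary least squares with a single regressor (amplitude at decision) on a 0/1 treatment-effect indicator. The function name `ols_simple` and all data values are hypothetical, chosen only for illustration; the regressions reported in Table 7 also include amplitude at feedback and the lability index, which this sketch omits.

```python
# Minimal sketch of a linear probability model: OLS on a 0/1 outcome.
# ASSUMPTION: the data below are invented for illustration, not the study's data.

def ols_simple(x, y):
    """Fit y = intercept + slope * x by OLS; return (intercept, slope, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squares of x and cross-product of x and y around their means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared = 1 - residual sum of squares / total sum of squares
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, 1.0 - ss_res / ss_tot

# Hypothetical average amplitudes at decision in part 1, one per subject
amp_decision = [0.2, 0.4, 0.5, 0.9, 1.3, 1.6]
# Hypothetical indicator: 1 if the subject reduced stealing in part 2, else 0
treated = [0, 0, 1, 0, 1, 1]

intercept, slope, r2 = ols_simple(amp_decision, treated)
print(f"intercept={intercept:.3f} slope={slope:.3f} R2={r2:.3f}")
```

In a linear probability model the slope reads as the change in the probability of a treatment effect per unit of amplitude, which is why a positive coefficient on Amp(D) is interpreted as higher arousal predicting a behavioral response.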
Previously we showed that amplitudes are higher when stealing than when paying, implying that overall average amplitude at decision may be less predictive for [S1] subjects, who paid about as often as they stole. However, running the same regressions using average amplitude at decision conditional on stealing yields nearly identical results. Additionally, although an interesting gender effect was noted in section 3 (females were more likely to change behavior between parts), the results in Table 7 are virtually unchanged when gender is included in these regressions.

In other words, particularly high arousal at decision seems to explain which subjects will respond to the treatment among those who always stole in part 1, while lability seems to explain which subjects will respond to the treatment among those who stole, on average, about half the time. Labiles tend to be more deliberative and less expressive, which may be because of effortful control on their part (Crider, 2008), and may make them less susceptible to this treatment. More research into skin conductance and its relationship to behavior (and in particular behavioral changes) is necessary.

5 Conclusion

We study experimentally the decision of an individual to steal or pay for an object that is produced at a cost by another individual. We consider two conditions. In the first condition, stealing is revealed with a positive probability and it is sanctioned by a fine. In the second condition, the sanction is increased by making the identity of the individual public. These decisions are made multiple times with different partners. To better understand the reasons that lead to choices, we also collect skin conductance responses at the time of decision and at the time the individual is notified whether or not they were caught. We find that subjects stole less on average when their identity could be revealed, but that there was heterogeneity in responses, with about 65% showing little to no reduction in stealing.
Subjects who were more aroused in the first condition were more likely to stop stealing in the second condition. Future research will examine the dynamic, evolving relationship between skin conductance responses and behavior, modeling the feedback between the two on a round-by-round basis.

References

Bach, D. R., Daunizeau, J., Friston, K. J., and Dolan, R. J. (2010). Dynamic causal modelling of anticipatory skin conductance responses. Biological Psychology, 85(1):163–170.
Barr, A. (2001). Social dilemmas and shame-based sanctions: Experimental results from rural Zimbabwe. Working paper 2001.11, Centre for the Study of African Economies, Oxford University.
Bateson, M., Nettle, D., and Roberts, G. (2006). Cues of being watched enhance cooperation in a real-world setting. Biology Letters, 2(3):412–414.
Bechara, A. (1997). Deciding Advantageously Before Knowing the Advantageous Strategy. Science, 275(5304):1293–1295.
Becker, G. S. (1968). Crime and Punishment: An Economic Approach. Journal of Political Economy, 76(2):169–217.
Boucsein, W. (2012). Electrodermal activity, volume 3.
Camerer, C. F. and Fehr, E. (2002). Measuring Social Norms and Preferences using Experimental Games: A Guide for Social Scientists. Institute for Empirical Research in Economics. University of Zurich. Working Paper Series, (97).
Charness, G. and Dufwenberg, M. (2006). Promises and partnership. Econometrica, 74(6):1579–1601.
Charness, G., Masclet, D., and Villeval, M. C. (2014). The Dark Side of Competition for Status. Management Science, 60(1):38–55.
Charness, G. and Villeval, M.-C. (2009). Cooperation and competition in intergenerational experiments in the field and the laboratory. American Economic Review, 99(3):956–978.
Coricelli, G., Joffily, M., Montmarquette, C., and Villeval, M. C. (2010). Cheating, emotions, and rationality: An experiment on tax evasion. Experimental Economics, 13(2):226–247.
Crider, A. (1993).
Electrodermal response lability-stability: Individual difference correlates. In Progress in electrodermal research, pages 173–186.
Crider, A. (2008). Personality and electrodermal response lability: An interpretation.
Dawson, M. E., Schell, A. M., and Filion, D. L. (2007). The Electrodermal System.
de Hooge, I. E., Breugelmans, S. M., and Zeelenberg, M. (2008). Not so ugly after all: When shame acts as a commitment device. Journal of Personality and Social Psychology, 95(4):933–943.
Eisenberg, N. (1986). Altruistic emotion, cognition and behavior. Erlbaum, Hillsdale, NJ.
Elster, J. (2009). Social norms and the explanation of behavior. In Hedstrom, P. and Bearman, P., editors, The Oxford Handbook of Analytical Sociology, pages 195–217. Oxford University Press, Oxford, UK.
Engel, C. (2011). Dictator games: A meta study. Experimental Economics, 14(4):583–610.
Fehr, E. and Gächter, S. (2000). Cooperation and punishment in public goods experiments. American Economic Review, 90(4):980–994.
Figner, B. and Murphy, R. O. (2011). Using skin conductance in judgment and decision making research. A Handbook of Process Tracing Methods for Decision Research: A Critical Review and User’s Guide, pages 163–184.
Fischbacher, U. (2007). Z-Tree: Zurich toolbox for ready-made economic experiments. Experimental Economics, 10(2):171–178.
Gneezy, U., Niederle, M., and Rustichini, A. (2003). Performance in Competitive Environments: Gender Differences. The Quarterly Journal of Economics, 118(3):1049–1074.
Haley, K. J. and Fessler, D. M. T. (2005). Nobody’s watching? Subtle cues affect generosity in an anonymous economic game. Evolution and Human Behavior, 26(3):245–256.
Hoffman, E., McCabe, K. A., and Smith, V. L. (1996). On expectations and the monetary stakes in ultimatum games. International Journal of Game Theory, 25(3):289–301.
Hui, K. and Png, I. P. (2003). Piracy and the Legitimate Demand for Recorded Music. Contributions to Economic Analysis and Policy, 2(1):na.
Joffily, M., Masclet, D., Noussair, C. N., and Villeval, M. C. (2014). Emotions, Sanctions, and Cooperation. Southern Economic Journal, 80(4):1002–1027.
Karaganis, J. (2012). Copyright Infringement and Enforcement in the US. Culture, (November 2011):1–10.
Krupka, E. and Weber, R. A. (2009). The focusing and informational effects of norms on pro-social behavior. Journal of Economic Psychology, 30(3):307–320.
Niederle, M. and Vesterlund, L. (2007). Do Women Shy Away From Competition? Do Men Compete Too Much? The Quarterly Journal of Economics, 122(3):1067–1101.
Schram, A. and Charness, G. (2015). Inducing Social Norms in Laboratory Allocation Choices. Management Science, 61(7):1531–1546.
Sefton, M., Shupp, R., and Walker, J. M. (2007). The effect of rewards and sanctions in provision of public goods. Economic Inquiry, 45(4):671–690.
Smith, A., Douglas Bernheim, B., Camerer, C. F., and Rangel, A. (2014). Neural activity reveals preferences without choices. American Economic Journal: Microeconomics, 6(2):1–36.
Sutter, M., Haigner, S., and Kocher, M. G. (2010). Choosing the carrot or the stick? Endogenous institutional choice in social dilemma situations. Review of Economic Studies, 77(4):1540–1566.
Svensson, M. and Larsson, S. (2009). Social Norms and Intellectual Property. Technical report.
Tangney, J. P. (1995). Recent Advances in the Empirical Study of Shame and Guilt. American Behavioral Scientist, 38(8):1132–1145.
Van’t Wout, M., Kahn, R. S., Sanfey, A. G., and Aleman, A. (2006). Affective state and decision-making in the Ultimatum Game. Experimental Brain Research, 169(4):564–568.

Appendix A1: Technical details of skin conductance recording and analysis

Recording: Our data was collected using the AcqKnowledge software (version 4.3). Events were placed at least 7 seconds apart to allow skin conductance levels to return to their baselines (Figner and Murphy, 2011). The acquisition rate was set to 2000 Hz while the channel sample rate was set to 1000 Hz.
Gain on the Biopac system was set to 2000.

Event-related analysis: Skin conductance data was analyzed using SCRalyze (Bach et al., 2010). It was trimmed to include the entire period from initial instructions through the end of the final period. Raw data was processed using a high-pass Butterworth filter with cutoff frequency 0.0159 Hz and a low-pass filter with cutoff frequency 5 Hz.

SCRalyze uses model-based analysis to understand sudomotor nerve (SN) activity based on observed SCRs. Sympathetic nervous system arousal leads to SN activity, which in turn leads to SCRs. SCRalyze uses an inversion model, which observes SCRs and infers SN activity (the state of interest) from them, with an understanding of the processes which lead from sympathetic arousal to SN activity and from SN activity to SCRs. Specifically, it assumes that an event-related sympathetic arousal causes a Gaussian SN burst, after some delay. It assumes a canonical shape of the SCR function (based on over 1,000 recorded SCRs).

In our case, we have two types of events. Flexible duration events are the periods during which the Consumer is making a choice (from when the choice appears on the screen until the subject makes a selection), while fixed events occur the moment the Consumer sees feedback from their choice (they are “fixed” because they have no duration; we measure the psychophysiological responses that follow them). As noted in the text, for flexible duration events, SCRalyze estimates latency (time from the start of the event until the SN burst), dispersion, and amplitude (of the SN burst). For fixed events, only amplitude is estimated, and canonical parameters of latency and dispersion are assumed.

Lability index: Electrodermal lability measurement is based on the number of nonspecific SCRs a person has (as opposed to event-related SCRs, which arise when the subject is reacting to something in particular).
We measure each subject’s “lability score” by counting the number of SCRs during a two-minute interval of the instruction period at the beginning of part 1, ending thirty seconds before the practice round. Following the literature (Boucsein, 2012), subjects with SCRs above the median are classified as “labiles” while those with SCRs below the median are classified as “stabiles.” Some studies with large sample sizes look only at the top and bottom 10%, which is more appropriate for larger datasets.

Tables and figures

Figure 1: Proportion stealing by period: all subjects
Time series showing the proportion of Consumers choosing to steal in each round. As discussed in section 3, there is a structural break between parts 1 and 2, and no significant trend in stealing behavior over time in either part.

Figure 2: CDFs by part
The cumulative distribution function of average stealing behavior in part 1 stochastically dominates the CDF in part 2; in other words, for any given average stealing percentage (e.g. subject stole 50% of the time), there are strictly more subjects stealing that amount or less in part 2 than in part 1.

Table 1: Behavioral types

Type    N     Part 1 steal %   Std. Err.   Part 2 steal %   Std. Err.
[AA]    44    96.1%            1.0%        98.5%            0.6%
[SS]    11    50.0%            3.3%        49.5%            5.2%
[NN]    9     8.9%             3.5%        3.3%             2.4%
[AS]    10    92.9%            2.2%        55.7%            4.2%
[AN]    14    95.0%            2.7%        0.0%             0.0%
[SN]    11    57.3%            4.5%        2.3%             1.0%
[SA]    1     60.0%            -           90.0%            -
Total   100   78.0%            2.9%        58.1%            4.2%

Subjects are behaviorally categorized by how often they chose to steal in each part, where e.g. an [AA] (Always Always) subject stole in every or nearly every period in both parts, while an [SN] subject stole about half the time in part 1, and none or almost none of the time in part 2.
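The categorization behind Table 1 can be sketched in a few lines. The caption describes the types only qualitatively ("every or nearly every period", "about half the time", "none or almost none of the time"), so the numeric cutoffs `ALWAYS_CUTOFF` and `NEVER_CUTOFF` below are assumptions made for illustration, as are the function names; they are not the thresholds used in the study.

```python
# Minimal sketch of assigning a two-letter behavioral type from stealing rates.
# ASSUMPTION: both cutoffs are hypothetical; the source gives no numeric thresholds.
ALWAYS_CUTOFF = 0.80  # stealing rate at or above this is labeled "A" (Always)
NEVER_CUTOFF = 0.20   # stealing rate at or below this is labeled "N" (Never)

def part_label(steal_rate):
    """Map one part's stealing rate (0.0 to 1.0) to "A", "S", or "N"."""
    if steal_rate >= ALWAYS_CUTOFF:
        return "A"
    if steal_rate <= NEVER_CUTOFF:
        return "N"
    return "S"  # Sometimes: anything strictly between the two cutoffs

def behavioral_type(rate_part1, rate_part2):
    """Combine the part 1 and part 2 labels into a type such as "[AS]"."""
    return "[" + part_label(rate_part1) + part_label(rate_part2) + "]"

# Usage with the group averages from Table 1 (group means, not individual rates)
print(behavioral_type(0.961, 0.985))
print(behavioral_type(0.573, 0.023))
```

Under these assumed cutoffs, the group averages reported in Table 1 map back to their own type labels, but individual subjects near a cutoff could be classified differently under other thresholds.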
Table 2: Conditional probabilities of stealing

Group   N    P(steal | caught last)   Obs   P(steal | not caught last)   Obs   P(steal | paid last)   Obs
Part 1
[A]     68   95.8%                    330   93.3%                        253   100.0%                 29
[S]     22   45.2%                    73    24.2%                        33    73.9%                  92
[N]     9    25.0%                    4     0.0%                         3     9.5%                   74
Part 2
[A]     44   98.2%                    496   98.8%                        327   100.0%                 13
[S]     25   38.0%                    150   56.5%                        108   63.6%                  217
[N]     30   0.0%                     8     33.3%                        3     1.8%                   559

Probability of stealing, conditional on last round’s outcome. In part 1, the Sometimes group ([S]) was more likely to steal immediately after being caught than after not being caught, while in part 2 the opposite held.

Table 3: Amplitude, latency, and dispersion at decision

Type     N    Amplitude   Std. Err.   Latency   Std. Err.   Dispersion   Std. Err.
Part 1
[AA]     44   0.454       (0.060)     3.138     (0.263)     1.571        (0.109)
[SS]     11   0.440       (0.075)     2.980     (0.501)     1.363        (0.150)
[NN]     9    0.503       (0.168)     4.055     (1.379)     1.723        (0.358)
[AS]     10   0.923       (0.239)     2.306     (0.403)     0.975        (0.119)
[AN]     14   1.296       (0.510)     2.390     (0.453)     1.307        (0.326)
[SN]     11   0.514       (0.240)     3.040     (0.804)     1.732        (0.372)
Total:   99   0.615       (0.077)     2.987     (0.206)     1.463        (0.082)
Part 2
[AA]     44   0.515       (0.072)     2.244     (0.248)     1.102        (0.093)
[SS]     11   0.656       (0.178)     2.365     (0.454)     0.928        (0.073)
[NN]     9    0.351       (0.107)     2.108     (0.408)     1.284        (0.186)
[AS]     10   0.537       (0.102)     1.844     (0.238)     0.863        (0.056)
[AN]     14   0.912       (0.204)     2.186     (0.247)     0.984        (0.179)
[SN]     11   0.602       (0.202)     2.182     (0.307)     1.081        (0.169)
Total:   99   0.569       (0.052)     2.174     (0.136)     1.049        (0.053)

Average amplitude, latency, and dispersion; see Appendix A1 for technical definitions.

Table 4: Amplitude at decision by choice

         Part 1                                Part 2
Type     Pay     SE        Steal   SE          Pay     SE        Steal   SE
[AA]     -       -         0.459   (0.061)     -       -         0.514   (0.072)
[SS]     0.392   (0.109)   0.544   (0.101)     0.445   (0.149)   0.897   (0.224)
[NN]     0.486   (0.174)   -       -           0.339   (0.104)   -       -
[AS]     -       -         0.933   (0.249)     0.497   (0.131)   0.604   (0.143)
[AN]     -       -         1.340   (0.509)     0.912   (0.204)   -       -
[SN]     0.530   (0.215)   0.531   (0.279)     0.581   (0.191)   -       -
Total:   0.459   (0.071)   0.664   (0.083)     0.578   (0.075)   0.665   (0.090)

Average amplitude at decision, broken down by behavioral type and choice.
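The conditional probabilities in Table 2 (probability of stealing given the previous round's outcome) can be tabulated along the following lines. This is a minimal sketch: the function name `conditional_steal_rates`, the string labels, and the example sequence are all hypothetical, and it assumes the input is one subject's rounds within a single part, in order.

```python
from collections import defaultdict

# Minimal sketch: P(steal this round | outcome of the previous round).
# ASSUMPTION: each round is a (choice, outcome) pair with choice in
# {"steal", "pay"} and outcome in {"caught", "not caught", "paid"}.

def conditional_steal_rates(rounds):
    """Return {previous_outcome: (share_stealing, number_of_observations)}."""
    counts = defaultdict(lambda: [0, 0])  # previous outcome -> [steals, observations]
    # Pair each round with the one that follows it
    for (_, prev_outcome), (choice, _) in zip(rounds, rounds[1:]):
        counts[prev_outcome][1] += 1
        if choice == "steal":
            counts[prev_outcome][0] += 1
    return {k: (steals / obs, obs) for k, (steals, obs) in counts.items()}

# Hypothetical sequence of six rounds for one subject
example_rounds = [
    ("steal", "caught"),
    ("steal", "not caught"),
    ("pay", "paid"),
    ("steal", "caught"),
    ("pay", "paid"),
    ("steal", "not caught"),
]
rates = conditional_steal_rates(example_rounds)
print(rates)
```

The first round of a part has no previous outcome and therefore contributes no observation, which is one reason the observation counts in such a table are smaller than the total number of decisions.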
Table 5: Amplitude at feedback by outcome

        Part 1                                          Part 2
Type    Not caught   Std. Err.   Caught   Std. Err.     Not caught   Std. Err.   Caught   Std. Err.
[AA]    0.294        (0.059)     0.268    (0.049)       0.252        (0.045)     0.288    (0.052)
[SS]    0.187        (0.039)     0.330    (0.073)       0.506        (0.101)     0.354    (0.066)
[NN]    -            -           -        -             -            -           -        -
[AS]    0.281        (0.049)     0.283    (0.084)       0.222        (0.058)     0.321    (0.082)
[AN]    0.499        (0.220)     0.360    (0.097)       -            -           -        -
[SN]    0.203        (0.059)     0.324    (0.092)       -            -           -        -
Total   0.295        (0.040)     0.303    (0.032)       0.289        (0.036)     0.339    (0.046)

Average amplitude at feedback among those who chose to steal that round, broken down by behavioral type and outcome.

Table 6: Lability scores

Type    Nonspecific SCR count   Std. Err.   # labiles   # stabiles   # unclassified
[AA]    12.45                   0.91        20          21           3
[SS]    17.09                   2.31        7           6            2
[NN]    12.78                   2.53        6           7            0
[AS]    13.50                   1.72        7           3            0
[AN]    10.00                   1.84        2           8            0
[SN]    9.00                    1.74        3           3            1
Total   12.52                   0.66        45          48           6

Lability score is the number of non-event-related SCRs in a 2-minute period before the start of the experiment. The subject is considered “labile” if they have more than the median number of nonspecific SCRs (12), “stabile” if they have fewer, and unclassified if they are at the median.

Figure 3: Percent stealing by period: [AS], [AN], [SN] only
Time series showing the proportion of Consumers choosing to steal in each round, only among those who exhibited a strong treatment effect: [AS], [AN], and [SN] subjects. As expected, the decline in stealing between parts 1 and 2 is larger. Still, there is no significant time trend in either part.

Figure 4: Time of largest reaction overall
Count of the number of subjects whose largest reaction (amplitude) was recorded at that time. For example, 28 of the 99 subjects had their largest reaction while stealing in part 2.

Figure 5: Time of largest reaction, by part
Count of the number of subjects whose largest reaction (amplitude) was recorded at that time, separately by part. For example, looking just at part 2, 47 of the 99 subjects had their largest reaction while stealing.
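The median split behind the labile, stabile, and unclassified counts in Table 6 can be sketched as follows. The function name `classify_lability` and the example scores are hypothetical; only the rule itself (above the median is labile, below is stabile, exactly at the median is unclassified) comes from the text.

```python
from statistics import median

# Minimal sketch of the median split on nonspecific SCR counts.
# ASSUMPTION: the scores below are invented for illustration.

def classify_lability(scores):
    """Return (median_cutoff, labels), one label per score in input order."""
    cutoff = median(scores)
    labels = []
    for score in scores:
        if score > cutoff:
            labels.append("labile")
        elif score < cutoff:
            labels.append("stabile")
        else:
            labels.append("unclassified")  # exactly at the median
    return cutoff, labels

# Hypothetical nonspecific SCR counts for seven subjects
example_scores = [5, 9, 12, 12, 14, 20, 8]
cutoff, labels = classify_lability(example_scores)
print(cutoff, labels)
```

Because ties at the median are left unclassified, the labile and stabile groups need not be the same size, which matches the unequal counts reported in the Total row of Table 6.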
Table 7: Regressions predicting the treatment effect

            (1)        (2)        (3)        (4)        (5)        (6)        (7)        (8)        (9)
            Full sample                      [A1] only                        [S1] only
Amp(D)      0.175***   0.198***   0.206***   0.204***   0.224***   0.223***   0.058      -0.009     0.114
            (0.061)    (0.069)    (0.068)    (0.066)    (0.073)    (0.073)    (0.198)    (0.339)    (0.293)
Amp(F)                 -0.141     -0.117                -0.123     -0.116                0.291      0.195
                       (0.183)    (0.181)               (0.192)    (0.195)               (1.167)    (0.997)
Lability                          -0.014*                          -0.003                           -0.037**
                                  (0.007)                          (0.009)                          (0.013)
Constant    0.246***   0.272***   0.430***   0.215***   0.238***   0.272**    0.472***   0.434*     0.882***
            (0.060)    (0.069)    (0.107)    (0.071)    (0.079)    (0.134)    (0.146)    (0.215)    (0.242)
N           99         99         99         68         68         68         22         22         22
R-squared   0.077      0.083      0.117      0.127      0.133      0.134      0.004      0.008      0.314

Significance: * = 10%, ** = 5%, *** = 1%. Standard errors in parentheses. Amp(D) is average amplitude at decision in part 1 and Amp(F) is average amplitude at feedback in part 1. Ordinary least squares regressions of an indicator of whether the subject exhibited a treatment effect, that is, whether they had relatively consistent behavior in both parts ([AA], [SS], [NN]) or they showed a significant decrease in stealing in part 2 ([AS], [AN], [SN]). [A1] ([S1]) refers to all types that always (sometimes) stole in part 1.
Asset Metadata
Creator
Montgomery, Mallory (author)
Core Title
Beyond revealed preferences: how gender, relative socioeconomic status, and social norms drive happiness and behavior
School
College of Letters, Arts and Sciences
Degree
Doctor of Philosophy
Degree Program
Economics
Publication Date
07/21/2017
Defense Date
06/12/2017
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
behavioral economics,experimental economics,gender,Happiness,life satisfaction,OAI-PMH Harvest,relative income,revealed preference,skin conductance response
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Kapteyn, Arie (committee chair), Easterline, Richard (committee member), Stone, Arthur (committee member), Strauss, John (committee member)
Creator Email
m.montgomery@gmail.com,mallorym@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c40-406659
Unique identifier
UC11265372
Identifier
etd-Montgomery-5574.pdf (filename),usctheses-c40-406659 (legacy record id)
Legacy Identifier
etd-Montgomery-5574.pdf
Dmrecord
406659
Document Type
Dissertation
Rights
Montgomery, Mallory
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA