Bound in Hatred: A multi-methodological investigation of morally motivated acts
of hate
by
Joe Hoover
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(Psychology)
August 2020
Copyright 2020 Joe Hoover
Acknowledgements
This research would not have been possible without generous funding from the University of
Southern California Dornsife College of Arts and Letters, the National Science Foundation, and
the United States Army Research Laboratory.
This work would also not have been possible without the support, wisdom, and patience of
my friend and advisor Professor Morteza Dehghani. Thank you for helping me chase my dreams
and for giving me the freedom to fall down rabbit holes; for always believing in me and, when I
needed to hear it, telling me to suck it up. Thank you.
Even though he abandoned me in my second year, I would also like to thank Professor Jesse
Graham. Along with Morteza, he gave me the chance to join the Academy and he has never
stopped being one of the most kind, intelligent, and strange people I have ever met. Even from a
distance, he was always available to advise and support.
I would also like to thank my PhD committee, Professors Wendy Wood, Hok Chio Lai, Pablo
Barbera, and Hajar Yazdiha for their wonderful candor, insight, and guidance, as well as their
willingness to make time as I raced toward the finish line.
Throughout my graduate studies, there were many others who deeply impacted my path and
whom I would like to acknowledge: Professor Sara Hodges for letting me volunteer in her lab and
join the University of Oregon as her Masters student; Professor Azim Shariff for taking me on as
a Masters student and helping form the foundation upon which I would build my PhD research;
and Professor Sanjay Srivastava for teaching me to love methodology, measurement, and Meehl.
I am also deeply grateful for the companionship, inspiration, help, and much needed distraction
offered by my friends and colleagues from the Computational Social Science Laboratory. Thank you
Justin Garten, Reihane Boghrati, Kate Johnson, Mohammad Atari, Aida Mostafazadeh, Brendan
Kennedy, and Leigh Yeh.
Of course, this path started many years ago and I would not have made it far
at all without the love and support of my parents Marian and John Hoover. They taught me to
love learning, exploration, curiosity, and inquiry and always encouraged me to follow my dreams.
I would also like to offer an infinity of thanks to my partner, best friend, and soul-mate,
Ryleigh Nucilli. As much as anyone's, it was her fault that I started graduate school. And it was
her love, patience, faith, support, and ceaselessly stimulating intellect that helped me finish it.
While I sat in my ivory kitchen chair, she commuted across Los Angeles, developed an impressive
flair for road rage, became a digital media mogul, and put food on the table.
Finally, I would like to thank my Miniature Australian Shepherd, Frida, for all of the barking,
licking, and growling that distracted me when I needed it most and for forcing me to go outside
every once in a while.
Table of Contents

Acknowledgements

Chapter 1: Introduction

Chapter 2: The Big, The Bad, and The Ugly: Geographic estimation with flawed psychological data
2.1 Introduction
2.2 Sub-national Estimation
2.2.1 Overview
2.2.2 Disaggregation
2.2.3 Post-stratification
2.2.4 Raking
2.2.5 Multilevel Regression and Post-stratification
2.2.6 Multilevel Regression and Synthetic Post-stratification
2.3 Study 1
2.3.1 Method
2.3.2 Results
2.3.3 Discussion
2.4 Study 2
2.4.1 Data
2.4.1.1 Project Implicit Data
2.4.1.2 2010 U.S. Religious Census
2.4.1.3 2011-2015 American Community Survey 5-year estimates
2.4.1.4 MIT Election Data
2.4.1.5 Geographic Data
2.4.2 Method
2.4.2.1 Primary Data Preparation
2.4.2.2 Secondary Data Preparation
2.4.2.3 Disaggregation
2.4.2.4 Post-stratification and Raking
2.4.2.5 MrP
2.4.2.6 MrsP
2.4.3 Results
2.4.4 Discussion
2.5 Study 3
2.5.1 Data
2.5.1.1 Racial Bias
2.5.2 Method
2.5.3 Results
2.5.4 Discussion
2.6 General Discussion

Chapter 3: Bound in Hatred: The role of group-based morality in acts of hate
3.1 Study 1: Hate Speech and Moral Rhetoric
3.2 Study 2: Hate Groups and County-level Moral Values
3.3 Perceived Moral Violations and the Justification of EBEPs
3.4 Study 3: Experimental Manipulation of Perceived Moral Wrongness
3.4.1 Results
3.5 Study 4: Justification of EBEPs against Mexicans
3.5.1 Results
3.6 Study 5: Justification of EBEPs against Muslims
3.6.1 Results
3.7 Conclusions

Bibliography

Appendix A
A.1 Simple MrsP
A.1.0.1 Adjusted MrsP

Appendix B
B.1 Study 1
B.1.1 Data Generating Processing
B.1.2 Sampling Procedure

Appendix C
C.1 Study 1
C.1.1 Analysis A
C.1.1.1 Results
C.1.2 Analysis B
C.2 Study 3
C.2.1 Regression Models
C.2.2 Mediation Analysis
C.2.2.1 Method
C.2.2.2 Results
C.3 Study 4
C.3.1 Mediation Results
C.4 Study 5
C.4.1 Mediation Results
Introduction
Acts of hate have been used to silence, terrorize, and erase marginalized social groups throughout
history. The rising rates of these behaviors in recent years underscore the importance of developing
a better understanding of when, why, and where they occur. Notably, most social science research
focused on behaviors such as hate speech and hate crime has been produced by criminology,
sociology, and political science (Green and Spry, 2014). While psychology has produced a rich
literature on prejudice (Brown, 2011, Hall, 2013), there has been little work addressing more
extreme behavioral expressions of prejudice, which we refer to as acts of hate (Green and Spry,
2014). This is likely because acts of hate are difficult to investigate using conventional psychological
research methods. In situ, acts of hate are relatively rare and difficult to measure, and in controlled
settings such as surveys or behavioral experiments, they are difficult to meaningfully approximate.
Nonetheless, acts of hate are as much psychological phenomena as they are sociological or political
phenomena, and understanding them requires a robust theoretical account of their psychological
underpinnings.
In this work, we investigate acts of hate as moral phenomena. This approach is grounded in
psychological research suggesting that human violence is often morally motivated, such that
perpetrators of violence often feel that their acts are morally justified or even obligatory (Fiske
et al., 2014). Here, we investigate whether acts of hate are associated with perpetrators' moral
values and, further, whether this association can be explained, at least in part, by their beliefs
about outgroup moral violations. To address the methodological issues noted above, we conducted
a combination of observational and experimental studies that rely on a range of methodological
and analytical approaches, including Natural Language Processing (NLP), small-area estimation,
geospatial modeling, and mediation analysis. Specifically, in Chapter 3, we investigate hate speech
in 24 million social media posts, hate group activity across 3,108 U.S. counties, and beliefs about
the justification of hate acts reported by 1,200 U.S. survey respondents. This multi-methodological
approach allows us to investigate the psychological underpinnings of hate acts using both real-
world data and data generated in controlled settings.
First, however, in Chapter 2 we discuss and evaluate a range of approaches to sub-national
estimation under conditions of non-random sampling. We conducted this work to facilitate our
study of the association between county-level moral values and the county-level prevalence of hate
groups (see Study 2 in Chapter 3). In this study, we relied on Multilevel Regression and Synthetic
Post-stratification (MrsP; Leemann and Wasserfallen, 2017), a survey adjustment procedure, to
obtain "sub-national" estimates of county-level moral values. However, as sufficient randomly
sampled data was not available, these estimates were necessarily derived from non-randomly
sampled data, which raises serious questions about the reliability of our estimates. While MrsP,
and its forerunner, Multilevel Regression and Post-stratification (MrP; Park et al., 2004b), have
been widely tested on randomly sampled data, there has been very little work investigating their
efficacy on non-random data, such as those obtained from large-scale online convenience samples.
Accordingly, in Chapter 2, we report three studies evaluating MrP's performance under simulated
and real-world conditions of sample biases. Ultimately, we find that MrP is likely to outperform
the other sub-national estimation methods that psychological researchers currently use.
In Chapter 3, we present a program of research that suggests that acts of hate may often be best
understood as morally motivated behaviors grounded in people's moral values and perceptions of
moral violations. As evidence for this claim, we present findings from five studies that rely on a
combination of natural language processing, spatial modeling, and experimental methods to
investigate the relationship between moral values and acts of hate toward marginalized groups. Across
these studies, we find consistent evidence that moral values oriented around ingroup preservation
are disproportionately evoked in hate speech, predictive of the county-level prevalence of hate
groups, and associated with the belief that acts of hate against marginalized groups are justified.
Additional analyses suggest that the association between group-oriented moral values and hate
acts against marginalized groups can be partly explained by the belief that these groups have done
something morally wrong. By accounting for the role of moralization in acts of hate, this work
provides a unified framework for understanding hateful behaviors and the events or dynamics that
trigger them.
The Big, The Bad, and The Ugly: Geographic estimation with flawed psychological data
2.1 Introduction
The sub-national distributions of psychological constructs are attracting increasing interest in
the psychological literature where, for example, outcomes such as well-being or racial bias are
being studied within smaller units such as states or counties. Such research relies on what is
referred to as sub-national estimation, which involves estimating the population distribution of a
construct across a set of sub-national units using a sample of data drawn from those units. While
sub-national estimation is a relatively new approach to psychological research, it is relevant to
any psychologist who is interested in working with estimates at smaller, more localized levels like
the state-, county-, or city-level, as opposed to larger national or international levels.
By studying a construct's sub-national variation, researchers can learn about its stability,
relationships with covariates, and responses to naturally occurring perturbations. For example,
a growing body of literature has identified systematic sub-national geographic covariance among
personality traits (Allik et al., 2009, Rentfrow et al., 2008, 2015) and between personality and other
outcomes, such as life-satisfaction (Jokela et al., 2015), liberalism (Rentfrow et al., 2013, 2015),
cancer (McCann, 2017b), volunteering (McCann, 2017a), work satisfaction (McCann, 2018), and
economic resilience (Obschonka et al., 2016). Recent research has also provided evidence that the
congruence between a person's personality and the dominant personality traits in their region is
associated with their subjective well-being (Götz et al., 2018).
Another burgeoning line of work has focused on the sub-national distribution of racial bias and
its association with indicators of racial inequity. Studies in this area have identified links between
county-level implicit bias against Blacks and the Black-White infant mortality gap (Orchard and
Price, 2017), Blacks' death rates (Leitner et al., 2016a,b), exposure to racial out-groups (Rae
et al., 2015), disproportionate use of lethal force against Blacks in policing (Hehman et al., 2017),
and racial disparities in school-based disciplinary actions (Riddle and Sinclair, 2019). While
researchers have long speculated that such associations exist, they have remained difficult to assess
quantitatively. However, by focusing on sub-national variation in target outcomes, researchers
have been able to gain novel insight into the relationships between psychological phenomena and
real-world outcomes.
Unfortunately, some of the approaches to sub-national estimation that are most widely em-
ployed in the psychological literature do not adequately address the methodological challenges
of sub-national estimation. At worst, these approaches can yield completely invalid estimates
and inferences. Specifically, the methods most widely used either inadequately address or wholly
neglect issues of sub-national sparsity and representativeness. A sample exhibits sub-national
sparsity when, for some sub-national units, data is missing or Ns are very small. Similarly, a
sample exhibits sub-national non-representativeness when the data representing some sub-national
units is not representative. If these issues are not addressed, sub-national estimates may be
unreliable, biased, and (or) completely invalid.
In this work, we review these issues and discuss methods that have been developed to address
them. While some of these methods, such as post-stratification (Gelman and Little, 1997, Little,
1993, Lohr, 2009), have been used in the psychological literature (Leemann and Wasserfallen,
2017, Leitner et al., 2016a, Obschonka et al., 2016, Orchard and Price, 2017), others, such as
raking (Deville et al., 1993, Kalton and Flores-Cervantes, 2003), multilevel regression and post-
stratification (MrP; Gelman and Little, 1997, Park et al., 2004b), and multilevel regression and
synthetic post-stratification (MrsP; Leemann and Wasserfallen, 2017), are not as well-known to
psychological researchers. Each of these methods constitutes an approach to survey adjustment
that can be used to address sub-national sparsity and non-representativeness. We provide an
overview of these approaches and discuss their strengths and weaknesses.
More specifically, however, we propose that MrP and its more recent variants will be particularly
useful for psychologists interested in sub-national investigations of psychological phenomena.
MrP offers a model-based approach to obtaining sub-national estimates for a given outcome, such
as state-level estimates of public opinion (Krimmel et al., 2016) and voter behavior (Gelman,
2014), county-level estimates of racial bias (Riddle and Sinclair, 2019), or city-level estimates of
health outcomes (Wang et al., 2018). In contrast to methods like post-stratification and raking
(see below for discussion of these methods), MrP relies on a hierarchical response model, which
helps improve estimation accuracy via partial-pooling or smoothing (Park et al., 2004b).
Accordingly, a researcher interested in studying racial bias, for example, could apply MrP to data
from Project Implicit in order to derive estimates of state- or county-level racial bias. Through
the application of MrP, these estimates would be stabilized by partial-pooling as well as
adjusted for response biases via post-stratification. MrP has become increasingly
popular and is now considered the gold standard for estimating sub-national political preferences
(Caughey and Warshaw, 2019, Leemann and Wasserfallen, 2017, Selb and Munzert, 2011). Recent
work has also demonstrated that MrP can generate surprisingly accurate sub-national
estimates even from non-random and non-representative data (Wang et al., 2015). Further, it has been
shown to outperform the methods more commonly used in psychological research, such as
disaggregation (Erikson et al., 1993), i.e., merely calculating region-specific sample means, and
post-stratification (Park et al., 2004b).
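A full MrP implementation fits a hierarchical regression over demographic and geographic predictors and then post-stratifies the model's cell predictions. The partial-pooling behavior that distinguishes it from raw disaggregation can, however, be previewed with a deliberately simplified shrinkage estimator. The data and the pseudo-sample size `k` below are illustrative assumptions, not part of MrP proper:

```python
import pandas as pd

# Hypothetical sample: 50 respondents from one state, only 2 from another.
df = pd.DataFrame({
    "state": ["CA"] * 50 + ["WY"] * 2,
    "y":     [0.30] * 50 + [0.90] * 2,
})

grand_mean = df["y"].mean()
k = 10.0  # assumed pseudo-sample size controlling shrinkage strength

def partial_pool(group):
    # Weighted compromise between a state's own mean and the grand mean;
    # sparsely sampled states are pulled more strongly toward the grand mean.
    n = len(group)
    return (n * group.mean() + k * grand_mean) / (n + k)

smoothed = df.groupby("state")["y"].apply(partial_pool)
```

The well-sampled state barely moves, while the two-respondent state is pulled strongly toward the grand mean; in MrP, the hierarchical response model produces this smoothing automatically across demographic-by-geography cells.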
However, previous comparative evaluations of MrP have found that it offers diminishing returns
as sample sizes increase (Buttice and Highton, 2013, Hanretty et al., 2016, Lax and Phillips,
2009), suggesting that when enough data is available, simpler approaches like disaggregation
may perform comparably. These evaluations, however, were conducted with randomly sampled,
nationally representative data and thus cannot necessarily be generalized to the kinds of large,
but also non-random and biased, data (e.g., data collected via Project Implicit, MyPersonality, or
YourMorals.org) that psychological researchers often work with today.
Accordingly, in addition to providing a detailed introduction to MrP and some of its recent
modifications, we also report results from three new studies investigating its comparative
performance under conditions similar to those faced by psychological researchers. Specifically, these
studies address the following questions:
1. Under simulated conditions of sampling bias caused by unrepresentative sampling, how does
MrP perform (Study 1)?
2. Given a large, unrepresentative, non-random sample, how does MrP perform compared to
other methods of sub-national estimation (Study 2)?
3. Given a large, unrepresentative, non-random sample, do downstream inferences about the
relationship between sub-national estimates and a secondary construct vary depending on
the method used to obtain sub-national estimates (Study 3)?
In Study 1, we address the first question via a large-scale Monte Carlo simulation that we
use to estimate the accuracy and bias of sub-national MrP estimates under varying levels of
non-representativeness and sample size. While simulation necessarily requires making simplifying
assumptions about data generating processes, this study provides new information about MrP's
performance under conditions of varying bias and sample size.
Next, in order to better understand how MrP performs under these conditions when applied to
real data, we rely on large-scale data obtained from Project Implicit (Xu et al., 2013) to generate
county-level estimates of the rate of Catholic adherence using MrP as well as a range of other
methods. While the county-level rate of Catholic adherence may not be of particular psychological
interest, focusing on this variable allows us to directly evaluate estimation accuracy and bias, as
a reasonable approximation of "ground truth" (the true rate of Catholic adherence) is available
via the 2010 U.S. Religious Census (Grammich, 2012).
Finally, in Study 3, we investigate how inferences about the relationship between Barack
Obama's 2008 General Election county-level vote share and county-level White racial bias against
Blacks vary depending on the method used to estimate county-level racial bias. Previous research
has found a negative association between intent to vote for Obama and both explicit and implicit
racial bias (Greenwald et al., 2009b). Given this, a question of presumable interest might be
whether this association exists at the county level. Importantly, however, our goal in this study is
not to provide evidence for or against such an association, but rather to investigate how inferences
vary depending on the method used to obtain estimates of county-level racial bias. That is, in this
study, we sought to determine whether the method of estimation, in this particular context, had
substantive implications for the kind of downstream analyses psychologists might be interested
in conducting.
Overall, our aim in this work is to introduce psychologists to sub-national estimation, highlight
its challenges, and provide actionable information regarding how these challenges can and should
be addressed. In our empirical work, we provide evidence via simulation and analysis of real data
that, under conditions of sub-national sparsity and (or) non-representativeness, MrP can improve
the accuracy of sub-national estimates, regardless of sample size. Further, we also demonstrate
that downstream inferences about the relationship between county-level estimates and a secondary
county-level outcome can vary substantively depending on the method of estimation. Finally, we
also provide all of the code and data used for these studies at https://osf.io/8javp/ so that readers
can more easily apply these methods or use our estimates in their own research.
2.2 Sub-national Estimation
2.2.1 Overview
Sub-national estimation of a variable involves obtaining estimates of population parameters,
such as means or medians, for sub-national areas that fall below the national level, such as states,
provinces, counties, or districts. For example, estimating state-level means for Extroversion,
explicit racial bias, or well-being are all problems of sub-national estimation. Sub-national
estimation is neither inherently difficult nor complicated. As is the case with many
problems of estimation, access to sufficient data renders the problem trivial. For instance,
estimating U.S. state-level explicit racial bias would be simple if one had a sufficiently large random
sample of racial bias measurements drawn from each state. With such data, sub-national
estimates of explicit racial bias could simply be obtained by calculating the sample mean for
each state.
Unfortunately, researchers rarely have access to such data due to the cost and difficulty of
collecting sufficiently large random samples from multiple sub-national areas. Accordingly, various
methods are used in order to facilitate the derivation of sub-national estimates from less-than-
ideal data. In the psychological literature, the methods most frequently used for sub-national
estimation are disaggregation (Erikson et al., 1993) and post-stratification (Gelman and Little,
1997, Lohr, 2009). Below, we review these approaches to sub-national estimation and discuss
two other approaches that are less well-known to psychological researchers: raking (Deville et al.,
1993, Kalton and Flores-Cervantes, 2003) and MrP (Park et al., 2004b).
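Raking is discussed in its own section below; as a preview, its core mechanism, iterative proportional fitting, alternately rescales sample weights to match each known population margin. The following is a minimal sketch on an assumed 2x2 (sex by age) table; the counts and margins are hypothetical:

```python
import numpy as np

# Hypothetical sample counts: rows = sex (F, M), columns = age (18-39, 40+).
sample = np.array([[30.0, 10.0],
                   [40.0, 20.0]])

# Known population margins (assumed here; taken from census tables in practice).
row_targets = np.array([55.0, 45.0])   # F, M
col_targets = np.array([60.0, 40.0])   # 18-39, 40+

# Iterative proportional fitting: alternately scale rows and then columns
# until the weighted table reproduces both sets of margins.
weights = sample.copy()
for _ in range(100):
    weights *= (row_targets / weights.sum(axis=1))[:, None]
    weights *= col_targets / weights.sum(axis=0)
```

Unlike post-stratification, raking needs only the marginal population distributions, not the full joint distribution over all cross-classifications.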
2.2.2 Disaggregation
As noted above, sub-national estimation of a variable, such as explicit racial bias, is trivial when
a random sample of the variable is available for each sub-national unit. With such data, population
estimates of the target variable's sub-national means can simply be estimated via the sub-national
sample means, a procedure often referred to as "disaggregation". Further, while such data is
rarely directly available, it can, in some cases, be approximated by combining data from multiple
nationally representative surveys into a single data-set and then segmenting or "disaggregating"
the data into the desired level of analysis (Erikson et al., 1993). Population estimates of the target
variable's sub-national means can then be simply estimated via the disaggregated sample means.
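Concretely, disaggregation reduces to a group-by mean. A minimal sketch, using purely hypothetical respondent data:

```python
import pandas as pd

# Hypothetical pooled survey data: one row per respondent, with the
# respondent's sub-national unit ("state") and the target variable ("bias").
df = pd.DataFrame({
    "state": ["CA", "CA", "CA", "TX", "TX", "NY"],
    "bias":  [0.2,  0.4,  0.3,  0.6,  0.8,  0.1],
})

# Disaggregation: the sub-national estimate is simply the sample mean
# within each unit; no adjustment for response bias is made.
disaggregated = df.groupby("state")["bias"].mean()
```

Because nothing corrects for who happened to respond, these estimates inherit whatever response biases the sample contains, which motivates the adjustment methods discussed below.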
This approach hinges on the premise that combining multiple random and nationally
representative samples will eventually produce a super-sample that is sufficiently representative at the
targeted sub-national level. However, while it is asymptotically valid, in many instances it is
not a viable option. While it may be possible to construct a sufficient super-sample for a small
set of constructs for which data is frequently collected, this is often not the case for constructs
excluded from that set, such as personality inventories and measures of explicit and implicit
attitudes. Further, depending on the level of geographic analysis, it may not be possible to assemble
a super-sample for even the most widely collected variables. Consider, for example, that moving
from a U.S. state-level analysis to a county-level analysis increases the number of spatial units by
a factor of approximately 60; thus, data deemed sufficient for a disaggregated state-level analysis
would need to be expanded by roughly the same factor to provide comparable coverage for a
county-level analysis.
An even more pressing problem for disaggregation is its inability to address response biases
and failures in randomization. If certain segments of the target population over- or under-respond,
disaggregated estimates will be biased (Holt and Smith, 1979, Lax and Phillips, 2009, Little, 1993)
even if they are derived from an infinite sample (Pew, 2018).
2.2.3 Post-stratification
To address issues of response bias or non-representativeness, researchers employ a range of
techniques that aim to adjust a sample so that it reflects known population characteristics. For
instance, the proportion of people in a sample who fall in a certain age bracket, report a given sex,
or perhaps are characterized by some combination of these variables may not match the
population proportions for these demographic characteristics. One way to account for this mismatch
between sample demographic proportions and population demographic proportions is to calculate
sample weights that can be used to weight respondents so that the weighted sample demographic
proportions match the population demographic proportions.
One approach to calculating sample weights is "post-stratification" (Gelman and Little, 1997,
Lohr, 2009), which, in the psychological literature, has most frequently been used to adjust for
age and gender (for example, see Leemann and Wasserfallen, 2017, Leitner et al., 2016a, Obschonka
et al., 2016, Orchard and Price, 2017). Post-stratification is generally implemented as follows.
The first step is to select a set of demographic variables, often referred to as auxiliary variables,
for which adjustments will be made. Generally, auxiliary variables should be selected depending
on whether the target variable (i.e., the variable for which sub-national estimation is conducted)
varies over their levels. For instance, age and sex might be selected as auxiliary variables for the
sub-national estimation of well-being. Conceptually, these auxiliary variables are used to "post-
stratify" sample respondents into a set of demographic categories or cross-classifications. That is,
the auxiliary variables age and sex can be used to post-stratify sample respondents into discrete
demographic bins that each represent a unique combination of age and sex. By convention, we
refer to these as demographic cross-classifications or "post-strata".
Finally, the population estimate of a target variable, such as well-being, within a given sub-
national area can be estimated as the weighted mean of the post-strata sample means $\bar{y}_{u[l],j}$,
where the weights reflect the demographic population proportions corresponding to the post-
strata within the sub-national unit. Here, $\bar{y}_{u[l],j}$ refers to the post-strata sample mean for
post-stratum $j$ located within sub-national area $l$ of upper-level area $u$. For instance, under this
notation convention, $j$ might refer to the post-stratum combination of age and gender and $u[l]$
might index counties nested in states.

Note that under this approach, the post-stratified mean for a given sub-national unit is a
function of the post-strata sample means within that sub-national unit. That is, the post-stratified
sub-national estimate for a given sub-national area is based exclusively on the data sampled from
that sub-national area. Accordingly, this approach minimally requires $n_{u[l],j} > 0$, where $n_{u[l],j}$
represents the sample size $n$ for post-stratum $j$ located in sub-national area $l$ in upper-level area
$u$. Accordingly, $n_{u[l],j} > 0$ simply states that there must be at least one sample respondent for
each sample post-stratum within each sub-national area. However, it is generally preferable to have
larger sample sizes, such as $n_{u[l],j} \geq 50$, in order to minimize the effects of sampling error.
To summarize, sub-national estimates of a target variable $Y_{u[l]}$ can be obtained via post-stratification by selecting a set of auxiliary variables; calculating the means $\bar{y}_{u[l],j}$ of the post-strata $j$ within each sub-national area $u[l]$; and finally calculating the weighted mean of the $\bar{y}_{u[l],j}$, where the weights $p_{u[l],j}$ represent the population proportion of post-stratum $j$ in sub-national area $u[l]$:

$$Y_{u[l]} = \sum_{j=1}^{J} p_{u[l],j}\,\bar{y}_{u[l],j}, \quad (2.1)$$

for each post-stratum $j = 1, \dots, J$.
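As a concrete illustration, Equation 2.1 can be computed directly once the post-stratum sample means and population proportions are in hand. The sketch below uses a hypothetical age x sex post-stratification for a single sub-national area; all cell labels and numbers are invented for illustration and are not drawn from any data analyzed here.

```python
# Sketch of Equation 2.1: the post-stratified estimate for one sub-national
# area is the population-weighted mean of the post-stratum sample means.
# The age x sex cells and all numbers are hypothetical.

# Post-stratum sample means, ybar_{u[l],j}
cell_means = {
    ("18-29", "F"): 6.8,
    ("18-29", "M"): 6.5,
    ("30+",   "F"): 7.2,
    ("30+",   "M"): 7.0,
}

# Census population proportions, p_{u[l],j}; these must sum to 1
cell_props = {
    ("18-29", "F"): 0.10,
    ("18-29", "M"): 0.10,
    ("30+",   "F"): 0.42,
    ("30+",   "M"): 0.38,
}

# Y_{u[l]} = sum_j p_{u[l],j} * ybar_{u[l],j}
estimate = sum(p * cell_means[j] for j, p in cell_props.items())
```

Because older respondents (who report higher values in this toy example) make up most of the population, the post-stratified estimate is pulled toward their cell means rather than toward the raw sample composition.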
Importantly, post-stratification adjustment procedures vary substantially in complexity. For instance, it is often desirable to select multiple auxiliary variables, with age, gender, race, and education being the most commonly used set. However, adding auxiliary variables can dramatically increase the number of demographic cross-classifications, particularly considering that they are crossed with sub-national units. For example, post-stratifying on 3-level age and education and 2-level gender would produce 18 demographic cross-classifications, which themselves are nested in sub-national units. For a state- or county-level analysis, this approach would yield approximately $18 \times 50 = 900$ or $18 \times 3{,}007 = 54{,}126$ distinct participant cross-classifications, and adhering to an $n \geq 50$ rule would require sample sizes of approximately 45,000 or 2.7 million, respectively.
To mitigate such exploding sample size requirements, post-stratification can be reformulated so that estimates of the post-stratum means are pooled across sub-national units. That is, rather than estimating the mean for each post-stratum within each sub-national unit (the no-pooling approach), the post-stratum means can be estimated across all sub-national units (Gelman and Little, 1997). However, while un-pooled post-stratification risks high standard errors and inflated between-unit variation, pooled post-stratification risks homogeneity and suppressed between-unit variation.
2.2.4 Raking
In addition to issues of sparsity within post-strata cells, another challenge that often complicates post-stratification is the difficulty of obtaining population estimates for the cross-classification of the auxiliary variables. In response to this issue, methods such as raking (Deville et al., 1993; Kalton and Flores-Cervantes, 2003) are often substituted for post-stratification. Whereas post-stratification operates on the joint distribution of the auxiliary variables, raking operates on their marginal distributions: sample weights are derived by iteratively adjusting the marginal distributions of the auxiliary sample variables to match the population marginal distributions. For example, raking over age and education at the state level would involve weighting respondents within each state so that the weighted distribution of their ages matches the known marginal state-level distribution of age. The same procedure would then be applied to education and, if this re-weighting disturbs the age alignment, it would be reapplied to age. This iterative re-weighting process would be repeated until the marginal distributions of the auxiliary variables match their known population marginal distributions within some a priori range of error.
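The iterative re-weighting described above is known as iterative proportional fitting. The minimal sketch below rakes a hypothetical 2 x 2 table of sample counts toward invented population marginals; in practice this would be done with established software (e.g., the `survey` R package used later in this chapter) rather than hand-rolled code.

```python
# A minimal raking (iterative proportional fitting) sketch over two
# auxiliary variables. The sample cell counts and population marginal
# targets are hypothetical.

sample_n = {("young", "hs"): 30, ("young", "col"): 50,
            ("old",   "hs"): 10, ("old",   "col"): 10}   # n = 100
target_age = {"young": 0.5, "old": 0.5}   # population marginals
target_edu = {"hs": 0.4, "col": 0.6}

n = sum(sample_n.values())
weights = {cell: 1.0 for cell in sample_n}  # start unweighted

for _ in range(100):
    # Step 1: rescale weights so the weighted age marginal matches its target
    for age, p in target_age.items():
        current = sum(weights[c] * sample_n[c] for c in sample_n if c[0] == age)
        for c in sample_n:
            if c[0] == age:
                weights[c] *= p * n / current
    # Step 2: do the same for education; this can disturb the age marginal,
    # which is why the two steps are repeated until both targets are met
    for edu, p in target_edu.items():
        current = sum(weights[c] * sample_n[c] for c in sample_n if c[1] == edu)
        for c in sample_n:
            if c[1] == edu:
                weights[c] *= p * n / current

# Weighted marginals now (approximately) match the population targets
young_share = sum(weights[c] * sample_n[c] for c in sample_n if c[0] == "young") / n
hs_share = sum(weights[c] * sample_n[c] for c in sample_n if c[1] == "hs") / n
```

Note that only the marginal totals are constrained; the weighted joint distribution of the cells is whatever the iteration converges to, which is exactly the information loss discussed next.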
Compared to post-stratification, raking can considerably expand the pool of viable auxiliary variables. However, the fine-grained information encoded by the full joint distribution is lost, and this can negatively impact estimates. To mitigate this loss of information, raking can also be conducted over some subset of the cross-classifications of the auxiliary variables. However, as with post-stratification, this introduces additional data requirements: the distribution for the chosen cross-classifications must be known and sufficient sample data must be available for each cross-classification category; otherwise, estimates may be wildly inaccurate (Gelman, 2007).
2.2.5 Multilevel Regression and Post-stratication
While post-stratification and raking remain viable approaches to survey weighting, a more recently developed method, multilevel regression and post-stratification (MrP; Gelman and Little, 1997; Park et al., 2004b), has become increasingly popular and is now considered the gold standard for estimating sub-national political preferences (Leemann and Wasserfallen, 2017; Selb and Munzert, 2011). For example, it has been used in sub-national studies on legislative responsiveness to constituent opinion (Kastellec et al., 2015; Krimmel et al., 2016), regional variations in environmental opinions (Fowler, 2016; Howe et al., 2015), and the relationship between income and political preferences (Gelman, 2014). In contrast to conventional post-stratification, in which sample weights are applied directly to the sample means for each post-stratum, MrP involves applying sample weights to estimates of post-stratum means derived from a hierarchical model fit to individual-level data (Lax and Phillips, 2009; Park et al., 2004b). Sub-national means are then estimated as the population-weighted mean of these predicted post-stratum means.
The primary advantage conferred by MrP arises from how post-stratum means for a given outcome are estimated. First, individual-level responses are modeled as a hierarchical, or multilevel, function of demographic auxiliary variables, sub-national geographic indicators, and contextual factors (Lax and Phillips, 2009; Park et al., 2004b). For example, an individual $i$'s response $y_i$ on a measure of explicit racial bias could be estimated as a function of their age, level of education, sub-national unit (SNU; e.g., county), and contextual factors $X$ associated with their sub-national unit (e.g., county-level Democratic vote proportion, median income, proportion of the population living in poverty, etc.):
$$y_i = \beta_0 + \alpha^{\text{age}}_{a[i]} + \alpha^{\text{edu}}_{e[i]} + \alpha^{\text{SNU}}_{c[i]}$$
$$\alpha^{\text{age}}_{a} \sim N(0, \sigma^2_{\text{age}}), \quad \text{for } a = 1, \dots, A$$
$$\alpha^{\text{edu}}_{e} \sim N(0, \sigma^2_{\text{edu}}), \quad \text{for } e = 1, \dots, E$$
$$\alpha^{\text{SNU}}_{\text{SNU}} \sim N(\alpha^{\text{ULU}}_{\text{ULU}[\text{SNU}]} + X_{\text{SNU}}\gamma, \sigma^2_{\text{SNU}}), \quad \text{for SNU} = 1, \dots, \text{SNU}$$
$$\alpha^{\text{ULU}}_{\text{ULU}} \sim N(0, \sigma^2_{\text{ULU}}), \quad \text{for ULU} = 1, \dots, \text{ULU} \quad (2.2)$$
In the above model, the effects of the auxiliary variables age, $\alpha^{\text{age}}_{a[i]}$, and education, $\alpha^{\text{edu}}_{e[i]}$, are modeled as random effects (Gelman and Hill, 2006; Raudenbush and Bryk, 2002; Steenbergen and Jones, 2002), such that they are assumed to be generated from a normal distribution with mean $\mu = 0$ and variance $\sigma^2$. Further, the effect of sub-national unit, $\alpha^{\text{SNU}}_{\text{SNU}}$, is modeled as a normally distributed random effect, conditional on SNU-level contextual factors and the upper-level unit that it is nested in. Finally, the effect of upper-level unit, $\alpha^{\text{ULU}}_{\text{ULU}}$, is modeled as an unconditional random effect.
After modeling individual-level measurements of the target construct, the next step in MrP is to use the trained model to generate predictions $\hat{\theta}_{ae\text{SNU}}$ for each cross-classification of the auxiliary variables age and education, conditional on sub-national location and contextual factors. Thus, in the case of this example, for each sub-national unit, a prediction is made for each combination of the levels of age and education. That is, the model is used to estimate the average response for a person of age $a$ with level of education $e$ who lives in sub-national unit SNU, for all combinations of $a = 1, \dots, A$ and $e = 1, \dots, E$ within each of the sub-national units.
Finally, post-stratification proceeds similarly to the approach discussed above: sub-national means $\hat{Y}^{\text{MrP}}_{\text{SNU}}$ are estimated by summing over the products of the predicted means and population proportions for each cross-classification of age and education:

$$\hat{Y}^{\text{MrP}}_{\text{SNU}} = \sum_{a}^{A} \sum_{e}^{E} P_{ae\text{SNU}}\,\hat{\theta}_{ae\text{SNU}}, \quad (2.3)$$

where $P_{ae\text{SNU}}$ is the proportion of people in sub-national unit SNU of age $a$ with education $e$, and $\hat{\theta}_{ae\text{SNU}}$ is the predicted outcome for the same cross-classified group.
Altogether, this approach helps address issues driven by data sparsity with regard to both sub-national units and post-strata cells. Even if there are no observations for a particular county, its effect can still be estimated as a linear combination of demographic effects, its contextual variable scores, and the effects for the other counties in its state. Further, if a given post-stratum cell contains few observations (e.g., if the data happen to contain few measurements for women who are over 65 years of age and have not attended college), hierarchical smoothing helps stabilize the estimates for this post-stratum cell. This robustness to sparsity makes it possible to include more relevant auxiliary variables, which can further improve estimates (Gelman, 2007).
Another notable benefit of MrP is that it is easy to expand the predictive model to exploit known or expected effects. For example, interactions between auxiliary variables can be estimated, and/or the effects of auxiliary variables can be permitted to vary across spatial units, such as regions, which would constitute a so-called random slopes model (Raudenbush and Bryk, 2002).
However, while MrP offers notable advantages, evaluations of its performance highlight that designing a MrP model requires careful thought (Buttice and Highton, 2013; Hanretty et al., 2016; Lax and Phillips, 2009), as its capacity to capture regional variation in an outcome depends on the variance that the predictive model explains. Both Buttice and Highton (2013) and Hanretty et al. (2016) emphasize that well-chosen contextual variables are essential for MrP estimation, finding that MrP models with poor or unrelated contextual variables may not perform well. Further, it should be noted that the variables traditionally used in MrP models, such as Presidential vote share, may not be as predictive when modeling psychological outcomes. Accordingly, it is important that contextual variables are not chosen based on convention, but rather for their association with the target variable.
Similarly, post-stratification variables should not be chosen arbitrarily, but rather with attention to the goal of explaining as much variance in the outcome as possible. That said, it is worth noting that we are not aware of any evidence suggesting that a weak MrP model will yield worse estimates than disaggregation or post-stratification.
2.2.6 Multilevel Regression and Synthetic Post-stratication
While demographic post-stratification variables should be chosen to maximize explained variance in the outcome, there is a strong constraint on whether a variable is eligible to be chosen: the joint distribution of the post-stratification variables' cross-classifications must be known. This has been a major obstacle for MrP estimation, because sub-national joint distributions are not available for many combinations of variables (Leemann and Wasserfallen, 2017).
To address this issue, Leemann and Wasserfallen (2017) recently introduced a procedure for conducting MrP with an estimated, or as they call it, synthetic, post-stratification joint distribution, which they derive from marginal distributions. They provide evidence that their method, multilevel regression and synthetic post-stratification (MrsP; Leemann and Wasserfallen, 2017), performs better than raking, and comparably to or better than MrP, depending on the predictive value of the added auxiliary variable (see Appendix A or Leemann and Wasserfallen (2017) for a detailed discussion of simple and adjusted MrsP).
To generate synthetic post-stratification joint distributions, Leemann and Wasserfallen (2017) propose two approaches, which they refer to as 'simple MrsP' and 'MrsP with adjusted synthetic joint distributions'. Under simple MrsP, synthetic joint distributions are calculated merely as the product of the post-stratification variables' marginal distributions. For example, if only the county-level marginal distributions of age and education are known, their county-level simple synthetic joint distribution would be estimated as the product of their county-level marginal distributions. Importantly, the same approach can be used to extend a known demographic joint distribution to include other demographic variables for which only the marginal distribution is known. For example, if both the county-level joint distribution of age and gender and the county-level marginal distribution of education are known, their synthetic joint distribution could be estimated as the product of the joint and marginal distributions.
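The simple MrsP construction reduces to a single outer product. The sketch below builds a synthetic age x education joint distribution from two hypothetical county-level marginals; the labels and proportions are invented for illustration.

```python
# Simple MrsP sketch: a synthetic joint distribution built as the product of
# marginal distributions. This is exact only if the variables are
# independent; the county-level marginals here are hypothetical.
age_marginal = {"18-29": 0.2, "30-44": 0.3, "45+": 0.5}
edu_marginal = {"hs": 0.4, "college": 0.6}

synthetic_joint = {(a, e): p_a * p_e
                   for a, p_a in age_marginal.items()
                   for e, p_e in edu_marginal.items()}

# The synthetic cells form a proper distribution (they sum to 1), but the
# individual cell values will be wrong to the degree that age and education
# are correlated in the true population.
total = sum(synthetic_joint.values())
```

The same multiplication extends a known joint distribution: replacing `age_marginal` with a joint distribution over, say, age x gender cells yields the age x gender x education synthetic joint described above.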
However, a notable shortcoming of simple MrsP is that the estimated joint distribution will only be correct when the auxiliary variables are independent. As they diverge from independence, the synthetic joint distribution becomes less accurate (Leemann and Wasserfallen, 2017). Accordingly, while Leemann and Wasserfallen (2017) find that errors in the synthetic joint distribution do not necessarily induce errors in post-stratified, sub-national estimates, they also propose a procedure for adjusting synthetic joint distributions. The goal of this adjustment procedure, or 'adjusted MrsP', is to encode any available knowledge about the true joint distribution in the synthetic joint distribution. That is, rather than simply estimating the synthetic joint distribution of age, gender, and education as the product of the joint distribution of age x gender and the marginal distribution of education, adjusted MrsP would involve using external data (e.g., from a nationally representative survey) to adjust the simple synthetic joint distribution to reflect known correlations between age, gender, and education (see Appendix A or Leemann and Wasserfallen (2017) for a more detailed discussion of simple and adjusted MrsP).
Regardless of whether simple or adjusted synthetic joint distributions are used, after calculating the synthetic joint distribution, MrsP follows the same procedure as MrP. That is, a hierarchical model predicting individual-level responses is estimated, this model is used to generate predictions for each post-stratum cell, and these predictions are weighted by their corresponding population weights. The only difference is that the population weights represent a synthetic joint distribution.
2.3 Study 1
Previous evaluations of MrP have found that it offers diminishing returns, compared to disaggregation, as sample sizes increase. For instance, Lax and Phillips (2009) found that disaggregation and MrP performed comparably with a sample size of approximately 14,000 in a state-level analysis with 49 states. However, this convergence in performance is contingent on the sample being representative because, as sample size increases, a representative sample systematically approaches the population.

In contrast, when a sample is not representative (e.g., due to response biases), increasing the sample size does not necessarily yield a more precise approximation of the population. For example, if a particular population segment is not represented in a sample, estimates for sub-national units populated by that segment may be biased, and the degree of that bias will partly depend on the population proportion of that segment. Under such conditions, the performance of disaggregation and MrP will still move toward convergence as sample sizes increase as, after all, an exhaustive sample would be equivalent to the population. However, this convergence will be inhibited by the degree to which the sample departs from representativeness.
To better understand this process, in this study we investigate the results of Monte Carlo simulations (Mooney, 1997) in which disaggregation and MrP estimates are compared under different conditions of sample size and response bias. Here, our focus is primarily on the performance of MrP as both sample size and bias increase. Specifically, we focus on simulated sample sizes of 1,000, 10,000, 50,000, and 100,000 that are drawn from a sub-national area containing 400 regions. These sample sizes were selected so that we could evaluate the relative performance of MrP as a function of sample size, where sample size ranges from what would be considered small for sub-national estimation to a size that would be considered relatively large. Notably, some recent sub-national investigations of psychological phenomena have used samples an order of magnitude larger than our largest simulated sample size. However, these investigations were conducted at the county level and thus focused on a sub-national area containing more than 3,000 sub-national units. To roughly approximate the ratio of these orders of magnitude while also maintaining computational feasibility, we selected 100,000 as our largest simulated sample size. Finally, in this simulation, we include disaggregation estimates as a performance baseline, but do not include other estimation methods, such as disaggregation combined with post-stratification or raked weights. Instead, we report a more comprehensive evaluation of relative performance on real-world data in Study 2.
2.3.1 Method
Our Monte Carlo simulation is structured as follows. First, a population of respondents is generated over a grid of 400 sub-national units. Specifically, over the grid of units, marginal distributions of demographic characteristics are sampled for three demographic variables: one two-level variable and two three-level variables. Then, constrained by these marginal distributions, demographic characteristics are assigned to simulated respondents within each sub-national unit. Thus, each respondent is associated with a specific level of each of the three demographic variables.

Then, given the generated population, a set of population weights are randomly drawn. These weights consist of linear effects (i.e., model parameters) for the demographic and contextual factors, a random sub-national unit effect, and individual-level error. These population weights are used to generate values for the response variable $Y_{l[i]}$.
Next, samples of sizes S = [1,000; 10,000; 50,000; 100,000] are drawn from the population for each of three degrees of response bias. To simulate response bias, respondents are drawn with specified probabilities for each level of one of the three-level demographic variables. Specifically, three different degrees of bias are examined: p = (1/3, 1/3, 1/3), p = (0.4, 0.35, 0.25), and p = (0.70, 0.2, 0.1). Further, to simulate random residual response bias at the sub-national unit level, response probabilities are randomly assigned to each sub-national unit, which makes responses from some units more likely than others. Finally, given a drawn sample, disaggregation and MrP estimates of unit means are obtained, and root mean squared error and bias are calculated.
For this study, three populations ($N$ = 10,000,000 each) were simulated. Then, for each population, 50 sets of population weights were sampled. Finally, for each combination of the four sample sizes and three levels of response bias, 100 samples were drawn. This yielded a total of 180,000 iterations, or 15,000 iterations for each combination of sample size and response bias. These settings were chosen in order to minimize uncertainty while also maintaining reasonable computational cost. For a detailed description of the data-generating process, please see Study 1 Data Generating Process in Appendix B.
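The core of the response-bias mechanism can be illustrated with a toy simulation: when inclusion probabilities differ across levels of a demographic variable that is related to the outcome, the raw sample mean stays biased no matter how large the sample grows. The numbers below are illustrative only and do not reproduce the study's actual data-generating process.

```python
# Toy version of the response-bias mechanism: respondents are drawn with
# unequal probabilities across the levels of a three-level demographic
# variable, so the unweighted sample mean is biased. All numbers are
# illustrative, not the study's actual data-generating process.
import random

random.seed(42)

# Population: a three-level demographic variable and an outcome that
# increases with its level (population mean of the outcome is about 1.0)
population = [(level, level + random.gauss(0, 1))
              for level in (0, 1, 2) for _ in range(10_000)]

# High-bias condition: levels drawn with p = (0.70, 0.20, 0.10)
draw_p = {0: 0.70, 1: 0.20, 2: 0.10}
sample = [unit for unit in population if random.random() < draw_p[unit[0]]]

pop_mean = sum(y for _, y in population) / len(population)
samp_mean = sum(y for _, y in sample) / len(sample)
# The sample over-represents level 0, so samp_mean underestimates pop_mean
```

Drawing a larger sample under the same inclusion probabilities shrinks the sampling noise around `samp_mean` but not the gap between it and `pop_mean`, which is the convergence failure the simulation is designed to expose.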
2.3.2 Results
For each sample drawn within our simulation framework, disaggregation and MrP estimates were obtained and used to calculate root mean square error (RMSE) and bias. To evaluate the performance of these methods as a function of sample size and response bias, we estimated the mean RMSE and bias for each method within each combination of sample size and response bias across samples, population weights, and populations.

As expected, per previous findings, under conditions of low response bias, the performance of disaggregation quickly converges with that of MrP (see Table 2.1 and Figure 2.1). For example, with samples of 1,000, disaggregation's expected RMSE ($\bar{X} = 0.59$, $\hat{\sigma} = 0.04$) was 0.22 higher than MrP's ($\bar{X} = 0.37$, $\hat{\sigma} = 0.04$). However, increasing the sample size to 10,000 considerably reduced this gap, such that disaggregation's expected RMSE ($\bar{X} = 0.18$, $\hat{\sigma} = 0.01$) was 0.05 higher than MrP's ($\bar{X} = 0.13$, $\hat{\sigma} = 0.01$). Further, with samples of 100,000, disaggregation's expected RMSE ($\bar{X} = 0.06$, $\hat{\sigma} < 0.01$) was only about 0.015 higher than MrP's ($\bar{X} = 0.04$, $\hat{\sigma} < 0.01$).

Figure 2.1: RMSE (A) and mean bias (B) as a function of sample size and bias. Error bars represent the 2.5th and 97.5th percentiles of the simulation distribution.
However, while the convergence of disaggregation's and MrP's expected RMSE decreases only slightly under medium response bias, under high response bias the convergence is attenuated considerably. Further, our simulations suggest that as response bias increases, disaggregation's RMSE becomes more variable. Notably, this increase in variance is not observed for MrP.

Regarding the estimation bias of disaggregation and MrP, neither method showed strong mean bias under any conditions (see Table 2.1 and Figure 2.1). However, the variances of their estimates of bias demonstrated starkly different patterns. Specifically, while, in the low-bias condition, the variance of both disaggregation's and MrP's bias shrunk toward zero as sample size increased, this remained true only for MrP as response bias was introduced. That is, as expected, under conditions of response bias, increasing the sample size had virtually no effect on the estimation bias of disaggregation.
Table 2.1: Mean RMSE and Bias by Sample Size and Bias

Sample Size  Bias      Mean RMSE                        Mean Bias
                       Disaggregation  MrP              Disaggregation  MrP
1,000        Low       0.594 (0.04)    0.368 (0.04)     -0.001 (0.03)   0 (0.02)
1,000        Medium    0.595 (0.04)    0.369 (0.04)      0.009 (0.04)   0.001 (0.02)
1,000        High      0.610 (0.04)    0.380 (0.04)      0.022 (0.11)   0 (0.03)
10,000       Low       0.184 (0.01)    0.129 (0.01)      0 (<0.01)      0 (<0.01)
10,000       Medium    0.190 (0.02)    0.130 (0.01)      0.011 (0.04)   0 (<0.01)
10,000       High      0.245 (0.05)    0.140 (0.02)      0.031 (0.13)   0 (<0.01)
50,000       Low       0.080 (<0.01)   0.058 (<0.01)     0 (<0.01)      0 (<0.01)
50,000       Medium    0.091 (0.01)    0.058 (<0.01)     0.011 (0.04)   0 (<0.01)
50,000       High      0.160 (0.06)    0.062 (<0.01)     0.031 (0.13)   0 (<0.01)
100,000      Low       0.056 (<0.01)   0.041 (<0.01)     0 (<0.01)      0 (<0.01)
100,000      Medium    0.071 (0.02)    0.041 (<0.01)     0.011 (0.04)   0 (<0.01)
100,000      High      0.146 (0.07)    0.044 (<0.01)     0.031 (0.13)   0 (<0.01)
2.3.3 Discussion
Previous evaluations of MrP suggested that the performance of disaggregation and MrP converges as sample size increases. However, these evaluations were conducted with representative samples. In this study, we show that this convergence is inhibited as response biases are introduced into the sampling mechanism. Specifically, we found that under conditions of response bias, MrP considerably outperformed disaggregation in terms of error and bias, regardless of sample size.

However, it is important to interpret MrP's superior performance in these simulations in context. In any situation, MrP's performance will depend on the association between the outcome and the variables selected for post-stratification, the contextual factor(s), and the modeled residual hierarchical variance (Buttice and Highton, 2013). Accordingly, the degree to which MrP outperforms disaggregation is dependent on the strength and comprehensiveness of the MrP model. If a MrP analysis does not include a strong contextual factor, or it post-stratifies on demographic variables that explain very little variance in the outcome, the estimates generated by the analysis will be worse than those generated by a stronger MrP analysis. This means that a weak MrP analysis may not substantively outperform disaggregation, as performance depends on the quality of the MrP model. We emphasize this not to suggest that disaggregation is a viable alternative to MrP, but rather to highlight the importance of building a strong MrP model.
However, this raises the question of how MrP will perform when applied to real-world psychological data. Further, while the current study provides evidence that MrP performs well under response bias, it does not address MrP's performance relative to alternative approaches to survey adjustment, such as post-stratification and raking. We address these issues in the next study by comparing real-world ground truth to county-level estimates obtained via application of MrP, MrsP, disaggregation, and raking to real-world data.
2.4 Study 2
In the current study, we use disaggregation, post-stratification, post-stratification with raking, MrP, and MrsP to obtain county-level estimates of Catholic adherence from data collected by Project Implicit (Xu et al., 2013). We then compare these estimates to ground truth obtained from the 2010 U.S. Religious Census (Grammich, 2012).
While we could have selected data from other sources for this study, we chose to focus on Project Implicit data (specifically, their Public IAT Racial Bias data) for two reasons. First, it exemplifies the kind of large-scale data that can be collected via online, opt-in collection strategies. Over 17 years of operation, Project Implicit has collected millions of responses, and it offers an unprecedented opportunity to examine sub-national variations in racial bias and associations between racial bias and secondary outcomes. Second, because of these characteristics, these data have been increasingly used to estimate sub-national racial bias, and we expect that such applications will only become more popular in the future. As such, we believe that it is particularly important to evaluate sub-national estimates based on these data using a variety of procedures.
To this end, we evaluate estimates obtained from five different estimation procedures: (1) disaggregation; (2) disaggregation with weights obtained via raking; (3) disaggregation with weights obtained via post-stratification and raking; (4) MrP; and (5) MrsP. Specifically, these methods are used to estimate the county-level rate of Catholicism. We then evaluate estimate accuracy and bias via RMSE and mean average bias (MAB).
2.4.1 Data
2.4.1.1 Project Implicit Data
The primary data used for this study were responses to an item measuring religious affiliation, which was administered to participants in Project Implicit's (Xu et al., 2013) racial bias IAT survey from 2002-2013. This item, along with items measuring participants' age, level of education, sex, race, and county, was obtained from the 2002-2018 Public Racial Bias IAT Open Science Foundation repository (https://osf.io/yn2g7/).
2.4.1.2 2010 U.S. Religious Census
Ground-truth estimates of Catholic adherents for 3,105 counties located in the contiguous U.S. were obtained from the county-level 2010 U.S. Religious Census (Grammich et al., 2010) data, which was downloaded from the Association of Religion Data Archives (http://www.thearda.com/Archive/).
2.4.1.3 2011-2015 American Community Survey 5-year estimates
To estimate county-level joint distributions, we rely on U.S. Census data obtained from the 2011-2015 American Community Survey (ACS) 5-year estimates and 2010 decennial U.S. Census data. We also use these data to estimate the county-level proportions of Blacks, Latinos, people below the poverty line, and the population living in an urban area, as well as population density. Census data were accessed using the `tidycensus' (Walker, 2019) and `acs' R packages.
2.4.1.4 MIT Election Data
To estimate county-level 2016 Democratic vote proportion, we use data obtained from the
MIT Election Data and Science Lab (MIT Election Data and Science Lab, 2018).
25
2.4.1.5 Geographic Data
County names, Federal Information Processing Standard (FIPS) codes, and locations were
determined using data accessed via the `USAboundaries' R package (Mullen and Bratt, 2017),
which provides access to the US Census Bureau's geographic database (US Census Bureau, 2015).
2.4.2 Method
In this study, we estimate county-level Catholic adherence rates using disaggregation, post-stratification, post-stratification and raking, MrP, and MrsP. For our standard MrP estimates, age, gender, and race are selected as post-stratification variables, as these are directly available from the U.S. Census. In contrast, for our MrsP estimates, we extend the post-stratification variables to also include level of education by calculating its adjusted synthetic joint distribution with age, gender, and race (Leemann and Wasserfallen, 2017).
2.4.2.1 Primary Data preparation
Participants in the IAT Implicit Race survey who reported religious aliation and who were
matched to counties located in the contiguous U.S. (3018) were selected for analysis, N =
3,014,859. This yielded a data set with coverage of 3,088 counties (County N summary statis-
tics: Mean = 976, Median = 114, SD = 3; 440). For the selected participants, race, age, sex,
and education was coded as follows: race = [Black, Hispanic, Other, White]; age = [18-29, 30-
44, 45-64, 65+]; sex = [female, male]; education = [High school graduate or less; some college
through bachelors degree; graduate degree
2
]. We selected these demographic categories with the
goal of minimizing the risk of sparsity while maintaining as much demographic distinction as
possible. That is, selecting a more ne-grained set of races would allow for more demographic
distinction, but it would also increase sparsity as other races were far less frequent in our sample.
Finally, participants' Catholic aliation was represented as a binary indicator where `1' indicates
self-reported Catholic aliation.
2
We did not discriminate between professional and non-professional secondary degrees
26
2.4.2.2 Secondary Data preparation
The following contextual variables were selected for the MrP and MrsP models: county-level Democratic vote share for the 2016 Presidential election; the proportions of Blacks, Latinos, people below the poverty line, and the population living in an urban area; and population density. These variables were selected based on a priori expectations about their potential association with racial bias against Blacks. That is, we expected racial bias in a given area to be partially dependent on the area's Democratic vote share, the racial composition of the area, the number of people living below the poverty line, and the urban/rural status of the area. These variables were all standardized prior to inclusion in Mr(s)P models.
2.4.2.3 Disaggregation
We obtained disaggregated estimates of D score $\bar{y}_{u[l]}$ for county $l$ located in region $u$ simply by calculating the sample mean for observations from county $l$:

$$\bar{y}^{1}_{u[l]} = \frac{\sum_{i}^{N_l} y_{u[l],i}}{N_l}, \quad (2.4)$$

where $y_{u[l],i}$ represents response $i$ from county $u[l]$, $N_l$ represents the sample size for county $u[l]$, and $\bar{y}^{1}_{u[l]}$ is the estimated mean for county $u[l]$ obtained via method 1, disaggregation.
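Computationally, Equation 2.4 is a plain group-wise mean. The sketch below applies it to a handful of hypothetical binary responses (1 = self-reported Catholic affiliation); the county labels and values are invented for illustration.

```python
# Disaggregation sketch (Equation 2.4): the estimate for each county is just
# the mean of the responses observed in that county. The binary responses
# (1 = Catholic affiliation) and county labels are hypothetical.
from collections import defaultdict

responses = [("county_a", 1), ("county_a", 0), ("county_a", 1),
             ("county_b", 0), ("county_b", 0), ("county_b", 1)]

totals = defaultdict(float)
counts = defaultdict(int)
for county, y in responses:
    totals[county] += y
    counts[county] += 1

# ybar_{u[l]} = (sum_i y_{u[l],i}) / N_l for each county
estimates = {county: totals[county] / counts[county] for county in totals}
```

This baseline uses no demographic or contextual information at all, which is precisely why its accuracy degrades under the response biases examined in Study 1.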
2.4.2.4 Post-stratication and Raking
In addition to simple disaggregation, we also estimate two sets of post-stratification weights and use these to perform weighted disaggregation. The first set of weights (Rake 1) was calculated via raking across the county-level marginal distributions of age, sex, and race, using the same demographic levels as in the Mr(s)P models. To address issues of post-stratum cell sparsity, to generate the second set of weights (Rake 2), we collapsed age into two levels (below/above 30 years of age), race into three levels (White, Black, Other), and education into two levels (no college / at least some college). These demographic collapses were selected in order to minimize sparsity, which helps stabilize sample estimates, while also maximizing demographic variety. Notably, the necessity of collapsing levels, and choosing which levels to collapse, is one of the major obstacles for post-stratification and raking, as there are no established guidelines for making these choices. We then performed a second raking procedure across the joint distribution of the collapsed age and race variables and the marginal distributions of gender and education. For each set of weights, county-level estimates of Catholic adherence were then generated via weighted disaggregation.
Raking was performed for each county based on the data present in that county. Thus,
counties with insufficient demographic coverage (e.g., counties for which data were fully missing for
a demographic cell) were dropped from the analysis. Iterations were limited to 1,000, and counties
for which the raking procedure did not converge were discarded. In this context, convergence failure
is typically caused by demographic sparsity, and it indicates that stable sample weights could not be
derived. When raking is used for nation-level analyses, it is common to identify the source of the
sparsity and remove it by collapsing additional demographic variables. Unfortunately, this is often
not practical when raking is applied to sub-national estimation, as the procedure would need to be
repeated for each sub-national area for which convergence was not reached. As an alternative, we
excluded areas that did not reach convergence. Importantly, this approach could introduce bias into
the distribution of sub-national estimates, as counties for which weights do not converge could be
systematically different from counties that do. However, for sub-national estimation procedures
that involve thousands of sub-national areas, conducting boutique adjustments for each area is
simply not practical. Raking was implemented using the `survey` R package (Lumley, 2004).
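Raking itself is iterative proportional fitting: a sample cross-tabulation is alternately rescaled to match each known population margin until the margins agree. A minimal sketch in Python (rather than the `survey` R package actually used), with hypothetical counts and targets; returning None mirrors the convergence failures that led us to drop counties:

```python
def rake(table, row_targets, col_targets, max_iter=1000, tol=1e-10):
    """Iterative proportional fitting: alternately rescale the rows and
    columns of a sample cross-tabulation until its margins match known
    population totals. Returns the adjusted table, or None on failure
    (the analogue of the counties we drop for non-convergence)."""
    t = [row[:] for row in table]
    for _ in range(max_iter):
        for i, target in enumerate(row_targets):      # match row margins
            s = sum(t[i])
            if s == 0:
                return None  # empty post-stratum: weights cannot be derived
            t[i] = [x * target / s for x in t[i]]
        for j, target in enumerate(col_targets):      # match column margins
            s = sum(row[j] for row in t)
            if s == 0:
                return None
            for i in range(len(t)):
                t[i][j] *= target / s
        # columns now match exactly; converged once rows also match
        if all(abs(sum(t[i]) - r) < tol for i, r in enumerate(row_targets)):
            return t
    return None  # no convergence within max_iter

# Hypothetical sample counts (rows: two age bands, columns: two race groups)
adjusted = rake([[10, 20], [30, 40]], row_targets=[40, 60], col_targets=[50, 50])
```

A zero margin, the analogue of the demographic sparsity discussed above, makes the rescaling step undefined, which is why sparse counties fail to produce stable weights.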
2.4.2.5 MrP
To estimate county-level Catholic adherence via MrP, we modeled individual Catholic affiliation
using a hierarchical generalized linear model with a logit link, estimated using the `lme4`
R package (Bates et al., 2015). In this model, age, race, and sex were included as demographic
variables; county, state, and division were included as geographic levels; and county-level Democratic
vote share in the 2016 Presidential election, the proportions of Blacks, Latinos, people
below the poverty line, and population living in an urban area, and population density were included
as contextual factors.
These demographic variables were selected based on the information available from the U.S.
Census. Further, we included county, state, and division as geographic random effects in order to
maximize the benefits of partial pooling. In this design, the random effect for a county with few
respondents will be shifted toward the intercept for the state containing the county. Similarly,
state means are assumed to be distributed around a random region intercept. Accordingly, this
3-level structure allows the model to reflect potentially complex regional patterns across the U.S.
Finally, we selected our contextual factors based on a priori expectations about their potential
association with Catholicism. That is, we expected the prevalence of Catholicism in a given area
to be partially dependent on the area's Democratic vote share, the racial composition of the
area, the number of people living below the poverty line, the urban/rural status of the area, and
the area's density.
These variables were all standardized prior to inclusion in the Mr(s)P models. Specifically, we
estimated the following model:

$$
\begin{aligned}
P(Y_i = 1) &= \operatorname{logistic}(\eta_i) \\
\eta_i &= \beta_0 + \alpha_{division:race[i]} + \alpha_{division:sex:age[i]} + \alpha_{state[i]} + \alpha_{county[i]} \\
\alpha_{division:race} &\sim N(0,\ \sigma^2_{a}), \quad \text{for } division{:}race = 1, \ldots, DivisionRace \\
\alpha_{division:sex:age} &\sim N(0,\ \sigma^2_{e}), \quad \text{for } division{:}sex{:}age = 1, \ldots, DivisionSexAge \\
\alpha_{county} &\sim N(\alpha_{state[county]} + X_{county}\gamma,\ \sigma^2_{county}), \quad \text{for } county = 1, \ldots, County \\
\alpha_{state} &\sim N(0,\ \sigma^2_{state}), \quad \text{for } state = 1, \ldots, State
\end{aligned}
\tag{2.5}
$$
Specified using `lme4`, this model would be:

glmer(y ~ 1 + ... +
        (1 | county) +
        (1 | state) +
        (1 | division:race) +
        (1 | division:sex:age),
      family = binomial)
where `...` includes fixed effects for each contextual factor. That is, we estimated random
intercepts at both the county (N = 3,088) and state (N = 48) levels. Further, demographic effects
were estimated as random intercepts crossed with division. Specifically, a random intercept was
estimated for each level of race within each of the 9 U.S. divisions. Similarly, the interaction of
sex with age was also crossed with division. Initially, we did not cross the demographic effects
with division; however, models estimated on the full data set with this specification did not reach
convergence after many iterations. In contrast, we found that models that crossed demographic
effects with division converged relatively quickly.
This model was then used to make predictions $\hat{\theta}_{county,j}$ for each cross-classification
$j$ of race, age, and gender within each county. Finally, the post-stratification step was implemented
using the county-level population joint distribution for race, age, and gender estimated by the U.S.
Census:

$$\hat{Y}_{county} = \sum_{j} P_{county,j}\, \hat{\theta}_{county,j} \tag{2.6}$$
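Given model predictions for each demographic cell and the census share of each cell, Equation 2.6 is a population-weighted average. A sketch in Python with hypothetical cells and values:

```python
def poststratify(cell_preds, cell_shares):
    """Equation 2.6: combine the model's prediction for each demographic
    cross-classification j with that cell's census population share."""
    return sum(cell_shares[j] * cell_preds[j] for j in cell_preds)

# Hypothetical cells (race, age band, sex) with model-predicted P(Catholic)
preds = {("White", "18-29", "F"): 0.20, ("White", "18-29", "M"): 0.10,
         ("Black", "18-29", "F"): 0.30, ("Black", "18-29", "M"): 0.40}
shares = {("White", "18-29", "F"): 0.40, ("White", "18-29", "M"): 0.40,
          ("Black", "18-29", "F"): 0.10, ("Black", "18-29", "M"): 0.10}
estimate = poststratify(preds, shares)
# 0.40*0.20 + 0.40*0.10 + 0.10*0.30 + 0.10*0.40 = 0.19
```

The key property is that the weighting uses the census joint distribution, not the sample's, which is what corrects for non-representative respondents.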
2.4.2.6 MrsP
MrsP estimates were obtained following exactly the same procedure as for MrP; however, an
additional random effect for education was estimated. As in the MrP model, the random effect
for education was crossed with division.
To obtain post-stratified estimates, we calculated the adjusted synthetic joint distribution
between the county-level joint distribution of race, age, and gender and the county-level marginal
distribution of education. To inform the adjustment procedure, we relied on national-level estimates
of the full joint distribution of these variables obtained from the U.S. Census. Using the
adjusted synthetic county-level joint distribution, post-stratification was implemented following
the same procedure used for MrP.
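One way to sketch this kind of adjustment: seed with the national conditional distribution of education given each demographic cell, multiply by the county's cell distribution, and rescale so the education margin matches the county's known marginal. The sketch below (Python, hypothetical inputs) performs a single adjustment pass and is a simplification, not the exact procedure used:

```python
def synthetic_joint(county_cells, county_edu, national_cond_edu):
    """Build a synthetic county joint distribution over (cell, education):
    national P(education | cell) times the county cell distribution, then
    each education column rescaled to the county's education marginal."""
    joint = {(cell, edu): p * national_cond_edu[cell][edu]
             for cell, p in county_cells.items()
             for edu in national_cond_edu[cell]}
    for edu, target in county_edu.items():
        col = sum(v for (_, e), v in joint.items() if e == edu)
        for key in [k for k in joint if k[1] == edu]:
            joint[key] *= target / col
    return joint

# Hypothetical inputs: two demographic cells, two education levels
joint = synthetic_joint(
    county_cells={"White": 0.7, "Black": 0.3},
    county_edu={"college": 0.4, "no_college": 0.6},
    national_cond_edu={"White": {"college": 0.5, "no_college": 0.5},
                       "Black": {"college": 0.3, "no_college": 0.7}})
```

Using the national conditional rather than assuming independence lets the synthetic joint preserve, for example, a national-level association between race and education.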
2.4.3 Results
To evaluate the relative performance of each estimation method, we used the rate of Catholic
adherents reported by the 2010 U.S. Religious Census (Grammich et al., 2010) to calculate RMSE
and MAB for each set of estimates (see Table 2.2). The results indicate that both MrP (RMSE = 0.09)
and MrsP (0.09) slightly outperform the estimates obtained via disaggregation (0.12), raking
over the marginal distributions of age, sex, and race (Rake 1; 0.10), and collapsing the levels of
the demographic variables and raking over the joint distribution of age and race and the marginal
distributions of sex and education (Rake 2; 0.10). Regarding the average bias of the estimates,
all estimates were slightly negatively biased, but the MrP and MrsP estimates were slightly less
biased than the estimates obtained via disaggregation and the first raking procedure.
However, it is important to interpret these results relative to county coverage. For example,
while MrP and MrsP perform only slightly better than the other methods, they offer complete
coverage of the 3,105 counties for which ground truth was obtained. While disaggregation also
offered almost complete coverage, it showed the worst performance in terms of RMSE and MAB.
Further, the raking procedure that preserved the original coding of the demographic variables
offered coverage of only 1,468 counties. While the procedure that collapsed the demographic
variables and included the joint distribution of age and race offered better coverage, it still missed
nearly 1,000 counties. Notably, if RMSE is calculated for MrP and MrsP over only the counties
covered by the raking procedures, it drops to 0.08.
Table 2.2: Performance metrics for estimates of county-level Catholic adherence

Method          RMSE   MAB     N Counties
Disaggregation  0.12   -0.02   3,076
MrP             0.09   -0.01   3,105
MrsP            0.09   -0.01   3,105
Rake 1          0.10   -0.02   1,468
Rake 2          0.10   -0.01   2,155
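For reference, the two metrics in Table 2.2 can be computed as follows; here MAB is read as the signed mean bias (estimate minus ground truth), since the reported values are negative. A sketch in Python with hypothetical values:

```python
import math

def rmse(est, truth):
    """Root mean squared error of the estimates against ground truth."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(est, truth)) / len(est))

def mab(est, truth):
    """Signed mean bias; negative values indicate systematic underestimation."""
    return sum(e - t for e, t in zip(est, truth)) / len(est)

# Hypothetical estimates vs. ground truth for two counties
print(rmse([0.1, 0.2], [0.2, 0.2]))  # ~0.0707
print(mab([0.1, 0.2], [0.2, 0.2]))   # -0.05
```

RMSE penalizes large county-level errors quadratically, while MAB can be near zero even when individual errors are large, which is why the two metrics are reported together.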
Finally, to evaluate the overall associations between ground truth and each set of estimates,
we calculated their correlations and plotted their lines of linear fit (see Figure 2.2). Notably, MrP
(r = 0.73) and MrsP (r = 0.74) were substantially more strongly correlated with ground truth,
compared to the estimates obtained via disaggregation (r = 0.59) and both raking procedures
(Rake 1: r = 0.62; Rake 2: r = 0.65).
2.4.4 Discussion
These results suggest that even with an extremely large sample, Mr(s)P can be used to obtain
sub-national estimates that are clearly superior to methods that have been more widely used
in the psychological literature. Notably, while the Mr(s)P estimates only slightly reduced error
and bias, these improvements were achieved over the full set of counties. In contrast, the other
methods were only able to generate estimates for a subset of counties. Further, estimates obtained
via Mr(s)P also showed substantially stronger correlations with ground truth.

Figure 2.2: Observed (y-axis) vs. estimated (x-axis) county-level % Catholic for each estimation
method. Points represent counties. The correlation coefficient for predicted vs. observed values is
shown in the top right of each panel.
That said, it should be noted that we only compared Mr(s)P estimates to estimates derived
from two different raking/post-stratification procedures. Thus, it is certainly possible that, out
of the universe of possible raking configurations, a better-performing configuration may exist.
Nonetheless, for both configurations we sought to include as much information as possible while
also minimizing issues caused by demographic sparsity.
It is also notable that MrsP offered virtually no improvement in performance relative to MrP.
This, of course, is a function of the conditional relationship between the demographic variable
added for MrsP (education) and the outcome (Catholic affiliation). Simply put, in this case,
adding education to the MrP model did not improve its accuracy. Accordingly, in our view, depending
on the research context, researchers should still consider the possible benefits of extending
the post-stratification joint distribution.
Ultimately, these results provide evidence that MrP should be preferred for obtaining sub-national
estimates from large-scale convenience data. While other methods performed only slightly
worse in terms of error and bias, MrP offered better coverage of sub-national units and stronger
correlations with ground truth. Further, it is worth noting that, in our experience, estimating
a single MrP model over a set of sub-national units is considerably simpler, and allows far fewer
researcher degrees of freedom, than obtaining sample weights via post-stratification or raking,
because MrP does not require the arbitrary collapsing of demographic categories.
2.5 Study 3
Results from the previous study indicated that Mr(s)P offered both superior performance and
better coverage of sub-national units, relative to un-weighted and weighted disaggregation.
However, in instances where prediction accuracy is not the central focus, these results
might raise the question of whether it matters which estimation method is used. For example,
in psychology, researchers are often primarily interested in obtaining county-level estimates of a
given construct and then drawing inferences about the association between this construct and a
second county-level outcome. In such situations, to what extent might it matter which estimation
procedure researchers use?
In this study, we address this question by estimating the association between county-level
implicit and explicit racial bias and Barack Obama's Presidential vote share in 2008. In addressing
this question, our goal is not necessarily to provide evidence for or against an association between
these constructs. Rather, we are interested in how conclusions about this association might vary
depending on how county-level racial bias is estimated. Nonetheless, for the purposes of this
study, we sought to test the following hypothesis:
Hypothesis 1: Controlling for 2004 county-level Democratic vote share, Whites' implicit
and explicit county-level racial bias against Blacks should be negatively associated with
Barack Obama's 2008 county-level Presidential vote share.
2.5.1 Data
2.5.1.1 Racial Bias
The primary data used for this study were responses to the race Implicit Association Test (IAT)
obtained from Project Implicit (Xu et al., 2013) and collected between 2002 and 2017. The IAT
relies on a timed dual-categorization task that requires respondents to evaluate pairings of White
and Black faces with words referring to `good' and `bad' things. An indication of racial bias (against
Blacks) occurs when a respondent more quickly categorizes words representing `bad' things as
bad when they are paired with a Black face, compared to a White face, and more quickly
categorizes `good' words when they are paired with a White face (Greenwald
et al., 2009a). Through repeated measures sampling across categorization trials, the IAT permits
the estimation of the so-called D score, which represents the difference in response latency. D
scores range from -2.0 to 2.0, where scores above 0 indicate a positive bias toward White faces
and a negative bias toward Black faces. Participants who completed the race IAT and who were
located in a county in the contiguous U.S. were retained for analysis, N = 1,704,789.
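A simplified version of the D score computation can be sketched as follows (in Python, with hypothetical latencies; the production scoring algorithm of Greenwald et al. additionally trims latencies and penalizes errors, so this is an illustration of the core quantity only):

```python
import statistics

def d_score(compatible_ms, incompatible_ms):
    """Simplified IAT D score: the difference in mean response latency
    between incompatible and compatible pairings, scaled by the pooled
    standard deviation of all trials."""
    diff = statistics.mean(incompatible_ms) - statistics.mean(compatible_ms)
    return diff / statistics.stdev(compatible_ms + incompatible_ms)

# Hypothetical latencies in milliseconds; a positive score indicates faster
# responses on the compatible pairings, i.e., bias against Black faces
d = d_score([600, 650, 700], [800, 850, 900])
```

Scaling by the respondent's own trial variability is what bounds D and makes scores comparable across respondents with different overall response speeds.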
Explicit racial bias toward Blacks was measured on a scale from -10 to 10 via a single item
reflecting participants' `warmth toward Blacks', N = 1,091,841.
All other data sources were identical to those reported in Study 2.
2.5.2 Method
As in Study 2, we generated county-level estimates using un-weighted disaggregation, MrP,
MrsP, and two variations of weighted disaggregation in which weights were calculated using a
combination of raking and post-stratification. The MrP and MrsP models were identical to those
reported in Study 2, with the exception that for this study racial bias was modeled as a continuous
random variable. For each method, estimates of both implicit and explicit racial bias were
obtained.
After obtaining estimates of county-level implicit and explicit racial bias using each estimation
procedure, we estimated separate linear regression models in which Obama's 2008 county-level
Presidential vote share was regressed on either county-level implicit or explicit racial bias. As
controls, we also included Kerry's 2004 county-level Presidential vote share as well as the county-level
proportions of Blacks, Latinos, people living in urban areas, and people living below the Federal
poverty line, and population density. Controlling for these variables is essential, as they are used in
obtaining the Mr(s)P estimates. All independent variables were standardized in all models. We then
examined the estimated coefficient for either measure of racial bias across estimation methods.
2.5.3 Results
As expected, MrP and MrsP provided better county coverage (see Table 2.3). Further, the
smoothing effects of the hierarchical model are evident in the reduced variance and more reasonable
minimum and maximum values of the Mr(s)P estimates, relative to the other methods.
Table 2.3: Summary of sub-national estimation of implicit and explicit racial bias for each method

Method                          N Counties   Mean   St. Dev.    Min     Max
Implicit bias - Disaggregation  3,086        0.33   0.10       -0.62    0.99
Implicit bias - MrP             3,105        0.36   0.07        0.03    0.45
Implicit bias - MrsP            3,105        0.36   0.07        0.03    0.45
Implicit bias - Rake 1          1,612        0.35   0.11       -0.37    0.78
Implicit bias - Rake 2          2,270        0.34   0.14       -0.48    1.02
Explicit bias - Disaggregation  3,073        0.53   0.63      -10.00    5.00
Explicit bias - MrP             3,105        0.55   0.34       -1.37    1.20
Explicit bias - MrsP            3,105        0.56   0.34       -1.39    1.19
Explicit bias - Rake 1          1,612        0.52   0.58       -4.38    5.97
Explicit bias - Rake 2          2,269        0.47   0.81       -5.83    6.09
However, despite the reduced variance in the Mr(s)P estimates, for both implicit and explicit
racial bias the estimated association with Obama's 2008 Presidential vote share was substantially
stronger for the models that relied on the Mr(s)P estimates (see Tables 2.4 and 2.5). For
example, the association between implicit racial bias and Obama's 2008 vote share was estimated
as b = -0.047, SE = 0.001, 95% CI [-0.051, -0.042], indicating that a one-SD increase in implicit
racial bias, as estimated via MrP, was associated with an expected 5% decrease in Obama's
county vote share. In contrast, estimates obtained from the other methods were substantially weaker,
though still statistically significant.
Table 2.4: Estimated conditional association between county-level implicit racial bias and
Obama's 2008 Presidential vote share

Method          Estimate   SE      95% CI
Disaggregation  -0.006     0.001   [-0.008, -0.004]
MrP             -0.047     0.001   [-0.051, -0.042]
MrsP            -0.048     0.001   [-0.053, -0.044]
Raking 1        -0.003     0.001   [-0.005, -0.001]
Raking 2        -0.003     0.001   [-0.005, -0.001]
Table 2.5: Estimated conditional association between county-level explicit racial bias and
Obama's 2008 Presidential vote share

Method          Estimate   SE      95% CI
Disaggregation  -0.004     0.001   [-0.006, -0.003]
MrP             -0.043     0.001   [-0.047, -0.040]
MrsP            -0.040     0.001   [-0.044, -0.036]
Raking 1        -0.003     0.001   [-0.006, -0.001]
Raking 2        -0.004     0.001   [-0.006, -0.002]
Across all models, the estimated effects for the control variables were equivalent within rounding
error at three decimal places (see Table 2.6 for these estimates).
Table 2.6: Estimated effects for control variables

Variable              Estimate   Std. Error   t value   Pr(>|t|)
Intercept             0.418      0.001        547.000   0.000
Kerry Vote Share std  0.120      0.001        127.000   0.000
% Black std           0.029      0.002        17.200    0.000
% Urban std           0.009      0.001        9.690     0.000
% Below Poverty std   0.015      0.001        16.500    0.000
Density std           0.001      0.001        1.670     0.095
% Latino std          0.000      0.001        0.449     0.653
2.5.4 Discussion
Overall, these results clearly demonstrate that downstream inferences based on county-level
estimates can vary dramatically depending on the method of estimation. Using the Mr(s)P
estimates, we observed a stronger association between racial bias and Obama's 2008 vote share;
using estimates obtained from the other methods, however, this association was attenuated. Given
our results in Studies 1 and 2, as well as the broader literature on MrP, we would generally be more
inclined to trust inferences derived from MrP estimates, as these can reasonably be expected to be
the most accurate. Importantly, in some cases, such as when a large random sample is available,
MrP may perform no better than disaggregation. However, when such a sample is not available
or when the available sample is subject to various sources of sampling and response bias,
MrP can provide a more robust approach to obtaining sub-national estimates than other
approaches like disaggregation, raking, or post-stratification.
However, it should also be noted that using MrP estimates to estimate associations with
secondary variables raises unique challenges. For example, given that the estimates are a function
of the contextual predictors in the MrP model, it is important to consider whether, and to what
extent, an association between MrP estimates and a secondary variable might be driven by masked
associations with the contextual variables. Here, we attempted to account for this possibility by
including the variables used in the MrP model as controls. Researchers who use MrP in such
instances should carefully consider these issues. Finally, it should be noted that MrP does not
resolve issues of causal direction (Caughey and Warshaw, 2019). That said, given the gains in
accuracy and robustness to sampling bias, in our view MrP is still likely one of the best methods
for obtaining sub-national estimates that will then be used in secondary analyses.
2.6 General Discussion
One of the primary difficulties for psychological research is establishing connections between
hypothesized constructs and real-world phenomena. While psychologists are experts at simulating
phenomena in laboratory settings and developing indirect (e.g., survey-based) measures of target
phenomena, establishing external validity remains one of the primary challenges for psychological
research. While geographic studies of psychological phenomena raise their own challenges,
they also directly supplement conventional approaches to psychological research. More specifically,
geographic approaches to psychological research offer an opportunity to directly investigate
the association between psychological constructs and real-world outcomes (Rentfrow and Jokela,
2016).
This is, of course, not a revelation, as cultural psychology pioneered cross-cultural (e.g., international)
studies decades ago. However, early investigations of the geographic study of psychological
outcomes were limited by a focus on nation-level variation. Recently, however, researchers
have begun focusing on psychological constructs' sub-national variation and associations with target
outcomes. This work has yielded a number of important findings, such as new evidence for the
deleterious effects of racial prejudice (Hehman et al., 2017, Leitner et al., 2016a, Orchard and
Price, 2017, Rae et al., 2015) and associations between personality and various target outcomes
(Götz et al., 2018, Jokela et al., 2015, McCann, 2017a, 2018, 2017b, Rentfrow et al., 2015), as well
as insight into the spatially structured patterning of personality (Jokela et al., 2015, Rentfrow
et al., 2008, 2013, 2015) and moral values (Hoover et al., 2018).
However, while the sub-national study of psychological constructs affords a range of exciting
new opportunities for research, it also raises a number of challenges that are not commonly encountered
or addressed in psychological research. In this work, our goal was to offer a comprehensive
introduction to state-of-the-art methods, Mr(s)P, developed in other fields for addressing these
challenges. Beyond this, we sought to evaluate the performance of these methods
under conditions similar to those in which many sub-national studies of psychological phenomena
have been conducted. Specifically, the methods reviewed in this work were both designed for
and have most often been applied to relatively small, randomized, representative samples.
In contrast, psychologists today often find themselves working with large, non-random, non-representative
samples with non-uniform sub-national sparsity. Accordingly, in order to provide
researchers with an informed introduction to these methods, we addressed several questions regarding
the optimal approach to sub-national estimation under such conditions: (1) whether MrP
offers improvements in accuracy, compared to other methods, when applied to very large samples,
and (2) whether estimates obtained via MrP yield different conclusions about associations with
secondary variables, compared to estimates obtained via other methods.
Specifically, we found that MrP outperforms other commonly used methods, including disaggregation,
raking, and post-stratification, when applied to samples with response biases, even
when those samples contain 100,000 (Study 1) or more than three million (Study 2) responses.
In Study 1, we evaluated the differential performance of disaggregation, a simple but widely
used approach to small-area estimation, and MrP under varying conditions of sample size and
response bias. This study provided strong evidence that under conditions of response bias, MrP
dramatically outperforms disaggregation, regardless of sample size. Importantly, previous research
has shown that with even modestly large random samples (e.g., N = 10,000), disaggregation
performs comparably to MrP. In contrast, our results suggest that in both simulated and real-world
data, MrP outperforms disaggregation under conditions of response bias. Further, in Study 2, we
provide evidence that even with a very large convenience sample, Mr(s)P is a better estimator
than not only disaggregation but also raking and post-stratification. We also show that it
offers substantially better coverage of spatial units, even with a sample of more than three million
respondents.
Finally, in Study 3, we show that the associations between an estimated county-level construct
and a secondary outcome can vary substantially depending on the estimation procedure used. Notably,
the estimates obtained via Mr(s)P were consistent with previous literature; however, using
Mr(s)P estimates for secondary analysis raises issues of potential contamination caused by the
contextual variables included in the MrP model. To address this issue, we suggest that researchers
who use MrP estimates in secondary analyses conduct sensitivity analyses by controlling for the
contextual factors in the MrP model (e.g., in a regression model that includes MrP estimates as
an independent variable) and, ideally, compare inferences across multiple estimation strategies
(e.g., against estimates obtained via post-stratification).
Overall, given the relative ease of implementing MrP and the improvements in estimation
accuracy and stability that it offers, we see little reason for this method not to be more widely
applied to sub-national geographic studies of psychological phenomena. While today's large, online,
opt-in samples have opened many new opportunities for studying the geographic distribution of
psychological constructs, it is important that the limitations of these samples be addressed as well
as possible. In our view, MrP is a useful tool that can help psychological researchers work toward
this goal.
Bound in Hatred: The role of group-based morality in acts
of hate
Throughout history, humans have discriminated against, persecuted, and murdered other humans
because of their identities (Kiernan, 2007, Moore, 2000, Nirenberg, 2015). In addition to the
horrors of death and extermination, such acts of hatred have tragic effects on survivors and
survivors' communities. Victims of hate crime, for example, experience higher levels of depression
and anxiety compared to victims of comparable crimes not motivated by bias (Hall, 2013) and
they may ultimately reject or despise the part of their identity that was targeted (Cogan, 2002).
Even for people who are not directly victimized, just sharing a trait targeted by a hate crime can
cause clinical levels of post-traumatic stress (Government of Canada et al., 2011). Online racial
discrimination has also been linked to elevated levels of depression and anxiety in victims (Tynes
et al., 2008).
Tragically, the human tendency toward identity-based hatred and violence remains a major
contributor to human suffering. In the 20th and 21st centuries, genocide was one of the leading causes
of preventable violent death (Blum et al., 2008). In recent years, hate crime in the United States
(Center for the Study of Hate & Extremism, 2018, Eligon, 2018) and Europe (Engel et al., 2018)
has systematically increased, with U.S. incidence reports reaching their highest levels since the
September 11th attack on the World Trade Center. The number of hate groups operating in the
U.S. has also recently reached a record high (SPLC, 2019), and concerns over the rising prevalence
of online hate speech have led to shifts in social media content policies (Beckett, 2019, Conger,
2019, Frenkel et al., 2018).
These trends highlight the importance of developing a better understanding of why, when,
and where acts of hate occur. Research addressing these questions has often focused on the
roles of inter-group threat, resource-based conflicts, and political ideology as focal mechanisms
in the emergence of behaviors like hate crime (Hall, 2013, Stacey et al., 2011b), hate group
activity (McCann, 2010, McVeigh, 2004, McVeigh and Sikkink, 2005, Medina et al., 2018), and
hate speech (Piatkowska et al., 2018). Echoing these findings, psychological investigations of
attitudinal prejudice have consistently observed that perceptions of either realistic or symbolic
outgroup threat (Stephan and Stephan, 2017, 1996) lead to increased prejudice toward outgroups
and that this effect is positively mediated by attitudes associated with authoritarianism and social
dominance (Asbrock et al., 2010, Charles-Toussaint and Crowson, 2010, Cohrs and Ibler, 2009,
Duckitt and Sibley, 2009, 2017).
Together, this line of work suggests that behaviors like hate crime, hate group activity, and hate
speech can be at least partly understood as responses to perceived outgroup threats (Hall, 2013).
However, this account raises an essential question: what is it about some threats, and the people
who perceive them, that is sufficient to induce such costly behaviors, behaviors that can lead
to social exclusion, retaliation, and the heaviest of criminal penalties?
To answer this question, we propose that the moralization of a threat is a central factor in the
motivational process underlying acts of hate such as hate speech, hate group activity, and hate
crime, behaviors we refer to collectively as extreme behavioral expressions of prejudice (EBEPs).
This view is grounded in a large body of research linking violence and extreme behavior to moral
values, perceptions of moral violations, and feelings of moral obligation (Atran and Ginges, 2012,
Darley, 2009, Fiske et al., 2014, Graham and Haidt, 2011, Mooijman et al., 2018, Rai, 2019,
Skitka et al., 2017, Zaal et al., 2011). Drawing on this work, we suggest that EBEPs are often
motivated by the belief that an outgroup has done something morally wrong and, further, that a
person's risk of perceiving such moral violations is partially dependent on their moral values, a
hypothesis we refer to as the Moralized Threat Hypothesis.
By accounting for the role of moralization in EBEPs, this perspective provides a unified framework
for understanding why certain events or dynamics trigger hateful behaviors. Under this
hypothesis, EBEPs can be understood as a perpetrator's response to a perceived violation of their
moral values. Thus, EBEP triggers that have largely been studied in isolation, such as terrorist
attacks (Byers and Jones, 2007, Hanes and Machin, 2014), immigration (Stacey et al., 2011a),
same-sex marriage (Levy and Levy, 2017, Valencia et al., 2019), interracial romantic relationships
(Perry and Sutton, 2006, 2008), and the espousal of non-Western religious values (Green and Spry,
2014, Velasco González et al., 2008), are conceptualized as perceived moral violations. Accordingly,
from the perpetrator's perspective, EBEPs function as a mechanism for regulating social
relations (Fiske et al., 2014, Rai and Fiske, 2012) with the outgroup that is blamed for the moral
violation.
To test the Moralized Threat Hypothesis, we relied on a diverse set of observational and
experimental methodologies in order to investigate the role of moral values in both real-world
EBEPs and beliefs about the justification of EBEPs. Given recent increases in EBEPs aligned
with right-wing ideology (Eligon, 2018, Engel et al., 2018, Lowery et al., 2018) and concerns
over the role of hate speech in violent crimes toward social identities often demonized by right-wing
groups (Roose, 2018), we focused specifically on EBEPs that were aligned with right-wing
ideologies. Accordingly, we expected that these EBEPs would be associated with moral values
oriented around group preservation, because such values have been linked to conservatism and
right-wing ideologies in U.S. contexts (Frimer et al., 2013, Graham et al., 2009). To operationalize
these values, we rely on Moral Foundations Theory (MFT; Graham et al., 2011, 2013), which
proposes a hierarchical model of moral values composed of two superordinate, bipolar categories:
Individualizing values and Binding values. While the former comprises values focused on
individuals' rights (caring for others and following principles of fairness), the latter comprises
values considered to be associated with group preservation (maintaining ingroup solidarity,
submitting to authority, and preserving the purity of the body and sacred objects).
Using this model of moral values, the Moralized Threat Hypothesis predicts that Binding values
are associated with EBEPs toward groups marginalized by the ideological right. We examine this
prediction across five studies. In Study 1, we use a series of Long Short-Term Memory (Hochreiter
and Schmidhuber, 1997) neural network models to study online hate speech and test the hypothesis
that moralization and hate speech are concomitant. In Study 2, we move out of the digital world
and focus on the geospatial relationship between U.S. county-level moral values and the county-level
prevalence of hate groups. Then, in Studies 3, 4, and 5, we switch from naturally generated
data to data collected via psychological surveys, which enable us to test our hypotheses with more
precision and control. In Study 3, we investigate whether people see a range of EBEPs as more
justified when they believe that the targeted outgroup has done something morally wrong. Then,
in Study 4, we test for associations between Americans' group-oriented moral values and the perceived
justification of EBEPs against Mexican immigrants and investigate whether this association can
be accounted for by perceptions of outgroup moral wrongdoing. Finally, in Study 5, we
investigate these effects for a different outgroup (Muslims) using a national U.S. sample stratified
across participants' gender, age, and political ideology.
Together, these studies rely on a multi-methodological approach that uses observational and
experimental research designs to test hypotheses against distinct operationalizations and
measurements of EBEPs and group-oriented moral values. Relying on this approach enables us to
directly study the phenomena of interest, EBEPs, while also maintaining the precision and
control afforded by more traditional approaches to psychological research. Across all five studies,
we found consistent evidence that extreme behavioral expressions of hatred and the belief that
they are justified are associated with the Binding values. Further, data from the three surveys
we conducted indicate that this association can be at least partly explained by the perception
of outgroup moral wrongdoing. Notably, these estimated effects remain substantial even after
adjusting for participants' political ideology.
3.1 Study 1: Hate Speech and Moral Rhetoric
Our first step toward testing the Moralized Threat Hypothesis is to investigate the relationship
between expressions of hate speech in social media posts and the concomitant reliance on moral
rhetoric evoking the so-called Binding vices. Under MFT, each component of the Individualizing
and Binding values is associated with two valences or poles: virtues (i.e. prescriptive moral
concerns) and vices (i.e. proscriptive moral concerns). Conditional on the Moralized Threat
Hypothesis, which holds that EBEPs are motivated by perceived moral violations, we hypothesized
that online hate speech is most often articulated through the language of the Binding vices:
language evoking concerns about violations of loyalty, authority, and purity.
To test this hypothesis, we focused on the social media platform Gab, which was recently
embroiled in controversy following its role in the October 2018 terrorist attack on a synagogue
in Pittsburgh, Pennsylvania (Roose, 2018). Gab purports to be a haven for free speech and has
attracted a large membership of users who align themselves with far-right ideologies (Anthony,
2016, Benson, 2016). This emphasis on free speech entails the absence of any institutional oversight
of content; in contrast to mainstream social media platforms, Gab users are free to post
anything, including hate speech and incitements of violence against marginalized groups.
Accordingly, Gab presents a valuable opportunity to investigate the rhetorical structure of
hate speech because the combined effects of the ideological biases of its users and the absence of
content moderation mitigate issues posed by the statistical rarity of hate speech on mainstream
social media platforms. Indeed, a recent analysis of hate speech on Gab found that hate words (e.g.
racial and group-oriented slurs) occur at 2.4 times the rate of hate words on Twitter (Zannettou
et al., 2018). This relative prevalence of hate helps mitigate issues of sparsity that have been
encountered in other computational studies of hate speech in social media discourse (Del Vigna
et al., 2017, Saleem et al., 2017, Zhang et al., 2018).
To investigate the relationship between hate speech and moral rhetoric evoking the Binding
vices, we first annotated 7,692 messages posted by 800 Gab users for hate-based rhetoric (Kennedy
et al., 2018) and moral sentiment (Hoover et al., 2017). For examples of Gab posts labeled as
hate-based rhetoric that were also labeled positively for one of the moral vices, see Table 3.1.
Using these annotated messages, we then trained two Long Short-Term Memory (LSTM) neural
network models (Hochreiter and Schmidhuber, 1997) to detect hate speech and moral sentiment
in Gab posts. LSTMs incorporate a recurrent structure that encodes long-term dependencies
between words and their past context. This makes them particularly effective for quantifying the
sequential and context-dependent nature of semantic structures in natural language.
The first model that we developed, a single-task LSTM model, was trained to predict whether
posts contain hate speech. In contrast, the second model, a multi-task LSTM model (Collobert
and Weston, 2008), was trained to predict the presence of each moral vice simultaneously. In
both the single-task and multi-task models, posts are represented as matrices of pretrained GloVe
word embeddings (Pennington et al., 2014) corresponding to the words in the original post. This
embedding matrix is then input to a 100-dimensional LSTM layer, which is connected to a layer of
fully connected units with a 0.33 dropout ratio (Srivastava et al., 2014). A softmax transformation
is then applied to the output of the final layer in order to generate probabilistic predictions for
the outcome (See Figure 3.1).
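The gating mechanism that lets an LSTM carry information across a post can be illustrated with a minimal, single-cell sketch in pure Python. The scalar weights and inputs below are toy values chosen for illustration; they are not the trained model's parameters, and a real LSTM layer operates on vectors rather than scalars.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One LSTM step for scalar input and state.
    w maps a gate name to its (input weight, recurrent weight, bias)."""
    def gate(name, act):
        w_x, w_h, b = w[name]
        return act(w_x * x + w_h * h_prev + b)

    f = gate("forget", sigmoid)   # how much old memory to keep
    i = gate("input", sigmoid)    # how much new information to write
    g = gate("cand", math.tanh)   # candidate memory content
    o = gate("output", sigmoid)   # how much memory to expose
    c = f * c_prev + i * g        # updated cell state (long-term memory)
    h = o * math.tanh(c)          # hidden state passed to the next step
    return h, c

# Toy weights: every gate uses the same small parameters.
weights = {k: (0.5, 0.5, 0.0) for k in ("forget", "input", "cand", "output")}

# Run the cell over a short "sentence" of scalar word features.
h, c = 0.0, 0.0
for x in [1.0, -0.5, 0.8]:
    h, c = lstm_step(x, h, c, weights)

print(round(h, 4))  # final hidden state summarizing the sequence
```

Because the cell state `c` is updated additively, gradients can flow across many timesteps, which is what makes the architecture effective for the long-range dependencies mentioned above.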
We then used these models to independently predict the presence of hate speech and each moral
vice in a large Gab corpus, which consists of 24,978,951 posts from 236,823 users, after removing
posts with too few English tokens (Gaffney, 2018). The model trained to predict hate obtained
an average F1 score of 0.62, precision of 0.65, and recall of 0.59 using 10-fold cross-validation. For
the five jointly predicted moral vices in the multi-task model, F1 scores were 0.55 for "harm",
0.46 for "cheating", 0.55 for "betrayal", 0.44 for "subversion", and 0.44 for "degradation".
Moral Vice Gab Text
Harm If you are right-wing and pay to send your daughter to college, you are retarded.
Woman are submissive and much more prone to peer pressure. They are not
equipped (NAWALT) to handle life in a re-education center. By sending her
to college, you are destroying her future and increasing the risk she'll die in
childless misery
Cheating Britain is a country where a rich black man can set up a university scholarship
for black people ONLY & if you question if it's 'racist' YOU are 'Racist'. But
if a white man set one up for whites only, there would be no doubt it's 'racist'
& he would likely be prosecuted for race hate.
Betrayal Anyone surprised that another clueless May appointee hates the white working
class while loving immigrants.
Subversion Any honest Jewish person should be able to admit the truth, if they're honest.
Anywhere across time and space in the Western world their people have gained
a majority of power has rapidly descended into chaos and turned into a shithole
country.
Degradation FUCK LONDON ! Disgusting ! I WILL REMMEBER THIS DISRESPECT
.... Everytime there is an Islamic terror attack ..... LONDON IS INFECTED
.... I HOPE THE ENRICHMENT LEVELS BREAK THE SCALE IN THE
COMING YEARS .....THATS ALL
Table 3.1: Examples of hate speech-labeled Gab posts. Each was also labeled with at least one
moral vice, denoted in the "Moral Vice" column.
Figure 3.1: Visualizations of the classification models trained on Gab data and used to predict on
the entire Gab corpus.
Finally, using these predicted labels, we tested the hypothesis that posts evoking either the
Binding or Individualizing vices were more likely to contain hate speech. Using the union of
the predicted labels for the Individualizing and Binding vices, we then estimated the probability
that a given post contains hate speech as a function of whether or not it is labeled as evoking
Individualizing or Binding vices. To account for the fact that this corpus contains multiple
messages per user, we estimated this probability using a hierarchical logistic regression model in
which varying intercepts were estimated for each user and the effects of Individualizing and Binding
vices were permitted to vary across users. To minimize the possibility that model estimates were
biased by messages generated by bots, we estimated two separate hierarchical logistic regression
models. The first model was trained on a subset of the full corpus that only included posts by
users who made fewer than 500 total posts (N = 4,994,480 posts from 229,538 users). The second
model was trained on the entire Gab corpus.
Both models indicated a strong association between the Binding vices and the presence of
hate speech. As model results were comparable, here we focus on the results from the first
model (See Table C.1 in Appendix C for estimates from the model trained on the full corpus).
Specifically, after adjusting for the effect of Individualizing vices, posts labeled as evoking Binding
vices had approximately 25 times the risk of containing hate speech compared to posts that did
not evoke the Binding vices, b = 3.22, SE = 0.01, Z = 295, Risk Ratio (RR) = 25.12. While
the Individualizing vices were also positively associated with hate speech, their estimated effect
was substantially smaller, such that a post evoking the Individualizing vices had approximately 6
times the risk of containing hate speech compared to posts that do not evoke the Individualizing
vices, b = 1.79, SE = 0.01, Z = 126, RR = 5.92. Notably, while the effects of Binding and
Individualizing vices showed variation across users (Binding slope SD = 0.79, Individualizing
slope SD = 0.86), fixed-effect estimates indicated that, on average, they were both positively
associated with the presence of hate speech.
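For readers unfamiliar with these quantities, the sketch below shows how such ratios relate to a logistic regression coefficient: exponentiating the coefficient gives an odds ratio, which closely approximates the risk ratio when the outcome is rare. The baseline-risk value used here is a hypothetical stand-in for illustration, not a quantity reported in the study.

```python
import math

def odds_ratio(b):
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(b)

def risk_ratio(b, p0):
    """Exact risk ratio implied by coefficient b at baseline risk p0."""
    odds0 = p0 / (1 - p0)
    odds1 = odds0 * math.exp(b)
    p1 = odds1 / (1 + odds1)
    return p1 / p0

b_binding = 3.22  # reported coefficient for the Binding vices
print(round(odds_ratio(b_binding), 2))         # close to the reported RR of 25.12
print(round(risk_ratio(b_binding, 0.001), 2))  # with a rare outcome, RR tracks OR
```

As the baseline risk `p0` grows, the exact risk ratio shrinks below the odds ratio, which is why the rare-outcome condition matters for interpreting exp(b) as a risk ratio.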
Together, these results are consistent with the hypothesis that hate speech tends to be articu-
lated through the language of morality. They also indicate that the Binding values are particularly
relevant to hate speech, as messages predicted as evoking the Binding vices were nearly 40% more
likely to be predicted as hate speech, compared to messages predicted as evoking the
Individualizing vices. This differential association suggests that articulations of hate speech rely much more
strongly on language that evokes group-based moral values, compared to language that evokes
individual-based moral values. Importantly, this is consistent with the Moralized Threat Hypoth-
esis, which proposes that the perception of outgroup moral violations is a central risk factor for
EBEPs. Indeed, overall, our results suggest that a sense of group-based moral violation is often
encoded directly in articulations of hate speech.
3.2 Study 2: Hate Groups and County-level Moral Values
In Study 1, we found evidence that real-world hate speech often invokes Binding or group-
based moral concerns, which is consistent with the hypothesized role of moral values in EBEPs. In
Study 2, we extend this finding by investigating whether a comparable association exists between
Binding moral values and a different EBEP, real-world hate group activity. That is, we test the
hypothesis that county-level moral values (specifically, the Individualizing and Binding values)
are associated with the county-level prevalence of hate groups. Per our findings in Study 1, we
expect to find a positive association between Binding values and the prevalence of hate groups.
Further, we expect that this effect will be stronger than the observed effect of the Individualizing
values. Importantly, as in Study 1, this design enables a direct application of the Moralized Threat
Hypothesis, which maintains that people's values should influence their perceptions of outgroup
moral violations. If this is the case, there should be an association between the moral values
held in a county and the prevalence of hate groups in that county, as certain configurations of
county-level moral values should increase the local risk of hate group prevalence.
To estimate the county-level distribution of moral values, we use data collected via Your-
Morals.org, a website operated by the founders of MFT to collect measurements of voluntary
respondents' moral values, from approximately 2012 to 2018 (N = 106,465). While this is a rel-
atively large sample, it cannot be used to directly estimate county-level moral values because it
is not randomly sampled or representative at the county level (Hoover and Dehghani, 2019). To
account for these issues, we rely on Multilevel Regression and Synthetic Poststratication (MrsP;
Leemann and Wasserfallen, 2017), a model-based approach to survey adjustment and sub-national
estimation that extends Multilevel Regression and Poststratication (MrP; Park et al., 2004a).
Both MrP and MrsP involve estimating regional outcomes on a target construct from individual-
level data. This data is used to model the target construct as a multilevel function of demographic
characteristics (e.g. gender, age, and level of education), regional indicators (e.g. county, state,
or region), and regional factors (e.g. presidential vote proportion, median income, or educational
attainment). This model is then used to generate predictions for each combination of demographic
characteristics within each region. Finally, information about the population distribution of these
demographic characteristics within each region are used to estimate a weighted mean based on
the model predictions.
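The weighted-mean poststratification step can be sketched as follows. The cell-level model predictions and population counts below are invented for illustration (and use only two demographic variables rather than the six in the actual model).

```python
# Hypothetical model predictions of a moral-value score for each
# demographic cell in one county (gender x age here, for brevity),
# and census-style counts of county residents in each cell.
cell_predictions = {
    ("female", "18-34"): 3.1,
    ("female", "35+"):   3.4,
    ("male",   "18-34"): 2.8,
    ("male",   "35+"):   3.0,
}
cell_counts = {
    ("female", "18-34"): 1200,
    ("female", "35+"):   1800,
    ("male",   "18-34"): 1100,
    ("male",   "35+"):   1900,
}

def poststratify(preds, counts):
    """Population-weighted mean of cell-level model predictions."""
    total = sum(counts.values())
    return sum(preds[cell] * counts[cell] for cell in preds) / total

county_estimate = poststratify(cell_predictions, cell_counts)
print(round(county_estimate, 3))
```

The weighting is what corrects for non-representativeness: even if the survey over-samples one cell, each cell's prediction contributes in proportion to its true population share.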
Here, we use MrsP, which follows the above approach, but also enables the inclusion of a more
diverse set of demographic variables. Specifically, we model individual-level responses to each
moral foundation as a function of six demographic variables: gender (2 levels), age (3 levels),
ethnicity (4 levels), level of education (3 levels), religious attendance (3 levels), and political
ideology (3 levels). We also account for two levels of regional clustering (the county level and the
region level) and include the proportion of Democratic votes in the 2016 presidential election as
a county-level factor. Finally, the multilevel model that we estimate also includes a hierarchical
auto-regressive prior (Riebler et al., 2016) that, under the presence of spatial auto-correlation,
induces local spatial smoothing (Hanretty et al., 2016, Hoover and Dehghani, 2019, Selb and
Munzert, 2011) between proximate counties.[1]
Using this approach, we estimated the county-level distribution of each Moral Foundation[2]
and then used these estimates to calculate scores for the Individualizing (Care and Fairness) and
Binding (Loyalty, Authority, and Purity) dimensions for each county (See Figure 3.2).
Figure 3.2: Maps of county-level Individualizing (legend range 2.20-3.00) and Binding (legend
range 3.20-3.80) Moral Foundations, adjusted for representativeness via MrsP with spatial
smoothing.
Finally, we use these county-level estimates of Individualizing and Binding values to predict
the county-level prevalence of hate groups. Estimates of this outcome were obtained from the
Southern Poverty Law Center (SPLC; Southern Poverty Law Center, 2019), which maintains an
ongoing hate group task force that monitors and documents hate group activity at the city level.
We used this data to generate county-level counts of active hate groups by identifying the counties
containing each city.[3] Finally, because the data used to estimate county-level moral values was
collected from 2012-2018, we calculated the average county-level count of active hate groups from
2012 to 2017 (the latest available year in the SPLC data).
[1] For a detailed discussion of these methodologies and models, as well as an evaluation of the efficacy of using
this approach with non-random, non-representative data, see Hoover and Dehghani (2019).
[2] An interactive visualization of these estimates can be viewed at https://mapyourmorals.usc.edu/
[3] For cities located in multiple counties, we selected the county containing the largest proportion of the city's
population in order to avoid over-counting hate groups.
Using this measurement of hate-group prevalence as the outcome, we estimated the county-
level rate of hate groups per 10,000 inhabitants using a Negative-binomial regression model with a
proper conditional auto-regressive prior (Leroux et al., 2000, Lindgren and Rue, 2015) to account
for spatial auto-correlation. A zero-inflated Negative Binomial model was also considered;
however, comparisons of leave-one-out cross-validation estimates of model fit (Held et al., 2010, Vehtari
et al., 2016, 2017) suggested worse fit for the zero-inflated model (elpd = -4674.90, SE = 340.77)
compared to the Negative Binomial model (elpd = -4296.17, SE = 304.07; elpd difference =
378.72, SE of difference = 112.26). As predictors in this model, we included standardized
estimates of Individualizing and Binding values as well as the proportion of people below the poverty
line, the proportion of people with four-year degrees, and the county-level proportion of White
inhabitants.
Figure 3.3: Observed (Left) and Predicted (Right) County-level Rate of Hate Groups
Model predictions of the county-level rate of hate groups were largely consistent
with the observed rates (See Figure 3.3), RMSE = 0.11. Consistent with our hypotheses,
our results indicate a relationship between the county-level rate of hate groups and county-level
Binding values (See Figure 3.4). Even after attempting to adjust for ethnic composition, edu-
cational attainment, and the proportion of county population below the poverty line, the rate
of hate groups is expected to increase by 26% (posterior SD = 12%) with a standard deviation
increase in Binding values. Notably, no such effect was observed for Individualizing values, which
suggests that, after accounting for the effects of the other variables in the model, variations in
Individualizing values are not associated with the prevalence of hate groups.
Figure 3.4: Estimated conditional association between Binding Values and the county-level preva-
lence of hate groups
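Because the Negative Binomial model uses a log link, the reported 26% figure corresponds to exponentiating the Binding coefficient. The sketch below back-calculates the coefficient from the 26% figure purely for illustration; the actual fitted coefficient is not reported in this form.

```python
import math

# In a log-link count model, a one-unit increase in a standardized
# predictor multiplies the expected rate by exp(b).
b_binding = math.log(1.26)  # coefficient implied by a 26% rate increase

rate_multiplier = math.exp(b_binding)
percent_change = (rate_multiplier - 1) * 100
print(round(percent_change))  # percent increase per SD of Binding values
```

The same transformation applies to any log-link coefficient in the model, which is why effects are naturally reported as multiplicative changes in the hate-group rate.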
Overall, our results suggest that, consistent with the Moralized Threat Hypothesis, the preva-
lence of hate groups in a particular region is linked to the Binding values held in that region.
Thus, while there are, of course, many factors that likely drive participation in a hate group
(Hall, 2013, Simi and Futrell, 2015), it appears that a region's moral milieu may play a systematic
role in facilitating the emergence and maintenance of hate groups. In our view, this association
is likely multiply determined. It may be the case that Binding values function as risk factors at
the individual level, such that people who prioritize Binding values are more likely, on average,
to be susceptible to recruitment into hate groups. At the same time, the existence of hate groups
may also be facilitated, for example, by a degree of tacit community acceptance (or an absence of
community condemnation) that is linked to people's moral values.
Importantly, these results are theoretically consistent with our Study 1 finding that hate speech
and the language of group-based morality overlap. Even though these studies relied on different
measurement methodologies and operated on different levels of analysis, they both indicate that
Binding values play an important role in real-world manifestations of EBEPs. However, neither of
these studies directly investigates the association between people's moral values and EBEPs at the
individual level. Accordingly, we address this issue next through a series of survey-based studies
that allow us to more precisely investigate the association between moral values and EBEPs.
3.3 Perceived Moral Violations and the Justification of EBEPs
In the previous studies, our results indicated an association between EBEPs and moral values
in the context of real-world hate speech and the spatial distribution of hate groups. In the
next series of studies, we extend these results by investigating the relationship between people's
moral values and the degree to which they believe EBEPs are justified. While the self-reported
justification of EBEPs is, of course, a different construct from actual EBEPs, this approach allows
us to assess how public approval of EBEPs varies as a function of people's moral values. Further,
self-reported justification of EBEPs may reflect tacit community acceptance. Using this approach,
we also investigate the hypothesis that approval of EBEPs toward an outgroup is higher when
people believe that the outgroup has done something immoral. Specifically, building on our
findings in Studies 1 and 2, we use experimental and observational survey studies to test three
primary hypotheses derived from the Moralized Threat Hypothesis:
Hypothesis 1. An EBEP toward an outgroup is seen as more justified when that outgroup
is perceived as having done something morally wrong.
Hypothesis 2. EBEPs should be seen as more justified by people who prioritize Binding
values.
Hypothesis 3. The association between Binding values and EBEPs toward a given out-
group is at least partially mediated by the perception that the outgroup has done something
morally wrong.
3.4 Study 3: Experimental Manipulation of Perceived Moral
Wrongness
First, we tested whether the perception that an outgroup has done something morally wrong
predicts support for EBEPs against that outgroup (Hypothesis 1). To do this, we randomly
assigned participants, recruited via Amazon Mechanical Turk and paid $1.00 for their participation,
to one of two conditions (a `high moral threat' condition and a `low moral threat' condition)
that manipulated the moral valence of a fictional outgroup. We focus on a fictional outgroup in
order to limit the influence of participants' prior beliefs on their responses (Crandall and Schaller,
2005).
In both conditions, participants (N = 321; Mean Age = 33.92, SD = 10.88; 62% Female) were
asked to read a fictional news story about `Sandirian' (Crandall and Schaller, 2005) immigrants
taking jobs in Webster Springs, Illinois, a fictional town. In the low moral threat condition, the
Sandirians' actions were framed as stimulating the local economy. In contrast, in the high moral
threat condition, the Sandirians were described as undermining the local economy and, thus,
harming "native" citizens. We then asked participants to indicate on a 7-point scale (1 = `Not
at all morally wrong', 7 = `Extremely morally wrong') the degree to which they believed it was
morally wrong for the Sandirians to take jobs in Webster Springs (M = 3.25, SD = 1.81).
Finally, to assess participants' approval of EBEPs, we asked them to imagine a male Webster
Springs resident, Dave, who believed that Sandirian immigrants were hurting his community.
Participants then indicated how justified (`Not at all justified' = 1 to `Extremely justified' = 7)
Dave would be in committing four different EBEPs: posting hate speech to Facebook (M = 2.80,
Median = 2, SD = 1.88), distributing hate speech flyers in Webster Springs (M = 2.73, Median
= 2, SD = 1.86), yelling slurs at a Sandirian resident of Webster Springs (M = 1.90, Median = 1,
SD = 1.47), and physically assaulting a Sandirian resident of Webster Springs (M = 1.36, Median
= 1, SD = 1.04). These exemplar EBEPs were selected to represent a variety of potential EBEPs
that are characterized by different magnitudes of social norm violation.
3.4.1 Results
Twelve participants skipped the item measuring moral wrongness and an additional 15
participants spent less than 10 seconds reading the experimental manipulation, which was our a priori
cutoff to ensure data quality. Accordingly, 294 participants were retained for analysis, though
robustness checks verified that retaining these participants had no substantive effect on our results.
To test the hypothesis that perceived moral wrongdoing is associated with EBEP justification, we
first estimated a Bayesian linear regression (Model 1) in which Z-scored perceived moral
wrongdoing was regressed on experimental condition. Estimates from this model (See Table C.2 for
complete model estimates) show that participants in the high moral threat condition reported
substantially higher levels of perceived moral wrongdoing, b = 0.50, posterior SD = 0.11, 95% CI
= [0.28, 0.73].
Next, we assessed the effect of experimental condition on EBEP justification. To do this, we
estimated a second model (Model 2) in which EBEP justification was regressed on experimental
condition. In this model, we treated responses to the EBEP items (Cronbach's α = 0.84, 95%
CI = [0.82, 0.86]) as a repeated measure of EBEP justification, yielding four measurements for
each participant. To account for this, we used a Bayesian hierarchical modeling framework to
allow for varying intercepts (i.e. random effects) for both participants and EBEP items. This
approach enabled our model to address the facts that (1) different participants should be more or
less likely to see EBEPs as justified in general and (2) each EBEP item, on average, should
be seen as more or less justified. Further, to account for the fact that the effect of experimental
condition on EBEP justification may vary depending on the EBEP, we also allowed the effect of
condition to vary across EBEP items (i.e. by estimating random slopes). Finally, because the
distribution of responses to each EBEP item was heavily skewed, we modeled EBEP justification
using a cumulative logistic regression model (Bürkner and Vuorre, 2019). All together, this yielded
a hierarchical Bayesian cumulative logistic regression model in which (1) EBEP justification was
regressed on experimental condition and (2) varying intercepts for participant and EBEP item
and a varying slope for condition were estimated.
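To make the cumulative logistic model concrete, the sketch below converts a linear predictor and a set of ordered thresholds into probabilities over the seven response options. The threshold values are invented for illustration; they are not the fitted model's estimates.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def cumulative_logit_probs(eta, thresholds):
    """Category probabilities for an ordinal outcome.
    P(Y <= k) = sigmoid(threshold_k - eta); each category's probability
    is the difference between adjacent cumulative probabilities."""
    cum = [sigmoid(t - eta) for t in thresholds] + [1.0]
    probs = [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]
    return probs

# Six invented thresholds separate the seven justification ratings.
thresholds = [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
probs = cumulative_logit_probs(eta=1.44, thresholds=thresholds)

print([round(p, 3) for p in probs])
print(round(sum(probs), 6))  # category probabilities sum to 1
```

Shifting the linear predictor (for example, by the condition effect) slides probability mass toward higher response categories without assuming equal spacing between them, which is why this family suits skewed Likert responses.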
Results from Model 2 (See Table C.2 for complete model estimates) indicate that, even after
attempting to account for the random effects of participant (intercept SD = 3.79) and EBEP
item (intercept SD = 4.04; condition slope SD = 0.62), participants in the high moral threat condition
were substantially more likely to see EBEPs against Sandirians as more justified, compared to
participants in the low moral threat condition, b = 1.44, posterior SD = 0.68, 95% CI = [0.16,
2.80], OR = 4.21, 95% CI = [1.18, 16.45]. In other words, for participants in the high moral
threat condition, the odds of seeing EBEPs, on average, as extremely justified, versus less than
extremely justified, were 4.21 times higher than for participants in the low moral threat
condition.
Next, we directly investigated the role of perceived moral wrongdoing in participants' EBEP
justification responses. To do this, we extended Model 2 by including standardized perceived
moral wrongdoing (PMW), the degree to which participants believed it was morally wrong for
Sandirians to take jobs in Webster Springs, as an independent variable with varying slopes
across EBEP items (Model 3). Estimates from this model indicated a strong positive association
between believing it was morally wrong for Sandirians to take jobs and seeing EBEPs against
Sandirians as more justified, b = 2.21, posterior SD = 0.42, 95% CI = [1.48, 2.94], OR = 9.20,
95% CI = [4.38, 18.92]. Notably, adjusting for PMW also led to a dramatic attenuation of
the estimated effect of experimental condition, such that a clear positive effect was no longer
supported, b = 0.28, posterior SD = 0.63, 95% CI = [-0.86, 1.57], OR = 1.29, 95% CI = [0.42,
4.79].
Finally, we tested the hypothesis that perceived moral wrongdoing mediated the effect of
experimental condition on EBEP justification. Relying on Bayesian posterior simulation to estimate
average mediation effects (AME) and average direct effects (ADE) (Imai et al., 2010, Steen et al.,
2017, VanderWeele and Vansteelandt, 2014, VanderWeele et al., 2016), we found that perceived
moral wrongdoing statistically mediates the effect of experimental condition on the probability of
indicating that an EBEP was at least slightly justified (See Appendix C).
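The logic of simulation-based mediation estimates can be sketched as follows. The coefficients below are toy values standing in for posterior draws, and the generative model is a simplified stand-in for the fitted models, not the study's specification.

```python
import math
import random

random.seed(1)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Toy generative model:
#   mediator M = a * condition + noise
#   P(outcome) = sigmoid(b * M + c * condition)
a, b, c = 0.5, 2.0, 0.3  # invented condition->mediator, mediator->outcome,
                         # and direct condition->outcome effects

def p_outcome(condition, mediator):
    return sigmoid(b * mediator + c * condition)

def simulate_effects(n=100_000):
    """Monte Carlo estimates of the average mediation and direct effects."""
    ame = ade = 0.0
    for _ in range(n):
        m0 = a * 0 + random.gauss(0, 1)  # mediator under control
        m1 = a * 1 + random.gauss(0, 1)  # mediator under treatment
        # Mediation effect: vary the mediator, hold condition fixed.
        ame += p_outcome(1, m1) - p_outcome(1, m0)
        # Direct effect: vary condition, hold the mediator fixed.
        ade += p_outcome(1, m0) - p_outcome(0, m0)
    return ame / n, ade / n

ame, ade = simulate_effects()
print(round(ame, 3), round(ade, 3))
```

In the Bayesian version, this simulation is repeated over posterior draws of the coefficients, yielding full posterior distributions for the AME and ADE rather than point estimates.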
Consistent with Hypothesis 1, these results indicate that participants who were led to
believe that a fictional immigrant group, the Sandirians, had done something immoral also
believed that EBEPs against this group were more justified, compared to participants in the
control condition. Importantly, adjusting for the degree to which participants believed Sandirians
had done something morally wrong also completely accounted for the effect of experimental
condition. Consistent with this, mediation analyses also indicated that the effect of the experimental
manipulation was mediated by the degree to which participants believed that the Sandirians had
done something immoral. Importantly, a secondary set of analyses in which participant political
ideology was adjusted for revealed no substantive changes in any of the reported findings (See C.2
in Appendix C for details). That is, these effects hold even after adjusting for the degree to
which participants identify as conservative.
3.5 Study 4: Justification of EBEPs against Mexicans
Next, we expand upon the experimental findings of Study 3 and investigate whether
participants' Binding values are associated with the degree to which they see EBEPs as justified. In
this study, rather than focusing on a fictional immigrant group, we directly address the perceived
justification of EBEPs against Mexican immigrants. We focus on Mexican immigrants due to
their cultural salience as a social group and consistent increases in hate crimes targeting Mexicans
in recent years (United States Department of Justice, Federal Bureau of Investigation, 2018). To
this end, we asked participants (N = 355, Mean age = 33, 54% identifying as female), who were
recruited from Amazon Mechanical Turk for $1.00, to read the same fictional news article shown
in the high moral threat condition of the previous study. However, in this study, each participant
read a version of the article that replaced the Sandirians with Mexican immigrants. As in Study
3, participants were then asked to indicate their perceptions of moral wrongdoing via the same
7-point scale. Prior to reading the news article, participants were also asked to complete
the Moral Foundations Questionnaire (Graham et al., 2011), a 30-item scale designed to measure
the degree to which people prioritize the five moral domains proposed by MFT. Responses to this
scale were aggregated and standardized to construct Binding (α = 0.83, 95% CI = [0.81, 0.86])
and Individualizing (α = 0.78, 95% CI = [0.75, 0.78]) scores for each participant. Finally, participants
responded to the same four items measuring the degree to which they thought EBEPs against
Mexicans in Webster Springs were justified (α = 0.80, 95% CI = [0.78, 0.83]).
3.5.1 Results
Three participants did not complete the MFQ and an additional 28 participants spent less
than 10 seconds reading the experimental manipulation, spent less than eight minutes on the
entire survey, or failed one of the MFQ manipulation checks, which were our a priori criteria to
ensure data quality. Accordingly, 324 participants were retained for analysis, though, as in Study
3, robustness checks verified that retaining these participants had no substantive effect on our
results.
To test Hypothesis 2, we modeled (Model 1) participants' responses to the EBEP items using
a hierarchical Bayesian ordered logistic regression model. As in Study 3, we treated the EBEP
responses as repeated measures and estimated varying intercepts for both participants and each
EBEP item. In this model, we also estimated xed and varying eects for participants' standard-
ized Binding and Individualizing scores.
Consistent with Studies 1 and 2, model estimates indicated a strong association between partic-
ipants' Binding values and the degree to which they believed EBEPs against Mexican immigrants
in Webster Springs were justified (See Figure 3.5). Specifically, after attempting to account for the
effect of Individualizing scores, the odds of selecting a higher, versus lower, response option were
estimated to be 4.95 times higher given a standard deviation increase in Binding values, b = 1.60,
posterior SD = 0.27, 95% CI = [1.12, 2.14]. In contrast, this model indicated that Individualizing
values were negatively associated with EBEPs, such that the odds of selecting a higher, versus
lower, response option were estimated to be 0.31 times as high given a standard deviation increase
in Individualizing values, b = −1.15, posterior SD = 0.41, 95% CI = [−1.88, −0.35].
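The odds ratios quoted here are simply the exponentiated log-odds coefficients; a quick arithmetic check (the small discrepancy for the Individualizing ratio, exp(−1.15) ≈ 0.32 versus the reported 0.31, is presumably due to the coefficient being rounded in the text):

```python
import math

def odds_ratio(log_odds: float) -> float:
    """Multiplicative change in odds per one-SD increase in the predictor."""
    return math.exp(log_odds)

binding_or = odds_ratio(1.60)           # ~4.95: higher response options more likely
individualizing_or = odds_ratio(-1.15)  # ~0.32: higher response options less likely
```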
Next, to test Hypotheses 1 and 3, we estimated two additional regression models. As in
Study 3, we first modeled (Model 2) participants' standardized responses to the perceived moral
wrongdoing item. However, in this model, we included participants' standardized Binding and
Individualizing scores as predictors. We then estimated a third model (Model 3) that followed
the same specification as Model 1, with the exception that fixed and random effects were also
estimated for standardized perceived moral wrongdoing.
As expected, estimates from Model 2 indicated a positive association between participants'
Binding values and the degree to which they believed it was morally wrong for Mexican immigrants to
"take jobs" in Webster Springs, b = 0.47, posterior SD = 0.05, 95% CI = [0.37, 0.56]. That is, a
standard deviation increase in Binding values was associated with an estimated 0.47 standard devi-
ation increase in perceived moral wrongness. In contrast, Individualizing values were estimated to
be negatively associated with the perception of moral wrongdoing, b = −0.47, posterior SD = 0.05,
95% CI = [−0.56, −0.37].
Further, as hypothesized, estimates from Model 3 indicated that even after attempting to
adjust for the effects of standardized Individualizing and Binding values, standardized perceived
moral wrongdoing was estimated to be positively associated with perceived EBEP justification,
b = 1.63, posterior SD = 0.46, 95% CI = [0.70, 2.49]. Thus, the odds of seeing EBEPs as more
justified than a given response level, versus less justified or equal to that level, are 5.10 times
higher given a standard deviation increase in perceived moral wrongdoing. Notably, adjusting
for the effect of perceived moral wrongdoing also substantially decreased the estimated effects of
Binding (b = 0.73, posterior SD = 0.26, 95% CI = [0.24, 1.23]) and Individualizing (b = −0.87,
posterior SD = 0.46, 95% CI = [−1.65, 0.04]) values.
Finally, similar to Study 3, we relied on posterior simulation to test the hypothesis that
perceived moral wrongdoing statistically mediates the association between Binding values and
EBEP justification. Results from this analysis indicated that perceived moral wrongdoing partially
mediated the association between Binding values and perceived EBEP justification (See Appendix
C). Importantly, as in Study 3, we also repeated these analyses while attempting to adjust for
participant political ideology (See C.3 in Appendix C for details). Again, we found that including
political ideology in these analyses did not lead to substantive changes in any of our results.

Together, these results suggest that the degree to which people believe EBEPs are justified is
positively associated with the degree to which they prioritize the Binding values. Further, our
results are also consistent with the hypothesis that the association between the Binding values and
EBEP justification is at least partially mediated by the perceived moral wrongness of outgroup
behavior. However, as the current study focused on EBEPs against Mexican immigrants, it is
not necessarily the case that these effects generalize to other real-world groups. To account for
this, we conducted a final study with a stratified national sample investigating the perceived
justification of EBEPs against Muslims motivated by the moral wrongness of Muslims spreading
"Islamic culture".
3.6 Study 5: Justification of EBEPs against Muslims
Similar to Study 4, in this study we investigate the relationship between Binding values and
the perceived justification of EBEPs against a real-world outgroup. However, here, we focus
on a different outgroup, Muslims, which enables us to evaluate the degree to which the results
from our previous studies generalize to other outgroups. Here, we focus specifically on the moral
wrongness of Muslims' spreading Islamic values given the cultural salience of Muslims as a social
outgroup, recent increases in hate crimes against Muslims (Pew Research Center, 2017), and
because focusing on the spread of Islamic values enabled us to evaluate the Moralized Threat
Hypothesis under conditions of symbolic threat.

Figure 3.5: Estimated association between Binding values and perceived justification of EBEPs
against Mexican immigrants (Study 4) and Muslims (Study 5). Participants who prioritized
Binding values tended, on average, to see EBEPs against both groups as more justified.
To conduct this study, a sample of participants (N = 511) stratified by sex (51% female),
age (10% to 20% per each 5-year bracket ranging from 18 to 65 or older), ethnicity (62% non-
Hispanic White, 17% Hispanic, 13% Black, 7% other), and political affiliation (51% Democrat)
was recruited by Qualtrics. After presenting a series of demographic questions, we measured
participants' Individualizing (α = 0.80, 95% CI = [0.77, 0.82]) and Binding (α = 0.85, 95% CI =
[0.83, 0.87]) values using the MFQ. We then measured perceived moral wrongness using a 6-point
item that asked participants to indicate, "How morally wrong is it for Muslims to spread Islamic
values or laws (e.g. Sharia law) in the US instead of assimilating into American culture?" To
measure the perceived justification of EBEPs against Muslims, we asked participants to imagine a
man named 'Dave' who "believes Muslims are hurting his community." As in the previous studies,
we then asked participants to indicate how "justified" Dave would be in committing each of the
exemplar EBEPs (α = 0.92, 95% CI = [0.90, 0.93]).
3.6.1 Results
To test our hypotheses, we used the same modeling procedure followed in Study 4. First,
we modeled (Model 1) the association between participants' standardized Individualizing and
Binding values and the degree to which they saw EBEPs against Muslims as justified. We then
estimated two additional regression models, one (Model 2) in which the perceived moral wrongness
of Muslims spreading Islamic values was regressed on participants' Individualizing and Binding
values and another (Model 3) in which the perceived justification of EBEPs against Muslims was
regressed on participants' Individualizing and Binding values as well as perceived moral wrongness.
Finally, we used posterior simulations to estimate the degree to which perceived moral wrongness
statistically mediates the effect of Binding values on the perceived justification of EBEPs
against Muslims.
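A posterior-simulation mediation test of this general kind can be sketched as a product-of-coefficients computation over posterior draws: for each draw, the indirect effect is the Binding-to-wrongness path times the wrongness-to-justification path, and the indirect effect is then summarized across draws. The draws below are simulated stand-ins (means and SDs loosely based on reported summaries), not the dissertation's actual posterior samples; the actual procedure is detailed in Appendix C.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in posterior draws for the two mediation paths
a_draws = rng.normal(loc=0.47, scale=0.05, size=4000)  # Binding -> wrongness
b_draws = rng.normal(loc=1.72, scale=0.51, size=4000)  # wrongness -> justification

indirect = a_draws * b_draws                     # per-draw indirect effect
point_estimate = indirect.mean()
ci_low, ci_high = np.quantile(indirect, [0.025, 0.975])
mediation_supported = ci_low > 0                 # 95% interval excludes zero
```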
Consistent with our previous studies, estimates from Model 1 indicated a strong association be-
tween participants' Binding values and the degree to which they believed EBEPs against Muslims
were justified, b = 2.29, posterior SD = 0.35, 95% CI = [1.67, 2.96] (See Figure 3.5). As in Study 4,
we also again observed a negative effect of Individualizing values, b = −1.70, posterior SD = 0.32,
95% CI = [−2.32, −1.10]. Further, estimates from Model 3 showed a positive association be-
tween perceived moral wrongdoing and EBEP justification, b = 1.72, posterior SD = 0.51,
95% CI = [0.75, 2.77]. Adjusting for perceived moral wrongdoing, in Model 3, also led to a
substantial reduction, relative to Model 1, in the magnitude of the effects of Individualizing val-
ues, b = −1.29, posterior SD = 0.34, 95% CI = [−1.89, −0.68], and Binding values, b = 1.48,
posterior SD = 0.35, 95% CI = [0.82, 2.18]. Finally, our mediation analysis indicated that per-
ceived moral wrongness statistically mediates the association between Binding values and EBEP
justification (See Appendix C).
Importantly, these results completely replicated the pattern of effects observed in Study 4.
We observed a strong positive association between participants' Binding values and the degree
to which they believed EBEPs against Muslims were justified. We also found that the degree
to which participants thought it was morally wrong for Muslims to "spread Islamic values" was
not only positively associated with EBEP justification, but also partially mediated the association
between Binding values and EBEP justification. Finally, as in the previous studies, adjusting for
participant political ideology did not lead to substantive changes in any of our results (See C.3 in
Appendix C for details).
Combined with Studies 3 and 4, these results suggest that, at least in our current social
context, people who prioritize group-oriented moral values are more likely to see EBEPs against
an outgroup as justified. Further, it appears that this association may at least partly depend on
the perception that the outgroup has done something morally wrong.
3.7 Conclusions
Taken together, our analyses of the Gab corpus (approximately 24 million posts), 3108 U.S.
counties, and experimental and observational survey data collected from over 1,200 participants
converge on the finding that extreme behavioral expressions of prejudice (behaviors like hate
speech, hate group activity, and hate crime) are tied to people's group-oriented moral values.
Specifically, we found that hate speech tends to be articulated through the language of Binding
values; that hate groups are more prevalent in regions that prioritize Binding values; and that
individuals who place a greater emphasis on Binding values are more likely to believe that hate
speech, harassment, and even assault against outgroup members are justified. Crucially, it
also appears that this association between Binding values and EBEP justification can be at least
partly explained by the belief that an outgroup has done something morally wrong.
While it may seem counter-intuitive to think of a class of behaviors (many of which have
received their own special legal designation as particularly heinous crimes; Hall, 2013)
as moral phenomena, this view is well-grounded in current understandings of the relationship
between morality and acts of extremism or violence (Atran and Ginges, 2012, 2015, Darley, 2009,
Dehghani et al., 2010, Mooijman et al., 2018, Skitka et al., 2017, Zaal et al., 2011). Just as suicide
attacks (Ginges and Atran, 2009, Ginges et al., 2009) and other acts of violence (Fiske et al., 2014,
Rai and Fiske, 2012) can often be better understood as moralized or sacred conflicts, our findings
suggest that acts of hate may often be morally motivated. Notably, in this work, we focused on
acts of hate perpetrated against outgroups that are more often demonized by people who subscribe
to conservative or right-wing ideologies. This raises important questions about the motivations
underlying acts of hate perpetrated by people who subscribe to liberal or left-wing ideologies.
The Moralized Threat Hypothesis predicts that these acts, too, are motivated by perceived moral
violations. However, the specific moral values that these violations are grounded in may differ
from those that ground the moral violations perceived by conservatives. For instance, people
who subscribe to more ideologically liberal belief systems may be more sensitive to violations of
principles of Care or Fairness than they are to violations of the Binding values. At the same time,
it may also be the case that, even among liberals, stronger subscription to group-oriented moral
values is associated with EBEPs toward outgroups.
Given today's digital media environment and its potential for stoking moral outrage (Crockett,
2017) and uniting isolated individuals who share fringe ideologies, understanding these effects
is particularly important. While much research on EBEPs has highlighted the role of specific,
concrete threats (Green and Spry, 2014, Piatkowska et al., 2018), the Moralized Threat Hypothesis
offers a framework for understanding when, where, and why people may engage in EBEPs even
in the absence of an ostensible material threat. This hypothesis suggests that a person does not
necessarily need to fear for their job or safety to engage in or approve of EBEPs; instead, it may be
sufficient for them to simply feel a sense of moral outrage. Importantly, however, understanding
EBEPs as morally motivated responses to perceived violations does not justify or excuse them;
rather, it provides a psychological framework (Atran and Ginges, 2012, Ginges and Atran, 2009,
Ginges et al., 2009, Skitka et al., 2017) for understanding why people engage in such extreme
behaviors.
Bibliography
Jüri Allik, Anu Realo, René Mõttus, Helle Pullmann, Anastasia Trifonova, Robert R McCrae,
and 56 Members of the Russian Character and Personality Survey. Personality traits of
Russians from the observer's perspective. European Journal of Personality, 23(7):567–588, 2009.
A Anthony. Inside the hate-filled echo chamber of racism and conspiracy theo-
ries. theguardian.com, 2016. URL https://www.theguardian.com/media/2016/dec/18/
gab-the-social-network-for-the-alt-right.
Frank Asbrock, Chris G Sibley, and John Duckitt. Right-wing authoritarianism and social dom-
inance orientation and the dimensions of generalized prejudice: A longitudinal test. European
Journal of Personality: Published for the European Association of Personality Psychology, 24
(4):324–340, 2010.
Scott Atran and Jeremy Ginges. Religious and sacred imperatives in human conflict. Science, 336
(6083):855–857, May 2012.
Scott Atran and Jeremy Ginges. Devoted actors and the moral foundations of intractable inter-
group conflict. The Moral Brain, pages 69–85, 2015.
Douglas Bates, Martin Mächler, Ben Bolker, and Steve Walker. Fitting linear mixed-effects models
using lme4. Journal of Statistical Software, 67(1):1–48, 2015. doi: 10.18637/jss.v067.i01.
Lois Beckett. Facebook to ban white nationalism and separatism content. The Guardian, March
2019.
T Benson. Inside the "Twitter for racists": Gab the site where Milo Yiannopou-
los goes to troll now. Salon.com, 2016. URL https://www.salon.com/2016/11/05/
inside-the-twitter-for-racists-gab-the-site-where-milo-yiannopoulos-goes-to-troll-now/.
Rony Blum, Gregory H Stanton, Shira Sagi, and Elihu D Richter. 'Ethnic cleansing' bleaches the
atrocities of genocide. European Journal of Public Health, 18(2):204–209, April 2008.
Rupert Brown. Prejudice: Its Social Psychology. John Wiley & Sons, June 2011.
Paul-Christian Bürkner and Matti Vuorre. Ordinal regression models in psychology: A tutorial.
Advances in Methods and Practices in Psychological Science, page 2515245918823199, February
2019.
M K Buttice and B Highton. How does multilevel regression and poststratification perform with
conventional national surveys? Political Analysis, 21(4), 2013.
Bryan D Byers and James A Jones. The impact of the terrorist attacks of 9/11 on Anti-Islamic
hate crime. Journal of Ethnicity in Criminal Justice, 5(1):43–56, February 2007.
Devin Caughey and Christopher Warshaw. Public opinion in subnational politics. The Journal
of Politics, 81(1):352–363, 2019.
Center for the Study of Hate & Extremism. Hate crimes rise in U.S. cities and counties in time
of division & foreign interference, 2018.
Giene C Charles-Toussaint and H Michael Crowson. Prejudice against international students:
The role of threat perceptions and authoritarian dispositions in U.S. students. The Journal of
Psychology, 144(5):413–428, September 2010.
Christopher Claassen and Richard Traunmüller. Improving and validating survey estimates of reli-
gious demography using Bayesian multilevel models and poststratification. Sociological Methods
& Research, page 0049124118769086, 2018.
Jeanine C Cogan. Hate crime as a crime category worthy of policy attention. The American
Behavioral Scientist, 46(1):173–185, September 2002.
J Christopher Cohrs and Sina Ibler. Authoritarianism, threat, and prejudice: An analysis of
mediation and moderation. Basic and Applied Social Psychology, 31(1):81–94, February 2009.
Ronan Collobert and Jason Weston. A unified architecture for natural language processing: Deep
neural networks with multitask learning. In Proceedings of the 25th International Conference on
Machine Learning, pages 160–167. ACM, 2008.
Kate Conger. Facebook says it is more aggressively enforcing content rules. The New York Times,
May 2019.
Christian S Crandall and Mark Schaller. Social psychology of prejudice: Historical and contem-
porary issues. Lewinian Press, 2005.
M J Crockett. Moral outrage in the digital age. Nature Human Behaviour, 1(11):769–771, November
2017.
John M Darley. Morality in the law: The psychological foundations of citizens' desires to punish
transgressions. Annual Review of Law and Social Science, 5:1–23, 2009.
Morteza Dehghani, Scott Atran, Rumen Iliev, Sonya Sachdeva, Douglas Medin, and Jeremy Gin-
ges. Sacred values and conflict over Iran's nuclear program. Judgment and Decision Making, 5
(7):540, 2010.
Fabio Del Vigna, Andrea Cimino, Felice Dell'Orletta, Marinella Petrocchi, and Maurizio Tesconi.
Hate me, hate me not: Hate speech detection on Facebook. In Proceedings of the First Italian
Conference on Cybersecurity, Venice, Italy, pages 86–95, 2017.
Jean-Claude Deville, Carl-Erik Särndal, and Olivier Sautory. Generalized raking procedures in
survey sampling. Journal of the American Statistical Association, 88(423):1013–1020, 1993.
doi: 10.2307/2290793.
John Duckitt and Chris G Sibley. A dual-process motivational model of ideology, politics, and
prejudice. Psychological Inquiry, 20(2-3):98–109, August 2009.
John Duckitt and Chris G Sibley. The dual process motivational model of ideology and prejudice.
The Cambridge Handbook of the Psychology of Prejudice, pages 188–221, 2017.
John Eligon. Hate crimes increase for the third consecutive year, F.B.I. reports. The New York
Times, November 2018.
Valery Engel, Jean-Yves Camus, Matthew Feldman, William Allchorn, Anna Castriota, Ildikó
Barna, Bulcsú Hunyadi, Patrik Szicherle, Farah Rasmi, Vanja Ljujic, Marcus Rheindorf, Pran-
vera Tika, Katarzyna du Val, Dmitry Stratievsky, Ruslan Bortnik, Anna Luboevich, Ilya
Tarasov, Anna García Juanatei, and Maxim Semenov. Xenophobia, radicalism, and hate
crime in Europe annual report. Technical report, December 2018.
Robert S Erikson, Gerald C Wright, and John P McIver. Statehouse Democracy: Public Opinion
and Policy in the American States. Cambridge University Press, Cambridge, 1993.
Allison Cynthia Fialkowski. SimMultiCorrData: Simulation of Correlated Data with Multiple
Variable Types, 2017. URL https://CRAN.R-project.org/package=SimMultiCorrData. R
package version 0.2.1.
A P Fiske, T S Rai, and S Pinker. Virtuous Violence: Hurting and Killing to Create, Sustain,
End, and Honor Social Relationships. Cambridge University Press, 2014.
Luke Fowler. The states of public opinion on the environment. Environmental Politics, 25(2):
315–337, 2016.
Sheera Frenkel, Mike Isaac, and Kate Conger. On Instagram, 11,696 examples of how hate thrives
on social media. The New York Times, October 2018.
Jeremy A Frimer, Jeremy C Biesanz, Lawrence J Walker, and Callan W MacKinlay. Liberals
and conservatives rely on common moral foundations when making moral judgments about
influential people. Journal of Personality and Social Psychology, 104(6):1040–1059, June 2013.
Gavin Gaffney. Pushshift gab corpus. https://files.pushshift.io/gab/, 2018. Accessed:
2019-5-23.
Andrew Gelman. Struggles with survey weighting and regression modeling. Statistical Science,
22(2):153–164, 2007. doi: 10.1214/088342306000000691.
Andrew Gelman. How Bayesian analysis cracked the red-state, blue-state problem. Statistical
Science, 29(1):26–35, 2014.
Andrew Gelman and Jennifer Hill. Data Analysis Using Regression and Multilevel/Hierarchical
Models. Cambridge University Press, Cambridge, 2006.
Andrew Gelman and Thomas C Little. Poststratification into many categories using hierarchical
logistic regression. Survey Methodology, 23(2):127–135, 1997.
Jeremy Ginges and Scott Atran. What motivates participation in violent political action. Annals
of the New York Academy of Sciences, 1167(1):115–123, 2009.
Jeremy Ginges, Ian Hansen, and Ara Norenzayan. Religion and support for suicide attacks.
Psychological Science, 20(2):224–230, February 2009.
Friedrich M Götz, Tobias Ebert, and Peter J Rentfrow. Regional cultures and the psychological
geography of Switzerland: Person–environment–fit in personality predicts subjective wellbeing.
Frontiers in Psychology, 9:517, 2018.
Government of Canada, Department of Justice, Research, and Statistics Division. Understanding
the community impact of hate crimes: A case study. Victims of Crime Research Digest, is-
sue no. 4. https://www.justice.gc.ca/eng/rp-pr/cj-jp/victim/rd4-rr4/p4.html, June
2011. Accessed: 2019-6-5.
Jesse Graham and Jonathan Haidt. Sacred values and evil adversaries: A moral foundations
approach. The Social Psychology of Morality: Exploring the Causes of Good and Evil, pages 1–18,
2011.
Jesse Graham, Jonathan Haidt, and Brian A Nosek. Liberals and conservatives rely on different
sets of moral foundations. Journal of Personality and Social Psychology, 96(5):1029–1046, 2009.
Jesse Graham, Brian A Nosek, Jonathan Haidt, Ravi Iyer, Spassena Koleva, and Peter H Ditto.
Mapping the moral domain. Journal of Personality and Social Psychology, 101(2):366, 2011.
Jesse Graham, Jonathan Haidt, Sena Koleva, Matt Motyl, Ravi Iyer, Sean P Wojcik, and Peter H
Ditto. Moral foundations theory: The pragmatic validity of moral pluralism. In Advances in
Experimental Social Psychology, volume 47, pages 55–130. Elsevier, 2013.
Clifford Grammich, Kirk Hadaway, Richard Houseal, Dale E Jones, Alexei Krindatch, Richie
Stanley, and Richard H Taylor. US religion census: Religious congregations & membership
study, 2010.
Clifford Anthony Grammich. 2010 US religion census: Religious congregations & membership
study: An enumeration by nation, state, and county based on data reported for 236 religious
groups. Association of Statisticians of American Religious Bodies, 2012.
Donald P Green and Amber D Spry. Hate crime research: Design and measurement strategies for
improving causal inference. Journal of Contemporary Criminal Justice, 30(3):228–246, August
2014.
Anthony G Greenwald, T Andrew Poehlman, Eric Luis Uhlmann, and Mahzarin R Banaji. Un-
derstanding and using the implicit association test: III. Meta-analysis of predictive validity.
Journal of Personality and Social Psychology, 97(1):17–41, 2009a.
Anthony G Greenwald, Colin Tucker Smith, N Sriram, Yoav Bar-Anan, and Brian A Nosek.
Implicit race attitudes predicted vote in the 2008 U.S. presidential election. Analyses of Social
Issues and Public Policy, 9(1):241–253, 2009b.
Nathan Hall. Hate crime. Routledge, 2013.
Emma Hanes and Stephen Machin. Hate crime in the wake of terror attacks: Evidence from 7/7
and 9/11. Journal of Contemporary Criminal Justice, 30(3):247–267, August 2014.
Chris Hanretty, Benjamin E Lauderdale, and Nick Vivyan. Comparing strategies for estimating
constituency opinion from national survey samples. Political Science Research and Methods, 6
(3):1–21, 2016.
Eric Hehman, Jessica K Flake, and Jimmy Calanchini. Disproportionate use of lethal force in
policing is associated with regional racial biases of residents. Social Psychological and Personality
Science, page 1948550617711229, 2017.
Leonhard Held, Birgit Schrödle, and Håvard Rue. Posterior and cross-validatory predictive checks:
A comparison of MCMC and INLA. In Thomas Kneib and Gerhard Tutz, editors, Statistical
Modelling and Regression Structures: Festschrift in Honour of Ludwig Fahrmeir, pages 91–110.
Physica-Verlag HD, Heidelberg, 2010.
Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):
1735–1780, 1997.
D Holt and T M F Smith. Post stratification. Journal of the Royal Statistical Society. Series A,
142(1):33–46, 1979.
J Hoover and M Dehghani. The big, the bad, and the ugly: Geographic estimation with flawed
psychological data. Psychological Methods, 2019.
Joe Hoover, C Zhao, and M Dehghani. MapYourMorals. https://mapyourmorals.usc.edu/#/,
2018.
Joseph Hoover, Kate Johnson-Grey, Morteza Dehghani, and Jesse Graham. Moral values coding
guide. 2017.
Peter D Howe, Matto Mildenberger, Jennifer R Marlon, and Anthony Leiserowitz. Geographic
variation in opinions on climate change at state and local scales in the USA. Nature Climate
Change, 5(6):596–603, 2015.
Kosuke Imai, Luke Keele, and Dustin Tingley. A general approach to causal mediation analysis.
Psychological Methods, 15(4):309–334, December 2010.
Markus Jokela, Wiebke Bleidorn, Michael E Lamb, Samuel D Gosling, and Peter J Rentfrow.
Geographically varying associations between personality and life satisfaction in the London
metropolitan area. PNAS, 112(3):725–730, 2015.
Graham Kalton and Ismael Flores-Cervantes. Weighting methods. Journal of Official Statistics,
19(2):81, 2003.
Jonathan P Kastellec, Jeffrey R Lax, Michael Malecki, and Justin H Phillips. Polarizing the elec-
toral connection: Partisan representation in Supreme Court confirmation politics. The Journal
of Politics, 77(3):787–804, 2015.
B Kennedy, D Kogon, Kris Coombs, Hoover, Park, G Portillo-Wightman, A Mostafazadeh,
M Atari, and M Dehghani. A typology and coding manual for the study of hate-based rhetoric.
2018.
Ben Kiernan. Blood and Soil: A World History of Genocide and Extermination from Sparta to
Darfur. Yale University Press, 2007.
Katherine Krimmel, Jeffrey R Lax, and Justin H Phillips. Gay rights in Congress: Public opinion
and (mis)representation. Public Opinion Quarterly, 80(4):888–913, 2016.
Jeffrey R Lax and Justin H Phillips. How should we estimate public opinion in the states?
American Journal of Political Science, 53(1):107–121, 2009.
Lucas Leemann and Fabio Wasserfallen. Extending the use and prediction precision of subnational
public opinion estimation. American Journal of Political Science, 61(4):1003–1022, 2017. doi:
https://doi.org/10.1111/ajps.12319.
Jordan B Leitner, Eric Hehman, Ozlem Ayduk, and Rodolfo Mendoza-Denton. Blacks' death rate
due to circulatory diseases is positively related to whites' explicit racial bias: A nationwide
investigation using Project Implicit. Psychological Science, 27(10):1299–1311, 2016a.
Jordan B Leitner, Eric Hehman, Ozlem Ayduk, and Rodolfo Mendoza-Denton. Racial bias is
associated with ingroup death rate for blacks and whites: Insights from Project Implicit. Social
Science & Medicine, 170:220–227, 2016b.
Brian G Leroux, Xingye Lei, and Norman Breslow. Estimation of disease rates in small areas: A
new mixed model for spatial dependence. In Statistical Models in Epidemiology, the Environ-
ment, and Clinical Trials, pages 179–191. Springer New York, 2000.
Brian L Levy and Denise L Levy. When love meets hate: The relationship between state policies
on gay and lesbian rights and hate crime incidence. Social Science Research, 61:142–159, January
2017.
Finn Lindgren and Håvard Rue. Bayesian spatial modelling with R-INLA. Journal of Statistical
Software, 63(19), 2015. doi: 10.18637/jss.v063.i19.
R J A Little. Post-stratification: A modeler's perspective. Journal of the American Statistical
Association, 88(423):1001, 1993. doi: 10.1080/01621459.1993.10476368.
Sharon L Lohr. Sampling: Design and Analysis, 2009.
Wesley Lowery, Kimberly Kindy, and Andrew Ba Tran. In the United States, right-wing violence
is on the rise. The Washington Post, November 2018.
Thomas Lumley. Analysis of complex survey samples. Journal of Statistical Software, 9(1):1–19,
2004. R package version 2.2.
Stewart J H McCann. Authoritarianism, conservatism, racial diversity threat, and the state
distribution of hate groups. The Journal of Psychology, 144(1):37–60, January 2010.
Stewart J H McCann. Higher USA state resident neuroticism is associated with lower state
volunteering rates. Personality & Social Psychology Bulletin, 43(12):1659–1674, 2017a. doi:
10.1177/0146167217724802.
Stewart J H McCann. U.S. state resident big five personality and work satisfaction: The impor-
tance of neuroticism. Cross-Cultural Research, 52(2):155–191, 2018.
Stewart J H McCann. The relation of state resident neuroticism levels to state cancer incidence
in the USA. Current Psychology, pages 1–14, 2017b.
Rory McVeigh. Structured ignorance and organized racism in the United States. Social Forces,
82(3):895–936, March 2004.
Rory McVeigh and David Sikkink. Organized racism and the stranger. Sociological Forum, 20(4):
497–522, December 2005.
Richard M Medina, Emily Nicolosi, Simon Brewer, and Andrew M Linke. Geographies of organized
hate in America: A regional analysis. Annals of the Association of American Geographers,
108(4):1006–1021, July 2018.
Joshua Menke and Tony R Martinez. Using permutations instead of Student's t distribution for
p-values in paired-difference algorithm comparisons. In Proceedings of the 2004 IEEE International
Joint Conference on Neural Networks, volume 2, pages 1331–1335. IEEE, 2004.
MIT Election Data and Science Lab. County Presidential Election Returns 2000-2016, 2018. URL
https://doi.org/10.7910/DVN/VOQCHQ.
Marlon Mooijman, Joe Hoover, Ying Lin, Heng Ji, and Morteza Dehghani. Moralization in social
networks and the emergence of violence during protests. Nature Human Behaviour, 2:389–396,
2018.
Christopher Z Mooney. Monte Carlo Simulation. SAGE Publications, 1997.
Barrington Moore. Moral Purity and Persecution in History. Princeton University Press, 2000.
Lincoln Mullen and Jordan Bratt. USAboundaries: Historical and Contemporary Bound-
aries of the United States of America, 2017. URL https://CRAN.R-project.org/package=
USAboundaries. R package version 0.3.0.
David Nirenberg. Communities of Violence: Persecution of Minorities in the Middle Ages -
Updated Edition. Princeton University Press, May 2015.
Martin Obschonka, Michael Stuetzer, David B Audretsch, Peter J Rentfrow, Jeff Potter, and Samuel D Gosling. Macropsychological factors predict regional economic resilience during a major economic crisis. Social Psychological and Personality Science, 7(2):95–104, 2016.
Jacob Orchard and Joseph Price. County-level racial prejudice and the black-white gap in infant health outcomes. Social Science & Medicine, 181:191–198, 2017. doi: 10.1016/j.socscimed.2017.03.036.
David K Park, Andrew Gelman, and Joseph Bafumi. Bayesian multilevel estimation with post-stratification: State-level estimates from national polls. Political Analysis, 12(4):375–385, 2004a.
David K Park, Andrew Gelman, and Joseph Bafumi. Bayesian multilevel estimation with post-stratification: State-level estimates from national polls. Political Analysis, 12(4):375–385, 2004b. doi: 10.1093/pan/mph024.
Jeffrey Pennington, Richard Socher, and Christopher D Manning. GloVe: Global vectors for word representation. Proceedings of Empirical Methods in Natural Language Processing (EMNLP 2014), 12:1532–1543, 2014.
Barbara Perry and Michael Sutton. Seeing red over black and white: Popular and media representations of inter-racial relationships as precursors to racial violence. Canadian Journal of Criminology and Criminal Justice, 48(6):887–904, October 2006.
Barbara Perry and Mike Sutton. Policing the colour line: Violence against those in intimate interracial relationships. Race, Gender & Class, pages 240–261, 2008.
Pew. Demographics of Internet Users. Pew Internet & American Life Project, 2018. http://www.pewinternet.org/fact-sheet/internet-broadband/.
Pew Research Center. Assaults against Muslims in U.S. surpass 2001 level. https://www.pewresearch.org/fact-tank/2017/11/15/assaults-against-muslims-in-u-s-surpass-2001-level/, 2017. Accessed: 2019-6-5.
Sylwia J Piatkowska, Steven F Messner, and Tse-Chuan Yang. Xenophobic and racially motivated crime in Belgium: Exploratory spatial data analysis and spatial regressions of structural covariates. Deviant Behavior, 39(11):1398–1418, November 2018.
James R Rae, Anna-Kaisa Newheiser, and Kristina R Olson. Exposure to racial out-groups and implicit race bias in the United States. Social Psychological and Personality Science, 6(5):535–543, 2015. doi: 10.1177/1948550614567357.
T S Rai and A P Fiske. Beyond harm, intention, and dyads: Relationship regulation, virtuous violence, and metarelational morality. Psychological Inquiry, 2012.
Tage S Rai. Higher self-control predicts engagement in undesirable moralistic aggression. Personality and Individual Differences, 149:152–156, October 2019.
Stephen W Raudenbush and Anthony S Bryk. Hierarchical Linear Models: Applications and Data Analysis Methods, volume 1. Sage, 2002.
Peter J Rentfrow and Markus Jokela. Geographical psychology: The spatial organization of psychological phenomena. Current Directions in Psychological Science, 25(6):393–398, 2016. doi: 10.1177/0963721416658446.
Peter J Rentfrow, Samuel D Gosling, and Jeff Potter. A theory of the emergence, persistence, and expression of geographic variation in psychological characteristics. Perspectives on Psychological Science, 3(5):339–369, 2008.
Peter J Rentfrow, Samuel D Gosling, Markus Jokela, David J Stillwell, Michal Kosinski, and Jeff Potter. Divided we stand: Three psychological regions of the United States and their political, economic, social, and health correlates. Journal of Personality and Social Psychology, 105(6):996–1012, 2013. doi: 10.1037/a0034434.
Peter J Rentfrow, Markus Jokela, and Michael E Lamb. Regional personality differences in Great Britain. PLoS One, 10(3):e0122245, 2015.
Travis Riddle and Stacey Sinclair. Racial disparities in school-based disciplinary actions are associated with county-level rates of racial bias. PNAS, 116(17):8255–8260, 2019.
Andrea Riebler, Sigrunn H Sørbye, Daniel Simpson, and Håvard Rue. An intuitive Bayesian spatial model for disease mapping that accounts for scaling. Statistical Methods in Medical Research, 25(4):1145–1165, August 2016.
Kevin Roose. On Gab, an extremist-friendly site, Pittsburgh shooting suspect aired his hatred in full. The New York Times, October 2018.
Haji Mohammad Saleem, Kelly P Dillon, Susan Benesch, and Derek Ruths. A web of hate: Tackling hateful speech in online social spaces. arXiv preprint arXiv:1709.10159, 2017.
P Selb and S Munzert. Estimating constituency preferences from sparse survey data using auxiliary geographic information. Political Analysis, 2011.
Pete Simi and Robert Futrell. American Swastika: Inside the White Power Movement's Hidden Spaces of Hate. Rowman & Littlefield, 2015.
Linda J Skitka, Brittany E Hanson, and Daniel C Wisneski. Utopian hopes or dystopian fears? Exploring the motivational underpinnings of moralized political engagement. Personality and Social Psychology Bulletin, 43(2):177–190, February 2017.
Mark D Smucker, James Allan, and Ben Carterette. A comparison of statistical significance tests for information retrieval evaluation. In Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management, CIKM '07, pages 623–632, New York, NY, USA, 2007. ACM.
Southern Poverty Law Center. SPLC hate map. https://www.splcenter.org/hate-map, 2019. Accessed: 2019-6-24.
SPLC. Hate groups reach record high. https://www.splcenter.org/news/2019/02/19/hate-groups-reach-record-high, 2019. Accessed: 2019-6-5.
Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958, 2014.
Michele Stacey, Kristin Carbone-López, and Richard Rosenfeld. Demographic change and ethnically motivated crime: The impact of immigration on anti-Hispanic hate crime in the United States. Journal of Contemporary Criminal Justice, 27(3):278–298, August 2011a.
Michele Stacey, Kristin Carbone-López, and Richard Rosenfeld. Demographic change and ethnically motivated crime: The impact of immigration on anti-Hispanic hate crime in the United States. Journal of Contemporary Criminal Justice, 27(3):278–298, August 2011b.
Johan Steen, Tom Loeys, Beatrijs Moerkerke, and Stijn Vansteelandt. Flexible mediation analysis with multiple mediators. American Journal of Epidemiology, 186(2):184–193, July 2017.
Marco R Steenbergen and Bradford S Jones. Modeling multilevel data structures. American Journal of Political Science, 46(1):218–237, 2002. doi: 10.2307/3088424.
W G Stephan and C W Stephan. Intergroup threat theory. In Young Y Kim, editor, The International Encyclopedia of Intercultural Communication, volume 39, pages 1–12. John Wiley & Sons, Inc., Hoboken, NJ, USA, November 2017.
Walter G Stephan and Cookie White Stephan. Predicting prejudice. International Journal of Intercultural Relations, 20(3):409–426, June 1996.
Brendesha M Tynes, Michael T Giang, David R Williams, and Geneene N Thompson. Online racial discrimination and psychological adjustment among adolescents. The Journal of Adolescent Health, 43(6):565–569, December 2008.
United States Department of Justice, Federal Bureau of Investigation. Hate crime statistics, 2017, November 2018. https://ucr.fbi.gov/hate-crime/2017/topic-pages/jurisdiction.
US Census Bureau. 2015 Census Bureau's MAF/TIGER geographic database. https://www.census.gov/geo/maps-data/data/cbf/cbf_counties.html, 2015.
Zehra Valencia, Breyon Williams, and Robert Pettis. Pride and prejudice: Same-sex marriage legalization announcements and LGBT hate crimes. March 2019.
T J VanderWeele and S Vansteelandt. Mediation analysis with multiple mediators. Epidemiologic Methods, 2(1):95–115, January 2014.
Tyler J VanderWeele. Mediation analysis: A practitioner's guide. Annual Review of Public Health, 37:17–32, 2016.
Tyler J VanderWeele, Yun Zhang, and Pilar Lim. Brief report: Mediation analysis with an ordinal outcome. Epidemiology, 27(5):651–655, September 2016.
A Vehtari, T Mononen, V Tolvanen, T Sivula, and others. Bayesian leave-one-out cross-validation approximations for Gaussian latent variable models. The Journal of Machine Learning Research, 2016.
Aki Vehtari, Andrew Gelman, and Jonah Gabry. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27(5):1413–1432, September 2017.
Karina Velasco González, Maykel Verkuyten, Jeroen Weesie, and Edwin Poppe. Prejudice towards Muslims in the Netherlands: Testing integrated threat theory. British Journal of Social Psychology, 47(4):667–685, December 2008.
Kyle Walker. tidycensus: Load US Census Boundary and Attribute Data as 'tidyverse' and 'sf'-Ready Data Frames, 2019. R package version 0.9.2.
Wei Wang, David Rothschild, Sharad Goel, and Andrew Gelman. Forecasting elections with non-representative polls. International Journal of Forecasting, 31(3):980–991, 2015.
Yan Wang, James B Holt, Fang Xu, Xingyou Zhang, Daniel P Dooley, Hua Lu, and Janet B Croft. Using 3 health surveys to compare multilevel models for small area estimation for chronic diseases and health behaviors. Preventing Chronic Disease, 15:E133, 2018.
Rand R Wilcox. Introduction to Robust Estimation and Hypothesis Testing. Academic Press, New York, 2017.
Frank Kaiyuan Xu, Nicole Lofaro, Brian A Nosek, and Anthony G Greenwald. Race IAT 2002-2017. https://osf.io/52qxl/, 2013. Accessed: 2018-9-30.
Maarten P Zaal, Colette Van Laar, Tomas Ståhl, Naomi Ellemers, and Belle Derks. By any means necessary: The effects of regulatory focus and moral conviction on hostile and benevolent forms of collective action. British Journal of Social Psychology, 50(4):670–689, December 2011.
Savvas Zannettou, Barry Bradlyn, Emiliano De Cristofaro, Michael Sirivianos, Gianluca Stringhini, Haewoon Kwak, and Jeremy Blackburn. What is Gab? A bastion of free speech or an alt-right echo chamber? arXiv preprint arXiv:1802.05287, 2018.
Ziqi Zhang, David Robinson, and Jonathan Tepper. Detecting hate speech on Twitter using a convolution-GRU based deep neural network. In European Semantic Web Conference, pages 745–760. Springer, 2018.
Appendix A
A.1 Simple MrsP
To generate synthetic post-stratification joint distributions, Leemann and Wasserfallen (2017) propose two approaches, which they refer to as `simple MrsP' and `MrsP with adjusted synthetic joint distributions'. Under simple MrsP, synthetic joint distributions are calculated merely as the product of the post-stratification variables' marginal distributions. Thus, for a given set of post-stratification variables, the joint distribution for each sub-national unit $k$ is estimated as the product of the post-stratification variables' marginal distributions within that unit. For example, for an arbitrary sub-national unit $k$, the simple synthetic joint distribution of 3-level age and education, $P_{aek}$, can be estimated as the product of their marginal distributions, as shown in Table A.1.
Table A.1: Example of simple MrsP synthetic joint distribution for age and education

            age = 1   age = 2   age = 3
  edu = 1     0.01      0.05      0.04    0.1
  edu = 2     0.08      0.40      0.32    0.8
  edu = 3     0.01      0.05      0.04    0.1
              0.10      0.50      0.40

Note: The marginal probabilities for each level of education and age are shown in the row and column margins, respectively. The simple synthetic joint distribution is shown in the interior cells.
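The product-of-marginals construction in Table A.1 is just an outer product of the two marginal vectors; a minimal numpy sketch, using the marginal values from the table:

```python
import numpy as np

# Marginal distributions within one sub-national unit (values from Table A.1)
edu = np.array([0.1, 0.8, 0.1])   # row margins (education)
age = np.array([0.1, 0.5, 0.4])   # column margins (age)

# Simple MrsP: the synthetic joint is the product of the marginals
joint = np.outer(edu, age)

print(joint)  # rows are education levels, columns are age levels
```

The resulting interior cells match Table A.1, and the matrix sums to 1 because each marginal does.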
It is also worth noting that simple MrsP can be used to extend a known joint distribution to include an additional variable for which only the marginal distribution is known. For example, the joint distribution of age and gender could be extended to include education via the same procedure.

After generating synthetic joint probabilities via MrsP, the estimation procedure is identical to estimation with MrP. That is, sub-national means are calculated as the population-weighted mean of the model predictions for each post-stratification cross-classification. Accordingly, substituting MrsP estimation for MrP estimation is relatively easy and requires little additional domain expertise.
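The post-stratification step described above can be sketched in a few lines (the cell shares below come from Table A.1, flattened row by row; the per-cell model predictions are hypothetical):

```python
import numpy as np

# Synthetic joint distribution for one unit, flattened to cell shares (sums to 1)
P_j = np.array([0.01, 0.05, 0.04, 0.08, 0.40, 0.32, 0.01, 0.05, 0.04])

# Hypothetical model predictions for each cross-classification cell
pred = np.array([0.30, 0.35, 0.40, 0.25, 0.50, 0.55, 0.20, 0.45, 0.60])

# Sub-national estimate: population-weighted mean of the cell predictions
estimate = float(P_j @ pred)
```

Because the weights sum to 1, the estimate is simply a convex combination of the cell-level predictions.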
However, a notable shortcoming of simple MrsP estimation is that the estimated joint distribution will only be correct when the auxiliary variables are independent. As they diverge from independence, the synthetic joint distribution becomes less accurate (Leemann and Wasserfallen, 2017). Given that complete independence is rarely observed, this means that simple synthetic joint estimates will almost always be wrong. However, Leemann and Wasserfallen (2017) find that errors in the synthetic joint distribution do not necessarily induce errors in post-stratified, sub-national estimates.

Specifically, sub-national estimates remain constant, regardless of the synthetic joint distribution, as long as the auxiliary variables are modeled with constant marginal effects. Under these conditions, MrP and simple MrsP yield identical sub-national estimates, even if the synthetic joint distribution estimated in MrsP differs substantially from the true joint distribution used in MrP. However, a biased synthetic joint distribution will yield divergent MrsP estimates when the condition of constant marginal effects is violated.

Thus, for instance, Leemann and Wasserfallen (2017) note that MrsP estimates derived from a probit or logistic response model, both of which have non-constant marginal effects, will diverge from MrP estimates. However, even with complete dependence among auxiliary variables, the magnitude of this divergence is generally negligible. Importantly, divergences can also occur with linear regression response models, but only when the marginal effects of the auxiliary variables are non-constant, which occurs when interaction effects are estimated. In this case, the degree of divergence between MrsP and MrP will vary as a function of the magnitude of the interactions.
A.1.0.1 Adjusted MrsP
However, this does not mean that researchers should avoid estimating interactions among auxiliary variables, as ignoring a meaningful interaction will also reduce estimation accuracy. Instead, when auxiliary variables are correlated and their marginal effects are non-constant, researchers can either accept that some degree of bias will affect simple MrsP's sub-national estimates, or they can rely on the second approach to estimating synthetic joint distributions, which employs an adjustment procedure to refine the synthetic joint distribution.

The goal of this adjustment procedure is to encode any available knowledge about the true joint distribution in the synthetic joint distribution. For example, while the joint distribution of a set of auxiliary variables may not be known at the county level (thus making MrsP necessary for county-level estimation), in many cases it can be estimated at the national or even state level. Such higher-level estimates of the joint distribution can then be used as a baseline or template for estimating the synthetic joint distribution for each sub-national unit.

To generate an adjusted synthetic joint distribution for a given level of sub-national analysis, the following data are required:
1. First, the upper-level auxiliary-variable cross-classification population counts $N_{u,j}$ must be gathered or estimated, where $j$ indexes the cross-classifications of the auxiliary variables and $u$ indexes the upper-level units $u = 1, \dots, U$. In some cases, this joint distribution may be available via census data; however, surveys can also be used to estimate it. For example, a nationally representative survey could be used to estimate the joint distribution for a set of auxiliary variables, such as gender, age, and education. However, the quality of adjusted MrsP estimates will depend on the accuracy of the auxiliary correlations encoded in this data. Accordingly, researchers must be careful in deciding whether a given survey is sufficiently reliable, as using unreliable data to adjust MrsP can produce estimates that are inferior to simple MrsP.
2. Next, at the targeted lower level of analysis, if the cross-classification population proportion of any subset of the auxiliary variables is available, it should be obtained. We represent this subset of the joint distribution as $P_{u[l],k}$, where $u[l]$ indexes the lower-level units $l = 1, \dots, L$ in upper-level unit $u$ and $k$ indexes the cross-classifications of the auxiliary-variable subset. For example, the county-level joint distribution of age x gender x education could be the desired joint distribution, but perhaps only the county-level joint distribution of age x gender is available. In this case, the county-level joint distribution of age x gender, $P_{u[l],k}$ (the population proportion of people who fall into age x gender cross-classification $k$ for county $l$, nested in upper-level unit $u$), should be obtained. If no joint distribution data is available for the targeted level of analysis, $P_{u[l],k}$ will be a marginal distribution for one of the variables in the desired joint distribution. For example, if the county-level joint distribution of age x gender is not available, $P_{u[l],k}$ can be the marginal distribution of age or gender in unit $u[l]$.
3. Finally, at the targeted sub-national level of analysis, the marginal population proportions $P_{u[l],m}$ for each variable $m$ not represented in $P_{u[l],k}$ must be obtained. Thus, if age x gender x education is the desired joint distribution, but only the joint distribution of age x gender is available, the marginal population proportions of education must be obtained in order to construct the synthetic joint distribution.
Given these data, several things can be treated as known. First, the overall or general relationship among the auxiliary variables is encoded in $N_{u,j}$. Second, if $P_{u[l],k}$ is available, the sub-national joint distribution for some subset of the auxiliary variables is known. Finally, for those variables not included in $P_{u[l],k}$, the distributions $P_{u[l],m}$ tell us their marginal population proportions at the targeted sub-national level. The purpose of adjusted MrsP is then to generate a sub-national synthetic joint distribution $P^{MrsP_A}_{u[l],j}$ that accounts for the correlational information encoded in $N_{u,j}$.
To accomplish this, $P_{u[l],m}$ is first used to adjust the marginal distribution of $m$, the variable being added to the joint distribution, in $N_{u,j}$ for each unit $u[l]$ so that it matches $P_{u[l],m}$. In our ongoing example, this means that the marginal distribution of education in $N_{u,j}$ is adjusted to match the known marginal distribution of education in unit $u[l]$, $P_{u[l],m}$. This adjustment is accomplished by transforming $N_{u,j}$ with a correction factor:

$$cf_{u[l],m[i]} = \frac{P_{u[l],m[i]}}{P_{u,m[i]}} \quad (A.1)$$
where $P_{u[l],m[i]}$ is the true marginal population proportion of people that fall into level $i$ of variable $m$ within sub-national unit $u[l]$, and $P_{u,m[i]}$ is an estimate of the same marginal proportion derived from $N_{u,j}$. For example, if $m = \text{education}$, $P_{u,m[i]}$ is the proportion of people with education level $i$ observed in the national-level data, and $P_{u[l],m[i]}$ is the proportion of people with that education level in sub-national unit $u[l]$. Thus, $cf_{u[l],m[i]}$ is simply the ratio of the true proportion of people with education $= i$ to the proportion estimated from the national data.
$N_{u,j}$ is then transformed as follows:

$$N^{adj}_{u[l],k,m[i]} = N_{u,k,m[i]} \cdot cf_{u[l],m[i]} \quad (A.2)$$
where $cf_{u[l],m[i]}$ is the correction factor for level $i$ of variable $m$, $N_{u,k,m[i]}$ is the set of cross-classification population counts in upper-level unit $u$ for which $m = i$, and $N^{adj}_{u[l],k,m[i]}$ is the set of cross-classification population counts for unit $u[l]$ that have been adjusted so that the margin of $m[i]$ is the same as the observed margin in $P_{u[l],m[i]}$. This means that, in our example, $N^{adj}_{u[l],k,m[i]}$ is the adjusted population count for the cross-classification of age x gender x education in unit $u[l]$; the adjustment ensures that the marginal distribution of education matches the known marginal distribution of education in county $u[l]$.
Finally, the adjusted synthetic joint distribution is generated by using $N^{adj}_{u[l],k,m[i]}$ to extend $P_{u[l],k}$, the known cross-classification population proportions:

$$P^{MrsP_A}_{u[l],j} = P_{u[l],k} \cdot \frac{N^{adj}_{u[l],k,m[i]}}{\sum_{i=1}^{I} N^{adj}_{u[l],k,m[i]}} \quad (A.3)$$
where the second right-hand term is the relative weight of $m$'s levels for each cross-classification of $k$, the subset of auxiliary variables for which the joint distribution is known. Specifically, the numerator is simply the adjusted population count of people who fall into cross-classification $j$ (i.e., cross-classification $k$, $m = i$) in sub-national unit $u[l]$, and the denominator is the adjusted population count of people who fall into cross-classification $j$, summed across each level of $m$. This yields $P^{MrsP_A}_{u[l],j}$, the estimated proportion of people in sub-national unit $u[l]$ who fall into cross-classification $j$.
For example, the numerator of the second term in Eq. A.3 represents the adjusted population count for the cross-classification gender x age x education. The denominator, on the other hand, represents the adjusted population count of the cross-classification of gender x age summed across each level of education. $P_{u[l],k}$ represents the proportion of people in unit $u[l]$ who fall into cross-classification $k$ of age x gender. Finally, $P^{MrsP_A}_{u[l],j}$ represents the estimated proportion of people in unit $u[l]$ who fall into cross-classification $j$ of age x gender x education. Accordingly, in contrast to simple MrsP estimates, adjusted MrsP estimates are enhanced by information about the correlational relationship between the auxiliary variables. To generate estimates for an outcome $Y$ in unit $u[l]$, $P^{MrsP_A}_{u[l],j}$ can be substituted for $P_{u[s[c]],j}$ in Eq. 2.3.
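A small worked sketch of the adjustment in Eqs. A.1-A.3, assuming one upper-level unit, four known age x gender cells ($k$), and education ($m$) as the variable being added; all counts and proportions below are hypothetical:

```python
import numpy as np

# Hypothetical national-level counts N_u over (age x gender) x education:
# rows are the four age x gender cells (k), columns the three education levels (m)
N_u = np.array([[100., 200., 100.],
                [150., 150., 100.],
                [ 50., 100., 150.],
                [ 80., 120., 200.]])

# Known county-level joint of age x gender, P_{u[l],k} (sums to 1)
P_k = np.array([0.30, 0.25, 0.20, 0.25])

# Known county-level marginal of education, P_{u[l],m} (sums to 1)
P_m = np.array([0.2, 0.5, 0.3])

# National education marginal implied by N_u
nat_m = N_u.sum(axis=0) / N_u.sum()

# Eq. A.1: correction factor for each education level
cf = P_m / nat_m

# Eq. A.2: adjust the counts so the education margin matches the county
N_adj = N_u * cf  # cf broadcasts across the k rows

# Eq. A.3: synthetic joint = known joint x relative weight of education levels
P_joint = P_k[:, None] * N_adj / N_adj.sum(axis=1, keepdims=True)
```

By construction, the age x gender margin of `P_joint` reproduces `P_k` exactly, while the education margin only approximates `P_m` when the auxiliary variables are correlated.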
This procedure can be repeated multiple times to further extend $P^{MrsP_A}_{u[l],j}$. For example, it could be applied to the adjusted synthetic joint distribution of age x gender x education and the marginal distribution of another variable in order to produce a 4-dimensional joint distribution. However, in some instances there may be no available data on the marginal distribution for a desired variable. In such cases, it is still possible to estimate a synthetic joint distribution. However, rather than relying on marginal distributions to implement adjustments, the marginal distribution for the target variable can be estimated via the full adjusted MrsP procedure. That is, a multinomial response model estimating the proportions of the target variable can be used to make predictions for each level of the variable, and then these predictions can be post-stratified, yielding an estimated marginal distribution for the target variable (Claassen and Traunmüller, 2018, Kastellec et al., 2015). Then, adjusted MrsP can proceed as above. However, while this approach is feasible, its benefit should be weighed against its cost: if a reasonable response model cannot be estimated, then the predicted marginal distribution will be inaccurate and this will negatively affect the accuracy of the final estimation procedure. Accordingly, researchers should carefully evaluate the importance of including a particular variable, keeping in mind that estimation of that variable's marginal distribution may be biased.
Appendix B
B.1 Study 1
B.1.1 Data Generating Process
In this section, we provide detailed information about the data generating process designed for Study 1. For each sub-national unit $l$ in $l = 1, \dots, 40$, a population $n_l$ is sampled from a Poisson distribution with $\lambda = 25{,}000$. This average population size was selected to balance the goals of creating a reasonably large population for each area while also maintaining computational practicality.
Then, for each sub-national unit, marginal population distributions are sampled for three demographic variables, $\gamma^d$. The marginal distributions for the first two of these variables are sampled from the following Dirichlet distributions:

$$\gamma^1 \sim \text{Dirichlet}(\alpha = [10, 10])$$
$$\gamma^2 \sim \text{Dirichlet}(\alpha = [5, 5, 5]) \quad (B.1)$$

where $\alpha$ can be interpreted loosely as sample counts for each level of the Dirichlet-distributed variable. Thus, for example, $\gamma^1$ is a two-level categorical variable with equal weight on both levels. Note, however, that a marginal distribution for $\gamma^d$ is randomly drawn for each sub-national unit. Accordingly, each sub-national unit has its own population proportions for each $\gamma^d$ and these are not necessarily equal.
The marginal distribution for $\gamma^3$ is constructed similarly to the other $\gamma$s; however, there is one key difference. Rather than being drawn from a single Dirichlet distribution, $\gamma^3$ is drawn from a 3-component mixture of Dirichlets. This means that for each $\gamma^3_l$, $\gamma^3$'s marginal population proportion in sub-national unit $l$ will be randomly drawn from one of three different Dirichlet distributions:

$$\text{Dirichlet}(\alpha = [33, 33, 33])$$
$$\text{Dirichlet}(\alpha = [70, 20, 10])$$
$$\text{Dirichlet}(\alpha = [10, 20, 70]) \quad (B.2)$$

with probabilities $p = [.50, .25, .25]$. Accordingly, while 50% of the $\gamma^3_l$s will be drawn from a Dirichlet distribution that places an equal but relatively small weight on each level of the variable, the other 50% of the $\gamma^3_l$s will be drawn from Dirichlet distributions that place a disproportionately high weight on the lowest or highest levels and decreasing weight on the other levels. For example, under this parameterization, for approximately 25% of the sub-national units, the marginal distribution for $\gamma^3$ will have an expected value of $p(\gamma^3_1 = 70\%, \gamma^3_2 = 20\%, \gamma^3_3 = 10\%)$. As such, sampling the sub-national marginal distributions for $\gamma^3$ from a mixture of Dirichlets allows us to simulate variation in the population composition of sub-national units. These conditions were chosen so that we could simulate sub-national units with distinct demographic compositions.
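The mixture draw for $\gamma^3$'s unit-level marginals can be sketched as follows (numpy; the unit count of 40 follows the text, the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Mixture components for the gamma^3 marginal (Eq. B.2) and their weights
alphas = np.array([[33, 33, 33],
                   [70, 20, 10],
                   [10, 20, 70]])
weights = [0.50, 0.25, 0.25]

def sample_gamma3_marginals(n_units):
    """Draw one marginal distribution per sub-national unit from the mixture."""
    components = rng.choice(3, size=n_units, p=weights)
    return np.vstack([rng.dirichlet(alphas[c]) for c in components])

marginals = sample_gamma3_marginals(40)  # one 3-level marginal per unit
```

Each row of `marginals` is a valid probability vector, with roughly half of the units balanced across levels and the rest skewed toward the lowest or highest level.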
Next, we generated demographic characteristics and sub-national contextual-factor values. To do this, the sub-national demographic marginal distributions for each unit $l$, $\gamma^d_l$, were used to randomly assign demographic values (i.e., levels of the $\gamma^d$s) to each simulated respondent $i$ in unit $l$. Specifically, the $\gamma^d_{l[i]}$ values were generated as uncorrelated ordinal variables using the SimMultiCorrData R package, version 0.2.1 (Fialkowski, 2017), which simulates ordinal data with a given correlation structure. This yielded a set of demographic characteristics $[\gamma^1_{l[i]}, \gamma^2_{l[i]}, \gamma^3_{l[i]}]$ for each respondent $i$ in sub-national unit $l$. Finally, a continuous sub-national contextual factor, $\omega_l$, was generated by drawing from a standard Normal distribution.
Standardized values for the outcome variable $Y_{l[i]}$ were generated as:

$$Y_{l[i]} = std\left(\beta_0 + \beta_1\gamma^{1=2}_{l[i]} + \beta_2\gamma^{2=2}_{l[i]} + \beta_3\gamma^{2=3}_{l[i]} + \beta_4\gamma^{3=2}_{l[i]} + \beta_5\gamma^{3=3}_{l[i]} + \beta_6\omega_{l} + \upsilon^{SNU}_{l} + \epsilon_{l[i]}\right) \quad (B.3)$$
where the $\beta_i$, $i = 1, \dots, 5$, are randomly drawn mean differences for each dummy-coded contrast of the demographic variables, distributed as:

$$\beta_1 \sim N(0, 1)$$
$$\beta_2 \sim N(0.25, 1)$$
$$\beta_3 \sim N(0.5, 1)$$
$$\beta_4 \sim N(0, 1)$$
$$\beta_5 \sim N(0.25, 1) \quad (B.4)$$

and $\beta_6$ is a randomly drawn effect for the contextual factor $\omega_l$, distributed as:

$$\beta_6 \sim N(0.3, 1) \quad (B.5)$$
Finally, $\upsilon_l$ and $\epsilon_{l[i]}$ are, respectively, a randomly drawn sub-national unit effect and individual-level error, distributed as:

$$\upsilon \sim N(0, 1)$$
$$\epsilon \sim N(0, 1) \quad (B.6)$$

Accordingly, the simulated effects for the demographic variables and contextual factor were parameterized to be modest on average, but also variable, such that a given effect could be small or large on any given draw.

Finally, so that $Y_{l[i]}$ would be comparable across simulations, the summation of the linear model was standardized to have $\mu = 0$ and $\sigma = 1$.
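A minimal sketch of the outcome-generation step in Eq. B.3 (variable names and the respondent-assignment code are ours; the effect distributions follow Eqs. B.4-B.6, and the intercept drops out after standardization):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_outcome(g1, g2, g3, omega, unit):
    """Sketch of Eq. B.3: dummy-coded demographic contrasts plus a contextual
    effect, a unit effect, and individual error, then standardized."""
    # Contrast effects (Eq. B.4) and contextual-factor effect (Eq. B.5)
    b1, b2, b3 = rng.normal(0, 1), rng.normal(0.25, 1), rng.normal(0.5, 1)
    b4, b5 = rng.normal(0, 1), rng.normal(0.25, 1)
    b6 = rng.normal(0.3, 1)
    u = rng.normal(0, 1, omega.size)   # sub-national unit effects (Eq. B.6)
    eps = rng.normal(0, 1, g1.size)    # individual-level error (Eq. B.6)
    y = (b1 * (g1 == 2) + b2 * (g2 == 2) + b3 * (g2 == 3)
         + b4 * (g3 == 2) + b5 * (g3 == 3)
         + b6 * omega[unit] + u[unit] + eps)
    return (y - y.mean()) / y.std()    # standardize to mean 0, sd 1

# Hypothetical respondents in 40 units
n, L = 5000, 40
unit = rng.integers(0, L, n)
g1 = rng.integers(1, 3, n)   # 2-level demographic
g2 = rng.integers(1, 4, n)   # 3-level demographic
g3 = rng.integers(1, 4, n)   # 3-level demographic
omega = rng.normal(0, 1, L)  # unit-level contextual factor
y = simulate_outcome(g1, g2, g3, omega, unit)
```

The returned vector has mean 0 and standard deviation 1 by construction, matching the standardization described in the text.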
In review, this simulation procedure yields a total population of approximately 1,000,000, with an expected 25,000 occupants per sub-national unit. Further, each occupant is assigned a single level for each of three demographic variables. However, this demographic assignment process is constrained, so that the demographic marginal distributions at the sub-national unit level match the assigned marginal distributions.

For $\gamma^1$, a two-level demographic variable, and $\gamma^2$, a three-level demographic variable, the expected values for the assigned marginal distributions are balanced. In contrast, the expected marginal distribution for $\gamma^3$, a three-level demographic variable, is balanced for 50% of the sub-national units, biased toward one extreme in 25% of the units, and biased toward the other extreme in the remaining 25%. Accordingly, this parameterization yields three different types of sub-national units: those with proportions of occupants who are roughly balanced on $\gamma^3$, those with many more occupants with $\gamma^3 = 1$, and those with many more occupants with $\gamma^3 = 3$. Finally, values for the sub-national unit contextual factor are also generated.

Next, the outcome variable $Y_{l[i]}$ is simulated by randomly sampling effects for the mean differences between demographic levels, the contextual factor, sub-national units, and error. Finally, the outcome is calculated as the standardized linear combination of these effects for each respondent nested in each unit.
B.1.2 Sampling Procedure
Given a simulated population with a specific set of population weights and outcome variable $Y_{l[i]}$, 100 samples are iteratively drawn for each combination of sample size ($S = [1{,}000; 10{,}000; 50{,}000; 100{,}000]$) and $\gamma^3$ response bias ($p(1/3, 1/3, 1/3)$, $p(0.40, 0.35, 0.25)$, and $p(0.70, 0.20, 0.10)$). These levels of response bias were selected to simulate three different conditions of sampling bias. Under the first condition, no response bias, cases assigned each level of $\gamma^3$ are equally likely to be sampled. Under the next condition, moderate response bias, cases assigned the first level of $\gamma^3$ are approximately 1.15 times more likely to respond than cases assigned the second level of $\gamma^3$, and cases assigned the second level of $\gamma^3$ are 1.4 times more likely to respond than cases assigned the third level of $\gamma^3$. Finally, the last condition, high response bias, follows the same logic while making the degree of bias more severe. For instance, in the high response bias condition, cases assigned the first level of $\gamma^3$ are 3.5 times more likely to respond than cases assigned the second level of $\gamma^3$, and cases assigned the second level of $\gamma^3$ are 2 times more likely to respond than cases assigned the third level of $\gamma^3$. These conditions of response bias were selected so that we could investigate the consequences of varying degrees of response bias.
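The level-based response bias can be sketched as weighted sampling without replacement (the population and sample sizes here are illustrative, not the study's):

```python
import numpy as np

rng = np.random.default_rng(0)

# gamma^3 level (1, 2, or 3) for each member of a hypothetical population
g3 = rng.choice([1, 2, 3], size=100_000, p=[1 / 3, 1 / 3, 1 / 3])

# High-response-bias condition: relative response weights per gamma^3 level,
# so level 1 is 3.5x as likely to respond as level 2, and level 2 is 2x level 3
weights_by_level = np.array([0.70, 0.20, 0.10])
w = weights_by_level[g3 - 1]

# Draw a biased sample of respondents
idx = rng.choice(g3.size, size=10_000, replace=False, p=w / w.sum())

# Sample shares per gamma^3 level: level 1 is heavily over-represented
props = np.bincount(g3[idx], minlength=4)[1:] / 10_000
```

With balanced population shares, the sampled shares land near 0.7, 0.2, and 0.1, mirroring the stated response-probability ratios.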
Further, to simulate sub-national unit-level response biases, a response probability $RP_l$ is randomly generated for each unit by drawing from a Beta distribution:

$$RP \sim \text{Beta}(\alpha = 5, \beta = 4{,}000) \quad (B.7)$$

and normalizing the obtained values. This parameterization yields a set of unit-level response probabilities with $\mu \approx 0.0025$, which is roughly equivalent to the expected response probability under a uniform response distribution, and $\sigma \approx 0.001$. Accordingly, under these conditions, 2.5 respondents are expected for every thousand occupants per unit; however, this rate randomly varies from around $\frac{0.5}{1{,}000}$ to $\frac{5}{1{,}000}$, which induces relative sparsity for some units.
Appendix C
C.1 Study 1
C.1.1 Analysis A
C.1.1.1 Results
In Study 1, our primary analysis was conducted with the subsample of messages posted by Gab users with fewer than 500 posts. Specifically, we used a hierarchical logistic regression model to estimate the probability of a message being labeled as hate speech conditional on whether it was also labeled as evoking the Individualizing vices or Binding vices. In this model, the intercept and the effects of the moral vices were permitted to vary across users. Below, we report results from the same model estimated using the entire Gab corpus ($N_{posts}$ = 24,978,951; $N_{users}$ = 236,823).
Table C.1: Effect of the presence of Individualizing and Binding vices on the log-probability of message-level hate speech

Fixed Effects
    Intercept                          5.926 (0.012)
    Individualizing Rhetoric           1.713 (0.009)
    Binding Rhetoric                   3.125 (0.007)
Random Effects
    Intercept (User)                   1.49
    Individualizing Rhetoric (User)    0.66
    Binding Rhetoric (User)            0.65

Note: Estimates reported on the log scale. p < 0.01.
C.1.2 Analysis B
To supplement the primary analysis for Study 1, we also evaluated the association between hate speech and the Binding and Individualizing foundations using an alternative analytic strategy. Specifically, rather than using independent Long Short-Term Memory (LSTM) neural network models (Hochreiter and Schmidhuber, 1997) to detect hate speech and moral rhetoric, we trained two additional LSTMs that included indicators for either the Binding or Individualizing vices as additional predictive features. This allowed us to directly address (1) whether accounting for moral rhetoric improves hate speech detection and (2) whether accounting for rhetoric evoking the Binding vices improves performance more than accounting for the Individualizing vices.
To train these models, we relied on the same data used for our first analysis. Specifically, as in our first analysis, the first model that we developed (see left panel of Figure C.1) was a standard LSTM, trained only to predict whether or not a given post was labeled as hate speech. In contrast, the second and third models were designed to incorporate labels indicating the presence of Binding or Individualizing vices. These labels were represented as a vector of features and concatenated to the LSTM's output vector to predict the hate label.

Because this architecture directly incorporates contextual information about the moral content of a message, it allowed us to test the hypothesis that the semantic spaces of moral concerns and hate overlap. If our hypothesis is wrong (i.e., if hate speech does not rely on the language of the moral vices), the feature-based models should perform worse than or no better than the vanilla LSTM model, because adding irrelevant features should add noise to the information extracted from a sentence. On the other hand, if our hypothesis is correct, the feature-based models should perform better than the basic model, and this would suggest an intersection between hate speech and moral language.
Figure C.1: Vanilla LSTM model structure (left) and feature-based LSTM (right). The feature-based LSTM exploits the input features along with the LSTM output to predict the label.
Similar to our other models, the vanilla and feature-based models both represent posts as matrices of pretrained GloVe word embeddings (Pennington et al., 2014) corresponding to the words in the original post. This embedding matrix is then input to a 100-dimensional LSTM layer, which is connected to a layer of fully connected units with a 0.33 dropout ratio. A softmax transformation is then applied to the output of the final layer in order to generate probabilistic predictions for the outcome. Further, in the feature-based model, the feature set is transformed to a vector and concatenated to the output of the LSTM before generating the predictions. Specifically, the feature-based LSTMs were trained with the Binding and Individualizing moral vices independently. The accuracy of the models was then calculated by training and testing the models in 10 iterations of 10-fold cross-validation.
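The concatenation step described above can be sketched in a few lines. The following is a minimal NumPy illustration of the feature-based classifier head only: the LSTM itself is replaced by a random stand-in for its final hidden state, and the weights, batch size, and feature coding are invented for illustration, not taken from the fitted models.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Stand-ins: a 100-d "LSTM output" for a batch of 4 posts, plus a binary
# feature flagging the presence of Binding-vice rhetoric (hypothetical coding).
lstm_out = rng.normal(size=(4, 100))
vice_feats = np.array([[1.0], [0.0], [1.0], [0.0]])

# Feature-based head: concatenate the vice features to the LSTM output,
# then apply a fully connected layer and a softmax over {not-hate, hate}.
x = np.concatenate([lstm_out, vice_feats], axis=1)   # shape (4, 101)
W = rng.normal(scale=0.1, size=(101, 2))
b = np.zeros(2)
probs = softmax(x @ W + b)

print(probs.shape)        # (4, 2): one probability pair per post
```

In the vanilla model, `x` would simply be `lstm_out`; the only architectural difference is the widened input to the final layer.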
The comparative performance of the feature-based and vanilla LSTM models was consistent with our hypotheses. Specifically, the feature-based model trained on the Binding vices (M_F1 = 0.671, SD_F1 = 0.032) substantially outperformed the vanilla LSTM model, which was trained to predict only hate (M_F1 = 0.64, SD_F1 = 0.031), Yuen's t(117.9) = 6.47, SE = 0.005, 95% CI = [-0.04, -0.02], ξ = 0.64, 95% CI [0.44, 0.77].¹ In contrast, the model trained with the Individualizing vices (M_F1 = 0.645, SD_F1 = 0.034) did not substantively improve performance, Yuen's t(117.4) = 1.90, SE = 0.005, 95% CI = [-0.02, 0.0004], ξ = 0.20, 95% CI = [0, 0.4].
¹ Here, we report Yuen's t because F1 scores are not normally distributed (Menke and Martinez, 2004; Smucker et al., 2007) and Yuen's t is robust to deviations from normality (Wilcox, 2017). Effect sizes for Yuen's t, ξ, are also reported and can be interpreted as similar to Cohen's d, such that ξ of 0.15, 0.35, 0.50 is comparable to d = 0.2, 0.5, 0.8.
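Yuen's t compares trimmed means using winsorized variances. The sketch below implements the standard form of the test (Welch-type degrees of freedom, 20% trimming per tail, the common default); the toy data are invented and are not the studies' actual F1 samples.

```python
import numpy as np
from math import floor

def yuen_t(x, y, trim=0.2):
    """Yuen's t-test for trimmed means of two independent samples (a sketch)."""
    def trimmed_winsorized(a):
        a = np.sort(np.asarray(a, dtype=float))
        n = len(a)
        g = floor(trim * n)
        h = n - 2 * g                        # effective n after trimming
        tmean = a[g:n - g].mean()            # trimmed mean
        w = a.copy()
        w[:g] = a[g]                         # winsorize the lower tail
        w[n - g:] = a[n - g - 1]             # winsorize the upper tail
        swv = w.var(ddof=1)                  # winsorized variance
        d = (n - 1) * swv / (h * (h - 1))    # squared SE contribution
        return tmean, d, h
    m1, d1, h1 = trimmed_winsorized(x)
    m2, d2, h2 = trimmed_winsorized(y)
    t = (m1 - m2) / np.sqrt(d1 + d2)
    df = (d1 + d2) ** 2 / (d1 ** 2 / (h1 - 1) + d2 ** 2 / (h2 - 1))
    return t, df

# Toy F1-like samples: identical samples give t = 0; a shifted copy gives t > 0.
a = np.array([0.60, 0.62, 0.63, 0.64, 0.64, 0.65, 0.66, 0.67, 0.70, 0.90])
t0, _ = yuen_t(a, a)
t1, _ = yuen_t(a + 0.03, a)
print(round(t0, 6), t1 > 0)
```

Trimming makes the test insensitive to the outlier (0.90) that would distort an ordinary t-test on these data.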
Consistent with the results from our primary analysis, these results indicate that hate speech is often articulated via language that evokes the Binding concerns. Specifically, by incorporating information about the presence or absence of language evoking the Binding vices into the representations of Gab posts, we were able to substantively improve our models' capacity for detecting hate speech.
C.2 Study 3
C.2.1 Regression Models
Here we report model estimates for each of the three regression models estimated for Study 3 (see main text for discussion). Specifically, we report mean posterior estimates with Highest Posterior Density intervals for the fixed and random effects estimated via these models (see Table C.2). We also report estimates from models that adjust for individual-level political ideology (M = 2.67, SD = 1.78; see Table C.3). Note, for these models, N = 278, as 10 participants indicated that they either did not know their political ideology or did not have one.
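The Highest Posterior Density intervals reported in the tables below can be computed from posterior draws as the shortest interval containing a given share of the mass. A generic sketch (the normal draws are an invented stand-in for a real posterior sample):

```python
import numpy as np

def hpd_interval(draws, prob=0.95):
    """Shortest interval covering `prob` of the draws (HPD for unimodal posteriors)."""
    d = np.sort(np.asarray(draws, dtype=float))
    n = len(d)
    k = int(np.ceil(prob * n))            # number of draws the interval must contain
    widths = d[k - 1:] - d[: n - k + 1]   # width of every candidate interval
    i = int(np.argmin(widths))            # index of the narrowest candidate
    return d[i], d[i + k - 1]

rng = np.random.default_rng(1)
draws = rng.normal(loc=2.0, scale=0.5, size=20_000)
lo, hi = hpd_interval(draws)
print(lo < 2.0 < hi)  # the interval brackets the true mean here
```

For skewed posteriors (such as the random-effect SDs below, which are bounded at zero), the HPD interval can differ noticeably from the equal-tailed percentile interval.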
Table C.2: Study 3 Model Estimates
Fixed Effects | Model 1 (PMW Std.) | Model 2 (EBEP J.) | Model 3 (EBEP J.)
Intercept: 0.24 [0.40, 0.09]
High Moral Threat: 0.50 [0.27, 0.72] | 1.44 [0.15, 2.78] | 0.28 [0.87, 1.55]
PMW Std. (Model 3): 2.21 [1.48, 2.94]
Intercept[1]: 0.71 [3.55, 4.43] | 0.17 [3.63, 3.76]
Intercept[2]: 2.65 [1.46, 6.56] | 2.10 [1.64, 5.74]
Intercept[3]: 3.65 [0.66, 7.34] | 3.10 [0.76, 6.67]
Intercept[4]: 5.19 [0.89, 8.93] | 4.68 [0.74, 8.19]
Intercept[5]: 6.70 [2.28, 10.34] | 6.25 [2.23, 9.69]
Intercept[6]: 9.14 [4.74, 12.87] | 8.83 [4.98, 12.48]
SD of Random Effects
Intercept (Subject): 3.79 [3.27, 4.37] | 2.99 [2.55, 3.45]
Intercept (EBEP): 4.04 [1.36, 8.12] | 3.78 [1.28, 7.48]
High Moral Threat (EBEP): 0.62 [0.00, 1.76] | 0.71 [0.00, 2.06]
PMW Std (EBEP, Model 3): 0.47 [0.00, 1.29]
Note: 0 outside 95% highest posterior density interval. Estimates for Models 2 and 3 reported on log scale.
Table C.3: Study 3 Model Estimates Adjusted for Ideology
Fixed Effects | Model 1 (PMW Std.) | Model 2 (EBEP J.) | Model 3 (EBEP J.)
Intercept: 0.23 [0.38, 0.08]
High Moral Threat: 0.47 [0.25, 0.68] | 1.48 [0.27, 2.68] | 0.51 [0.72, 1.70]
Ideology Std.: 0.32 [0.21, 0.43] | 1.88 [1.15, 2.64] | 1.22 [0.38, 1.96]
PMW Std. (Model 3): 1.83 [1.03, 2.63]
Intercept[1]: 0.67 [3.68, 4.32] | 0.24 [3.55, 3.67]
Intercept[2]: 2.65 [1.68, 6.31] | 2.21 [1.55, 5.65]
Intercept[3]: 3.65 [0.65, 7.34] | 3.22 [0.56, 6.65]
Intercept[4]: 5.26 [1.00, 9.03] | 4.86 [1.15, 8.39]
Intercept[5]: 6.80 [2.51, 10.57] | 6.46 [2.62, 9.94]
Intercept[6]: 9.23 [4.83, 12.97] | 9.00 [5.13, 12.56]
SD of Random Effects
Intercept (Subject): 3.40 [2.89, 3.93] | 2.85 [2.41, 3.29]
Intercept (EBEP): 4.12 [1.33, 8.06] | 3.85 [1.21, 7.52]
High Moral Threat (EBEP): 0.60 [0.00, 1.73] | 0.66 [0.00, 1.97]
Ideology Std (EBEP): 0.46 [0.00, 1.26] | 0.51 [0.00, 1.41]
PMW Std (EBEP, Model 3): 0.50 [0.00, 1.40]
Note: 0 outside 95% highest posterior density interval. Estimates for Models 2 and 3 reported on log scale. Ideology is coded such that higher values indicate stronger associations with Conservative ideology.
C.2.2 Mediation Analysis
To investigate whether perceived moral wrongdoing statistically mediated the effect of condition, we relied on Bayesian posterior simulation to estimate average mediation effects and average direct effects (Imai et al., 2010; Steen et al., 2017; VanderWeele and Vansteelandt, 2014; VanderWeele et al., 2016). We used this framework because standard approaches to mediation analysis are not appropriate for ordered logistic regression outcome models (VanderWeele et al., 2016) due to their fixed error variance (Imai et al., 2010) and because it offers a holistic approach to mediation estimation for hierarchical models (VanderWeele, 2016).
C.2.2.1 Method
Under this approach, the average mediation effect (AME; i.e., the indirect effect) for a mediator M is estimated via posterior simulations from two Bayesian regression models. In the first of these models, M is regressed on the independent or treatment variable T. In the second of these models, the endogenous dependent variable Y is regressed on both M and T.
Next, the model posteriors are used to generate a set of counterfactual predictions that are used to estimate the AME and average direct effect (ADE). Specifically, the first model is used to simulate N predicted values of M | T = t. For example, this might involve simulating two sets of values for M by drawing 500 values for M | T = control and 500 values for M | T = experimental from the model posterior. Per convention, we represent these sets of simulated values as M_{T=0} and M_{T=1}, respectively, where M_{T=0} represents the conditional posterior distribution of M when treatment equals zero.
Then, M_{T=0} and M_{T=1}, the simulated values generated in the previous step, are used to simulate the expected values of Y | T = t, M = m. Specifically, three sets of values are simulated for Y: Y | T = 0, M = M_{T=1}; Y | T = 1, M = M_{T=0}; and Y | T = 0, M = M_{T=0}. These simulation sets approximate the conditional distribution of Y given that treatment equals t and M equals a plausible value under a given treatment condition. For example, the first set, which we represent as Y_{T=0, M_{T=1}}, approximates the expected distribution of Y where T = 0 but M is set as if T = 1. That is, Y_{T=0, M_{T=1}} estimates the posterior distribution of Y for values of M expected under the treatment condition while setting T to zero. Similarly, the second and third sets, Y_{T=1, M_{T=0}} and Y_{T=0, M_{T=0}}, approximate the posterior distributions for Y when T = 1 but M is set as if T = 0, and for Y when T = 0 and M is set as if T = 0. Thus, in contrast to Y_{T=0, M_{T=1}}, Y_{T=1, M_{T=0}} estimates the posterior distribution of Y under the treatment condition while effectively blocking the effect of M, and Y_{T=0, M_{T=0}} estimates the posterior distribution of Y under the control condition.
Finally, the simulation sets Y_{T=0, M_{T=1}}, Y_{T=1, M_{T=0}}, and Y_{T=0, M_{T=0}} are used to estimate the AME and ADE. Specifically, the posterior distribution of the AME is calculated as Y_{T=0, M_{T=1}} - Y_{T=0, M_{T=0}} and the ADE is calculated as Y_{T=1, M_{T=0}} - Y_{T=0, M_{T=0}}. Accordingly, the AME estimates the counterfactual change in Y that is expected given that the treatment is held constant at control but the mediator is changed as if the treatment was administered. Similarly, the ADE estimates the counterfactual change in Y that is expected given that the treatment is administered, but the mediator is restricted to values consistent with the control condition.
Because this approach to mediation estimation relies on model predictions, concerns about coefficient comparisons across models are irrelevant. Importantly, even for generalized linear models, it also enables direct estimates of mediation effects on the scale of the dependent variable. Thus, combining this approach with ordered logistic regression allows us to estimate AMEs and ADEs for the probabilities of selecting each response level of the dependent variable. Finally, this approach also facilitates estimating AMEs and ADEs while adjusting for covariates. To do this, covariates are included in the regression models and then conditioned on at specific levels during the simulation step.
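The mapping from an ordered-logit model to response-level probabilities works as follows: with ordered cutpoints c_k and linear predictor eta, P(Y <= k) = logistic(c_k - eta), and category probabilities are successive differences of these cumulative probabilities. A sketch (the cutpoints and eta below are invented illustrations, not estimates from the fitted models):

```python
import numpy as np

def ordered_logit_probs(eta, cutpoints):
    """Category probabilities for an ordered logistic model (a sketch)."""
    c = np.asarray(cutpoints, dtype=float)
    cum = 1.0 / (1.0 + np.exp(-(c - eta)))      # P(Y <= k) for k = 1..K-1
    cum = np.concatenate([[0.0], cum, [1.0]])   # pad with P(Y <= 0) = 0 and P(Y <= K) = 1
    return np.diff(cum)                         # P(Y = k) for k = 1..K

# Hypothetical 7-point scale: six ordered cutpoints, one linear-predictor value.
cuts = [-0.2, 2.3, 3.5, 4.7, 6.4, 8.1]
p = ordered_logit_probs(eta=1.5, cutpoints=cuts)
print(len(p))  # 7 category probabilities
```

Applying this transformation to each posterior draw of eta and the cutpoints is what lets the AMEs and ADEs below be expressed as changes in the probability of each response level.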
Using this approach, we estimated AMEs and ADEs for perceived moral wrongdoing (M), experimental condition (T), and EBEP justification (Y) using the two-step process outlined above. Further, we estimated these effects both without and with adjustments for political ideology.
First, for both the low and high moral threat conditions, we used Model 1 to simulate two sets of 500 perceived moral wrongdoing scores, which we represent as M_{T=0} and M_{T=1}, respectively. Then, for each EBEP item j, we used Model 3 to simulate three sets, each consisting of 500 draws, of EBEP justification scores conditional on experimental condition and the simulated moral wrongness scores: Y^j_{T=0, M_{T=1}}, Y^j_{T=1, M_{T=0}}, and Y^j_{T=0, M_{T=0}}. For all posterior draws from Model 3, we conditioned on the random effects of EBEP item and marginalized over the random effects of subject. Thus, for each of the four EBEP items, this process yielded posterior approximations of Y^j_{T=0, M_{T=1}}, Y^j_{T=1, M_{T=0}}, and Y^j_{T=0, M_{T=0}}, such that each simulation set consisted of 500 (simulated moral wrongness scores) x 500 (posterior draws) = 250,000 values.
Finally, we calculated AMEs and ADEs for each of the j in {1, ..., 4} EBEP items, as well as a marginal AME and ADE across all EBEP items, as

AME_j = (1 / N_j) * sum_{i=1}^{N_j} ( Y_{ij, T=0, M_{T=1}} - Y_{ij, T=0, M_{T=0}} )

ADE_j = (1 / N_j) * sum_{i=1}^{N_j} ( Y_{ij, T=1, M_{T=0}} - Y_{ij, T=0, M_{T=0}} )

AME = (1 / sum_{j=1}^{4} N_j) * sum_{j=1}^{4} sum_{i=1}^{N} ( Y_{ij, T=0, M_{T=1}} - Y_{ij, T=0, M_{T=0}} )

ADE = (1 / sum_{j=1}^{4} N_j) * sum_{j=1}^{4} sum_{i=1}^{N} ( Y_{ij, T=1, M_{T=0}} - Y_{ij, T=0, M_{T=0}} ),

where N = 250,000.
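The two-step simulation above can be sketched with a toy linear system. Everything below is invented for illustration: the "posteriors" are independent Gaussian draws over the coefficients of two simple models, M = a*T + e and Y = b*T + c*M, whereas the actual analysis uses hierarchical ordered-logit posteriors.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy "posterior" draws of the coefficients (hypothetical values).
n = 500
a = rng.normal(1.0, 0.05, n)   # effect of treatment on mediator (Model 1)
b = rng.normal(0.5, 0.05, n)   # direct effect of treatment on outcome
c = rng.normal(0.8, 0.05, n)   # effect of mediator on outcome

# Step 1: simulate mediator values under each treatment condition.
M0 = a * 0 + rng.normal(0, 0.1, n)   # M | T = 0
M1 = a * 1 + rng.normal(0, 0.1, n)   # M | T = 1

# Step 2: simulate the three counterfactual outcome sets.
Y_t0_m1 = b * 0 + c * M1             # Y | T = 0, M set as if T = 1
Y_t1_m0 = b * 1 + c * M0             # Y | T = 1, M set as if T = 0
Y_t0_m0 = b * 0 + c * M0             # Y | T = 0, M set as if T = 0

# Step 3: AME and ADE are means of the counterfactual contrasts.
AME = np.mean(Y_t0_m1 - Y_t0_m0)     # should be near a*c = 0.8 here
ADE = np.mean(Y_t1_m0 - Y_t0_m0)     # should be near b = 0.5 here
print(round(AME, 2), round(ADE, 2))
```

In this linear toy case the AME recovers the familiar product-of-coefficients indirect effect; the advantage of the simulation approach is that the same contrasts remain well-defined for nonlinear outcome models, where no such closed form exists.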
C.2.2.2 Results
As noted above, our mediation procedure yielded AME and ADE estimates for the probability of selecting a given EBEP response level for each EBEP item and for a given EBEP response level marginalized across EBEP items. Here, we present all of these estimates, as well as summary estimates that represent the AMEs and ADEs for responses of at least "slightly justified".
Table C.4: Study 3 Mediation Estimates for PMW for EBEP "Slightly Justified"
EBEP AME ADE Total
Facebook 0.03 [0, 0.24] 0.01 [-0.03, 0.08] 0.04 [0, 0.28]
Flyer 0.03 [0, 0.2] 0.01 [-0.02, 0.11] 0.04 [0, 0.26]
Yell 0.03 [0, 0.19] 0.01 [-0.02, 0.07] 0.03 [0, 0.23]
Assault 0.03 [0, 0.21] 0.01 [-0.02, 0.11] 0.04 [0, 0.25]
Marginal 0.03 [0, 0.2] 0.01 [-0.02, 0.08] 0.04 [0, 0.24]
NOTE: Cell values represent posterior means and 95% CIs. Bold entries indicate that the CI does not overlap with zero. Because EBEP is ordinal, mediation is estimated at each level of the variable. Here, however, results are summarized to reflect the indirect, direct, and total effects on the probability of an EBEP being rated as at least "Slightly Justified", marginalized across EBEP items.
Table C.5: Study 3 Mediation Estimates for PMW for EBEP "Slightly Justified", Adjusted for Ideology
EBEP AME ADE Total
Facebook 0.02 [0, 0.19] 0.02 [-0.01, 0.15] 0.04 [0, 0.32]
Flyer 0.03 [0, 0.2] 0.01 [-0.02, 0.13] 0.04 [0, 0.32]
Yell 0.02 [0, 0.17] 0.01 [-0.01, 0.11] 0.03 [0, 0.23]
Assault 0.02 [0, 0.15] 0.01 [-0.01, 0.12] 0.03 [0, 0.23]
Marginal 0.02 [0, 0.18] 0.01 [-0.01, 0.11] 0.03 [0, 0.26]
NOTE: Cell values represent posterior means and 95% CIs. Bold entries indicate that the CI does not overlap with zero. Because EBEP is ordinal, mediation is estimated at each level of the variable. Here, however, results are summarized to reflect the indirect, direct, and total effects on the probability of an EBEP being rated as at least "Slightly Justified", marginalized across EBEP items. All effects were estimated with standardized political ideology set to its mean.
Table C.6: Study 3 Mediation Estimates for PMW at Each Response Level
EBEP Response Level AME ADE Total
Facebook 1 -0.185 [-0.37, -0.01] -0.045 [-0.29, 0.15] -0.23 [-0.56, 0]
2 0.072 [-0.19, 0.23] 0.021 [-0.1, 0.18] 0.092 [-0.24, 0.34]
3 0.041 [-0.08, 0.13] 0.009 [-0.06, 0.09] 0.049 [-0.1, 0.18]
4 0.041 [-0.05, 0.17] 0.008 [-0.05, 0.09] 0.049 [-0.07, 0.22]
5 0.019 [0, 0.14] 0.003 [-0.02, 0.04] 0.023 [0, 0.17]
6 0.01 [0, 0.11] 0.003 [-0.01, 0.02] 0.014 [0, 0.13]
7 0.002 [0, 0.01] 0.001 [0, 0] 0.003 [0, 0.02]
Flyer 1 -0.185 [-0.37, -0.01] -0.047 [-0.29, 0.12] -0.232 [-0.55, 0]
2 0.07 [-0.19, 0.23] 0.018 [-0.13, 0.16] 0.088 [-0.25, 0.33]
3 0.042 [-0.07, 0.12] 0.01 [-0.05, 0.09] 0.052 [-0.09, 0.18]
4 0.044 [-0.01, 0.17] 0.01 [-0.05, 0.12] 0.054 [-0.01, 0.24]
5 0.018 [0, 0.13] 0.004 [-0.02, 0.06] 0.022 [0, 0.16]
6 0.008 [0, 0.07] 0.004 [0, 0.04] 0.011 [0, 0.1]
7 0.003 [0, 0.01] 0.001 [0, 0] 0.004 [0, 0.02]
Yell 1 -0.18 [-0.36, -0.01] -0.046 [-0.29, 0.15] -0.225 [-0.54, 0]
2 0.067 [-0.19, 0.22] 0.016 [-0.12, 0.15] 0.082 [-0.25, 0.31]
3 0.042 [-0.07, 0.13] 0.011 [-0.04, 0.1] 0.053 [-0.08, 0.18]
4 0.044 [0, 0.17] 0.012 [-0.04, 0.11] 0.056 [-0.01, 0.24]
5 0.018 [0, 0.12] 0.005 [-0.01, 0.05] 0.023 [0, 0.15]
6 0.007 [0, 0.06] 0.002 [0, 0.02] 0.01 [0, 0.07]
7 0.002 [0, 0.01] 0 [0, 0] 0.002 [0, 0.01]
Assault 1 -0.185 [-0.37, -0.01] -0.049 [-0.31, 0.16] -0.234 [-0.58, 0]
2 0.063 [-0.18, 0.23] 0.013 [-0.17, 0.16] 0.076 [-0.25, 0.34]
3 0.046 [-0.07, 0.13] 0.013 [-0.06, 0.09] 0.059 [-0.08, 0.19]
4 0.046 [-0.04, 0.17] 0.013 [-0.04, 0.11] 0.06 [-0.03, 0.24]
5 0.019 [0, 0.11] 0.006 [-0.01, 0.08] 0.025 [0, 0.15]
6 0.009 [0, 0.09] 0.003 [0, 0.03] 0.012 [0, 0.11]
7 0.002 [0, 0.01] 0 [0, 0] 0.002 [0, 0.02]
Table C.7: Study 3 Mediation Estimates for PMW at Each Response Level Adjusted for Ideology
EBEP Response Level AME ADE Total
Facebook 1 -0.137 [-0.3, 0] -0.084 [-0.38, 0.1] -0.221 [-0.57, 0]
2 0.057 [-0.15, 0.19] 0.033 [-0.17, 0.22] 0.09 [-0.26, 0.34]
3 0.028 [-0.07, 0.1] 0.017 [-0.05, 0.11] 0.045 [-0.11, 0.18]
4 0.029 [-0.06, 0.13] 0.017 [-0.06, 0.14] 0.046 [-0.1, 0.25]
5 0.014 [0, 0.11] 0.009 [-0.02, 0.09] 0.022 [0, 0.19]
6 0.008 [0, 0.09] 0.007 [0, 0.06] 0.015 [0, 0.17]
7 0.001 [0, 0.01] 0.001 [0, 0.01] 0.002 [0, 0.03]
Flyer 1 -0.139 [-0.3, 0] -0.073 [-0.36, 0.16] -0.212 [-0.54, 0]
2 0.051 [-0.15, 0.19] 0.028 [-0.13, 0.19] 0.08 [-0.24, 0.33]
3 0.031 [-0.07, 0.11] 0.016 [-0.06, 0.11] 0.047 [-0.11, 0.19]
4 0.031 [-0.08, 0.14] 0.016 [-0.07, 0.12] 0.047 [-0.11, 0.22]
5 0.014 [0, 0.11] 0.007 [-0.02, 0.08] 0.021 [0, 0.17]
6 0.009 [0, 0.11] 0.004 [0, 0.05] 0.013 [0, 0.15]
7 0.002 [0, 0.02] 0.002 [0, 0.01] 0.004 [0, 0.02]
Yell 1 -0.145 [-0.3, -0.01] -0.079 [-0.32, 0.1] -0.224 [-0.52, 0]
2 0.057 [-0.15, 0.19] 0.029 [-0.14, 0.19] 0.086 [-0.26, 0.34]
3 0.033 [-0.06, 0.1] 0.018 [-0.04, 0.11] 0.051 [-0.08, 0.18]
4 0.033 [-0.03, 0.14] 0.018 [-0.03, 0.13] 0.051 [-0.05, 0.23]
5 0.014 [0, 0.1] 0.008 [-0.01, 0.07] 0.022 [0, 0.14]
6 0.007 [0, 0.06] 0.004 [0, 0.03] 0.011 [0, 0.09]
7 0.001 [0, 0.01] 0.001 [0, 0] 0.002 [0, 0.01]
Assault 1 -0.14 [-0.3, -0.01] -0.077 [-0.33, 0.09] -0.216 [-0.54, 0]
2 0.054 [-0.15, 0.18] 0.032 [-0.11, 0.18] 0.086 [-0.22, 0.32]
3 0.034 [-0.05, 0.1] 0.017 [-0.05, 0.11] 0.051 [-0.08, 0.18]
4 0.033 [-0.01, 0.14] 0.017 [-0.03, 0.13] 0.05 [-0.01, 0.22]
5 0.013 [0, 0.09] 0.007 [-0.01, 0.08] 0.02 [0, 0.15]
6 0.005 [0, 0.05] 0.003 [0, 0.04] 0.008 [0, 0.08]
7 0.001 [0, 0.01] 0.001 [0, 0] 0.002 [0, 0.01]
NOTE: Cell values represent posterior means and 95% CIs. Bold entries indicate that the CI does not overlap with zero. All effects were estimated with standardized political ideology set to its mean.
Table C.8: Study 4 Model Estimates
Fixed Effects | Model 1 (PMW Std.) | Model 2 (EBEP) | Model 3 (EBEP)
Intercept: 0.00 [0.09, 0.09]
I Values Std.: 0.16 [0.26, 0.06] | 1.15 [1.91, 0.40] | 0.87 [1.70, 0.03]
B Values Std.: 0.47 [0.37, 0.56] | 1.60 [1.10, 2.11] | 0.73 [0.21, 1.19]
PMW Std. (Model 3): 1.63 [0.74, 2.51]
Intercept[1]: 0.20 [3.89, 3.93] | 0.19 [3.69, 3.98]
Intercept[2]: 2.34 [1.82, 6.03] | 2.35 [1.79, 5.89]
Intercept[3]: 3.45 [0.59, 7.26] | 3.47 [0.73, 6.96]
Intercept[4]: 4.70 [0.60, 8.51] | 4.73 [0.54, 8.22]
Intercept[5]: 6.42 [2.16, 10.12] | 6.51 [2.40, 10.13]
Intercept[6]: 8.13 [3.95, 11.96] | 8.28 [4.17, 11.97]
SD of Random Effects
Intercept (Subject): 2.83 [2.43, 3.24] | 2.39 [2.01, 2.75]
Intercept (EBEP): 4.12 [1.43, 8.01] | 3.94 [1.41, 7.78]
I Values Std (EBEP): 0.57 [0.03, 1.45] | 0.63 [0.01, 1.61]
B Values Std (EBEP): 0.21 [0.00, 0.69] | 0.23 [0.00, 0.75]
PMW Std (EBEP, Model 3): 0.61 [0.00, 1.70]
Note: 0 outside 95% highest posterior density interval. Estimates for Models 2 and 3 reported on log scale.
C.3 Study 4
Here we report model estimates for each of the three regression models estimated for Study 4 (see main text for discussion). Specifically, we report mean posterior estimates with Highest Posterior Density intervals for the fixed and random effects estimated via these models (see Table C.8). We also report estimates from models that adjust for individual-level political ideology (M = 2.28, SD = 1.66; see Table C.9). Note, for these models, N = 313, as 11 participants indicated that they either did not know their political ideology or did not have one.
C.3.1 Mediation Results
To investigate whether perceived moral wrongdoing statistically mediated the effect of the Binding values, we used the same approach as in Study 3. Specifically, we estimated AMEs and ADEs for PMW (M) using standardized Binding values as the exogenous treatment variable and perceived EBEP justification as the outcome variable. To evaluate the effects of standardized Binding values, we focus on expected changes in PMW and EBEP justification given a change from 0 to 1 in standardized Binding values (i.e., the difference between being at the mean of Binding values vs. being one standard deviation above the mean). All procedural steps were otherwise identical to those used for the mediation analysis reported for Study 3.
Table C.9: Study 4 Model Estimates Adjusted for Ideology
Fixed Effects | Model 1 (PMW Std.) | Model 2 (EBEP) | Model 3 (EBEP)
Intercept: 0.01 [0.10, 0.09]
I Values Std.: 0.07 [0.18, 0.04] | 0.68 [1.61, 0.33] | 0.59 [1.55, 0.39]
B Values Std.: 0.35 [0.23, 0.46] | 1.08 [0.47, 1.69] | 0.47 [0.15, 1.03]
PMW Std. (Model 3): 1.53 [0.44, 2.57]
Ideology Std.: 0.22 [0.10, 0.35] | 1.13 [0.53, 1.73] | 0.78 [0.18, 1.42]
Intercept[1]: 0.13 [4.13, 3.73] | 0.19 [4.12, 3.65]
Intercept[2]: 2.33 [2.04, 5.84] | 2.41 [1.66, 6.09]
Intercept[3]: 3.50 [0.73, 7.17] | 3.59 [0.44, 7.33]
Intercept[4]: 4.76 [0.44, 8.36] | 4.87 [0.78, 8.57]
Intercept[5]: 6.54 [2.26, 10.25] | 6.72 [2.48, 10.33]
Intercept[6]: 8.47 [4.22, 12.25] | 8.74 [4.50, 12.41]
SD of Random Effects
Intercept (EBEP): 4.25 [1.48, 8.18] | 4.03 [1.53, 7.93]
Intercept (Subject): 2.83 [2.42, 3.26] | 2.44 [2.06, 2.81]
I Values Std (EBEP): 0.69 [0.01, 1.76] | 0.72 [0.03, 1.83]
B Values Std (EBEP): 0.27 [0.00, 0.84] | 0.28 [0.00, 0.89]
PMW Std (EBEP, Model 3): 0.80 [0.00, 2.10]
Ideology Std (EBEP): 0.28 [0.00, 0.87] | 0.34 [0.00, 1.03]
Note: 0 outside 95% highest posterior density interval. Estimates for Models 2 and 3 reported on log scale. Ideology is coded such that higher values indicate stronger associations with Conservative ideology.
Table C.10: Study 4 Mediation Estimates for PMW for EBEP "Slightly Justified"
EBEP AME ADE Total
Facebook 0.03 [0, 0.19] 0.02 [0, 0.18] 0.05 [0, 0.37]
Flyer 0.03 [0, 0.18] 0.02 [0, 0.18] 0.05 [0, 0.34]
Yell 0.03 [0, 0.19] 0.03 [0, 0.17] 0.06 [0, 0.36]
Assault 0.03 [0, 0.18] 0.03 [0, 0.15] 0.05 [0, 0.31]
Marginal 0.03 [0, 0.18] 0.03 [0, 0.17] 0.05 [0, 0.34]
Table C.11: Study 4 Mediation Estimates for PMW for EBEP "Slightly Justified", Adjusted for Ideology
EBEP AME ADE Total
Facebook 0.02 [0, 0.13] 0.02 [0, 0.14] 0.03 [0, 0.26]
Flyer 0.02 [0, 0.13] 0.02 [0, 0.12] 0.03 [0, 0.24]
Yell 0.02 [0, 0.11] 0.01 [0, 0.09] 0.03 [0, 0.17]
Assault 0.01 [0, 0.1] 0.01 [0, 0.09] 0.03 [0, 0.18]
Marginal 0.02 [0, 0.11] 0.01 [0, 0.1] 0.03 [0, 0.19]
Table C.12: Study 4 Mediation Estimates for PMW at Each Response Level
EBEP Response Level AME ADE Total
Facebook 1 -0.122 [-0.25, 0] -0.114 [-0.26, 0] -0.237 [-0.45, -0.01]
2 0.046 [-0.15, 0.16] 0.044 [-0.13, 0.17] 0.089 [-0.27, 0.3]
3 0.028 [-0.07, 0.1] 0.026 [-0.06, 0.09] 0.055 [-0.13, 0.17]
4 0.022 [-0.04, 0.1] 0.02 [-0.04, 0.09] 0.042 [-0.1, 0.17]
5 0.017 [0, 0.11] 0.015 [0, 0.1] 0.032 [0, 0.21]
6 0.006 [0, 0.06] 0.006 [0, 0.07] 0.013 [0, 0.13]
7 0.003 [0, 0.03] 0.003 [0, 0.04] 0.007 [0, 0.08]
Flyer 1 -0.123 [-0.25, 0] -0.12 [-0.25, 0] -0.243 [-0.45, -0.01]
2 0.043 [-0.16, 0.17] 0.043 [-0.14, 0.17] 0.086 [-0.28, 0.31]
3 0.029 [-0.07, 0.09] 0.029 [-0.06, 0.1] 0.058 [-0.13, 0.18]
4 0.024 [-0.03, 0.11] 0.023 [-0.03, 0.1] 0.048 [-0.07, 0.19]
5 0.018 [0, 0.11] 0.017 [0, 0.11] 0.035 [0, 0.21]
6 0.006 [0, 0.05] 0.005 [0, 0.05] 0.011 [0, 0.11]
7 0.003 [0, 0.02] 0.002 [0, 0.02] 0.005 [0, 0.04]
Yell 1 -0.121 [-0.25, 0] -0.113 [-0.25, 0] -0.233 [-0.44, 0]
2 0.037 [-0.15, 0.17] 0.036 [-0.16, 0.16] 0.073 [-0.29, 0.3]
3 0.029 [-0.07, 0.1] 0.027 [-0.06, 0.1] 0.056 [-0.13, 0.18]
4 0.024 [-0.06, 0.1] 0.022 [-0.06, 0.1] 0.046 [-0.12, 0.2]
5 0.018 [0, 0.12] 0.017 [-0.01, 0.12] 0.035 [0, 0.22]
6 0.007 [0, 0.08] 0.006 [0, 0.08] 0.014 [0, 0.15]
7 0.004 [0, 0.05] 0.004 [0, 0.04] 0.009 [0, 0.08]
Assault 1 -0.119 [-0.24, 0] -0.115 [-0.25, 0] -0.235 [-0.45, -0.01]
2 0.037 [-0.16, 0.16] 0.036 [-0.17, 0.17] 0.073 [-0.3, 0.3]
3 0.029 [-0.07, 0.1] 0.029 [-0.05, 0.1] 0.058 [-0.11, 0.19]
4 0.025 [-0.05, 0.11] 0.025 [-0.04, 0.11] 0.05 [-0.08, 0.2]
5 0.018 [0, 0.12] 0.017 [0, 0.1] 0.035 [0, 0.2]
6 0.007 [0, 0.06] 0.006 [0, 0.05] 0.013 [0, 0.11]
7 0.003 [0, 0.02] 0.003 [0, 0.02] 0.006 [0, 0.06]
Table C.13: Study 4 Mediation Estimates for PMW at Each Response Level Adjusted for Ideology
EBEP Response Level AME ADE Total
Facebook 1 -0.085 [-0.2, 0] -0.077 [-0.22, 0.01] -0.162 [-0.36, -0.01]
2 0.033 [-0.11, 0.13] 0.03 [-0.11, 0.14] 0.063 [-0.19, 0.25]
3 0.021 [-0.05, 0.08] 0.018 [-0.05, 0.08] 0.039 [-0.1, 0.14]
4 0.015 [-0.03, 0.07] 0.013 [-0.02, 0.07] 0.028 [-0.06, 0.13]
5 0.011 [0, 0.08] 0.01 [0, 0.09] 0.021 [0, 0.16]
6 0.004 [0, 0.05] 0.004 [0, 0.04] 0.008 [0, 0.1]
7 0.002 [0, 0.01] 0.001 [0, 0.01] 0.003 [0, 0.02]
Flyer 1 -0.084 [-0.2, 0] -0.076 [-0.21, 0.01] -0.161 [-0.36, -0.01]
2 0.032 [-0.11, 0.13] 0.029 [-0.11, 0.14] 0.062 [-0.2, 0.24]
3 0.02 [-0.05, 0.08] 0.018 [-0.05, 0.08] 0.038 [-0.1, 0.14]
4 0.015 [-0.04, 0.07] 0.014 [-0.02, 0.07] 0.028 [-0.05, 0.14]
5 0.011 [0, 0.08] 0.01 [0, 0.08] 0.021 [0, 0.14]
6 0.004 [0, 0.05] 0.004 [0, 0.04] 0.008 [0, 0.08]
7 0.002 [0, 0.02] 0.001 [0, 0.01] 0.003 [0, 0.03]
Yell 1 -0.082 [-0.19, 0] -0.075 [-0.21, 0.02] -0.157 [-0.34, 0]
2 0.029 [-0.12, 0.13] 0.028 [-0.1, 0.14] 0.057 [-0.19, 0.23]
3 0.021 [-0.04, 0.07] 0.02 [-0.03, 0.09] 0.042 [-0.07, 0.14]
4 0.017 [-0.01, 0.08] 0.015 [-0.01, 0.07] 0.031 [-0.01, 0.13]
5 0.01 [0, 0.08] 0.008 [-0.01, 0.06] 0.019 [0, 0.12]
6 0.003 [0, 0.03] 0.003 [0, 0.02] 0.006 [0, 0.04]
7 0.002 [0, 0.01] 0.001 [0, 0] 0.003 [0, 0.01]
Assault 1 -0.084 [-0.19, 0] -0.074 [-0.2, 0] -0.159 [-0.34, -0.01]
2 0.03 [-0.11, 0.13] 0.026 [-0.12, 0.13] 0.056 [-0.2, 0.23]
3 0.023 [-0.04, 0.08] 0.02 [-0.04, 0.08] 0.043 [-0.07, 0.14]
4 0.017 [-0.02, 0.08] 0.016 [-0.01, 0.08] 0.032 [-0.01, 0.14]
5 0.01 [0, 0.07] 0.009 [0, 0.07] 0.019 [0, 0.12]
6 0.003 [0, 0.03] 0.002 [0, 0.02] 0.006 [0, 0.04]
7 0.001 [0, 0.01] 0.001 [0, 0.01] 0.002 [0, 0.01]
Table C.14: Study 5 Model Estimates
Fixed Effects | Model 1 (PMW Std.) | Model 2 (EBEP) | Model 3 (EBEP)
Intercept: 0.00 [0.08, 0.08]
I Values Std.: 0.25 [0.34, 0.17] | 1.70 [2.33, 1.11] | 1.29 [1.86, 0.66]
B Values Std.: 0.44 [0.35, 0.52] | 2.29 [1.65, 2.94] | 1.48 [0.81, 2.16]
PMW Std. (Model 3): 1.72 [0.68, 2.67]
Intercept[1]: 1.40 [2.18, 4.31] | 1.57 [1.62, 4.10]
Intercept[2]: 3.26 [0.46, 6.09] | 3.44 [0.27, 6.00]
Intercept[3]: 4.21 [0.61, 7.18] | 4.42 [1.32, 7.06]
Intercept[4]: 5.80 [2.06, 8.64] | 6.07 [2.90, 8.65]
Intercept[5]: 6.86 [3.25, 9.82] | 7.16 [3.96, 9.77]
Intercept[6]: 8.62 [4.87, 11.58] | 8.97 [5.88, 11.74]
SD of Random Effects
Intercept (Subject): 4.41 [3.90, 4.96] | 4.16 [3.66, 4.65]
Intercept (EBEP): 3.00 [0.84, 6.36] | 2.51 [0.76, 5.39]
I Values Std (EBEP): 0.27 [0.00, 0.77] | 0.31 [0.00, 0.89]
B Values Std (EBEP): 0.20 [0.00, 0.63] | 0.27 [0.00, 0.84]
PMW Std (EBEP, Model 3): 0.70 [0.10, 1.77]
Note: 0 outside 95% highest posterior density interval.
C.4 Study 5
Here we report model estimates for each of the three regression models estimated for Study 5 (see main text for discussion). Specifically, we report mean posterior estimates with Highest Posterior Density intervals for the fixed and random effects estimated via these models (see Table C.14). We also report estimates from models that adjust for individual-level political ideology (M = 2.97, SD = 2.01; see Table C.15). Note, for these models, N = 508, as 3 participants indicated that they either did not know their political ideology or did not have one.
C.4.1 Mediation Results
To investigate whether perceived moral wrongdoing statistically mediated the effect of the Binding values, we used the same approach as in Study 3. Specifically, we estimated AMEs and ADEs for PMW (M) using standardized Binding values as the exogenous treatment variable and perceived EBEP justification as the outcome variable. To evaluate the effects of standardized Binding values, we focus on expected changes in PMW and EBEP justification given a change from 0 to 1 in standardized Binding values (i.e., the difference between being at the mean of Binding values vs. being one standard deviation above the mean). All procedural steps were otherwise identical to those used for the mediation analysis reported for Study 3.
Table C.15: Study 5 Model Estimates Adjusted for Ideology
Fixed Effects | Model 1 (PMW Std.) | Model 2 (EBEP) | Model 3 (EBEP)
Intercept: 0.00 [0.07, 0.07]
I Values Std.: 0.09 [0.17, 0.00] | 1.22 [1.95, 0.43] | 1.08 [1.85, 0.29]
B Values Std.: 0.25 [0.16, 0.34] | 1.70 [1.00, 2.50] | 1.31 [0.58, 2.09]
PMW Std. (Model 3): 1.48 [0.60, 2.44]
Ideology Std.: 0.41 [0.33, 0.50] | 1.29 [0.31, 2.20] | 0.67 [0.14, 1.43]
Intercept[1]: 1.61 [1.75, 4.18] | 1.62 [1.43, 4.36]
Intercept[2]: 3.49 [0.30, 6.26] | 3.51 [0.40, 6.20]
Intercept[3]: 4.47 [1.12, 7.11] | 4.50 [1.27, 7.11]
Intercept[4]: 6.11 [2.81, 8.79] | 6.17 [2.90, 8.74]
Intercept[5]: 7.19 [3.90, 9.88] | 7.27 [4.07, 9.96]
Intercept[6]: 8.96 [5.73, 11.77] | 9.08 [5.99, 11.92]
SD of Random Effects
Intercept (Subject): 4.38 [3.85, 4.91] | 4.19 [3.70, 4.70]
Intercept (EBEP): 2.65 [0.79, 5.63] | 2.52 [0.70, 5.32]
I Values Std (EBEP): 0.43 [0.00, 1.15] | 0.43 [0.00, 1.22]
B Values Std (EBEP): 0.31 [0.00, 0.96] | 0.36 [0.00, 1.08]
PMW Std (EBEP, Model 3): 0.59 [0.00, 1.61]
Ideology Std (EBEP): 0.63 [0.09, 1.62] | 0.48 [0.00, 1.27]
Note: 0 outside 95% highest posterior density interval.
Table C.16: Study 5 Mediation Estimates for PMW for EBEP "Slightly Justified"
EBEP AME ADE Total
Facebook 0.01 [0, 0.07] 0.02 [0, 0.13] 0.03 [0, 0.2]
Flyer 0.01 [0, 0.07] 0.02 [0, 0.15] 0.03 [0, 0.25]
Yell 0.01 [0, 0.05] 0.02 [0, 0.15] 0.03 [0, 0.22]
Assault 0.01 [0, 0.06] 0.02 [0, 0.13] 0.03 [0, 0.21]
Marginal 0.01 [0, 0.06] 0.02 [0, 0.15] 0.03 [0, 0.23]
Table C.17: Study 5 Mediation Estimates for PMW for EBEP "Slightly Justified", Adjusted for Ideology
EBEP AME ADE Total
Facebook 0.003 [0, 0.026] 0.018 [0, 0.159] 0.021 [0.001, 0.174]
Flyer 0.003 [0, 0.029] 0.017 [0, 0.137] 0.02 [0.001, 0.161]
Yell 0.003 [0, 0.02] 0.015 [0, 0.122] 0.018 [0, 0.146]
Assault 0.003 [0, 0.015] 0.014 [0, 0.115] 0.016 [0, 0.126]
Marginal 0.003 [0, 0.02] 0.015 [0, 0.123] 0.018 [0, 0.147]
Table C.18: Study 5 Mediation Estimates for PMW at Each Response Level
EBEP Response Level AME ADE Total
Facebook 1 -0.112 [-0.26, -0.01] -0.246 [-0.45, -0.04] -0.358 [-0.64, -0.05]
2 0.066 [-0.09, 0.16] 0.133 [-0.18, 0.27] 0.2 [-0.27, 0.38]
3 0.022 [0, 0.08] 0.051 [0, 0.15] 0.073 [0, 0.2]
4 0.017 [0, 0.1] 0.042 [0, 0.19] 0.059 [0, 0.27]
5 0.004 [0, 0.03] 0.01 [0, 0.07] 0.014 [0, 0.11]
6 0.002 [0, 0.02] 0.006 [0, 0.04] 0.008 [0, 0.06]
7 0.001 [0, 0] 0.003 [0, 0.01] 0.005 [0, 0.01]
Flyer 1 -0.116 [-0.25, -0.01] -0.252 [-0.44, -0.04] -0.368 [-0.63, -0.08]
2 0.064 [-0.12, 0.16] 0.133 [-0.19, 0.27] 0.198 [-0.29, 0.39]
3 0.023 [0, 0.08] 0.052 [-0.01, 0.13] 0.075 [0, 0.2]
4 0.02 [0, 0.11] 0.045 [0, 0.21] 0.065 [0, 0.3]
5 0.005 [0, 0.04] 0.012 [0, 0.09] 0.017 [0, 0.14]
6 0.003 [0, 0.02] 0.007 [0, 0.05] 0.01 [0, 0.08]
7 0.001 [0, 0.01] 0.002 [0, 0.01] 0.003 [0, 0.02]
Yell 1 -0.111 [-0.26, -0.01] -0.241 [-0.45, -0.04] -0.353 [-0.62, -0.06]
2 0.064 [-0.08, 0.16] 0.131 [-0.19, 0.28] 0.195 [-0.27, 0.39]
3 0.022 [0, 0.08] 0.049 [0, 0.13] 0.071 [0, 0.2]
4 0.018 [0, 0.1] 0.042 [0, 0.19] 0.06 [0, 0.28]
5 0.005 [0, 0.03] 0.011 [0, 0.08] 0.016 [0, 0.13]
6 0.002 [0, 0.01] 0.007 [0, 0.05] 0.009 [0, 0.06]
7 0 [0, 0] 0.002 [0, 0.01] 0.002 [0, 0.01]
Assault 1 -0.115 [-0.25, -0.01] -0.252 [-0.46, -0.04] -0.366 [-0.63, -0.07]
2 0.067 [-0.1, 0.15] 0.138 [-0.16, 0.27] 0.205 [-0.26, 0.38]
3 0.022 [0, 0.08] 0.051 [0, 0.14] 0.073 [0, 0.2]
4 0.018 [0, 0.1] 0.043 [0, 0.2] 0.061 [0, 0.28]
5 0.004 [0, 0.04] 0.01 [0, 0.07] 0.015 [0, 0.11]
6 0.002 [0, 0.02] 0.006 [0, 0.04] 0.009 [0, 0.07]
7 0.001 [0, 0] 0.002 [0, 0.01] 0.003 [0, 0.02]
Table C.19: Study 5 Mediation Estimates for PMW at Each Response Level Adjusted for Ideology
EBEP Response Level AME ADE Total
Facebook 1 -0.051 [-0.13, 0] -0.209 [-0.42, -0.02] -0.259 [-0.49, -0.03]
2 0.032 [-0.04, 0.08] 0.117 [-0.16, 0.27] 0.149 [-0.21, 0.31]
3 0.009 [0, 0.04] 0.041 [-0.03, 0.13] 0.05 [-0.01, 0.16]
4 0.007 [0, 0.04] 0.033 [0, 0.17] 0.04 [0, 0.21]
5 0.002 [0, 0.01] 0.009 [0, 0.08] 0.011 [0, 0.09]
6 0.001 [0, 0.01] 0.006 [0, 0.06] 0.007 [0, 0.06]
7 0 [0, 0] 0.002 [0, 0.01] 0.003 [0, 0.02]
Flyer 1 -0.051 [-0.13, 0] -0.206 [-0.41, -0.03] -0.258 [-0.5, -0.04]
2 0.031 [-0.04, 0.08] 0.116 [-0.18, 0.25] 0.147 [-0.2, 0.3]
3 0.009 [0, 0.04] 0.04 [-0.02, 0.13] 0.049 [-0.01, 0.16]
4 0.007 [0, 0.05] 0.034 [0, 0.18] 0.042 [0, 0.23]
5 0.002 [0, 0.02] 0.009 [0, 0.08] 0.011 [0, 0.1]
6 0.001 [0, 0.01] 0.005 [0, 0.04] 0.006 [0, 0.05]
7 0 [0, 0] 0.002 [0, 0.01] 0.002 [0, 0.01]
Yell 1 -0.051 [-0.13, 0] -0.212 [-0.43, -0.03] -0.263 [-0.53, -0.04]
2 0.031 [-0.04, 0.08] 0.115 [-0.18, 0.27] 0.145 [-0.21, 0.31]
3 0.01 [0, 0.04] 0.044 [0, 0.14] 0.054 [0, 0.16]
4 0.008 [0, 0.05] 0.038 [0, 0.19] 0.046 [0, 0.23]
5 0.002 [0, 0.01] 0.009 [0, 0.07] 0.011 [0, 0.09]
6 0.001 [0, 0.01] 0.005 [0, 0.04] 0.006 [0, 0.05]
7 0 [0, 0] 0.001 [0, 0.01] 0.001 [0, 0.01]
Assault 1 -0.051 [-0.12, 0] -0.208 [-0.41, -0.02] -0.26 [-0.5, -0.03]
2 0.034 [-0.02, 0.08] 0.123 [-0.15, 0.26] 0.156 [-0.15, 0.31]
3 0.009 [0, 0.04] 0.041 [0, 0.13] 0.05 [0, 0.15]
4 0.006 [0, 0.04] 0.031 [0, 0.17] 0.037 [0, 0.2]
5 0.001 [0, 0.01] 0.007 [0, 0.06] 0.008 [0, 0.07]
6 0.001 [0, 0] 0.005 [0, 0.03] 0.006 [0, 0.04]
7 0 [0, 0] 0.002 [0, 0.01] 0.002 [0, 0.01]
Asset Metadata
Creator: Hoover, Joe E. (author)
Core Title: Bound in hatred: a multi-methodological investigation of morally motivated acts of hate
School: College of Letters, Arts and Sciences
Degree: Doctor of Philosophy
Degree Program: Psychology
Publication Date: 07/12/2020
Defense Date: 09/06/2019
Publisher: University of Southern California (original); University of Southern California. Libraries (digital)
Tags: hate, hate crime, moral psychology, moral values, multilevel regression and poststratification, natural language processing, OAI-PMH Harvest, prejudice, small area estimation, violence
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Dehghani, Morteza (committee chair); Barbera, Pablo (committee member); Lai, Hok Chio (committee member); Wood, Wendy (committee member); Yazdiha, Hajar (committee member)
Creator Email: joehoover88@gmail.com, joseph.hoover@kellogg.northwestern.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c89-327341
Unique Identifier: UC11665853
Identifier: etd-HooverJoeE-8666.pdf (filename); usctheses-c89-327341 (legacy record id)
Legacy Identifier: etd-HooverJoeE-8666.pdf
Dmrecord: 327341
Document Type: Dissertation
Rights: Hoover, Joe E.
Type: texts
Source: University of Southern California (contributing entity); University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA