BALANCING PREDICTION AND EXPLANATION IN THE STUDY OF LANGUAGE USAGE
AND SPEAKER ATTRIBUTES
by
Brendan Timothy Kennedy
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)
August 2022
Copyright 2022 Brendan Timothy Kennedy
Dedication
This thesis is dedicated to my parents, Tim and Carol, who supported me in my many years of growth and
education and taught me to love truth and knowledge; and to my wife, Delaram, without whom I would
not have finished this thesis, and to whom I owe all my confidence and persistence.
Acknowledgments
This thesis is disguised as an individual accomplishment, though it is truly a testament to collaboration,
mentorship, and friendship. Each word, thought, technique, insight, and argument is the product of my
own labor interacting with the input and encouragement of others.
I owe the greatest thanks to my advisor, Morteza Dehghani, who believed in me and gave me the
opportunity to grow and produce scientific research. I thank my close friends Aida Mostafazadeh Da-
vani and Mohammad Atari, for their many contributions to my research, my growth, and my life. And I
acknowledge my wife, Delaram, who has kept me sane, focused, and happy through the years.
I thank my collaborators, including Xisen Jin, Xiang Ren, Maury Courtland, Jason Zevin, and Jesse
Graham. I am particularly grateful to Professor Ren for his help in several projects, his teaching, and his
support as a committee member for my dissertation; and to Professor Zevin, for advising me in research,
sharing his humor with me, and for also serving on my committee. I also thank my other committee
members over the course of my PhD, Professors Aiichiro Nakano, Cyrus Shahabi, and Jonathan May, for
their helpful critiques and feedback. I am grateful to my lab mates over the years — Justin Garten, Reihane
Boghrati, Joe Hoover, Leigh Yeh, Ali Omrani, Melissa Reyes, Drew Kogon — for our collaborations, conver-
sations, and time spent together. And finally, I would like to acknowledge my undergraduate professors
and mentors who motivated and inspired me to pursue a postgraduate degree: Professor Paul de Palma
and Father Tim Clancy.
Table of Contents

Dedication  ii
Acknowledgments  iii
List of Tables  vi
List of Figures  viii
Abstract  x
Chapter 1: Introduction  1
Chapter 2: Moral Concerns are Differentially Observable in Language  4
  2.1 Introduction  4
    2.1.1 Moral Foundations Theory  7
    2.1.2 Psychological Insight from Facebook Language  8
    2.1.3 Overview of the Present Work  9
  2.2 Data  10
  2.3 Analysis 1: Predicting Moral Concerns from Language  12
    2.3.1 Text Representation Methods  13
    2.3.2 Regression analysis  17
    2.3.3 Results  18
    2.3.4 Discussion  20
  2.4 Analysis 2: Signatures of Moral Concerns in Language  22
    2.4.1 Method  22
    2.4.2 Results  23
    2.4.3 Discussion  29
  2.5 General Discussion  29
    2.5.1 Limitations  31
  2.6 Conclusion  33
Chapter 3: Contextualizing Hate Speech Classifiers with Post-hoc Explanation  34
  3.1 Introduction  34
  3.2 Related Work  35
  3.3 Data  37
  3.4 Analyzing Group Identifier Bias  37
    3.4.1 Classification Models  38
    3.4.2 Model Interpretation  38
    3.4.3 Bias in Prediction  40
  3.5 Contextualizing Hate Speech Models  41
  3.6 Regularization Experiments  43
    3.6.1 Experiment Details  43
    3.6.2 Results  43
  3.7 Conclusion & Future Work  44
Chapter 4: Exemplar-based Explanations of Speaker-Language Relationships  46
  4.1 Introduction  46
  4.2 Related Work  48
    4.2.1 Speaker-Language Analysis  48
    4.2.2 Multi-Instance Learning  49
    4.2.3 Instance-based Explanation  50
  4.3 Speaker-Attribute Datasets  50
    4.3.1 YourMorals Facebook Dataset  51
    4.3.2 Political Speeches Dataset  51
  4.4 Methodology  52
    4.4.1 Preprocessing Pipeline  52
    4.4.2 Application of MIL Techniques  53
    4.4.3 Non-MIL Baselines  55
    4.4.4 Implementation Details  55
  4.5 Prediction Performance and Model Validation  56
    4.5.1 Prediction Performance  57
    4.5.2 Validating Attention Weights  57
  4.6 Producing Exemplars using MIL Methods  59
    4.6.1 Exemplar Clouds of Moral Concerns  59
    4.6.2 Exemplars from Rep-the-Set  60
    4.6.3 Attention-Clustering for Political Party  62
  4.7 Discussion  63
Chapter 5: Conclusions  65
Bibliography  68
Appendices  80
  A.1 Chapter 2  80
    A.1.1 Analysis 1  80
    A.1.2 Analysis 2  89
  B.2 Chapter 3  91
    B.2.1 Full List of Curated Group Identifiers  91
    B.2.2 Visualizations of Effect of Regularization  91
    B.2.3 Implementation Details  91
    B.2.4 Cross-Domain Performance  92
    B.2.5 Computational Efficiency  92
  C.3 Chapter 4  93
    C.3.1 Attention-Cluster Exemplars  93
List of Tables

2.1  Summary of methods used to extract features for regularized text regression of individual-level moral concerns  17
2.2  For predicting each moral concern, percent variance explained across representation methods  18
3.1  Performance metrics for classifying hate speech in the GHC, with and without explanation regularization  40
3.2  Performance metrics for classifying hate speech in the Stormfront dataset (de Gibert et al., 2018), with and without explanation regularization  41
3.3  Top 20 words by mean SOC weight before (BERT) and after (Reg.) regularization for GHC  42
4.1  Descriptive statistics of the Train (Tr), validation (Val), and Test (Te) partitions for the Facebook-Moral Concerns (YourMorals) and Political Speeches (Speeches) datasets, after preprocessing and instance segmentation. All bags are of equal size (25).  53
4.2  Test set metrics for predicting 5 moral concerns from Facebook posts and for predicting political party from political speeches.  57
4.3  High and low attention instances for one randomly sampled bag in the political speeches test set  58
4.4  Exemplars generated for Purity morality via Rep-the-Set  61
4.5  Exemplars generated for Loyalty morality via Rep-the-Set  61
4.6  Exemplars from the Speeches dataset, generated by clustering high-attention test instances and extracting the centroid instance, per cluster  62
A.1  Average metrics per foundation, across LDA topics reported in main text.  81
A.2  LDA metrics for topics, as indicated by their top 10 words, reported on in the main text.  82
A.3  Two-way ANOVAs (for each foundation) of text representation and aggregation with interaction.  84
A.4  Mean R² (SE) without excluding participants due to number of posts (N = 3,643).  88
A.5  Mean R² (SE) excluding participants with fewer than 25 posts (N = 3,643).  89
B.1  25 group identifiers selected from top weighted words in the TF-IDF BOW linear classifier on the GHC.  90
B.2  10 group identifiers selected for the Stormfront dataset.  90
B.3  Cross domain F1 on Gab, Stormfront (Stf.) datasets  92
B.4  Per epoch training time of different methods on the GHC  93
C.1  Five randomly sampled instances from the cluster containing the exemplar: "the terrible reality for most poor children in america in 2015 is that these simple goals are as out of reach as flying to the moon. but because of the data that we have as a result of no child left behind. african american students improved by 15 points and latino students improved by 21 points."  94
C.2  Ten additional cluster centroid exemplars, generated from a clustering of high-attention instances in the test set, for Republican ground-truth bags.  95
C.3  Ten additional cluster centroid exemplars, generated from a clustering of high-attention instances in the test set, for Democrat ground-truth bags.  96
List of Figures

2.1  Model coefficients corresponding to the effect of each individual-level moral concern on each category in the updated Moral Foundations Dictionary  25
2.2  Model coefficients of the effect of individual-level moral concerns on each category in the Linguistic Inquiry and Word Count (LIWC)  26
2.3  Model coefficients of the effect of individual-level moral concerns on each individual-level topic probability, generated via Latent Dirichlet Allocation (LDA)  28
3.1  Examples showing two different usages of social group identifiers  35
3.2  Visualization of impact of social group identifiers on performance of Bag-of-Words model on hate speech classification  39
3.3  Hierarchical explanations on a test instance from the GHC test set before and after explanation regularization, where false positive predictions are corrected.  41
4.1  Visualization of the Multi-Layer Perceptron with Attention, used for modeling and capturing instance-level relevance  54
4.2  An exemplar cloud showing the most extreme positive instance predictions for predicting Care moral concerns.  59
4.3  An exemplar cloud showing the most extreme positive instance predictions for predicting Loyalty moral concerns.  60
A.1  Results of tuning the number of topics in the LDA model. The metric on the y-axis is the Kullback-Leibler (KL) divergence between the singular value distributions of the Term-Weight matrix (M1) and the normalized Document-Term matrix (M1) (Arun et al., 2010). Here, 300 is selected as the number of topics that minimizes the desired metric.  81
A.2  Wordclouds visualizing the most positively predicted topics per foundation. Distinct colors within each subplot indicate distinct topics. Word font size is proportional to the word-probability within topic.  83
A.3  Estimated marginal means of explained variance, with respect to post aggregation approach.  85
A.4  Estimated marginal means of R²s based on representation technique and foundation  87
A.5  Coefficients for negative binomial models of LIWC categories, which did not meet the threshold for significance.  90
B.1  Hierarchical explanations on a test instance from the NYT dataset where false positive predictions are corrected.  91
B.2  Hierarchical explanations on a test instance from the Gab dataset where both models make correct positive predictions. However, the explanations reveal that only the regularized model is making correct predictions for correct reasons.  91
Abstract
Language has often been considered to be a window into the mind, yet peering through that window
remains an ongoing challenge. Quantifying language for the purpose of psychological insight has gener-
ally been done by using either lexicon-based word counting or latent topic modeling. At the same time,
developments within Natural Language Processing (NLP) and Machine Learning (ML) have shown these
methods to be limited in terms of predictive validity, particularly when compared to methods for repre-
senting text based on neural network architectures. In addition, recent advances in “explainability” have
emphasized not only the predictive advantages of modern approaches, but also the ability to explain and
understand complex models of language.
This thesis investigates how emerging methods for representing text and explaining ML models can
be used for understanding the relationship between speaker attributes and language usage. First, new
evidence concerning the relationship between moral concerns and social media language is presented,
showing that, using text embedding methods, certain moral concerns are more predictable from language
than others. Next, the utility of novel explainable approaches in NLP is demonstrated on a related task,
the reduction of bias in hate speech classifiers. And lastly, a new approach for explaining the relationship
between speaker attributes and language is presented which relies on “exemplars,” key instances that are
particularly relevant to a given speaker attribute.
At a high level, this thesis helps to formalize the prediction and explanation of speaker-language re-
lationships, and motivates further work which will extend the paradigm of exemplar-based explanation.
Predicting speaker attributes from language usage should not only be performed using established tech-
niques such as word counting or topic modeling, but should also be approached as a formal NLP task, with
the goal of building superior predictive models and explaining those models. Using exemplars to under-
stand psychological and demographic attributes of speakers is a new direction for both explainable NLP
and psychological science, and represents a new way for both fields to develop, apply, and evaluate mod-
els of speakers’ language. Altogether, this thesis can serve as a foundation for continued interdisciplinary
research into predicting and explaining speaker-language relationships.
Chapter 1
Introduction
Researchers in the past decade have established correlations between language usage and speaker at-
tributes, such as age, gender, political ideology, and dimensions of personality. These correlations can
be seen in a practical sense as pathways for gaining insight into different online populations — e.g., by
inferring the age and gender of individuals based on their language — but the more general impact of
this category of research is the production of scientific knowledge about how people use language differ-
ently depending on underlying factors. Given that the purpose of language is communication, what these
speaker-language findings tell us is that static psychological and demographic attributes of speakers often
intersect with their communicative activities and interests.
In this thesis, I improve upon earlier works’ approach to the problem of extracting knowledge from
models of speaker-language effects. Specifically, I argue that prior studies of speaker-language effects have
insufficiently utilized the tools of prediction and explanation. Both in terms of increasing the validity of
inferences regarding these effects, and of gaining greater insight into the connection between speaker
and language, I propose a novel framework aimed toward achieving a balance between the practices of
prediction and formal explanation.
The proposed framework is based on three broader ideas, which map onto the three chapters of the
thesis. First, a model with superior predictive performance (e.g., a model based on distributed semantics)
will have superior representation of language, and thus a better, more accurate representation of the under-
lying connection between speaker attributes and language. Next, predictive models are bounded in terms
of scientific usefulness by their ability to be explained; furthermore, emerging methods from Natural Lan-
guage Processing (NLP) and Machine Learning (ML) have enabled a larger class of models to be explained,
including the complex, black box models that are used to represent language. Lastly, instance-level expla-
nations, rather thanfeature-based, offer the clearest route to achieving both prediction and explanation for
speaker-language studies. By determining which particular instances are indicative of a given population,
a new level of insight into the speaker-language relationship can be studied.
In Chapter 2, I perform the novel task of predicting moral concerns from social media status updates. In
performing this task, the relatedness of language and moral concerns at the individual level is established
for the first time. In contrast to prior research, this chapter focuses on the relative predictive signal provided
by a host of methods, including prediction using distributed representations from Transformer-based mod-
els. However, while this work finds Transformer-based methods to be the best-performing methods for
moral concern prediction, a major limitation of these methods is their inherent lack of interpretability.
In Chapter 3, I demonstrate the utility of dedicated explanation algorithms in the context of a related
problem, the measurement and removal of bias in hate speech classification models. Pretrained transformer
models were fine-tuned to predict the hate speech label of texts, and subsequently high-importance words
and phrases were extracted using post hoc explanation. Even while Transformer-based methods were
used, a post hoc explanation approach allowed us to inspect the model for biases, and subsequently reduce
the impact of biased terms in the dataset using explanation regularization.
Finally, Chapter 4 develops a novel explanation method for understanding the speaker-language rela-
tionship. This framework is based on an observation regarding the structure of speaker-language data, and
on an emerging consensus regarding the optimal path toward interpretability. The structure of speaker-
attribute datasets is not that of traditional NLP classification or regression tasks. Instead of words, sen-
tences, or documents being the unit of analysis, speaker-attribute datasets consist of sets of documents,
with each set being paired to a single variable. In my method, I propose to treat this modeling task as a
multi-instance learning (MIL) problem, which explicitly requires representing each particular instance (i.e.,
document) in addition to making speaker-level predictions. Furthermore, explainability research in NLP
and ML has suggested the idea that examples (or “exemplars”) of a given concept convey a better, clearer
explanation than feature-based alternatives (i.e., sets of words or phrases that are correlated with a given
speaker attribute). Together, I use methods from the MIL literature to explicitly model the contribution of
particular instances to predicting speaker attributes, using model outputs to generate “exemplars,” which
are particularly important instances.
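To make the idea concrete, the following minimal sketch (in Python/numpy) shows the attention-pooling mechanism that underlies this kind of MIL model: instance (document) embeddings in a bag are combined using softmax-normalized attention scores, and the highest-attention instances can be read off as candidate exemplars. The parameters here are random placeholders standing in for learned weights, so this illustrates the mechanism rather than the exact model developed in Chapter 4.

```python
import numpy as np

def attention_pool(instance_embeddings, W, v):
    """Pool a bag of instance embeddings into a single bag vector.

    instance_embeddings: (n_instances, dim) array, one row per document.
    W, v: parameters of the attention scorer (learned in a real model).
    Returns the pooled bag representation and per-instance attention weights.
    """
    scores = np.tanh(instance_embeddings @ W) @ v        # one score per instance
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                              # softmax over instances
    bag_vector = weights @ instance_embeddings            # attention-weighted average
    return bag_vector, weights

rng = np.random.default_rng(0)
bag = rng.normal(size=(25, 768))                          # a bag of 25 document embeddings
W, v = rng.normal(size=(768, 64)), rng.normal(size=64)
bag_vector, attention = attention_pool(bag, W, v)
print("candidate exemplar: instance", attention.argmax())
```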
In summary, this thesis provides a new methodological framework for explaining associations between
language usage and individual-level variables. Its core contributions are a case for improving the validity
of inferences from text data using prediction, the case for applying explainability methods and practices
to the study of speaker attributes and language, and the idea that exemplar-based explanations can offer a
new, potentially more useful and informative approach to understanding speaker-language relationships.
Chapter 2
Moral Concerns are Differentially Observable in Language
2.1 Introduction
Language is a fundamental medium for much of the human sciences; it is the “stuff of thought” (Pinker,
2007), the very material with which we communicate, express, teach, remember, and govern. Accordingly,
it has been shown that the words people use convey rich information about their personality traits (G.
Park et al., 2015; H. A. Schwartz et al., 2013), demographic characteristics (Pennebaker et al., 2003), and
mental health (Rodriguez et al., 2010), among other psychological factors (Tausczik & Pennebaker, 2010).
Language is also a window into our moral worlds: through language, humans create and share religious
ideas (i.e., in religious texts), conduct ideological debate (Clifford & Jerit, 2013), and express their personal
values (Boyd et al., 2015). In fact, language, in facilitating communication and expression, is the backdrop
of many measures of moral phenomena: psychologists compile moral linguistic stimuli (Clifford et al., 2015;
Greene et al., 2001; Pennycook et al., 2014), design questionnaires on personal values and moral concerns
(Graham et al., 2011; S. H. Schwartz et al., 2001), and conduct interviews on moral topics (Gilligan, 1977;
Hallen, 2000; Kohlberg, 1981).
Motivated in part by the desire to investigate moral cognition in more ecologically valid ways (Hof-
mann et al., 2014), over the past decade a growing body of research has used recorded language to chart the
moral domain in “the wild” (Dehghani et al., 2016). In considering the interaction between political content
and moral rhetoric online, Grover et al. (2019) investigated immigration policy debates in the United States
on Twitter, finding that pro-immigration and anti-immigration tweets contained differing types of moral
content. Similarly, Mooijman et al. (2018) found, in a large-scale analysis of Twitter posts, that voicing
moral concerns preceded the escalation of violence at a protest.
The sharing of moral language has also drawn attention, with Brady et al. (2017) finding that moral
messages on Twitter were shared at a greater rate than others (cf. Burton et al., 2019). Similarly, recent
work has examined the role of moral framing in political persuasion (Day et al., 2014; Feinberg & Willer,
2015; Voelkel & Feinberg, 2018). For example, Feinberg and Willer found that political arguments framed
according to the audience’s moral concerns were more successful. In all, these recent works on morality
and language have highlighted moral rhetoric in political contexts as well as moral content posted on and
shared via social media.
Despite the obvious ties between morality and language, as well as the recent interest in studying
moral rhetoric and framing, moral language has yet to be investigated directly in relation to individual-
level moral concerns. In previous work, moral language is used as a proxy to individuals’ moral concerns;
in other words, it is implicitly assumed that there is a one-to-one correspondence between composing
moral content and being morally predisposed. Our aim in this work is to empirically test whether this
association exists. Using a wide range of methods, we explore how moral concerns of Care, Fairness,
Loyalty, Authority, and Purity as conceptualized by Moral Foundations Theory (MFT; Graham et al., 2013;
Haidt & Joseph, 2004), and measured by the Moral Foundations Questionnaire (MFQ; Graham et al., 2008),
relate to the usage of moral language. Establishing a link between moral concerns and moral language
would benefit the internal validity of ongoing observational research of moral language.
The interrelatedness of morality and language also points to another untested implicit assumption in
the literature. The previous work in the domain of moral rhetoric makes the implicit assumption that the
only association between moral concerns and language is in moral language. However, moral language is
a narrow “slice” of language as a whole, and, as we have discussed, there is a rich supply of psychological
information that can be found in peoples’ language. In addition to explicit moral rhetoric, language offers
the opportunity to “excavate people’s thoughts, feelings, motivations, and connections with others” (Pen-
nebaker, 2013, p. xi), generating insight into the psychological states of speakers in relation to their moral
concerns. Furthermore, recent theoretical work has argued that language facilitates multiple “moral func-
tions” from a social cognition standpoint (Li & Tomasello, 2021): humans initiate, preserve (i.e., maintain
through generations, justify against alternatives), revise, and act on morality uniquely through language.
As such, it is artificially limiting to suggest that morality occurs in language usage only when individu-
als use explicitly moral words; rather, it is likely that individuals take part in moral debates and topics
by engaging in a wide array of social language. This perspective reinforces the importance of examining
the relation between morality and language usage in general, versus drawing the line at explicitly moral
language.
Even though we make no informed hypotheses with respect to the potential links between moral
concerns and non-moral areas of language, generating such a set of observations presents an opportunity
for an “extensive examination and collection of relevant phenomena and the description of universal or
contingent invariances” (Rozin, 2001, p. 3) in the moral domain, which is a vital but underappreciated
component of theory development in moral psychology (see Muthukrishna & Henrich, 2019).
These untested assumptions — that moral rhetoric is associated with a congruent set of individual-level
moral concerns, and that the connection between moral concerns and language is exclusively through
moral rhetoric — motivate the present research, which measures the individual-level associations between
observed language on social media and moral concerns, as conceptualized via MFT.
2.1.1 Moral Foundations Theory
The majority of the studies at the intersection of moral psychology and language (e.g., Araque et al., 2020;
Mokhberian et al., 2020; Mooijman et al., 2018; Rezapour et al., 2019) rely on MFT as a guiding framework.
MFT provides a predictive and pluralistic view of moral concerns that can facilitate an exploration into how
such concerns are manifested in language. MFT was developed to fill the need for a systematic
theory of morality, explaining its evolutionary origins, developmental aspects, and cultural variations.
MFT can be viewed as an attempt to specify the psychological mechanisms which allow for intuitive bases
of moral judgments as well as moral reasoning. Care, Fairness, Loyalty, Authority, and Purity, according
to MFT, are five “foundations” that are conceptualized to have contributed to solving adaptive problems
over humans’ evolutionary past, and are ubiquitous in current human populations (Graham et al., 2013).
Each of the five foundations in MFT is conceptualized as having solved different adaptive problems in
humans’ evolutionary past. The Care foundation accounts for our nurturing of the young and caring for
the infirm. The Fairness foundation accounts for the development of human cooperation (Purzycki et al.,
2018), justice, and reciprocity. Care and Fairness together are considered ‘Individualizing’ foundations
given their emphasis on the well-being and success of individuals. In contrast, the ‘Binding’ foundations
— Loyalty, Authority, and Purity — account for the evolutionarily developed human pursuits of social
hierarchy, order, and inherent sanctity or holiness (Graham et al., 2011).
The body of work devoted to studying language within MFT has been motivated by a theoretically-
informed text-analytic tool, the Moral Foundations Dictionary (MFD; Graham et al., 2009) which consists of
295 words (and word stems) related to each of the five moral foundations. Words like “peace”, “compassion”,
and “security”, when observed in language, are taken to indicate the speaker’s endorsement or attention
to a particular moral concern (in this case, the Care concern). The taxonomy of MFT — five foundations,
and “vice” and “virtue” dimensions of each (e.g., “justice” and “fair” for Fairness virtue, “injustice” and
“segregate” for Fairness vice) — has additionally been used to inform annotation of moral sentiment in
text (Hoover et al., 2020). MFD has been used in analyzing moral rhetoric and policy debates in social
media (e.g., Garten et al., 2018; Mokhberian et al., 2020) as well as in the language of political elites (e.g.,
S. Y. N. Wang & Inbar, 2020).
Two key limitations of MFD-inspired research, which closely align with the two untested assump-
tions of previous analyses of moral language, further motivate and inform the present work. First, the
MFD, and furthermore the categories of moral sentiment proposed by Hoover et al. (2020), have unknown
psychometric qualities with respect to moral concerns at the individual-level of analysis. As alluded to
above in the discussion of moral language studies, it is unknown whether individual-level differences in
usage of MFD words are associated with individual-level differences in moral concerns. For example, it
is unknown whether using Purity words is associated with an underlying concern with Purity. Second,
previous research at the intersection of morality and language is guided almost exclusively by the MFD,
and by the notion of moral language. Exploratory analysis using other categories of language can open up
the possibility for new insights from language to be generated for understanding moral concerns, such as
affective or social categories of words (Tausczik & Pennebaker, 2010).
2.1.2 Psychological Insight from Facebook Language
Recently, the potential of using digital records of human behavior from online social media has been real-
ized in many psychological research domains. Examples include the discovery that personality dimensions
have been found in Facebook status updates (Garcia & Sikström, 2014; G. Park et al., 2015; H. A. Schwartz
et al., 2013), county-level heart disease mortality in the U.S. has been shown to be predicted by Twitter
language (Eichstaedt, Schwartz, Giorgi, et al., 2018; Eichstaedt et al., 2015), and political elites’ usage of
moral rhetoric on Twitter was related to their relative levels of power, specifically that U.S. Democrats
used more moral language following the election of Republican Donald Trump to the presidency in 2016
(S. Y. N. Wang & Inbar, 2020).
In the present work, Facebook status updates were selected for analysis as a rich, untapped source of
insight into the moral domain. Facebook, the most widely-used online social medium in the United States
(Facebook, 2019), has been argued to have incredible potential for observational research on individuals
in natural contexts (Eichstaedt, Smith, et al., 2018). Though Facebook language is not “everyday” in the
sense that transcribed language recorded by instruments such as the Electronically Activated Recorder
(EAR; Mehl et al., 2001) are, it gives an unobtrusive record of individuals’ communication with others and
their broadcasting of experiences, emotions, and opinions (Garcia & Sikström, 2014). Eichstaedt, Smith,
et al. (2018) give guidelines — both technological and ethical — for the appropriate conduct of Facebook-
based research in the social sciences, and we adopt their approach in order to gather naturally-occurring
language data from Facebook.
2.1.3 Overview of the Present Work
The present work provides the first direct attempt to link individual-level moral concerns and unobtru-
sively observed language, conducting an investigation of specifically moral language as well as an ex-
ploration of language in general. To provide this link, a large sample (n = 2,691) of online participants’
responses to the MFQ (Graham et al., 2008), in addition to their volunteered Facebook status updates (n =
107,798), was collected.
Analysis 1 tests whether individual-level moral concerns can be predicted from language. This Analysis
is a proof-of-concept in two dimensions: first, using dictionary-based techniques, the previously tacitly
assumed link between moral language (i.e., measured via MFD) and individual-level moral concerns is
tested; second, general measures of language, facilitated by leading NLP techniques, are similarly used to
explore the association between language and moral concerns. Analysis 2 is a follow-up exploration of
the particular “signatures”, or “linguistic traces”, of each moral foundation in language. Previous work
has used topic models (G. Park et al., 2015) and dictionaries (Boyd et al., 2015) to identify the correlates of
dimensions of personality and personal values, respectively. The present work uses similar tools to build
models of language measures as a function of individual-level moral concerns. In what follows, we first
present a full description of the data and then Analyses 1 and 2.
2.2 Data
We recruited a sample of participants who voluntarily completed self-report measures on yourmorals.org
and consented to having their Facebook posts accessed for research purposes. This research was reviewed
and approved by the University of Southern California’s Institutional Review Board (UP-07-00393-AM019).
Status updates were collected, with the approval of Facebook, via the Facebook API in a single bulk
retrieval of individuals’ posts at the time they completed the survey. Initially, 4,414 participants completed
the survey, volunteered their Facebook information, and had at least one Facebook post. These participants
were filtered due to being outside the age range of 18–65 (592 participants), a filter similar to that applied
by G. Park et al. (2015). After cleaning these participants’ Facebook posts of hyperlinks, picture links,
and “mentions” using regular expressions and tokenizing text using the Natural Language Toolkit (nltk
version 3.4.4; Loper & Bird, 2002) in Python (3.6.7), 53,901 of the 165,787 posts were removed that were
either too short (less than five tokens) or could not be recognized as English (a necessary step, given that
the dictionaries and pre-processing tools that we use assume English text), using the langdetect
Python library (version 1.0.7). Lastly, participants with fewer than 10 Facebook posts (1,131 participants)
were removed from the study, following G. Park et al. (2015). All analyses were repeated with 25 status
updates as the threshold and with no threshold at all; these results are included in Supplemental Materials.
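As an illustration, the cleaning and filtering steps just described can be sketched as follows (a minimal example; the regular expression, threshold, and function names are illustrative rather than the exact pipeline used here, and nltk's "punkt" tokenizer data is assumed to be installed):

```python
import re
from langdetect import detect
from nltk.tokenize import word_tokenize

URL_OR_MENTION = re.compile(r"https?://\S+|www\.\S+|@\w+")

def clean_and_filter(posts, min_tokens=5):
    """Strip links/mentions, tokenize, and keep posts that are long enough and English."""
    kept = []
    for post in posts:
        text = URL_OR_MENTION.sub(" ", post)
        tokens = word_tokenize(text)
        if len(tokens) < min_tokens:
            continue
        try:
            if detect(text) != "en":
                continue
        except Exception:  # langdetect raises when no language can be detected
            continue
        kept.append(tokens)
    return kept
```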
This filtering process resulted in 2,691 participants. Participants’ last status update ranged in time from
pre-2016 (20 participants), to 2016 (n = 1294), to 2017 (n = 1377), reflecting the data collection period, which
spanned May, 2016 to March, 2017. Participants posted an average of 40 status updates (Mdn = 37, SD =
21.4), with an average of 1,158 words per participant (Mdn = 893, SD = 982.4). The participants’ collected
posts totaled 107,798, averaging 29.0 tokens in length (Mdn = 16.0, SD = 34.8). In the full sample of 2,691,
participants self-reported age (M = 32.8, Mdn = 30.0, SD = 11.9) and sex (Male: 1,535, 57.0%; Female: 1,156,
43.0%). Though we recognize that gender identity is non-binary, participants were only given the oppor-
tunity to identify their binary sex (male or female), which is a limitation of the present work. Participants
reported political ideology/political party and religious identification, though these were characterized by
high numbers of missing values. For the 488 participants with valid religious identification, 390 identified
as Christian/Catholic (79.9%), 66 as Agnostic/Atheist (13.5%), 18 as Jewish (3.7%), and 14 as Other. For the
482 participants with valid political ideology/party, 174 identified as Liberal (37.1%), 113 as Conservative
(23.4%), 80 as Moderate/Independent (16.6%), 74 as Libertarian (15.4%), and 41 as Other.
All participants completed the 30-item MFQ, which consists of two 15-item sections, Relevance and
Judgments. The first section measures the relevance individuals ascribe to each of the foundations, by
asking: “When you decide whether something is right or wrong, to what extent are the following consid-
erations relevant to your thinking?” An example might be “Whether or not someone suffered emotionally”.
Items on the Relevance section are rated along a 6-point Likert-type scale ranging from 0 (Not at all rel-
evant) to 5 (Extremely relevant). The Judgments section consists of contextualized items that can gauge
actual moral judgments related to the five moral foundations. For example, participants judged the accept-
ability of “Chastity is an important and valuable virtue”. Items on the Judgments section are rated along a
6-point Likert-type scale ranging from 0 (Strongly disagree) to 5 (Strongly agree). In total, 6 items (3 each for
Relevance and Judgment) were collected per foundation, where the above examples correspond to Care and
Purity, respectively. The internal consistency coefficients in the present sample were .70, .64, .72, .78, and .87
for Care, Fairness, Loyalty, Authority, and Purity.
2.3 Analysis 1: Predicting Moral Concerns from Language
Despite the growing use of moral language (i.e., moral rhetoric or moral sentiment) in studies of social
media posts and political persuasion, it is unknown whether moral language and individual-level moral
concerns have any meaningful association. Therefore, in this Analysis I test whether individuals’ moral
concerns can be predicted from measures of their usage of explicit moral language. In addition, the po-
tential relationship between individual-level moral concerns and language extends beyond purely moral
language, motivating a general investigation of how language can be used to predict moral concerns. For
both moral language and general language, predicting moral concerns from the respective language-based
measures is used as an indicator of relatedness, which presents stronger evidence than merely showing a
correlation (Yarkoni & Westfall, 2017).
To measure moral language, we rely on dictionaries of moral words, which provide a priori measures
of explicit, word-level indicators of target constructs (Pennebaker et al., 2001) that have been widely used
in previous works to measure moral language (e.g., Araque et al., 2020; Dehghani et al., 2016). There are
known limitations of this approach, however. The MFD was originally used to find that U.S. religious
sermons delivered by liberals and conservatives differed in the amount of moral language used (Graham
et al., 2009), but this initial finding was only partially supported in a multi-study replication effort (Frimer,
2020), casting doubt on the validity of the MFD given its instability across studies. Additionally, the cover-
age of the MFD has been questioned, as it contains relatively few unique words and stems (approximately
32 per category). As such, MFD measures can be sparse and are prone to missing moral words outside
its a priori lexicon. To account for this limitation, the MFD2 (204 words per category) was developed
(Frimer et al., 2019), and is used in the present analysis in conjunction with the original MFD. However,
we note that measuring morality in language is a fundamentally challenging task, made so by its complex
linguistic nature — for example, individuals can say moral things without using explicitly moral words.
Previous research has yet to directly measure the external validity of MFD-based measures with respect
to individual-level moral concerns, which is addressed by the present work.
Beyond the strictly moral domain, our analysis explores other, more general ways that moral concerns
are observed in language. To measure language in general, we rely on a variety of techniques, including
the established lexicon-based approach of the Linguistic Inquiry and Word Count dictionaries (LIWC;
Pennebaker et al., 2001) as well as statistical methods from NLP. These more advanced techniques are based
on methodological progress in recent years that allows more accurate measurement of language. Each of
the 5 MFQ scores was regressed, in turn, on each of these language representations, with regularization
and cross-validation.
2.3.1 Text Representation Methods
2.3.1.1 Dictionaries
Three dictionaries were used in this analysis: the MFD, MFD2, and LIWC’s dictionary. Both the MFD
and MFD2 have ten categories: the “virtue” and “vice” words for each foundation (indicating opposite
polarities for each). MFD additionally has a “General Morality” category (e.g., “wrong”, “evil”, “good”).
The MFD was created by Graham et al. (2009) by first compiling lists of prototypical words, along with
synonyms, related terms, and antonyms, associated with each category and refining these lists by removing
unrelated words, while the expanded MFD2 was created by Frimer et al. (2019)
2
with the aid of word
embeddings (Mikolov et al., 2013). Both the MFD and the MFD2 are domain-specific dictionaries that
measure a narrow semantic range of words, and are used here to measure the predictiveness of moral
language with respect to individual-level moral concerns. As a more general, a priori measure of language,
the default dictionary in the Linguistic Inquiry and Word Count 2015 (LIWC; Pennebaker et al., 2015) was
used. The LIWC dictionary contains an exhaustive set of word categories, including function words
(e.g., personal pronouns, prepositions), grammar (e.g., verbs, past tense), and psychological processes (e.g.,
cognitive processes, emotion words, social words). In all, 73 lower-level categories are in the hierarchical
taxonomy of LIWC (i.e., lower-level categories such as “positive emotions” are nested within “Affective
Processes”). Each category in MFD, MFD2, and LIWC was applied in the standard way (Pennebaker et al.,
2001), producing a rate of occurrence per category corresponding to the total number of occurrences of
a word in a category divided by the total number of words contained in a given participant’s posts (see
Table 2.1).
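As a sketch of this standard word-count scoring, the following minimal example counts category words and normalizes by the total number of words in a participant's posts (the toy dictionary is illustrative, and the stem/wildcard matching used by the actual MFD and LIWC lexicons is omitted):

```python
from collections import Counter

def dictionary_rates(tokens, dictionary):
    """Rate of occurrence per category: category word count / total word count."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {category: (sum(counts[w] for w in words) / total if total else 0.0)
            for category, words in dictionary.items()}

# Illustrative usage with a toy dictionary (not the actual MFD word lists)
mfd_toy = {"care.virtue": ["peace", "compassion", "security"],
           "care.vice": ["harm", "suffer", "hurt"]}
print(dictionary_rates("we should show compassion and protect peace".split(), mfd_toy))
```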
2.3.1.2 Bag of words modeling
Topic modeling is an effective means of constructing data-driven text representations from word count
statistics, where representations are guided by word co-occurrence statistics rather than a priori categories.
Most prominently, Latent Dirichlet Allocation (LDA; Blei et al., 2003) models documents as mixtures of
topics, which are groups of related words that frequently co-occur in documents. For this analysis, LDA
was used for its ability to effectively extract relevant information from text, as opposed to its interpretive
value for exploring and visualizing trends in a corpus (see Analysis 2).
LDA topics were estimated via mallet (http://mallet.cs.umass.edu), specifically using the Python (v. 3.6) wrapper in the gensim
(v. 3.8) package. The mallet implementation of LDA implements collapsed Gibbs sampling (Griffiths &
Steyvers, 2004) and automates parameter optimization. The number of latent topics, which is a tunable
hyperparameter in LDA and other topic models, was separately tuned internally (as opposed to external
measures, such as prediction of individual-level moral concerns for held-out data) using a matrix factor-
ization metric (Arun et al., 2010) via the ldatuning (v 1.0.2) package in R (v 3.6.2). From a search grid
across 10–600, a final number of topics was determined to be 300 (see Supplemental Materials). After
re-estimating the topic model on the entire corpus of Facebook posts, a single vector of length 300 was
estimated per participant using the method of G. Park et al. (2015). For each participant, a probability score
was computed for each topic, estimated given the fit topic model (which gives probabilities of topics given
words):
p(topic, participant) = \sum_w p(topic | w) \cdot p(w | participant)    (2.1)

over participants' normalized word-count proportions w.
2.3.1.3 Word and text embedding
Neural network-based methods are currently the leading paradigm in NLP, achieving break-through suc-
cess in predictive modeling in recent years (Devlin et al., 2019; Pennington et al., 2014; Vaswani et al.,
2017). Though the interpretability and usefulness of these methods for inferential purposes (e.g., learning
what types of language are associated with moral concerns) is very much an open question, their capacity
for capturing the meaning of language data is unsurpassed in most NLP modeling tasks. In the present
analysis, two methods were employed, word embeddings and contextualized word embeddings. Word em-
beddings (Mikolov et al., 2013; Pennington et al., 2014) map words into a geometric space that preserves
detailed semantic and syntactic information. For each participant's set of posts, GloVe word embedding
vectors (Pennington et al., 2014), which were trained on text from the Common Crawl (data are available at
https://nlp.stanford.edu/projects/glove/), were mapped to the individual words in the posts and subsequently
averaged element-wise (i.e., across the dimensions of the embedding). The result was a single vector, per
participant, which was the "average" word embedding that occurred in their posts. (Alternatively, average
embeddings can be generated per post and subsequently averaged to a single participant-level vector, but
this was found to lower predictive results; see Supplemental Materials.)
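A minimal sketch of this participant-level averaging, assuming the GloVe vectors are available as a word-to-vector mapping (e.g., loaded from the published GloVe files); names and dimensions are illustrative:

```python
import numpy as np

def average_embedding(tokens, embeddings, dim=300):
    """Element-wise average of the embedding vectors of all in-vocabulary tokens."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vectors, axis=0) if vectors else np.zeros(dim)

# Illustrative usage with a toy two-dimensional "embedding" table
toy_vectors = {"peace": np.array([0.9, 0.1]), "war": np.array([0.1, 0.9])}
print(average_embedding(["peace", "war", "unknown"], toy_vectors, dim=2))
```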
In addition, we apply Distributed Dictionary Representations (DDR; Garten et al., 2018) to the dictio-
naries used above (MFD, MFD2, and LIWC). DDR operates by first computing the element-wise average of
the word embeddings in each dictionary category, which represents the semantic space occupied collectively
by that category. Then, to compute the "loading" of a dictionary category on a particular piece of text, the text's average
word embedding is similarly computed, and the geometric similarity is then computed between these two
vectors.
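A minimal sketch of the DDR loading, reusing the average_embedding helper from the previous sketch and assuming cosine similarity as the geometric similarity measure:

```python
import numpy as np
from numpy.linalg import norm

def ddr_loading(text_tokens, category_words, embeddings):
    """Cosine similarity between a dictionary category's centroid embedding
    and the average embedding of a piece of text."""
    category_vec = average_embedding(category_words, embeddings)
    text_vec = average_embedding(text_tokens, embeddings)
    denom = norm(category_vec) * norm(text_vec)
    return float(category_vec @ text_vec / denom) if denom > 0 else 0.0
```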
We also use contextualized word embeddings, which extract word embeddings from pretrained lan-
guage models (e.g., Devlin et al., 2019; Peters et al., 2018), which are “contextual” because a given word can
have multiple embeddings based on context. Whereas word embeddings such as GloVe produce the same
embedding for a given word, no matter the context, contextualized embeddings compute the embedding
of each word while taking its context into account. The importance of this contextual flexibility is illus-
trated by considering words like “bank”, which can be in reference to a financial institution, the shore of a
river, or a verb. In practice, contextual word embeddings are generated by feeding sequences of words into
large neural network models, computing these dynamic, contextual word vectors for each word, and tak-
ing their average. The information contained in these contextualized embeddings is comparatively richer
than word embeddings like GloVe, given their ability to generate representations that are dependent on
the composed meaning of words in sequence, rather than each word independent of context.
The present analysis used a previously trained instance of the Bidirectional Encoder Representations
from Transformers (BERT; Devlin et al., 2019). BERT is a model that processes words in sequence through
multiple layers of encoding, at each layer capturing more sequentially-dependent information (Vaswani
et al., 2017). The BERT model has been trained, using large datasets, on text comprehension tasks, such
as predicting masked-out words and whether one sentence follows another. These models are available
for download, allowing researchers to generate contextualized word embeddings by passing segments of
text through the previously-learned layers of BERT. BERT vectors were produced for a given sequence of
tokens using the transformers library (https://huggingface.co/transformers/) in the Python (v3.6)
programming language, where the last four
layers (of a possible 12) were averaged to form a single vector of length 768. During experimentation, it was
found that the best predictability via BERT was achieved by computing embedding vectors for each post
and subsequently averaging them, rather than computing a single embedding across all of a participant's
words (see Supplemental Materials).
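A minimal sketch of this post-level extraction using the Hugging Face transformers library, averaging the last four hidden layers over tokens and then averaging post vectors per participant; the model name and truncation length are assumptions, not necessarily the exact settings used here:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

def bert_post_vector(post):
    """Embed one post: average the last four hidden layers, then average over tokens."""
    inputs = tokenizer(post, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    last_four = torch.stack(outputs.hidden_states[-4:])   # (4, 1, seq_len, 768)
    token_vectors = last_four.mean(dim=0).squeeze(0)       # (seq_len, 768)
    return token_vectors.mean(dim=0)                       # (768,)

def participant_vector(posts):
    """Average the per-post BERT vectors into one participant-level vector."""
    return torch.stack([bert_post_vector(p) for p in posts]).mean(dim=0)
```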
Table 2.1: Summary of methods used to extract features for regularized text regression. Word count and
Distributed Dictionary Representation (DDR) were used to apply the Moral Foundations Dictionary (MFD),
the updated MFD, and the 2015 LIWC. Other general methods for encoding text include Topic Modeling,
using Latent Dirichlet Allocation (LDA), Global Vectors for Word Representation (GloVe), and contextu-
alized word embeddings via Bidirectional Encoder Representations from Transformer (BERT) language
models.
Method       Description
Word Count   Dictionary words in each category are counted and normalized (Pennebaker et al., 2001), using MFD, MFD2, and LIWC
DDR          A hybrid between dictionaries and word embeddings using geometric similarity measures (Garten et al., 2018), using MFD, MFD2, and LIWC
LDA          A statistical model learns word clusters ("topics") by leveraging word co-occurrence information in Facebook posts (Blei et al., 2003)
GloVe        The word embeddings of words in a participant's posts are averaged (Pennington et al., 2014)
BERT         Contextualized word embeddings are extracted from the word-level representation of a full, pre-trained language model (Devlin et al., 2019) and subsequently averaged across words
Each method used for generating text representations is summarized in Table 2.1, which first lists
dictionary-based methods (word count, DDR), which were applied to each of the three dictionaries, as
well as three general techniques (LDA, GloVe, BERT).
2.3.2 Regression analysis
Every representation produced from the above methods was uniformly used as input in an ElasticNet
regression for predicting each MFQ score. ElasticNet allows tunable regularization, combining both LASSO
and Ridge regression penalties (Zou & Hastie, 2005) to reduce model complexity and enable optimal feature
selection, which is desired for high-dimensional data. Models were implemented in Python (v 3.6) using the
scikit-learn library (v 0.22 Pedregosa et al., 2011), with the ratio between L1 and L2 penalties and the
overall regularization term determined through maximizing R² with cross-validated grid search. To obtain
estimates of variance for test-set R²s, 5-fold cross-validation was repeated 10 times. ElasticNet parameters
were selected within each fold’s training data and were not tuned to the test data for that fold.
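A minimal sketch of this setup with scikit-learn, nesting the hyperparameter search inside each training fold of a repeated 5-fold cross-validation; the grids and placeholder data are illustrative, not the values used in the analysis:

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, RepeatedKFold, cross_val_score

# X: participant-level text representations; y: one MFQ foundation score per participant
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))               # placeholder data for illustration
y = X[:, 0] * 0.5 + rng.normal(size=200)

param_grid = {
    "alpha": [0.01, 0.1, 1.0, 10.0],          # overall regularization strength
    "l1_ratio": [0.1, 0.5, 0.9],              # balance between LASSO and Ridge penalties
}
model = GridSearchCV(ElasticNet(max_iter=10_000), param_grid, scoring="r2", cv=5)

# Repeated 5-fold CV: hyperparameters are tuned inside each training fold only
outer_cv = RepeatedKFold(n_splits=5, n_repeats=10, random_state=0)
r2_scores = cross_val_score(model, X, y, cv=outer_cv, scoring="r2")
print(f"mean R2 = {r2_scores.mean():.3f} "
      f"(SE = {r2_scores.std(ddof=1) / np.sqrt(len(r2_scores)):.3f})")
```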
2.3.3 Results
Full statistics on model performance across 10 rounds of 5-fold cross validation for each text representation
method for each foundation are provided in Table 2.2. Bold-face values indicate the highest R² achieved
within +/- one SE of each other, per foundation and method categorization.
Table 2.2: Percent variance explained (R²) across representation methods per foundation.

Representation   Care        Fairness     Loyalty      Authority    Purity
Measures of Moral Language
MFD              1.0 (0.1)   0.3 (0.1)    1.3 (0.2)    1.6 (0.2)    2.2 (0.2)
MFD2             2.7 (0.2)   0.6 (0.1)    2.8 (0.2)    4.8 (0.2)    10.0 (0.3)
MFD-DDR          4.8 (0.3)   0.9 (0.1)    4.9 (0.2)    7.2 (0.3)    8.2 (0.3)
MFD2-DDR         3.0 (0.2)   -0.3 (0.1)   6.1 (0.3)    8.0 (0.3)    10.1 (0.4)
General Measures of Language
LIWC             4.0 (0.2)   0.7 (0.1)    8.3 (0.3)    13.7 (0.4)   16.8 (0.4)
LIWC-DDR         5.7 (0.3)   2.4 (0.2)    8.0 (0.5)    13.7 (0.5)   16.5 (0.4)
LDA              5.9 (0.3)   1.2 (0.2)    8.8 (0.3)    15.6 (0.3)   17.2 (0.3)
GloVe            7.2 (0.3)   2.9 (0.2)    10.9 (0.4)   17.7 (0.4)   20.0 (0.5)
BERT             8.8 (0.3)   3.2 (0.2)    11.7 (0.4)   18.8 (0.3)   20.9 (0.4)
Note. Mean and standard errors across 10 iterations of 5-fold cross validation (n = 50). Highest R²s appear in bold.
For each foundation, a repeated measures Analysis of Variance (ANOVA) was performed, with repre-
sentation method as the grouping variable. Nine possible groups were compared, consisting of the nine
representation methods used. R² values from the 10 repeated 5-fold cross-validation runs were modeled
as the dependent variable. Each set of values satisfied the Shapiro-Wilk test of normality and Levene's test
for homogeneity of variance between groups. There was a significant overall effect of representation for
predicting Care (F(8, 441) = 89.0, p < 0.001), Fairness (F(8, 441) = 54.4, p < 0.001), Loyalty (F(8, 441) = 117.3,
p < 0.001), Authority (F(8, 441) = 327.5, p < 0.001), and Purity concerns (F(8, 441) = 299.9, p < 0.001).
It is clear from Table 2.2 that MFD-based measures are in general less effective at predicting individual-
level moral concerns than are LIWC, LDA, GloVe, and BERT methods. In other words, the frequency
with which participants used explicitly moral words, such as “sacred”, “loyal”, or “compassion”, was only
slightly related to moral concerns when compared to measures of general language, such as a participant’s
average word embedding. Multiple operationalizations of moral dictionaries were tested in this analysis,
which showed that the most effective MFD-based measure also varied by foundation: for predicting Care
concerns, MFD-DDR was significantly more effective than both word counting methods and MFD2-DDR, but MFD-DDR and MFD2-DDR were not significantly different in predicting the other concerns. This finding can be investigated in future work, as it may indicate that the MFD’s coverage of relevant words differs among foundations.
Tukey-adjusted post-hoc pairwise comparisons were performed to assess the effects of more general
text representation methods. For Care, LIWC-DDR and LDA representations were not significantly different from each other, while both explained more variance than LIWC (ps < 0.001). For Loyalty and Purity, all three methods were not significantly different from each other, whereas LDA representations explained more variance than both LIWC and LIWC-DDR for Authority, and LIWC-DDR explained more variance than LIWC for Fairness.
For all concerns, GloVe and BERT vectors explained more variance than all other methods (ps < 0.05)
with the exception that LIWC-DDR and GloVe were not significantly different for Fairness concerns (p = 0.47). GloVe and BERT yielded similar levels of information across foundations, where only for Care did BERT explain significantly more variance than GloVe (p = 0.003). Overall, the higher levels of explained variance for GloVe and BERT embeddings show that there are differences in language with respect to moral concerns that are not captured by individuals’ lexicons (i.e., LIWC and LDA). Intuitively, individuals with different moral concerns occupy different “semantic spaces”, which are captured by embeddings at a more fine-grained level than LIWC or LDA, while the greatest “distances” in this space were observed for
Purity and Authority.
To quantify the differences between foundations in overall predictability, an additional two-way re-
peated measures ANOVA assessing the influence of representation and foundation on explained variance
was performed. Specifically, the influences of representation (9 groups) and foundation (5 groups) on av-
erage R² was measured, as well as their interaction. There was a main effect of representation (F(8, 2205) = 812.2, p < 0.001) and foundation (F(4, 2205) = 2475.6, p < 0.001), with a significant interaction between the two (F(32, 2205) = 67.4, p < 0.001). Tukey-adjusted post-hoc pairwise contrasts showed that differences among foundations were uniformly significant at p < 0.001; in particular, R² values for Purity were higher than all other foundations, whereas R² values were lowest for Fairness. Visualizations of the full set of
comparisons for this two-way ANOVA are shown in the Supplemental Materials.
2.3.4 Discussion
There is indeed a relationship between social media language and moral concerns as measured by the
MFQ, which varies significantly among foundations. Though the magnitude of this relationship changes
in terms of how text data is quantified, it is apparent from our analysis that moral concerns coexist with
different patterns of expression and communication on social media. Measures of moral language, specif-
ically via the MFD and its variants, predicted individual-level concerns to marginal degrees, especially
when compared to general-purpose measures. In particular, only about 1% of Fairness concerns’ variance
could be explained by MFD-based measures.
The range of methodologies used, specifically their difference in explained variance, shed light on what
drives the link between moral concerns and language. LIWC and LDA, which yield similar predictiveness
to each other, are able to pick up on individuals’ lexical differences — the categories of words and topics
they mention, as well as whether they spend more or less time speaking with a certain style (e.g., using per-
sonal pronouns). In contrast, MFD-based representations describe individuals’ usage of specifically moral
words. Given that LIWC and LDA representations described significantly more variance in individuals’
moral concerns, the lexical differences tied to moral concerns extend far beyond the strictly moral. This is
explored further in Analysis 2.
Further, we found that embedding methods consistently explained more variance than LIWC and LDA.
Since embedding methods are superior predictive approaches and are known to contain essential linguistic
information, they likely provide the most accurate indicator of the relationship between moral concerns
and language. One possible explanation for the higher predictiveness of embeddings is that individuals’
linguistic differences, in relation to their moral concerns, are captured by GloVe and BERT at a more fine-
grained level. GloVe embeddings can capture the fact that not only are “hurt”, “ugly”, and “nasty” in the
same word category (“negative emotion” in LIWC; Tausczik & Pennebaker, 2010, Appendix), but also that
within this category, “nasty” and “ugly” are more similar in meaning than “nasty” is to “hurt” or than “ugly”
is to “hurt”. In addition, contextualized BERT embeddings are further able to measure linguistic differences
by accounting for words in the context of their surroundings.
Though representations predict moral concerns to varying degrees, the differences between founda-
tions are robust with respect to representation technique. Individuals’ Purity concerns are strongly tied to
language, in such a way that even relatively naive quantification techniques can achieve high explanation
coefficients, while sophisticated techniques like BERT are able to achieve even higher explanation coeffi-
cients. Fairness concerns had the weakest association with language, regardless of the method employed.
2.4 Analysis 2: Signatures of Moral Concerns in Language
In Analysis 2, we explore the signatures of each moral concern in language using the interpretable NLP
techniques from Analysis 1. Specifically, we use dictionaries and topic models, rather than neural network-
based embedding approaches that might be superior predictors but are more opaque. Dictionaries yield
interpretable measures from text that are based on face-valid categorizations of words, with transparent
construction and wide usage contextualizing downstream inferences. Topic models like LDA are particu-
larly appropriate for visualizing and exploring text corpora (e.g., Eichstaedt et al., 2015), and are comple-
mentary to dictionaries in their ability to uncover word categories that are particular to a given dataset,
and not captured by predefined dictionary categories.
In this analysis, language-based measures were modeled as functionally dependent on individual-level
moral concerns. In other words, we test whether variance in each dimension of language usage can be
explained collectively by individuals’ moral concerns. Unique “signatures” of each moral concern are more
easily identifiable in such models, given the default post-hoc interpretation of regression analysis — i.e., the
effect of moral concerns on language can be examined, per concern, while keeping other individual-level
variables constant. This is in contrast to previous work analyzing the relationship between social media
language and personality traits, which largely regress participants’ traits on text representations (e.g., G.
Park et al., 2015; H. A. Schwartz et al., 2013) resulting in models that do not allow direct interpretations of
the particular effect of each moral concern on language-based outcomes.
2.4.1 Method
Three sets of linguistic features — MFD2, LIWC, and LDA topic model features — were modeled as de-
pendent variables in separate regressions. The MFD2 and LIWC dictionaries were applied in the same
way as in Analysis 1, but without normalizing raw word counts (as models used were offset regressions of
counts). Word-counts were chosen given that counts provide intuitive effects (i.e., rate of usage versus a
single similarity score). The LDA text representations computed in (2.1) using the same model as Analysis
1 were used as topic probabilities for each participant.
Each word count outcome (for MFD2 and LIWC) was modeled as a dependent variable in a separate
negative binomial regression with offsets for the logarithm of total word count, accounting for the fact
that participants had varying numbers of words in their posts. Thus, coefficients of these offset negative
binomial models correspond to the change in the rate of the particular category being modeled. LDA
topic outcomes were modeled using linear regressions. For each outcome, two models were fit: one with
foundation-level MFQ scores (standardized) as independent variables, and one that included MFQ scores as
well as controlled for the two demographic variables that were available in the full dataset, age and sex. All
models were applied to the same set of observations (N = 2,691), namely, individual-level measurements
of language, MFQ, and demographics. Negative binomial models were fit in R (Version 3.6; R Core Team,
2019) using theMASS package (Version 7.3-51.5; Ripley et al., 2013).
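The study fit these models in R with the MASS package, but the offset logic can be sketched in Python with statsmodels as a rough analogue: the log of each participant's total word count enters as an offset, so coefficients describe changes in the rate of a word category per 1 SD change in an MFQ predictor. All variable names and data below are placeholders, not the study's data or implementation.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Placeholder participant-level data: standardized MFQ scores, one word-category
# count (e.g., MFD2 Care words), and the total number of words posted.
rng = np.random.default_rng(2)
n = 500
df = pd.DataFrame({
    "care": rng.normal(size=n), "fairness": rng.normal(size=n),
    "loyalty": rng.normal(size=n), "authority": rng.normal(size=n),
    "purity": rng.normal(size=n),
    "total_words": rng.integers(200, 5000, size=n),
})
rate = np.exp(0.1 * df["care"]) * 0.01  # toy data-generating rate per word
df["care_word_count"] = rng.poisson((rate * df["total_words"]).to_numpy())

# Negative binomial regression with a log total-word-count offset.
X = sm.add_constant(df[["care", "fairness", "loyalty", "authority", "purity"]])
model = sm.GLM(df["care_word_count"], X,
               family=sm.families.NegativeBinomial(),
               offset=np.log(df["total_words"]))
print(model.fit().summary())
```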
2.4.2 Results
Results are presented first for MFD2, which address questions as to the correspondence between moral
language and moral concerns. Specifically, we determine the categories of moral language that are pre-
dicted distinctly by each moral concern. Secondly, exploratory results are shown for the LIWC dictionary
categories and for LDA topics.
2.4.2.1 Moral language
Figure 2.1 shows β values for negative binomial models of MFD2 outcomes with offsets for the total number of words by participants. p-values were corrected for multiple comparisons. Specifically, for models without demographics, Bonferroni corrections were made for 5 (MFQ) predictors, and for models with demographics, corrections were made for 7 predictors. Of primary interest is whether higher moral concerns predict higher rates of moral language in the corresponding moral dimension (e.g., higher Care concerns predicting more Care language). This was found to be the case for Purity (β = 0.350, SE = 0.026, p < 0.001), Care (β = 0.166, SE = 0.015, p < 0.001), and Fairness (β = 0.106, SE = 0.031, p = 0.003), in the model without participant age and sex. Loyalty (β = -0.018, SE = 0.020, p = 1.0) and Authority (β = -0.039, SE = 0.026, p = 1.0) did not predict differences in the corresponding category of moral language. Dimensions of vice (versus virtue) within each category of moral language were, for the most part, not associated with higher moral concerns. One exception was Cheating (Fairness) moral language, with a 1 SD increase in Fairness concerns predicting significantly more Cheating language (β = 0.113, SE = 0.036, p = 0.009).
Outside of the relationships between moral concerns and their corresponding dimension of moral
language, Purity and Fairness concerns both predicted differences in several categories of moral language.
Higher Purity concerns predicted a lower rate of Degradation language (β = -0.132, SE = 0.032, p < 0.001)
and a higher rate of Betrayal language (β = 0.220, SE = 0.088, p = 0.006). Higher Loyalty concerns predicted a lower rate of Betrayal language (β = -0.167, SE = 0.088, p = 0.019), and higher Fairness concerns predicted higher rates of Degradation (β = 0.082, SE = 0.035, p = 0.02) and Harm (β = 0.078, SE = 0.030, p = 0.009).
In terms of participant age and sex, women posted a significantly higher rate of Care language (β = -0.380, SE = 0.025, p < 0.001), while the effect of Care concerns on Care language was still significant after controlling for age and sex (β = 0.104, SE = 0.015, p < 0.001). Men used significantly more words in the Cheating (β = 0.421, SE = 0.062, p < 0.001), Fairness (β = 0.257, SE = 0.053, p < 0.001), Betrayal (β = 0.973, SE = 0.138, p < 0.001), and Authority (β = 0.308, SE = 0.033, p < 0.001) categories. Lastly, higher rates of Authority language (β = 0.106, SE = 0.015, p < 0.001) and Loyalty language (β = 0.058, SE = 0.013, p < 0.001)
were predicted with 1 SD increases in participant age.
(a) Models with only MFQ predictors. (b) Controlling for participant age and sex.
Figure 2.1: Coefficients for models of each category (rows) in the updated Moral Foundations Dictionary
(MFD2), with offsets for participants’ total word count. Values indicate the expected rate increase in the
MFD2 outcome, given a 1 SD increase in the Moral Foundations Questionnaire (MFQ) predictor. Cells
without numbered coefficients were not significant (p < 0.05, Bonferroni-corrected for comparisons across the number of predictors).
2.4.2.2 LIWC categories
Figure 2.2 shows the results from offset negative binomial models of LIWC word counts. Higher Purity, which our first analysis showed to be the most traceable concern, predicts significantly less sexual language (β = −0.193, SE = 0.043, p < 0.001) and swearing (β = −0.172, SE = 0.038, p < 0.001). These effects were
robust to inclusion of age and sex in the models (see Figure 2.2). Additionally, higher Care is associated
with an increase in social language (i.e., personal pronouns, friends, family, affiliation), positive emotion,
and health-oriented language, though many of these relationships do not persist in models adjusting for
age and sex. Fairness and Loyalty concerns did not predict any coherent set of word topics, when adjusting
for other concerns and participant age and sex.
(a) Models with only MFQ predictors. (b) Controlling for participant age and sex.
Figure 2.2: Coefficients for negative binomial models of each category (rows) in the Linguistic Inquiry and
Word Count (LIWC) 2015 release, with offsets for participants’ total word count. Values indicate the expected rate increase in the LIWC outcome given a 1 SD increase in the Moral Foundations Questionnaire (MFQ) predictor. Cells without numbered coefficients were not significant (Bonferroni-corrected for comparisons across the number of predictors), and rows without any significant MFQ predictors were dropped.
2.4.2.3 Topic categories
Here we report results for the LDA models that controlled for participant age and sex. For each MFQ
predictor, the topics which were predicted with the highest positive value (i.e., a positive association between MFQ predictor and topic outcome) are reported in Figure 2.3. Topics that were not significant (p < 0.01) were not reported. Topics are displayed using the top 8 terms, and coefficients are interpreted as the
percent-increase in the probability of a given topic occurring in a participant’s posts given a 1 SD increase
in the predictor, adjusting for other predictors.
A number of patterns emerge in Figure 2.3 which were not apparent from the LIWC or MFD2 analyses. Higher Fairness concerns predict higher attention to political topics, for example healthcare,
human rights, and gun control, whereas higher Loyalty concerns predict attention to team sports and
honoring military service members.
To inform the interpretation of the topics visualized in Figure 2.3, three metrics proposed by Mimno
et al. (2011) for evaluating topic models — coherence, document entropy, and exclusivity — are reported in
the Supplemental Materials. These three metrics, respectively, refer to the coherence of individual topics,
whether a topic is concentrated over few documents (low entropy), and how distinct an individual topic is
versus other topics. Of note, topics included under Authority were, on average, lower in document entropy,
highly coherent, and highly exclusive. Topics for Purity were higher in document entropy and lower in
exclusivity, and the lowest levels of coherence were observed for Loyalty.
Figure 2.3: Moral Foundations Questionnaire (MFQ) coefficients of linear models predicting each
participant-level topic probability, generated via Latent Dirichlet Allocation (LDA). Printed coefficients
were multiplied by 100, thus are at percent level. Each row is a model, and all models included age and
gender.
2.4.3 Discussion
In this analysis we determined the distinct signatures of each moral concern in language, using a com-
bination of pre-defined dictionaries and topic modeling. By controlling for demographics and modeling
language-based outcomes as dependent on moral concerns, we generated interpretable measures of the
extent to which each moral concern influenced the occurrence of a given category of language. We first
found that Care, Fairness, and Purity concerns were positively associated with the usage of words from
the corresponding MFD2 categories, and used LIWC and LDA-based measures to explore other facets of
language: Care concerns specifically predict charity- and health-related posts, Fairness concerns predict
political and justice-related language, and Purity concerns motivate prayer, biblical quotation, and grati-
tude (toward God). Though effects between moral concern and moral language were not found for Au-
thority and Loyalty, associations were found between Authority and language about family, socializing,
and daily life, and between Loyalty and language about military members and team sports.
2.5 General Discussion
This exploratory study generated new insight into the relationship between individual-level moral con-
cerns and language, both in terms of moral language, which has been extensively used by previous studies
to analyze morality in ecologically valid settings, and in other, more general facets of language use. In
Analysis 1, all moral concerns except Fairness were predicted from general language measures with size-
able effect sizes, with the highest for Purity. For all moral concerns, moral language measures were able
to explain less variance than general measures. In Analysis 2, associations between moral and non-moral
categories of language and moral concerns were found, including an association between religious lan-
guage and Purity, between social language and Authority, Care, and Purity, and between political topics
and Fairness.
While associating observed language with validated measures is not new in psychology, the present
work is the first attempt to do so in the moral domain. While moral language has been observed in nat-
uralistic contexts (e.g. Mokhberian et al., 2020; Mooijman et al., 2018), these measures are not anchored
at the individual level to psychometrically validated measures. To ensure the validity and consistency of
evidence generated for this new domain, we expanded on the methodologies of previous work (G. Park
et al., 2015; H. A. Schwartz et al., 2013) in two ways: (a) a diverse collection of NLP methods to contex-
tualize effect sizes, and (b) modeling language-based outcomes as dependent variables in order to isolate
relationships between individual-level moral concerns and language.
The findings of Analysis 1 indicate that each moral concern can be differentially predicted from lan-
guage. The highest explained variance was observed for Purity, followed by Authority. Indeed, Purity and
Authority concerns have been previously shown to be predictive of ideological disagreements and culture
wars (Graham et al., 2009; Koleva et al., 2012), and thus might be expected to manifest in markedly differ-
ent categories of social language. Care and Fairness, which have lower variance overall and do not have
the same predictiveness of political ideology as do the Binding foundations, are predictable but only to a
limited extent. Fairness in particular was found to be least traceable in language among the five moral
foundations.
In Analysis 2, the types of language associated with moral concerns were markedly different, even
where explained variance was similar. In all five cases, there was at least one category of language (either
dictionary category or LDA topic) which could be intuitively associated with preconceptions of the given
foundation: Care concerns predicted Care virtue language, specifically familial words and positive affective
language; Fairness predicted Fairness virtue and vice language, specifically posts talking about general
concerns for members of society, such as health care, protests, and climate change; Loyalty predicted
language about team sports; Authority predicted familial language and accounts of social experiences; and
Purity predicted a large array of religious and spiritual categories of language, combining biblical quotes
with publicly shared prayer. These findings are indicative of the fact that moral concerns do manifest in
public displays and endorsements which are particular to each moral concern, to varying degrees.
It is notable that Authority — and to a lesser extent, Loyalty — are positively associated with an array of
language usage patterns, but these are not captured by the MFD. This indicates that Loyalty and Authority
concerns do not manifest in language through direct endorsement or recognition of the corresponding
virtues (as is true for Care, Fairness, and Purity). Instead, they manifest through higher rates of references
to family and friends, a higher rate of personal pronouns, more “netspeak” (i.e., colloquial language), and
more sharing about team sports. Though it is more apparent in the MFD-based measures, the same is
also true for Care. Higher Care values predict significantly more “we” pronouns and posts about personal
health and affiliation, which are not explicitly modeled in the MFD operationalization of moral language.
Those high in Fairness, on the other hand, were preoccupied with political topics and violations of equality
or justice; among all concerns, it is the only one for which a positive effect is seen for “vice” language in
the corresponding moral category. These findings reflect on the many ways that individual-level moral
concerns manifest in language, and that the current view of moral language — in particular, measured via
the MFD — might not be accurately capturing these signals.
2.5.1 Limitations
A number of limitations condition the interpretation of our findings, while simultaneously identifying
directions for future work. Internet-based questionnaire responses, such as the MFQ responses gathered
in the present study, have great potential for expanding the scale and diversity of study samples (Gosling
et al., 2004), but it is known that they are not representative across cultures. Participants completing the
MFQ on yourmorals.org do so voluntarily, as do those who grant access to their Facebook accounts. This
self-selection bias poses potential confounds for our study, and encourages the collection of similar data
through other venues or by collecting more meta-data on participants.
Participants of the study were predominantly from Western, Educated, Industrialized, Rich, and Democratic societies (WEIRD; Henrich et al., 2010; also see Atari et al., 2020, for a discussion of non-WEIRD moral psychology). It is unknown whether the findings in this study would replicate in non-WEIRD populations,
which constitute the vast majority of the world’s population. Since MFT was developed, in part, to estab-
lish a descriptive account of moral concerns outside those observed in the secular West, the generalization
of the present findings outside WEIRD samples is important for understanding the general relationship
between moral foundations and language.
Additionally, the English-centricity of NLP has been the focus of much recent criticism and emphasis
(e.g., Bender & Friedman, 2018), and many of the resources for text analysis in psychology are exclusively in English or are translated from English (e.g., the Japanese MFD; Matsuo et al., 2019). These findings
cannot be generalized to non-English-speaking cultures. Though we are unable to extend the present
work towards other languages, due to the available sample population and the language of dictionaries we
use, we acknowledge that the present findings do not necessarily generalize to other languages. Bridging
the language gap can be one way in which the present work is replicated and expanded in other cultures,
as this would require both non-WEIRD research participants and non-WEIRD researchers (i.e., researchers
fluent in non-English languages and non-WEIRD morality; Medin et al., 2017).
Lastly, our work builds upon the theoretical framework provided by MFT in compiling evidence about
individuals’ moral concerns. Of course, there are other theories of the fundamental structure of human
morality, with some arguing for these alternative theories over MFT. In particular, other theories address
the structure of humans’ underlying values (S. H. Schwartz, 1992), the dyadic structure of morality in terms
of “moral agents” and “moral patients” (Gray & Wegner, 2009), morality as cooperation (Curry et al., 2019),
relational contexts (Rai & Fiske, 2011), the motivational emphasis of morality (Janoff-Bulman & Carnes,
2013), and the normative emphasis of morality (actions vs. consequences; Cushman, 2013; Mikhail, 2007).
The present work is the first study in moral psychology to directly compare social media language and
measures of moral concerns from a validated survey; thus, our analyses were restricted to the available
data, which only extended to MFT. Future data collection and language analyses can be completed for
other theories, allowing comparisons and a richer, more general set of inferences about language and
moral concerns.
2.6 Conclusion
Among the five moral foundations (Care, Fairness, Loyalty, Authority, and Purity), Purity concerns are
most traceable in social media language. Fairness concerns, on the other hand, are least traceable. Individ-
uals who highly endorsed Purity shared religious and spiritual content on Facebook, whereas people who
scored higher on Fairness were slightly more likely to share content related to social justice and equality.
High levels of Care, Loyalty, and Authority were found to motivate a mixed collection of socially-oriented
language categories. The link between moral concerns and language was found to extend beyond exclu-
sively moral language. Overall, this research establishes a missing link in moral psychology by providing
evidence that individual-level moral concerns are differentially associated with language data collected
from individuals’ Facebook accounts.
Chapter 3
Contextualizing Hate Speech Classifiers with Post-hoc Explanation
3.1 Introduction
Hate speech detection is part of the ongoing effort to limit the harm done by oppressive and abusive
language (Gagliardone et al., 2015; Gelber & McNamara, 2016; Mohan et al., 2017; Waldron, 2012). Perfor-
mance has improved with access to more data and more sophisticated algorithms (e.g., Basile et al., 2019;
Del Vigna et al., 2017; Mondal et al., 2017; Silva et al., 2016), but the relative sparsity of hate speech
requires sampling using keywords (e.g., Olteanu et al., 2018) or sampling from environments with unusu-
ally high rates of hate speech (e.g., de Gibert et al., 2018; Hoover et al., 2019). Modern text classifiers thus
struggle to learn a model of hate speech that generalizes to real-world applications (Wiegand et al., 2019).
A specific problem found in neural hate speech classifiers is their over-sensitivity to group identifiers
like “Muslim”, “gay”, and “black”, which are only hate speech when combined with the right context (Dixon
et al., 2018). In Figure 3.1 we see two documents containing the word “black” that a fine-tuned BERT model
predicted to be hate speech, while only the second occurs in a hateful context.
Neural text classifiers achieve state-of-the-art performance in hate speech detection, but are uninter-
pretable and can break when presented with unexpected inputs (Niven & Kao, 2019). It is thus difficult to
contextualize a model’s treatment of identifier words. Our approach to this problem is to use the Sampling and Occlusion (SOC) explanation algorithm, which estimates model-agnostic, context-independent post-hoc feature importance (Jin et al., 2020). We apply this approach to the Gab Hate Corpus (Kennedy et al., 2020), a new corpus labeled for “hate-based rhetoric”, and an annotated corpus from the Stormfront white supremacist online forum (de Gibert et al., 2018).
Figure 3.1: Two documents classified as hate speech by a fine-tuned BERT classifier; group identifiers are underlined. (a) “[F]or many Africans, the most threatening kind of ethnic hatred is black against black.” - New York Times. (b) “There is a great discrepancy between whites and blacks in SA. It is … [because] blacks will always be the most backward race in the world.” - Anonymous user, Gab.com.
Based on the explanations generated via SOC, which showed models were biased towards group iden-
tifiers, we then propose a novel regularization-based approach in order to increase model sensitivity to the
context surrounding group identifiers. We regularize importance of group identifiers at training, coercing
models to consider the context surrounding them.
We find that regularization reduces the attention given to group identifiers and heightens the impor-
tance of the more generalizable features of hate speech, such as dehumanizing and insulting language. In
experiments on an out-of-domain test set of news articles containing group identifiers, which are heuris-
tically assumed to be non-hate speech, we find that regularization greatly reduces the false positive rate,
while in-domain, out-of-sample classification performance is either maintained or improved.
3.2 Related Work
Our work is conceptually influenced by Warner and Hirschberg (2012), who formulated hate speech detec-
tion as disambiguating the use of offensive words from abusive versus non-abusive contexts. More recent
approaches applied to a wide typology of hate speech (Waseem et al., 2017), build supervised models trained
on annotated (e.g., de Gibert et al., 2018; Waseem & Hovy, 2016) or heuristically-labeled (Olteanu et al.,
2018; Wulczyn et al., 2017) data. These models suffer from the highly skewed distributions of language in
these datasets (Wiegand et al., 2019).
Research on bias in classification models also influences this work. Dixon et al. (2018) measured and
mitigated bias in toxicity classifiers towards social groups, avoiding undesirable predictions of toxicity
towards innocuous sentences containing tokens like “gay”. Similarly, annotators’ biases towards certain
social groups were found to be magnified during classifier training (Mostafazadeh Davani et al., 2020). Specifically within the domain of hate speech and abusive language, J. H. Park et al. (2018) and Sap et al. (2019) have defined and studied gender- and racial-bias. Techniques for bias reduction in these settings
include data augmentation by training on less biased data, term swapping (i.e., swapping gender words),
and using debiased word embeddings (Bolukbasi et al., 2016).
Complementing these works, we directly manipulate models’ modeling of the context surrounding
identifier terms by regularizing explanations of these terms. To interpret and modulate fine-tuned lan-
guage models like BERT, which achieve SotA performance in hate speech detection tasks (MacAvaney et
al., 2019; Mandl et al., 2019), we focus on post-hoc explanation approaches (Guidotti et al., 2019). These
explanations reveal either word-level (Ribeiro et al., 2016; Sundararajan et al., 2017) or phrase-level im-
portance (Murdoch et al., 2018a; Singh et al., 2019) of inputs to predictions. F. Liu and Avci (2019) and Rieger et al. (2019) are closely related works that regularize explanations for fair text classification. However, the explanation methods they apply are either incompatible with BERT or known to be inefficient for regularization, as discussed in Rieger et al. (2019). We further note that explanations differ in their semantics, and we compare two explanation algorithms that can be regularized efficiently in our setup. Training to improve counterfactual fairness (Garg et al., 2019) is another closely related line of work.
3.3 Data
We selected two public corpora for our experiments which highlight the rhetorical aspects of hate speech,
versus merely the usage of slurs and explicitly offensive language (see Davidson et al., 2017). The “Gab
Hate Corpus” (GHC; Kennedy et al., 2020) is a large, random sample (N = 27,655) from the Pushshift.io data
dump of the Gab network (https://files.pushshift.io/gab/), which we have annotated according to a typology of “hate-based rhetoric”, a
construct motivated by hate speech criminal codes outside the U.S. and social science research on prejudice
and dehumanization. Gab is a social network with a high rate of hate speech (Lima et al., 2018; Zannettou et al., 2018) and populated by the “Alt-right” (Anthony, 2016; Benson, 2016). Similarly with respect to
domain and definitions, de Gibert et al. (2018) sampled posts from the “Stormfront” web domain (Meddaugh & Kay, 2009) and annotated them at the sentence level according to an annotation guide similar to that used for the GHC.
Train and test splits were randomly generated for Stormfront sentences (80/20) with “hate” taken as a
positive binary label, and a test set was compiled from the GHC by drawing a random stratified sample with
respect to the “target population” tag (possible values including race/ethnicity target, gender, religious,
etc.). A single “hate” label was created by taking the union of two main labels, “human degradation” and
“calls for violence”. Training data for the GHC (GHC
train
) included 24,353 posts with 2,027 labeled as
hate, and test data for the GHC (GHC
test
) included 1,586 posts with 372 labeled as hate. Stormfront splits
resulted in 7,896 (1,059 hate) training sentences, 979 (122) validation, and 1,998 (246) test.
3.4 Analyzing Group Identifier Bias
To establish and define our problem more quantitatively, we analyze hate speech models’ bias towards
group identifiers and how this leads to false positive errors during prediction. We analyze the top features
of a linear model and use post-hoc explanations applied to a fine-tuned BERT model in order to measure
models’ bias towards these terms. We then establish the effect of these tendencies on model predictions
using an adversarial-like dataset of New York Times articles.
3.4.1 Classification Models
We apply our analyses on two text classifiers, logistic regression with bag of words features and a fine-
tuned BERT model (Devlin et al., 2018). The BERT model appends a special CLS token at the beginning
of the input sentence and feeds the sentence into stacked layers of Transformer encoders (Vaswani et al., 2017). The representation of the CLS token at the final layer is fed into a linear layer to perform 2-way classification (hate or non-hate). Model configuration and training details can be found in Section B.2.3.
3.4.2 Model Interpretation
We first determine a model’s sensitivity towards group identifiers by examining the models themselves.
Linear classifiers can be examined in terms of their most highly-weighted features. We apply a post-hoc
explanation algorithm for this task of extracting similar information from the fine-tuned methods discussed
above.
3.4.2.1 Group identifiers in linear models
From the top features in a bag-of-words logistic regression of hate speech on GHC-train, we collected a
set of twenty-five identity words (not restricted to social group terms, but terms identifying a group in
general), including “homosexual”, “muslim”, and “black”, which are used in our later analyses. The full list
is in Appendix (B.2.1).
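As a generic illustration of how such top features can be read off a bag-of-words logistic regression, the sketch below ranks vocabulary items by their learned weight toward the positive class with scikit-learn; the corpus and labels are tiny toy placeholders, not GHC data.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy corpus standing in for training posts and their binary hate labels.
texts = ["they are all backward and subhuman",
         "lovely weather in the park today",
         "those people should be removed",
         "the team played a great game"]
labels = [1, 0, 1, 0]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)
clf = LogisticRegression().fit(X, labels)

# Rank vocabulary items by their weight toward the positive (hate) class.
weights = clf.coef_.ravel()
top = np.argsort(weights)[::-1][:10]
print([vectorizer.get_feature_names_out()[i] for i in top])
```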
3.4.2.2 Explanation-based measures
State-of-the-art fine-tuned BERT models are able to model complicated word and phrase compositions: for
example, some words are only offensive when they are composed with specific ethnic groups. To capture
this, we apply a state-of-the-art Sampling and Occlusion (SOC) algorithm which is capable of generating hierarchical explanations for a prediction.
Figure 3.2: BoW F1 scores (trained on GHC-train and evaluated on GHC-test) as a function of how many group identifiers are removed (left). Accuracy of the same models on the NYT adversarial dataset, which contains no hate speech (right).
To generate hierarchical explanations, SOC starts by assigning an importance score to phrases in a way that eliminates compositional effects between the phrase and its context x_δ within a window around it. Given a phrase p appearing in a sentence x, SOC assigns an importance score ϕ(p) to show how much the phrase p contributes to the sentence being classified as hate speech. The algorithm computes the difference of the unnormalized prediction score s(x) between “hate” and “non-hate” in the 2-way classifier. It then evaluates the average change in s(x) when the phrase is masked with padding tokens (noted as x\p) for different inputs, in which the N-word contexts around the phrase p are sampled from a pretrained language model while other words remain the same as in the given x. Formally, the importance score ϕ(p) is measured as

ϕ(p) = E_{x_δ}[ s(x) − s(x\p) ]   (3.1)

In the meantime, the SOC algorithm performs agglomerative clustering over explanations to generate a hierarchical layout.
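The following sketch only illustrates the structure of Equation 3.1 with a toy scorer and a toy context sampler; it is not the SOC implementation of Jin et al. (2020), which samples contexts from a pretrained language model and scores with a fine-tuned BERT classifier.

```python
import random

PAD = "<pad>"

def toy_score(tokens):
    """Stand-in for s(x): unnormalized 'hate' minus 'non-hate' score."""
    return sum(1.0 for t in tokens if t in {"hate", "backward"})

def toy_context_sampler(tokens, span, n_context):
    """Stand-in for sampling N-word contexts around the phrase from a language
    model: here we simply draw context words from a small vocabulary."""
    vocab = ["the", "a", "people", "hate", "love", "are"]
    new = list(tokens)
    start, end = span
    for i in range(max(0, start - n_context), min(len(tokens), end + n_context)):
        if not (start <= i < end):
            new[i] = random.choice(vocab)
    return new

def soc_importance(tokens, span, n_samples=20, n_context=2):
    """phi(p) = E_context[ s(x) - s(x with p masked by padding) ]."""
    start, end = span
    total = 0.0
    for _ in range(n_samples):
        sampled = toy_context_sampler(tokens, span, n_context)
        masked = sampled[:start] + [PAD] * (end - start) + sampled[end:]
        total += toy_score(sampled) - toy_score(masked)
    return total / n_samples

sentence = "there has been a rise and fall of hate against the jews".split()
print(soc_importance(sentence, span=(8, 9)))  # importance of the word "hate"
```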
3.4.2.3 Averaged Word-level SOC Explanation
Using SOC explanations output on GHC-test, we compute average word importance and present the top 20
in Table 3.3.
3.4.3 Bias in Prediction
Hate speech models can be over-attentive to group identifiers, as we have seen by inspecting them through
feature analysis and a post-hoc explanation approach. The effect of this during prediction is that models
over-associate these terms with hate speech and choose to neglect the context around the identifier, result-
ing in false positives. To provide an external measure of models’ over-sensitivity to group identifiers, we
construct an adversarial test set of New York Times (NYT) articles that are filtered to contain a balanced,
random sample of the twenty-five group identifiers (Section B.2.1). This gives us 12,500 documents which
are devoid of hate speech as defined by our typologies, excepting quotation.
Method/Metrics    Precision    Recall    F1    NYT Acc.
BoW 62.80 56.72 59.60 75.61
BERT 69.87± 1.7 66.83± 7.0 67.91± 3.1 77.79± 4.8
BoW + WR 54.65 52.15 53.37 89.72
BERT + WR 67.61± 2.8 60.08± 6.6 63.44± 3.1 89.78± 3.8
BERT + OC (α =0.1) 60.56± 1.8 69.72± 3.6 64.14± 3.2 89.43± 4.3
BERT + SOC (α =0.1) 70.17± 2.5 69.03± 3.0 69.52± 1.3 83.16± 5.0
BERT + SOC (α =1.0) 64.29± 3.1 69.41± 3.8 66.67± 2.5 90.06± 2.6
Table 3.1: Precision, recall, and F1 (%) on the GHC-test set and accuracy (%) on the NYT evaluation set. We report the mean and standard deviation of performance across 10 runs for BERT, BERT + WR (word removal), BERT + OC, and BERT + SOC.
It is key for models to not ignore identifiers, but to match them with the right context. Figure 3.2 shows the effect of ignoring identifiers: random subsets of words ranging in size from 0 to 25 are removed, with each subset sample size repeated 5 times. Decreased rates of false positives on the NYT set are accompanied
by poor performance in hate speech detection.
Method/Metrics    Precision    Recall    F1    NYT Acc.
BoW 36.95 58.13 45.18 66.78
BERT 57.76± 3.9 54.43± 8.1 55.44± 2.9 92.29± 4.1
BoW + WR 36.24 55.69 43.91 81.34
BERT + WR 53.16± 4.3 57.03± 5.7 54.60± 1.7 92.47± 3.4
BERT + OC (α =0.1) 57.47± 3.7 51.10± 4.4 53.82± 1.3 95.39± 2.3
BERT + SOC (α =0.1) 57.29± 3.4 54.27± 3.3 55.55± 1.1 93.93± 3.6
BERT + SOC (α =1.0) 56.05± 3.9 54.35± 3.4 54.97± 1.1 95.40± 2.0
Table 3.2: Precision, recall, and F1 (%) on the Stormfront (Stf.) test set and accuracy (%) on the NYT evaluation set. We report the mean and standard deviation of performance across 10 runs for BERT, BERT + WR (word removal), BERT + OC, and BERT + SOC.
Figure 3.3: Hierarchical explanations on a test instance from GHC-test (“There has been a rise and fall of hate against the jews”) before and after explanation regularization, where false positive predictions are corrected. (a) BERT; (b) BERT + SOC regularization.
3.5 Contextualizing Hate Speech Models
We have shown hate speech models to be over-sensitive to group identifiers and unable to learn from the
context surrounding these words during training. To address this problem in state-of-the-art models, we
propose that models can be regularized to give no explained importance to identifier terms. We explain
our approach as well as a naive baseline based on removing these terms.
3.5.0.1 Word Removal Baseline
The simplest approach is to remove group identifiers altogether. We remove words from the term list found
in Section B.2.1 from both training and testing sentences.
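A minimal sketch of this baseline follows; the identifier list below is a stand-in for the curated terms in Section B.2.1, not the actual list.

```python
GROUP_IDENTIFIERS = {"muslim", "black", "gay", "jewish"}  # placeholder subset

def remove_identifiers(tokens):
    """Drop group-identifier tokens from a tokenized sentence."""
    return [t for t in tokens if t.lower() not in GROUP_IDENTIFIERS]

print(remove_identifiers("the gay community responded".split()))
# -> ['the', 'community', 'responded']
```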
BERT ∆ Rank Reg. ∆ Rank
ni**er +0 ni**er +0
ni**ers -7 fag +35
kike -90 traitor +38
mosques -260 faggot +5
ni**a -269 bastard +814
jews -773 blamed +294
kikes -190 alive +1013
nihon -515 prostitute +56
faggot +5 ni**ers -7
nip -314 undermine +442
islam -882 punished +491
homosexuality -1368 infection +2556
nuke -129 accusing +2408
niro -734 jaggot +8
muhammad -635 poisoned +357
faggots -128 shitskin +62
nitrous -597 ought +229
mexican -51 rotting +358
negro -346 stayed +5606
muslim -1855 destroys +1448
Table 3.3: Top 20 words by mean SOC weight before (BERT) and after (Reg.) regularization for GHC.
Changes in the rank of importance as a result of regularization are also shown. Curated set of group
identifiers are highlighted.
3.5.0.2 Explanation Regularization
Given that SOC explanations are fully differentiable, during training we regularize the SOC explanations of the group identifiers to be close to 0, in addition to the classification objective L′. The combined learning objective is written as follows:

L = L′ + α Σ_{w ∈ x ∩ S} [ϕ(w)]²,   (3.2)

where S denotes the set of group names, x denotes the input word sequence, and α is a hyperparameter for the strength of the regularization.
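In a PyTorch training loop, the combined objective could be sketched roughly as below. The identifier importances passed in stand for a differentiable SOC importance computation, which in practice requires masking and context sampling inside the model's forward pass; the values here are placeholders for illustration.

```python
import torch
import torch.nn.functional as F

def regularized_loss(logits, labels, identifier_importances, alpha=0.1):
    """Classification loss plus the squared explained importance of every
    group identifier present in the batch (the structure of Equation 3.2)."""
    ce = F.cross_entropy(logits, labels)
    if identifier_importances:
        reg = torch.stack([phi ** 2 for phi in identifier_importances]).sum()
    else:
        reg = torch.tensor(0.0)
    return ce + alpha * reg

# Toy usage: two examples, two classes, and two identifier importances that
# would normally come from a differentiable explanation of the model.
logits = torch.randn(2, 2, requires_grad=True)
labels = torch.tensor([0, 1])
phis = [torch.tensor(0.8, requires_grad=True),
        torch.tensor(-0.3, requires_grad=True)]
loss = regularized_loss(logits, labels, phis)
loss.backward()
print(loss.item())
```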
In addition to SOC, we also experiment with regularizing input occlusion (OC) explanations, defined
as the prediction change when a word or phrase is masked out, which bypasses the sampling step in SOC.
3.6 Regularization Experiments
3.6.1 Experiment Details
Balancing performance on hate speech detection and the NYT test set is our quantitative measure of how
well a model has learned the contexts in which group identifiers are used for hate speech. We apply
our regularization approach to this task, and compare with a word removal strategy for the fine-tuned
BERT model. We repeat the process for both the GHC and Stormfront, evaluating test set hate speech
classification in-domain and accuracy on the NYT test set. For the GHC, we used the full list of 25 terms; for
Stormfront, we used the 10 terms which were also found in the top predictive features in linear classifiers
for the Stormfront data. Congruently, for Stormfront we filtered the NYT corpus to only contain these 10
terms (N = 5,000).
3.6.2 Results
Performance is reported in Tables 3.1 and 3.2. For the GHC, we see an improvement for in-domain hate
speech classification, as well as an improvement in false positive reduction on the NYT corpus. For Storm-
front, we see the same improvements for in-domain F1 and NYT. For the GHC, the most marked difference
between BERT+WR and BERT+SOC is increased recall, suggesting that baseline removal largely mitigates
bias towards identifiers at the cost of more false negatives.
As discussed in section 3.4.2.2, SOC eliminates the compositional effects of a given word or phrase. As
a result, regularizing SOC explanations does not prohibit the model from utilizing contextual information
related to group identifiers. This can possibly explain the improved performance in hate speech detection
relative to word removal.
Word Importance in Regularized Models We determined that regularization improves a model's focus on non-identifier context in prediction. In Table 3.3 we show the changes in word importance as measured
by SOC. Identity terms’ importance decreases, and we also see a significant increase in importance of terms
related to hate speech (“poisoned”, “blamed”, etc.) suggesting that models have learned from the identifier
terms’ context.
Visualizing Effects of Regularization We can further see the effect of regularization by considering
Figure 3.3, where hierarchically clustered explanations from SOC are visualized before and after regular-
ization, correcting a false positive.
3.7 Conclusion & Future Work
Regularizing SOC explanations of group identifiers tunes hate speech classifiers to be more context-sensitive
and less reliant on high-frequency words in imbalanced training sets. Complementing prior work in bias
detection and removal in the context of hate speech and in other settings, our method is directly integrated
into Transformer-based models and does not rely on data augmentation. As such, it is an encouraging
technique towards directing models’ internal representation of target phenomena via lexical anchors.
Future work includes direct extension and validation of this technique with other language models
such as GPT-2 (Radford et al., 2019); experimenting with other hate speech or offensive language datasets;
and experimenting with these and other sets of identity terms. Also motivated by the present work is the
more general pursuit of integrating structure into neural models like BERT.
Regularizing hate speech classifiers increases sensitivity to the compositionality of hate speech, but the
phenomena remain highly complex rhetorically and difficult to learn through supervision. For example,
this post from the GHC requires background information and reasoning across sentences in order to clas-
sify as offensive or prejudiced: “Donald Trump received much criticism for referring to Haiti, El Salvador
and Africa as ‘shitholes’. He was simply speaking the truth.” The examples we presented (see Appendix
B.1 and B.2) show that regularization leads to models that are context-sensitive to a degree, but not to the
extent of reasoning over sentences like those above. We hope that the present work can motivate more
attempts to inject more structure into hate speech classification.
Explanation algorithms offer a window into complex predictive models, and regularization as per-
formed in this work can improve models’ internal representations of target phenomena.
Chapter 4
Exemplar-based Explanations of Speaker-Language Relationships
4.1 Introduction
Social scientists have become increasingly interested in quantifying and explaining the connection be-
tween individuals’ language usage and their psychological and demographic attributes. Previous studies
have found, for example, relationships between age and language in blogs (Argamon et al., 2007); person-
ality dimensions and Facebook posts (G. Park et al., 2015); depression and Twitter posts (De Choudhury
et al., 2013); Facebook language and moral concerns (Kennedy et al., 2021); and political ideology and Twit-
ter posts (Preoţiuc-Pietro et al., 2017). Toward understanding these relationships, researchers have relied
on feature importance assigned to word categories — either predefined lexica or latent topics. However,
this approach produces coarse interpretations taken out of context. In this work, we introduce a new ap-
proach to analyzing the relationship between language usage and speaker attributes, which emphasizes
interpretability at the instance level by producing exemplars for each attribute.
Instance-based interpretability (e.g., Brunet et al., 2019; Card et al., 2019; Papernot & McDaniel, 2018) is
a subset of interpretability research that focuses on relating models to influential instances. By definition, an interpretation is presented in an intuitive, human-readable medium (Montavon et al., 2018) — i.e.,
sentences and images are interpretable, but vectors or feature weights are not. Furthermore, non-expert
judges prefer “explanation-by-example” (i.e., producing exemplars) in order to understand the predictions
of an ML model (Jeyakumar et al., 2020). For the task of interpreting speaker-attribute predictions from
language, exemplars would allow researchers to understand the relationship by reading coherent text that
is highly prototypical of a given attribute. This allows insight into speaker-attributes that is contextualized
and intuitive. For instance, this could be used to show medical professionals examples of particular symp-
toms and diagnoses, rather than correlated words or word groups; to show psychologists the particular
types of verbal behavior that are tied to moral concerns or personality, rather than words most often used;
or to show political scientists examples of how members from different political parties tend to speak that
are most different from each other.
The main technical challenge with producing exemplars of speaker-attributes is the large, multi-document
nature of respective datasets. Observations in such a dataset consist of a speaker attribute paired with
potentially numerous long documents (e.g., Facebook posts, political speeches). Thus, we adopt Multi-
Instance Learning (MIL; Dietterich et al., 1997). MIL describes the setting in which labels are associated
with sets of instances (“bags”). One desirable quality of some MIL methods is that they generate predictions
for, or assign relevance to, individual instances.
In this work, we apply MIL in a novel setting and demonstrate ways in which it can be used to produce
“exemplars” of speaker-attributes. We develop a novel modeling pipeline, involving document segmenta-
tion, “re-bagging” speakers’ text data, application of MIL models, and extraction of exemplars and exem-
plar clusters from model outputs. We experiment with three distinct MIL modeling strategies: (1) a MIL
multi-layer perceptron (MLP) which averages instance-level predictions to make speaker-level predictions
(Mean-MIL MLP); (2) an attention-based MIL MLP method based on Ilse et al. (2018) (Attention-MIL MLP);
and (3) Rep-the-Set, a method for learning set aggregation using a flow-based algorithm which produces
latent sets of key instances Skianis et al. (2020).
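As one concrete reference point, the attention pooling of Ilse et al. (2018) can be sketched in PyTorch as follows: each instance embedding receives a weight from a small two-layer attention network, and the bag representation is the attention-weighted sum of instance embeddings. Dimensions and the toy bag are illustrative; this is a sketch of the pooling operator, not the exact architecture used in our experiments.

```python
import torch
import torch.nn as nn

class AttentionMILPooling(nn.Module):
    """Attention-based MIL pooling (Ilse et al., 2018): each instance embedding
    h_k receives a weight a_k = softmax_k(w^T tanh(V h_k)); the bag
    representation is the attention-weighted sum of instance embeddings."""

    def __init__(self, input_dim, attention_dim=64):
        super().__init__()
        self.V = nn.Linear(input_dim, attention_dim, bias=False)
        self.w = nn.Linear(attention_dim, 1, bias=False)

    def forward(self, instances):  # instances: (num_instances, input_dim)
        scores = self.w(torch.tanh(self.V(instances)))  # (num_instances, 1)
        attention = torch.softmax(scores, dim=0)        # weights over instances
        bag_repr = (attention * instances).sum(dim=0)   # (input_dim,)
        return bag_repr, attention.squeeze(-1)

# Toy bag: 12 instance embeddings (e.g., segments of one speaker's posts).
pool = AttentionMILPooling(input_dim=32)
bag = torch.randn(12, 32)
bag_vector, weights = pool(bag)
print(bag_vector.shape, weights.shape)  # torch.Size([32]) torch.Size([12])
```

The attention weights produced here are what allow high-attention instances to be extracted and clustered as exemplars, as described above.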
We find, during experimentation, that these MIL methods are comparable in most cases to non-MIL
state-of-the-art benchmarks, and in some cases surpass these benchmarks. More importantly, we design
new ways of interpreting speaker-language relationships based on the outputs of these models. First, we
extract exemplar instances from the “Averaging” MIL MLP by identifying extreme positive values at the
instance-level. Second, we develop an approach based on the attention weights assigned to instances.
Specifically, we cluster high-attention instances generated by the model and compute cluster centroids,
which are then taken as exemplars of the model. And lastly, we develop an approach to identify exemplar
sets of instances, applied to the model developed by Skianis et al. (2020), based on finding the bag of
instances that maximized the model’s prediction for a given speaker-attribute.
4.2 Related Work
4.2.1 Speaker-Language Analysis
Prior approaches for speaker-language analysis gravitate toward the identification of word categories that
are associated with a given speaker-attribute. Some approaches — termed “top–down” approaches — define
these categories a priori (e.g., the Linguistic Inquiry and Word Count; Pennebaker et al., 2001). These word
categories can be content categories (e.g., “family”), types of emotion words, or grammatical categories.
Examples of this lexicon-based approach include Boyd et al. (2015), which related individuals’ core values
to their language. In general, regression coefficients or correlation statistics are used to relate the frequency
of a given category to a speaker-attribute.
Other approaches have taken a bottom–up approach, deriving word categories via Latent Dirichlet
Allocation (LDA; Blei et al., 2003), Latent Semantic Analysis (LSA; Deerwester et al., 1990), and related
techniques. Their use includes Garcia and Sikström (2014), which used LSA to analyze the relationship
between Facebook posts and personality traits; H. A. Schwartz et al. (2013), McFarland et al. (2013), and
others, which have applied LDA to identify linguistic trends associated with changes in personality across
large samples of Facebook participants; and Eisenstein et al. (2010), which developed a latent variable
topic model in order to quantify geo-spatial differences in word usage. Interpretation of these exploratory
models often involves word cloud visualization; however, word clouds present words and phrases outside
the original contexts in which they were used.
4.2.2 Multi-Instance Learning
MIL was introduced by Dietterich et al. (1997) as the setting in which labels are paired with “bags” (i.e.,
sets) of instances. Some methods refer to this scenario as weak or distant supervision (Pappas & Popescu-
Belis, 2017), or posit that MIL is a special case of semi-supervised learning (Zhou & Xu, 2007). The original
formulation of MIL, which applied strictly to classification, took the union over a bag of instances such
that a bag is positive for a class if at least one instance within the bag is positive and negative if there
are no positive instances in the bag (Foulds & Frank, 2010). In our work we take the relaxed version of
MIL, which views instances’ contribution to bag labels in an agnostic way (e.g., there could be multiple
instances which influence the bag label; see G. Liu et al., 2012).
The present work focuses on methods which explicitly model the relationship between instances and
bag-level predictions; in contrast, multi instance kernels (Gärtner et al., 2002) and other early MIL tech-
niques made predictions strictly at the bag level. In particular, G. Liu et al. (2012) proposed a similarity-
based “instance pruning” approach that isolated key instances in a MIL dataset; unlike our work, the
method by G. Liu et al. (2012) does not generate predictions for unseen data. Recently, learning permutation-
invariant aggregations of sets for neural architectures has become a method for dynamically inferring the
importance of single instances to predictions (Ilse et al., 2018).
Several works have begun to apply MIL to tasks in NLP, specifically text classification. For aspect-
level sentiment regression, Pappas and Popescu-Belis (2017) divided documents into sentences, explicitly
modeling instance (sentence) relevance for predicting the bag label using a learned weighting mechanism.
For the same task, Kotzias et al. (2015) formulated the MIL task (in the context of document-modeling) as
the propagation of bag-level labels to instance-level labels (i.e., sentence labels). In computer vision, other
works which attempt to predict instances in the MIL setting include Briggs et al. (2012) and Q. Wang et al.
(2014).
4.2.3 Instance-based Explanation
The present work is motivated by instance-level explanation, the practice of relating models or particular
predictions to instances seen during training. The most common application of instance interpretability in NLP is to use it to improve models: for example, Card et
al. (2019) proposed to learn a weighted sum over training instances at the last layer of a neural network,
showing which training inputs influenced each prediction; similarly, Han and Tsvetkov (2021) used in-
terpretations at the instance level to perform “influence tuning,” in order to correct models’ relying on
spurious signals.
Despite major advancements in interpretable NLP in terms of metrics (e.g., Jacovi & Goldberg, 2020)
and methods (e.g., Jin et al., 2019; Murdoch et al., 2018b), there have been scarce applications of inter-
pretable NLP toward the answering of social scientific questions. Interpreting machine learning models
has precedent in the natural and physical sciences (Roscher et al., 2020), though the particular data domains
and modeling tasks useful to social sciences have yet to be studied from the perspective of interpretability
research in NLP.
4.3 Speaker-Attribute Datasets
We experiment on two English-language datasets (and prediction tasks) that both have interpretive components; note that these do not represent the full range of speaker-language data (see Section 4.2). First, we experiment on a moral values prediction task that can be used to understand and analyze individuals’ moral psychology.
Second, we experiment on the classification of political party in the United States, which can be used to
understand the nature of partisan gaps and ideological differences between the two political parties.
4.3.1 YourMorals Facebook Dataset
The first task we experiment on is the prediction (regression) of individuals’ moral concerns from their
language, using a dataset of Facebook posts and survey responses introduced by Kennedy et al. (2021).
This dataset consists of Facebook posts authored by research participants who have completed the Moral
Foundations Questionnaire (MFQ), a survey of “moral concerns” (Graham et al., 2008). The MFQ measures
individuals’ concern with five moral foundations: Care, Fairness, Loyalty, Authority, and Purity. For ex-
ample, one Care item in the MFQ asks participants to indicate their level of agreement with the statement,
“Compassion for those who are suffering is the most crucial virtue,” and one Authority item similarly asks
for participants’ agreement with, “Respect for authority is something all children need to learn.”¹
¹ For the complete list of items, see Graham et al. (2008).
Kennedy et al. (2021) performed an initial analysis of these data, fitting five separate text regressions with individuals’ entire set of Facebook posts as input (i.e., the non-MIL setting). Among the authors’ findings was that Purity and Authority were the most predictable foundations, while Fairness was difficult to predict. The original dataset, before filtering based on number of instances (see below), contains 162,816 Facebook posts from 3,643 research participants, with an average of 22.4 tokens per post.
4.3.2 Political Speeches Dataset
Next, we apply our framework to the task of classifying speakers’ political party from their text. We use a public dataset introduced by Gentzkow et al. (2019), which contains full-text transcriptions of speeches given by politicians in the U.S. Congress. Due to modeling feasibility and the size of the entire dataset, we select data from the 114th Congress (2015–2017). This dataset contained 67,556 original speeches from 539 speakers (56% Republican), averaging 257.8 tokens per speech.
4.4 Methodology
In this section, we describe our framework for producing exemplar interpretations in the context of speaker attributes. We document the preprocessing pipeline, the models used, and the baselines to which comparisons were made.
4.4.1 Preprocessing Pipeline
Texts were split into sentences and tokenized using NLTK (Loper & Bird, 2002), with user mentions and
hashtags removed and text cast to lowercase.
Text segmentation and bagging In our experiments, we seek a level of textual granularity between full documents (e.g., multi-paragraph Facebook posts, extended speeches, or blogs) and sentences. This is done for reasons of interpretability rather than modeling, as reading instances at the interpretation stage is more tractable with shorter texts. At the same time, the short, contextless nature of individual sentences motivates a slightly larger unit of analysis.
To segment paragraphs, we rely on the method of Alemi and Ginsparg (2015)². In this approach, text segmentation is formulated as finding the optimal segmentation under a scoring function that measures the coherency of segments. Each segment is defined as the concatenation of k consecutive basic elements. In this paper, we consider sentences as basic elements, set 1 ≤ k ≤ 5, use as our scoring function the dot product of each segment’s sentence representations (acquired using the SentenceTransformers Python library), and solve the optimization problem with a dynamic programming approach that guarantees the optimal solution. The output of the optimal solution was a set of paragraphs, each one to several sentences long.
² Using the implementation provided by https://github.com/chschock/textsplit
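A minimal sketch of this dynamic program is given below; it assumes precomputed sentence embeddings and uses the squared norm of each segment’s summed embeddings as a coherence score, whereas the textsplit implementation we used differs in its exact scoring details:

    import numpy as np

    def coherence(embs):
        # Sum of pairwise dot products within a segment (including self terms).
        s = embs.sum(axis=0)
        return float(s @ s)

    def segment(embs, max_len=5):
        # embs: (n_sentences, dim) array of sentence embeddings.
        n = len(embs)
        best = [-np.inf] * (n + 1)
        best[0], back = 0.0, [0] * (n + 1)
        for i in range(1, n + 1):
            for k in range(1, min(max_len, i) + 1):
                score = best[i - k] + coherence(embs[i - k:i])
                if score > best[i]:
                    best[i], back[i] = score, i - k
        bounds, i = [], n
        while i > 0:                      # recover optimal segment boundaries
            bounds.append((back[i], i))
            i = back[i]
        return bounds[::-1]               # list of (start, end) sentence indices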
Table 4.1: Descriptive statistics of the train (Tr), validation (Val), and test (Te) partitions for the Facebook-Moral Concerns (YourMorals) and Political Speeches (Speeches) datasets, after preprocessing and instance segmentation. All bags are of equal size (25).

                     Number of Instances   Number of Bags   Average Instance Length
    YourMorals (Tr)  122,825               4,913            13.1
    YourMorals (Val) 16,325                653              13.5
    YourMorals (Te)  34,975                1,399            13.0
    Speeches (Tr)    520,325               20,813           22.9
    Speeches (Val)   72,825                2,913            22.5
    Speeches (Te)    110,050               4,402            22.9
Due to the high variance in the number of paragraphs per speaker in each respective dataset, we
‘bagged’ participants’ paragraphs into smaller bags by sampling without replacement. In our experiments,
we selected 25 as a bag size, and excluded participants with fewer than 25 paragraphs.
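The bagging step can be sketched as follows (variable names are illustrative, and the exact handling of leftover paragraphs is an assumption):

    import random

    def make_bags(paragraphs, bag_size=25, seed=0):
        # Shuffle a speaker's paragraphs (sampling without replacement) and
        # partition them into bags of `bag_size`; speakers with fewer than
        # `bag_size` paragraphs are excluded upstream.
        items = list(paragraphs)
        random.Random(seed).shuffle(items)
        n_full = len(items) // bag_size
        return [items[i * bag_size:(i + 1) * bag_size] for i in range(n_full)]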
Text encoding Across methods, we use a consistent approach to encode instances (paragraphs). Fixed-length vectors were generated, per instance, using the SentenceTransformers library in Python with a pretrained MPNet text encoder (Song et al., 2020)³, an adaptation of BERT (Bidirectional Encoder Representations from Transformers; Devlin et al., 2019). Instances (paragraphs) are thus represented as 768-dimensional vectors, with a maximum sequence length of 256 tokens.
³ https://huggingface.co/sentence-transformers/all-mpnet-base-v2
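As a sketch, instance encoding with the SentenceTransformers library looks roughly as follows (the variable `paragraphs` is an assumed list of instance strings):

    from sentence_transformers import SentenceTransformer

    encoder = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
    encoder.max_seq_length = 256              # truncate long instances
    vectors = encoder.encode(paragraphs)      # array of shape (n_instances, 768)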
4.4.2 Application of MIL Techniques
Our general modeling approach is to use MIL techniques that produce, or enable the production of, instance-level information related to speaker-attribute labels. We develop or adapt existing methods which (1) produce instance-level predictions in addition to bag-level predictions; (2) estimate the relevance of individual instances via “attention”; and (3) learn latent sets which can be used to interpret at the instance level.
4.4.2.1 Averaged MIL MLP network
The first model we develop is a multi-instance, multi-layer perceptron, which operates on bags of precomputed embeddings as inputs and generates both instance-level and speaker-level predictions. Formally, let $B_{ij} \in \mathbb{R}^{768}$ denote the $j$-th instance of the $i$-th bag, i.e., the vector produced by a pretrained sentence encoder. We first produce a hidden representation of $B_{ij}$ with an MLP, $H_{ij} \in \mathbb{R}^{H}$, and then aggregate the $H_{ij}$ by taking their average, forming a single bag-level prediction $H_i$.
4.4.2.2 MIL MLP network with Attention
We adapt the method of Ilse et al. (2018), which was originally developed for and applied to image classification. Specifically, we formulate an Attention model as a multi-layer perceptron (MLP), whose dimensions are learned through parameter tuning, with an attention layer as proposed by Ilse et al. (2018). Importantly, the authors provided an analysis of the attention mechanism for aggregating instances in the MIL setting, showing it to be “permutation invariant.”
[Figure 4.1 schematic: text instances x_1, ..., x_n are passed through an encoder to obtain encoded instances h_1, ..., h_n; a modeling layer produces instance outputs y_1, ..., y_n; an attention layer produces weights α_1, ..., α_n, which are used in a weighted sum to form the bag-level prediction Y.]
Figure 4.1: A full visualization of the MIL MLP network with attention. Like the MIL MLP network, this Attention model encodes embeddings of text instances independently; the difference, however, is that the weight given to each instance is computed dynamically by a 1-layer MLP (an attention layer), and a bag-level prediction is made by computing a weighted sum using the attention values as instance weights.
Formally, let $B_{ij} \in \mathbb{R}^{768}$ denote the $j$-th instance of the $i$-th bag, i.e., the vector produced by a pretrained sentence encoder. We first produce a hidden representation of $B_{ij}$ with an MLP, $H_{ij} \in \mathbb{R}^{H}$, and then aggregate the $H_{ij}$ using attention, i.e., a learned weighted sum. $\alpha_{ij}$ denotes the importance of $H_{ij}$, where $\alpha_i$ is parameterized by a perceptron layer with a softmax activation.
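The attention aggregation can be sketched as follows, a simplified version in the spirit of Ilse et al. (2018) with placeholder dimensions:

    import torch
    import torch.nn as nn

    class AttentionMIL(nn.Module):
        def __init__(self, in_dim=768, hidden=300, attn_dim=128):
            super().__init__()
            self.instance_net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
            # One attention score per instance, normalised within the bag.
            self.attention = nn.Sequential(
                nn.Linear(hidden, attn_dim), nn.Tanh(), nn.Linear(attn_dim, 1))
            self.out = nn.Linear(hidden, 1)

        def forward(self, bag):                              # bag: (n_instances, in_dim)
            h = self.instance_net(bag)                       # (n_instances, hidden)
            alpha = torch.softmax(self.attention(h), dim=0)  # weights sum to 1 per bag
            pooled = (alpha * h).sum(dim=0)                  # weighted sum of instances
            return self.out(pooled), alpha.squeeze(-1)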
4.4.2.3 Rep-the-Set
Lastly, we apply the method developed by Skianis et al. (2020), which learns set aggregations by computing correspondences between the input set and hidden sets, “maximizing flow” through the hidden sets. In addition to setting or matching state-of-the-art performance on text classification and graph classification, “Rep-the-Set” also offers interpretability through analysis of the hidden sets. In the work of Skianis et al. (2020), elements of the hidden sets, which are themselves embeddings, were compared to word embeddings, and the words most similar to the items of a given hidden set were used for interpretation.
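A loose sketch of the idea, modeled on the relaxed (approximate) variant of Rep-the-Set rather than the exact bipartite-matching formulation, is given below; it is an illustration under these assumptions, not the reference implementation of Skianis et al. (2020):

    import torch
    import torch.nn as nn

    class ApproxRepSet(nn.Module):
        def __init__(self, in_dim=768, n_hidden_sets=30, set_size=20):
            super().__init__()
            # Hidden sets: learnable elements in the same space as the inputs.
            self.hidden_sets = nn.Parameter(torch.randn(n_hidden_sets, set_size, in_dim))
            self.out = nn.Linear(n_hidden_sets, 1)

        def forward(self, bag):                              # bag: (n_instances, in_dim)
            sims = torch.einsum("ksd,nd->ksn", self.hidden_sets, bag)
            # Each hidden element is matched to its most similar input instance;
            # summing these matches gives one score per hidden set.
            scores = torch.relu(sims).max(dim=-1).values.sum(dim=-1)
            return self.out(scores)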
4.4.3 Non-MIL Baselines
For the purpose of establishing performance baselines for MIL methods, we also train a standard MLP on the averaged representation of a bag of posts. Hyperparameters (e.g., number of layers, layer dimensionality, learning rate, optimizer) were likewise tuned via Optuna. Inputs were generated by element-wise averaging of the 25 instance embeddings in a given bag. In addition, we trained a model based on bag-of-words features. Specifically, we trained Support Vector Machine classifiers and regressors, with linear kernels, on Term Frequency-Inverse Document Frequency (TF-IDF) features.
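A sketch of this bag-of-words baseline in scikit-learn, with assumed variable names for the speaker-level documents and labels, looks like this:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import SVC, SVR

    # Classification (political party) with a linear-kernel SVM over TF-IDF features.
    party_clf = make_pipeline(TfidfVectorizer(), SVC(kernel="linear", C=1.0))
    party_clf.fit(train_texts, train_party_labels)

    # Regression (e.g., Care scores) with a linear-kernel support vector regressor.
    care_reg = make_pipeline(TfidfVectorizer(), SVR(kernel="linear", C=1.0))
    care_reg.fit(train_texts, train_care_scores)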
4.4.4 Implementation Details
MLP models and Rep-the-Set were implemented in PyTorch, with hyperparameter tuning performed using the Optuna library (Akiba et al., 2019)⁴. SVM models were implemented in scikit-learn (Pedregosa et al., 2011), with the C parameter tuned with Optuna.
⁴ See SM for full details on tuning.
For all modeling, data were split into 70-10-20 partitions (train, validation, test). For the YourMorals training tasks, models were optimized using the Mean Squared Error (MSE) loss, while for the political speeches classification, binary cross entropy was used. Learning rate scheduling was used (the learning rate was halved after the validation loss failed to improve for 3 epochs), and models were trained for 50 epochs or with early stopping, with patience equal to 8. All hyperparameters were tuned using the same optimization method, namely the Optuna library (Akiba et al., 2019). For all MLP models, tuned parameters included the number of layers (1–3), layer size (100–1000 units), layer activation (ReLU or Tanh), layer dropout rate (0.2–0.5), learning rate (1e-5 to 5e-3), batch size (100–300), and optimizer (Adam or RMSProp). Additionally, the Attention-MIL model had an extra parameter, the attention-layer dimensionality (100–500 units). Rep-the-Set tuned parameters included the number of hidden sets (5–200), the number of members per hidden set (5–200), and the learning rate (1e-4 to 5e-3). For each model and each prediction task, 100 Optuna trials were run, with median-trial pruning and the Tree-structured Parzen Estimator for parameter sampling (Bergstra et al., 2011). Each Optuna run (100 trials) took between 1 and 5 hours, depending on model size, on a single NVIDIA GeForce GTX 1080 Ti GPU.
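The tuning loop can be sketched with Optuna as follows; `train_and_validate` is an assumed helper that trains one model configuration and returns its validation loss:

    import optuna

    def objective(trial):
        params = {
            "n_layers": trial.suggest_int("n_layers", 1, 3),
            "hidden": trial.suggest_int("hidden", 100, 1000),
            "activation": trial.suggest_categorical("activation", ["relu", "tanh"]),
            "dropout": trial.suggest_float("dropout", 0.2, 0.5),
            "lr": trial.suggest_float("lr", 1e-5, 5e-3, log=True),
            "batch_size": trial.suggest_int("batch_size", 100, 300),
            "optimizer": trial.suggest_categorical("optimizer", ["Adam", "RMSprop"]),
        }
        return train_and_validate(params)   # assumed helper: returns validation loss

    study = optuna.create_study(
        direction="minimize",
        sampler=optuna.samplers.TPESampler(),
        pruner=optuna.pruners.MedianPruner(),
    )
    study.optimize(objective, n_trials=100)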
4.5 Prediction Performance and Model Validation
In this section, we interpret the speaker-language relationship using outputs produced by our methodology. After validating the performance of each model on both tasks, we demonstrate three exemplar-based interpretation approaches. First, we apply instance predictions from the Mean-MIL model to produce exemplar clouds of moral concerns for the YourMorals dataset. Specifically, we interpret extreme instance-level predictions as exemplars of the predicted concept. Second, we take the outputs of Attention-MIL as indications of instance importance for the Speeches test set. With these high-importance instances, we perform clustering and generate exemplars based on cluster centroids. Lastly, the Rep-the-Set method is used to generate interpretable sets of instances which maximize predicted values of speaker attributes for the YourMorals test set.
4.5.1 Prediction Performance
Table 4.2: Test set metrics for predicting 5 moral concerns from Facebook posts and for predicting political party from political speeches.

                          Mean Squared Error (↓)                      Accuracy (↑)
                          Care    Fairness  Loyalty  Authority  Purity  Political Speeches
    SVM (LIWC)            0.904   0.996     0.908    0.894      0.996   60.4%
    SVM (LDA)             0.934   0.990     0.894    0.880      0.899   80.6%
    Non-MIL MLP           0.842   0.992     0.842    0.745      0.834   82.3%
    Mean-MIL MLP          0.827   0.977     0.856    0.784      0.867   82.0%
    Attention-MIL MLP     0.838   1.003     0.852    0.785      0.840   84.3%
    Rep-the-Set           0.816   0.988     0.845    0.754      0.786   91.6%
The speaker-level performance of all models on the moral concern regression tasks and the political speeches classification task is shown in Table 4.2 for the test sets. The goal of this evaluation is to determine whether MIL methods, by disaggregating speakers’ texts and potentially exposing models to a higher noise-to-signal ratio, perform significantly worse than their non-MIL counterparts. Our results were mixed, but mostly encouraging: MIL methods are equal to or better than non-MIL methods in most cases. Mean-MIL and Attention-MIL out-performed the non-MIL MLP in places (e.g., Care and Fairness), and Rep-the-Set achieved the strongest results overall, including the best accuracy on the political speeches task.
4.5.2 Validating Attention Weights
There has been debate as to whether attention weights can be used for interpretation (Jain & Wallace, 2019; Wiegreffe & Pinter, 2019). One of the critical measures of attention weights’ validity, in the context of self-attention applied over sequences⁵, is whether permuting attention weights alters model predictions.
⁵ In our context, attention is applied to sets of items.
In our analysis, we find conclusively that permuting attention weights alters model predictions. Compared to the reference accuracy (82.4%), permuting attention weights within bags and aggregating using a weighted sum achieves a 64.9% accuracy; furthermore, the agreement between the permuted predictions and the reference predictions was only 62.5%.
Table 4.3: High and low attention instances for one randomly sampled bag in the test set. Attention (α) and instance-level predictions (Ŷ) are listed for the two highest-attention instances and 5 of the zero-attention instances. Attention weights sum to 1 in a given bag.

    α     Ŷ     Text
    0.96  1.00  but that is what we are going to get if epas proposed rules for new and existing power plants go into effect. folks in my district have had enough of this kind of executive overreach by the white house.
    0.03  0.99  this new red tape by the epa will hamper american energy security. it also puts our national security at risk. the epas plan is an unnecessary attempt to eliminate reliable and affordable energy.
    ...
    0.00  0.52  i reserve the balance of my time.
    0.00  0.83  i understand their motivations.
    0.00  0.58  i encourage my colleagues to support s.j.
    0.00  0.46  or after a terrorist threat or event.
    0.00  0.58  i take the obligation to stand up for them seriously.
Given that sparsity can be a key feature of interpretability (Lipton, 2018), we investigated whether the attention weights were sparse or dense (they were dense for the YourMorals prediction tasks). We found the attention weights in the political party prediction task to be highly sparse. Just 25.5% of assigned attention weights were non-zero (greater than 0.01), just 12.3% of weights were greater than 0.05, and 4.7% were greater than 0.25. Of the 4,402 test bags, 72.9% had exactly one instance with an attention weight greater than 0.25, while 95.4% had at least one. This suggests that a small number of instances in a given bag were responsible for the prediction. Combined with the statistics from the prior section, the attention weights for this model are evidently not only sparse, but their high values were key to the prediction.
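The permutation check described above can be sketched as follows; `bags` is an assumed list of (hidden representations, attention weights) pairs and `output_head` an assumed wrapper around the trained bag-level output layer:

    import numpy as np

    rng = np.random.default_rng(0)

    def permuted_predictions(bags, output_head):
        # Shuffle attention weights within each bag, re-pool with the shuffled
        # weights, and return the resulting bag-level predictions.
        preds = []
        for h, alpha in bags:                       # h: (n, d), alpha: (n,)
            pooled = (rng.permutation(alpha)[:, None] * h).sum(axis=0)
            preds.append(output_head(pooled))
        return np.array(preds)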
4.6 Producing Exemplars using MIL Methods
4.6.1 Exemplar Clouds of Moral Concerns
Analogous to the word cloud method of visualizing speaker-language models (H. A. Schwartz et al., 2013), here we present exemplar clouds. In the context of this prediction task, the prediction of continuous outcomes, we identify exemplars for a given variable as the instances with the highest predicted values across the test dataset. Specifically, we pass test inputs (i.e., bags of instances) into the trained model and extract the predicted value for each instance, each on the scale of the given target variable. For example, in the case of modeling Care moral concerns, a bag consisting of 25 sentences or short paragraphs, all coming from the same individual, is passed into the model. The model produces a prediction for each of the 25 instances, which are then averaged to make the final model prediction.
To generate exemplars using these instance-level predictions for the test data, we simply rank all instances by predicted value and select the highest-ranked instances as exemplars of the respective moral concern. For example, exemplars for Care moral concerns are displayed in “cloud” format in Figure 4.2, and exemplars for Loyalty moral concerns are similarly displayed in Figure 4.3.
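Generating exemplars from instance-level predictions reduces to a simple ranking, sketched here with assumed arrays of test instances and their predicted values:

    import numpy as np

    def exemplar_instances(instances, instance_preds, top_k=10):
        # Rank all test instances by predicted value for the target concern
        # and return the highest-scoring instances as exemplars.
        order = np.argsort(instance_preds)[::-1]
        return [instances[i] for i in order[:top_k]]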
“to all of my friends of color, my muslim friends, my immigrant friends, my friends in the lgbtqia community, my fellow humans the world over - im sorry.”
“we all need caring, loving thoughts right now.”
“i love the pictures but so sad.”
“but actually what they really need now more than ever is our love, understanding, discourse and most importantly, presence.”
“my heart is hurting for a nation that is frustrated, exhausted, and outright tired from the racism, xenophobia, sexual assault, sexism, and homophobia that has been exhibited in this country.”
“there are many worthy causes out there that deserve attention.”
“we might not be able to fix the worlds problems, but we can take the time to make sure our loved ones know how much they mean to us.”
Figure 4.2: An exemplar cloud showing the most extreme positive instance predictions for predicting Care moral concerns.
“to our brother, who is now in gods eternal glory...this is for you, and in memory of your life.”
“super proud again of the woodson jv volleyball team who took down a very determined yorktown squad in a 3 set nail-biter match tonite 22-25, 25-21, and 15-13 at yorktown.”
“a man who trust god is blessed because his hope is in the lord!”
“super proud of the woodson jv volleyball team who took down a tough oakton squad in straight sets tonite 25-20, 25-20 at oakton.”
“up 2 scores!!”
“thanks for all you have done for me my family and so many others!”
“as a tennessee fan,i never thought i would say this, but roll tide!”
Figure 4.3: An exemplar cloud showing the most extreme positive instance predictions for predicting Loyalty moral concerns.
Researchers can use these exemplar clouds to understand the way that high levels of a moral concern manifest in language, in readable, condensed examples. For example, the Care exemplars in Figure 4.2 show that high Care is associated with the expression of concern or care for specific others (e.g., friends of color), humanity in general, and the self (e.g., “we all need ...”); it is also associated with emotional expression and calls for more attention to “worthy causes.” In addition, the Loyalty exemplars show a variety of themes, such as team-based sports, spiritual expression, and giving thanks.
4.6.2 Exemplars from Rep-the-Set
Next, we generated exemplars for the YourMorals dataset using Rep-the-Set. In the original work, Skianis et al. (2020) interpreted the hidden sets (the major component of their proposed network) by finding individual words that were similar to the elements and centroids of a given hidden set. However, these interpretations are not related to the particular predictions of the model; furthermore, in experiments we found that this method of interpretation did not scale well to larger network dimensions. In our case, we are interested in exemplars of the speaker attribute being modeled. Thus, we designed a method to extract a set of instances that maximizes the prediction of the network for a given speaker attribute.
A greedy heuristic was used to determine the bag of test instances that maximizes the predicted value of the network when predicting Purity. Starting from an empty set S, the test instance that, when added to S, achieved the maximum predicted value from the model was added to S and removed from the set of available instances. This step was repeated until the desired number of exemplars was selected. Table 4.4 shows 10 instances from the maximum bag for modeling Purity moral concerns, and Table 4.5 shows the same for Loyalty.
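A sketch of this greedy search is shown below; `predict_fn` is an assumed wrapper that runs the trained Rep-the-Set model on a candidate bag of instance embeddings:

    def greedy_max_bag(embeddings, predict_fn, n_exemplars=10):
        # Greedily add the instance whose inclusion most increases the model's
        # predicted value for the target speaker attribute (e.g., Purity).
        selected, pool = [], list(range(len(embeddings)))
        for _ in range(n_exemplars):
            best = max(pool, key=lambda i: predict_fn([embeddings[j] for j in selected + [i]]))
            selected.append(best)
            pool.remove(best)
        return selected   # indices of the chosen exemplar instances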
Table 4.4: Exemplars generated for Purity morality via Rep-the-Set. A greedy heuristic was used to deter-
mine the set of test instances which maximize the predicted value of the network when predicting Purity.
lets pray for each other today, regardless of who or what your spiritual foundation!
2:8-9 and grow in christ through faith rom.
thank you everyone for the prayers!
help us to show the saviors love in the face of adversity!
prayer for harmony
dear father, your word tells us in romans 12:16 nlt live in harmony with each other.
our word and deed is god focused!
please help us remember we have a charge to glorify you because you gave your son for our
soul to be save, and fit it for the sky.
dear heavenly father, i pray we increase our faith in you.
prayers for us appreciated!
Table 4.5: Exemplars generated for Loyalty morality via Rep-the-Set.
thank you to all veterans and current military men and women. write done in comments when
you do!
thank you to all our soldiers past and present for your service.
to all veterans but especially those im fortunate enough to know and be related to: thank you
for your service.
thanks for all you have done for me my family and so many others!
thank you so much to all who have served in our military.
prayers for us appreciated!
on behalf of my family, i want to thank everyone for all of the kind words and prayer.
blessed is the man that trusteth in the lord, and whose hope the lord is.
thanks for all the prayers of support.
thank you for dedicating your life and for your self-less sacrifice to those that came before and
after me–and those that are with me.
4.6.3 Attention-Clustering for Political Party
In this section we show how high-attention instances can be used to identify multiple types of exemplars. We conduct our analysis exclusively on the Speeches test set, as we found that attention weights were only sparse for the political party classification task. Given the size of the data and the numerous instances which received high prediction scores for either Democrat or Republican bags, an approach other than selecting extreme predictions was necessary to identify exemplars. A simple clustering approach was applied, in which high-attention instances were clustered and the central members of a given cluster were taken to be exemplars.
Table 4.6: Exemplars from the Speeches dataset, generated by clustering high-attention test instances and
extracting the centroid instance, per cluster. Included in this table are three cluster centroids each for
Democrat-skewed clusters (measured by the average ground truth label for that cluster) and Republican-
skewed clusters.
Democrat
republicans regularly stood up to address climate change. the fortress of climate
denial constructed by the big polluters. and the polluterfunded attack machine has
turned on you. and that addressing carbon pollution with a price on carbon would
be a "political loser."
the terrible reality for most poor children in america in 2015 is that these simple
goals are as out of reach as flying to the moon. but because of the data that we have
as a result of no child left behind. africanamerican students improved by 15 points
and latino students improved by 21 points.
since sandy hook there have been hundreds of gun deaths in connecticut. fortynine
people at the pulse were killed.
Republican
continuing evidence of the disastrous obama doctrine. the world does not believe
that our country is serious about taking on isis. last year president obama stated
that isis was not a serious threat. isis is a clear and present danger to the american
people.
and i am convinced that the foundational importance of religious liberty is not just
in america. we must redouble our commitment to fighting for those around the
world who do not enjoy the basic right to worship as they choose. and caged by isis
for eating during ramadan".
the epa persists in these illegal activities. the epas plan would grant it jurisdiction
over fully 95 percent of my home state of california. the government used the clean
water act to attack a family farm for shifting to more efficient irrigation systems.
allowing an unaccountable federal agency to insert itself into land use decisions
across the state.
Out of all test set instances (N = 110,050), instances were selected for clustering if their attention score was greater than 0.5 (n = 2,774). Using k-means clustering, k was selected to be 50 using the elbow method. The resulting clusters were separated into four groups: 7 clusters containing purely procedural comments; 21 clusters highly skewed toward Democrats (measured by the average ground truth label across included instances); 16 clusters highly skewed toward Republicans; and 6 clusters that were neutral by average ground truth label. The most central instances (as measured by distance to the given cluster centroid) are taken to be exemplars for the given class (Republican or Democrat). Three cluster-based exemplars each for Democrats and Republicans are shown in Table 4.6.
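The clustering step can be sketched as follows (the variable names for the attention scores and instance embeddings are assumptions):

    import numpy as np
    from sklearn.cluster import KMeans

    high_idx = np.where(attention_scores > 0.5)[0]     # high-attention instances
    X = instance_embeddings[high_idx]
    km = KMeans(n_clusters=50, random_state=0).fit(X)

    # For each cluster, take the instance nearest the centroid as its exemplar.
    exemplar_idx = []
    for k in range(km.n_clusters):
        members = np.where(km.labels_ == k)[0]
        dists = np.linalg.norm(X[members] - km.cluster_centers_[k], axis=1)
        exemplar_idx.append(high_idx[members[np.argmin(dists)]])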
4.7 Discussion
We introduced a new approach to analyzing the relationship between language usage and speaker at-
tributes, which emphasizes interpretability at the instance level by producing exemplars for each attribute.
We adopted the paradigm of Multi Instance Learning and, through the application of three separate meth-
ods (paired with simple post-processing steps), extracted exemplars that can be used by experts to under-
stand the constructs behind speaker attributes.
The present work represents a first step in the development of this paradigm for practical usage: it is a proof of concept and an illustration that exemplar-based interpretation provides rich insight into speaker attributes and, more importantly, gives scientists the ability to understand psychological facets of language. The primary limitation of our work concerns evaluation, which was made difficult by the qualitative nature of this work. However, future work is planned to compare the efficacy of our exemplar explanations to previous methods of explanation (e.g., word clouds). Next steps include: (1) more rigorous evaluation of the effectiveness of exemplar-based interpretation using user studies; (2) experimentation with different domains, such as predicting mental health outcomes from language usage; and (3) identification of new ways to generate and present exemplars.
Lastly, potential risks of our method concern security and the preservation of confidentiality in cases in which data are not public. Feature-based interpretation methods do not expose actual data to the public, but exemplar-based methods do. To mitigate this risk, researchers working with sensitive data should be cautious about releasing exemplars publicly, restricting their use to their own analyses.
Chapter 5
Conclusions
How should we understand the relationship between speakers and their language? This thesis presents
an answer to this question, which is informed by recent advances in NLP for representing text data, cur-
rent thinking in NLP and ML concerning the explainability of high-performing models, and the idea that
instance-level interpretability, specifically the identification of exemplars, offers a means to productively
combine the advances of NLP with the intuitive, self-explanatory nature of language.
There is a key implication of the performance difference between lexical techniques and text embeddings, which we observe in Chapter 2, that transcends practical significance. If our objective is to build automatic systems that are capable of predicting the personality, moral concerns, or political ideology of individuals from their language, then the benefit of predictive superiority is clear. However, the objective pursued in this thesis is to understand the connection between language and speaker attributes. Accordingly, the prediction delta between text embeddings (and, similarly, any approach introduced in the future) and past lexicon- and topic-based methods has to do with validity. Better-performing methods logically contain a superior representation of language. Thus, any downstream inferences we make with better-performing methods are more valid than the alternatives, specifically when one is measuring the strength of a given association (i.e., the effect size) or comparing the effect sizes for two different attributes (e.g., Fairness and Purity concerns in Chapter 2).
Though using more predictive models for scientific inference from language data is important, prediction alone is insufficient. Researchers in NLP have broadly adopted the encouraging stance that it is not enough to simply predict some phenomenon from language. Rather, we ought to know why certain models perform well, in which cases they fail, and whether they are basing predictions on appropriate signal. In Chapter 3, I demonstrated the gainful application of post hoc explanations to achieve understanding of, and control over, a black-box text classification model. To this point, however, such “explainability” research in NLP has not generally been applied to social scientific research agendas, where the goal is inference toward some theoretical insight rather than useful prediction.
In this thesis, I identified this gap between explainability research and certain areas of the social sci-
ences (i.e., those areas which treat “text as data”; Grimmer and Stewart 2013), particularly in the study
of speaker attributes and observed language usage. In addition, this gap is not trivially overcome. In
Chapter 4, I noted that the primary technical barrier for satisfactorily explaining the relationship between
speaker attributes and language usage is the structure of such datasets.
Finally, in Chapter 4 I introduced the concept of exemplar explanations for use in text analysis in the
social sciences. This is a substantial departure from prior works’ reliance on word categories and word
clouds, not only in form but also in the very substance of what determines an explanation. This research
posed the question, “Which is more instructive and generates more insight: a prototype of a speaker
attribute in language, or an exemplar?” In this context, a prototype consists of the words most distinctively
used by a given group, while an exemplar consists of an instance that most exemplifies the features of
language used by a given group. This is a new way of thinking about speaker-language relationships, and
while this thesis merely scratches the surface of what exemplars can teach us about language and the mind,
future work can apply them through new techniques, in new domains, and to different types of questions.
Language is incredibly complex, and its connection to speakers’ psychological and demographic at-
tributes is no different. To understand this connection, we must not only build models that can predictively
account for the speaker-language relationship, but also explanation strategies that can be at once valid,
intuitive, and instructive. This thesis provides one framework for achieving a balance between these two
objectives, but it is surely only the beginning.
Bibliography
Akiba, T., Sano, S., Yanase, T., Ohta, T., & Koyama, M. (2019). Optuna: A next-generation hyperparameter
optimization framework. Proceedings of the 25th ACM SIGKDD international conference on
knowledge discovery & data mining, 2623–2631.
Alemi, A. A., & Ginsparg, P. (2015). Text segmentation based on semantic word embeddings. arXiv
preprint arXiv:1503.05543.
Anthony, A. (2016). Inside the hate-filled echo chamber of racism and conspiracy theories. The guardian,
18.
Araque, O., Gatti, L., & Kalimeri, K. (2020). Moralstrength: Exploiting a moral lexicon and embedding
similarity for moral foundations prediction. Knowledge-based systems, 191, 105184.
Argamon, S., Koppel, M., Pennebaker, J. W., & Schler, J. (2007). Mining the blogosphere: Age, gender and
the varieties of self-expression. First Monday.
Arun, R., Suresh, V., Madhavan, C. V., & Murthy, M. N. (2010). On finding the natural number of topics
with latent dirichlet allocation: Some observations. Pacific-Asia conference on knowledge discovery
and data mining, 391–402.
Atari, M., Graham, J., & Dehghani, M. (2020). Foundations of morality in iran. Evolution and Human
Behavior, 41, 367–384.
Basile, V., Bosco, C., Fersini, E., Nozza, D., Patti, V., Pardo, F. M. R., Rosso, P., & Sanguinetti, M. (2019).
Semeval-2019 task 5: Multilingual detection of hate speech against immigrants and women in
twitter. Proceedings of the 13th International Workshop on Semantic Evaluation, 54–63.
Bender, E. M., & Friedman, B. (2018). Data statements for natural language processing: Toward mitigating
system bias and enabling better science. Transactions of the Association for Computational
Linguistics, 6, 587–604.
Benson, T. (2016). Inside the twitter for racists: Gab the site where milo yiannopoulos goes to troll now.
Bergstra, J., Bardenet, R., Bengio, Y., & Kégl, B. (2011). Algorithms for hyper-parameter optimization.
Advances in neural information processing systems, 24.
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent dirichlet allocation. Journal of machine Learning
research, 3(Jan), 993–1022.
Bolukbasi, T., Chang, K.-W., Zou, J. Y., Saligrama, V., & Kalai, A. T. (2016). Man is to computer
programmer as woman is to homemaker? debiasing word embeddings. Advances in neural
information processing systems, 4349–4357.
Boyd, R. L., Wilson, S. R., Pennebaker, J. W., Kosinski, M., Stillwell, D. J., & Mihalcea, R. (2015). Values in
words: Using language to evaluate and understand personal values. ICWSM, 31–40.
Brady, W. J., Wills, J. A., Jost, J. T., Tucker, J. A., & Van Bavel, J. J. (2017). Emotion shapes the diffusion of
moralized content in social networks. Proceedings of the National Academy of Sciences, 114,
7313–7318.
Briggs, F., Fern, X. Z., & Raich, R. (2012). Rank-loss support instance machines for miml instance
annotation. Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery
and data mining, 534–542.
Brunet, M.-E., Alkalay-Houlihan, C., Anderson, A., & Zemel, R. (2019). Understanding the origins of bias
in word embeddings. International Conference on Machine Learning, 803–811.
Burton, J., Cruz, N., & Hahn, U. (2019). How real is moral contagion in online social networks? CogSci,
175–181.
Card, D., Zhang, M., & Smith, N. A. (2019). Deep weighted averaging classifiers. Proceedings of the
Conference on Fairness, Accountability, and Transparency, 369–378.
Clifford, S., Iyengar, V., Cabeza, R., & Sinnott-Armstrong, W. (2015). Moral foundations vignettes: A
standardized stimulus database of scenarios based on moral foundations theory. Behavior
Research Methods, 47, 1178–1198.
Clifford, S., & Jerit, J. (2013). How words do the work of politics: Moral foundations theory and the debate
over stem cell research. The Journal of Politics, 75, 659–671.
Curry, O., Whitehouse, H., & Mullins, D. (2019). Is it good to cooperate? testing the theory of
morality-as-cooperation in 60 societies. Current Anthropology, 60(1).
Cushman, F. (2013). Action, outcome, and value: A dual-system framework for morality. Personality and
social psychology review, 17(3), 273–292.
Davidson, T., Warmsley, D., Macy, M., & Weber, I. (2017). Automated hate speech detection and the
problem of offensive language. Eleventh international AAAI conference on web and social media.
Day, M. V., Fiske, S. T., Downing, E. L., & Trail, T. E. (2014). Shifting liberal and conservative attitudes
using moral foundations theory. Personality and Social Psychology Bulletin, 40, 1559–1573.
De Choudhury, M., Gamon, M., Counts, S., & Horvitz, E. (2013). Predicting depression via social media.
Proceedings of the International AAAI Conference on Web and Social Media, 7.
Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. (1990). Indexing by latent
semantic analysis. Journal of the American society for information science, 41, 391–407.
de Gibert, O., Perez, N., Pablos, A. G., & Cuadros, M. (2018). Hate speech dataset from a white supremacy
forum. Proceedings of the 2nd Workshop on Abusive Language Online (ALW2), 11–20.
Dehghani, M., Johnson, K., Hoover, J., Sagi, E., Garten, J., Parmar, N. J., Vaisey, S., Iliev, R., & Graham, J.
(2016). Purity homophily in social networks. Journal of Experimental Psychology: General, 145,
366.
Del Vigna, F., Cimino, A., Dell’Orletta, F., Petrocchi, M., & Tesconi, M. (2017). Hate me, hate me not: Hate speech detection on facebook. Proceedings of the First Italian Conference on Cybersecurity (ITASEC17), 86–95.
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). Bert: Pre-training of deep bidirectional
transformers for language understanding. arXiv preprint arXiv:1810.04805.
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). Bert: Pre-training of deep bidirectional
transformers for language understanding. Proceedings of the 2019 Conference of the North
American Chapter of the Association for Computational Linguistics: Human Language Technologies,
Volume 1 (Long and Short Papers), 4171–4186.
Dietterich, T. G., Lathrop, R. H., & Lozano-Pérez, T. (1997). Solving the multiple instance problem with
axis-parallel rectangles. Artificial intelligence , 89, 31–71.
Dixon, L., Li, J., Sorensen, J., Thain, N., & Vasserman, L. (2018). Measuring and mitigating unintended bias
in text classification. Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society,
67–73.
Eichstaedt, J. C., Schwartz, H. A., Giorgi, S., Kern, M. L., Park, G., Sap, M., Labarthe, D. R., Larson, E. E.,
Seligman, M., & Ungar, L. H. (2018). More evidence that twitter language predicts heart disease:
A response and replication. https://doi.org/10.31234/osf.io/p75ku
Eichstaedt, J. C., Schwartz, H. A., Kern, M. L., Park, G., Labarthe, D. R., Merchant, R. M., Jha, S.,
Agrawal, M., Dziurzynski, L. A., Sap, M., Weeg, C., Larson, E. E., Ungar, L. H., &
Seligman, M. E. P. (2015). Psychological language on twitter predicts county-level heart disease
mortality. Psychological Science, 26, 159–169.
Eichstaedt, J. C., Smith, R. J., Merchant, R. M., Ungar, L. H., Crutchley, P., Preoţiuc-Pietro, D., Asch, D. A.,
& Schwartz, H. A. (2018). Facebook language predicts depression in medical records. Proceedings
of the National Academy of Sciences, 115(44), 11203–11208.
Eisenstein, J., O’Connor, B., Smith, N. A., & Xing, E. (2010). A latent variable model for geographic lexical
variation. Proceedings of the 2010 conference on empirical methods in natural language processing,
1277–1287.
Facebook. (2019). Number of monthly active facebook users worldwide as of 2nd quarter 2019 (in
millions) [graph]. https://www.statista.com/statistics/264810/number-of-monthly-active-
facebook-users-worldwide/
Feinberg, M., & Willer, R. (2015). From gulf to bridge: When do moral arguments facilitate political
influence? Personality and Social Psychology Bulletin, 41(12), 1665–1681.
Foulds, J., & Frank, E. (2010). A review of multi-instance learning assumptions. The Knowledge Engineering Review, 25(1), 1–25.
Frimer, J. (2020). Do liberals and conservatives use different moral languages? two replications and six
extensions of graham, haidt, and nosek’s (2009) moral text analysis. Journal of Research in
Personality, 84, 103906.
Frimer, J., Boghrati, R., Haidt, J., Graham, J., & Dehghani, M. (2019). Moral foundations dictionary for
linguistic analyses 2.0 [Unpublished Manuscript].
Gagliardone, I., Gal, D., Alves, T., & Martinez, G. (2015). Countering online hate speech. Unesco Publishing.
Garcia, D., & Sikström, S. (2014). The dark side of facebook: Semantic representations of status updates
predict the dark triad of personality. Personality and Individual Differences , 67, 92–96.
Garg, S., Perot, V., Limtiaco, N., Taly, A., Chi, E. H., & Beutel, A. (2019). Counterfactual fairness in text classification through robustness. Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, 219–226.
Garten, J., Hoover, J., Johnson, K. M., Boghrati, R., Iskiwitch, C., & Dehghani, M. (2018). Dictionaries and
distributions: Combining expert knowledge and large scale textual data content analysis.
Behavior Research Methods, 50, 344–361.
Gärtner, T., Flach, P. A., Kowalczyk, A., & Smola, A. J. (2002). Multi-instance kernels. ICML, 19, 179–186.
Gelber, K., & McNamara, L. (2016). Evidencing the harms of hate speech. Social Identities, 22(3), 324–341.
Gentzkow, M., Shapiro, J. M., & Taddy, M. (2019). Measuring group differences in high-dimensional choices: Method and application to congressional speech. Econometrica, 87(4), 1307–1340. https://doi.org/10.3982/ECTA16566
Gilligan, C. (1977). In a different voice: Women’s conceptions of self and of morality. Harvard educational
review, 47(4), 481–517.
Gosling, S. D., Vazire, S., Srivastava, S., & John, O. P. (2004). Should we trust web-based studies? a
comparative analysis of six preconceptions about internet questionnaires. American psychologist,
59(2), 93.
Graham, J., Haidt, J., Koleva, S., Motyl, M., Iyer, R., Wojcik, S. P., & Ditto, P. H. (2013). Moral foundations
theory: The pragmatic validity of moral pluralism. Advances in experimental social psychology
(pp. 55–130). Elsevier.
Graham, J., Haidt, J., & Nosek, B. A. (2008). The moral foundations questionnaire. MoralFoundations. org.
Graham, J., Haidt, J., & Nosek, B. A. (2009). Liberals and conservatives rely on different sets of moral
foundations. Journal of Personality and Social Psychology, 96, 1029–1046.
Graham, J., Nosek, B. A., Haidt, J., Iyer, R., Koleva, S., & Ditto, P. H. (2011). Mapping the moral domain.
Journal of Personality and Social Psychology, 101, 366–385.
Gray, K., & Wegner, D. M. (2009). Moral typecasting: Divergent perceptions of moral agents and moral
patients. Journal of personality and social psychology, 96(3), 505.
Greene, J. D., Sommerville, R. B., Nystrom, L. E., Darley, J. M., & Cohen, J. D. (2001). An fmri investigation
of emotional engagement in moral judgment. Science, 293(5537), 2105–2108.
Griffiths, T. L., & Steyvers, M. (2004). Finding scientific topics. Proceedings of the National academy of
Sciences, 101(suppl 1), 5228–5235.
Grimmer, J., & Stewart, B. M. (2013). Text as data: The promise and pitfalls of automatic content analysis
methods for political texts. Political analysis, 21(3), 267–297.
Grover, T., Bayraktaroglu, E., Mark, G., & Rho, E. H. R. (2019). Moral and affective differences in us
immigration policy debate on twitter. Computer Supported Cooperative Work (CSCW), 28(3-4),
317–355.
Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., & Pedreschi, D. (2019). A survey of
methods for explaining black box models. ACM computing surveys (CSUR), 51(5), 93.
Haidt, J., & Joseph, C. (2004). Intuitive ethics: How innately prepared intuitions generate culturally
variable virtues. Daedalus, 133, 55–66.
Hallen, B. (2000). The good, the bad, and the beautiful: Discourse about values in yoruba culture. Indiana
University Press.
Han, X., & Tsvetkov, Y. (2021). Influence tuning: Demoting spurious correlations via instance attribution
and instance-driven updates. arXiv preprint arXiv:2110.03212.
Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain
Sciences, 33, 61–83.
Hofmann, W., Wisneski, D. C., Brandt, M. J., & Skitka, L. J. (2014). Morality in everyday life. Science,
345(6202), 1340–1343.
Hoover, J., Portillo-Wightman, G., Yeh, L., Havaldar, S., Davani, A. M., Lin, Y., Kennedy, B., Atari, M.,
Kamel, Z., Mendlen, M., Moreno, G., Park, C., Chang, T. E., Chin, J., Leong, C., Leung, J. Y.,
Mirinjian, A., & Dehghani, M. (2020). Moral foundations twitter corpus: A collection of 35k tweets
annotated for moral sentiment. Social Psychological and Personality Science, 194855061987662.
Hoover, J., Atari, M., Davani, A. M., Kennedy, B., Portillo-Wightman, G., Yeh, L., Kogon, D., &
Dehghani, M. (2019). Bound in hatred: The role of group-based morality in acts of hate. PsyArXiv.
Ilse, M., Tomczak, J. M., & Welling, M. (2018). Attention-based deep multiple instance learning. 35th
International Conference on Machine Learning, ICML 2018, 3376–3391.
Jacovi, A., & Goldberg, Y. (2020). Towards faithfully interpretable NLP systems: How should we define
and evaluate faithfulness? Proceedings of the 58th Annual Meeting of the Association for
Computational Linguistics, 4198–4205. https://doi.org/10.18653/v1/2020.acl-main.386
Jain, S., & Wallace, B. C. (2019). Attention is not explanation. arXiv preprint arXiv:1902.10186.
Janoff-Bulman, R., & Carnes, N. C. (2013). Surveying the moral landscape: Moral motives and
group-based moralities. Personality and Social Psychology Review, 17(3), 219–236.
Jeyakumar, J. V., Noor, J., Cheng, Y.-H., Garcia, L., & Srivastava, M. (2020). How can i explain this to you?
an empirical study of deep neural network explanation methods. Advances in Neural Information
Processing Systems.
Jin, X., Wei, Z., Du, J., Xue, X., & Ren, X. (2019). Towards hierarchical importance attribution: Explaining
compositional semantics for neural sequence models. International Conference on Learning
Representations.
Jin, X., Wei, Z., Du, J., Xue, X., & Ren, X. (2020). Towards hierarchical importance attribution: Explaining
compositional semantics for neural sequence models. International Conference on Learning
Representations. https://openreview.net/forum?id=BkxRRkSKwr
Kennedy, B., Atari, M., Davani, A. M., Yeh, L., Omrani, A., Kim, Y., Coombs Jr., K., Havaldar, S.,
Portillo-Wightman, G., Gonzalez, E., Hoover, J., Azatian, A., Cardenas, G., Hussain, A., Lara, A.,
Omary, A., Park, C., Wang, X., Wijaya, C., . . . Dehghani, M. (2020). The gab hate corpus: A
collection of 27k posts annotated for hate speech. https://doi.org/10.31234/osf.io/hqjxn
Kennedy, B., Atari, M., Davani, A. M., Hoover, J., Omrani, A., Graham, J., & Dehghani, M. (2021). Moral
concerns are differentially observable in language. Cognition, 212, 104696.
Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Kohlberg, L. (1981). Essays on moral development: The psychology of moral development (Vol. 2). San
Francisco: harper & row.
Koleva, S. P., Graham, J., Iyer, R., Ditto, P. H., & Haidt, J. (2012). Tracing the threads: How five moral
concerns (especially purity) help explain culture war attitudes. Journal of Research in Personality,
46, 184–194.
Kotzias, D., Denil, M., De Freitas, N., & Smyth, P. (2015). From group to individual labels using deep
features. Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery
and Data Mining, 597–606.
Lee, J., Lee, Y., Kim, J., Kosiorek, A. R., Choi, S., & Teh, Y. W. (2019). Set transformer: A framework for attention-based permutation-invariant neural networks. Proceedings of the 36th International Conference on Machine Learning.
Lenth, R., Singmann, H., Love, J., Buerkner, P., & Herve, M. (2018). Emmeans: Estimated marginal means,
aka least-squares means. R package version, 1(1).
Li, L., & Tomasello, M. (2021). On the moral functions of language. Social Cognition, 39(1), 99–116.
Lima, L., Reis, J. C., Melo, P., Murai, F., Araujo, L., Vikatos, P., & Benevenuto, F. (2018). Inside the
right-leaning echo chambers: Characterizing gab, an unmoderated social system. 2018 IEEE/ACM
InternationalConferenceonAdvancesinSocialNetworksAnalysisandMining(ASONAM), 515–522.
Lipton, Z. C. (2018). The mythos of model interpretability: In machine learning, the concept of
interpretability is both important and slippery. Queue, 16(3), 31–57.
Liu, F., & Avci, B. (2019). Incorporating priors with feature attribution on text classification. arXiv
preprint arXiv:1906.08286.
Liu, G., Wu, J., & Zhou, Z.-H. (2012). Key instance detection in multi-instance learning. Asian Conference
on Machine Learning, 253–268.
Loper, E., & Bird, S. (2002). Nltk: The natural language toolkit. arXiv preprint cs/0205028.
MacAvaney, S., Yao, H.-R., Yang, E., Russell, K., Goharian, N., & Frieder, O. (2019). Hate speech detection:
Challenges and solutions. PloS one, 14(8).
Mandl, T., Modha, S., Majumder, P., Patel, D., Dave, M., Mandlia, C., & Patel, A. (2019). Overview of the
hasoc track at fire 2019: Hate speech and offensive content identification in indo-european
languages. Proceedings of the 11th Forum for Information Retrieval Evaluation, 14–17.
Maron, O., & Lozano-Pérez, T. (1998). A framework for multiple-instance learning. Advances in neural
information processing systems, 570–576.
Matsuo, A., Sasahara, K., Taguchi, Y., & Karasawa, M. (2019). Development and validation of the japanese
moral foundations dictionary. PLoS One, 14.
McCallum, A. K. (2002). Mallet: A machine learning for language toolkit [http://mallet.cs.umass.edu].
McFarland, D. A., Ramage, D., Chuang, J., Heer, J., Manning, C. D., & Jurafsky, D. (2013). Differentiating
language usage through topic models. Poetics, 41(6), 607–625.
Meddaugh, P. M., & Kay, J. (2009). Hate speech or “reasonable racism?” the other in stormfront. Journal of Mass Media Ethics, 24(4), 251–268.
Medin, D., Ojalehto, B., Marin, A., & Bang, M. (2017). Systems of (non-) diversity. Nature Human
Behaviour, 1(5), 1–5.
Mehl, M. R., Pennebaker, J. W., Crow, D. M., Dabbs, J., & Price, J. H. (2001). The electronically activated
recorder (ear): A device for sampling naturalistic daily activities and conversations. Behavior
Research Methods, Instruments, & Computers, 33, 517–523.
Mikhail, J. (2007). Universal moral grammar: Theory, evidence and the future. Trends in cognitive sciences,
11(4), 143–152.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words
and phrases and their compositionality. Advances in neural information processing systems,
3111–3119.
Mimno, D., Wallach, H., Talley, E., Leenders, M., & McCallum, A. (2011). Optimizing semantic coherence
in topic models. Proceedings of the 2011 Conference on Empirical Methods in Natural Language
Processing, 262–272.
Mohan, S., Guha, A., Harris, M., Popowich, F., Schuster, A., & Priebe, C. (2017). The impact of toxic
language on the health of reddit communities. Canadian Conference on Artificial Intelligence ,
51–56.
Mokhberian, N., Abeliuk, A., Cummings, P., & Lerman, K. (2020). Moral framing and ideological bias of
news. In S. Aref, K. Bontcheva, M. Braghieri, F. Dignum, F. Giannotti, F. Grisolia, & D. Pedreschi
(Eds.), Social informatics (pp. 206–219).
Mondal, M., Silva, L. A., & Benevenuto, F. (2017). A measurement study of hate speech in social media.
Proceedings of the 28th ACM Conference on Hypertext and Social Media, 85–94.
Montavon, G., Samek, W., & Müller, K.-R. (2018). Methods for interpreting and understanding deep
neural networks. Digital Signal Processing, 73, 1–15.
Mooijman, M., Hoover, J., Lin, Y., Ji, H., & Dehghani, M. (2018). Moralization in social networks and the
emergence of violence during protests. Nature Human Behaviour, 2, 389.
Mostafazadeh Davani, A., Atari, M., Kennedy, B., Havaldar, S., & Dehghani, M. (2020). Hatred is in the eye
of the annotator: Hate speech classifiers learn human-like social stereotypes (in press). 31st
Annual Conference of the Cognitive Science Society (CogSci).
Murdoch, W. J., Liu, P. J., & Yu, B. (2018a). Beyond word importance: Contextual decomposition to extract
interactions from LSTMs. International Conference on Learning Representations.
https://openreview.net/forum?id=rkRwGg-0Z
Murdoch, W. J., Liu, P. J., & Yu, B. (2018b). Beyond word importance: Contextual decomposition to extract
interactions from lstms. International Conference on Learning Representations.
Muthukrishna, M., & Henrich, J. (2019). A problem in theory. Nature Human Behaviour, 3, 221–229.
Niven, T., & Kao, H.-Y. (2019). Probing neural network comprehension of natural language arguments.
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics,
4658–4664.
Olteanu, A., Castillo, C., Boy, J., & Varshney, K. R. (2018). The effect of extremist violence on hateful
speech online. Twelfth International AAAI Conference on Web and Social Media.
Papernot, N., & McDaniel, P. (2018). Deep k-nearest neighbors: Towards confident, interpretable and
robust deep learning. arXiv preprint arXiv:1803.04765.
Pappas, N., & Popescu-Belis, A. (2017). Explicit document modeling through weighted multiple-instance
learning. Journal of Artificial Intelligence Research , 58, 591–626.
Park, G., Schwartz, H. A., Eichstaedt, J. C., Kern, M. L., Kosinski, M., Stillwell, D. J., Ungar, L. H., &
Seligman, M. E. (2015). Automatic personality assessment through social media language. Journal
of Personality and Social Psychology, 108, 934–952.
Park, J. H., Shin, J., & Fung, P. (2018). Reducing gender bias in abusive language detection. Proceedings of
the 2018 Conference on Empirical Methods in Natural Language Processing, 2799–2804.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P.,
Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., &
Duchesnay, E. (2011). Scikit-learn: Machine learning in python. Journal of Machine Learning
Research, 12, 2825–2830.
Pennebaker, J. W. (2013). The secret life of pronouns: What our words say about us. Bloomsbury Publishing
USA.
Pennebaker, J. W., Boyd, R. L., Jordan, K., & Blackburn, K. (2015). The development and psychometric
properties of liwc2015 (tech. rep.). University of Texas at Austin. Austin, TX.
Pennebaker, J. W., Francis, M. E., & Booth, R. J. (2001). Linguistic inquiry and word count: Liwc 2001.
Mahway: Lawrence Erlbaum Associates, 71, 2001.
Pennebaker, J. W., Mehl, M. R., & Niederhoffer, K. G. (2003). Psychological aspects of natural language
use: Our words, our selves. Annual Review of Psychology, 54, 547–577.
Pennington, J., Socher, R., & Manning, C. (2014). Glove: Global vectors for word representation.
Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP),
1532–1543.
Pennycook, G., Cheyne, J. A., Barr, N., Koehler, D. J., & Fugelsang, J. A. (2014). The role of analytic
thinking in moral judgements and values. Thinking & reasoning, 20(2), 188–214.
Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep
contextualized word representations. arXiv preprint arXiv:1802.05365.
Pinker, S. (2007). The stuff of thought: Language as a window into human nature . Penguin.
Preoţiuc-Pietro, D., Liu, Y., Hopkins, D., & Ungar, L. (2017). Beyond binary labels: Political ideology
prediction of twitter users. Proceedings of the 55th Annual Meeting of the Association for
Computational Linguistics (Volume 1: Long Papers), 729–740.
Purzycki, B. G., Pisor, A. C., Apicella, C., Atkinson, Q., Cohen, E., Henrich, J., McElreath, R., McNamara, R. A., Norenzayan, A., Willard, A. K., & Xygalatas, D. (2018). The cognitive and cultural foundations of moral behavior. Evolution and Human Behavior, 39(5), 490–501. https://doi.org/10.1016/j.evolhumbehav.2018.04.004
R Core Team. (2019). R: A language and environment for statistical computing. R Foundation for Statistical
Computing. Vienna, Austria. https://www.R-project.org/
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are
unsupervised multitask learners. Open AI Blog.
Rai, T. S., & Fiske, A. P. (2011). Moral psychology is relationship regulation: Moral motives for unity,
hierarchy, equality, and proportionality. Psychological review, 118(1), 57.
Rezapour, R., Shah, S. H., & Diesner, J. (2019). Enhancing the measurement of social effects by capturing
morality. Proceedings of the Tenth Workshop on Computational Approaches to Subjectivity,
Sentiment and Social Media Analysis, 35–45.
Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). Why should i trust you?: Explaining the predictions of any
classifier. Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery
and data mining, 1135–1144.
Rieger, L., Singh, C., Murdoch, W. J., & Yu, B. (2019). Interpretations are useful: Penalizing explanations to
align neural networks with prior knowledge. arXiv preprint arXiv:1909.13584.
Ripley, B., Venables, B., Bates, D. M., Hornik, K., Gebhardt, A., Firth, D., & Ripley, M. B. (2013). Package
‘mass’. Cran R, 538. http://cran.r-project.org/web/packages/MASS/MASS.pdf
Rodriguez, A. J., Holleran, S. E., & Mehl, M. R. (2010). Reading between the lines: The lay assessment of
subclinical depression from written self-descriptions. Journal of personality, 78(2), 575–598.
Roscher, R., Bohn, B., Duarte, M. F., & Garcke, J. (2020). Explainable machine learning for scientific
insights and discoveries. IEEE Access, 8, 42200–42216.
Rozin, P. (2001). Social psychology and science: Some lessons from solomon asch. Personality and Social
Psychology Review, 5, 2–14.
Sap, M., Card, D., Gabriel, S., Choi, Y., & Smith, N. A. (2019). The risk of racial bias in hate speech
detection. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics,
1668–1678.
Schwartz, H. A., Eichstaedt, J. C., Kern, M. L., Dziurzynski, L., Ramones, S. M., Agrawal, M., Shah, A.,
Kosinski, M., Stillwell, D., Seligman, M. E., et al. (2013). Personality, gender, and age in the
language of social media: The open-vocabulary approach. PLoS One, 8, e73791.
Schwartz, S. H. (1992). Universals in the content and structure of values: Theoretical advances and
empirical tests in 20 countries. Advances in experimental social psychology, 25(1), 1–65.
Schwartz, S. H., Melech, G., Lehmann, A., Burgess, S., Harris, M., & Owens, V. (2001). Extending the
cross-cultural validity of the theory of basic human values with a different method of
measurement. Journal of cross-cultural psychology, 32(5), 519–542.
Silva, L., Mondal, M., Correa, D., Benevenuto, F., & Weber, I. (2016). Analyzing the targets of hate in
online social media. Tenth International AAAI Conference on Web and Social Media.
Singh, C., Murdoch, W. J., & Yu, B. (2019). Hierarchical interpretations for neural network predictions.
International Conference on Learning Representations.
https://openreview.net/forum?id=SkEqro0ctQ
Skianis, K., Nikolentzos, G., Limnios, S., & Vazirgiannis, M. (2020). Rep the set: Neural networks for
learning set representations. International conference on artificial intelligence and statistics ,
1410–1420.
Song, K., Tan, X., Qin, T., Lu, J., & Liu, T.-Y. (2020). MPNet: Masked and permuted pre-training for
language understanding. arXiv preprint arXiv:2004.09297.
Sundararajan, M., Taly, A., & Yan, Q. (2017). Axiomatic attribution for deep networks. Proceedings of the
34th International Conference on Machine Learning-Volume 70, 3319–3328.
Tausczik, Y. R., & Pennebaker, J. W. (2010). The psychological meaning of words: LIWC and computerized
text analysis methods. Journal of Language and Social Psychology, 29, 24–54.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I.
(2017). Attention is all you need. Advances in neural information processing systems, 5998–6008.
Voelkel, J. G., & Feinberg, M. (2018). Morally reframed arguments can affect support for political
candidates. Social Psychological and Personality Science, 9(8), 917–924.
Waldron, J. (2012). The harm in hate speech. Harvard University Press.
Wang, Q., Ruan, L., & Si, L. (2014). Adaptive knowledge transfer for multiple instance learning in image
classification. Proceedings of the AAAI Conference on Artificial Intelligence , 28.
Wang, S. Y. N., & Inbar, Y. (2020). Moral language use by U.S. political elites.
https://doi.org/10.31234/osf.io/a9h83
Warner, W., & Hirschberg, J. (2012). Detecting hate speech on the world wide web. Proceedings of the
second workshop on language in social media, 19–26.
Waseem, Z., Davidson, T., Warmsley, D., & Weber, I. (2017). Understanding abuse: A typology of abusive
language detection subtasks. Proceedings of the First Workshop on Abusive Language Online,
78–84.
Waseem, Z., & Hovy, D. (2016). Hateful symbols or hateful people? predictive features for hate speech
detection on twitter. Proceedings of the NAACL student research workshop, 88–93.
Wiegand, M., Ruppenhofer, J., & Kleinbauer, T. (2019). Detection of abusive language: The problem of
biased datasets. Proceedings of the 2019 Conference of the North American Chapter of the
Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and
Short Papers), 602–608.
Wiegreffe, S., & Pinter, Y. (2019). Attention is not not explanation. Proceedings of the 2019 Conference on
Empirical Methods in Natural Language Processing and the 9th International Joint Conference on
Natural Language Processing (EMNLP-IJCNLP), 11–20.
Wulczyn, E., Thain, N., & Dixon, L. (2017). Ex machina: Personal attacks seen at scale. Proceedings of the
26th International Conference on World Wide Web, 1391–1399.
Yarkoni, T., & Westfall, J. (2017). Choosing prediction over explanation in psychology: Lessons from
machine learning. Perspectives on Psychological Science, 12(6), 1100–1122.
Zannettou, S., Bradlyn, B., De Cristofaro, E., Kwak, H., Sirivianos, M., Stringhini, G., & Blackburn, J. (2018).
What is Gab: A bastion of free speech or an alt-right echo chamber. Companion Proceedings of
The Web Conference 2018, 1007–1014.
Zhou, Z.-H., & Xu, J.-M. (2007). On the relation between multi-instance learning and semi-supervised
learning. Proceedings of the 24th international conference on Machine learning, 1167–1174.
Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the royal
statistical society: series B (statistical methodology), 67(2), 301–320.
Appendices
A.1 Chapter 2
A.1.1 Analysis 1
A.1.1.1 LDA Model Tuning, Evaluation, and Visualization
Selecting the number of topics. The number of topics in an LDA model cannot be specified a priori. Mimno et al. (2011) note that, while more topics inherently lead to an improved modeling of the dataset (i.e., improved log-likelihood), topic models with more topics also contain small, incoherent topics. To select the number of topics that balances model fit and coherence, the factorization method of Arun et al. (2010) was applied. This method computes the Kullback-Leibler (KL) divergence between two matrices that result from LDA, the Term-Weight matrix and the Document-Term matrix. Arun et al. (2010) found that when this KL divergence begins to increase after falling to a minimum, the out-of-sample predictiveness of the topic model begins to deteriorate. The metric was computed across a range of values (10 to 100 by 10; 100 to 300 by 25; 300 to 600 by 50). Figure A.1 shows the results of this search; 300 topics were selected based on this tuning.
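For concreteness, a minimal sketch of this divergence computation is given below. It assumes a fitted gensim LdaModel (`lda`), its bag-of-words corpus (`corpus`), and per-document token counts (`doc_lengths`); these names are illustrative stand-ins, not the exact objects used in the reported pipeline.

```python
import numpy as np
from scipy.stats import entropy
from gensim.models import LdaModel

def arun_metric(lda: LdaModel, corpus, doc_lengths):
    # M1: topic-word matrix (K x V); summarize it by its singular values.
    m1 = lda.get_topics()
    cm1 = np.linalg.svd(m1, compute_uv=False)
    cm1 = cm1 / cm1.sum()

    # M2: document-topic matrix (D x K), collapsed to topic weights by
    # weighting each document's topic distribution by its length.
    m2 = np.zeros((len(corpus), lda.num_topics))
    for i, bow in enumerate(corpus):
        for topic_id, prob in lda.get_document_topics(bow, minimum_probability=0.0):
            m2[i, topic_id] = prob
    cm2 = np.asarray(doc_lengths, dtype=float) @ m2
    cm2 = np.sort(cm2)[::-1]
    cm2 = cm2 / cm2.sum()

    # Symmetric KL divergence between the two topic-weight distributions;
    # smaller values are preferred when sweeping the number of topics.
    return entropy(cm1, cm2) + entropy(cm2, cm1)
```

Sweeping this metric over the candidate topic counts listed above and taking the minimum reproduces the selection logic described here.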
Figure A.1: Results of tuning the number of topics in the LDA model. The metric on the y-axis is the Kullback-Leibler (KL) divergence between the singular value distributions of the Term-Weight matrix (M1) and the normalized Document-Term matrix (M2) (Arun et al., 2010). Here, 300 is selected as the number of topics that minimizes the desired metric.
Coherence, document entropy, and exclusivity. In order to evaluate each topic, three metrics were computed based on the recommendations of Mimno et al. (2011) via the MALLET package (McCallum, 2002). The interpretation of these metrics, which is explained in the main text and in the package documentation (http://mallet.cs.umass.edu/diagnostics.php), indicates the spread of topics across documents, the coherence based on observed word co-occurrence, and the exclusivity relative to other topics.
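As an illustration, the sketch below reads these three metrics from a MALLET diagnostics file and standard-scales them across topics; the file path and the assumption that diagnostics were exported with the `--diagnostics-file` option are illustrative rather than a record of the exact commands used.

```python
import xml.etree.ElementTree as ET
import numpy as np

def load_topic_diagnostics(path="diagnostics.xml"):
    # Each <topic> element in the MALLET diagnostics output carries the
    # per-topic metrics as XML attributes.
    root = ET.parse(path).getroot()
    names = ("document_entropy", "coherence", "exclusivity")
    raw = {name: [] for name in names}
    for topic in root.iter("topic"):
        for name in names:
            raw[name].append(float(topic.get(name)))
    # Standard-scale (z-score) each metric across all topics.
    return {name: (np.array(vals) - np.mean(vals)) / np.std(vals)
            for name, vals in raw.items()}
```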
The average values of these metrics, by foundation, are reported in Table A.1, with standard scaling having been applied across the entire set of 300 topics.
Table A.1: Average metrics per foundation, across LDA topics reported in main text.
Moral concern Document Entropy Coherence Exclusivity
Care (n = 14) 0.03 0.07 -0.26
Fairness (n = 10) -0.09 -0.06 0.06
Loyalty (n = 4) 0.51 -0.75 -0.15
Authority (n = 10) -0.69 0.51 0.27
Purity (n = 20) 0.19 -0.06 -0.24
Higher values of document entropy (i.e., Loyalty and Purity) indicate that, on average, the topics in
question are spread over many Facebook posts; low values (e.g., Authority) indicate a highly concentrated
set of topics. All foundations except Loyalty had topics with average or high coherence. The most
exclusive sets of topics were for Authority and Fairness, an indication that the relationships found for
these foundations corresponded to a more distinct type of language than for Care, Loyalty, and Purity.
The full set of metrics for these topics is shown in Table A.2. Note that several topics appear in
multiple subsections of the table (notably those in Care and Purity subsections of the table), as indicated
in the main text. Values are again on a standard scale, across all 300 topics.
Table A.2: LDA metrics for topics, as indicated by their top 10 words, reported on in the main text.
DocEnt Coh Excl
Care
would_be support close friends wanted as_well thank_you share family personal 1.16 -0.62 -1.17
love loving soul god loves ethics hospitality my_wife loved relationship_with mercy -0.09 -0.69 -0.56
miss love world so_much my_heart perfect thank_you happiness loves joy 1.01 0.92 -1.13
miss made love today so_much how_much mom loved you_guys my_mom 0.86 0.58 -1.09
wonderful love birthday so_much sweet happy thank_you happy_birthday hope amazing 1.51 1.19 -1.38
day beautiful love world happy celebrate happy_mothers moms national happy_valentines 1.25 -0.46 -0.80
life love person so_many heart soul remember meaning call loved true 0.76 0.30 -1.40
beautiful life love so_much years friend more_than friendship happy_anniversary xxx 0.86 0.40 -0.59
best_friend love cant_believe forever awesome my_favorite happy_birthday daddy years_old mommy 1.02 -0.15 -0.66
Fairness
paris nuclear war syria american iraq military united_states japan japanese -0.27 0.33 1.23
white cultural culture racial country america race racist racism sexism bigotry 0.08 1.19 -0.13
water land camp lands oil tribe nodapl pipeline dakota_access standing_rock dapl -1.39 1.12 1.71
history place part thousand remember born pakistan american citizen an_american agreed -0.73 -0.52 0.18
leave uk british london europe england britain brexit scotland eu -0.19 1.05 1.32
march human_rights protest women woman immigrants rights pussy sexually marching womens_rights -0.47 0.40 1.09
send support action personal dear email request congressman interest gov conflicts -1.35 0.57 0.95
Loyalty
defense game win fans football super_bowl nfl broncos brady patriots 0.11 0.20 1.59
congrats big awesome season tonight end amazing strike boys cheer 0.16 -0.83 -0.14
team season game fan win fans baseball of_thrones cubs world_series 0.98 0.20 1.07
tonight play germany german guitar ken bass bacteria mm drum -1.07 -1.37 1.65
ball team playing game play fun games ladies played role 1.11 -0.20 -0.27
sports team win football watching basketball teams match coach sport 0.73 0.60 1.63
Authority
day back school today tomorrow work kids week looking_forward gym 1.58 -0.20 -1.18
day today work lovely challenge happy pretty_good mood leap glorious 1.34 -1.49 -0.96
day days today post challenge every_day everyday raise_awareness nominate push_ups -1.48 2.19 -0.10
made wonderful birthday so_much awesome great thank_you making everyone_who birthday_wishes 1.59 1.06 -0.63
wanna lol wish_could funny wow must_be hahaha omg ha yup 0.98 -1.89 -0.08
day beautiful home today pretty great city walk enjoying visited 1.11 -0.94 -1.64
day beautiful love world happy celebrate happy_mothers moms national happy_valentines 1.25 -0.46 -0.80
day today started yesterday finally special missing todays lunch came_out 1.42 -0.91 -1.13
day kids daddy dad mom my_mom my_dad father dads happy_fathers 0.58 0.39 0.44
Purity
love loving soul god loves ethics hospitality my_wife loved relationship_with mercy -0.09 -0.69 -0.56
miss love world so_much my_heart perfect thank_you happiness loves joy 1.01 0.92 -1.13
miss made love today so_much how_much mom loved you_guys my_mom 0.86 0.58 -1.09
life love people cant_wait so_much so_many thank_you amazing incredible such_an 1.18 0.95 -0.90
wonderful love birthday so_much sweet happy thank_you happy_birthday hope amazing 1.51 1.19 -1.38
life love person so_many heart soul remember meaning call loved true 0.76 0.30 -1.40
beautiful life love so_much years friend more_than friendship happy_anniversary xxx 0.86 0.40 -0.59
earth peace heart god praise lord gods prayers pray prayer 0.47 0.96 0.03
best_friend love cant_believe forever awesome my_favorite happy_birthday daddy years_old mommy 1.02 -0.15 -0.66
Visualization of topics via wordclouds. In addition to the heatmap of coefficients in the main text
displaying significant positive effects for MFQ scores predicting LDA topics, here we display wordcloud
visualizations of the “linguistic signature” of each moral foundation. Figure A.2 shows 5 wordclouds, each
containing the top predictive topics per foundation as computed in the reported analyses.
[Figure A.2 wordcloud panels: (a) Care, (b) Fairness, (c) Loyalty, (d) Authority, (e) Purity.]
Figure A.2: Wordclouds visualizing the most positively predicted topics per foundation. Distinct colors
within each subplot indicate distinct topics. Word font size is proportional to the word-probability within
topic.
A.1.1.2 Post Aggregation
In Analysis 1, one set of effects was reported per representation method. Here, we report the results of two
“post aggregation” approaches per method, from which the better of the two was selected and reported in
the main text.
The learning task of predicting individual-level variables (e.g., survey responses or demographics) from collections of individuals' posts or documents is formally known as "multi-instance learning" (MIL; Dietterich et al., 1997; Maron & Lozano-Pérez, 1998). MIL applies when observations come in the form of collections of instances per predicted variable. MIL problems separate, to varying degrees, the tasks of representing individual units and aggregating those units into a single representation per target variable. Viewing the prediction task as a MIL problem, two approaches were applied for representing individuals' texts (a sketch of both follows below): first, participants' posts were concatenated into a single document, and representation methods were applied to that document; second, representation methods were applied to each post, and the resulting representations were averaged to form a single mean representation per participant. Other MIL techniques offer potentially more predictive power by learning post-level and participant-level representations separately (e.g., Ilse et al., 2018; Kotzias et al., 2015; Lee et al., 2019). In the present research, these more advanced MIL techniques were attempted, though results were largely negative (likely due to an insufficient number of observations). Determining the usefulness of such techniques is left for future work.
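A minimal sketch of the two aggregation strategies is given below, assuming a generic `embed(text)` function as a stand-in for any of the representation methods (LIWC, LDA, GloVe, BERT, and so on); the function name and shapes are illustrative only.

```python
import numpy as np

def non_mil_representation(posts, embed):
    # Concatenate all of a participant's posts, then represent the single document.
    return embed(" ".join(posts))

def mil_mean_representation(posts, embed):
    # Represent each post separately, then average into one participant-level vector.
    return np.mean([embed(post) for post in posts], axis=0)
```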
Table A.3: Two-way ANOVAs (for each foundation) of text representation and aggregation with interaction.
df Care Fairness Loyalty Authority Purity
Representation F(8, 882) 178.1*** 113.1*** 261.9*** 710.9*** 648.3***
Aggregation F(1, 882) 2.5 7.0** 66.1*** 100.1*** 101.6***
Representation:Aggregation F(8, 882) 7.5*** 14.0*** 6.1*** 8.5*** 13.3***
Note: *** p < 0.001; ** p < 0.01
Table A.3 reports the results of five two-way ANOVAs, one each per foundation, estimating the main
effects plus interaction of text representation method and aggregation technique. A large effect of text
representation is seen for each foundation and, with the exception of Care, a modest effect of aggregation.
Significant interactions of representation and aggregation are also observed for each foundation.
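A Python analogue of one such per-foundation ANOVA is sketched below using statsmodels; the data file and column names ('r2', 'representation', 'aggregation') are assumptions, and the original analyses were run in R, so this is illustrative only.

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Hypothetical file: one row per cross-validated R^2 value for a single foundation.
df = pd.read_csv("cv_r2_by_condition.csv")

# Main effects of representation and aggregation plus their interaction.
model = smf.ols("r2 ~ C(representation) * C(aggregation)", data=df).fit()
print(anova_lm(model))
```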
Given the significant interactions observed in Table A.3, Figure A.3 shows estimated marginal means, computed via the emmeans package (Version 1.4.5; Lenth et al., 2018; https://cran.r-project.org/web/packages/emmeans/index.html) in R (v. 3.6). Each Representation-Aggregation combination was compared pairwise using Tukey's adjustment for post-hoc multiple comparisons; non-overlapping line segments in Figure A.3 indicate significant pairwise differences at p < 0.05. Across Loyalty, Authority, and Purity, non-MIL representations were more effective than averaging post-level representations (MIL-mean) for the majority of methods. For BERT, MIL-mean bordered on a significant improvement over non-MIL for Purity (p = 0.06) and was more effective for predicting Care (p < 0.001). The most effective representation was used in the results of Analysis 1, i.e., non-MIL for all non-BERT representations and MIL-mean for BERT representations.
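A rough Python counterpart to these emmeans/Tukey comparisons (pairwise Tukey HSD over the representation-aggregation cells for one foundation) might look like the following; the file and column names are assumptions.

```python
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical file with columns 'r2', 'representation', 'aggregation' (one foundation).
df = pd.read_csv("cv_r2_by_condition.csv")
df["cell"] = df["representation"] + ":" + df["aggregation"]

# Tukey-adjusted pairwise comparisons across all representation-aggregation cells.
print(pairwise_tukeyhsd(endog=df["r2"], groups=df["cell"], alpha=0.001).summary())
```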
Figure A.3: Estimated marginal means of explained variance, with respect to post aggregation approach.
Note: Confidence intervals correspond to a critical value of 0.001 in Tukey-adjusted post hoc comparison within
representation method.
Statistically, representations computed from all of a participant's posts together (non-MIL) had less noise and more signal than the alternative (MIL-mean). The success of computing post-level representations with contextualized word vectors (i.e., BERT) suggests that, as models develop in their ability to model language, they become more localized and attuned to fine-grained signals in the input.
A.1.1.3 Post hoc Pairwise Comparison Visualization
The two-way ANOVAs in Analysis 1, of the effect of method and foundation on R² values, are here analyzed in greater detail. Recall that each ANOVA had main effects of foundation and method as well as their interaction. Figure A.4 shows pairwise comparisons of the effects of representation within each foundation, Tukey-adjusted at a critical value of 0.001.
A.1.1.4 Sensitivity Analysis of Thresholding Post Frequency
Participants were filtered out if they had fewer than 10 posts. This was done in order to reduce noise in the data, as individuals with few posts are inherently hard to characterize based on their language due to insufficient information. However, this may have introduced unwanted confounds, since the threshold was arbitrarily chosen. Mirrored results from our first analysis are given here for two different thresholds: (a) discarding no participants (i.e., a threshold of 1), and (b) thresholding at 25.
First, analyses were repeated with all participants who had at least one post (N = 3,643). Representations were computed identically, and ElasticNet models were grid searched over the same parameters. In both cases, averaging representations was again more effective for BERT representations (as it was for the analyses reported in the main text). The results of this unfiltered analysis are shown in Table A.4.
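The grid search itself followed the same recipe in every condition; a sketch with scikit-learn is below, where the hyperparameter grid and file names are assumptions rather than the exact values used in the main analyses.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, RepeatedKFold

# X: one row per participant (text representation); y: one MFQ foundation score.
X = np.load("participant_representations.npy")   # hypothetical file
y = np.load("mfq_care_scores.npy")               # hypothetical file

grid = {"alpha": [0.01, 0.1, 1.0, 10.0], "l1_ratio": [0.1, 0.5, 0.9]}  # illustrative grid
search = GridSearchCV(
    ElasticNet(max_iter=10000),
    grid,
    scoring="r2",
    cv=RepeatedKFold(n_splits=5, n_repeats=10, random_state=0),
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```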
Additionally, a more restrictive threshold could have been considered than the one selected in the main text (10 posts). Specifically, G. Park et al. (2015) used 25 posts as a cut-off, thus results were additionally reported for participants with at least 25 posts (N = 1,886). Results of this more restrictive analysis are reported in Table A.5.
Figure A.4: Estimated marginal means of R²s based on representation technique and foundation.
Note: Confidence intervals correspond to a critical value of 0.001 in Tukey-adjusted post hoc comparison within foundation.
Table A.4: Mean R² (SE) without excluding participants due to number of posts (N = 3,643).
Representation Care Fairness Loyalty Authority Purity
Measures of Moral Language
MFD -0.4 (0.2) -0.1 (0.1) -0.7 (0.2) -0.5 (0.3) -0.1 (0.2)
MFD2 0.3 (0.1) -0.3 (0.1) 1.1 (0.2) 2.8 (0.2) 4.2 (0.4)
MFD_DDR 2.3 (0.2) 0.5 (0.1) 3.1 (0.2) 4.0 (0.2) 3.9 (0.2)
MFD2_DDR 1.2 (0.1) -0.0 (0.1) 3.3 (0.2) 4.4 (0.2) 5.0 (0.2)
General Measures of Language
LIWC 1.7 (0.1) -0.3 (0.1) 2.3 (0.2) 4.1 (0.3) 6.7 (0.3)
LIWC_DDR 2.4 (0.2) 0.8 (0.1) 5.1 (0.2) 7.6 (0.3) 8.0 (0.3)
LDA 3.4 (0.2) 0.8 (0.1) 4.9 (0.3) 10.1 (0.3) 12.9 (0.2)
GloVe 2.9 (0.1) 1.0 (0.1) 5.2 (0.2) 8.6 (0.3) 8.9 (0.3)
BERT 4.2 (0.2) 1.5 (0.1) 7.9 (0.3) 11.4 (0.3) 12.8 (0.3)
Note. Mean and standard errors across 10 iterations of 5-fold cross validation (n = 50). Highest R²s appear in bold.
With all participants considered, including those with only a single post, the maximum R²s are lower in value, though they follow the same pattern of predictability. Notably, LDA was effective with these data, suggesting that LDA is able to pick up on signal even when there is less of it (and potentially more noise). Increasing the threshold to 25, thereby increasing the language data for each person but lowering the overall sample size (and potentially introducing additional confounds if posting frequency is not randomly distributed), results in higher R²s but roughly the same pattern of performance as seen with the threshold of 10 reported in the main text. Overall, changing the document threshold produced similar results, though with varying magnitude and some small differences in the efficacy of each method.
Table A.5: Mean R² (SE) excluding participants with fewer than 25 posts (N = 1,886).
Representation Care Fairness Loyalty Authority Purity
Measures of Moral Language
MFD 0.6 (0.1) -0.3 (0.1) 1.9 (0.2) 1.5 (0.3) 3.2 (0.3)
MFD2 2.8 (0.2) 0.6 (0.1) 3.1 (0.2) 5.1 (0.3) 12.7 (0.3)
MFD_DDR 4.8 (0.3) 1.1 (0.2) 4.3 (0.3) 7.1 (0.4) 8.6 (0.4)
MFD2_DDR 2.6 (0.2) -0.5 (0.1) 5.8 (0.3) 9.1 (0.4) 12.4 (0.3)
General Measures of Language
LIWC 4.3 (0.3) 1.5 (0.2) 8.5 (0.4) 16.0 (0.6) 18.2 (0.4)
LIWC_DDR 6.8 (0.4) 2.5 (0.4) 9.4 (0.5) 15.1 (0.5) 18.1 (0.5)
LDA 5.5 (0.3) 3.7 (0.3) 10.3 (0.4) 17.6 (0.5) 18.7 (0.4)
GloVe 8.4 (0.4) 4.9 (0.3) 12.1 (0.4) 20.4 (0.5) 23.0 (0.4)
BERT 8.1 (0.4) 3.8 (0.3) 12.1 (0.4) 21.7 (0.5) 21.9 (0.5)
Note. Mean and standard errors across 10 iterations of 5-fold cross validation (n = 50). Highest R²s appear in bold.
A.1.2 Analysis 2
A.1.2.1 Full LIWC Results
In Analysis 2, LIWC categories were dropped if there were no significant MFQ predictors in models controlling for participant age and sex. Here, we provide the model coefficients for those dropped LIWC categories. Figure A.5 shows the full set of coefficients for negative binomial models with offsets for participants' total word count. Recall that no LIWC categories were included with corrected p > 0.01.
(a) Models with only MFQ predictors. (b) Controlling for participant age and sex.
Figure A.5: Coefficients for negative binomial models of LIWC categories, which did not meet the threshold
for significance.
Note. * p < 0.05, . p < 0.1 (Bonferroni-corrected across predictors).
B.2 Chapter 3
B.2.1 Full List of Curated Group Identifiers
muslim jew jews white islam blacks muslims women whites gay black democrat islamic allah jewish lesbian transgender race brown woman mexican religion homosexual homosexuality africans
Table B.1: 25 group identifiers selected from top weighted words in the TF-IDF BOW linear classifier on the GHC.
jew jews mexican blacks jewish brown black muslim homosexual islam
Table B.2: 10 group identifiers selected for the Stormfront dataset.
B.2.2 Visualizations of Effect of Regularization
[Figure B.1 shows hierarchical explanations for the NYT test sentence "… truth behind them, ' said one muslim shop owner" under (a) BERT and (b) BERT + SOC regularization.]
Figure B.1: Hierarchical explanations on a test instance from the NYT dataset where false positive predictions are corrected.
[Figure B.2 shows hierarchical explanations for the Gab test sentence "The jews are just evil money lenders" under (a) BERT and (b) BERT + SOC regularization.]
Figure B.2: Hierarchical explanations on a test instance from the Gab dataset where both models make correct positive predictions. However, the explanations reveal that only the regularized model is making correct predictions for correct reasons.
B.2.3 Implementation Details
Training Details. We fine-tune the BERT-base model using the public code (https://github.com/huggingface/transformers), with the batch size set to 32 and the learning rate of the Adam (Kingma & Ba, 2014) optimizer set to 2 × 10⁻⁵. Validation is performed every 200 iterations, and the learning rate is halved when the validation F1 decreases. Training stops once the learning rate has been halved 5 times. To handle class imbalance, we reweight the training loss so that positive examples are weighted 10 times as heavily as negative examples on the Gab dataset and 8 times on the Stormfront dataset.
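A minimal sketch of the reweighted training step is shown below using the HuggingFace transformers library; the checkpoint name is an assumption, the positive-class weight of 10 corresponds to the Gab setting, and the surrounding details (validation every 200 steps, learning-rate halving, early stopping) are omitted.

```python
import torch
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)
# Weight positive (hateful) examples 10x relative to negatives, as on Gab (8x for Stormfront).
loss_fn = torch.nn.CrossEntropyLoss(weight=torch.tensor([1.0, 10.0]))

def training_step(batch):
    logits = model(input_ids=batch["input_ids"],
                   attention_mask=batch["attention_mask"]).logits
    loss = loss_fn(logits, batch["labels"])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```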
B.2.3.1 Explanation Algorithm Details.
For the SOC algorithm, we set both the number of samples and the size of the context window to 20 for explanation analysis, and set the same two parameters to 5 for explanation regularization.
B.2.4 Cross-Domain Performance
In addition to evaluating each model within-domain (i.e., training on GHC_train and evaluating on GHC_test), we evaluated each model across domains. The results of these experiments, conducted in the same way as before, are presented in Table B.3.
Method/Dataset Gab→Stf. F1 Stf.→Gab F1
BoW 32.39 46.71
BERT 42.84 ± 1.2 53.80 ± 5.5
BoW + WR 27.45 44.81
BERT + WR 39.10 ± 1.3 55.31 ± 4.0
BERT + OC (α = 0.1) 40.60 ± 1.6 56.90 ± 1.8
BERT + SOC (α = 0.1) 41.88 ± 1.0 55.75 ± 2.1
BERT + SOC (α = 1.0) 39.20 ± 2.7 56.82 ± 3.9
Table B.3: Cross-domain F1 on the Gab and Stormfront (Stf.) datasets. We report mean and standard deviation of the performance over 10 runs for BERT, BERT + WR (word removal), BERT + OC, and BERT + SOC.
B.2.5 Computational Efficiency
We further show that our approach is time- and memory-efficient. Table B.4 shows per-epoch training time and GPU memory use of BERT, BERT+OC, and BERT+SOC on the Gab corpus. We use one GeForce RTX 2080 Ti GPU to train each model. The training times of BERT+SOC and BERT+OC are only about 4 times and 2 times that of the original BERT, respectively. This is in contrast to the explanation regularization approach of F. Liu and Avci (2019), which is reported to require 30x training time to reach the reported results on shallow CNN models. The inefficiency is introduced by gradients over gradients, as also pointed out by Rieger et al. (2019). In addition, our approach introduces only a small increase in GPU memory use.
Methods Training time GPU memory use
BERT 5 m 1 s 9095 M
BERT+OC 12 m 36 s 9411 M
BERT+SOC 19 m 38 s 9725 M
Table B.4: Per epoch training time of different methods on the Gab corpus. All methods finish training at
around the third epoch.
C.3 Chapter 4
C.3.1 Attention-Cluster Exemplars
C.3.1.1 Sampling from Clusters
The main text included exemplars from clusters, or instances that were closest to the attention-based
cluster centroids. Here, we qualitatively confirm that these clusters are semantically tight. Table C.1
contains 5 randomly sampled instances from one of the clusters reported in the main text (centroid exemplar: "the terrible reality...improved by 21 points"). The sampled instances show that the original cluster centroid
sufficiently represents the content of the cluster, which contains Democrats’ concern over education and
potential disparities based on race.
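A condensed sketch of the exemplar-extraction step is given below: high-attention instances are embedded, clustered with k-means, and the instance nearest each centroid is taken as that cluster's exemplar. The embedding file, sentence list, and number of clusters are placeholders rather than the exact artifacts used here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin

X = np.load("high_attention_embeddings.npy")                     # hypothetical: one row per instance
sentences = open("high_attention_sentences.txt").read().splitlines()

kmeans = KMeans(n_clusters=10, random_state=0).fit(X)
# Index of the instance closest to each cluster centroid (the cluster's exemplar).
exemplar_idx = pairwise_distances_argmin(kmeans.cluster_centers_, X)
exemplars = [sentences[i] for i in exemplar_idx]

# Random within-cluster samples (as in Table C.1) can be drawn from a cluster's members.
rng = np.random.default_rng(0)
members = np.where(kmeans.labels_ == 0)[0]
sampled = [sentences[i] for i in rng.choice(members, size=5, replace=False)]
```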
we need a 21st century education system that makes investment in all our nations children.
that and only that will help our nation compete in the global economy.
we used to subsidize the public education system in this country through that discrimination
in our labor. because of their work. and to be able to help perfect their own craft as teachers
by learning from their peers and also serving as master teachers. including receiving an
equitable share of federal resources.
subject to discussion among parents. and communities across the country about overtesting. i
know there is a lot we can do to streamline tests.
since 2003. the reading gap between black and white fourth graders has closed by 16
percentage points. when i met with students during those visits and asked them about their
vision for their own future.
consent of parents to placing youth in reform school." at least 830.000 indian children were
taken to boarding schools to allegedly "civilize them."
Table C.1: Five randomly sampled instances from the cluster containing the exemplar: “the terrible reality
for most poor children in america in 2015 is that these simple goals are as out of reach as flying to the moon.
butbecauseofthedatathatwehaveasaresultofnochildleftbehind. africanamericanstudentsimprovedby
15 points and latino students improved by 21 points.”
C.3.1.2 Additional cluster centroids
In order to provide a more comprehensive report on the exemplars extracted through clustering of high-
attention instances, here we report ten additional cluster centroids for both Republicans (Table C.2) and
Democrats (Table C.3). The contents of these exemplars include, for the Republicans: voicing affordability
concerns about President Obama and the Affordable Care Act; accusing the Environmental Protection
Agency (EPA) of illegality and acting out of its authority; discussing the problem of rising debt; honoring
patriots; and protecting the interests of American families (among others). For Democrats, these exemplars
include: concerns about gun violence and calls for action; concerns about the poverty of school children;
partisan conflicts concerning pollution and climate denial; closing the gender wage gap; investments in
technology and infrastructure; and the impacts of global warming (among others).
though a final deal has not been yet announced. what i have heard from leaders in this
administration is even more disturbing. based off the details of the jcpoa announced in april.
so of course people around the country are very concerned when they see once again that the
insurance they are mandated to buy by president obama and the democrats. i was under the
understanding that this was to be affordable with all the letters of "affordable" in capital
lettershealthcare. there is a headline in the connecticut mirror: "insurers seek rate hikes for 2016
obamacare plans." i believe it has harmed the health care system. this was the bailout of the
insurance companies that president obama and the democrats built into the presidents health care
law to get them to go along.
the epa persists in these illegal activities. the epas plan would grant it jurisdiction over fully 95
percent of my home state of california. the government used the clean water act to attack a family
farm for shifting to more efficient irrigation systems. allowing an unaccountable federal agency to
insert itself into land use decisions across the state.
congress has that power to go out and say that this does or doesnt make sense. what article i says
there is that "no money shall be drawn from the treasury. but for congress to decide what do we
fund. this is about common sense in reasserting authority with regard to article i. section 9. clause 7.
the presidents timid response about how to take on isis and how to define our enemy has
emboldened our enemy. it is so disappointing that president obama believes a climate change
summit is somehow a rebuke of international terrorism. president obama is choosing to pursue his
climate change agenda instead of addressing how we are going to destroy isis. our president seems
to believe that global warming is a greater threat than international terrorism.
since 2009. the debt has grown from $10 trillion to $18 trillion. it is a simple concept: force the
federal government to live within its means. the president presented his budget yesterday.
president obama once said: you know what. i intend to protect the constitution. we cannot allow
the rule of law to be so trod upon that we live in an arbitrary governmental world where they
collect anything they want anytime they want.
are costing our economy $1.8 trillion each year. the bureaucrats in washington who are writing
these excessive regulations are seemingly focused on saving the world but are forgetting what is
happening to american families. the costs of these regulations are born by people who can least
afford it. agencies should stop and consider what they are doing.
federal bureaucrats are unilaterally deciding. do we really need the federal government telling us
how to landscape our own backyards? this is not the way to stop the difficult headwinds our
economy faces.
today i rise to honor a fellow ohioan who has done so much for our country. and good luck in the
future.
Table C.2: Ten additional cluster centroid exemplars, generated from a clustering of high-attention in-
stances in the test set, for Republican ground-truth bags.
the toll that gun violence takes on our communities is too great. but senate republicans blocked
these commonsense reforms on the senate floor. yet in response to mass shootings. we addressed
gun violence headon. it is past time for congress to act.
we have to start believing there are kidsthey are not someone elses kids. over half our public school
children are poor enough that they qualify for free and reduced lunch. because as the senator from
new jersey said.
republicans regularly stood up to address climate change. the fortress of climate denial constructed
by the big polluters. and the polluterfunded attack machine has turned on you. and that addressing
carbon pollution with a price on carbon would be a "political loser."
the fight for equal pay for equal work has spanned generations and continues to impact nearly
every corner of our country. this means a woman is compensated hundreds of thousands of dollars
to millions of dollars less than a man with no other explanation for the disparity than their gender.
republicans have weakened our highest court in the land. the chairman of the judiciary committee
recently suggested we put down on paper how the senate treats supreme court nominees. the
reason we have one supreme court is so it can issue final decisions on the merits after the lower
courts have been unable to do so in a consistent fashion. the chairman and all republicans should go
back to that letter to use as roadmap for considering chief judge garlands nomination now.
todays senate republicans will not act responsibly. as well as the 25 judicial nominations that have
been passed out by voice vote from the judiciary committee. after blocking immigration reform.
republicans are holding chief judge garlands nomination hostage in the hope that the republican
party will nominate donald trump and they can then have donald trump make a different
nomination. as well as by every republican on the judiciary committee.
the terrible reality for most poor children in america in 2015 is that these simple goals are as out of
reach as flying to the moon. but because of the data that we have as a result of no child left behind.
africanamerican students improved by 15 points and latino students improved by 21 points.
workers deserve to be paid fair wages. i ask my colleagues to support middle class families by
voting against this amendment. the amendment before us would do just that. but its a living wage.
$86 billion backlog to bring our transit system just up to a state of good repairnot to build out more
options to get people out of congestion and traffic. we cant afford to invest. it involves high tech. 40
percent of the service of the national highway system is degraded to the point where you have to
dig it up and put in a new roadbed.
the seas are rising across the globe. a different study prepared for the u.s. we measure the exploding
acidity of the seas. people who insist that the climate has not warmed in recent decades ignore this
one little thingthe oceans.
Table C.3: Ten additional cluster centroid exemplars, generated from a clustering of high-attention in-
stances in the test set, for Democrat ground-truth bags.
Abstract
Language has often been considered to be a window into the mind, yet peering through that window remains an ongoing challenge. Quantifying language for the purpose of psychological insight has generally been done by using either lexicon-based word counting or latent topic modeling. At the same time, developments within Natural Language Processing (NLP) and Machine Learning (ML) have shown these methods to be limited in terms of predictive validity, particularly when compared to methods for representing text based on neural network architectures. In addition, recent advances in "explainability" have emphasized not only the predictive advantages of modern approaches, but also the ability to explain and understand complex models of language.
This thesis investigates how emerging methods for representing text and explaining ML models can be used for understanding the relationship between speaker attributes and language usage. First, new evidence concerning the relationship between moral concerns and social media language is presented, showing that, using text embedding methods, certain moral concerns are more predictable from language than others. Next, the utility of novel explainable approaches in NLP is demonstrated on a related task, the reduction of bias in hate speech classifiers. And lastly, a new approach for explaining the relationship between speaker attributes and language is presented which relies on "exemplars," key instances that are particularly relevant to a given speaker attribute.
At a high-level, this thesis helps to formalize the prediction and explanation of speaker-language relationships, and motivates further work which will extend the paradigm of exemplar-based explanation. Predicting speaker attributes from language usage should not only be performed using established techniques such as word counting or topic modeling, but also should be approached as a formal NLP task, with the goal of building superior predictive models and explaining those models. Using exemplars to understand psychological and demographic attributes of speakers is a new direction for both explainable NLP and psychological science, and represents a new way for both fields to develop, apply, and evaluate models of speakers' language. Altogether, this thesis can serve as a foundation for continued interdisciplinary research into predicting and explaining speaker-language relationships.