Building and Validating Computational Models of Emotional
Expressivity in a Natural Social Task
by
Su Lei
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)
August 2023
Copyright 2023 Su Lei
I dedicate this dissertation to susu’s bike,
for all the ups and downs.
Acknowledgements
To Jon, Thank you for taking me on from day one when I walked into your affective computing
classroom. There are so many things to say but I am forever grateful for your patience and trust in
me.
I would like to thank my committee members Laurent and Shri for providing me guidance; my
external collaborators Stacy and Ursula for hosting me as a visiting researcher; my colleagues and
mentors in ICT, Mohammad, Gale, Giota, Kalin, Alesia and Jill for helping me and working with
me over the years.
I would like to thank all of my friends who keep me company around the world... my lab sister
Jessie, my roadhouse fellows Mathieu, Eli and Cathy. Aike! Evan and Ivi! My Chinese gang, to
my robotsis, Zola, Franky and Emily. Cecilia, Lili, Jasmine, Stella, and Xing.
To all the friends and strangers who shared miles with me in the dirt and tarmac around the
world, thank you for your existence, and I hope to see you on the road or in the woods again soon.
To my mom and dad, my uncles and aunts, my cousins, my grandmas, and my grandpas who
are not with me anymore, thank you for your endless support. I am so lucky to have been born into
this family full of love, generosity and kindness.
to erik, Thank you for boarding this adventure with me and showing up for me. I love you.
to susu, I want to thank you for being brave. I hope you stay alive and never stop wandering. I
hope you allow yourself to just be. When in doubt, follow your curiosity. When in darkness, love.
Good luck with your next chapter!
Table of Contents
Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Chapter 1: Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Technical Innovation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Theoretical Innovation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2.1 Appraisal Perspective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.2 Interpersonal-Dynamics Perspective . . . . . . . . . . . . . . . . . . . . . 6
1.3 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Chapter 2: Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1 Why Does Emotion Matter? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1.1 Basic Emotion Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1.1.1 Context-Ignorant Basic Emotion Recognition . . . . . . . . . . 11
2.1.2 Constructivist View of Emotion . . . . . . . . . . . . . . . . . . . . . . . 14
2.1.3 Appraisal Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2 Emotional Expressivity as a Construct . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2.1 Measuring and Predicting Expressivity . . . . . . . . . . . . . . . . . . . 17
2.3 Modeling Dyadic Synchrony . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.3.1 Dynamic Time Warping . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Chapter 3: Iterated Prisoner’s Dilemma Corpus . . . . . . . . . . . . . . . . . . . . . 21
3.1 USC-ICT IPD corpus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.2 USC-ICT IPD-Still corpus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Chapter 4: Emotional Expressivity . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.1 Perceived Expressivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.1.1 Annotation Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.1.2 Annotation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.1.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.2 Predicting Perceived Expressivity . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.2.1 Label . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.2.2 Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.2.3 Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.2.4 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.2.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.2.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Chapter 5: Predicting Appraisals from Expressions . . . . . . . . . . . . . . . . . . . 37
5.1 Defining Events in the Prisoner’s Dilemma . . . . . . . . . . . . . . . . . . . . . . 37
5.2 Defining Appraisal in the Prisoner’s Dilemma . . . . . . . . . . . . . . . . . . . . 40
5.3 Relating Expressions to Event Appraisals . . . . . . . . . . . . . . . . . . . . . . 43
5.3.1 Analyzing Theoretical Factors . . . . . . . . . . . . . . . . . . . . . . . . 43
5.3.2 Fine-grained Event Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.3.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5.4 General Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Chapter 6: Dyadic Dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
6.1 Dyadic Synchrony . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
6.1.1 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6.1.1.1 Pre-processing Signals . . . . . . . . . . . . . . . . . . . . . . . 55
6.1.1.2 Dynamic Time Warping . . . . . . . . . . . . . . . . . . . . . . 56
6.1.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
6.1.2.1 Synchrony of Expressivity . . . . . . . . . . . . . . . . . . . . . 58
6.1.2.2 Synchrony of Facial Factors . . . . . . . . . . . . . . . . . . . . 62
6.1.3 Discussion and Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . 64
Chapter 7: Conclusions and Future Work . . . . . . . . . . . . . . . . . . . . . . . . 68
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
List of Tables
2.1 Meta-analysis of DTW constraints from past work . . . . . . . . . . . . . . . . . . . 20
4.1 Model Performance (R²), bold text highlights best performing feature set within its modality group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
List of Figures
2.1 Summary statistics of products in the Affective Computing Product Database. . . . 12
3.1 Left Game interface. Right Game payoff matrix. Each player gets 5 tickets if both
choose to cooperate (split), 1 ticket if both choose to defect (steal). When the
player chooses to cooperate (split) and their opponent chooses to defect (steal), the
player gets nothing while their opponent gets 10 tickets. . . . . . . . . . . . . . . . 21
3.2 Player reaction after learning a joint outcome in the game. . . . . . . . . . . . . . 22
3.3 (a) An example of how expressivity changes during a 10-round game for a single typical participant. Dashed lines show when each joint decision is revealed. (b) Averaging across all participants, expressivity peaks about 3 seconds after a decision is revealed. Each line represents a different facial factor (see Section 4.2.2). Dashed lines mark when the last player picked their choice, the reveal of the joint decision, and the start of the next round, respectively. The gray region is
the 7-second window of our analysis. . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.1 Pairwise correlations (Pearson’s r with Holm-Bonferroni correction) among rating
tasks. We choose Pearson’s r after visual inspection of each rating item’s his-
togram. All items fairly follow a normal distribution except head, posture and
hand. We also ran a non-parametric Kendall rank correlation test; the results were
similar. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.2 Representative words (top 10 words with the highest TFIDF scores) used by anno-
tators to describe what was expressed in the video by outcome. Annotators used
’happy’ and ’amused’ to describe expressions regardless of game outcome. . . . . 29
5.1 Event representation in IPD. Circles represent the possible outcomes of a single
round. Arcs represent possible outcome transitions. Each arrow is labeled with the
number of such transitions that occur in the corpus. Cooler colors represent more common transitions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
5.2 Player smile intensity across the game events: Each arrow represents one of the 16
events analyzed. Circles represent four possible game outcomes on a given round; cooler colors represent lower emotional expressivity. . . . . . . . . . . . . . . . . 44
5.3 Expressivity score and unexpectedness of mutual cooperation as a function of the
number of preceding mutual cooperations when players stay in mutual cooperation. 47
5.4 Expressivity score and unexpectedness of mutual cooperation as a function of the number of preceding mutual cooperations in CC→DC . . . . . . . . . . . . . . . . 48
6.1 Illustration of two pathways to synchrony: (a) dyads synchronize their faces due to contagion or mimicry, i.e., they react to each other’s facial expressions; (b) dyads synchronize their facial displays because they experience a shared stimulus, the revelation of the joint outcome in the IPD task. . . . . . . . . . . . . . . . . . . . 53
6.2 Frame samples of dyads’ reactions after learning a joint outcome. (a) shows the video condition, where the dyad can see each other in real time through a webcam. (b) shows the still condition, where the dyad can only see a still image of their partner. We see synchrony in both conditions, as supported by the plots of expressivity scores. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
6.3 Normalized DTW Alignment cost (reversed synchrony score) by Dyads Type. The
real dyads display significantly higher synchrony than the randomly paired dyads.
Real dyads in the video condition displayed significantly higher synchrony than in
the still condition. We do not find this pattern in randomly paired dyads, which in
return validates our measure of synchrony. . . . . . . . . . . . . . . . . . . . . . . 59
6.4 Normalized DTW Alignment cost (reversed synchrony score) by Dyads Type, con-
dition and outcome. Top row shows real dyads display higher synchrony of ex-
pressivity in the video condition than the still condition when reacting to CD/DC
outcome but not when reacting to CC or DD. Bottom row shows that the finding in real dyads is evidence of real synchrony in interaction since it cannot be replicated in randomly paired dyads. . . . . . . . . . . . . . . . . . . . . . . . . . 61
6.5 Normalized DTW Alignment cost (reversed synchrony score) by Dyads Type. Real
dyads display significantly higher synchrony of facial factors than randomly paired
dyads. Real dyads in video condition displayed significantly higher synchrony of
facial factors than in the still condition. . . . . . . . . . . . . . . . . . . . . . . . . 63
Abstract
In this dissertation, I innovate automatic facial analysis methods and use them to yield fundamental
insights into the source and function of facial expressions in face-to-face social interaction. Facial
expressions play an essential role in shaping human social behavior. The ability to accurately rec-
ognize, interpret and respond to emotional expressions is a hallmark of human social intelligence,
and automating this ability is a key focus of computer science research. Machines that possess this
skill could enhance the capabilities of human-machine interfaces, help diagnose social disorders,
improve predictive models of human behavior, or serve as methodological tools in social science
research. My dissertation focuses on this last application. Specifically, I examine two competing
perspectives on the social meaning of facial expressions and show that automated methods can
yield novel insights.
In terms of technical innovation, I develop novel methods to interpret the meaning of facial
expressions in terms of facial expressivity. Within computer science, facial expression analysis
has been heavily influenced by the “basic emotion theory” which claims that expressions reflect
the activation of a small number of discrete emotions (e.g., joy, hope, or fear). Thus, automatic
emotion recognition methods seek to classify facial displays into these discrete categories to form
insights into how an individual is interpreting a situation and what they will do next. However,
more recent psychological findings have largely discredited this theory, highlighting that people
show a wide range of idiosyncratic expressions in response to the same event. Motivated by this
more recent research, I develop supervised machine learning models to automatically measure
perceived expressivity from video data.
In terms of theoretical innovation, I demonstrate how automatic expressivity recognition yields
insight into alternative psychological theories on the nature of emotional expressions in social tasks
by analyzing a large corpus of people engaged in the iterated prisoner’s dilemma task. This is a
canonical task used to test theories of social cognition and the function of facial expressions.
First, I explore the appraisal perspective which claims that expressions reflect an individual’s
appraisal of how actions within a social task relate to their goals. I find that by analyzing facial ex-
pressions produced by participants, a computer can reliably predict how actions in the task impact
participants’ appraisals (specifically, we predict if the action was unexpected). Further, we show
that automatic expressivity recognition dramatically improves the accuracy of these predictions
over traditional emotion recognition. This lends support to the theory that expressions are, in a
sense, directly caused by the social task.
Second, I explore a contrasting perspective, interpersonal-dynamics theory, which argues that
expressions are, in a sense, directly caused by the partner’s expressions. This perspective empha-
sizes processes such as synchrony, mimicry, and contagion to explain moment-to-moment expres-
sions. The appraisal perspective counters that any observed synchrony simply reflects a shared
appraisal of social actions. I use automatic expressivity recognition to contrast these perspectives.
Specifically, I analyze synchrony in two experimental conditions: a “still” condition where dyads
see only a still image of their partner, and a “video” condition with real-time visual access to their
partner’s facial reactions. Using Dynamic Time Warping, I evaluate synchrony in both real and
randomly paired dyads. Results reveal that synchrony exists even without visual cues, suggesting
that shared appraisals contribute to synchrony, but that synchrony significantly increases when the
partner is visible. This suggests that both perspectives must be integrated to best explain facial
displays.
In conclusion, both appraisal and interpersonal-dynamics perspectives reinforce the signifi-
cance of emotional expressivity in interpreting facial displays and fostering social coordination in
cooperative and competitive contexts. These insights offer valuable contributions to affective com-
puting and the understanding of social interaction mechanisms. I also discuss potential limitations
and future research directions for further exploring the complexities of social interactions.
Chapter 1
Introduction
In this dissertation, I present automatic facial analysis methods and apply them to reveal crucial
insights into the origin and role of facial expressions during face-to-face social interactions. Facial
expressions are integral in shaping human social behavior as they facilitate the communication of
emotions, intentions, and information between individuals. The capability to precisely recognize,
interpret, and react to emotional expressions is a defining feature of human social intelligence.
In recent years, automating this ability has emerged as a primary objective in computer science
research, driven by its potential to revolutionize various fields and applications.
For example, machines equipped with the ability to accurately analyze facial expressions can
greatly enhance the capabilities of human-machine interfaces, enabling more intuitive and effi-
cient interactions. By recognizing and interpreting user emotions, these interfaces can adapt their
responses and functionalities to better suit the users’ needs and preferences. In the medical and
mental health realms, automated facial analysis methods can assist in diagnosing social disorders,
such as autism spectrum disorder or social anxiety disorder, by accurately identifying atypical fa-
cial expressions or reactions. Furthermore, automated facial analysis methods can serve as valuable
methodological tools in social science research. By providing an objective, efficient, and scalable
approach to analyze facial expressions, these methods can overcome the limitations of traditional,
manual coding techniques, enabling researchers to explore new questions and gain novel insights
into human social behavior. My primary interest is in the last application. Specifically, I develop tools to analyze facial expressions automatically, examine two competing perspectives on the
social meaning of facial expressions and show that automated methods can yield novel insights.
1.1 Technical Innovation
The prevailing perspective in the study of emotional expressions was historically shaped by the
work of psychologist Paul Ekman, who, through his cross-cultural research, posited the existence
of six “basic” emotions—happiness, sadness, anger, fear, surprise, and disgust—each associated
with a specific and universally recognized facial expression. This perspective, known as Ekman’s
Basic Emotion Theory [1], provided a framework for understanding emotional expressions based
on the premise that certain expressions are innate, biologically determined, and universally under-
stood across cultures.
Ekman’s theory has had a profound impact on the field of affective computing, with many
computational models adopting the assumption that there is a direct one-to-one correspondence
between basic facial expressions and specific emotional states. In addition, inspired by Ekman’s
work and early appraisal theories [2], much of the work in affective computing follows a “standard
model” which argues that: (1) events of personal significance to an individual are appraised and
trigger an emotional response and (2) this response is reflected in external emotional signals, espe-
cially facial expressions, as a window into the affective state [1] and (3) these expressions influence
the behavior of perceivers (e.g., through contagion or inferences about the senders’ affective state)
[3] [4]. In line with these views, many studies have collected data of social interactions, examined
facial expressions, and made predictions about significant events. Vinkemeier et al. [5] tried to
predict poker folds from face reactions to events in a poker game. Hoegen et al. [6] tried to predict
cooperative or noncooperative responses based on facial reactions to events in a social dilemma.
Mussel et al. [7] found that offers in an ultimatum game were more often accepted if the proposer
smiled and less often accepted if the proposer showed angry facial expression.
Despite its influence, Basic Emotion Theory has faced critical challenges as researchers have
increasingly recognized the dynamic and context-dependent nature of emotional expressions [8]
and the predominant view in emotion research today is that the “standard model” is incorrect, or
at least requires significant qualification. For example, Jack et al. [9] and Du et al. [10] argue
that emotions are neither basic nor universal. Others see emotional expressions as communicative
acts that shape social encounters [11]. Thus, they are not necessarily a reflection of the underlying
emotional state [8], and share much with other communicative acts (words, gestures) [12]. Evi-
dence suggests that facial expressions may not serve as reliable indicators of self-reported emotion
[13] [14] either. Rather, expressions vary both between and within individuals and with the situ-
ation, and convey important sources of meaning other than emotion [11]. Contrary to the notion
of static, universally understood facial expressions, empirical findings suggest that expressions are
highly influenced by individual differences, cultural norms, social contexts, and relational dynam-
ics. These factors can lead to variations in how individuals express and perceive emotions, adding
layers of complexity to the interpretation of facial cues.
For instance, research has shown that smiles, often considered indicative of joy or happiness,
are expressed in a wide range of contexts and can convey meanings beyond positive emotion.
Smiles can serve as social signals to convey politeness [15]. They can also be used strategically to
assert dominance or manipulate social dynamics [16]. People also smile when they are frustrated
[17] or surprised [18]. Additionally, individuals may use smiles to regulate their own emotional
states or manage the emotional experiences of others, contributing to emotional contagion and
rapport-building [19] [20] [21]. These varied functions challenge the notion of a simple mapping
between smiles and emotional states.
In response to these challenges, alternative theoretical frameworks, such as the constructivist
view, have emerged. This perspective [8] [22] argues that emotions are actively constructed by
the brain based on a combination of basic psychological ingredients, such as sensory input, in-
teroceptive signals, and past experiences. This shift in understanding has spurred interest in the
development of data-driven methods that can capture the subtleties of emotional expressions in
ecologically valid settings. Building a perceived emotional expressivity model is my aim to answer this call. I expand beyond basic expressions to include emotional expressivity as a feature.
I assess emotional expressivity by measuring the extent to which people outwardly display their
emotions as a reaction to an event through the judgement of third-party observers. Psychological
research has shown that emotional expressivity, as a construct, predicts interpersonal perceptions
and outcomes [23] [24] [25] [26], and recent affective computing research has shown success in
recognizing expressivity [27] [28]. We show that algorithms more accurately infer how someone
is impacted by their partner’s actions from emotional expressivity alone than from an analysis
that relies solely on specific facial expressions. In particular, I train a supervised machine learn-
ing model that predicts third-party judgments of emotional expressivity with high accuracy. To
validate my emotional expressivity model, I use it to infer information in a natural social task.
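As a rough illustration of the kind of supervised model described above, the sketch below regresses third-party expressivity ratings onto facial features aggregated over each reaction clip. The feature set, the choice of ridge regression, and all variable names are assumptions made for illustration; Chapter 4 describes the models actually evaluated.

# A minimal sketch (not the dissertation's actual pipeline) of predicting
# third-party expressivity ratings from aggregated facial features.
# Assumptions: `features` is an (n_clips, n_features) array of facial
# descriptors pooled over each reaction window, and `ratings` holds the
# mean observer expressivity score for each clip.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 40))      # placeholder facial features
ratings = rng.uniform(0, 100, size=200)    # placeholder observer ratings

model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
r2_scores = cross_val_score(model, features, ratings, cv=5, scoring="r2")
print("Cross-validated R^2:", r2_scores.mean())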
1.2 Theoretical Innovation
In terms of theoretical innovation, this dissertation demonstrates how automatic expressivity recog-
nition contributes to the understanding of alternative psychological theories concerning the nature
of emotional expressions in social tasks. By analyzing a large corpus of individuals engaged in the
iterated prisoner’s dilemma task, I showcase the potential of automatic facial analysis methods to
shed light on the underlying mechanisms of social cognition and the role of facial expressions in
these contexts.
The iterated prisoner’s dilemma task is a widely-used paradigm for testing theories of social
cognition, cooperation, and the function of facial expressions. In this task, two individuals re-
peatedly decide whether to cooperate with or betray each other, with their choices impacting their
respective rewards. The task allows for the investigation of how individuals navigate complex
social situations, employ strategies, and react to the decisions of others.
1.2.1 Appraisal Perspective
From the Appraisal perspective, I introduce several methodological innovations to enhance what
can be inferred from emotional reactions during a social task. First, rather than positing an under-
lying emotional state, I predict objective features of the event that precipitated the reaction (e.g.,
did it benefit or harm the expressor?). From the perspective of emotion research, this side-steps the
debate over whether expressions signify emotion [1] or communicate intentions [11]. If someone is harmed
by their partner and shows displeasure, I cannot say if this reflects genuine feelings or deliberate
communication (see [29]), and future research should replicate these findings in nonsocial settings.
That said, these results are inconsistent with a strong interpretation of Basic Emotion Theory in
that our participants smile when events cause them harm.
Second, I leverage appraisal theory as a framework to characterize the expression-precipitating
event. Appraisal theory [30] contends that people interpret events in terms of a number of specific
judgments, called appraisal variables, that capture ways the event impacts a person's beliefs, goals
and norms (e.g., does the event advance or hinder goals? was it expected?). Though developed
to characterize emotional reactions, appraisal variables have also been argued to serve as an ideal
representation language to characterize the meaning of events [31] [32], thus I use appraisal vari-
ables as a way to derive affectively-relevant objective features of expression-precipitating events.
Specifically, I examine if expressions predict the unexpectedness, goal-congruence, and norm-
compatibility of events. I leave as an open question whether other event representations might
serve as better representations of events.
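As a concrete, purely hypothetical illustration, an expression-precipitating event in the IPD could be encoded with the appraisal variables named above. The field names and values below are illustrative assumptions, not the representation used in Chapter 5.

# A hypothetical encoding of an expression-precipitating IPD event in terms
# of appraisal variables. All names and values are illustrative.
from dataclasses import dataclass

@dataclass
class AppraisedEvent:
    prev_outcome: str         # joint outcome of the previous round, e.g. "CC"
    outcome: str              # joint outcome just revealed, e.g. "CD"
    unexpectedness: float     # how surprising the transition is (0-1)
    goal_congruence: bool     # did the event benefit the expressor?
    norm_compatibility: bool  # did the partner uphold the cooperative norm?

event = AppraisedEvent("CC", "CD", unexpectedness=0.8,
                       goal_congruence=False, norm_compatibility=False)
print(event)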
Finally, I examine spontaneous expressions in a social task known as the iterated prisoner’s
dilemma (IPD). On each of ten rounds, players simultaneously decide whether to cooperate (C)
or defect (D) with their partner and receive a financial reward based on their joint decision. The
reward structure creates a dilemma between cooperating vs. exploiting one’s partner. IPD has
received intense interest in psychological and economic research as it encapsulates challenges
inherent in social decision-making (including predicting partner behavior, trust formation, betrayal
and repair) in a simple task that is easily understood by participants. Further, it vividly highlights
the failure of classical economic models to predict actual human behavior as players cooperate far
more than classical economic models predict, and thus IPD is a useful laboratory to examine how
best to enrich models of human social cognition.
Though the implications of IPD for the real world are hotly debated, it has been applied to
an astonishing array of human decision-making problems and led to numerous theories of human
social cognition, and findings discovered with IPD have been extended to a variety of other tasks, including
trust games [33], group dilemmas [34], and negotiations [35]. Together, these findings highlight
the role of emotion in human social decision-making. For example, models more accurately predict
human decisions when they incorporate internal feelings like guilt and envy [36]. More germane
to this current study, people have been shown to attend to the expressions of their partner when
deciding whether to cooperate with them in the future. For example, several studies [3] [33] find
that people are more likely to cooperate with a partner that shows guilt after a trust violation
than with one that smiles. Further, these studies show that observers use these expressions to
infer how the expresser appraises the trust violation. Much of this work, unfortunately, relies on
computer-generated opponents that display exaggerated facial expressions. Here I seek to reduce
this limitation by examining how pairs of human participants emote when they are free to play and
express however they see fit.
1.2.2 Interpersonal-Dynamics Perspective
From the interpersonal dynamics perspective, I aim to give insight into these alternative explana-
tions for facial expressions in social tasks. I investigate the sources of synchrony in expressivity
in dyads during the Iterated Prisoner’s Dilemma (IPD) task. I examine two conditions: a still con-
dition, where dyads cannot see their partner’s facial reactions, and a video condition, where dyads
can see their partner’s reactions in real-time. I explore whether synchrony in real interactions is
driven by visual cues from the partner’s facial expressions, by the shared experience of reacting to
a shared action, or by a combination of both factors.
Thus, I explore two hypothesized mechanisms by which synchrony arises:
H1: Interpersonal synchrony Dyads will display higher synchrony in the video condition
than the still condition due to their ability to see and attune to their partner’s facial expressions.
H2: Shared task Dyads will display synchrony in the still condition due to their participation
in a joint activity.
H2a: Task coordination moderates synchrony If synchrony is driven by the task, it can only
arise if players are actually coordinating their activities. Thus, I predict greater synchrony when
decisions are aligned (e.g., cooperate together) compared with when one disrupts coordination
(e.g., exploiting their partner).
My findings reveal that synchrony between real dyads is significantly higher than that of ran-
domly paired dyads, suggesting that synchrony is a genuine phenomenon in social interactions.
Furthermore, I find that synchrony is present even in the still condition, where dyads do not have
visual access to their partner’s facial reactions, indicating that the shared experience of reacting
to the joint outcome contributes to synchrony. Additionally, I observe that the ability to see the
partner’s facial reaction in real-time in the video condition enhances synchrony, especially when
dyad members’ decisions are not aligned.
Overall, this study provides valuable insights into the multifaceted nature of dyadic synchrony
and highlights the interplay between shared experiences and visual cues in driving synchrony dur-
ing interactive social situations. By understanding the sources and mechanisms of synchrony, I
contribute to a more comprehensive understanding of how individuals coordinate and align their
behavior and emotions in cooperative and competitive contexts.
1.3 Outline
In conclusion, the appraisal and interpersonal-dynamics perspectives both emphasize the impor-
tance of emotional expressivity in interpreting facial displays and facilitating social coordination
in cooperative and competitive contexts. The findings derived from the application of automatic
facial analysis methods offer valuable contributions to the fields of affective computing and the
understanding of social interaction mechanisms.
The remainder of this dissertation is structured as follows: in Chapter 2, I will start by explaining why emotion matters. Next, I overview the prevalence of context-independent emotion recognition based on basic emotion theory in affective computing and explain why it is problematic. I overview alternative emotion theories for studying emotional expressions and provide a literature review on emotional expressivity as an alternative construct. Furthermore, I review dyadic synchrony as an additional influence on emotional expressions in social settings and provide a literature review on how to measure synchrony using dynamic time warping. In Chapter 3, I will introduce
the Iterated Prisoner’s Dilemma corpus I have worked with in this dissertation. In Chapter 4, I
share how I build the expressivity model. In Chapter 5, I relate emotional expressivity with ap-
praisals. In Chapter 6, I share the analysis of dyadic synchrony. In the last Chapter, I conclude and
offer directions for future work.
Chapter 2
Related Work
2.1 Why Does Emotion Matter?
This thesis aims to answer the question – what can we infer from emotional reactions in a social
task? To start, we need to first address why we care about inferring information from emotional
reactions. What is an emotional reaction? What is an emotion and why does emotion matter?
The question has been a topic of philosophical and psychological enquiry for centuries. From Plato and Aristotle to Descartes, Darwin and Freud, continuing into the twentieth and twenty-first centuries, researchers have not reached a consensus on the question. The perception of emotion as
problematic or undesirable has deep historical roots that date back to ancient times [37]. Philoso-
phers, scholars, and thinkers from various intellectual traditions have often debated the role of
emotion in human behavior, decision-making, and morality. Among them, Cicero, the Roman
statesman and philosopher, expressed skepticism about the influence of passion over reason. In
Cicero’s view, emotions were seen as inferior to rational thought, and individuals who succumbed
to their passions were believed to lack the capacity for sound reasoning. In the modern era, this
perspective has been echoed in various social norms and expectations that may encourage emo-
tional restraint or discourage the expression of certain feelings. For instance, the idea that “it is
weak to show feelings” is a common stereotype that can be seen in many societies. This belief
implies that emotional vulnerability or expressiveness is a sign of weakness or lack of self-control,
and it may lead individuals to suppress their emotions in order to project an image of strength and
rationalism.
However, the modern view towards emotions has evolved from the war between emotion and
reason. Researchers argue that we have entered the age of Affectism [38]. Current thinking holds that emotions are beneficial and an essential part of the human experience. They are essential for
our survival and well-being [39] [40]. They provide us with information about our environment
and help us to regulate our behavior [41]. They help us to communicate and connect with others
[42]. They play an important role in morality [43]. Last but not least, research has shown that we simply cannot make complex decisions without emotions [44] [45].
So far I think I have convinced you that emotion matters. However, despite extensive research,
a comprehensive and grand unified theory of emotion has yet to be established, and discussions on
the nature and mechanisms of emotions continue to be a topic of scholarly debate [46]. As individ-
uals, we can all recognize that emotions involve our personal, subjective experiences, commonly
known as our feelings. However, emotions encompass more than just our conscious feelings. In
the field of affective computing, researchers often refrain from directly studying feelings because
measuring the ground truth of subjective experiences poses a significant challenge. Similarly, neu-
roscientist Ralph Adolphs argued that even though feelings are important and well worth studying, they are not an ideal starting point for the science of emotion [47] [48]. Indeed, emotion consists of
multiple components that can be studied from different perspectives, depending on the researcher’s
focus and methods. For instance, Adolphs views emotions as functional states, implemented in
the activity of neural systems, that regulate complex behaviors. He distinguishes emotion states,
concepts and experiences. The functional states of emotion give rise to various elements, includ-
ing conscious experiences or feelings, and cause observable emotional expressions, or emotional
reactions. Furthermore, the impact of these emotional experiences and our memories of them con-
tribute to our understanding and conceptualization of emotions, as well as the language we use to
discuss them [49]. Various theories of emotion diverge in their definitions of emotion based on the
specific phases or components they emphasize. In the subsequent section, I will provide a more
detailed overview of several emotion theories that are pertinent to my research.
2.1.1 Basic Emotion Theory
Traditionally, the field has been heavily influenced by a 30-year-old theory known as basic emotion theory [50]. The idea is that emotional expressions are caused by some outcome in the world. This
outcome evokes some emotion within the person, which in turn, triggers one of a set of basic
emotional expressions. Despite recent advancements in basic emotion theory [51], the context-independent assumption of basic emotion theory – that emotional expressions are universal signals of underlying emotional state – has become the basis of much academic research and commercial product development. The controversy is that basic emotion theory suggests that if a machine can classify an expression into one of these basic categories, then it should be possible to infer the person's emotional state. Following that, we can make a variety of meaningful inferences, such as the nature of the outcome or what the person will do next. This approach has been applied to predict how people will vote [52], detect student frustration or boredom [53], and decide whether to interview a potential employee [54]. In the next section, I will discuss in detail the significance and potential risks associated with Context-Ignorant Basic Emotion Recognition (CIBER).
2.1.1.1 Context-Ignorant Basic Emotion Recognition
Context-Ignorant Basic Emotion Recognition, as the name suggests, refers to emotion recognition products built on Ekman's Basic Emotion Theory. These are methods that (1) are ignorant of context (i.e., promoted to work across domains), (2) claim to recognize how someone feels, and (3) categorize these feelings in terms of basic emotion labels. Barrett and colleagues' [8] extensive review focused specifically on facial expressions and, across a wide range of studies, found widespread failure to accurately classify basic emotional feelings from facial displays. In other words, there is no evidence that CIBER is possible with current scientific methods. Other work has reached similar conclusions with other modalities, such as vocal expressions [55] or physiology [56].
Figure 2.1: Summary statistics of products in the Affective Computing Product Database.
Just to reiterate, these findings do not say that affect recognition is not possible. They simply indicate that self-reported or induced emotional feelings, represented as basic emotions, cannot be recovered from facial expressions alone. Many companies make the scientifically dubious claim that basic emotions can be recognized from de-contextualized facial expressions. With the Association for the Advancement of Affective Computing (AAAC), I conducted a survey to capture a snapshot of the affective computing industry and examine what services are being promoted and what qualifications are provided on the appropriate use of these techniques. CIBER methods claim to work independent of a specific application domain. Among the 80 products surveyed in 2021, as highlighted in Figure 2.1(d), we can see that the majority of products (61%) are context independent. CIBER methods claim to recognize basic emotional feelings from users (see Figure 2.1(b)-(c)). Almost half of the methods, 40%, explicitly recognized basic emotion labels.
An additional 25% of the methods did not clearly articulate their classification scheme and, thus,
the percentage could be far higher. Among all the products that provide emotion recognition functionality, at least 81% were advertised to recognize felt emotions, 10% were unclear or provided conflicting information, and only a few (8%) claimed to recognize perceived or displayed emotion.
Affect recognition has considerable potential to benefit society, for example, by detecting patients at risk of suicide [57] or identifying infants suffering intense pain [58]. If used incorrectly, such methods can also cause significant harm to individuals, as well as the community of
researchers and AI practitioners hoping to realize these societal benefits. Even a small number of
problematic applications can damage the entire field of affective computing through a loss of pub-
lic confidence. What our industry survey suggests is that the problem is not insignificant. Many
commercial products make claims that are misleading at best, and absurd and dangerous at worst, whether this is done intentionally to sell products or by well-intended developers who fail to understand the difference between “emotion recognition” and “perceived emotion recognition.”
Affective computing offers significant advantages to contemporary society, with the promise
of even greater contributions in the future. However, these advantages can be jeopardized by false
or deceptive assertions, even if made by only a handful of individuals or entities. The study of
emotions is inherently complex, as individuals often hold deeply ingrained beliefs about the nature
of emotions, and numerous companies strive to develop products based on these beliefs. When
such intuitive understanding diverges from scientific evidence, it is the responsibility of the field to
bring attention to these discrepancies. Thankfully, current research has begun to explicitly address
the issue of context-based emotion recognition [59]. In addition, we can also draw upon alternate
theories of emotion to provide further insights.
2.1.2 Constructivist View of Emotion
What are the alternatives to the context-independent basic emotion theory? The constructivist view
of emotion provides one way to consider context. Barrett argues that emotions are actively con-
structed by the brain based on a combination of basic psychological ingredients, such as sensory
input, interoceptive signals (sensations from within the body), and past experiences [22]. These
ingredients are integrated to produce an emotional experience that is tailored to the individual’s cur-
rent context and needs. Barrett emphasizes the importance of conceptual knowledge and language
in shaping emotional experiences. Conceptual knowledge helps the brain interpret ambiguous sig-
nals and construct meaningful emotional experiences. Her perspective acknowledges the flexibil-
ity and context-dependency of emotional experiences and expressions. Contrary to Ekman’s Basic
Emotion Theory, she argues that there is no consistent mapping between specific emotions and
particular physiological responses or facial expressions. The same emotion (e.g., anger) can be as-
sociated with different physiological patterns and facial expressions in different people or contexts.
Similarly, different emotions can share similar physiological or expressive features. This shift in
understanding has spurred interest in the development of data-driven methods that can capture
the subtleties of emotional expressions in ecologically valid settings. Researchers are leveraging
data-driven techniques to explore the dynamics of emotional expressions within social interactions,
cultural contexts, and communicative goals.
2.1.3 Appraisal Theory
Another way to consider context is appraisal theory. Appraisal theory [60] [61] is a psychological
framework that aims to explain how individuals interpret, evaluate, and react emotionally to events
or stimuli in their environment. According to this theory, emotions are elicited and differentiated
based on a person’s cognitive assessment or appraisal of a situation. This cognitive evaluation
involves multiple dimensions or appraisal variables that reflect the ways an event or stimulus relates
to an individual’s beliefs, goals, and norms, such as novelty, goal relevance, goal congruence,
coping potential, and norm compatibility.
The development of appraisal theory has significantly contributed to the understanding of emo-
tions, demonstrating that they are not merely automatic reactions to stimuli but complex processes
deeply intertwined with cognitive evaluations. Emotion arises from the relationship between context (i.e., the situation) and individual goals (which can vary by individual and culture). By highlight-
ing the critical role of personal interpretations and evaluations in shaping emotional experiences,
appraisal theory emphasizes the subjective nature of emotions and their close connection with
cognitive processes. This perspective has important implications for various domains, including
emotion regulation, therapy, and affective computing.
2.2 Emotional Expressivity as a Construct
Since emotional expressions are neither universal nor basic [9] [10], I consider an alternative construct
to measure them. Although emotion and affective computing research has emphasized the impor-
tance of specific facial expressions, researchers of nonverbal communication (and, indeed, the face
and gesture community) have taken a broader view of nonverbal signals. Within this broader tradi-
tion, nonverbal expressivity (i.e., the presence and strength of behaviors that convey some thought
or emotion) has been shown to have a profound impact on interpersonal perception and outcomes
[62]. For example, work by Burgoon et al. [23] and Berneiri et al. [24] characterized expressivity
in terms of presence and dynamics of facial movements, gestures, and posture, and they found that
expressivity was a primary factor in the establishment of rapport between speakers. Even when this work has emphasized emotional behaviors, it has considered the presence of emotional expressions in toto, rather than examining the presence of specific expressions. Similar to expressivity, past work from the face and gesture community has also taken a holistic approach: Hernandez et al.
[63] built a model with face and head gestures to automatically measure the engagement level of
TV viewers. Admittedly, not all nonverbal behaviors are necessarily expressing a specific emotion.
Therefore we could potentially benefit from learning them with a more generalized construct.
Expressivity has been examined from various perspectives. For example, Boone and Buck [26]
considered expressivity “as the accuracy with which an individual displays or communicates his
or her emotions.” From an evolutionary perspective under the context of a social dilemma, they
argued that emotional expressivity signals trustworthiness and serves as a marker for cooperative
behaviors. In clinical research, expressivity also plays a critical part in studying affective fea-
tures and the disorders of social interactions among psychiatric patients. A severe reduction in facial expressivity, or irregularity in nonverbal production, is associated with conditions such as schizophrenia, depression, autism, and Parkinson's Disease. Evidence has shown that schizophrenia patients displayed atypical expressions and were less facially expressive than controls [64], even when they experienced as much emotion [65]. Girard et al. [66] found that when symptoms
were severe, patients with depression regulated interpersonal distance by displaying more facial
action units associated with negative emotion and less associated with positive emotion. A recent
meta-analysis [67] suggested that facial expressions of people with autism are atypical. Georgescu
et al. [68] advocated to use virtual characters to assess and train individuals with high-functioning
autism, and further help them improve social skills. Buck et al. [69] proposed a technique to
study the emotional expression and communication style of behaviorally disordered children and
schizophrenic patients and their family members. Mounting evidence has suggested that advances in automatically recognizing and understanding expressivity could help us better study social interaction and help develop diagnostic and treatment tools for clinical assessment.
2.2.1 Measuring and Predicting Expressivity
Traditionally expressivity was measured either by self-assessment (sender) or by experts conduct-
ing time-consuming manual annotations (observer). Tickle [70] developed a rating protocol for the
observers to measure expressive behavior for patients with Parkinson’s Disease. Kring et al. [25],
Gross and John [71] built two well-validated self-assessment tools to measure the extent to which
people consider themselves “outwardly exhibit emotions” or “reveal feelings.” The questionnaires
ask people to rate themselves on questions such as “I display my emotions to other people” or
“No matter how nervous or upset I am, I tend to keep a calm exterior.” These tools assess the
expressivity of oneself as a personality trait, and they emphasize that the definition of emotional
expressivity is not limited to a specific emotion (though [71] provides subscales of positive expres-
sivity and negative expressivity), or limited to a specific modality/channel of expression. Among
all modalities, facial expressivity has been studied most extensively. Along with the advancement
in computer vision, researchers can integrate automatic facial expression recognition tools to gain
insights into facial expressivity measurement. Neubauer et al. [72] used tracked facial expres-
sions to represent facial expressivity directly; Wu et al. [73] developed a more nuanced arithmetic
calculation based on Tickle’s protocol.
To distinguish current work from self-report of the senders’ emotional expressivity, I focus
entirely on perceived expressivity from the perspective of the observers. More importantly, I aim
to build an automatic predictor of perceived expressivity. Little similar work has been done. To
study patients with Parkinson’s, Joshi et al. [74] acquired ground truth ratings based on Tickle’s
protocol, and built machine learning models to predict expressivity from automatically tracked fa-
cial features. More recently, Lin et al. [75] investigated the perceived expressiveness of senders
participating in different emotional tasks, and they found nonverbal features associated with per-
ceived expressivity differed by emotional contexts. Though in our context of a social dilemma senders might experience different emotions, I do not draw such distinctions and instead focus our investigation on how different modalities, and their temporal dynamics, interact to determine
perceived expressivity.
2.3 Modeling Dyadic Synchrony
Why synchrony? In the context of social interaction, measuring synchrony can help us under-
stand the dynamics of communication, empathy, and emotional resonance between individuals. In
[76], Wood and colleagues argue that affective synchrony serves three interrelated functions: it en-
ables efficient information exchange, allows for interpersonal emotion regulation, and builds social
bonds. It is thus important to understand how dyads' facial expressions and emotional expressivity dynamically evolve with each other. Specifically, it is interesting to examine (1) whether there is any synchrony at all, and (2) how synchrony varies when dyads react to different game results.
How can we measure synchrony? There is a large body of related work. We can draw inspiration from the most closely related work, namely studies that measured synchrony of facial expressions, or from work as general as measuring synchronization or similarity between two time series. We should also consider that the "facial expression" signal can be one-dimensional, measured as emotional expressivity, or multivariate, measured as 20+ action units [77]. In between these two extremes, there is considerable work on measuring synchronization between two behavioral signals: for example, measuring the synchronization of pupil dilation [78] between two participants while they listen to music, or measuring the synchronization of body posture [79] between a human and a virtual avatar while they play a negotiation game. Many methods are documented in the literature for measuring synchrony, but there are also many nuances and parameter choices within each method. In this dissertation I focus on dynamic time warping.
2.3.1 Dynamic Time Warping
Dynamic time warping (DTW) is a dynamic programming algorithm to analyze the similarity of
two time series that might vary in time or speed. It was first proposed by speech recognition
researchers in 1978 [80]. It has since been applied in many fields, including speech recognition,
music analysis, bioinformatics, developmental and social psychology (mother-infant relationship,
close relationship), and finance. First, I want to introduce DTW in more detail. At a high level, DTW is a technique to find the optimal match between two temporal sequences. To do
so, the algorithm goes through all possible ways to match partial or all of the elements between
the two sequences. Each individual match is calculated by a distance measure, e.g., how much
does it cost to match these two elements? The algorithm returns the optimal path which has the
smallest summed distance – in doing so, we are able to stretch or compress the sequences to resemble each other the most. Depending on the nature of the two sequences, and whether there are interactions between them, we can also apply restrictions and rules on how each
element or frame of the sequences can be matched. For example, which distance measure do we
use to match the individual element (e.g., Euclidean, Manhattan, or Cosine similarity)? Do we
need to find a match for each frame from the other sequence? How much do we allow skipping,
i.e., what is the maximum amount of time stretch and compression allowed at any point of the
sequence (local slope constraints)? Is partial matching allowed, i.e., can frames be left unmatched from one or both sequences? Do the start and end frames of one sequence need
to be matched with the start and the end frame from the other sequence? Do we allow matching
from one direction only (asymmetrical matching), or should the matching be symmetrical from both directions (symmetrical meaning that if we exchange the reference and query, we still get the same results)? All of these questions were considered in Sakoe and Chiba's original paper [80], as well as Rabiner and Juang's speech recognition book [81]. Here I want to narrow down
and focus on reviewing how DTW has been applied to measure synchrony of behavioral signals.
In Table 2.1, I present a meta-analysis of how researchers considered constraints and rules in past work. In Section 6, I will present how I choose to measure synchrony using Dynamic Time Warping based on this literature.
Table 2.1: Meta-analysis of DTW constraints from past work
Signal Distance
Measure
Symmetrical Start & End Windowing Step Pattern
(local slope
constraints)
Normalization Implementation
[77] Facial ex-
pressions
Euclidean Asymmetric Open start,
open end
N/A Asymmetric Normalized [82]
[83] Facial ex-
pressions
Euclidean Asymmetric Fixed start,
fixed end
SakoeChiba,
10
Asymmetric Normalized [82]
[79] Posture Euclidean Symmetric Fixed start,
fixed end
N/A Symmetric2 Normalized [82]
[78] Eye track-
ing
Cosine Symmetric Fixed start,
fixed end
N/A Unclear No Normaliza-
tion
a
Unclear
[84] Posture Euclidean Symmetric Fixed start,
fixed end
N/A Rabiner
Juang 6C
Normalized [82]
[85] Multimodal Euclidean Symmetric Fixed start,
fixed end
SakoeChiba,
5
N/A Normalized [82]
[86] Multimodal Euclidean Symmetric Fixed start,
fixed end
SakoeChiba,
5
N/A unclear unclear
current
work
Facial ex-
pressions
Euclidean Symmetric Fixed start,
fixed end
SakoeChiba,
4
SymmetricP1 Normalized [82]
a
In the paper, it says ”DTW costs can only be compared across signals of equal length, as longer signals require more warping windows and have an inherent
disadvantage for accruing warping costs.”
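To make these configuration choices concrete, below is a minimal Python sketch of DTW with a Euclidean local distance, fixed start and end points, and a Sakoe-Chiba window. It is an illustration only: it uses a plain symmetric step pattern rather than the SymmetricP1 slope constraint adopted in the current work, and normalizing by the combined sequence length is a simple stand-in for the normalized distance reported by standard implementations.

import numpy as np

def dtw_distance(x, y, window=4):
    # Minimal DTW: Euclidean local cost, Sakoe-Chiba band, fixed start/end,
    # plain symmetric step pattern (match, insertion, deletion).
    n, m = len(x), len(y)
    w = max(window, abs(n - m))              # the band must at least cover the length difference
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = abs(x[i - 1] - y[j - 1])  # local distance between the two frames
            D[i, j] = cost + min(D[i - 1, j],      # insertion
                                 D[i, j - 1],      # deletion
                                 D[i - 1, j - 1])  # match
    return D[n, m] / (n + m)                 # length-normalized warping cost

# toy example: two expressivity-like curves that unfold at different speeds
a = np.sin(np.linspace(0, np.pi, 60))
b = np.sin(np.linspace(0, np.pi, 50))
print(dtw_distance(a, b))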
Chapter 3
Iterated Prisoner’s Dilemma Corpus
3.1 USC-ICT IPD corpus
To address the goals outlined in Section 1, we use the USC IPD Corpus which contains more than
6000 spontaneous nonverbal reactions to decisions in an iterated prisoner’s dilemma game [87]. In
this 2-player game, participants play 10 rounds with the same opponent. Each round, players can
choose to split a number of lottery tickets (the cooperative choice) or attempt to steal from their
opponent (the defect choice). The tickets earned are shown in Fig. 3.1, right. The task presents a
dilemma, as there is an incentive to cooperate but also a temptation to exploit one's opponent.
Figure 3.1: Left Game interface. Right Game payoff matrix. Each player gets 5 tickets if both
choose to cooperate (split), 1 ticket if both choose to defect (steal). When the player chooses to
cooperate (split) and their opponent chooses to defect (steal), the player gets nothing while their
opponent gets 10 tickets.
The IPD Corpus contains videos of 716 individuals (51% female, age 18-65) playing a web-based version of IPD modeled after the UK TV show Golden Balls (see Fig. 3.1, left). The study was
approved by USC’s Institutional Review Board and participants were recruited from Craigslist and
randomly paired. They received $25 USD for participation and, to provide emotional meaning to
their actions, their in-game earnings were entered into several $100 USD lotteries. As our focus
was on the impact of facial expressions, players played the game over a video link with no audio.
Figure 3.2: Player reaction after learning a joint outcome in the game.
We focus on expressive reactions after each joint-decision was revealed, both because such
reactions have been highlighted as crucial for shaping cooperation in prior psychological research
[3] [33] [34], and because these are the moments in our corpus when participants show the strongest
expressions (see Fig. 3.2).
The fact that players react most to the revelation of joint outcomes can be seen both when observing individual players and when we aggregate expressions across all games. For example, Fig. 3.3a illustrates the emotional expressivity of a single player across the ten rounds of their game (using the measure of expressivity defined in Chapter 4). Each dashed line marks the moment when the joint decision was revealed in each round. Expressivity peaks after each joint decision is revealed. Fig. 3.3b illustrates the average intensity of different facial factors (as described in Section 4.2.2) as a function of time, averaged across all joint outcomes. This reveals that people begin to show facial reactions about one second after learning the joint outcome, reaching peak expressivity as they start the next round of the game and fading after a few seconds. Following these observations, we focused our analysis on the 7-second window indicated by the gray region of Fig. 3.3b.
3.2 USC-ICT IPD-Still corpus
To investigate the source of synchrony, I focus on an experimental subset of the USC IPD corpus introduced in the previous section, "IPD-Still": 94 dyads (188 participants) that were randomly assigned to either a video condition or a still image condition. In the video condition (N dyads=45, mixed-gender dyads, 52% female), participants play Split Steal and can see their partner through the webcam during the task. In the still image condition (N dyads=49, mixed-gender dyads, 47% female), instead of seeing their partner in real time through the webcam, participants can only see a still image of their partner, who was instructed to pose a neutral facial expression before the task began. This experimental setup precludes participants from observing their partner's response to the joint outcomes as they simultaneously experience the unveiling of the results.
Figure 3.3: (a) An example of how expressivity changes during a 10-round game for a single typical participant. Dashed lines show when each joint decision is revealed. (b) Averaging across all participants, expressivity peaks about 3 seconds after a decision is revealed. Each line represents a different facial factor (see Section 4.2.2). Dashed lines mark when the last player made their choice, when the joint decision was revealed, and when the next round started, respectively. The gray region is the 7-second window of our analysis.
Chapter 4
Emotional Expressivity
We hypothesize that expressivity can yield more meaningful inferences than a focus on specific
facial expressions. We test this hypothesis in several stages. First, we verify that observers can
reliably and consistently annotate if an individual is expressive when the outcome of a round is
revealed. In doing so, we encourage observers to attend to more than facial expressions (e.g., nods,
shakes, posture, and gestures). We further examine if observers see these reactions as conveying
something about the individual’s emotional state, or if they reveal other non-emotional thoughts.
We find that observers are highly consistent in their ratings, attend primarily to facial expres-
sions, and view these expressions as conveying affective meaning, although this meaning is more
complex than a list of basic emotion terms. Next, we learn machine learning models to predict
expressivity ratings from a variety of visual features. Features are drawn from a variety of visual
modalities (e.g., facial expression, or head movement) and include both static measures (e.g., the
average intensity of a smile) and dynamic measures (e.g., smile velocity). We find that models do
a good job at predicting perceived expressivity, and that dynamic measures of the face are the best
predictors.
4.1 Perceived Expressivity
4.1.1 Annotation Task
We recruited 274 annotators from Amazon Mechanical Turk¹ to rate the 7-second window illustrated in Fig. 3.3b. Videos were only selected if participants consented to share their footage with the general public for research purposes, which resulted in 1000 video segments requiring annotation. Each annotator rated 20 randomly-presented videos. Annotators were provided some context (they were told these videos showed players' reactions within a two-person game) but were not told the specific outcome of the joint decision. They were allowed to watch each video as many times as they wanted as they answered a series of annotation questions (using 7-point Likert-type scales).

¹ We requested experienced workers from the US, limiting the task to workers with more than 5000 approved tasks and a 95% or higher approval rate. We do not have other demographic information about the workers.
Annotators were asked two questions related to the individual's expressivity: a reaction item (how strongly is the person reacting, from "No action at all" to "Strong reaction") and an expressive item (how expressive was the person, from "Not at all" to "Highly"). As we were interested in what behaviors conveyed this expression, we also asked annotators to indicate the extent to which their expressivity rating was based on facial expressions, head movements, posture movements, or hand or arm movements (4 items in total, from "Not at all" to "Clearly"). Next, we asked two questions to assess whether annotators viewed these expressions as conveying emotion (to what extent does the person seem to be expressing emotion) or thought (to what extent does the person seem to be expressing a thought or concept other than emotion), both from "Not at all" to "Clearly".
Finally, we asked annotators to provide a brief description of what they felt was expressed.
4.1.2 Annotation Results
Inter-rater reliability: The intraclass correlation coefficient (ICC) was calculated to assess whether annotators were able to provide consistent responses. Each video was rated by a different group of (randomly selected) annotators, and we combined these into a single mean rating for each video. Thus, a one-way random ICC(1,k) was used [88] to assess agreement. Due to a limitation of the randomization process of our survey platform, we received an uneven number of ratings for each video. Most videos received 5 or 6 ratings, N(5)=513 and N(6)=437. A few videos received 4 or 7 ratings, N(4)=19 and N(7)=31. To calculate ICC(1,k), we chose k=5. Videos that received only 4 ratings were treated as having one rater missing, and we randomly sampled 5 ratings for videos that received more than 5 ratings.
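As a concrete illustration, the sketch below computes a one-way random, average-measures ICC(1,k) from a videos-by-raters matrix using the standard one-way ANOVA formulation; the toy data and variable names are hypothetical, not the actual ratings.

import numpy as np

def icc_1k(ratings):
    # One-way random, average-measures ICC(1,k) = (MS_between - MS_within) / MS_between.
    # ratings: array of shape (n_videos, k_raters) with no missing values.
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand_mean = ratings.mean()
    video_means = ratings.mean(axis=1)
    ss_between = k * np.sum((video_means - grand_mean) ** 2)
    ss_within = np.sum((ratings - video_means[:, None]) ** 2)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / ms_between

# toy example: 10 videos, 5 raters, ratings on a 7-point scale
rng = np.random.default_rng(0)
true_scores = rng.integers(1, 8, size=(10, 1))
observed = np.clip(true_scores + rng.normal(0, 0.8, size=(10, 5)), 1, 7)
print(round(icc_1k(observed), 2))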
For each video, we asked the annotators eight 7-point Likert-type questions and one free-form
text question. We achieved an overall ICC=0.76 (95% CI [0.75, 0.77]) among all Likert-type items.
According to [88], ICC values less than 0.5, between 0.5 and 0.75, between 0.75 and 0.9, and
greater than 0.90 are considered poor, moderate, good, and excellent reliability, respectively. For
each item, the ICCs all fall within the moderate to good range except for the thought item. Specifically, annotators agreed on the reaction item (ICC=0.80, 95% CI [0.78, 0.82]) and the expressivity item (ICC=0.80, 95% CI [0.78, 0.82]). They also achieved reasonable agreement on what part of the body conveyed their perception of expressivity: facial expressions (ICC=0.77, 95% CI [0.75, 0.79]),
head movements (ICC=0.70, 95% CI [0.67, 0.73]), posture (ICC=0.58, 95% CI [0.54, 0.62]), and
hand or arm (ICC=0.62, 95% CI [0.58, 0.66]). Finally, the annotators agreed on the extent to which
people were expressing emotion (ICC=0.78, 95% CI [0.75, 0.80]), but had different opinions on
whether a thought other than emotion was being expressed (ICC=0.27, 95% CI [0.19, 0.34]).
Components of Expressivity: We perform statistical analysis to understand how the annota-
tors made inferences from all modalities. We see a very high correlation between reaction and
expressivity (Pearson’s r=0.97). As annotators considered these two questions almost identical, we
collapse them and use the mean of the two items as an expressivity score for the ground truth of our
predictive models in the next section (shown in Fig. 4.1 as expressivity). To examine how much these modalities contribute to conveying these impressions to the annotators, we fit a multiple linear regression with standardized face, head, posture, and hand ratings to explain the composite expressivity score.
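A minimal sketch of this kind of analysis is shown below, regressing the composite expressivity score on z-scored modality ratings with statsmodels; the data frame and column names are hypothetical placeholders.

import numpy as np
import pandas as pd
import statsmodels.api as sm

# hypothetical per-video mean ratings (one row per video)
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.uniform(1, 7, size=(1000, 5)),
                  columns=["expressivity", "face", "head", "posture", "hand"])

# standardize predictors so coefficients are comparable across modalities
predictors = ["face", "head", "posture", "hand"]
X = (df[predictors] - df[predictors].mean()) / df[predictors].std()
X = sm.add_constant(X)

model = sm.OLS(df["expressivity"], X).fit()
print(model.summary())   # betas, 95% CIs, F-statistic, adjusted R-squared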
Figure 4.1: Pairwise correlations (Pearson's r with Holm-Bonferroni correction) among rating tasks. We chose Pearson's r after visual inspection of each rating item's histogram. All items roughly follow a normal distribution except head, posture, and hand. We also ran a non-parametric test (Kendall rank correlation); the results were similar.
The model was highly significant (F(4,995)=6.03, p<.0001), and all modalities combined explain over 90% (adjusted R²=90.7%, predicted R²=90.6%) of the variance in the expressivity score. Not surprisingly, face contributed the most (β=0.73, 95% CI [0.70, 0.75], p<.0001, large effect size,² partial η²=0.75); head (β=0.15, 95% CI [0.12, 0.18], p<.0001, small to medium effect size, partial η²=0.08) and posture (β=0.15, 95% CI [0.12, 0.19], p<.0001, small to medium effect size, partial η²=0.07) contributed almost equally but much less than face. Last, hand had a very small but significant contribution (β=0.03, 95% CI [0.005, 0.056], p=0.02, small effect size, partial η²=0.01).

² Rule-of-thumb interpretation of effect size based on https://imaging.mrc-cbu.cam.ac.uk/statswiki/FAQ/effectSize
Figure 4.2: Representative words (top 10 words with the highest TFIDF scores) used by annotators
to describe what was expressed in the video by outcome. Annotators used ’happy’ and ’amused’
to describe expressions regardless of game outcome.
We can also see in Fig. 4.1 that emotion is highly correlated with expressivity (r=0.95). This suggests that the more expressive the annotators considered the person to be, the more confident they were that the person was expressing an emotion. Emotion's high correlation with face (r=0.93) tells us that the more the annotators perceived expressivity from people's faces, the more likely they were to consider the person to be expressing an emotion.
Open-ended responses: We used the open-ended descriptions of reactions to get a sense of
what was being conveyed. We grouped videos into four categories based on the outcomes. For
each video, we concatenated the text descriptions provided by each crowd worker, removed stop
words (e.g., extremely common words like “a”, “the”, “yours”, “then”), and stemmed the rest (e.g.,
“surprised” and “surprising” will be considered the same after stemming). Then we calculated the
term frequency-inverse document frequency (TFIDF) for the combined texts. TFIDF is a common
weighting scheme in text analysis [89]. A higher value means the word appears more often in a
certain category after taking into consideration that this word might appear more often in general
among all categories. Words with the highest ten TFIDF scores for each category are shown in
Fig. 4.2. One might expect that people show more positive emotion after successful cooperation or winning (e.g., CC or DC), and less so otherwise. However, we do not see this pattern in the observers' descriptions. Broadly, we see the same top ten words for each outcome. But for mutual cooperation (CC) and mutual defection (DD), "bored" ranked much higher than for the other two outcomes, in which one player cooperated and the other defected (see further appraisal analysis in Chapter 5).
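A sketch of this text-analysis step is shown below, combining a Porter stemmer with scikit-learn's TfidfVectorizer and its built-in English stop-word list; the per-outcome documents here are made-up placeholders, not the actual annotator descriptions.

from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer

# hypothetical concatenated descriptions, one document per outcome category
docs = {
    "CC": "happy pleased smiling relieved amused friendly",
    "CD": "surprised upset shocked disappointed annoyed",
    "DC": "amused smug happy surprised guilty",
    "DD": "bored annoyed indifferent amused resigned",
}

stemmer = PorterStemmer()

def stem_tokens(text):
    # lowercase, drop stop words, and stem so "surprised"/"surprising" collapse
    return [stemmer.stem(tok) for tok in text.lower().split()
            if tok.isalpha() and tok not in ENGLISH_STOP_WORDS]

vectorizer = TfidfVectorizer(tokenizer=stem_tokens, token_pattern=None)
tfidf = vectorizer.fit_transform(docs.values())
vocab = vectorizer.get_feature_names_out()

for outcome, row in zip(docs.keys(), tfidf.toarray()):
    top_words = [vocab[i] for i in row.argsort()[::-1][:10]]
    print(outcome, top_words)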
4.1.3 Summary
We have shown that observers were able to perceive the expressive behaviors of the players in IPD
with consistency. Though the observers relied mostly on facial expressions when judging expressivity, other modalities (e.g., head movements, posture, and gestures) also contributed significantly. We also found that observers see the reactions of players as reflecting their internal emotional state, and much less so as conveying thoughts. Finally, the observers found players more "bored" when both parties made the same choice (either cooperate or defect) than when they chose differently.
4.2 Predicting Perceived Expressivity
Our goal is to build a model to predict perceived expressivity from visual features that are auto-
matically extracted from videos. Now that we have established that expressivity can be reliably
annotated by human judges, we learn predictive models from these annotations. As annotators
indicated their impressions arose from several modalities of behavior (e.g., face, head movements,
and postures), we examine the accuracy of both unimodal (e.g., using only facial features) and
multimodal (e.g., using facial features and head features) models. Results are compared with a
previously proposed baseline model.
4.2.1 Label
For each video, we first calculate the mean across all raters for each rating item, and then use the mean of the reaction and expressivity items as the perceived expressivity score. However, recall from Section 4.1.2 that the inter-rater reliability among the annotators was acceptable but not great. To filter relatively less reliable annotators, we computed modified z-scores using the median absolute deviation (MAD) [90] of the perceived expressivity scores for each video. Then, for each crowd worker, we calculated the average of the modified z-scores over the 20 videos they rated. Since our sample size was quite small (4-7 ratings for each video) when calculating the modified z-score, the common cutoff criterion of 3 MADs [91] would not be appropriate in our case. Instead of picking an arbitrary cutoff threshold, we removed the 10% (N=28) of annotators with the highest modified z-scores on average (the least reliable annotators in our task). This filtering procedure improved the ICC of the perceived expressivity score to 0.87 with a 95% CI from 0.86 to 0.88, and the ICC over all items to 0.82 with a 95% CI from 0.81 to 0.82.
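The sketch below illustrates this filtering step: modified z-scores per video based on the median absolute deviation, averaged per worker, with the least reliable 10% of workers removed. The long-format table, the 0.6745 scaling constant, and the use of absolute deviations when averaging are assumptions made for illustration.

import numpy as np
import pandas as pd

# hypothetical long-format ratings: one row per (video, worker) expressivity score
rng = np.random.default_rng(2)
ratings = pd.DataFrame({
    "video": np.repeat(np.arange(200), 5),
    "worker": rng.integers(0, 50, size=1000),
    "score": rng.uniform(1, 7, size=1000),
})

def modified_z(scores):
    # modified z-score: 0.6745 * (x - median) / MAD
    med = scores.median()
    mad = (scores - med).abs().median()
    return 0.6745 * (scores - med) / (mad if mad > 0 else 1e-9)

ratings["mod_z"] = ratings.groupby("video")["score"].transform(modified_z)

# average absolute deviation per worker; drop the 10% least reliable workers
worker_dev = ratings.groupby("worker")["mod_z"].apply(lambda z: z.abs().mean())
cutoff = worker_dev.quantile(0.90)
kept = ratings[~ratings["worker"].isin(worker_dev[worker_dev > cutoff].index)]
print(len(kept), "ratings retained")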
4.2.2 Features
Here we describe how visual features are automatically extracted and grouped to capture the modalities of "face," "head," and "posture." We disregard "hand" due to its limited contribution to annotators' perception and because players' hands are often not visible in the view. Within each modality, we also categorize features as dynamic or non-dynamic. Features describing the standard deviation or maximum of values, or features describing velocity, are considered dynamic features. Features describing averaged values or counts are non-dynamic. For example, the velocity of action unit intensity is considered a dynamic feature, and the count of head nods is considered non-dynamic. Thus, for each modality, we have three feature sets in total: one includes all features in the modality, one includes all the dynamic features, and one includes all the non-dynamic features. In addition to the unimodal feature sets, we also present multimodal feature sets that combine all three unimodal feature sets, all three dynamic feature sets, and all three non-dynamic feature sets.
Face (number of features N=156): Several schemes have been proposed for characterizing facial expressions. Some affective computing approaches focus on basic emotion labels (e.g., anger, surprise), but as these prototypes rarely occur in natural situations, we focus on finer representations of facial movement. The most common approach is to decompose facial movement into a number of facial Action Units (AUs) [92]. Here, we use a commercial AU-detector based on CERT [93] to extract twenty common AUs³.
Some more recent work has found better (and more explainable) results by focusing on a small
number of facial factors [94]. These are commonly co-occurring AUs discovered by factor analysis
over several large expression datasets. So in addition to individual AUs, we extract six orthogonal
facial factors – Enjoyment Smile (F1), Eyebrows Up (F2), Open Mouth (F3), Mouth Tightening
(F4), Eye Tightening (F5) and Mouth Frown (F6) – following the procedure outlined in [94].
Facial expressions are often described by static features (i.e., an expression is present or absent),
but facial motion can be far more important for shaping observer’s impressions [95] [96]. To
capture expression dynamics, we construct several additional features. For each facial factor and
AU, velocity is also calculated by approximating their first derivative to describe the change in its
intensity. Then we compute the average, standard deviation, and max for each of the signals. Thus
each signal is represented by six features in total.
³ [93] outputs AU1, AU2, AU4, AU5, AU6, AU7, AU9, AU10, AU12, AU14, AU15, AU17, AU18, AU20, AU23, AU24, AU25, AU26, AU28, and AU43. All AUs except AU43 (Eyes Closed) were included in our analysis.
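The sketch below shows how a single per-frame signal (an AU or facial factor intensity) can be turned into the six summary features described above: the mean, standard deviation, and max of both the intensity and its approximate first derivative. The 30 fps frame rate and variable names are illustrative assumptions.

import numpy as np

FPS = 30  # assumed frame rate of the AU / facial-factor tracker

def signal_features(intensity, fps=FPS):
    # six features per signal: mean/std/max of intensity and of its velocity
    intensity = np.asarray(intensity, dtype=float)
    velocity = np.gradient(intensity) * fps      # approximate first derivative (units/second)
    feats = {}
    for name, sig in [("intensity", intensity), ("velocity", velocity)]:
        feats[f"{name}_mean"] = sig.mean()
        feats[f"{name}_std"] = sig.std()
        feats[f"{name}_max"] = sig.max()
    return feats

# toy example: a 7-second window of a smile-like signal at 30 fps
t = np.linspace(0, 7, 7 * FPS)
smile = np.exp(-((t - 3.0) ** 2))                # a single expression peak around 3 s
print(signal_features(smile))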
Besides specific expressions, we also construct features to capture an overall measure of facial movement. We create a composite facial feature (AU-SUM) by summing the ten most commonly activated AUs⁴ across the entire IPD corpus (these were active in at least 15% of all frames).
Head (N=26): ZFace [97] was used to track head orientations in three directions (pitch, yaw,
and roll). For each direction, displacement from the individual’s mean and the velocity of displace-
ment were calculated. Then we calculated average, standard deviation, and max values for each
direction as head movement features.
In addition, we used the head gesture detector described in [98] to obtain frame-by-frame binary head nod and head shake predictions. Then, for each video, we counted the number of head nods and head shakes as head gesture features.
Posture (N=21): We assess overall body movements with optical flow and facial landmarks.
Optical flow captures movement between consecutive frames caused by any participant motion
including posture and hand movements. We computed dense optical flow using OpenCV's implementation of Gunnar Farnebäck's algorithm [99]. For each frame, we take the sum of flow magnitude across all pixels, then calculate the average, standard deviation, and max values as our flow features.
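A minimal sketch of this flow-based movement measure is shown below, using OpenCV's dense Farnebäck optical flow; the video path and the specific Farnebäck parameters are placeholders, not the exact settings used in this work.

import cv2
import numpy as np

def flow_features(video_path):
    # per-frame sum of dense optical-flow magnitude, summarized as mean/std/max
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    per_frame_motion = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Farneback dense flow; these parameter values are generic defaults
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        magnitude = np.linalg.norm(flow, axis=2)   # per-pixel motion magnitude
        per_frame_motion.append(magnitude.sum())   # total motion in this frame
        prev_gray = gray
    cap.release()
    motion = np.asarray(per_frame_motion)
    return {"flow_mean": motion.mean(), "flow_std": motion.std(), "flow_max": motion.max()}

# usage (the path is a placeholder):
# print(flow_features("reaction_clip.mp4"))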
We use facial landmarks to infer changes in posture with ZFace [97], as its tracking is robust to head and posture shifts. As in [100], for each facial landmark, we calculate the current frame's displacement from the mean position of the individual during the full game session. Then we calculate the velocity of displacement. Principal component analysis is performed on both the displacement features and the velocity-of-displacement features to reduce the 49 landmarks to two dimensions (preserving 94% and 86% of the variance, respectively). For each of these features, we also capture movement dynamics by calculating the average, standard deviation, and max values of the landmarks' displacement and velocity as features.
⁴ AU1, AU2, AU6, AU10, AU12, AU14, AU17, AU23, AU24, and AU25.
Baseline (N=19): In past work, the sum across averaged AUs or the count of occurrences of the six basic emotion labels (joy, surprise, sadness, anger, disgust, and fear) was often used directly as an estimate of total expressivity [72] [101]. In line with this, we use the 19 averaged AUs as our baseline feature set, against which we can compare whether our choice of features improves model performance.
4.2.3 Models
Since our expressivity labels were computed from a 7-point Likert-type scale, we formulate the
task to predict the perceived expressivity score as a regression problem. We experiment with two
interpretable models.
Lasso: Lasso is a method to avoid over-fitting in regression by penalizing regression coefficients with L1 regularization. Shrinkage is controlled by a constant factor λ [102]. In a similar context [6], lasso was used to select features before a classifier was fit to predict a player's decision in IPD from game behaviors and facial expressions. Lasso is also easy to tune, with one hyperparameter, λ. We perform a grid search between 0 and 1 with a step of 0.1 to find the best λ.
Random Forest: Compared to Lasso, random forests are an ensemble decision-tree method,
which is equally interpretable without assuming a linear relationship between the features and the
response. We utilize bootstrapped training examples to reduce variance, thus avoiding overfitting [103]. Two hyperparameters are tuned with grid search in our experiments: number of trees (10,
30, 50) and max tree depth (4, 6, 8).
4.2.4 Experiments
Performance Metric: We use R² as our performance metric, which measures the proportion of the variance of the dependent variable that is explained by the variables in a regression model.
Experiment Design: For each of our feature sets, we use nested 10-fold cross-validation to train, validate, and evaluate our models [104]. In the inner loop, we perform grid-search 10-fold cross-validation to tune hyperparameters and record R² for each fold. The best model is selected based on the averaged R² during this validation process. We then test in the outer loop with another 10-fold cross-validation. In the outer loop, R² is recorded for each fold, and the 10 scores are averaged as our test score.
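A sketch of this nested cross-validation protocol with scikit-learn is shown below. The hyperparameter grids follow the values listed above (with a small positive floor on the lasso penalty to avoid the unpenalized case); the feature matrix X, label vector y, and fold seeds are placeholders.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

# placeholder data: 1000 videos, 156 face features, expressivity labels
rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 156))
y = rng.uniform(1, 7, size=1000)

inner_cv = KFold(n_splits=10, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=10, shuffle=True, random_state=1)

models = {
    "lasso": GridSearchCV(Lasso(max_iter=10000),
                          {"alpha": np.arange(0.1, 1.01, 0.1)},
                          scoring="r2", cv=inner_cv),
    "random_forest": GridSearchCV(RandomForestRegressor(random_state=0),
                                  {"n_estimators": [10, 30, 50],
                                   "max_depth": [4, 6, 8]},
                                  scoring="r2", cv=inner_cv),
}

for name, search in models.items():
    # outer loop: R^2 per fold, averaged as the reported test score
    scores = cross_val_score(search, X, y, scoring="r2", cv=outer_cv)
    print(name, round(scores.mean(), 3))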
4.2.5 Results
The results can be found in Table 4.1. First, we observe both models are suitable for our task, and
random forests outperform lasso in all tasks. Second, dynamic models achieved comparable results
to models with combined features in all modalities, and the multimodal and face models outperform the baseline model. We can also see that dynamic features alone are sufficient, and non-dynamic features
do not contribute additional information to explain variance in perceived expressivity. Another im-
portant observation is that the face feature sets perform very close to the multimodal sets. Features
from other modalities provide very little additional information.
To get insight into what the model learned, we examine the top 10 most important features
based on the mean decrease in impurity for random forests with multimodal feature sets. The
eight most important features are related to smile dynamics. By far, the most important feature (weight=0.21) is the standard deviation of the velocity of Enjoyment Smile (facial factor F1). In physics, this quantity is referred to as the root mean square (RMS) velocity fluctuation, which is used to calculate turbulence intensity. Turbulence-related features have proven useful in other research on nonverbal behavior [105]. Other important features include the standard deviation of AU-SUM
(weight=0.14), a measure of general facial movement, and the max velocity of Enjoyment Smile
(weight=0.1). One posture dynamic feature (max velocity of overall landmarks, .02) and one head
movement feature (standard deviation of overall head movement, .02) are also helpful, but the
weights are relatively small.
Table 4.1: Model performance (R²); bold text highlights the best-performing feature set within its modality group.

Feature set             | Lasso | Random Forests
Baseline                | 0.39  | 0.55
Multimodal              | 0.63  | 0.67
Multimodal Dynamic      | 0.63  | 0.66
Multimodal Non-dynamic  | 0.50  | 0.58
Face                    | 0.60  | 0.64
Face Dynamic            | 0.60  | 0.63
Face Non-dynamic        | 0.39  | 0.55
Head                    | 0.30  | 0.36
Head Dynamic            | 0.28  | 0.34
Head Non-dynamic        | 0.14  | 0.16
Posture                 | 0.16  | 0.28
Posture Dynamic         | 0.16  | 0.28
Posture Non-dynamic     | 0.15  | 0.15
4.2.6 Summary
We have shown that expressivity can be predicted with high accuracy. Consistent with the ob-
servers, our models rely mostly on facial expression features–especially the dynamic ones–to pre-
dict expressivity.
4.3 Discussion
We find that third-party observers can reliably annotate the perceived emotional expressivity of
players after events in the iterated prisoner’s game. These judgments are primarily based on facial
expressions and are seen as conveying important information about the player’s emotional state.
We show that these ratings can be reliably predicted using features based on the players’ ex-
pressions, head movements, and posture. Features associated with facial expression dynamics were the most predictive.
Chapter 5
Predicting Appraisals from Expressions
The previous section highlights that people are highly-expressive immediately following events
in the prisoner’s dilemma game, but how do these expressions relate to the nature of the game
outcome? Do people smile more when they win money compared to when they are exploited
(as predicted by Basic Emotion Theory)? To answer this question, we examine the relationship
between objective features of the preceding event (defined in terms of appraisal variables) and
facial expressions. Rather than focusing only on specific expressions, we also consider expressivity.
5.1 Defining Events in the Prisoner’s Dilemma
Expressions are reactions to personally-relevant events [61]. One of the key challenges in analyzing
facial reactions in an unfolding situation, like IPD, is identifying what constitutes an “event”.
One common approach is to take the entire interaction as a unit of analysis. For example, [106]
examined correlations between the frequency of expressions and the number of times a player
cooperates across an entire game. They found that players that smile more are exploited more
often. This was offered as evidence that players take advantage of opponents that smile. Yet an
equally consistent explanation is that players smile when they are exploited (even if this seems to
contradict Basic Emotion Theory). Only by examining the time course of events within the game
can we disambiguate these explanations.
At the other extreme, we could treat events as the outcome of a single round in isolation.
This approach was taken in an analysis of the Golden Balls TV show, where no differences in
expressions were found [107]. However, solely focusing on a single outcome ignores the history
leading up to this point. This history can dramatically change how someone might view the joint
outcome. For example, being exploited (CD) following mutual cooperation (CC) feels worse (and
more surprising) than being exploited after having just exploited one’s partner (which seems more
like deserved punishment, and corresponds to the common tactic of tit-for-tat). In theory, the
entire sequence of decisions up to a given round could be important for determining emotional
expressions. The problem with this approach is that there are 4¹⁰ (>1M) possible sequences. Even
with a corpus of 700 players, we do not have enough data to draw meaningful conclusions about
the entire sequence of events.
Instead, we adopt local action sequences as our unit of analysis and average across all ex-
pressions that share the same sequence. In particular, we distinguish events by the immediate
joint outcome of a round (i.e., CC, CD, DC, or DD) plus the joint outcome of the immediately
preceding round. This corresponds to a “memory-1” strategy in game theory (such as tit-for-tat
and win-stay-lose-shift). Research on how people play IPD suggests that memory-1 strategies ex-
plain most of the variance in human players’ decisions [108] [109], and thus we might hope they
similarly explain most of the variance in their expressions. This representation yields the sixteen
“events” shown as arrows in Fig. 5.1, and their frequency in our corpus. We argue this is a reason-
able compromise between the need for a rich event representation and the need for a large number
of data points per event. Further, we examine finer-grain event definitions for a subset of events
where enough data allows for this exploration.
Figure 5.1: Event representation in IPD. Circles represent the possible outcomes of a single round.
Arcs represent possible outcome transitions. Each arrow is labeled with the number of such tran-
sitions that occur in the corpus. Cooler colors represent more common transitions.
5.2 Defining Appraisal in the Prisoner’s Dilemma
Appraisal theory argues that individuals appraise how an event relates to their internal goals and
norms. Given our definition of events, we next operationalize three appraisal variables shared by
virtually every appraisal theory: goal-congruence, unexpectedness, and norm-compatibility.
Goal-congruence refers to whether an event advances or hinders an individual's goals. Goal-congruence determines whether an event is seen as pleasurable or aversive. In the corpus, players were encouraged to
focus on the goal of winning, both by instruction and providing them a financial reward based on
the number of points they earn during the game. A large body of prior work has demonstrated that
such financial rewards explain the bulk of variance in self-reported pleasure experienced in such
games [110] [111]. Thus, while players may bring many goals to the game, the goal-congruence
of a game event, and thus the pleasure they feel, should be correlated with the objective number of
points they earn. If goal-congruence is directly linked to external expressions as suggested by basic
emotion theory and some prior appraisal research [112], then facial expressions of pleasure (e.g.,
smiles) should be correlated with how a game event impacts an individual’s objective financial
self-interest.
Thus, to operationalize goal-congruence, we equate self-interest with the monetary payoff shown in Fig. 3.1,¹ where GC_p is the expected valence of the expression of player p according to the goal-congruence hypothesis:

    GC_p =   5   if the current outcome is CC
             1   if the current outcome is DD
             0   if the current outcome is CD
            10   if the current outcome is DC        (5.1)

¹ Psychological research suggests that rewards/losses may be judged with respect to a reference point [113]. In this case, the reference point would likely correspond to the difference in reward from the previous round (i.e., did I earn more or less than in the last round?). We tested both absolute and relative formulations of self-interest and social-interest. Relative values fared far worse and will not be discussed further.
If expressions are determined by self-interests, we would expect people to either show the most
positive expressions or be the most expressive, when exploiting their opponent (DC) and the most
negative when being exploited (CD), regardless of the preceding state.
Unexpectedness refers to the perceived likelihood of an event. Unexpectedness helps shape the
type of emotion reported (e.g., surprise or fear) but also strongly correlates with the intensity of af-
fective experience [114] [115] [110]. For example, a person is not especially happy upon receiving
a monthly paycheck but would have a strong negative emotion if the paycheck unexpectedly failed
to appear. Novel and unexpected events also require people’s attention to determine potential con-
sequences and they may signal dangers or opportunities [116]. If unexpectedness is directly linked
to external expressions as suggested by basic emotion theory and some prior appraisal research,
then the intensity of facial expressions, and perhaps their nature (e.g., surprised face), should be
correlated with the objective unexpectedness of a game event.
To operationalize unexpectedness, we need to estimate the subjective probability of an event in
the game. Although the corpus does not contain the player’s subjective probability of events, we
can estimate these perceptions from the objective frequency of events in the corpus. For example,
Fig. 5.1 shows the transition CC→CC is very common, therefore, participants are unlikely to be
surprised if this pattern occurs. In contrast, DD→CC is quite rare, which should be perceived as
unexpected². Rather than using the probability directly, it is common in psychological models to
equate the subjective experience to negative log-likelihood of an event, as described in information
theory [117] (e.g., see [118]):
    Unexpectedness_t = log( 1 / P(outcome_t | outcome_{t−1}) )        (5.2)

where P(outcome_t | outcome_{t−1}) is the conditional probability of an outcome (e.g., CC) given the previous outcome.
² This definition is symmetric. If CC→DC occurs, the defection will be assumed to be equally surprising to each player. Arguably, the defector is less surprised because they chose to defect. The value of asymmetric models should be considered in future research.
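To make Equation 5.2 concrete, the sketch below estimates the conditional outcome probabilities from transition counts and converts them into surprisal values; apart from the CC→CC count taken from Fig. 5.1, the counts here are made-up placeholders rather than the actual corpus frequencies.

import math

# transition counts: counts[prev][curr] = number of observed prev -> curr transitions
# (only the CC -> CC value reflects the corpus; the rest are placeholders)
counts = {
    "CC": {"CC": 2550, "CD": 300, "DC": 300, "DD": 150},
    "CD": {"CC": 200, "CD": 150, "DC": 250, "DD": 400},
    "DC": {"CC": 200, "CD": 250, "DC": 150, "DD": 400},
    "DD": {"CC": 100, "CD": 200, "DC": 200, "DD": 900},
}

def unexpectedness(prev, curr):
    # surprisal of observing `curr` after `prev`: log(1 / P(curr | prev))
    total = sum(counts[prev].values())
    p = counts[prev][curr] / total
    return math.log(1.0 / p)

print(round(unexpectedness("CC", "CC"), 2))  # common transition -> low surprisal
print(round(unexpectedness("DD", "CC"), 2))  # rare transition   -> high surprisal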
Norm-compatibility refers to whether an event is consistent with moral principles or social norms. In
economic games like prisoner’s dilemma, norm-compatibility typically refers to if an outcome is
fair or equitable [36]. Thus, while exploiting one’s opponent enhances self-interest (and thus might
be seen as pleasurable), it also violates the norm of fairness and thus may produce moral emotions
such as guilt. According to appraisal theories, the norm-compatibility of an event can shape the types of emotions experienced (e.g., pride, guilt, envy) but also the intensity of experienced pleasure or displeasure (as when the financial loss of being exploited is compounded by the unfair nature of the
act). The intensity of self-reported pleasure has been well-captured by Fehr and Schmidt’s social-
interest equations [36]. If norm-compatibility is directly linked to external expressions as suggested
by basic emotion theory and some prior appraisal research, then the intensity of facial expressions,
and perhaps their nature (e.g., angry face), should be correlated with the objective social-interest
of a game event.
Following Fehr and Schmidt [36], we operationalize norm-compatibility with Equation 5.3,

    NC_p = U_p − α · max(U_o − U_p, 0) − β · max(U_p − U_o, 0)        (5.3)

where NC_p is the expected valence of the expression of player p according to the norm-compatibility hypothesis, U_p and U_o are the monetary payoffs the player and their opponent earn in one round, and α and β are envy and guilt parameters. de Melo [119] did extensive experiments to estimate α and β in a set of social dilemmas, and we adopt his values in our analysis (α=1.77, β=0.24). Unpublished work in our lab suggests these values are reasonable for modeling player decisions in IPD. With this operationalization, we have

    NC_p =    5    if the current outcome is CC
              1    if the current outcome is DD
           −17.7   if the current outcome is CD
             7.6   if the current outcome is DC        (5.4)
Under this hypothesis, a player will show the most positive responses when achieving mutual cooperation (CC) and the most negative when being exploited (CD), regardless of the preceding state.
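The small sketch below computes the goal-congruence (Equation 5.1) and norm-compatibility (Equations 5.3 and 5.4) values for each joint outcome directly from the payoff matrix in Fig. 3.1, using the α and β values quoted above; the function and variable names are illustrative.

# payoffs (player, opponent) for each joint outcome, from the game payoff matrix
PAYOFFS = {"CC": (5, 5), "CD": (0, 10), "DC": (10, 0), "DD": (1, 1)}
ALPHA, BETA = 1.77, 0.24   # envy and guilt parameters adopted from de Melo

def goal_congruence(outcome):
    # GC_p: the player's own monetary payoff (Equation 5.1)
    return PAYOFFS[outcome][0]

def norm_compatibility(outcome, alpha=ALPHA, beta=BETA):
    # NC_p: Fehr-Schmidt utility with envy and guilt terms (Equation 5.3)
    u_p, u_o = PAYOFFS[outcome]
    return u_p - alpha * max(u_o - u_p, 0) - beta * max(u_p - u_o, 0)

for outcome in PAYOFFS:
    print(outcome, goal_congruence(outcome), round(norm_compatibility(outcome), 1))
# reproduces Equation 5.4: CC=5, DD=1, CD=-17.7, DC=7.6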
5.3 Relating Expressions to Event Appraisals
Now that we have defined events and appraisal variables, we examine if expressive reactions reveal
information about how an individual appraises an event. Specifically, we examine the relationship
of emotional expressivity and specific expressions with the three event appraisals above.
To foreshadow our findings, emotional expressivity strongly predicts the unexpectedness of
events. We follow this analysis with a finer-grained representation of events that better distin-
guishes the unexpectedness of different local event sequences (e.g., exploitation after one round
of mutual cooperation is less unexpected than exploitation after multiple rounds of mutual co-
operation). This secondary analysis further confirms the association between expressivity and
unexpectedness.
5.3.1 Analyzing Theoretical Factors
For each of the 16 events defined above (and illustrated in Fig. 5.2), we examine the relationship
between event appraisals and emotional reactions in the seven seconds following the moment when
players learn the joint outcome. In terms of emotional reactions, we examine emotional expres-
sivity (as predicted by the learned model with the best performance in Table 4.1, i.e., the Random Forests model with the Multimodal feature set) and the facial factors outlined in Section 4.2. We focus on
facial factors, rather than specific facial action units, as the learned model found these more predic-
tive, they are easier to interpret, and their smaller number reduces the risk of spurious associations
due to multiple comparisons.
For each event (e.g., CC following CC), we calculated the mean expressivity score and the
mean facial factor scores over each video frame in the seven seconds following the outcome re-
veal. These scores are then averaged across all instances of each event in the corpus. Fig. 5.2
Figure 5.2: Player smile intensity across the game events: Each arrow represents one of the 16
events analyzed. Circles represent the four possible game outcomes on a given round. Cooler colors represent lower emotional expressivity.
illustrates the intensity of Enjoyment Smile (F1) elicited by each event. This shows, for example,
continued mutual cooperation (CC→CC) yields weaker F1 than mutual cooperation following ex-
ploitation (DC→CC). This relationship between smile intensity and the prior outcome highlights
the importance of using local event sequences as the unit of analysis, rather than simply examining
expressions after the joint outcome.
Visual inspection of Fig. 5.2 highlights that events incongruent with player goals (and thus
expected to produce weaker F1 under the “standard model”) actually produce stronger F1. For
example, the goal-incongruent event of being exploited after mutual cooperation (i.e., CC→CD)
shows more intense F1 than the goal congruent event of continued cooperation (e.g. CC→CC).
We systematically examine the relationship between expressions and event-appraisals by per-
forming step-wise regression across the sixteen transitions shown in Fig. 5.2. Using the six facial
factor intensity scores and expressivity score as predictors, and AIC as the model selection criterion [120], we perform three backward linear regressions, one to predict each appraisal variable. First, goal-congruence was not predicted by any of the variables. Second, norm-compatibility was only found to have a significant relationship with Mouth Frown (F6) (F(1,14)=6.50, p=.023, large effect size, partial η²=0.32). Lastly, unexpectedness yielded the strongest fit and was predicted jointly by three variables. Among them, expressivity score contributed the most (F(3,12)=24.62, p=.0003, large effect size, partial η²=0.67); Eyebrows Up (F2) (F(3,12)=4.77, p=.05, small effect size, partial η²=0.12) and Mouth Tightening (F5) (F(3,12)=5.48, p=.04, small effect size, partial η²=0.13) also contributed marginally.
As described in Equations 5.1 and 5.4, goal-congruence and norm-compatibility do not strictly follow a normal distribution, so we also ran three pairwise non-parametric tests (Kendall rank correlation with Holm-Bonferroni correction) to examine the correlation between expressions and appraisals. The only significant correlation we found was between unexpectedness and expressivity score (τ=.60, p=.04).
None of these patterns is consistent with the “standard model” of emotional expression which
would have predicted, for example, greater Enjoyment Smile (F1) when the player received the
most money (as captured by goal-congruence). If anything, this relationship is reversed. For
example, people showed greater Enjoyment Smile (F1) after being exploited (mean F1 for CC→CD = 0.59) than when they achieved mutual cooperation (mean F1 for CC→CC = 0.42), though this reverse pattern did not reach significance across all 16 events. According to work on social norms, players should show negative expressions, like Mouth Frown (F6), when they are exploited. Instead, players showed more Mouth Frown (F6) when they exploited their opponent: mean F6 for CC→DC = 0.13, whereas mean F6 for CC→CD = 0.08. Only the association between emotional expressivity and unexpectedness shows a strong and clear relationship.
5.3.2 Fine-grained Event Analysis
We have shown that expressivity predicts the unexpectedness of game events. For example, be-
ing exploited by one’s opponent after they cooperated with you is unexpected and evokes strong
expressions. Assuming this analysis is correct, we would expect that being exploited after several
rounds of mutual cooperation would be more surprising and evoke even stronger reactions. Here,
we explore this possibility.
The previous analysis focused on memory-1 event representations (i.e., the outcome and the
immediately preceding outcome) because there is insufficient data to examine a finer-grained rep-
resentation. However, as can be seen in Fig. 5.1, the transition CC→CC was the most common
local action sequence (2550 instances). To show that our finding regarding the relationship between expressivity and the unexpectedness of game events is robust across other possible representations, we explored two additional analyses to see whether this conclusion still holds when we use a finer-grained representation of events. Using just these transitions, we create a new event representation that looks at outcomes as a function of the number of preceding rounds of mutual cooperation, i.e., CC(N)→¬CC.
If our above analysis is correct, players that have successfully cooperated for several rounds might expect that they will cooperate again, and thus be less expressive when they indeed successfully cooperate. Fig. 5.3 illustrates that emotional expressivity indeed attenuates the longer players
maintain mutual cooperation. This figure shows emotional expressivity (as predicted by the learned
model) and unexpectedness (calculated based on Equation 5.2) as a function of how many rounds
Figure 5.3: Expressivity score and unexpectedness of mutual cooperation as a function of the number of preceding rounds of mutual cooperation, for players who stay in mutual cooperation.
players previously chose mutual cooperation. Linear regression results also support the downward pattern. Predicting emotional expressivity from the number of preceding rounds of mutual cooperation, we observe a significant negative relationship (β=-.01, p=.002). This shows that, on average, for each additional round a player stays in mutual cooperation, their reaction to the game result (as measured by expressivity score) decreases by 0.01.
Just as staying within a predicted pattern should be less surprising, deviating from a stable pattern of cooperation should evoke greater surprise. We perform linear regression to examine whether the emotional expressivity of leaving CC depends on how many rounds players previously cooperated together. Predicting emotional expressivity from the number of preceding rounds of mutual cooperation, we observe a significant positive relationship (β=.03, p=.015). To get insight into whether this effect depends on the final outcome (e.g., CC(N)→CD vs. CC(N)→DC), we further analyze each outcome separately. In general, we observe an upward trend for all transitions (CC→DD, CC→CD and CC→DC), but only see a significant relationship between expressivity score and the number of preceding rounds of mutual cooperation (β=.04, p=.008) for CC→DC. Fig. 5.4 shows that the player who chooses to defect shows more expressive reactions when successfully
exploiting their opponent. Presumably, this does not purely reflect surprise (as they chose to de-
fect) but it may reflect what Ortony, Clore, and Collins [114] refer to as prospect-based appraisals
(i.e., satisfaction or fears confirmed) which are argued to relate to the likelihood of the event.
Figure 5.4: Expressivity score and unexpectedness of mutual cooperation as a function of the number of preceding rounds of mutual cooperation for CC→DC.
5.3.3 Discussion
Overall, results are inconsistent with the “standard model”. Smiles and frowns were not diagnostic
of the value of events for the individual (goal-congruence). Frowns showed some association
with norm-compatibility but in the opposite direction as predicted by the “standard model”. The
strongest associations were between emotional expressivity and the unexpectedness of outcomes.
This holds for the sixteen memory-1 events and for a finer-grained analysis that considered the
number of rounds that players successfully cooperated before the event.
5.4 General Discussion
The present study is part of a growing critical reexamination, within affective computing, of what
machines can infer from facial expressions in social settings. Our findings undermine the “standard
model” that specific facial expressions reliably reveal a person's feelings. We did, however, find strong evidence that facial expressions communicate important information about social situations.
Thus, our results reaffirm the importance of automatic facial expression recognition in general, but
caution against simplistic interpretations of these signals.
We make two important contributions towards a more nuanced interpretation of facial displays.
Our first contribution is to focus on emotional expressivity as a construct, rather than specific facial
expressions. We demonstrated that human observers could reliability annotate the expressivity of
individuals playing the prisoner’s dilemma, and that we could accurately predict these annotations
from visual features. Both the annotators and the learned models highlight the importance of fa-
cial expressions, and facial dynamics in particular, in performing these inferences. Our second
contribution is to use appraisal theory as a framework to capture the meaning of facial expres-
sions. Rather than inferring subjective emotional feelings, we examined the relationship between
expressions and objective measures of goal-congruence, norm-compatibility, and the unexpectedness of events. Contrary to the “standard model”, expressions that are claimed to indicate pleasure
showed no association with how much money an individual could earn from an outcome (i.e.,
goal-congruence) or the extent to which the event was fair (i.e., norm-compatibility). The only ap-
praisal variable we could predict from facial expressions with confidence was the unexpectedness
of events. We observed that people were more expressive when the situation was more unexpected.
Our findings provide evidence that facial expressions indicate appraisal (unexpectedness) but also
give pause to techniques that adopt common-sense interpretations of facial expressions: e.g., that
smiles indicate pleasure. We also show that our expressivity model is a superior choice to specific
facial expressions.
There are several limitations to this current work. Our analysis depends on the accuracy of auto-
matic facial analysis. Even though we adopted state-of-the-art software to track facial expressions, these techniques can be inaccurate due to the complexities of naturalistic data: facial occlusion, extreme head orientation, poor lighting, and racial and age variability. It is possible that the variance introduced by these characteristics obscured some findings. Our results also depend on the choice of
features used to represent facial motion, and other representations might yield different results. By focusing on objective characteristics of events (e.g., how much money could a participant earn?), we may have missed subjective goals, such as having fun, that might have elicited some expressions. That said, other related studies emphasize that, when asked, people report smiling even when they do not feel joy in this game [121]. The current analysis focused primarily on memory-1 strategies
to characterize events (though an exploratory fine-grained analysis suggests the findings will ex-
tend to richer representations, given sufficient data). Nor did we consider how prior or concurrent
expressions of the partner shaped expressions (rather, these were treated as random noise). This
limits our ability to analyze the iterative and interactive nature of the game. Other representations,
like the full history of decisions leading up to a transition, might yield further insights. Our results
are also specific to social decisions with relatively low stakes (the opportunity to win $100 USD).
These results need to be extended to other situations (e.g., non-social) and other, non-economic,
sources of reward (e.g., [122]).
Chapter 6
Dyadic Dynamics
In this chapter, I move on from a focus on individuals to a focus on dyads. So far in my analysis I have treated each player as an independent individual, but that only allows us to study one aspect of the story. The information hidden in the dynamics within the dyads is essential to understanding these complex interactions. Expressive behaviors are bidirectional intra- and interpersonal processes [123] [124]. Individuals within a dyad react not only to the event or stimulus itself, expressing their own emotions, but also to the expressions of their partner. The individual also uses their partner's expressive behavior to interpret the intention behind the partner's action [3]. In the context of the IPD game, the players also have an incentive to regulate their emotions (up- or down-regulation) in order to influence their partner's choice in the next round or to avoid deviating from social norms. Studying dyadic dynamics allows us to explore these bidirectional influences and understand how they impact outcomes. In this dissertation, I study dyadic dynamics from two angles. First, I study dyadic synchrony. Specifically, I explore whether there is synchrony within dyads and how it differs depending on what happened in the game. In addition, I want to understand where the synchrony comes from. Is it a social process, in which the players mimic each other's facial expressions, or is it due to shared stimuli, in that they are reacting to a shared task? Second, I build generative models of dyadic interaction.
6.1 Dyadic Synchrony
Within affective computing, there is a strong tradition of treating facial expressions as though they
arise from some internal evaluation by the individual. For example, when viewing advertisements,
facial expressions are assumed to reflect the individual’s evaluation of the product and are used to
predict purchase intentions [125], or in the context of intelligent tutoring, facial expressions are
assumed to reflect the student’s evaluation of their learning experience and used to respond to their
inferred boredom or frustration [126]. This is true even in social situations. For example, studies of
people engaged in a social task like the prisoner’s dilemma often analyze the expressions of each
individual separately and make inferences about what just occurred [18] or what action they might
perform next. In such work, the task the individual is performing (e.g., watching an ad, learning
math or playing a game) is assumed to explain most of the variance in the person’s expressive
display, and displays of other social actors are ignored or analyzed separately.
However, another research tradition, also present in affective computing, emphasizes the mu-
tual influence of emotional displays between social actors. For example, research on facial mimicry
argues that people spontaneously copy the expressions of their partner [127] [128], and research on
emotional contagion argues people “catch” each other’s emotions and presumably become more
aligned in their expressive displays [129]. The phenomenon of dyadic synchrony—where two
individuals exhibit aligned expressions, gestures, or physiological responses—has been widely
observed in various social contexts, including parent-infant interactions
[130], romantic relationships [131] [132], teacher-student coordination [133], therapist-patient in-
teractions [134], and collaborative tasks [135] [136]. Synchrony is thought to play a crucial role in
fostering social cohesion, facilitating communication, and promoting cooperation and mutual un-
derstanding between individuals [137]. In this tradition of research, the dyad or group is analyzed
as a unit and assumed to explain most of the variance in each person’s expressive display. The
notion that some of the expressions may arise from the task itself is typically ignored or perhaps
considered as a general factor that predicts the level of synchrony (e.g., negotiators in a cooperative
negotiation are more synchronized than in a competitive one [79]).
Figure 6.1: Two pathways to synchrony: (a) dyads synchronize their faces due to contagion or mimicry, i.e., they react to each other's facial expressions; (b) dyads synchronize their facial displays because they experience a shared stimulus, the revelation of the joint outcome in the IPD task.
Despite the well-documented importance of synchrony in social interactions, the underlying
sources and mechanisms that drive dyadic synchrony remain an active area of inquiry. On the one
hand, it is conceivable that synchrony might not arise from interpersonal contagion at all (Figure 6.1a), but rather reflect the fact that both individuals are engaged in the same task (Figure 6.1b).
For example, friends become more synchronized in their neural responses when watching a movie,
not because they are responding to each other but because they are experiencing the same film,
even if on separate days [138]. Strangers in a study of responses to commercial ads smile together
[139], not because they are influencing each other but because they are sharing the same stimuli.
On the other hand, studies that examine the facial expressions of individuals in interactive tasks,
such as the Prisoner’s Dilemma, may overemphasize the impact of the shared task and miss the
important role that synchrony, mimicry, or contagion play in explaining players' expressions.
The present study aims to give insight into these alternative explanations for facial expressions
in social tasks. We investigate the sources of synchrony in expressivity in dyads during the Iterated
Prisoner’s Dilemma (IPD) task. We examine two conditions: a still condition, where dyads cannot
see their partner’s facial reactions, and a video condition, where dyads can see their partner’s
reactions in real-time. We explore whether synchrony in real interactions is driven by visual cues
from the partner’s facial expressions, by the shared experience of reacting to a shared action, or by
a combination of both factors.
Thus, we explore two hypothesized mechanisms by which synchrony arises:
H1 (Interpersonal synchrony): Dyads will display higher synchrony in the video condition than in the still condition due to their ability to see and attune to their partner's facial expressions.
H2 (Shared task): Dyads will display synchrony in the still condition due to their participation in a joint activity.
H2a (Task coordination moderates synchrony): If synchrony is driven by the task, it can only
arise if players are actually coordinating their activities. Thus, we predict greater synchrony when
decisions are aligned (e.g., cooperate together) compared with when one disrupts coordination
(e.g., exploiting their partner).
Hypothesis H2a claims that any synchrony from the shared task will be moderated by the joint outcome, so we focus on the dyads' reactions following the disclosure of each joint decision. Following [140], we focus on the 7-second segments that capture the most expressive reactions of the dyads in response to the reveal of the joint outcome; see [140] for more details on this choice of segment. We also group the joint decisions into three outcomes: 1) mutual cooperation (CC), when both players choose split; 2) mutual defection (DD), when both players choose steal; and 3) CD or DC, when one player chooses split and the other chooses steal.
Our findings reveal that synchrony between real dyads is significantly higher than that of ran-
domly paired dyads, suggesting that synchrony is a genuine phenomenon in social interactions.
Furthermore, we find that synchrony is present even in the still condition, where dyads do not have
visual access to their partner’s facial reactions, indicating that the shared experience of reacting
to the joint outcome contributes to synchrony. Additionally, we observe that the ability to see the
partner’s facial reaction in real-time in the video condition enhances synchrony, especially when
dyad members’ decisions are not aligned.
Overall, this study provides valuable insights into the multifaceted nature of dyadic synchrony
and highlights the interplay between shared experiences and visual cues in driving synchrony dur-
ing interactive social situations. By understanding the sources and mechanisms of synchrony, we
contribute to a more comprehensive understanding of how individuals coordinate and align their
behavior and emotions in cooperative and competitive contexts.
6.1.1 Methods
6.1.1.1 Pre-processing Signals
First, the facial expressions and emotional expressivity of each individual were examined and
automatically extracted. For each individual, facial expressions were automatically extracted using
OpenFace [141]. It provides frame-by-frame tracking of 20 facial action units at 30 frames per second. We further reduce these 20 facial action units to six facial factors following the work in [94], which found better (and more explainable) results by focusing on a small number of commonly co-occurring AUs discovered by factor analysis over several large
expression datasets. We extract six orthogonal facial factors – Enjoyment Smile (F1), Eyebrows
Up (F2), Open Mouth (F3), Mouth Tightening (F4), Eye Tightening (F5) and Mouth Frown (F6)
– following the procedure outlined in [94]. These facial factors were z-scored for each individual.
Similarly, for each individual, emotional expressivity was automatically extracted using the model introduced in [27]. The model is trained on third-party annotators' judgments of how expressive the IPD reaction shots are, and it outputs a perceived expressivity score at 6 frames per second. As with the facial factors, the emotional expressivity scores were z-scored for each individual. The graph in Figure 3.3(a) shows the univariate expressivity score and Figure 3.3(b) shows the intensity of the six facial factors. There were missing frames in
both the facial factor data and emotional expressivity data due to OpenFace tracking failure. These
failures were largely due to partial occlusion, such as when a player used their hand to cover part
of their face. To ensure that the measurement of synchrony was meaningful, we pre-aligned the
sequences of player and partner using timestamp data. Furthermore, we only included frames in the
analysis if data from both players were available at a given frame. Additionally, we set a minimal
length requirement for the synchrony analysis. Specifically, a dyad was included in the analysis only if less than 40% of the 7-second segment was missing, that is, at least 25 frames were available for the emotional expressivity score and at least 126 frames were available for the facial factors.
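As a rough illustration of this pre-processing pipeline, the sketch below z-scores each signal within a participant, joins the two players' frames on a shared timestamp, and applies the minimum-frame criterion. It assumes the per-frame signals are already loaded into pandas DataFrames with a 'timestamp' column; all names are illustrative rather than the actual implementation.

    import pandas as pd

    SEG_SECONDS = 7            # reaction window following each joint outcome
    MIN_FRAMES_EXPR = 25       # >= 60% of 7 s at 6 fps (expressivity)
    MIN_FRAMES_FACTORS = 126   # >= 60% of 7 s at 30 fps (facial factors)

    def zscore_within_person(df: pd.DataFrame, cols: list) -> pd.DataFrame:
        """Z-score each signal column within a single participant."""
        out = df.copy()
        out[cols] = (df[cols] - df[cols].mean()) / df[cols].std(ddof=0)
        return out

    def align_dyad_segment(player: pd.DataFrame, partner: pd.DataFrame,
                           cols: list, min_frames: int):
        """Join the two players' frames on timestamp, keep only frames where
        both signals are present, and drop the dyad if too much is missing."""
        merged = pd.merge(player, partner, on="timestamp",
                          suffixes=("_p1", "_p2")).dropna()
        if len(merged) < min_frames:
            return None   # dyad excluded from the synchrony analysis
        x = merged[[f"{c}_p1" for c in cols]].to_numpy()
        y = merged[[f"{c}_p2" for c in cols]].to_numpy()
        return x, y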
6.1.1.2 Dynamic Time Warping
We use Dynamic Time Warping (DTW) to calculate the synchrony between facial expressions and
emotional expressivity of the partners in a dyad. DTW is a widely used method for measuring the
similarity between two time series that may vary in time or speed [80]. It has shown promising
results when applied to the field of speech recognition [142] [143]. In recent years, it has also been
increasingly used in social and behavioral sciences to measure synchrony in human interactions
[77] [78]. DTW is particularly useful in cases where the time series being compared are not
perfectly aligned, have different lengths, or have variations in speed or duration. In our case, DTW
allowed us to compare the facial factors and emotional expressivity scores of the two partners in a
dyad, even if they varied in time or speed. This was important because the facial expressions and
emotional expressivity of each partner could differ in terms of their onset (how fast they react to
the task result or to their partner’s facial expressions), duration, and intensity, making it difficult to
directly compare them without aligning them first.
DTW works by finding the optimal alignment between two time series based on their simi-
larity. The algorithm achieves this by warping one time series to match the other, stretching or
compressing it as necessary. DTW accomplishes this by constructing a matrix that shows the dis-
tance between each pair of time points in the two time series. This matrix is then used to find the
optimal path, or warping path, between the two time series that minimizes the distance (alignment
costs) between them. Depending on the nature of the two sequences, we can also apply restrictions
and rules on how each element or frame of the sequences can be matched. We selected specific
parameters that are grounded in our understanding of the interaction and informed by previous re-
search [78] [79] [83] [77]. We used Euclidean distance to calculate the local distance (uni-variate
for emotional expressivity and multi-variate for facial factors). For global constraints, we used
the Sakoe-Chiba window with a band of 4 seconds [80]. Our rationale for the window size is that a
reaction time exceeding 4 seconds to a stimulus (such as an outcome or a partner’s facial expres-
sion) should be interpreted as an independent expression rather than a response to the stimulus.
For local constraints, we chose the Sakoe-Chiba symmetricP1 step pattern [80] [82]. This step pattern restricts the transition to three possible options:
1. (i-1, j-1) → (i, j)
2. (i-2, j-1) → (i, j)
3. (i-1, j-2) → (i, j)
Option (1) allows the path to move diagonally with a 1:1 ratio, meaning that frames in the two sequences are aligned one to one, without gaps and only once. Options (2) and (3) allow the path to take a 2:1 or 1:2 ratio in the horizontal or vertical direction, meaning that one frame can be matched to at most two frames in the other sequence. This constraint offers more flexibility in the warping path than that used in [77], allowing sequences with varying local speeds to be matched, which we think is crucial for facial expression signals given that people exhibit different expression dynamics [144].
Finally, to account for variations in the lengths of our sequences, we normalize the alignment
cost by dividing the total cost by the total length of the sequences.
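The sketch below is a minimal, from-scratch version of this computation: a banded DTW with Euclidean local distance, a Sakoe-Chiba band, the three step options listed above, and length-normalized cost. It simplifies the weighting used by the dtw package's symmetricP1 pattern, so it illustrates the procedure rather than reproducing the exact implementation; a 4-second band corresponds to 24 frames for expressivity (6 fps) and 120 frames for facial factors (30 fps).

    import numpy as np

    def banded_dtw_cost(x: np.ndarray, y: np.ndarray, band: int) -> float:
        """Normalized DTW alignment cost between two sequences of equal length
        (shape: n_frames x n_signals), using a Sakoe-Chiba band of `band` frames,
        Euclidean local distance, and steps (i-1,j-1), (i-2,j-1), (i-1,j-2).
        Higher cost means lower synchrony."""
        n, m = len(x), len(y)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            lo, hi = max(1, i - band), min(m, i + band)
            for j in range(lo, hi + 1):
                d = np.linalg.norm(x[i - 1] - y[j - 1])   # Euclidean local distance
                best = D[i - 1, j - 1]                    # 1:1 diagonal step
                if i >= 2:
                    best = min(best, D[i - 2, j - 1])     # 2:1 step
                if j >= 2:
                    best = min(best, D[i - 1, j - 2])     # 1:2 step
                D[i, j] = d + best
        return float(D[n, m] / (n + m))                   # normalize by total length

    # Hypothetical usage: univariate expressivity at 6 fps with a 4-second band
    # cost = banded_dtw_cost(expr_p1.reshape(-1, 1), expr_p2.reshape(-1, 1), band=24)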
Figure 6.2: Frame samples of dyads' reactions after learning a joint outcome. (a) shows the video condition, where the dyad can see each other in real time through a webcam. (b) shows the still condition, where the dyad can only see a still image of their partner. We see synchrony in both conditions, as supported by the plots of expressivity scores.
6.1.2 Results
6.1.2.1 Synchrony of Expressivity
We examine the synchrony of expressivity by aligning the dyad’s frame-by-frame expressivity
score with DTW. Then the normalized cost of alignment is used to represent reversed synchrony,
i.e., the higher the cost, the lower the synchrony.
Validity analysis. First, we examine whether any interpretation of synchrony of
expressivity is valid in real interaction. To achieve this, we create a randomly paired dyad for each
original dyad by substituting one player with a segment from a participant who never interacted
with either member of the dyad. We also follow the same pipeline to pre-process the segment from
the randomly selected player. Lastly, we assess synchrony in both still and video conditions for the
original dyads and compare it with that of the randomly paired dyads.
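A sketch of how such control pairings could be constructed is shown below; the data structures and names are assumptions for illustration, not the actual code used.

    import random

    def make_random_dyads(real_dyads, segments, seed=0):
        """real_dyads: list of (id1, id2) pairs of participants who actually interacted.
        segments: dict mapping participant id -> pre-processed reaction segment.
        For each real dyad, player 2 is replaced with the segment of a participant
        who interacted with neither member of the original dyad."""
        rng = random.Random(seed)
        partners = {}
        for a, b in real_dyads:
            partners.setdefault(a, set()).add(b)
            partners.setdefault(b, set()).add(a)
        sham_dyads = []
        for a, b in real_dyads:
            excluded = {a, b} | partners[a] | partners[b]
            stranger = rng.choice([p for p in segments if p not in excluded])
            sham_dyads.append((segments[a], segments[stranger]))
        return sham_dyads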
Figure 6.3: Normalized DTW alignment cost (reversed synchrony score) by dyad type. Real dyads display significantly higher synchrony than randomly paired dyads. Real dyads in the video condition displayed significantly higher synchrony than in the still condition. We do not find this pattern in randomly paired dyads, which in turn validates our measure of synchrony.
Figure 6.2 displays frame samples from a dyad in the video condition as well as a dyad in the still condition, along with corresponding plots of their expressivity scores over time. We can clearly see synchrony of expressivity in both conditions. In Figure 6.3, we can see that, irrespective of whether the condition is video or still, the alignment cost for randomly paired dyads is significantly
higher than that for IPD dyads. This indicates that real dyads exhibit a significantly greater degree
of synchrony of expressivity compared to randomly paired dyads. This finding suggests that our measure of synchrony of expressivity is valid, and that dyads display synchrony of expressivity in reacting to the outcomes even when they do not see their partner's facial reactions.
Linear mixed-effects model. We build a linear mixed-effects model with normalized DTW alignment cost (mean-centered) as the response variable, and condition (still or video), outcome (CC, DD or CD/DC) and their interaction term as independent variables. Sum contrasts were used for the interaction term. Dyad ID was added as a random effect.
Alignment Cost = β0 + β1 · Condition + β2 · Outcome + β3 · (Condition × Outcome) + u_DyadID + ε
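The reported F-tests with fractional denominator degrees of freedom suggest the original models were fit with an R mixed-model package such as lme4/lmerTest; the sketch below fits an analogous model in Python with statsmodels purely as an illustration (statsmodels reports Wald z-tests rather than these F-tests). The column names ('cost', 'condition', 'outcome', 'dyad_id') are assumptions.

    import pandas as pd
    import statsmodels.formula.api as smf

    def fit_synchrony_model(df: pd.DataFrame):
        """Mixed model: mean-centered alignment cost ~ condition * outcome,
        with sum (deviation) contrasts and a random intercept per dyad."""
        model = smf.mixedlm(
            "cost ~ C(condition, Sum) * C(outcome, Sum)",
            data=df,
            groups=df["dyad_id"],
        )
        return model.fit()

    # result = fit_synchrony_model(df)
    # print(result.summary())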
There is no main effect of outcome, but there is a main effect of condition, β1=-0.07, F(1, 90.79)=9.22,
p=0.0032. The main effect of condition indicates that dyads in the video condition display more
synchrony of expressivity than the grand mean of all dyads (note when interpreting the coefficient,
the negative coefficient means less alignment cost and higher synchrony). In addition to the main
effect of condition, there is a significant interaction effect between condition and the “CD/DC”
level (coef.=-0.05, p=0.04) and “CC” level (coef.=0.07, p=0.0096) of outcome. The significant
interaction effect, F(2, 412.60)=1.19, indicates that the outcome the dyads are reacting to modifies
the effect of condition on synchrony of expressivity. Specifically, on average, being able to see the partner in real time (video condition) helps the dyads synchronize significantly more in terms of expressivity, but when the outcome is “CC” the dyads do not need to see their partner to be synchronized. On the other hand, when the outcome is “CD/DC”, being able to see the partner in real time helps the dyads synchronize their expressivity significantly more.
We also fit the same linear mixed-effects model for randomly paired dyads to compare with the findings for real dyads. There is no significant main effect of either condition or outcome, and no significant interaction term. This further validates that our findings are a true indication of dyadic synchrony.
Figure 6.4: Normalized DTW alignment cost (reversed synchrony score) by dyad type, condition and outcome. The top row shows that real dyads display higher synchrony of expressivity in the video condition than in the still condition when reacting to the CD/DC outcome, but not when reacting to CC or DD. The bottom row shows that the finding in real dyads is evidence of real synchrony in interaction, since it cannot be replicated in randomly paired dyads.
To visualize the simple effects, in Figure 6.4, the top row shows the synchrony of expressivity
in IPD Dyads, and compares the synchrony of expressivity in the still condition to the video con-
dition by outcome. The bottom row shows the same comparisons but in randomly paired dyads.
First, we notice significant differences in synchrony of expressivity between the still condition and
the video condition in real dyads but not in randomly paired dyads. In randomly paired dyads, synchrony of expressivity is roughly the same across all outcomes and both conditions. IPD Dyads
show significantly higher synchrony in the video condition when the outcome is CD/DC but not
when the outcome is mutual cooperation (CC) or mutual defection (DD). Namely, when the dyads' decisions were not aligned, seeing their partner's facial reaction to the outcome helps them
sync up their expressive facial display. In contrast, if the dyads were already in mutual cooper-
ation or mutual defection, seeing their partner’s facial reaction would not help them show more
synchrony of expressivity.
6.1.2.2 Synchrony of Facial Factors
The synchrony of expressivity tells us a coarse story about how synchronized the dyads are when
reacting to the outcome. Next, we examine the synchrony of facial factors. The multivariate alignment of facial factors tells a more nuanced story about the synchrony within the dyads. Following the same pipeline used to examine the synchrony of expressivity, we align the dyad's frame-by-frame
facial factors with DTW. Then the normalized cost of alignment is used to represent reversed syn-
chrony, i.e., the higher the cost, the lower the synchrony.
Validity analysis. To assess the validity of interpreting synchrony in facial expressions with
facial factors, we first conduct an analysis to determine whether any such interpretation is valid
during actual interactions. Similar to the validity analysis for synchrony of expressivity, we create
randomly paired dyads for each set of interacting individuals. We do this by replacing one indi-
vidual in the pair with a segment from a randomly selected participant who has not interacted with
either member of the original dyad during the task. The same pre-processing steps are applied to
the segment from the randomly selected participant. We then evaluate synchrony levels for the
dyads under both still and video conditions and compare these levels with those obtained from the
randomly paired dyads. As shown in Figure 6.5, we observe higher alignment costs for randomly
paired dyads compared to real dyads, regardless of the condition. This indicates higher synchrony
of facial factors in real interactions (real dyads) compared to simulated interactions (randomly
Figure 6.5: Normalized DTW alignment cost (reversed synchrony score) by dyad type. Real dyads display significantly higher synchrony of facial factors than randomly paired dyads. Real dyads in the video condition displayed significantly higher synchrony of facial factors than in the still condition.
paired dyads). This result confirms the success of our synchrony measure, showing that dyads dis-
play synchrony of facial factors in response to the joint outcome even without seeing their partner's
facial reactions.
Linear mixed-effects model. Following the same structure as the linear mixed-effects model used for the analysis of synchrony of expressivity, we build a linear mixed-effects model with normalized DTW alignment cost (mean-centered) as the response variable, and condition (still or video), outcome (CC, DD or CD/DC) and their interaction term as independent variables. Dyad ID was added as a random effect. There is a main effect of condition, β1=-0.078, F(1, 88.6)=5.72, p=0.02. There is also a main effect of outcome, F(2, 328.06)=4.15, for the “CD/DC” level (coef.=0.098, p=0.02)
and for the level “CC” (coef.=-0.097, p=0.02). Specifically, when reacting to the “CC” outcome,
IPD Dyads exhibit higher-than-average synchrony of facial factors, whereas when reacting to the
“CD/DC” outcome, they display lower-than-average synchrony of facial factors. No interaction effect is significant, but a trending negative interaction effect was observed between condition and the “CD/DC” level of outcome (coef.=-0.09, p=0.067). This effect mirrors the significant interaction effect we found for the synchrony of expressivity: dyads display higher synchrony in the video condition in general, and being able to see the partner's facial reaction in real time significantly helped the dyads synchronize their facial expressions when reacting to the “CD/DC” outcome.
6.1.3 Discussion and Conclusion
We investigated dyadic synchrony in the Iterated Prisoner’s Dilemma task using Dynamic Time
Warping (DTW) to align frame-by-frame expressivity scores and facial factors. The normalized
DTW cost measured reversed synchrony. Validity analyses showed genuine dyadic synchrony in
real dyads but not in randomly paired dyads, regardless of video or still condition. This finding
provides support for hypothesis H2.
We also find support for our hypothesis H1. Linear mixed-effects models revealed the impact
of condition (video versus still) and outcome (CC, DD, CD/DC) on synchrony. Our findings re-
veal higher synchrony in both expressivity and facial factors when individuals interact in the video
condition compared to the still condition. However, the two measures of synchrony are influenced
differently by the outcome, reflecting their distinct attributes. Synchrony of expressivity captures
the alignment of overall expressiveness or emotional intensity exhibited by both members of a
dyad. Importantly, this measure is valence-agnostic, meaning it assesses the degree to which in-
dividuals synchronize the level of their expressivity without differentiating between positive and
negative emotional qualities or specific facial expressions. Conversely, synchrony of facial factors
measures the alignment of specific facial components between individuals, providing a more nu-
anced, valence-aware assessment of synchrony. This measure captures the alignment of specific
facial expressions, allowing for a more detailed analysis of dyadic coordination.
Additionally, the evidence also supports hypothesis H2a. The main effect of outcome on syn-
chrony of facial factors suggests that dyads exhibit the highest synchrony when their joint outcome
is mutual cooperation (CC)–a positively aligned outcome. This finding indicates that the positivity
and harmony of mutual cooperation are reflected in the alignment of facial factors. In contrast,
dyads exhibit lower synchrony when the joint decision is misaligned (CD/DC), where the outcome
is positive for one partner but negative for the other. The discrepancy in valence is reflected in the
reduced synchronization of facial factors in this scenario. With the valence-agnostic measure of
expressivity, we show that visual cues from the partner’s facial reactions appear to be especially
important for synchronizing emotional expressivity when the valence of outcome is misaligned.
This finding reinforces the idea that seeing the partner’s facial reaction in real-time provides valu-
able information for coordinating one’s own facial display, enhancing synchrony in emotional
expressivity.
Overall, our findings indicate that dyadic synchrony emerges from a combination of inter-
personal influence and exposure to common external stimuli. Dyads exhibit synchronized facial
expressions in response to a joint outcome, even in the absence of visual contact. Moreover, the op-
portunity to observe a partner’s real-time facial reactions plays a key role in facilitating synchrony,
particularly in situations where dyad members’ decisions are misaligned. Our findings highlight
the complex dynamics of dyadic synchrony, which vary according to the condition under which the
interaction takes place and the outcome of the interaction, and emphasize the insights offered by
both valence-aware and valence-agnostic measures of synchrony. Lastly, our research underscores
the importance for scholars to consider both interpersonal influence and shared external stimuli as
contributing factors to synchrony when examining the dynamics of dyadic interactions.
There are also some limitations in the current work that we hope to address in the future. The
study touches on the controversy of whether facial expressions are true indications of emotion (as
advocated by basic emotion theory [50]) or serve as communicative acts that convey social inten-
tions [145]. Given that people in the still condition could not see each other, they certainly were not
deliberately using their expressions to shape their partner’s behavior. Fridlund [21] proposed the
idea of “implicit sociality” to explain such expressions—that people spontaneously produce com-
municative displays when they know they are engaged in a joint task, even if their expressions are
not seen. We did not specifically consider the underlying motivations for facial displays. In future
work, it will be interesting to examine differences in facial displays across these conditions, includ-
ing whether people engage in less impression management. Understanding the interplay between
emotional expression, emotional regulation, and emotional mimicry could provide deeper insights
into the dynamics of social interactions and the factors that shape interpersonal communication
[146].
Another limitation of our study is that we did not fully account for the iterative nature of the
Iterated Prisoner’s Dilemma (IPD) task. In our analysis, we treated each round of the IPD as an
independent task, examining the dynamics between dyads based solely on the interactions within
that specific round. This approach may overlook the potential influence of prior interactions and
decisions on subsequent rounds. Future research could benefit from examining the impact of the
iterative nature of the IPD task on dyadic synchrony by considering the influence of prior rounds
on subsequent interactions. Analyzing the longitudinal patterns of behavior and the temporal evo-
lution of synchrony could provide valuable insights into the complex interplay of cooperation,
competition, and social dynamics in repeated social interactions.
Chapter 7
Conclusions and Future Work
This dissertation has made substantial contributions to affective computing by innovating auto-
matic facial analysis methods and using them to yield fundamental insights into the source and
function of facial expressions in face-to-face social interaction. The appraisal and interpersonal-
dynamics perspectives both emphasize the importance of emotional expressivity in interpreting
facial displays and facilitating social coordination in cooperative and competitive contexts. The
appraisal perspective posits that facial expressions are primarily driven by individuals’ appraisal of
events, providing insights into their internal emotional states. On the other hand, the interpersonal-
dynamics perspective asserts that facial expressions are, in a sense, directly caused by the partner’s
expressions due to synchrony, mimicry, or contagion. To fully grasp the nature of emotional ex-
pressions within social contexts, it is imperative to consider both the appraisal perspective and the
interpersonal-dynamics perspective in tandem. By examining these competing perspectives, this
dissertation highlights the multifaceted nature of facial expressions and their roles in shaping social
interactions.
By focusing on emotional expressivity as a construct and using appraisal theory as a frame-
work to capture the meaning of facial expressions, I have provided an alternative representation
of facial displays. I demonstrated that human observers could reliably annotate the expressivity of
individuals playing the prisoner’s dilemma, and that we could accurately predict these annotations
from visual features. Both the annotators and the learned models highlight the importance of facial
expressions, and facial dynamics in particular, in performing these inferences.
Validating the emotional expressivity model from the appraisal perspective, I use appraisal theory
as a framework to capture the meaning of facial expressions. Rather than inferring subjective
emotional feelings, we examined the relationship between expressions and objective measures
of congruence, norm-compatibility, and the unexpectedness of events. Contrary to the “standard
model”, expressions that are claimed to indicate pleasure showed no association with how much
money an individual could earn from an outcome (i.e., goal-congruence) or the extent to which
the event was fair (i.e., norm-compatibility). The only appraisal variable we could predict from
facial expressions with confidence was the unexpectedness of events. We observed that people
were more expressive when the situation was more unexpected. My findings provide evidence that
facial expressions indicate appraisal (unexpectedness) but also give pause to techniques that adopt
common sense interpretations of facial expressions: e.g., that smiles indicate pleasure. We also
show that our expressivity model is a superior choice to specific facial expressions. These insights
offer valuable contributions to affective computing and the understanding of social interaction
mechanisms.
An essential aspect of understanding emotional expressions in social contexts is the consider-
ation of the interpersonal-dynamics perspective, which emphasizes the mutual influence of emo-
tional displays between social actors. Research on facial mimicry, emotional contagion, and dyadic
synchrony has demonstrated the significance of interpersonal dynamics in shaping emotional ex-
pressions and promoting social cohesion, communication, and cooperation. However, traditional
affective computing approaches often interpret facial expressions as manifestations of an individ-
ual’s internal appraisal without fully considering the presence or absence of a social situation.
By validating the emotional expressivity model from an interpersonal-dynamics perspective, I ex-
amine synchrony of expressivity in two experimental conditions: a “still” condition where dyads
only see a still image of their partner, and a “video” condition providing real-time visual access to
their partner’s facial reactions. Using Dynamic Time Warping, I assess synchrony of expressivity
in both real and randomly paired dyads. The results reveal that synchrony exists even without
visual cues, suggesting that, echoing the appraisal perspective, shared experiences contribute to
synchrony. Furthermore, observing a partner’s facial reactions in the video condition significantly
enhances synchrony, particularly when dyad members make misaligned decisions. This finding
further validates the utility of my perceived expressivity model and underscores the importance of
interpersonal-dynamics in understanding and interpreting facial expressions in social situations.
However, several limitations need to be addressed. First, the accuracy of our automatic facial
analysis depends on the state-of-the-art software used to track facial expressions, and inaccuracies
due to complexities in naturalistic data might have affected the findings. Second, our results are
specific to social decision-making in the iterated prisoner’s dilemma task and might not be general-
izable to other social contexts or tasks. Third, by focusing on objective characteristics of events, we
may have missed subjective goals that could have elicited expressions. Fourth, our analysis treated
each round of the iterated prisoner’s dilemma as an independent task, potentially overlooking the
influence of prior interactions and decisions on subsequent rounds.
Future research directions could address these limitations and further explore the complexities
of social interactions. For instance, researchers can apply the insights gained from this study to a
broader range of social contexts and interactions. Our results are also specific to social decisions with relatively low stakes (the opportunity to win $100 USD). These results need to be extended
to other situations (e.g., non-social) and other, non-economic, sources of reward (e.g., [122]).
Additionally, incorporating other nonverbal cues, such as body language, vocal intonation, and
eye gaze, would provide a more comprehensive understanding of social interactions. Moreover,
investigating the influence of individual differences, cultural factors, and situational variables on
the role of facial expressions in social interactions could provide a richer understanding of the
factors that modulate the function and interpretation of facial expressions in diverse social settings.
Future work should also consider the underlying motivations for facial displays, examining dif-
ferences in facial displays across different conditions and the roles of emotional expression, emo-
tional regulation, and emotional mimicry in shaping interpersonal communication. Furthermore,
the iterative nature of the iterated prisoner’s dilemma task should be taken into account, examining
the impact of prior rounds on subsequent interactions and the longitudinal patterns of behavior and
temporal evolution of synchrony. The current analysis focused primarily on memory-1 strategies
to characterize events (though an exploratory fine-grained analysis suggests the findings will ex-
tend to richer representations, given sufficient data). Nor did we consider how prior or concurrent
expressions of the partner shaped expressions (rather, these were treated as random noise). This
limits our ability to analyze the iterative and interactive nature of the game. Future work could consider other representations; for example, the full history of decisions leading up to a transition might yield further insights. Such an analysis could offer valuable insights into the complex interplay of cooperation,
competition, and social dynamics in repeated social interactions.
In summary, this dissertation has provided valuable insights into the role of facial expressions in
social interactions and has contributed to the fields of affective computing and the understanding
of social interaction mechanisms. By acknowledging potential limitations and outlining future
research directions, this work paves the way for further exploration of the complexities of social
interactions and the development of more advanced affective computing systems that enhance our
ability to interpret and respond to emotional expressions, ultimately improving human-machine
interfaces, helping diagnose social disorders, and facilitating social science research.
References
1. Ekman, P. An argument for basic emotions. Cognition & emotion 6, 169–200 (1992).
2. Moors, A. Flavors of appraisal theories of emotion. Emotion Review 6, 303–307 (2014).
3. De Melo, C. M., Carnevale, P. J., Read, S. J. & Gratch, J. Reading people’s minds from
emotion expressions in interdependent decision making. Journal of personality and social
psychology 106, 73 (2014).
4. Van Kleef, G. A. How emotions regulate social life: The emotions as social information
(EASI) model. Current directions in psychological science 18, 184–188 (2009).
5. Vinkemeier, D., Valstar, M. & Gratch, J. Predicting folds in poker using action unit detec-
tors and decision trees in 2018 13th IEEE International Conference on Automatic Face &
Gesture Recognition (FG 2018) (2018), 504–511.
6. Hoegen, R., Stratou, G. & Gratch, J. Incorporating emotion perception into opponent mod-
eling for social dilemmas in Proceedings of the 16th Conference on Autonomous Agents and
MultiAgent Systems (2017), 801–809.
7. Mussel, P., Göritz, A. S. & Hewig, J. The value of a smile: Facial expression affects ultimatum-
game responses. Judgment & Decision Making 8 (2013).
8. Barrett, L. F., Adolphs, R., Marsella, S., Martinez, A. M. & Pollak, S. D. Emotional expres-
sions reconsidered: challenges to inferring emotion from human facial movements. Psycho-
logical Science in the Public Interest 20, 1–68 (2019).
9. Jack, R. E., Garrod, O. G., Yu, H., Caldara, R. & Schyns, P. G. Facial expressions of emotion
are not culturally universal. Proceedings of the National Academy of Sciences 109 (2012).
10. Du, S., Tao, Y. & Martinez, A. M. Compound facial expressions of emotion. Proceedings
of the National Academy of Sciences 111, E1454–E1462 (2014).
11. Fridlund, A. J. The new ethology of human facial expressions. The psychology of facial
expression 103 (1997).
12. Scarantino, A. How to do things with emotional expressions: The theory of affective prag-
matics. Psychological Inquiry 28, 165–185 (2017).
13. Reisenzein, R., Bördgen, S., Holtbernd, T. & Matz, D. Evidence for strong dissociation
between emotion and facial displays: The case of surprise. Journal of personality and social
psychology 91, 295 (2006).
14. Reisenzein, R., Studtmann, M. & Horstmann, G. Coherence between emotion and facial
expression: Evidence from laboratory experiments. Emotion Review 5, 16–23 (2013).
15. Ambadar, Z., Cohn, J. F. & Reed, L. I. All smiles are not created equal: Morphology and
timing of smiles perceived as amused, polite, and embarrassed/nervous. Journal of nonver-
bal behavior 33, 17–34 (2009).
16. Niedenthal, P. M., Mermillod, M., Maringer, M. & Hess, U. The Simulation of Smiles
(SIMS) model: Embodied simulation and the meaning of facial expression. Behavioral and
brain sciences 33, 417–433 (2010).
17. Hoque, M. & Picard, R. W. Acted vs. natural frustration and delight: Many people smile in
natural frustration in Face and Gesture 2011 (2011), 354–359.
18. Lei, S. & Gratch, J. Smiles Signal Surprise in a Social Dilemma in 2019 8th International
Conference on Affective Computing and Intelligent Interaction (ACII) (2019), 627–633.
19. Gross, J. J. & Muñoz, R. F. Emotion regulation and mental health. Clinical psychology:
Science and practice 2, 151–164 (1995).
20. Zaki, J. & Williams, W. C. Interpersonal emotion regulation. Emotion 13, 803 (2013).
21. Fridlund, A. J. Sociality of solitary smiling: Potentiation by an implicit audience. Journal
of personality and social psychology 60, 229 (1991).
22. Barrett, L. F. The theory of constructed emotion: an active inference account of interocep-
tion and categorization. Social cognitive and affective neuroscience 12, 1–23 (2017).
23. Burgoon, J. K. & Le Poire, B. A. Nonverbal cues and interpersonal judgments: Participant
and observer perceptions of intimacy, dominance, composure, and formality. Communica-
tions Monographs 66, 105–124 (1999).
24. Bernieri, F. J., Gillis, J. S., Davis, J. M. & Grahe, J. E. Dyad rapport and the accuracy of
its judgment across situations: A lens model analysis. Journal of Personality and Social
Psychology 71, 110 (1996).
25. Kring, A. M., Smith, D. A. & Neale, J. M. Individual differences in dispositional expres-
siveness: development and validation of the Emotional Expressivity Scale. Journal of per-
sonality and social psychology 66, 934 (1994).
26. Boone, R. T. & Buck, R. Emotional expressivity and trustworthiness: The role of nonver-
bal behavior in the evolution of cooperation. Journal of Nonverbal Behavior 27, 163–182
(2003).
27. Lei, S., Stefanov, K. & Gratch, J. Emotion or expressivity? an automated analysis of non-
verbal perception in a social dilemma in 2020 15th IEEE International Conference on Au-
tomatic Face and Gesture Recognition (FG 2020)(FG) (2020), 770–777.
28. Lin, V., Girard, J. M., Sayette, M. A. & Morency, L.-P. Toward Multimodal Modeling of
Emotional Expressiveness in Proceedings of the 2020 International Conference on Multi-
modal Interaction (2020), 548–557.
29. Zloteanu, M. & Krumhuber, E. G. Expression authenticity: The role of genuine and delib-
erate displays in emotion perception. Frontiers in Psychology 11, 611248 (2021).
30. Scherer, K. R., Mortillaro, M., Rotondi, I., Sergi, I. & Trznadel, S. Appraisal-driven facial
actions as building blocks for emotion inference. Journal of personality and social psychol-
ogy 114, 358 (2018).
31. Marsella, S. C. & Gratch, J. EMA: A process model of appraisal dynamics. Cognitive Sys-
tems Research 10, 70–90 (2009).
32. Tekoppele, J. L., De Hooge, I. E. & van Trijp, H. C. We’ve got a situation here!–How
situation-perception dimensions and appraisal dimensions of emotion overlap. Personality
and Individual Differences 200, 111878 (2023).
33. Shore, D. M. & Parkinson, B. Interpersonal effects of strategic and spontaneous guilt com-
munication in trust games. Cognition and Emotion 32, 1382–1390 (2018).
34. Rychlowska, M., van der Schalk, J., Gratch, J., Breitinger, E. & Manstead, A. S. Beyond
actions: Reparatory effects of regret in intergroup trust games. Journal of Experimental
Social Psychology 82, 74–84 (2019).
35. Van Kleef, G. A., De Dreu, C. K. & Manstead, A. S. The interpersonal effects of anger and
happiness in negotiations. Journal of personality and social psychology 86, 57 (2004).
36. Fehr, E. & Schmidt, K. M. A theory of fairness, competition, and cooperation. The quarterly
journal of economics 114, 817–868 (1999).
37. Solomon, R. C. The philosophy of emotions. M. Lewis & Haviland, The Handbook of emo-
tions, 3 (1993).
38. Dukes, D., Abrams, K., Adolphs, R., Ahmed, M. E., Beatty, A., Berridge, K. C., et al. The
rise of affectivism. Nature human behaviour 5, 816–820.
39. Fredrickson, B. L. & Joiner, T. Positive emotions trigger upward spirals toward emotional
well-being. Psychological science 13, 172–175 (2002).
40. Lyubomirsky, S., King, L. & Diener, E. The benefits of frequent positive affect: Does hap-
piness lead to success? Psychological bulletin 131, 803 (2005).
41. Kappas, A. Emotion and regulation are one! Emotion Review 3, 17–25 (2011).
42. Keltner, D. & Haidt, J. Social functions of emotions at four levels of analysis. Cognition &
Emotion 13, 505–521 (1999).
43. Prinz, J. The emotional basis of moral judgments. Philosophical explorations 9, 29–43
(2006).
44. Damasio, A. R. Descartes’ error (Random House, 2006).
45. Lerner, J. S., Li, Y., Valdesolo, P. & Kassam, K. S. Emotion and decision making. Annual
review of psychology 66, 799–823 (2015).
46. Adolphs, R., Mlodinow, L. & Barrett, L. F. What is an emotion? Current Biology 29, R1060–
R1064 (2019).
47. Adolphs, R. A Science of Emotion without Feelings 2022. http://emotionresearcher.com/a-science-of-emotion-without-feelings/.
48. Adolphs, R. & Andler, D. Investigating emotions as functional states distinct from feelings.
Emotion Review 10, 191–201 (2018).
49. Adolphs, R. How should neuroscience study emotions? By distinguishing emotion states,
concepts, and experiences. Social cognitive and affective neuroscience 12, 24–31 (2017).
50. Ekman, P. et al. Basic emotions. Handbook of cognition and emotion 98, 16 (1999).
51. Keltner, D., Sauter, D., Tracy, J. & Cowen, A. Emotional expression: Advances in basic
emotion theory. Journal of nonverbal behavior 43, 133–160 (2019).
52. McDuff, D., El Kaliouby, R., Kodra, E. & Picard, R. Measuring voter’s candidate preference
based on affective responses to election debates in 2013 Humaine Association Conference
on Affective Computing and Intelligent Interaction (2013), 369–374.
53. Bosch, N., D’mello, S. K., Ocumpaugh, J., Baker, R. S. & Shute, V. Using video to au-
tomatically detect learner affect in computer-enabled classrooms. ACM Transactions on
Interactive Intelligent Systems (TiiS) 6, 1–26 (2016).
54. https://www.retorio.com.
55. Scherer, K. R. Vocal communication of emotion: A review of research paradigms. Speech
communication 40, 227–256 (2003).
56. Shu, L., Xie, J., Yang, M., Li, Z., Li, Z., Liao, D., et al. A review of emotion recognition
using physiological signals. Sensors 18, 2074 (2018).
57. Ji, S., Pan, S., Li, X., Cambria, E., Long, G. & Huang, Z. Suicidal ideation detection: A
review of machine learning methods and applications. IEEE Transactions on Computational
Social Systems 8, 214–226 (2020).
58. Brahnam, S., Nanni, L. & Sexton, R. in Advanced Computational Intelligence Paradigms
in Healthcare–1 225–253 (Springer, 2007).
59. Kosti, R., Alvarez, J. M., Recasens, A. & Lapedriza, A. Context based emotion recognition
using emotic dataset. IEEE transactions on pattern analysis and machine intelligence 42,
2755–2766 (2019).
60. Scherer, K. R. Appraisal theory. (1999).
61. Moors, A., Ellsworth, P. C., Scherer, K. R. & Frijda, N. H. Appraisal theories of emotion:
State of the art and future development. Emotion Review 5, 119–124 (2013).
62. Ambady, N. & Rosenthal, R. Thin slices of expressive behavior as predictors of interper-
sonal consequences: A meta-analysis. Psychological bulletin 111, 256 (1992).
63. Hernandez, J., Liu, Z., Hulten, G., DeBarr, D., Krum, K. & Zhang, Z. Measuring the en-
gagement level of TV viewers in 2013 10th IEEE International Conference and Workshops
on Automatic Face and Gesture Recognition (FG) (2013), 1–7.
64. Hamm, J., Kohler, C. G., Gur, R. C. & Verma, R. Automated facial action coding system
for dynamic analysis of facial expressions in neuropsychiatric disorders. Journal of neuro-
science methods 200, 237–256 (2011).
65. Kring, A. M. & Neale, J. M. Do schizophrenic patients show a disjunctive relationship
among expressive, experiential, and psychophysiological components of emotion? Journal
of abnormal psychology 105, 249 (1996).
66. Girard, J. M., Cohn, J. F., Mahoor, M. H., Mavadati, S. M., Hammal, Z. & Rosenwald, D. P.
Nonverbal social withdrawal in depression: Evidence from manual and automatic analyses.
Image and vision computing 32, 641–647 (2014).
67. Trevisan, D. A., Hoskyn, M. & Birmingham, E. Facial Expression Production in Autism: A
Meta-Analysis. Autism Research 11, 1586–1601 (2018).
68. Georgescu, A. L., Kuzmanovic, B., Roth, D., Bente, G. & Vogeley, K. The use of virtual
characters to assess and train non-verbal communication in high-functioning autism. Fron-
tiers in human neuroscience 8, 807 (2014).
69. Buck, R., Goldman, C., Easton, C. & Smith, N. Social learning and emotional education.
Emotions in psychopathology: Theory and research, 298–314 (1998).
70. Tickle-Degnen, L. The Interpersonal communication rating protocol: A manual for measur-
ing individual expressive behavior. Tufts University (2010).
71. Gross, J. J. & John, O. P. Revealing feelings: facets of emotional expressivity in self-reports,
peer ratings, and behavior. Journal of personality and social psychology 72, 435 (1997).
72. Neubauer, C., Mozgai, S., Chuang, B., Woolley, J. & Scherer, S. Manual and automatic
measures confirm—Intranasal oxytocin increases facial expressivity in 2017 Seventh Inter-
national Conference on Affective Computing and Intelligent Interaction (ACII) (2017), 229–
235.
73. Wu, P., Gonzalez, I., Patsis, G., Jiang, D., Sahli, H., Kerckhofs, E., et al. Objectifying fa-
cial expressivity assessment of Parkinson’s patients: preliminary study. Computational and
mathematical methods in medicine 2014 (2014).
74. Joshi, A., Tickle-Degnen, L., Gunnery, S., Ellis, T. & Betke, M. Predicting active facial
expressivity in people with Parkinson’s disease in Proceedings of the 9th ACM International
Conference on PErvasive Technologies Related to Assistive Environments (2016), 13.
75. Lin, V., Girard, J. M. & Morency, L.-P. Context-Dependent Models for Predicting and Char-
acterizing Facial Expressiveness. arXiv preprint arXiv:1912.04523 (2019).
76. Wood, A., Lipson, J., Zhao, O. & Niedenthal, P. Forms and functions of affective synchrony.
Handbook of embodied psychology: Thinking, feeling, and acting, 381–402 (2021).
77. Zhao, F., Wood, A., Mutlu, B. & Niedenthal, P. Faces synchronize when communication
through spoken language is prevented. Emotion (2022).
78. Kang, O. & Wheatley, T. Pupil dilation patterns reflect the contents of consciousness. Con-
sciousness and Cognition 35, 128–135 (2015).
79. Fujiwara, K., Hoegen, R., Gratch, J. & Dunbar, N. E. Synchrony facilitates altruistic deci-
sion making for non-human avatars. Computers in Human Behavior 128, 107079 (2022).
80. Sakoe, H. & Chiba, S. Dynamic programming algorithm optimization for spoken word
recognition. IEEE transactions on acoustics, speech, and signal processing 26, 43–49 (1978).
81. Rabiner, L. & Juang, B.-H. Fundamentals of speech recognition (Prentice-Hall, Inc., 1993).
82. Giorgino, T. Computing and visualizing dynamic time warping alignments in R: the dtw
package. Journal of statistical Software 31, 1–24 (2009).
83. Kacorri, H. & Huenerfauth, M. Evaluating a dynamic time warping based scoring algorithm
for facial expressions in ASL animations in Proceedings of SLPAT 2015: 6th Workshop on
Speech and Language Processing for Assistive Technologies (2015), 29–35.
84. Van Der Zee, S. The effect of cognitive load on nonverbal mimicry in interview settings.
Unpublished doctoral thesis). Lancaster, UK: Lancaster University (2013).
85. Müller, P., Huang, M. X. & Bulling, A. Detecting low rapport during natural interactions
in small groups from non-verbal behaviour in 23rd International Conference on Intelligent
User Interfaces (2018), 153–164.
86. Chikersal, P., Tomprou, M., Kim, Y. J., Woolley, A. W. & Dabbish, L. Deep structures of
collaboration: Physiological correlates of collective intelligence and group satisfaction in
Proceedings of the 2017 ACM conference on computer supported cooperative work and
social computing (2017), 873–888.
87. Hoegen, R., Stratou, G., Lucas, G. M. & Gratch, J. Comparing behavior towards humans
and virtual humans in a social dilemma in International Conference on Intelligent Virtual
Agents (2015), 452–460.
88. Koo, T. & Li, M. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of Chiropractic Medicine 15, 155–163 (2016).
89. Ramos, J. et al. Using tf-idf to determine word relevance in document queries in Proceedings
of the first instructional conference on machine learning 242 (2003), 133–142.
90. Iglewicz, B. & Hoaglin, D. C. How to detect and handle outliers (Asq Press, 1993).
91. Leys, C., Ley, C., Klein, O., Bernard, P. & Licata, L. Detecting outliers: Do not use standard
deviation around the mean, use absolute deviation around the median. Journal of Experi-
mental Social Psychology 49, 764–766 (2013).
92. Ekman, P. Facial action coding system (1977).
93. Littlewort, G., Whitehill, J., Wu, T., Fasel, I., Frank, M., Movellan, J., et al. The computer
expression recognition toolbox (CERT) in Face and gesture 2011 (2011), 298–305.
94. Stratou, G., Van Der Schalk, J., Hoegen, R. & Gratch, J. Refactoring facial expressions:
An automatic analysis of natural occurring facial expressions in iterative social dilemma in
2017 Seventh International Conference on Affective Computing and Intelligent Interaction
(ACII) (2017), 427–433.
95. Ambadar, Z., Schooler, J. W. & Cohn, J. F. Deciphering the enigmatic face: The importance
of facial dynamics in interpreting subtle facial expressions. Psychological science 16, 403–
410 (2005).
96. Krumhuber, E., Manstead, A. S., Cosker, D., Marshall, D., Rosin, P. L. & Kappas, A. Facial
dynamics as indicators of trustworthiness and cooperative behavior. Emotion 7, 730 (2007).
97. Jeni, L. A., Cohn, J. F. & Kanade, T. Dense 3D face alignment from 2D videos in real-time
in 2015 11th IEEE international conference and workshops on automatic face and gesture
recognition (FG) 1 (2015), 1–8.
98. Mohammad, S., Stefanov, K., Kang, S.-H., Ondras, J. & Gratch, J. Multimodal Analysis
and Estimation of Intimate Self-Disclosure in 2019 International Conference on Multimodal
Interaction (ACM Press, New York, New York, USA, 2019), 59–68.
99. Farnebäck, G. Two-frame motion estimation based on polynomial expansion in Scandina-
vian conference on Image analysis (2003), 363–370.
100. Hammal, Z., Wallace, E. R., Speltz, M. L., Heike, C. L., Birgfeld, C. B. & Cohn, J. F. Dy-
namics of Face and Head Movement in Infants with and without Craniofacial Microsomia:
An Automatic Approach. Plastic and Reconstructive Surgery Global Open 7 (2019).
101. Trémeau, F., Malaspina, D., Duval, F., Corrêa, H., Hager-Budny, M., Coin-Bariou, L., et al.
Facial expressiveness in patients with schizophrenia compared to depressed patients and
nonpatient comparison subjects. American Journal of Psychiatry 162, 92–101 (2005).
102. Tibshirani, R. Regression shrinkage and selection via the lasso. Journal of the Royal Statis-
tical Society: Series B (Methodological) 58, 267–288 (1996).
103. Breiman, L. Random forests. Machine learning 45, 5–32 (2001).
104. Cawley, G. C. & Talbot, N. L. On over-fitting in model selection and subsequent selec-
tion bias in performance evaluation. Journal of Machine Learning Research 11, 2079–2107
(2010).
105. Koppensteiner, M. & Grammer, K. Motion patterns in political speech and their influence
on personality ratings. Journal of Research in Personality 44, 374–379 (2010).
106. Stratou, G., Hoegen, R., Lucas, G. & Gratch, J. Emotional signaling in a social dilemma: An
automatic analysis in 2015 International Conference on Affective Computing and Intelligent
Interaction (ACII) (2015), 180–186.
107. Houlihan, S. D., Kleiman-Weiner, M., Tenenbaum, J. & Saxe, R. A generative model of peo-
ple’s intuitive theory of emotions: inverse planning in rich social games. in CogSci (2018).
108. Nowak, M. & Sigmund, K. A strategy of win-stay, lose-shift that outperforms tit-for-tat in
the Prisoner’s Dilemma game. Nature 364, 56 (1993).
109. Hauert, C. & Schuster, H. G. Effects of increasing the number of players and memory size
in the iterated Prisoner’s Dilemma: a numerical approach. Proceedings of the Royal Society
of London. Series B: Biological Sciences 264, 513–519 (1997).
110. Mellers, B. A., Schwartz, A., Ho, K. & Ritov, I. Decision affect theory: How we feel about
risky options. Psychological Science 8, 423–429 (1997).
111. Busemeyer, J. R. & Townsend, J. T. Decision field theory: a dynamic-cognitive approach to
decision making in an uncertain environment. Psychological review 100, 432 (1993).
112. Scherer, K. R. & Ellgring, H. Are facial expressions of emotion produced by categorical
affect programs or dynamically driven by appraisal? Emotion 7, 113–130 (2007).
113. Kahneman, D. & Tversky, A. in Handbook of the fundamentals of financial decision mak-
ing: Part I 99–127 (World Scientific, 2013).
114. Ortony, A., Clore, G. L. & Collins, A. The cognitive structure of emotions (Cambridge
university press, 1990).
115. Jacobs, E., Broekens, J. & Jonker, C. Joy, distress, hope, and fear in reinforcement learning
in Proceedings of the 2014 international conference on Autonomous agents and multi-agent
systems (2014), 1615–1616.
116. Ellsworth, P. C. & Scherer, K. R. Appraisal processes in emotion. (Oxford University Press,
2003).
117. Shannon, C. E. & Weaver, W. A mathematical theory of communication (University of Illi-
nois Press, Urbana, IL, 1948).
118. Itti, L. & Baldi, P. F. Bayesian surprise attracts human attention in Advances in neural
information processing systems (2005), 547–554.
119. De Melo, C. M. & Gratch, J. People show envy, not guilt, when making decisions with ma-
chines in 2015 International Conference on Affective Computing and Intelligent Interaction
(ACII) (2015), 315–321.
120. Akaike, H. in Selected Papers of Hirotugu Akaike 215–222 (Springer, 1974).
121. Hoegen, R., Gratch, J., Parkinson, B. & Shore, D. Signals of Emotion Regulation in a Social
Dilemma: Detection from Face and Context in 8th International Conference on Affective
Computing & Intelligent Interaction (IEEE, Cambridge, UK, 2019).
122. Dehghani, M., Carnevale, P. J. & Gratch, J. Interpersonal effects of expressed anger and
sorrow in morally charged negotiation. Judgment & Decision Making 9 (2014).
123. Kappas, A. Social regulation of emotion: messy layers. Frontiers in psychology 4, 51.
124. Krumhuber, E. G. & Kappas, A. More what Duchenne smiles do, less what they express.
Perspectives on Psychological Science 17, 1566–1575.
125. McDuff, D., El Kaliouby, R., Cohn, J. F. & Picard, R. W. Predicting ad liking and pur-
chase intent: Large-scale analysis of facial responses to ads. IEEE Transactions on Affective
Computing 6, 223–235 (2014).
126. D’Mello, S., Olney, A., Williams, C. & Hays, P. Gaze tutor: A gaze-reactive intelligent
tutoring system. International Journal of human-computer studies 70, 377–398 (2012).
127. Hess, U., Philippot, P. & Blairy, S. Mimicry: Facts and fiction. The social context of nonver-
bal behavior, 213–241 (1999).
128. Hess, U. & Blairy, S. Facial mimicry and emotional contagion to dynamic emotional facial
expressions and their influence on decoding accuracy. International journal of psychophys-
iology 40, 129–141 (2001).
129. Hatfield, E., Cacioppo, J. T. & Rapson, R. L. Emotional contagion. Current directions in
psychological science 2, 96–100 (1993).
130. Waters, S. F., West, T. V . & Mendes, W. B. Stress contagion: Physiological covariation
between mothers and infants. Psychological science 25, 934–942 (2014).
131. Kinreich, S., Djalovski, A., Kraus, L., Louzoun, Y. & Feldman, R. Brain-to-brain synchrony
during naturalistic social interactions. Scientific reports 7, 17060 (2017).
132. Grafsgaard, J., Duran, N., Randall, A., Tao, C. & D’Mello, S. Generative multimodal models
of nonverbal synchrony in close relationships in 2018 13th IEEE International Conference
on Automatic Face & Gesture Recognition (FG 2018) (2018), 195–202.
133. Bernieri, F. J. Coordinated movement and rapport in teacher-student interactions. Journal
of Nonverbal behavior 12, 120–138 (1988).
134. Ramseyer, F. & Tschacher, W. Nonverbal synchrony in psychotherapy: coordinated body
movement reflects relationship quality and outcome. Journal of consulting and clinical psy-
chology 79, 284 (2011).
135. Wiltermuth, S. S. & Heath, C. Synchrony and cooperation. Psychological science 20, 1–5
(2009).
136. Haataja, E., Malmberg, J. & Järvelä, S. Monitoring in collaborative learning: Co-occurrence
of observed behavior and physiological synchrony explored. Computers in Human Behavior
87, 337–347 (2018).
137. Valdesolo, P., Ouyang, J. & DeSteno, D. The rhythm of joint action: Synchrony promotes
cooperative ability. Journal of experimental social psychology 46, 693–695 (2010).
138. Parkinson, C., Kleinbaum, A. M. & Wheatley, T. Similar neural responses predict friend-
ship. Nature communications 9, 332 (2018).
139. McDuff, D., Girard, J. M. & el Kaliouby, R. Large-scale observational evidence of cross-
cultural differences in facial behavior. Journal of Nonverbal Behavior 41, 1–19 (2017).
140. Lei, S. & Gratch, J. Emotional Expressivity is a Reliable Signal of Surprise. IEEE Transac-
tions on Affective Computing (2023).
141. Baltrusaitis, T., Zadeh, A., Lim, Y . C. & Morency, L.-P. Openface 2.0: Facial behavior
analysis toolkit in 2018 13th IEEE international conference on automatic face & gesture
recognition (FG 2018) (2018), 59–66.
142. Muda, L., Begam, M. & Elamvazuthi, I. Voice recognition algorithms using mel frequency
cepstral coefficient (MFCC) and dynamic time warping (DTW) techniques. arXiv preprint
arXiv:1003.4083 (2010).
143. Müller, M. Dynamic time warping. Information retrieval for music and motion, 69–84
(2007).
144. Schmidt, K. L. & Cohn, J. F. Dynamics of facial expression: Normative characteristics and
individual differences in IEEE International Conference on Multimedia and Expo, 2001.
ICME 2001. (2001), 547–550.
145. Fridlund, A. J. Human facial expression: An evolutionary view (Academic press, 2014).
146. Hess, U. & Fischer, A. Emotional mimicry as social regulation. Personality and social psy-
chology review 17, 142–157 (2013).
Abstract
In this dissertation, I innovate automatic facial analysis methods and use them to yield fundamental insights into the source and function of facial expressions in face-to-face social interaction. Facial expressions play an essential role in shaping human social behavior. The ability to accurately recognize, interpret and respond to emotional expressions is a hallmark of human social intelligence, and automating this ability is a key focus of computer science research. Machines that possess this skill could enhance the capabilities of human-machine interfaces, help diagnose social disorders, improve predictive models of human behavior, or serve as methodological tools in social science research. My dissertation focuses on this last application. Specifically, I examine two competing perspectives on the social meaning of facial expressions and show that automated methods can yield novel insights.
In terms of technical innovation, I develop novel methods to interpret the meaning of facial expressions in terms of facial expressivity. Within computer science, facial expression analysis has been heavily influenced by the “basic emotion theory” which claims that expressions reflect the activation of a small number of discrete emotions (e.g., joy, hope, or fear). Thus, automatic emotion recognition methods seek to classify facial displays into these discrete categories to form insights into how an individual is interpreting a situation and what they will do next. However, more recent psychological findings have largely discredited this theory, highlighting that people show a wide range of idiosyncratic expressions in response to the same event. Motivated by this more recent research, I develop supervised machine learning models to automatically measure perceived expressivity from video data.
In terms of theoretical innovation, I demonstrate how automatic expressivity recognition yields insight into alternative psychological theories on the nature of emotional expressions in social tasks by analyzing a large corpus of people engaged in the iterated prisoner’s dilemma task. This is a canonical task used to test theories of social cognition and the function of facial expressions.
First, I explore the appraisal perspective, which claims that expressions reflect an individual’s appraisal of how actions within a social task relate to their goals. I find that, by analyzing the facial expressions participants produce, a computer can reliably predict how actions in the task impact participants’ appraisals (specifically, whether the action was unexpected). Further, I show that automatic expressivity recognition dramatically improves the accuracy of these predictions over traditional emotion recognition. This lends support to the theory that expressions are, in a sense, directly caused by the social task.
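The comparison above can be sketched as follows: train one classifier on discrete-emotion features and another on expressivity features, and compare how well each predicts whether an event was unexpected. The feature dimensions, the logistic-regression classifier, and the random placeholder data below are illustrative assumptions rather than the study’s actual code or results.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_events = 300

# Placeholder features: per-event emotion-category probabilities vs. a compact
# expressivity summary (e.g., peak, mean, and slope around the event).
X_emotion = rng.random(size=(n_events, 7))
X_expressivity = rng.random(size=(n_events, 3))
# Placeholder labels: 1 if the event violated the participant's expectation.
y = rng.integers(0, 2, size=n_events)

for name, X in [("discrete emotions", X_emotion), ("expressivity", X_expressivity)]:
    acc = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
    print(f"{name}: cross-validated accuracy = {acc:.2f}")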
Second, I explore a contrasting perspective, interpersonal-dynamics theory, which argues that expressions are, in a sense, directly caused by the partner’s expressions. This perspective emphasizes processes such as synchrony, mimicry, and contagion to explain moment-to-moment expressions. The appraisal perspective counters that any observed synchrony simply reflects a shared appraisal of social actions. I use automatic expressivity recognition to contrast these perspectives. Specifically, I analyze synchrony in two experimental conditions: a “still” condition where dyads see only a still image of their partner, and a “video” condition with real-time visual access to their partner’s facial reactions. Using Dynamic Time Warping, I evaluate synchrony in both real and randomly paired dyads. Results reveal that synchrony exists even without visual cues, suggesting that shared appraisals contribute to synchrony, but that synchrony significantly increases when the partner is visible. This suggests that both perspectives must be integrated to best explain facial displays.
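The synchrony analysis can be pictured with the following sketch: compute a Dynamic Time Warping alignment cost for each real dyad’s pair of expressivity time series and compare it with costs from randomly re-paired (“pseudo”) dyads, where a lower cost is read as greater synchrony. The plain-Python DTW, the trace lengths, and the random placeholder signals are assumptions for illustration only.

import numpy as np

def dtw_cost(a, b):
    # Classic dynamic-time-warping alignment cost between two 1-D series.
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            D[i, j] = step + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

rng = np.random.default_rng(2)
# Placeholder expressivity traces for 20 dyads (200 frames per participant).
dyads = [(rng.random(200), rng.random(200)) for _ in range(20)]

# Real pairing: each participant aligned with their actual partner.
real_costs = [dtw_cost(a, b) for a, b in dyads]
# Pseudo pairing: partners shuffled across dyads as a chance baseline.
shuffled = rng.permutation(np.stack([b for _, b in dyads]))
pseudo_costs = [dtw_cost(a, b) for (a, _), b in zip(dyads, shuffled)]

print(f"mean DTW cost, real dyads:   {np.mean(real_costs):.1f}")
print(f"mean DTW cost, pseudo dyads: {np.mean(pseudo_costs):.1f}")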
In conclusion, both appraisal and interpersonal-dynamics perspectives reinforce the significance of emotional expressivity in interpreting facial displays and fostering social coordination in cooperative and competitive contexts. These insights offer valuable contributions to affective computing and the understanding of social interaction mechanisms. I also discuss potential limitations and future research directions for further exploring the complexities of social interactions.
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
Decoding situational perspective: incorporating contextual influences into facial expression perception modeling
The interpersonal effect of emotion in decision-making and social dilemmas
Modeling dyadic synchrony with heterogeneous data: validation in infant-mother and infant-robot interactions
Multimodality, context and continuous dynamics for recognition and analysis of emotional states, and applications in healthcare
Towards social virtual listeners: computational models of human nonverbal behaviors
Emotional appraisal in deep reinforcement learning
Landmark-free 3D face modeling for facial analysis and synthesis
Towards generalizable expression and emotion recognition
Efficiently learning human preferences for proactive robot assistance in assembly tasks
Computational foundations for mixed-motive human-machine dialogue
Computational modeling of human behavior in negotiation and persuasion: the challenges of micro-level behavior annotations and multimodal modeling
Natural language description of emotion
Decoding information about human-agent negotiations from brain patterns
Probabilistic framework for mining knowledge from georeferenced social annotation
Parasocial consensus sampling: modeling human nonverbal behaviors from multiple perspectives
Emotions in engineering: methods for the interpretation of ambiguous emotional content
Understanding the relationship between goals and attention
Eye-trace signatures of clinical populations under natural viewing
Modeling social causality and social judgment in multi-agent interactions
Computational narrative models of character representations to estimate audience perception
Asset Metadata
Creator
Lei, Su (author)
Core Title
Building and validating computational models of emotional expressivity in a natural social task
School
Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Computer Science
Degree Conferral Date
2023-08
Publication Date
08/02/2023
Defense Date
05/04/2023
Publisher
University of Southern California. Libraries (digital)
Tag
affective computing, facial expressions, OAI-PMH Harvest, synchrony
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Gratch, Jonathan (committee chair), Itti, Laurent (committee member), Narayanan, Shri (committee member)
Creator Email
slei@ict.usc.edu,sulei@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-oUC113291867
Unique identifier
UC113291867
Identifier
etd-LeiSu-12180.pdf (filename)
Legacy Identifier
etd-LeiSu-12180
Document Type
Dissertation
Rights
Lei, Su
Internet Media Type
application/pdf
Type
texts
Source
20230802-usctheses-batch-1077 (batch), University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright.
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus, Los Angeles, California 90089, USA
Repository Email
cisadmin@lib.usc.edu