Decoding Situational Perspective:
Incorporating Contextual Influences into Facial Expression Perception Modeling
by
Jessie Hoegen
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)
August 2024
Copyright 2024 Jessie Hoegen
Dedication
To Laska.
Acknowledgements
The road to finishing this dissertation has been long and winding, and I am so grateful to everyone who
has supported me throughout these years.
Jon, thank you so much for your guidance and patience with me. Your support has been invaluable.
I want to extend my gratitude to the members of my dissertation committee, Emilio and Giorgio for
their valuable guidance and insight on this manuscript. As well as, Mohammad, Shri and Sven, for providing generous feedback on my thesis proposal.
Throughout my studies I have had the fortune of working alongside many mentors and co-authors who
have significantly influenced my work: Alan, Bin, Brian, Daniel, Danielle, David, Deepali, Gale, Giota, Job,
Kalin, Ken, Mary, Norah, and Panagiotis.
This thesis would not have been possible without the support of my ICT fellows: My lab sis Su, Cathy,
Eli, and Mathieu. We may have all been spread out across different corners of the earth, but I will never
forget the roadhouse.
To my parents and my sister, thank you for always believing in me.
Finally, I want to thank all the friends who have stood by me throughout these years: Aike, Alexandra,
Ali, Andrea, Ashley, Candie, Caroline, Chloe, Chris, Emmanuel, Erik, Gaellann, Hannes, Jack, James, Jessie,
Joel, Johnathan, Jonah, Jonathan, Jun, Karan, Koki, Leili, Lina, Lisa, Lisanne, Melissa, Mnirnal, Ramesh,
Sanne, Sarah, Silu, Silvia, Young, and Zahra.
Table of Contents
Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Chapter 1: Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Chapter 2: Literature review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1 Emotion as Communication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Importance of Context . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.3 Emotion in Social Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Chapter 3: How Context Shapes the Perception of Expressions . . . . . . . . . . . . . . . . . . . . 12
3.1 Iterated Prisoner’s Dilemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.1.1 The Split or Steal Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.2 Video-cued Recall Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.3 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.3.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.3.2 Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.3.2.1 Questionnaires . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.3.2.2 Behavioral Data and Context . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.3.2.3 Video-Cued Annotations . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.3.2.4 Automatic Facial Expression Annotations . . . . . . . . . . . . . . . . . . 22
3.3.3 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.4.1 Impact of the Face Alone . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.4.2 Impact of the Context Alone . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.4.3 Face and Context Combined . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Chapter 4: The influence of Context on the Illocutionary Function of Expressions . . . . . . . . . . 32
4.1 The DyNego Negotiation Corpus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.1.1 Negotiation Studies Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.1.2 General Study Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.1.3 Negotiation structure and data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.2 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.2.1 Audio Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.2.1.1 Automatic Annotation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.2.1.2 Event Parsing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.2.2 Visual Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.3 Analysis and Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.3.1 Expressive events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.3.2 Offer Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.4 Modeling the illocutionary function of expressions . . . . . . . . . . . . . . . . . . . . . . 53
4.4.1 Predicting Expressive Intent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.4.2 Feature importance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
Chapter 5: The influence of Context on the Perlocutionary Function of Expressions . . . . . . . . . 58
5.1 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.1.1 Split or Steal Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.1.2 Automatic Behavior Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.2.1 Perlocutionary Act . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.2.2 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.2.3 Model Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.2.4 Model Decision Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.3 Social Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
Chapter 6: Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
List of Tables
3.1 A traditional payoff matrix of an iterated prisoner’s dilemma . . . . . . . . . . . . . . . . . 15
3.2 Payoffs for each combination of choices in the iterated prisoner’s dilemma game . . . . . . 19
4.1 Distributive condition payoff . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.2 Integrative condition payoff . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.3 Participant verbal response time (seconds) to agent speech . . . . . . . . . . . . . . . . . . 44
4.4 Classification results of majority vote and Random Forest Classifier (RFC) on 3-class
(accept, reject, counter) problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.1 Round payoffs in the iterated prisoner’s dilemma . . . . . . . . . . . . . . . . . . . . . . . 62
5.2 Overview of extracted features used for modeling . . . . . . . . . . . . . . . . . . . . . . . 63
5.3 Performance of baselines and different models . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.4 F1-score performance of state based models . . . . . . . . . . . . . . . . . . . . . . . . . . 70
5.5 Examples of optimal strategies when using a neutral expression and manipulating certain
decisions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
List of Figures
2.1 Analyzing expressions as communicative acts. . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.1 A representation of the decision making process in a prisoner’s dilemma. . . . . . . . . . . 15
3.2 The game interface. Participants make choices by selecting either the “split” or “steal”
option each round. They can also see the number of tickets won by themselves and by the
other player as well as the other player’s webcam video (top right) and their own webcam
video (bottom right). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.3 Lens model of the correlations between the facial factors and participant ratings on
valence. Green lines show significant correlations. Dashed lines show factors where I
found significant interactions between expression and context. . . . . . . . . . . . . . . . . 25
3.4 Difference in valence per condition, showing display on the top and perception below.
Valence ratings were collected using a continuous valence scale from -50 (negative
valence) to +50 (positive valence) and averaged per round outcome. . . . . . . . . . . . . . 27
3.5 The regression coefficients for smile and frown based on gamestate for perceived valence. 29
4.1 The female and male versions of the VH used in the studies. . . . . . . . . . . . . . . . . . 36
4.2 A sample of expressive participants as captured by the webcam during the study. From
left to right: Cluster 2-smile, Cluster 3-brow raise, Cluster 4-mouth open, Cluster 5-lip
tighten. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.3 Total occurrences of all agent acts. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.4 Heat map of the distribution of agent acts throughout negotiations. The x-axis represents
time throughout negotiation, with the start of the negotiation on the left. With the color
representing the ratio of acts occurring at a specific time (Dark red representing close to
all speech acts that have occurred, Darker blue representing a low rate). . . . . . . . . . . . 42
4.5 Heat map of participant expressivity measure in reaction to each act over time. The x-axis
represents time throughout negotiation, with the start of the negotiation on the left.
With the color representing the expressivity (with darker hues of blue representing low
expressivity, lighter blue and red as high expressivity). The colored area represents the
standard error of the mean. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.6 Clustered expressivity by act. The Y-axis represents the ratio of each expression for the
speech act. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.7 Expressive reactions to various speech acts by the VH. . . . . . . . . . . . . . . . . . . . . 49
4.8 Participant mean expressivity towards different agent offers. Error bars represent standard
error. The X-axis is the value of the offer for the agent, i.e. far right agent receives 66% of
the items, value in parentheses are the total number of offers (offers values with less than
20 total occurrences were not considered). The Y-axis shows the participant’s expressivity.
This plot shows the curvilinear nature of expressivity towards offers based on values.
Regression analysis showed there was a significant curvilinear effect (p=0.002). . . . . . . 50
4.9 Participant response to offers by agent value with linear regression. Similar to Figure 4.8
the X-axis represents the offer value to the agent with the total number of occurrences
of an offer in parentheses (total number of offers are different compared to Figure 4.8
as offers with no direct participant response are not considered).The Y-axis shows the
percent of responses. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.10 Participant expressivity over time, averaging over data depending on whether the
participant accepts, rejects or presents a counter offer. Data is anchored around end of VH
utterance, showing the average expressivity 1 second before end of speech until 5 seconds
after. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.11 Feature importance of the random forest classifier. . . . . . . . . . . . . . . . . . . . . . . . 56
5.1 Probability distribution of AU12 used for the “player being exploited” (CD) and “player
exploits” (DC) models. A circle notes the AU12 evidence value where max{P(C)−P(D)}
is observed. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Abstract
The work that I performed during my doctoral studies relates to facial expressions within the field of Affective Computing, where facial expressions are a popular subject of study. State-of-the-art facial
expression models and emotion perception models have improved by leaps and bounds and nowadays match the accuracy of professionally trained expression coders. Because of these improvements,
one might expect that automatic emotion recognition has become an indispensable tool for analyzing and
predicting human social behavior. Indeed, psychological theories argue that emotional expressions serve
crucial social functions such as revealing intentions and shaping partner behavior. Yet these theoretical
benefits have largely failed to materialize within the field of affective computing.
There is now a growing understanding that one of the obstacles to the advancement of affective computing is how the concept of emotion is typically represented within the field. Influenced by early
theories from psychology, expressions are often treated as universal and context-independent signifiers of
an underlying emotional state. This latent state is then assumed to shape subsequent human behavior. Yet
more recent psychological theories argue that expressions should be seen more like words and function
to coordinate social behavior. My dissertation embraces this latter view and explores its consequences for
affective computing.
Following recent “pragmatic” theories of emotional expressions, I adopt the perspective that expressions in social settings are best treated like words. Like words, the meaning of expressions must
be seen as context dependent. Just as “bank” might refer to the side of a river or a financial institution, a
smile might refer to pleasure or anger depending on the surrounding context. And like words, expressions
can be examined from multiple perspectives. We can consider the “author’s” perspective (why did this
person produce this expression? what was their intent? what does it signal they will do next?) but also the
“reader’s” perspective (how does this expression shape the observer’s emotions, intentions and actions?).
I illustrate the utility of this perspective for analyzing human social behavior. Focusing on a series of
social tasks such as social dilemmas and negotiations, I show how the interpretation of facial expressions
is shaped by context, and that expressions, when combined with context, can usefully predict the author’s
intentions and consequences for the reader. Together, this body of research makes several important contributions. First, I add to the growing body of research that questions the utility of context-free methods
for automatically recognizing emotional expressions. Second, from the perspective of the author, I show
how both expressions and context are necessary for predicting an author’s subsequent actions in a face-to-face negotiation from their expressions. Thirdly, from the perspective of the reader, I show how emotional
expressions shape the reader’s actions in a social dilemma. Finally, I show how these models could inform
the behavior of interactive synthetic agents, for example allowing them to strategically select emotional
expressions that will benefit a team task. More broadly, my dissertation illustrates the potential benefits of
incorporating a pragmatic perspective on the meaning of emotional expressions into the field of affective
computing.
Chapter 1
Introduction
Facial expression modeling has been a popular topic of study since the inception of the field of Affective
Computing. Current state-of-the-art expression perception models have improved greatly, especially with
the advent of deep learning and convolutional neural networks in the domain of computer vision [4].
Nowadays, many of these models have reached parity in performance with professionally trained
facial expression annotators (FACS coders). As such, facial expression perception has outgrown affective
computing and become ubiquitous within the technological world as a whole. These days, facial perception
models are being used by most major social media apps such as TikTok, Instagram and Snapchat.
While recognizing facial expressions can be useful in and of itself (e.g., for driving the behavior of a digital
avatar), the hope is that expressions can reveal something more. For example, Affectiva, a company that
creates facial expression detection software, states that their software can “detect complex and nuanced
human cognitive and emotional states” [1]. As psychological theories argue that emotional expressions
serve important social functions that can be used to reveal intentions and shape partner behavior, we might
expect that these models have become indispensable tools for understanding human social behavior. However, these theoretical benefits have, at the time of writing, not materialized within affective computing,
and past claims of success have come under intense scrutiny.
A possible reason for this is that there is a misconception in how the concept of emotion is represented within the field. Affective computing was initially influenced by early theories from psychology
(e.g. discrete emotion theory), which often treated expressions as universal and context-independent signifiers of an underlying emotional state. This latent state is then assumed to shape subsequent human
behavior. However, there are alternatives to this view that are gaining scientific acceptance. Barrett and
colleagues [5] argue that it is not possible to fully determine emotional states using just facial expressions.
They argue that the same expressions can mean different things and that there is no unique mapping between a configuration of facial movements and instances of an emotion category. Furthermore, they argue
that taking context and culture into account is necessary to make accurate judgments about a person’s
emotional state, and that these factors have not been sufficiently documented so far.
There is now a substantial body of psychological evidence that human interpretation of emotional
expressions is context-specific [92]. Studies show how the interpretation of an expression can change
dramatically depending on the situation in which it is produced. For example, an expression that without
context might be considered anger, could flip to joy after learning that this person has just won a tennis
match [6]. Or learning that someone is sexually aroused can flip the interpretation of an expression of
pain to one of pleasure [23]. These examples show that context is an important factor to take into account
when modeling emotions; however, most emotion and expression perception models in affective computing
continue to rely on data annotations that are often divorced from the original context.
To address this context-sensitivity, more recent psychological theories argue that expressions should
be seen more like words and function to coordinate social behavior and as such are dependent on context
in order to determine their meaning. These theories often follow the concept of “pragmatics” as it is
understood within linguistics [50], which is the study of how “context” contributes to the meaning of speech
acts, conversations and other types of communication. Take for example the word “bow.” This might refer
to the weapon that fires arrows, or to a decorative ribbon, or even to the tool that a violinist uses to play
their instrument. When looking at this word in isolation, context-independently, it is impossible to say
which of these definitions is the one that is intended. However, when looking at the word within the
surrounding context, understanding the meaning becomes possible, e.g. “I tied my hair back in a bow”, or
“I aimed my bow at the target”. Similar to words, expressions can be examined within different contexts.
An expression of someone shouting with their eyes closed could mean they are in pain, but it might also
be someone celebrating a hard-fought victory.
By looking at facial expressions in a similar way to how linguists look at speech acts,
we can furthermore examine these expressions from different perspectives. In my dissertation, I will be
following Scarantino’s “Theory of Affective Pragmatics” [76]. Within this theory facial expressions have
an illocutionary and a perlocutionary function, similar to how text can be studied to understand the author’s
intention (illocutionary act) or the effect of the text on those who read it (perlocution). When applying this
to facial expressions, the person displaying the expression can be seen as the author, which allows us to
reason about their perspective (why did this person produce this expression? what was their intent? what
does it signal they will do next?). We can also look at a person observing the expression, similar to the
reader of an author’s text, and consider their perspective (how does this expression shape the observer’s emotions,
intentions and actions?).
Adopting this perspective furthermore gives me a framework to examine the mechanisms by which observers interpret others’ expressions in order to make judgments. Do these judgments correspond to the
emotions reported as felt by the person displaying the expression? Might judgments also differ based on
whether the person is actively participating in the interaction as a second party or observing an interaction from afar as a third party? In this sense, a contribution of my thesis is to bring computational methods
to recent psychological research struggling with these questions. For example, recent work in psychology
has claimed that people are often quite accurate at inferring their partner’s emotions within an unfolding
interaction [48] and that there is a fundamental difference in how people interpret events when they are
involved in the interaction compared to if they were mere observers [77]. I will reinforce this view with
automatic analysis of facial expressions in natural interactions. As such, my dissertation is a departure
from much of the focus in affective computing that relies on third-party judgments of professional annotators divorced from the original interaction (and often shown the face alone without any context). I argue
that the field of affective computing can only adequately address these questions by explicitly incorporating point
of view into its models.
The thesis is organized as follows. First, I provide an overview of the current work within this domain
in Chapter 2. This chapter reviews psychological research into the use of emotion as a communicative tool and current
research on the influence of context on the interpretation of emotion displays, and motivates the experimental tasks used in this thesis. Following this, the next three chapters focus on a series of social tasks
such as social dilemmas and negotiations, where I show how the interpretation of facial expressions is
shaped by context, and that expressions, when combined with context, can usefully predict the displayer’s
intentions and consequences for the observer. Chapter 3 describes a study where I explore the general
interaction of context with facial expressions. The next two chapters then take a more in-depth look
into the specific functions of expressions by following Scarantino’s Theory of Affective Pragmatics. In
Chapter 4, I look at the influence of context on the illocutionary function of expressions by investigating how
it affects the prediction of expressive intent. Chapter 5 examines context through the lens of the perlocutionary
function of expressions, by investigating its influence on predicting expressive consequences.
Together, this body of research makes the following important contributions. First, I add to the growing
body of research that questions the utility of context-free methods for automatically recognizing emotional
expressions. Second, from the perspective of the displayer, I show how both expressions and context
are necessary for predicting a displayer’s subsequent actions in a face-to-face negotiation from their
expressions. Thirdly, from the perspective of the observer, I show how emotional expressions shape the
observer’s actions in a social dilemma. Finally, I show how these models could inform the behavior of
interactive synthetic agents, for example allowing them to strategically select emotional expressions that
will benefit a team task. More broadly, my dissertation illustrates the potential benefits of incorporating a
pragmatic and situational perspective on the meaning of emotional expressions into the field of affective
computing.
Chapter 2
Literature review
Affective computing has been roiled by recent debates over the meaning and value of “emotional” expressions. Historically, the field has been swayed by Ekman’s Basic Emotion Theory which argues that
expressions are involuntary facial movements associated with underlying basic emotion circuits such as
fear or anger [20]. In this theory, observable expressions, even without context, reveal inner emotional
states, which in turn suggest how an individual is appraising an event and how they might respond (e.g.
flight vs fight). In opposition, Barrett and colleagues recently reviewed decades of research suggesting
little connection between facial expressions and internal states [5]. Rather than reveal emotion, some
have argued that expressions communicate social intentions [17, 76], and interaction partners use these
contextually-situated expressions to shape their subsequent responses [28, 7, 59]. This debate is far from
settled, but one benefit has been to highlight that facial expressions do far more than simply express emotion. Complementing prior work on emotional expressions, here we explore the communicative function
of facial expressions in the context of social tasks.
2.1 Emotion as Communication
There has been an ongoing debate about whether expressions reflect true inner feelings, both within fields
of psychology and in the community of Affective Computing. However, there is a clear consensus that
expressions inform and shape social behavior. Emotional expressions have been shown to serve essential
social functions among humans [65, 26], including the facilitation of efficient and nonthreatening communication [64, 67], telegraphing intentions [59], building trust [47], promoting fairness and cooperation [86,
24], helping others to regulate their own emotions [93], and shaping everyday decision-making [70]. Machines may become more successful in working with people by leveraging their emotional expressivity to
build trust and promote cooperation, to the extent that it is possible to realize these functions in human-machine interactions [74, 8, 9, 72, 58].
If we look at expressions as communicative acts, they are perhaps best analyzed using frameworks
developed for spoken language, which is generally thought of as the main mode of human communication.
For example, Poggi and Pelachaud [73] and Scarantino [76] build on the concept of linguistic speech act
theory [3] as a framework for characterizing what people do when they produce an emotional expression
during social interaction. Speech act theory defines several functions of speech, such as the illocutionary
and perlocutionary functions.
In particular, this thesis will build on Scarantino’s Theory of Affective Pragmatics [76] which characterizes expressions in terms of three standard linguistic categories. Scarantino was inspired by the work
of Austin on speech act theory and linguistic pragmatics [3]. Austin’s work on speech act theory centered
on the act of uttering a specific utterance, which he referred to as the ‘Locutionary Act’. Austin noted that
there were two ways in which we use language, namely:
• The Illocutionary Act: what you do in uttering the sentence.
• The Perlocutionary Act: what you do by uttering the sentence.
The illocutionary act refers to what one does in uttering a sentence; for this, Scarantino follows Searle’s taxonomy of illocutionary acts [79, 33], which includes the types of assertives, directives, commissives, expressives and proclamatives. In this case I am mostly interested in the expressives type, where the main
illocutionary point is to express the speaker’s feelings and attitudes. The perlocutionary act, on the other
hand, refers to what one gets done by uttering the sentence, summarized by Austin as the “effects upon
feelings, thoughts, or actions of the audience, or of the speaker, or of other persons.” Here I am most
interested in the effect upon the audience, or the person that is being spoken to.
Building on the above framework, Scarantino proposes the Theory
of Affective Pragmatics [76]. The following terms within expression research relate to speech
act theory as follows:
• The Expression (Locutionary Act)
• The Expressive Intent (Illocutionary Act)
• The Expressive Consequence (Perlocutionary Act)
Analogous to the illocutionary function of speech, expressions can reveal the emotions of the sender
but also serve to communicate beliefs, intentions, and social requests. Analogous to the perlocutionary
function of speech, expressions can impact the feelings, thoughts, or actions of the audience. Just as with
speech, a single expression can perform all of these functions simultaneously. For example, if a listener
reacts to an offer in a negotiation with a frown and the presence of a frown increases the likelihood that
a listener will reject the offer, this expression serves an illocutionary function (i.e., provides probabilistic
information about their future intentions). If, upon seeing the listener’s frown, speakers tend to proactively withdraw or soften their offer, this expression serves a perlocutionary function (i.e., tends to induce
concession-making).
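To make this three-way mapping concrete, the sketch below shows one hypothetical way an expression event could be recorded so that its locutionary, illocutionary and perlocutionary facets are kept separate. The class, the field names and the example values are my own illustrative assumptions and are not drawn from Scarantino’s theory or from the corpora analyzed in later chapters.

from dataclasses import dataclass
from typing import Optional

@dataclass
class ExpressionEvent:
    """One facial expression treated as a communicative act (illustrative sketch only)."""
    display: str                            # the expression itself (locutionary act), e.g. "frown"
    context: str                            # the surrounding task state, e.g. "offer_received"
    expressive_intent: Optional[str]        # illocutionary act: what the sender does in expressing it
    expressive_consequence: Optional[str]   # perlocutionary act: the effect on the observer

# Hypothetical annotation of the negotiation example above.
frown_event = ExpressionEvent(
    display="frown",
    context="offer_received",
    expressive_intent="likely to reject the offer",
    expressive_consequence="speaker softens the offer",
)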
2.2 Importance of Context
If expressions are like language, then the context around the expression is crucial to decoding its meaning.
Figure 2.1: Analyzing expressions as communicative acts.
For example, the most likely meaning and representation of the word “bank” in the sentence “I arrived at
the bank after crossing the” depends on whether the sentence ends with “road” or “river.” As we will see
in this thesis, the meaning of a smile similarly depends on the situation and actions that preceded the smile.
Thus, for algorithms to predict how people embedded in a social context think about emotion, they
must include situational knowledge in the recognition process [80, 87]. This has motivated several researchers to incorporate some aspect of context into emotion recognition methods. For example, Kosti
and colleagues [46] created the EMOTIC dataset of images of people in a diverse set of natural situations,
annotated with their apparent emotion. Cowen and colleagues [16] collected a large corpus of discrete
facial expressions across a wide range of contexts (e.g., smiles while watching fireworks). Using these
databases, others have shown how, by both recognizing expressive displays and recognizing aspects of the
situation, algorithms can better predict at least how third-party observers attribute emotions to the actors
in these situations [63]. My thesis extends this research by recognizing and interpreting emotions within
concrete social tasks. I will show, for example, that algorithms can make inferences about the intentions
of the expresser that enhance the accuracy of downstream tasks, such as whether a party will accept the other
party’s proposal in a negotiation.
As an alternative to recognizing the meaning of expressions in a specific context, other research has
sought to highlight the context specificity of expressions by creating agents that can manipulate social
outcomes by expressing particular expressions in particular contexts. For example, de Melo has shown
that agents that smile while cooperating with their human partner achieve very different outcomes than
agents that smile while competing with their human partner [59, 31]. This research further demonstrated
that people integrate both the agent’s expression and the context preceding its expression (e.g., the act
of cooperation or competition), to infer the agent’s future intentions, and use this inference to organize
their appropriate response. While this work utilized stylized animated expressions, more recent work by
our lab [51] highlights that people make similar inferences when observing natural human expressions during
a multi-round social dilemma. A novelty of my thesis is to extend these findings to a wider variety of team
tasks and downstream inferences.
2.3 Emotion in Social Tasks
If emotion has important social functions that shape how people interact, then those expressions should
reveal or shape how those tasks unfold. For this reason, psychological research on human emotional expressions typically studies expressions in the context of concrete tasks with measurable inputs and outputs
and my thesis adopts this perspective. In particular, I study if machines can infer the expressive intent and
expressive consequences of emotions in two tasks often used for studying facial expressions and their
impact on social behavior: the iterated prisoner’s dilemma and the multi-issue bargaining task.
The iterated prisoner’s dilemma (IPD) is a canonical task for examining social cognition. Originally
developed by game theorists to understand how people should best balance cooperation and competition,
the task models many real-world situations requiring strategic decision making with other social actors.
Broadly, it captures situations where multiple actors could gain important benefits from cooperating or suffer from failing to do so, but find it difficult or expensive to coordinate their activities. Emotion researchers
have argued that emotion and emotional expressions play an important role in helping people to resolve
this dilemma, thus the task plays an important role in empirical studies of how emotion shapes social
decisions. For example, it has been shown that players attend to each other’s emotional expressions,
form inferences from these expressions about their partner’s goals or character, and use these inferences
to inform subsequent intentions to cooperate or compete [59, 40].
Negotiations, and more specifically multi-issue bargaining, are another important domain for studying social cognition. Multi-issue negotiations can offer opportunities for parties to find win-win solutions;
they involve a mixture of cooperation and conflict and thus evoke a wide array of feelings and expressions. Prior research suggests that facial expressions during a negotiation reveal whether a negotiator is
trustworthy [55], if they have high or low aspirations [89], how much they care about specific issues [32],
how satisfied they are with a given offer [27], and if they will accept an offer or walk away [69]. Negotiation is also an important domain for AI in general, given the growing interest in creating agents that
negotiate with people [30]. Already, commercial companies develop products that negotiate on behalf
of users over the phone [91] or text chat [21], and research is expanding these algorithms’ “emotional
intelligence.” This includes not only endowing algorithms with the ability to “read” affective signals, but also
using synthesized affective signals to teach skills [11], extract concessions [60] or convey trust [19].
Such applications will benefit from high-quality corpora of human negotiation behavior.
In the following chapters, I will examine the function of emotional expressions in these two tasks from
the various perspectives suggested by the Theory of Affective Pragmatics. By grounding these studies in
concrete social tasks, this thesis moves beyond the common focus on what an expression means (i.e., is the
sender happy?) to what expressions do.
Chapter 3
How Context Shapes the Perception of Expressions
Emotion recognition has traditionally viewed expressions as reflecting a person’s inner feelings and downplayed the role of context in shaping how the expressions are interpreted (e.g., predicting felt emotion from
facial expression alone [5]). But as highlighted in the preceding chapter, modern theories like Scarantino’s
Theory of Affective Pragmatics [76] argue expressions should be seen more like words and analyzed in
similar ways. First, by analogy to linguistic theory, the interpretation of expressions should be closely
connected to the setting in which they are produced (e.g., a frown might signify guilt in one context but anger
in another). Second, like language, expressions can be analyzed from the perspective of expressive intent
(i.e., the illocutionary function), meaning what the expression reveals about the displayer’s social
intentions (e.g., an expression of guilt might mean the person displaying it is sorry and will do better in the
future). Third, expressions can be analyzed from the perspective of their expressive consequences (i.e.,
the perlocutionary function), meaning what the expression does to the observer (e.g., the observer might
choose to forgive their guilty partner). In the following three chapters, I will examine each of these aspects
of expressions, thereby demonstrating the utility of the Theory of Affective Pragmatics. In this Chapter, I
will first show evidence that ‘context’ shapes the emotional meaning of expressions within a naturalistic
social task.∗
∗This chapter describes work originally reviewed and published at IEEE ACII 2023 [38].
The importance of context on the interpretation of expressive displays has previously been raised
by several authors within the affective computing field. For example, Barrett et al. [6] show that the
interpretation of a facial expression can change from anger to joy if observers learn that the subject has just
won a tennis match. This contradicts some of the foundations of basic emotion theory [20], which posits
that expressions are universal and involuntary signals of underlying feelings and largely unaffected by
contextual factors. In the case of a social dilemma, the following work will show evidence that facial
expressions are in fact interpreted differently in different situations based on their context. The work in
this chapter has previously been published in two conference papers [37, 38], as well as a journal article
(currently awaiting acceptance).
In order to investigate this, I analyzed the data of an experiment where pairs of people participated in
a social dilemma. During this experiment the participants were asked to rate their own and their partner’s
expressions at specific moments in the dilemma in a video-cued recall procedure [29, 71]. Using this
procedure it was possible to get both first-person and second-person ratings of videos of facial expressions.
By using ratings from those involved in the interaction, who know about the context, as opposed to a third
party, I was able to examine the proposed association between emotional expressions and context. The
following two competing hypotheses state the possible ways that contextual and facial cues might combine:
Hypothesis 1: Expression and context are independent sources of information that predict emotion
judgments; their additive effects determine the interpretation and effect of the expressions.
Hypothesis 2: Expression and context interact to determine emotion judgments; the relationship between a facial cue and perceived emotion changes depending on the context.
Each of these hypotheses has implications for the future design of automated emotion recognition,
but if Hypothesis 2 holds, the consequences are more severe. Hypothesis 1 suggests that expressions
have context-independent meaning (e.g., a smile always suggests joy) but that other factors separately
influence the final interpretation. This would suggest one could develop a context-independent recognition
algorithm and then separately add in the impact of context.
However, if Hypothesis 2 holds, the situation becomes far more complex: a smile might
indicate joy in one context but anger in another. The severity of the problem depends on the exact nature of the
interaction. Some interactions can be handled via Bayesian cue integration [68]; for example,
smiles might be associated with joy in some contexts but have no association with joy in others if joy is
extremely unlikely in those latter contexts.
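As a minimal sketch of what such cue integration could look like, the snippet below combines an assumed expression likelihood with assumed context-dependent priors to obtain a posterior over emotion labels. All of the probabilities are invented for illustration and are not estimates from the studies reported in this thesis.

# Minimal Bayesian cue-integration sketch:
# P(emotion | smile, context) is proportional to P(smile | emotion) * P(emotion | context).
# All numbers below are invented for illustration only.

def posterior(likelihood: dict, prior: dict) -> dict:
    """Combine an expression likelihood with a context-dependent prior over emotions."""
    unnormalized = {e: likelihood.get(e, 0.0) * prior.get(e, 0.0) for e in prior}
    total = sum(unnormalized.values())
    return {e: v / total for e, v in unnormalized.items()}

p_smile_given_emotion = {"joy": 0.8, "anger": 0.3}      # assumed likelihood of a smile under each emotion
prior_after_cooperation = {"joy": 0.7, "anger": 0.3}    # assumed prior after a cooperative outcome
prior_after_exploitation = {"joy": 0.1, "anger": 0.9}   # assumed prior after being exploited

print(posterior(p_smile_given_emotion, prior_after_cooperation))   # the smile reads mostly as joy
print(posterior(p_smile_given_emotion, prior_after_exploitation))  # the same smile now reads mostly as anger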
I contrasted these hypotheses in an experiment where participants played a social dilemma, an iterated version of the prisoner’s dilemma, in pairs with each other. Following the experiment, participants
annotated perceived emotion via a video-cued recall procedure. Section 3.1 of this chapter will give details
on the prisoner’s dilemma, section 3.2 will describe the video-cued recall procedure and section 3.3 will
give the remaining details on the method.
3.1 Iterated Prisoner’s Dilemma
The prisoner’s dilemma is a standard game often used to examine social behaviors and decisions, including
the role of emotion [68]. I chose it because it is easy for participants to understand the task. Further, the
number of possible actions is small (facilitating analysis), but these actions can still result in a set of vastly
different outcomes that often elicit various expressed emotions. By running a multi-round iterated version
of the prisoner’s dilemma, these outcomes become an important aspect of the context surrounding the
participants’ decision making process. Research on the impact of facial expressions has shown that people
are highly expressive when learning about their partner’s most recent decision in a prisoner’s dilemma, and
that these expressions contain information about what just occurred [51]. Related research furthermore
highlights that the perceived meaning of facial expressions can change dramatically depending on the
Table 3.1: A traditional payoff matrix of an iterated prisoner’s dilemma
                              Player B
                              cooperate (C)      defect (D)
Player A   cooperate (C)      A = 1, B = 1       A = -1, B = 2
           defect (D)         A = 2, B = -1      A = 0, B = 0
Figure 3.1: A representation of the decision making process in a prisoner’s dilemma.
action that co-occurs with it [59]. However, such findings have typically been explored with stylized
computer-generated facial expressions.
In this study, I examine how people interpret the expressions of their human partner in the context
of the most recent outcome of the task. Therefore, I operationalize the concept of context
by equating it to the most recent joint decision experienced by participants in the game (i.e., I examine the
interpretation of expressions given the most recent outcome in the game).
In a prisoner’s dilemma, both participants make a decision whether to ‘cooperate’ with each other or
to ‘defect’. Therefore, there are four possible outcomes in total. In an iterated prisoner’s dilemma there
will be several rounds of these decisions. Based on both participants’ decisions, each participant scores
points; an example payoff matrix is shown in Table 3.1. In this case, if both participants choose to cooperate they
both receive 1 point; however, if one participant defects while the other cooperates, the defector receives 2 points while their opponent loses a
point. As such, a participant who wants to maximize their score is tempted to defect; however, if both participants
defect on each other, neither receives any points.
To further illustrate the above point, the decision making process in an iterated prisoner’s dilemma can
be modeled as shown in Figure 3.1. The main nodes represent the combined decision of both participants.
‘Joint cooperation’ represents the state where both participants chose to cooperate (i.e., C/C), while ‘joint defect’
represents both participants defecting (D/D). Finally, there are two states representing one participant defecting upon the other: ‘exploit other’ for when a participant defects while their partner cooperates, and
‘exploited by other’ for the reverse. As such, during an iterated prisoner’s dilemma participants
move between these states, as represented by the arrows in the figure. Arrows with solid lines
represent cooperative choices by the participants and arrows with dashed lines represent defecting choices. However, since the state depends on the decisions of both participants, one participant cannot simply decide
how they move through the game. For example, after a joint cooperation a participant can choose to cooperate
again, but it depends on their partner whether they end up in the joint cooperation state again or
instead end up in the ‘exploited by other’ node.
The four main outcomes of a round, shown as nodes in Figure 3.1, represent the concept of
‘context’ in this study; there are thus four distinct situations, each representing a specific context.
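As a simple illustration of this bookkeeping, the sketch below maps one round’s pair of choices onto the four context states of Figure 3.1 and onto the example payoffs of Table 3.1. It is only an illustration of the state and payoff logic, not the code used by the actual study software.

# Map one round's joint decision onto the four context states of Figure 3.1
# and the example payoffs of Table 3.1 ('C' = cooperate, 'D' = defect).

STATE_LABELS = {
    ("C", "C"): "joint cooperation",   # CC-state
    ("D", "D"): "joint defection",     # DD-state
    ("D", "C"): "exploit other",       # DC-state: I defect while my partner cooperates
    ("C", "D"): "exploited by other",  # CD-state: I cooperate while my partner defects
}

PAYOFFS = {  # (my payoff, partner's payoff), following Table 3.1
    ("C", "C"): (1, 1),
    ("C", "D"): (-1, 2),
    ("D", "C"): (2, -1),
    ("D", "D"): (0, 0),
}

def round_outcome(my_choice: str, partner_choice: str):
    """Return the context state and the payoffs produced by one round's joint decision."""
    key = (my_choice, partner_choice)
    return STATE_LABELS[key], PAYOFFS[key]

# Example: I cooperate while my partner defects, so I am exploited and lose a point.
print(round_outcome("C", "D"))  # ('exploited by other', (-1, 2))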
3.1.1 The Split or Steal Framework
For the actual experiments, and to further help participants intuit the nature of the social task, I framed
the iterated prisoner’s dilemma as the ‘Split or Steal’ game and developed a software framework to facilitate
online deployment, capture and synchronization of participants’ decisions and expressions. The framing
was inspired by the British game show “Golden Balls,” where participants play for prize money. Each
Figure 3.2: The game interface. Participants make choices by selecting either the “split” or “steal” option
each round. They can also see the number of tickets won by themselves and by the other player as well as
the other player’s webcam video (top right) and their own webcam video (bottom right).
round, participants are allocated a pot of money. If both participants choose to “split” they would split the
prize money evenly, corresponding to the joint cooperation choice. They also had the option to “steal”:
if their opponent chose to split, the stealer would take all the money, but if both participants chose to
steal they would both forfeit the prize money. Rather than playing for real money, in my study participants
played for sets of lottery tickets, which allowed them to participate in a lottery drawing for an
additional prize after the conclusion of the full study. The more tickets a participant collected, the higher
their chance of winning this lottery.
Figure 3.2 shows a screenshot of the Split or Steal game interface, as it was displayed to participants.
Participants would play the game in the panel on the left side of the screen. This side would display the
split or steal choice they had to make. After both participants made their decisions, a simple animation
would reveal the round outcome to both participants. The left side furthermore displayed the current score
in the game and the number of rounds remaining, i.e. the game was a finite horizon game. The right half
of the screen was mostly reserved for displaying a real time video feed of the other player, as well as a
smaller video of the participants themselves, similar to a video call layout. While participants were able to
see each other, they were not able to speak, as audio was not supported by the system. As such, expressing
themselves on video was the only way participants were able to communicate while playing the game.
3.2 Video-cued Recall Procedure
The video-cued recall procedure [29, 71] is a procedure where participants give feedback on previously
recorded video footage of themselves. In this study, this takes the form of participants annotating the video
footage of themselves as well as of their partner in the Split or Steal game. The participants specifically annotate the videos for the overall measure of expressed valence. Using the recall procedure, it
is possible to obtain both first-person ratings, by having participants annotate videos featuring themselves, and second-person ratings of the videos of their opponent. Both of these types of
annotations are relatively uncommon within the affective computing domain, which has generally relied
on annotations by professionally trained annotators, who are presented videos or images stripped of any
context. In this case I also have the annotations of the subjects themselves, as well as those of a second
party that was involved in the interaction and as such was more aware of the context in which the expressions
were displayed.
3.3 Method
Within this section, I give details on the experimental setup and the measures that were collected during this
study.
Table 3.2: Payoffs for each combination of choices in the iterated prisoner’s dilemma game
                                Participant B
                                cooperate        defect
Participant A   cooperate       A = 5, B = 5     A = 0, B = 10
                defect          A = 10, B = 0    A = 1, B = 1
3.3.1 Experimental Setup
The study was performed at the University of Oxford in the United Kingdom. One hundred people in total
(67 female, mean age = 26.42, SD = 7.44) participated in the study; 69 came from a community panel and 31 responded
to local calls for participation. Each participant was paid £10 for
participation. The age, gender and ethnicity of community panel participants did not differ from those who
responded to local advertisements. All participants provided written informed consent and the University’s
Ethics Committee approved the study. One person withdrew consent.
Participants were randomly paired and played 10 rounds of the computer-mediated iterated prisoner’s
dilemma within the split-or-steal framework. In each round, participants played for a set of lottery tickets
and selected whether to cooperate or defect (associated payoffs are shown in Table 3.2). Participants were
informed that each ticket earned would be entered into a lottery draw with one £100 prize and two prizes
of £50, and as such, by obtaining more tickets they could increase their chances of winning a prize.
Each participant played in a separate room, seated in front of a computer with a webcam; they could see
each other’s webcam video stream while playing. In each round, participants selected whether to “Split”
(cooperate) or “Steal” (defect) by clicking on the screen. Once both participants had made their choices, the
outcome of the round was displayed to both participants. A ticket counter allowed participants to track
their current scores. Figure 3.2 shows the game interface. The webcam video stream did not include audio
and participants were instructed not to talk to each other or to use hand gestures to communicate, but that
they were free to use facial expressions.
After the game, participants completed a video-cued recall procedure, in which they reviewed video
clips and evaluated the valence and regulation of both their own and their partner’s expressions [29, 71].
By using the specific timing of the participants’ decisions as logged in the database, I could automatically
generate video clips of specific events. For every participant I created sets of two five-second video clips for
all 10 rounds played. The first clip (‘decision’ clip) in this set captured the participant while they were
deciding whether to “Split” or “Steal”. The second clip (‘outcome’ clip) captured both participants for the five
seconds following the reveal of the round outcome. For ‘decision’ clips, participants were instructed to rate
expressions that occurred immediately before the “Split” or “Steal” choice was made. For ‘outcome’ clips,
participants were instructed to rate expressions that occurred immediately after both players’ decisions
and the outcome of the round were revealed.
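As a rough sketch of how such event-anchored clips can be cut from a recorded webcam stream, the snippet below calls ffmpeg to extract a five-second segment starting at a logged timestamp. The file names and the exact invocation are my own assumptions for illustration; the actual study software generated the clips from its own database of logged events.

import subprocess

def cut_clip(source_video: str, event_time_s: float, out_path: str, duration_s: float = 5.0) -> None:
    """Cut a fixed-length clip starting at a logged event timestamp (illustrative sketch)."""
    subprocess.run(
        ["ffmpeg", "-y",
         "-ss", str(event_time_s),  # seek to the event, e.g. the moment the round outcome is revealed
         "-i", source_video,
         "-t", str(duration_s),     # keep five seconds of footage
         "-c", "copy",              # copy streams without re-encoding
         out_path],
        check=True,
    )

# Hypothetical usage for an 'outcome' clip; a 'decision' clip would instead start
# at event_time_s - duration_s so that it ends at the moment the choice was made.
# cut_clip("p17_webcam.mp4", event_time_s=312.4, out_path="p17_round3_outcome.mp4")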
The video clips were presented in chronological order; for each ‘decision’ and ‘outcome’ event participants first rated the video of themselves, and then the video of their partner. For the ’outcome’ clip, the
outcome of the round was displayed to participants when they made their ratings to provide information
about the context in which the expression was displayed. In this chapter, I examine ratings of the ’outcome’
clips because I am interested in how observers decode displayers’ facial responses to specific game events.
3.3.2 Measures
Behavioral data about participants’ decisions was collected during the games, as well as two sets of facial
expression annotations: one created by the participants themselves during the video-cued recall, and one
automatically extracted from the video using facial expression detection software.
3.3.2.1 Questionnaires
Before the game, participants answered some pre-game questionnaires including demographics, the Berkeley Expressivity Questionnaire (BEQ) [35], the Emotion Regulation Questionnaire (ERQ) [34], the Social
Value Orientation (SVO) [66] and first impressions of the other participant. After the game (and before
the video-cued recall procedure), participants answered global questions about expressions and regulation in the game (both self and other participant). They also completed a post-game questionnaire (after
the video-cued recall procedure) about their global impressions of the other player and their expressions.
Questionnaire data are not analysed in this chapter.
3.3.2.2 Behavioral Data and Context
The actions of participants throughout the game were logged in a database as timestamped data. The
database included information about the decision to cooperate or defect, and other important game events,
such as the moment when the result of the joint decision was revealed.
From the joint decision made by each pair of participants in each round, I derived the set of ‘game states’
shown as the main nodes in Figure 3.1. When both participants chose to cooperate, this results in a
“joint cooperation” or CC-state, while both participants defecting results in a “joint defection” or DD-state.
When one participant defects while the other participant cooperates this results in an unequal outcome;
the participant who defected and as such exploits their partner is in the DC-state, while the participant
who was exploited is in the CD-state. By defining these specific states, I am able to investigate whether
contextual information such as these game states affects the displays and perceivers’ ratings of them.
I will be employing the joint outcomes as context in my analysis. In a socio-economic game such as the
social dilemma, where the majority of player actions are the decisions they make, these joint outcomes are
a large part of the context within which the facial displays are presented and interpreted. This is especially
the case for the events where the outcome is revealed.
3.3.2.3 Video-Cued Annotations
Using a video-cued recall procedure, I obtained ratings for each of the video clips of the participant, with
ratings by both the participant themselves (displayer) and their partner (observer). For each video clip
participants rated the valence of the displayed expression on a scale from -50 very negative to +50 very
positive. Following this, participants rated the videos on both positive and negative expression regulation
using a -50 (suppressed) to +50 (exaggerated) scale.
3.3.2.4 Automatic Facial Expression Annotations
Using the timestamps that were stored with each joint decision, 5-second clips of joint outcome events
were used during the video-cued recall procedure and the video was stored on a server afterward. After
the study was completed, I extracted facial expression ratings automatically from the webcam videos using
a commercial software system based on CERT [53]. Using this system, I extracted Action Units (AUs) as
defined by the Facial Action Coding System (FACS) from each participant’s video on a frame-by-frame
basis. For each frame the system assigns each AU an 'evidence' value. This value represents the odds of the AU being activated, with positive values indicating that a particular AU is activated and negative values indicating that the AU is not active. The system uses a base-10 logarithmic scale for scoring AUs (i.e., a score of 1 indicates an AU is 10 times more likely to be rated as active by a human coder than inactive, a score of 2 indicates the AU is 100 times more likely to be rated as active, etc.). I then computed overall AU
evidence scores for each of the video clips that were rated by the participants during the video-cued recall.
This was done by finding data for the set of frames that were shown during the clip, and then calculating
the mean value for each AU using this data.
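As a concrete illustration, a minimal Python sketch of this aggregation step is given below; the file name, column layout, and clip timestamps are hypothetical and do not reflect the actual output format of the CERT-based software.

```python
import pandas as pd

# Hypothetical frame-level output: one row per video frame, with a timestamp
# column and one column of base-10 log-odds 'evidence' per Action Unit.
frames = pd.read_csv("participant_042_au_evidence.csv")  # columns: time, AU1, AU2, ...

def clip_au_means(frames: pd.DataFrame, clip_start: float, clip_end: float) -> pd.Series:
    """Average each AU's evidence over the frames shown during one rated clip."""
    in_clip = frames[(frames["time"] >= clip_start) & (frames["time"] < clip_end)]
    au_columns = [c for c in frames.columns if c.startswith("AU")]
    return in_clip[au_columns].mean()

# Example: a 5-second 'outcome' clip starting at a (hypothetical) reveal timestamp.
reveal_time = 312.4
clip_scores = clip_au_means(frames, reveal_time, reveal_time + 5.0)
```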
Action Units correspond to specific facial muscles being active. However, within the iterated prisoner’s
dilemma video corpus, I observed several sets of AUs that are often active at the same time. Therefore rather
than looking at the individual AUs I will be using the facial expression factors [85]. Prior work has found
these factors useful for explaining behavior in the prisoner’s dilemma [51]. Similar dimension-reduction
approaches have been applied to analyze emotional styles [41] or expressions used to deliver good or bad
news [90].
I computed a set of factors that combine Action Units that commonly co-occur, based on prior work [85].
There are 6 factors total, Factor 1 (F1) through Factor 6 (F6). F1 corresponds to a ‘smile’ consisting of AU
6, 7 and 12, F2 corresponds to ‘eyebrows-up’ consisting of AU1 and 2, F3 corresponds to ‘open-mouth’
consisting of AU 20, 25 and 26, F4 corresponds to 'mouth-tightening' consisting of AU 14, 17 and 23, F5 corresponds to 'eye-tightening' consisting of AU 4, 7 and 9, and F6 corresponds to 'mouth-frown' consisting of AU 10, 15 and 17. During the analysis I will use this set of 6 factors rather than individual action units.
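For illustration, the sketch below shows one way the six factors could be computed from per-clip AU means. Note that simply averaging the constituent AUs is a simplification of the factor-analytic weighting used in the original work [85], and the AU column names are hypothetical.

```python
import pandas as pd

# AU groupings for the six factors described above. Averaging the constituent
# AUs is a simplification; [85] derives these factors through factor analysis.
FACTORS = {
    "F1_smile":            ["AU6", "AU7", "AU12"],
    "F2_eyebrows_up":      ["AU1", "AU2"],
    "F3_open_mouth":       ["AU20", "AU25", "AU26"],
    "F4_mouth_tightening": ["AU14", "AU17", "AU23"],
    "F5_eye_tightening":   ["AU4", "AU7", "AU9"],
    "F6_mouth_frown":      ["AU10", "AU15", "AU17"],
}

def factor_scores(au_means: pd.Series) -> pd.Series:
    """Collapse per-clip AU evidence means into the six expression factors."""
    return pd.Series({name: au_means[aus].mean() for name, aus in FACTORS.items()})
```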
3.3.3 Analysis
In order to assess H1 (i.e., expression and context are independent sources of information), I looked at the correlation between factors and valence scores. Since I have valence ratings for both displays and interpretations, I also looked at whether specific factors are used differently based on the perspective (first-person or second-person) of the annotator. Secondly, I looked at the overall valence scores per joint outcome, in order to determine the independent impact of context on the ratings.
I used a moderated multiple linear regression approach to investigate the second hypothesis H2 (i.e.,
expression and context interact). I created regression models to predict the expressed and perceived ratings
of valence, in order to find the relation between expression and context. Expressions are represented by
the average value for a reveal event of the six previously defined factors that were automatically extracted
from the videos following the reveal event.
In order to model context in the linear regression I used dummy coding to encode this state using
the variables ‘decision self’ and ‘decision partner.’ These decisions will be coded differently based on
the joint outcome: For joint cooperation (CC) ‘decision self’ and ‘decision partner’ will be coded 0 for
cooperating and 1 for defecting. For joint defection (DD) they will be coded in reverse, so 0 for defecting and
1 for cooperating. For exploited by other (CD) ‘decision self’ will be coded as 0 for cooperating and 1
for defecting, while ’decision partner’ will be coded as 0 for defecting and 1 for cooperating. Finally for
exploiting other (DC) ‘decision self’ will be coded as 0 for defecting and 1 for cooperating and ‘decision
partner’ as 0 for cooperating and 1 for defecting.
In order to model the relation, the decision variables (i.e., ‘decision self’ and ‘decision partner’) will be
used as two moderating variables in the regression on the expression, in order to predict the dependent
variable of expressed or perceived valence. As such the linear regression will take the form of:
Y = b0 + b1x1 + b2x2 + b3x3 + b4(x1 × x2) + b5(x1 × x3) + b6(x2 × x3) + b7(x1 × x2 × x3) + ε    (3.1)
where the x terms are the independent variables: x1 is the factor, x2 is 'decision self' and x3 is 'decision partner'. Y is the dependent variable (self-reported or perceived valence) and ε is the error term.
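A minimal sketch of how such a moderated regression could be fit (here with the statsmodels formula interface) is shown below; the data frame and column names are hypothetical, and the formula simply expands into the main effects and interaction terms of Equation 3.1.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical table with one row per rated 'outcome' event:
#   valence          -- self-reported or perceived valence rating
#   factor           -- mean factor score for the event (e.g., F1 smile)
#   decision_self    -- own decision, dummy coded 0/1 per outcome as described above
#   decision_partner -- partner decision, dummy coded 0/1
events = pd.read_csv("outcome_events.csv")

# 'a * b * c' expands to all main effects plus the two- and three-way
# interactions, i.e., the b1..b7 terms of Equation (3.1).
model = smf.ols("valence ~ factor * decision_self * decision_partner", data=events).fit()
print(model.summary())
```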
3.4 Results
First, I looked at the influence of expression and context on self-reported and perceived valence separately
(investigating H1). For expressions, I looked at the correlation between the factors and the valence rating.
I present the results using the lens model in Figure 3.3. The lens model was developed by Egon
Brunswik to assess the accuracy of perceptual judgments [13]. The left-hand side of the diagram (referred to as cue validity) shows the relationship between some latent variable (in this case, emotion) and
observable perceptual cues. The right-hand side of the diagram (referred to as cue utilization) illustrates how a perceiver uses cues to reconstruct the latent variable (in this case, the observer's perception of their partner's emotion).
Figure 3.3: Lens model of the correlations between the facial factors and participant ratings on valence.
Green lines show significant correlations. Dashed lines show factors where I found significant interactions
between expression and context.
partner’s emotion). Each link in the diagram shows the correlation with the latent construct (i.e., selfreported emotion on the left and perceived emotion on the right). Symmetry between the two sides of the
“lens” implies that the perceiver correctly utilizes the valid cues. Asymmetry indicates perceptual errors
and helps identify misconceptions.
In this case, I am particularly interested in how context (i.e., joint outcomes of the game) interacts
with facial cues: (1) how does context shape the valid cues of self-reported emotion and (2) how does
context shape the way perceivers utilize cues to predict how their partner feels? In Figure 3.3, solid lines
indicate that context is an independent predictor. Dashed lines indicate that the cue and context interact.
For example, the dashed line between smile and perceived emotion indicates that the way the perceiver
uses the smile-cue changes based on the joint-outcome of the game. The top connecting line shows the
correlation between self-reported and perceived judgments (r(907) = 0.397, p < .001).
3.4.1 Impact of the Face Alone
First, I examined cue validity while ignoring context. Looking at the correlation between self-reported
valence and each facial factor, I found significant correlations for the smile (r(907) = .147, p < .001) and for
the open mouth (r(907) = .148, p < .001) factors. This indicates that smiles and open-mouth are valid cues
of self-reported emotion and each cue shows a positive relationship (i.e., more smiles mean more positive
self-reported valence).
Second, I examined cue utilization while ignoring context. I examined the correlation between the
partner’s facial expressions and second-person perceptions that the partner felt positive or negative (i.e.,
perceived valence). I found significant correlations for the smile (r(907) = 0.208, p < .001), the open-mouth
(r(907) = .145, p < .001), the mouth-tightening (r(907) = .072, p = .030) and the frown (r(907) = .069, p =
.038) factors. This shows that observers utilize the same two valid cues (smiles and open-mouth), but also
attend to two invalid cues (mouth-tightening and frown). Thus, ignoring the role of context, observers
pay closest attention to the valid cues of smiling and open mouth, but also attend to irrelevant cues and
thus often misinterpret their partner’s self-reported feelings.
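The two sides of the lens model can be reproduced with simple correlations; the sketch below assumes a hypothetical events table with per-event factor scores, the displayer's self-reported valence, and the partner's perceived valence (all column names are assumptions).

```python
import pandas as pd
from scipy.stats import pearsonr

# Hypothetical per-event table: one row per rated outcome clip.
events = pd.read_csv("outcome_events.csv")
factors = ["smile", "eyebrows_up", "open_mouth",
           "mouth_tightening", "eye_tightening", "mouth_frown"]

for f in factors:
    r_valid, p_valid = pearsonr(events[f], events["self_valence"])      # cue validity
    r_util, p_util = pearsonr(events[f], events["perceived_valence"])   # cue utilization
    print(f"{f}: validity r={r_valid:.3f} (p={p_valid:.3f}), "
          f"utilization r={r_util:.3f} (p={p_util:.3f})")
```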
3.4.2 Impact of the Context Alone
The correlations in Figure 3.3 highlight how facial cues relate to self-reported and perceived valence but
ignore how context might shape these judgments. I can also look at how context shapes judgments while
ignoring facial cues. In order to find the impact of context individually we can look at the average valence
scores for each context as shown in Figure 3.4.
What you see here is that context has a similar overall effect on both self-reported and perceived emotion. Regardless of whether participants are reporting their own feelings or estimating their partner’s
feelings, valence is reported as highest following mutual cooperation (CC). As expected from the structure of the payoff, valence is reported as negative when a player has been exploited (CD) and positive when a player successfully exploits their opponent.
Figure 3.4: Difference in valence per condition, showing display on the top and perception below. Valence
ratings were collected using a continuous valence scale from -50 (negative valence) to +50 (positive valence)
and averaged per round outcome.
Interestingly, valence is reported as somewhat positive following mutual defection, but this might be explained by the fact that some money is better than
none.
Just looking at the payoff alone, one might expect the greatest positive emotions when exploiting one’s
partner, which was not the case. Some research suggests that even in such games, people feel some guilt
when they take advantage of their partner which could explain this attenuation [22].
3.4.3 Face and Context Combined
Based on the results so far, it is clear that both facial cues and game outcome are sources of information. Participants appear to be attentive to specific facial factors based on their correlation with valence.
Participants also judge valence differently based on the context.
In order to contrast H1 (the face and context are independent predictors) with H2 (the face and context
interact), I used moderated multiple linear regressions as described in Section 3.3.3. I used moderated
regressions for each of the factors based on both display and interpreted valence, looking at the interaction
between expression and context, i.e., six models each for displayed and perceived valence.
Although I did not find significant interactions in all regressions, there were some in specific cases.
Figure 3.3 displays the moderated linear regressions that were significant with a dashed line. What first
stands out is that there appear to only be interactions on the side of the second-person perceivers utilizing facial cues, supporting H2 for second-person judgments. I found that context and the face were
independent predictors of self-reported valence, supporting H1 for first-person reports.
Next I want to highlight the multiple linear regression that showed significant interactions for perceiver cues. The first significant interaction that I found was in the regression for smile. This moderated
regression showed how own decision (C or D) and partner's decision (C or D; together representing CC, CD, DC and DD) moderated the association between smile (F1) and perceived valence. The interaction term
between own decision, partner’s decision and factor 1 reached statistical significance, B = 0.308, t = 2.103,
p = .036. To break down the interaction I conducted simple slope analysis. This showed that there was
a significant effect of factor 1 on the perceived expression in the CC condition (B = 0.395, t = 9.550, p < .001), as well as in the DC condition (B = 0.229, t = 2.932, p = .003). However, there were no significant effects of smile on perceived expression in the DD condition (B = 0.119, t = 1.430, p = .153) or the CD condition (B = -0.023, t = -0.277, p = .781).
Second, there was a significant interaction in the moderated regression that examined how own decision (C or D) and partner's decision (C or D; together representing CC, CD, DC and DD) moderated the association between frown (F6) and perceived valence. The interaction term between own decision, partner's decision and frown reached statistical significance, B = 0.344, t = 2.281, p = .023. To break down the interaction I conducted simple slope analysis. This showed that there was a significant effect of frown on the perceived expression in the CD condition (B = 0.250, t = 2.872, p = .004), but that there was no significant effect of factor 6 on perceived expression in the CC condition (B = 0.026, t = 0.628, p = .530), the DD condition (B = -0.026, t = -0.317, p = .751) or the DC condition (B = 0.095, t = 1.157, p = .248).
Figure 3.5 gives an overview of the regression coefficients (B) for the moderated regressions on smile
and frown.
Figure 3.5: The regression coefficients for smile and frown based on game state for perceived valence.
From this graph you can see that participants appear to use smiles and frowns differently across conditions. The significant effects for smile in the CC- and DC-states in the simple slope analysis show that participants rate the associated expressions more positively if their partner displays a smile. The significant effect for frown in the CD-state means that, perhaps surprisingly, participants rate a partner showing a frown after being exploited as feeling more positive than one who does not.
3.5 Conclusion
In this chapter I looked at the influence of context on interpretation of expressions, by studying two hypotheses. The first hypothesis stated that expression and context are independent sources of information
and I found some evidence for this. Facial cues related to both self-reported and perceived emotion, as
shown by the significant correlations of the smile and the open mouth factors with self-reported valence,
and the significant correlation of the smile, open-mouth, mouth-tightening and frown factors with perceived valence (see Figure 3.3). Although smile and open-mouth appear to be used for both display and interpretation, there are some differences between these two perspectives as well. When people rated their own displays, only the smile and open-mouth factors were associated with higher ratings, whereas when interpreting others' displays there was a significant correlation with the mouth-tightening and frown factors as well. These results are interesting because mouth-tightening and frown displays are not generally thought of as positive.
Whereas there were some differences in use of facial factors by self-reporters and perceivers, there
appear to be no differences in their use of context when rating valence (ignoring facial expression), as
shown in Figure 3.4. Both self-reported and perceived valence judgments were shaped by context in almost
identical ways. Participants rated emotion as most positive in the joint cooperation outcome, and as more
negative in the exploited by other outcome. Feelings were also reported and perceived to be more positive
following successful exploitation and, interestingly, somewhat positive following mutual defection.
Lastly, I examined the relationship between facial and contextual sources of information about valence.
I found that facial cues and context were independent predictors of self-reported emotion (supporting H1
for first-person judgments) but that facial cues and context interacted to determine perceived emotion
(supporting H2 for second-person judgments). In particular, perceivers used smiles as a cue to predict
positive feelings following mutual cooperation but completely ignored smiles when inferring how someone
felt after being exploited. Surprisingly, people used frowns to predict their partner was happy following
exploitation (i.e., a partner who frowned after being exploited was seen as feeling more positive). In other
contexts, the frown was not utilized.
The results of this study reinforce prior findings that context contains important cues about the emotions conveyed by facial displays. Facial expressions can mean different things in different situations, much as words require context when analyzing speech acts. More specifically, this study shows that context and facial cues interact differently for self-reported and perceived emotion; as such, facial display acts can be thought of as a noisier communication channel than speech acts. Whereas most people tend to agree about the meaning of words in a given context, facial displays appear to leave more room for differences in interpretation. This may prove challenging for approaches that seek to improve the accuracy of emotion recognition by simply providing context
to emotion annotators. In the following chapters I will continue using Scarantino’s Theory of Affective
Pragmatics [76] as a guideline and will look into the illocutionary and perlocutionary functions of facial
displays.
Chapter 4
The Influence of Context on the Illocutionary Function of Expressions
The previous chapter demonstrated that emotion perception is strongly influenced by the context in which
expressions are produced, rebutting the notion that automatic emotion recognition can safely infer emotion independent of context. In this chapter, I will turn to exploring the extent to which expressions reveal the displayer's intentions. Specifically, I will examine how emotional expressions and contextual information can help a machine predict a person's future actions in a social task∗. From the results of the previous chapter, I hypothesize that both expression and context will be crucial for these predictions. This will be
tested in the domain of a bilateral negotiation.
Scarantino’s Theory of Affective Pragmatics [76] argues that expressions can be examined from multiple perspectives and this chapter, thus turns to the perspective of Expressive Intent (which is analogous to
the illocutionary function of speech in linguistic theory). Expressive intent corresponds to the displayer’s
social motivation for producing this display. For example, regardless of whether they actually feel guilty, a
child caught trying to steal candy from a jar might show guilt to signal to the parent that they understand
that stealing is wrong. Setting aside issues of deception, this expression further signals future intentions
(“I won’t steal again”).
∗This chapter describes work originally reviewed and published in volume 14 of the IEEE Transactions on Affective Computing [36].
However, most corpora in affective computing are ill-suited to uncovering the intentions associated
with expressions. Most common corpora focus on the problem of emotion recognition alone and don’t attempt to associate expressions with the situational antecedents and consequences of these displays. Rather,
most emotion recognition corpora de-contextualize displays by providing only small snippets of expressive
behavior labeled with perceived emotion [49]. As such they fail to connect these expressions to concrete
actions or decisions by the producer or consumer of these signals. When expressions are embedded in a
richer interaction where the illocutionary function is more present, data is often produced by actors [14],
or collected in loosely structured tasks where it is difficult to connect displays to objective social beliefs
and actions [82]. Corpora that do connect expressions to individual or social actions often involve artificial
tasks that strip away many of the features of natural human interaction, such as speech [75]. To address
this gap, I collect and analyze a new dataset of expressions in negotiation. This task allows for more
naturalistic and context-heavy interactions where emotional expressions and their impact on measurable
outcomes can be further studied.
Negotiations involve a mixture of cooperation and conflict, and thus evoke a wide array of feelings
and expressions. Prior research suggests that facial expressions during a negotiation reveal whether a
negotiator is trustworthy [55], if they have high or low aspirations [89], how much they care about specific
issues [32], how satisfied they are with a given offer [27], and if they will accept an offer or walk away [69].
Negotiation exercises are commonly used to study emotion processes because they evoke rich multimodal
behavior, but also because this behavior can be connected to objective information such as if a negotiator
is trustworthy, if they have high or low aspirations, what they actually want, and if they accept or reject
a specific offer.
Negotiation is also an important domain for AI in general, given the growing interest in creating agents
that negotiate with people [30]. Already, commercial companies develop products that negotiate on the
behalf of users over the phone [91] or text chat [21], and research is expanding these algorithms’ “emotional
intelligence.” This includes both endowing algorithms with the ability to “read” affective signals, but also
how to use synthesized affective signals to teach skills [11], extract concessions [60] or convey trust [19].
Such applications will benefit from high-quality corpora of human negotiation behavior.
However, negotiations tend to be less structured than many other social dilemmas that are commonly
studied within affective computing. As such, in addition to studying the illocutionary function of expressions, this study also makes a scientific contribution in the form of extracting meaningful information from
facial expressions in conversational tasks like a negotiation. I do this by focusing on analyzing facial reactions to a negotiation partner’s utterance. There are several reasons for focusing on such reactions. From
the perspective of emotion theory, emotions are short-term reactions to specific actions in the world. In
a conversation, speakers perform actions through their words. Specifically, speech acts are verbal actions
that accomplish something in a conversation (we greet, insult, offer, reject). From a communications perspective, verbal and nonverbal reactions to a partner’s speech serve important interactional functions such
as providing evaluative feedback (sometimes called backchanneling) or turn management. This research
can be beneficial when designing artificial agents as well, as attending to expressive reactions could help them predict a user's subsequent behavior and thus inform their own actions towards the user. Towards this end, and following the lead of Scarantino [76], I will examine statistical regularities between partner
speech acts and listener facial reactions.
This chapter is organized as follows. First I will give details on the negotiation corpus used for my
analysis in section 4.1. In section 4.2 I describe the method of processing the data for the analysis. This
general analysis on expressions within the context of a negotiation is discussed in detail in section 4.3.
Following the analysis I will address the illocutionary function of expressions in more detail in section 4.4.
Finally section 4.5 contains the conclusion of this chapter.
4.1 The DyNego Negotiation Corpus
As mentioned previously, in order to study the illocutionary function of expressions, I switched to a different dataset. Instead of studying this within the constructs of a simple structured social task, such as the
iterated prisoner’s dilemma, I looked at a more free-form naturalistic social task. The prisoner’s dilemma
allowed me to gain understanding of how context may affect expression perception; however, such a rigid task might not allow participants to express themselves as freely as they may wish to. Therefore I explored the illocutionary function of expression in the DyNego Corpus, a dataset of a multi-issue bargaining task in the form of a negotiation. This dataset consisted of data obtained from three prior multi-issue negotiation studies. All studies received ethical approval from the University of Southern California's Institutional Review Board. The dataset was initially only available internally; however, as part of my work
on the corpus, I made the dataset partially available as the DyNego-WOZ Corpus alongside the journal
article that also describes the results of the study in this chapter [38].
4.1.1 Negotiation Studies Overview
The three studies had similar designs involving a multi-issue dyadic negotiation between a person and a
VH. Participants were told to assume the role of an antique dealer and negotiated (for up to 10 minutes)
with another dealer over the contents of an abandoned storage locker. All three negotiations involved the
division of six items within the locker with systematic variations in the value of different items. Participants
were randomly assigned either a male or female VH partner, both named Alex, and saw the locker contents
arrayed on a virtual table (see Figure 4.1). To incentivize participants to perform well, they received a base
compensation for their time ($30 USD), but could earn additional money based on the items they obtained
in the negotiation (i.e. how well they performed in the negotiation). Participants in each study received
a payoff table denoting the value of each item and any points obtained in the final deal were converted
into lottery tickets that were entered into a $100 USD lottery.
Figure 4.1: The female and male versions of the VH used in the studies.
Table 4.1: Distributive condition payoff

              Records/Chairs (3)   Lamps/Plates (2)   Painting/Clock (1)
Participant          30                  15                   0
VH                   30                  15                   0

Table 4.2: Integrative condition payoff

              Records/Chairs (3)   Lamps/Plates (2)   Painting/Clock (1)
Participant          10                  30                   0
VH                   20                  10                   0
The second and third studies followed the first study's design, but participants received light negotiation training. In the second study, some
participants read a case file describing a previous similar negotiation. In the third study, some participants
were given personalized feedback about their performance after finishing a negotiation. Across all studies,
the behavior of the VH was controlled by wizards following the same detailed script outlined in S1.
Research in negotiation emphasizes that negotiation processes and outcomes are strongly determined
by the structure of the payoff participants receive from the items under discussion. In distributive (or
win-lose) negotiations, parties' interests are in conflict (i.e., they both want the same items). In integrative negotiations, parties have complementary interests allowing win-win solutions, although they may not realize it.
Figure 4.2: A sample of expressive participants as captured by the webcam during the study. From left to
right: Cluster 2-smile, Cluster 3-brow raise, Cluster 4-mouth open, Cluster 5-lip tighten.
Across the three studies, participants were randomly assigned one of two payoff matrices defining either a distributive (see Table 4.1) or integrative (see Table 4.2) structure. Participants were not told the VH's value of the items in either condition; however, they were encouraged to learn about the agent's
preferences by interacting with the agent through the negotiation. Participants received the points if an
agreement was reached. If no agreement was reached within the 10-minute time limit, participants only
received points equal to one of their highest value items.
The actual contents of the storage locker also varied across conditions, although this was only a superficial change as the overt value of items was solely determined by the payoff matrix: participants either
negotiated over one clock, two plates and three chairs, or negotiated over one painting, two lamps and
three crates of records.
A total of 225 participants (100 male, 125 female) were recruited across the three studies through Craigslist.com: 90 in the first study, 76 in the second and 59 in the third. Participants typically completed two negotiations and filled out a number of personality questionnaires and post-study questions. Including questionnaires, the study took a maximum
of one hour.
4.1.2 General Study Procedure
Before the start of an experiment, participants gave their informed consent and filled out a set of
initial surveys. Participants were then instructed about the task by the experimenter and they also received
an instruction sheet which showed the relative value of each item for them. Depending on the condition
(Integrative or Distributive), the VH's item values would be the same or different; however, participants were not informed of this. After this, participants were told to sit down behind a table facing a large monitor. The VH was displayed on this monitor throughout the study. Participants used a mouse to interact with the items on screen, and a digital timer on the table displayed the remaining time in the negotiation.
The participants were recorded by two cameras. The first was a camera on a raised tripod placed behind the monitor; it filmed over the monitor and was angled slightly down to record the full upper body of the participant. The second camera was a webcam situated on the monitor; it faced the participant more directly and provided a closer view of their face. An example of the camera footage can be found in Figure 4.2. Participants wore a small wired microphone in order to collect a high-quality audio recording of their voice. Throughout the study the monitor displayed the VH sitting behind a table similar to the participant's. On this virtual table, cubes were displayed; each cube had an image on it representing one of the items that were part of the negotiation (see Tables 4.1 and 4.2).
After finishing the negotiation, participants would fill out a set of surveys relating to the negotiation
and their impression of the VH. After this, participants would negotiate again with the VH over a different
set of items, and in some cases in a different condition (Integrative or Distributive). Finally, the
participants filled out the final set of surveys relating to the second negotiation and were debriefed about
the study.
4.1.3 Negotiation structure and data
In each negotiation, participants read the introductory framing of the scenario, were assigned to a distributive or integrative negotiation, and received their portion of the payoff matrix. Participants were
not informed of the value of items for the VH, but could figure this out during the negotiation. During
the entire negotiation session, the participants could talk to the agent without any restrictions. In general,
most participants would start off the negotiation by greeting the agent and introducing themselves, and then
move onto the negotiation. The negotiation generally would involve either the participant or the agent
proposing some distribution of the items; the other party could then respond to the proposed offer by either accepting or rejecting it. Rejections could further be split into a complete rejection of the offer, or a counteroffer in which the offer would be adjusted to some extent and presented to the party who gave
the initial offer. The other main interaction open to participants was using their mouse to move the cubes
that represented the items. Both the participant and the VH could assign items by moving these cubes to
either the VH’s or the participant’s side of the table, in order to represent their offer. Two experimenters
controlled the VH during each session in a Wizard-of-Oz style study. One experimenter controlled the
agent’s speech and other verbal behavior, while the other controlled the agent’s nonverbal expressive behavior, such as facial expressions and gestures. These experimenters were seated in a separate room from
the participant, but were able to see and hear the participant through a video feed. The experimenters
were trained to make the avatar behave as realistically as possible; they also received instructions on
which items to prioritize in the negotiation. The full wizard instructions can be found in S1. In general the
experimenters were instructed to try and win the negotiation by maximizing their own score.
Video footage of the participant was recorded during the study by using the webcam. This footage
was further analyzed with facial expression software; more details about this can be found in Section 4.2.2.
In order to record their speech, the participants wore a microphone during the study. The VH’s speech
was also recorded separately. Both the participants’ and the VH’s speech were automatically analyzed
and annotated, giving me detailed insight into speech patterns such as the type of phrases (e.g. rapport
or accepting/rejecting an offer). I also looked into identifying offer values through speech, although the
accuracy of this was somewhat low. Therefore, I focused the analysis on the response to offers presented by
the VH, which was more accurate (more details are found in section 4.2.1). Finally, the speech annotations
also indicate when participants were speaking. This information can be useful for the facial expression
analysis as well, because speech can affect how facial expressions are displayed and interpreted.
4.2 Method
In this section I will describe how I processed the data within the DyNego Corpus. As the data initially
consisted of just raw audio and video, both of these channels needed to be processed in order to perform analysis of the illocutionary function of the participants' expressions.
4.2.1 Audio Processing
Audio processing was divided into two steps. The first involved automatically annotating the audio. In
order to do this I used a framework specifically designed for annotating negotiation dialogue. Once I
obtained the automatic annotations, I further processed these in order to extract meaningful acts, such as offers, responses to offers and the specific values of these offers.
4.2.1.1 Automatic Annotation
The automatic annotation system takes as its input the captured audio of the VH and the captured audio
of the participant during the negotiation. It then proceeds through a series of natural language processing
steps that ultimately output recognized dialogue acts for both the VH and for the participant. These processing steps include: prosody analysis (pitch, power, and spectral stability [81]); voice activity detection;
inter-pausal unit (IPU) detection [44]; automatic speech recognition with Baidu’s DeepSpeech2 [2]; forced
alignment (which synchronizes words to the speech signal and provides accurate word-level timing information); dependency parsing; and dialogue act segmentation and recognition within a BiLSTM-CRF architecture [56]. The semantics of dialogue acts are represented in a detailed collection of key-values, including
key-values that represent the type of dialogue act (following the general-purpose and dimension-specific
function taxonomy of [42]), the propositional content of the dialogue act, the specific division of items
being discussed, and others. For the analysis, I clustered the detailed dialogue acts from the automatic
annotations into 16 categories.
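Purely for illustration, one recognized dialogue act might be represented as a key-value structure along the following lines; the specific keys and values here are hypothetical and do not reproduce the framework's actual schema.

```python
# Hypothetical key-value representation of a single recognized dialogue act;
# the real framework's keys and value taxonomy [42] may differ.
dialogue_act = {
    "speaker": "VH",
    "start_time": 184.2,               # seconds, from forced alignment
    "end_time": 187.9,
    "general_purpose_function": "offer",
    "dimension": "task",
    "proposition": {                   # division of items being discussed
        "records_to_vh": 3,
        "lamps_to_vh": 1,
        "painting_to_vh": 0,
    },
    "category": "Make offer",          # one of the 16 clustered categories
}
```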
Figure 4.3: Total occurrences of all agent acts.
4.2.1.2 Event Parsing
Through the automatic annotations, I gained some information on what people and the VH said specifically
at a word level, as well as the timing of each specific word. Additionally, I have information on the full dialogue act at an utterance level. This information can be used to separate acts based on their content: for example, if the "dimension" information of an act carries the "timeManagement" or "turnManagement" tag, this implies that the act was most likely used for turn keeping. In general, the accuracy of correctly
identifying both the words and the dialogue act is higher for the VH than the participants. By looking
specifically at the VH’s speech, I can use this information to further classify the speech acts into several
ontological categories, thus allowing me to determine whether there are differences between these. In total
I defined 16 of these categories. Figures 4.3 and 4.4 show the occurrences and the distribution for acts of
these categories across the negotiation.
Figure 4.4: Heat map of the distribution of agent acts throughout negotiations. The x-axis represents time
throughout the negotiation, with the start of the negotiation on the left. The color represents the ratio of acts occurring at a specific time (dark red representing close to all speech acts that have occurred, darker blue representing a low rate).
• Assert preference (self): Captures preference statements of the participant. This only captures statements that specifically relate to the participant. E.g. "I want the chairs."
• Assert preference (general): Captures general statements about preferences, as opposed to the
previous category. E.g. “We both want the lamps.” For the analysis I combine this category with the
‘assert preference (self)’ category.
• Request preference: This captures statements and questions that request the preference of the
other. E.g. “Which item do you want most?”
• Persuade preference: These are statements that try to persuade the other. E.g. “This is a good
deal.”
• Make offer: These are statements where an offer is being made. These can both be a full offer
including all items in the negotiation, or a subset of these.
• Accept offer: These are statements where an offer is accepted. In the case of a full offer, this
generally means the negotiation has finished.
• Reject offer: Statements where the offer is rejected.
• General approach: This captures statements about the general approach in the negotiations. This
includes summarizing the scenario, which the VH specifically may do if the participant is confused
and also statements such as “Let’s focus on the value of the items, not the number of items”.
• General evaluation - Agree: These are statements where the speaker agrees with a statement,
excluding statements where the speaker accepts an offer.
• General evaluation - Disagree: These are statements where the speaker disagrees with a statement, excluding where an offer is rejected.
• General evaluation - Uncertain: These statements are made when the speaker is not sure (e.g.
“I’m not sure”). This also occurs when the agent is debating whether to accept an offer (e.g. “I don’t
know if I would really have enough space for the clock”).
• General evaluation - Surprise: These statements are made when the speaker is surprised, they
tend to be exclamations such as “Oh really?”
• General evaluation - Fair: These statements relate to wanting to be fair or implying that an offer
is fair, a fairly common strategy taken in the negotiations (e.g. “I think that’s pretty fair”)
• Rapport: These are rapport building statements, some examples are introductions done at the start
of the negotiation (e.g. “I’m Sam. What’s your name?”), thanking the other and saying farewell at
the end of the negotiation.
Table 4.3: Participant verbal response time (seconds) to agent speech

Speech Act             Resp. time M   Resp. time σ
Assert preference          3.369          5.17
Request preference         1.900          2.43
Make offer                 5.982          2.43
Reject offer               4.413          6.86
Accept offer               2.062          2.77
Rapport                    1.529          2.57
Backchannel                4.946          5.71
Turn-keeping               7.286          6.75
• Backchannel: These are short verbal bursts produced by listeners during the partner’s speech (e.g.
“oh,” “ok,” etc.)
• Turn keeping: This captures filler words used by the agent with the goal of keeping its speaking
turn (e.g., “Well...” and “But...”). The agent will often use these phrases while receiving an offer, as
well as when it is deciding to present an offer itself.
Some events occur more commonly than others, e.g. “turn keeping” events are much more common
than “make offer” events. As such, I first looked at the overall distributions of the ontological categories
used by the agent, in order to gain a better understanding about the distribution of these events. As
shown by Figure 4.4, most of the acts are fairly spread out throughout the negotiation, with two notable
exceptions: "Accept offer" occurs largely at the end of the negotiation, while "Rapport" speech acts occur
mostly at the very beginning. This makes sense in the context of a negotiation, which generally starts off
with introductions where speech acts will mostly fall within the “Rapport” category. A negotiation will end
once the offer has been accepted, as such the “Accept offer” acts will fall at the very end of the negotiation.
There are also some more subtle dynamics within the events. For example “Request preference” acts occur
more often in the beginning, as the agent is first trying to gain some understanding on the items that the
participant wishes to have. “Backchannel” events also occur more at the beginning, perhaps implying that
the agent is allowing the participant to speak more, in order to explain their preferences. Near the end of
the negotiation “make offer” events become more common, showing that after gaining an understanding
of a participant’s item preferences, the agent starts making offers. “Reject offer” also occurs more often
near the end, indicating that the participant, too, presents more offers near the end of the
negotiation.
4.2.2 Visual Processing
I processed the corpus’ videos (captured at 30 frames per second) to extract facial expression data. By using
a commercial software system based on CERT [53], I extracted a continuous measure of the participant’s
Action Units (capturing the measure for each individual frame, i.e. 30 measures per second). I then calculated facial expression factors, using the approach described by Stratou et al. [85]. They performed factor
analysis on action units shown by people in lab studies, allowing them to find groups of action units that
often co-occur. Through the analysis they found 6 of these groups, by order of importance these groups
are: ‘enjoyment smile’ (Factor 1/F1), ‘eyebrows up’ (F2), ‘open mouth’ (F3), ‘mouth tightening’ (F4), ‘eye
tightening’ (F5) and ‘mouth frown’ (F6), see S2 for further details. Following this I used the model by
Lei et al. [52] in order to obtain a continuous measure of the participant’s expressivity. Because CERT,
and by extension the expressivity measure, is affected if the target is speaking, I only use data for when
the participant is not speaking, as indicated by the automatic annotations. Although I have expressivity
measures for the duration of the entire interaction, for the analyses in this chapter I will be focusing on
the expressive reactions of participants to speech acts of the VH. I do this by looking at a 6 second time
frame in the data that is anchored around the time that the VH finishes speaking. This time frame includes
the 1 second before the end of speech, as people may show some expressive reactions before the VH
finishes speaking. The time frame then continues for 5 seconds after the end of the VH’s speech, for a total
of 6 seconds.
Figure 4.5: Heat map of participant expressivity measure in reaction to each act over time. The x-axis
represents time throughout the negotiation, with the start of the negotiation on the left. The color represents the expressivity (darker hues of blue representing low expressivity, lighter blue and red representing high expressivity). The colored area represents the standard error of the mean.
around the end of the agent’s speech act (see figure 4.7), as such my data captures the moment when a
participant would generally start showing an expressive reaction to the speech. I chose the the 5 second
time frame, as most participants took on average around 6 seconds to react (see table 4.3). Because I did
not include the expressive signal after participants started speaking, this allowed me to keep the majority
of the data.
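A minimal sketch of this windowing step is shown below; the frame-level column names are hypothetical, and truncating the window at the moment the participant starts speaking reflects the exclusion of speech-affected frames described above.

```python
import pandas as pd

def reaction_window(expressivity: pd.DataFrame, vh_speech_end: float,
                    participant_speech_start: float) -> pd.DataFrame:
    """Return the 6 s reaction window around the end of a VH speech act:
    1 s before the VH stops speaking until 5 s after, truncated when the
    participant starts speaking (column names are hypothetical)."""
    start = vh_speech_end - 1.0
    end = min(vh_speech_end + 5.0, participant_speech_start)
    mask = (expressivity["time"] >= start) & (expressivity["time"] < end)
    factor_cols = [f"F{i}" for i in range(1, 7)]
    return expressivity.loc[mask, ["time", "expressivity"] + factor_cols]
```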
4.3 Analysis and Results
In this section, I will give details on how the automatic annotations and the facial expression data were
combined, in order to analyze the use of facial expressions in a negotiation, as well as give details on the
analysis itself.
Figure 4.6: Clustered expressivity by act. The Y-axis represents the ratio of each expression for the speech
act.
4.3.1 Expressive events
In order to analyze a participant's expressive events, I first had to find situations where participants were in fact expressive. As during large parts of the data the participant might be talking, listening or thinking about their next move, participants are expressive to varying degrees throughout the interaction.
In order to find expressive events I define expressivity based on the work of Lei et al. [52], who created
an expressivity measure by training a random forest model on video clips of people playing an iterated
prisoner’s dilemma, which were rated on expressivity by various annotators (see S3 for further details).
Furthermore, since I have the timing information of all the speech acts that occurred in the negotiations, I
can analyze the participants’ expressivity in relation to various speech acts falling within specific ontological categories. Thus, allowing me to gain further insight into whether there are any differences between
these ‘expressive events’.
Since VH speech was identified more accurately by the automatic annotations, I will anchor the expressive events around the VH speech events for this analysis. For example, a “Make offer” event is based on
the VH presenting an offer to the participant, and thus captures the participant's response to this offer. I can categorize each expressive event by the ontological category of the agent's utterance prior to the expression. This gives an impression of what types of utterances give rise to expressive events. Figure 4.5 shows a heat map of the average expressivity of reactions to acts by intervals of 0.1 throughout the interaction. You can see that overall the reactions to each event do not change much throughout the interaction (e.g. people
show similar expressive reactions to “request preference” acts throughout the negotiation). However, there
are two notable exceptions: “accept offer” and “rapport.” Based on figure 4.4, you can see that these two
acts mostly occur either at the very beginning (for "rapport") or at the very end (for "accept offer"). Due to the low number of occurrences of these two acts throughout most of the negotiation, it is hard to show significant differences in expressivity. However, we do see that the "rapport" act is very expressive at the beginning, while
“accept offer” also shows reasonably high expressivity at the end (i.e. the moments in the negotiation when
these speech acts often occur), indicating that these events tend to elicit more expressive reactions than
the other acts. We can also see some differences between expressive reactions between other acts, “make
offer” and “turn-keeping” appear to have less expressive reaction, whereas “request preference” appears
to be more expressive overall.
Although the heat map in Figure 4.5 shows that people are expressive in response to many types of
different events, I wanted to get some further insight into this. In order to do so, I performed k-means
clustering with k=5 using the maximum velocity factors described in Section 4.2.2, in order to find the
most commonly occurring expressions. I found four clusters related to specific patterns of expressions (see Figure 4.2): 'smile', 'brow raise', 'press lips' and 'open mouth'; the final cluster contained the remainder of expressions. Note that a different number of clusters might give different patterns. I used k=5 as it gave a low sum of squared errors while also keeping the number of clusters low.
Figure 4.7: Expressive reactions to various speech acts by the VH.
As shown in Figure 4.6, the frequency of these clusters differs depending on the partner's speech act. Smiles are common across most acts, but
especially following rapport and accept offer acts. Lip pressing occurs after the VH asserts/requests preferences, accepts an offer and backchannels. Brow raises occur after assert preference and reject offer acts.
This shows that while people are expressive after partner speech acts, the nature of these expressions can
yield additional information.
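A sketch of the clustering step is shown below; the file name and the use of feature standardization are assumptions not specified in the text.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# One row per expressive event; columns are the maximum velocities of the six
# AU factors within the 6 s reaction window (file name is hypothetical).
X = np.load("max_velocity_factors.npy")        # shape: (n_events, 6)

scaled = StandardScaler().fit_transform(X)     # standardization is an assumption
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(scaled)

labels = kmeans.labels_        # cluster assignment per expressive event
sse = kmeans.inertia_          # sum of squared errors used to choose k
```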
Whereas looking at the average expressive reaction to each act (as in Figure 4.5) can give some information about how the act is perceived, it does not capture everything. Expressive reactions can differ in their trajectory over time, i.e., do expressive reactions unfold differently over time? To get some further
insight into this, Figure 4.7 shows a subset of the acts and the expressivity in reactions over time. This
subset represents the eight most common acts in both the overall distribution and the distribution when
filtering events by requiring a minimum level of expressivity. Each of these acts is represented by a plot that shows the average expressivity over time based on the agent's speech act. The dotted line represents the point when the VH stops speaking (i.e., the plots start 1 second before the end of the speech act and end 5 seconds after the VH stopped speaking).
Figure 4.8: Participant mean expressivity towards different agent offers. Error bars represent standard
error. The X-axis is the value of the offer for the agent, i.e., at the far right the agent receives 66% of the items; values in parentheses are the total number of offers (offer values with fewer than 20 total occurrences were
not considered). The Y-axis shows the participant’s expressivity. This plot shows the curvilinear nature of
expressivity towards offers based on values. Regression analysis showed there was a significant curvilinear
effect (p=0.002).
By plotting these values over time, you can see differences
in the types of expressive reactions to various acts.
Looking at expressivity shows that most acts evoke facial reactions, as shown by the sudden increase
in expressivity as the VH finishes speaking; however, both backchannel and turn-keeping acts do not evoke reactions, which is in line with these acts being underrepresented in the acts filtered by expressivity. There are further differences between the acts that do evoke a reaction. For example, "make offer" has a relatively low increase when looking at the average expressivity over time, while "accept offer" and the
“rapport” speech acts seem to create the largest expressive reaction on average.
Figure 4.9: Participant response to offers by agent value with linear regression. Similar to Figure 4.8, the X-axis represents the offer value to the agent, with the total number of occurrences of an offer in parentheses (total numbers of offers differ from Figure 4.8 as offers with no direct participant response are not considered). The Y-axis shows the percent of responses.
4.3.2 Offer Analysis
The main component of negotiation is making and responding to offers. Therefore, I further analyzed
expressive reactions during offers. I focus the analysis on the VH’s “make offer” acts and analyze the
expressive reactions to these to determine if differences in reactions might reveal how the offer is evaluated.
Presumably the quality of the VH's offer is important for this evaluation. To examine this, I split offers by
the division of items. Since there are 6 items total, a fair split (ignoring item value) would be that both
sides receive 3 items each, so each side gets 50% of the items. But the agent may often offer the participant
a better or worse split.
Figure 4.8 shows that the strength of participants' reactions differs as a function of the value of the VH's offer, suggesting they contain information about how the offer is evaluated. The x-axis displays value for the participant: i.e., if the participant receives 66% of the items, the VH receives 33%.
Figure 4.10: Participant expressivity over time, averaging over data depending on whether the participant
accepts, rejects or presents a counter offer. Data is anchored around end of VH utterance, showing the
average expressivity 1 second before end of speech until 5 seconds after.
The graph shows a curvilinear pattern: expressivity is higher for good and poor offers, whereas fair offers (a 50/50 split) elicit less
expressive reactions. I performed a regression analysis on the data, showing that this curvilinear effect
was indeed significant (p=0.002). Looking further into other components of expressivity, we see a similar
curvilinear pattern for smiles (maximum velocity of factor 1). This implies that participants do not just smile when they are given good offers (i.e., when they receive 66% of the items), but also when they receive offers in which they do not receive any items.
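The curvilinear effect can be tested by adding a quadratic term to a regression on offer value, roughly as sketched below; the table and column names are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical per-offer table: the offer value (as plotted in Figure 4.8) and
# the mean expressivity of the participant's reaction to that offer.
offers = pd.read_csv("offer_reactions.csv")
value = offers["offer_value_pct"].to_numpy(dtype=float)
expressivity = offers["expressivity"].to_numpy(dtype=float)

# Linear plus quadratic term; a significant quadratic coefficient indicates a
# curvilinear (U-shaped) relation between offer value and expressivity.
X = sm.add_constant(np.column_stack([value, value ** 2]))
fit = sm.OLS(expressivity, X).fit()
print(fit.params, fit.pvalues[2])   # p-value of the quadratic term
```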
Another way to see if expressive reactions contain information is to see if specific expressions reveal
the participant’s intentions toward the offer. Obviously, these intentions will be shaped by the quality of
the offer, but perhaps expressions will reveal additional information. Figure 4.9 shows, indeed, that people
are more likely to reject poor offers (note that participants tend to see more poor offers as the wizard of the
VH was instructed to make competitive deals and only move towards better offers if necessary). Figure 4.10
shows that the temporal dynamics of participants' reactions also differ as a function of their decision (accept,
reject, counter) and time. The plot shows the 6-second reaction starting one second before the end of
the VH’s speech act (at -1) and ending five seconds after the VH has stopped speaking. Again, the plot
shows that participants begin to react before the end of VH speech (but not when they subsequently make
a counter-offer). Participants show the strongest and most persistent reactions when they subsequently
reject the VH’s offer.
4.4 Modeling the illocutionary function of expressions
Based on the analysis, expressions seem to reveal information about how participants are evaluating the
VH’s speech. For example, Figure 4.8 shows that participants show varying degrees of expressivity and
that the velocity of Factor 1 changes based on the perceived value of the offer for the agent. Furthermore, as
shown in Figure 4.10, after the agent presented the participants with an offer, participants show expressive
reactions. However this reaction is more intense when participants are going to reject than when they
accept or present a counter offer. These results are in line with the work of Scarantino [76], giving further
evidence for the existence of an illocutionary function of facial expressions.
As such, it may be possible to use the expressivity and its components in machine learning models to
predict how participants intend to respond to the VH's speech. In this section I will give an overview of
some machine learning experiments that were performed to investigate this.
4.4.1 Predicting Expressive Intent
The goal of the machine learning experiments was to predict the participant's decision to accept or reject an offer. I framed this decision as a three-class problem: in addition to accepting and rejecting, the participant can also modify the offer in a 'counteroffer.' As a baseline I use the majority vote, which in this case is to choose to reject the
offer. This baseline gives us an accuracy of 0.53, showing the imbalanced nature of the dataset used for the
classification, with the large majority of decisions being to reject (followed by presenting a counteroffer; accepting offers is the least common decision). Due to the naive approach of the baseline, the F1-score is low,
reaching only 0.23.
I used several sets of features for the machine learning approach. First, I defined a set that represented the offer value. This was done by using the percentage of items allocated to the participant and to the VH, similarly to how I represented the offers in Figure 4.8, as two separate features ('participant item %' and 'agent item %').
For the facial expressivity features, I use the values captured over the same time frame as shown in
Figure 4.10, starting 1 second before the end of the agent speech until 5 seconds after, for a total of 6
seconds. As shown in Table 4.3, the average time for a participant to respond to an offer is 5.98 seconds
(sd=2.43). As such many of the participant’s spoken responses lie outside this time frame.
For the facial expression feature set, I used ‘dynamic’ measures to capture facial expressions. Rather than simply describing the expressions by their static average (as is commonly done), I use measures that capture the specific dynamics of the facial expression, which are an important component of expressivity. To represent these dynamics I use the velocity of the Action Unit Factors within the 6-second windows, and more specifically the maximum velocity. Secondly, I also use the expressivity itself as a feature. Finally, I combine the offer features and dynamic face features into
a set that uses both the information embedded in the facial expressions and in the offer values.
In summary, the following measures are used: the participant item % and agent item % as the ‘offer
value’ features, the average AU factors as the ‘static face’ features and the max velocity AU factors and
expressivity as the ‘dynamic face’ features.
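As an illustration, below is a minimal sketch of how such dynamic face features could be computed from per-frame signals. The array layout, variable names, and the 30 fps frame rate are my assumptions rather than details from the corpus, and velocity is approximated here as the frame-to-frame change of each AU factor.

```python
import numpy as np

FPS = 30  # assumed frame rate of the recordings

def dynamic_face_features(factor_traces, expressivity, end_of_speech_frame):
    """Dynamic face features for one offer event.

    factor_traces : (n_frames, n_factors) array of per-frame AU factor scores
    expressivity  : (n_frames,) array of per-frame expressivity values
    end_of_speech_frame : frame index at which the agent stops speaking
    """
    # 6-second window: 1 s before the end of agent speech to 5 s after
    start = max(end_of_speech_frame - 1 * FPS, 0)
    end = min(end_of_speech_frame + 5 * FPS, len(expressivity))
    window = slice(start, end)

    # Velocity approximated as the absolute frame-to-frame change of each factor;
    # the maximum within the window is kept as one feature per AU factor.
    velocity = np.abs(np.diff(factor_traces[window], axis=0))
    max_velocity = velocity.max(axis=0)

    # Expressivity over the same window, summarized here by its mean
    mean_expressivity = expressivity[window].mean()

    return np.concatenate([max_velocity, [mean_expressivity]])
```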
Due to the imbalanced nature of the dataset, I used Synthetic Minority Oversampling Technique
(SMOTE) [15] for training the models. Lastly, for hyperparameter tuning I used grid search cross-validation with F1-score as its metric. The results of the experiment are shown in Table 4.4. Due to the
Table 4.4: Classification results of majority vote and Random Forest Classifier (RFC) on the 3-class (accept, reject, counter) problem

                      Original               Oversampled
Feature set           F1-score   Accuracy    F1-score   Accuracy
Majority vote         0.23       0.53        0.17       0.33
Offer-only RFC        0.36       0.39        0.43       0.46
Dynamic face RFC      0.38       0.43        0.78       0.79
Offer & face RFC      0.43       0.50        0.81       0.81
imbalanced nature of the dataset, I also ran the experiments on a rebalanced dataset, created by randomly oversampling the two minority classes (accept and counter-offer). Looking at the original data, all experiments beat the majority vote baseline F1-score of 0.23, with the combined offer and face model performing best with an F1-score of 0.43. Although the majority vote performs best in accuracy, the accuracy of the combined offer and face model is very close to it. Due to the imbalanced nature of the dataset I argue that the F1-score is a better metric of performance in this case. Furthermore, when looking at the oversampled data, we can see that the combined offer and face
model, as well as the face-only model, perform far better than using a naive approach like majority vote or
using only offer-value information. This indicates that facial expressions contain important information.
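As a concrete illustration of the training setup described above, the sketch below uses scikit-learn and imbalanced-learn, with SMOTE applied only inside the training folds and an F1-based grid search over the random forest. The feature matrix X and label vector y are assumed to have been assembled as described above, the specific grid values are illustrative, and the weighted F1-score is my assumption for the multiclass selection metric.

```python
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# X: offer-value and/or dynamic face features per offer; y: accept / reject / counter
pipeline = Pipeline([
    ("smote", SMOTE(random_state=0)),             # oversampling applied only within training folds
    ("rfc", RandomForestClassifier(random_state=0)),
])

param_grid = {                                     # illustrative values for the four tuned parameters
    "rfc__n_estimators": [100, 300, 500],
    "rfc__max_depth": [None, 5, 10],
    "rfc__min_samples_split": [2, 5, 10],
    "rfc__min_samples_leaf": [1, 2, 5],
}

search = GridSearchCV(pipeline, param_grid, scoring="f1_weighted", cv=10)
search.fit(X, y)

# The fitted forest exposes the feature importances examined in the next section
importances = search.best_estimator_.named_steps["rfc"].feature_importances_
```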
4.4.2 Feature importance
Because the best performing RFC model was a hybrid model using both offer information and facial expressions, I was interested in which features were most important in the model. Was the model mostly using the offer items to determine the outcome, or was it actually leveraging the predictive power of the expressions? To answer this, I looked at the feature importances embedded within this model on the original
data. As shown in Figure 4.11 the model appears to largely rely on the expressivity value, with a weight of
over 0.25. Following this are the max velocity of factors 2 (eyebrows up) and 4 (mouth tightening). After
Figure 4.11: Feature importance of the random forest classifier.
this is the number of items assigned to the agent with a weight of around 0.10. As such, it appears that the
model is mostly attentive to the facial expressions when determining a participant’s response to an offer.
4.5 Conclusion
In this chapter I looked at the illocutionary function of facial expressions within the domain of a negotiation. The negotiation domain proved useful for studying this communicative function of facial expressions in a naturalistic setting. I show that participants display strong expressive reactions to their partner’s speech (Fig. 4.7), and that the nature of these expressions differs as a function of the content of that speech (Fig. 4.6). Many speech acts, such as accept offer, rapport and turn-taking, elicited strong smile responses, whereas smiles were less prevalent in negotiation acts such as assert preference and request preference. Finally, I show the illocutionary function of these reactions, as they can be used to predict the
expresser’s subsequent actions (Table 4.4).
Although predicting behavior is a difficult task, there appeared to be valuable illocutionary information
contained within the expressivity signal of participants that can be used to help with this task. To demonstrate the usefulness of expressivity, I used expressivity features in a random forest classifier. I used grid search cross-validation across the number of estimators, the maximum depth of the trees, the minimum sample split and the minimum samples per leaf to obtain the best model, and trained the models using 10-fold cross-validation across all data. Using these facial features, the RFC outperformed a majority vote approach, as
well as an RFC that used offer-value information in F1-score. A fusion model using both facial features and
offer-value performed best in F1-score. By looking at the feature importances of this model, I learned that expressivity was the most important feature for determining the outcome.
Following these results on the illocutionary function of facial expressions within specific contexts, I
will look at the perlocutionary function in the following chapter.
Chapter 5
The influence of Context on the Perlocutionary Function of Expressions
The preceding chapters illustrated how incorporating context improves the accuracy of automatic emotion
recognition and predicting the social intentions behind these expressions. Finally, I turn to the impact of
these expressions on the observer. Again, context will prove crucial. Following Scarantino’s Theory of
Affective Pragmatics [76], expressions can be analyzed from multiple perspectives and this chapter adopts
a focus on expressive consequences. This is analogous to examining the perlocutionary function of speech
in linguistics, which refers to the effect a speech act has on the interlocutor. For example, after some sort of
transgression that may have upset the interlocutor, an expression of guilt from their partner might make
them feel better; similarly, a rousing speech by a coach might inspire the team to perform better.
To investigate the automatic prediction of expressive consequences, I will return to the split or steal
framework from Chapter 3. Specifically, I will highlight how facial expressions of one player shape the
decisions of their partner. From an algorithmic perspective, I will show how this can be used to predict
the partner’s responses. Perhaps more interestingly, I will highlight how algorithms might use this effect to shape the responses of a human partner.∗ For example, a virtual human might choose expressions strategically to elicit certain actions from their human partner.
As we will see, predicting people’s decisions in social dilemmas is a complex and multi-faceted problem. Even in simple two-person dilemmas, people don’t usually make rational decisions [57, 54] and are
∗This chapter describes work originally reviewed and published at ACM AAMAS 2017 [39].
influenced among other factors by affect, communication with their partners, and elements of reciprocity
[45, 83, 25]. Prior research has shown 1) that emotional expressions can telegraph a person’s cooperative
tendencies and 2) that observers use these expressions to inform their own decision-making [25, 47, 62,
88, 84, 78, 12].
I will discuss a computational model that predicts a prisoner’s dilemma player’s actions,
based not only on their opponent’s prior actions, but also on their opponent’s emotional expressions and
the context of preceding events. This model can be used to drive the actions of virtual and robotic agents
that can perceive or even participate in the exchange of strategic emotional displays, simulating the way
humans interact in these scenarios [18, 61]. The model could also be used to provide real-time decision
support to a decision-maker by analyzing the behavior of their opponent or to help teach students how to
attend to emotions in social interactions (e.g [10, 43]).
Specifically, I will incorporate immediate reactions or emotional displays of a player as input into the
decision model of the opponent in an iterated prisoner’s dilemma task. The model will be trained on data
from natural, unscripted interactions between pairs of human players and consider both the emotional
signaling by each player as well as the overall context of the game.
By building this predictive model of the opponent’s decision in the iterative social dilemma, I will be
able to investigate the intricate dynamics in iterative social dilemmas. For example, using the opponent
model in simulations, one can explore different strategies in different contexts. In this case, I will incorporate emotional signaling into the model, and am especially interested in the perlocutionary effect of these
emotional signals: how do the player’s facial expressions affect the outcome of the game, and can one form
strategies based on displayed emotion? I hypothesize that:
• Emotional signaling in combination with game behavior can influence opponent game acts
A further benefit of this model is that it can run in real time, so that these findings can be implemented in
a real-time agent system. Specifically I use computer vision to automatically detect and classify facial expressions in videos. This way, I can capture natural expressions from real interactions and can incorporate
realistic emotion displays in the model.
5.1 Method
While the DyNego study allows for more naturalistic interactions, the less structured design of the study
made it more complicated to deconstruct the behavior of participants. Since the perlocutionary function
depends on the behavioral response of a participant, instead of just their expressive response, understanding their behavior becomes a pivotal aspect of the study. The Wizard of Oz DyNego study was well suited
to studying the illocutionary function of expressions by using the accurately logged information of the
agent’s behavior during the study. This meant that the interaction context could also be defined using just
this information, while the expressions of the participants were detected automatically from the video.
However, when looking at the perlocutionary function of expressions, I am interested in the behavioral response of a person and need to be able to retrieve this behavior as accurately as possible. As opposed to the negotiation, the iterated prisoner’s dilemma of the split-or-steal framework allows for easy classification of both behaviors and the context of the situation, making it much more suitable for studying this concept. Furthermore, the more structured setup of the split-or-steal game allowed me to directly look at face-to-face
interactions (albeit computer-mediated), rather than using a Wizard of Oz design as was necessary for the
negotiation study. Therefore I returned to the Split-or-Steal framework corpus in order to study this effect.
5.1.1 Split or Steal Framework
As I return to the Split or Steal framework, I will give a brief overview of how the current data differ from the study described in Chapter 3. For a detailed overview of the framework and data see Section 3.1.
For this study, data were collected in a lab setting somewhat differently from the previous study.
Rather than in a room by themselves, participants played the golden balls game in a large room with rows
of desks and several computers, similar to a testing center for a standardized test. Each participant was
seated in front of their own computer, which was surrounded by a barrier. So while groups of participants were in the same room, they were seated such that they could not see their opponents. Furthermore, they were told not to speak and could only communicate through the Skype interface, similar to the previously described split-or-steal study.
Before playing the game, players filled out a demographic questionnaire and personality scales. They
then read instructions about playing the game and had the opportunity to ask the experimenter questions before the game started. After finishing the game they answered additional questions about their
experience playing the game.
Each player was paid $30 for their participation; additionally, they were motivated to maximize their
own score by having a chance to receive an additional $100 through a group lottery. In each of the 10
rounds, both players were offered 10 lottery tickets to divide amongst themselves. Table 5.1 shows the
payoff matrix used for the study. By gathering more tickets, players could increase their chances of winning the lottery. Based on each player’s decision whether or not to cooperate, the tickets were divided between the players (see Table 5.1).
In total, 370 participants completed the study. In addition to the synchronized videos of each player, the data consisted of the decision on each round (split or steal), including the time of this decision and the time of the
start of a new round. Using this time information it was possible to match the time of decisions to the
relevant webcam video footage. The video footage of the study was stored on a secure server for further
analysis.
Table 5.1: Round payoffs in the iterated prisoner’s dilemma

                    Other cooperate   Other defect
Player cooperate    5,5               0,10
Player defect       10,0              1,1
5.1.2 Automatic Behavior Extraction
In order to categorize the interactions that occurred during the studies, I used a set of behaviors similar to the ones defined in Section 3.3.2.2. As in that study, the data can be split into game behavior and expressed displays. The game behavior features capture the behavior of a participant and as such can be used to describe the current context of the prisoner’s dilemma. The expressed behavior features, on the other hand, are descriptors of the facial expressions displayed by participants.
Table 5.2 shows an overview of all features that were used for modeling, showing the overall categories
game behavior and facial displays.
Game behavior contains information on the current state of the prisoner’s dilemma, as well as the
decision making process of both the participant (self) and their partner (other). Since a participant’s future
decisions may be influenced by actions taken in previous rounds, one feature is the individual choice made by each player on the preceding round: cooperate (C) or defect (D). Secondly, I keep track of the game state of the current game. This game state is determined by the joint decision made by both participants on the preceding round, which directly determines the context: joint cooperation (CC), the participant is exploited by the opponent (CD), the participant exploits the opponent (DC) or joint defection (DD).
Additional features used for modeling the perlocutionary function include the current round number, the ratio of the participant’s score to the combined participants’ score, the cooperation rate of both participants over preceding rounds, and rates of how often each of the four game states occurred throughout the game (e.g., how often joint cooperation occurred in the preceding rounds).
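A minimal sketch of how these game-behavior features could be derived from the logged decisions is given below; the function and field names are hypothetical, and the payoff values follow Table 5.1.

```python
def game_context_features(self_picks, other_picks, current_round):
    """Context features for `current_round`, computed from rounds 1..current_round-1.

    self_picks, other_picks : lists of 'C'/'D' decisions over the preceding rounds
    """
    history = list(zip(self_picks, other_picks))
    n = len(history)
    payoff = {("C", "C"): (5, 5), ("C", "D"): (0, 10),
              ("D", "C"): (10, 0), ("D", "D"): (1, 1)}  # Table 5.1

    self_score = sum(payoff[pair][0] for pair in history)
    joint_score = sum(sum(payoff[pair]) for pair in history)

    return {
        "self_last_pick": self_picks[-1],
        "other_last_pick": other_picks[-1],
        "prev_state": self_picks[-1] + other_picks[-1],     # CC, CD, DC or DD
        "curr_round": current_round,
        "relative_score": self_score / joint_score if joint_score else 0.0,
        "coop_rate": self_picks.count("C") / n,
        "CC_rate": history.count(("C", "C")) / n,
        "CD_rate": history.count(("C", "D")) / n,
        "DC_rate": history.count(("D", "C")) / n,
        "DD_rate": history.count(("D", "D")) / n,
    }
```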
Table 5.2: Overview of extracted features used for modeling

Category          Feature           Description/encoding                Source
Game Behavior     Self last pick    C or D                              IPD event database
                  Other last pick   C or D
                  Prev. state       CC, CD, DC, DD
                  Curr. round       2, ..., 10
                  Relative score    player score / joint score
                  Coop. rate        % cooperation so far
                  CC rate           % of joint cooperation so far
                  CD rate           % of being exploited so far
                  DC rate           % of exploiting opponent so far
                  DD rate           % of mutual defection so far
Facial Displays   AU6               Cheek raiser                        FACET [53]
                  AU10              Upper lip raiser
                  AU12              Lip corner puller
                  AU14              Dimpler
                  AU17              Chin raiser
                  AU24              Lip pressor
Facial displays describe the participants’ expressed behavior, i.e., their facial expressions. Since facial expressions are the only means of communication during the game, a player’s decisions are shaped by how their opponent emotionally reacted to the previous outcome. To examine this, facial expressions were extracted
using a tool based on CERT [53], similarly to the previous study. However, rather than relying on the
expressivity measure as was done previously, I used the Action Unit evidence measure that is reported by
the tool directly. Action Units are components of a facial expression; e.g., action unit 12, or AU12 for short, denotes upturned corners of the mouth. The evidence values describe, on a logarithmic scale, the likelihood of a specific action unit being active, with positive values indicating that the software classifies the action unit as active and negative values indicating inactivity. These values were then averaged for each round using the mean evidence values between specific event timings, for example, while participants are
deciding what to do and when the joint decision is revealed. Similar to the first study, the strongest facial
expressions occurred following the reveal event of the game, so I used the average evidence in the first six
seconds after this event.
Since I relied on the action units rather than expressivity values for this study, I narrowed down the number of action units considered for analysis. This was done by using only facial action units that have
a high occurrence rate in the data as measured by their activation ratio. As such, I kept only the action
units with an overall activation of more than 25% of the data frames. This process left me with six AU
features in total: AU6 (cheek raiser), AU10 (upper lip raiser), AU12 (lip corner puller), AU14 (dimpler),
AU17 (chin raiser) and AU24 (lip pressor). Unlike the study in Chapter 3, here I focus on the emotional reactions preceding the reveal of the current round’s decisions, which allows the model to run in real time and predict decisions before they are revealed.
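The AU selection and windowed averaging could be sketched as follows, assuming the per-frame FACET output has been loaded into pandas DataFrames with a 'time' column in seconds and one evidence column per action unit; the column names are my assumptions, and the 25% activation threshold is computed here over whatever frames are passed in (in the thesis it was applied over the corpus as a whole).

```python
import pandas as pd

WINDOW_SECONDS = 6.0

def select_active_aus(all_frames: pd.DataFrame, threshold: float = 0.25):
    """Keep only AUs whose evidence is positive (active) in more than `threshold`
    of the provided frames."""
    au_cols = [c for c in all_frames.columns if c.startswith("AU")]
    activation_rate = (all_frames[au_cols] > 0).mean()
    return activation_rate[activation_rate > threshold].index.tolist()

def round_au_features(frames: pd.DataFrame, reveal_time: float, kept_aus):
    """Average evidence of the selected AUs over the six seconds after the reveal."""
    window = frames[(frames["time"] >= reveal_time) &
                    (frames["time"] < reveal_time + WINDOW_SECONDS)]
    return window[kept_aus].mean()
```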
Participant data was also screened using a threshold value based on the number of frames with a high confidence rating (i.e., the confidence of CERT that a specific AU is
present). Dyads with detection rates of less than 75% of the video were discarded from this work, leaving me
with 296 participants total. Finally, the first round of the IPD was excluded from the modeling process due
to its special circumstances, since participants at that point do not have prior game actions or expression
displays to base their decision on.
All of the features presented in Table 5.2 were extracted automatically and are available online, in real time, for evaluation of the model. In order to investigate the perlocutionary function of expressions, I will
computationally model these features. As such, this model will use the expressions of participants to try to determine the behavior of their partner following specific displays. A model such as this could
be used to predict what your partner might plan to do in a prisoner’s dilemma. When discussing these
behaviors I use the term self to refer to the participant whose facial displays are being described and the term
other to refer to the player’s partner whose next move the model is trying to predict.
5.2 Results
In this section, I will give the results of my attempt to model the perlocutionary function of expressions on
opponent decisions in the split-or-steal corpus.
5.2.1 Perlocutionary Act
In order to model participant behavior I use the available information in each independent round, such as
the previous events that occurred in the game and the player’s facial displays, with the goal of predicting the
game decision of their opponent in the next round. Specifically, the model uses the joint game decisions
and their results (e.g. current score and game state) and the player’s facial displays during the reveal of
the results of the previous round as inputs.
5.2.2 Method
Although a variety of machine learning techniques could be used to investigate my research question, I
used a Naive Bayes classifier with a kernel density estimate function in order to construct the model.
This classifier was chosen since the learned model is easily interpretable and because it readily facilitates
the social planning that I will discuss further in Section 5.3. To examine the contribution of different
modalities, I constructed several models using different groupings of features. I created a model using all features (both context and opponent expressions) and contrasted this with models trained on context alone or expressions alone. These models are further compared with two baselines: a naive baseline that uses the majority vote (in this case the model would always cooperate) and a model that employs
the tit-for-tat strategy by cooperating following the opponent’s cooperation and defecting following the
opponent’s defection.
The models were trained and tested using a leave-one-participant-out method. Within each fold, I first performed feature selection. Using Lasso regression, I tested several different feature sets based on specific degrees of freedom: 3, 5, 7, 10, 15 and 20. After selecting these different feature sets I used a Naive Bayes algorithm to construct the model.
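A simplified sketch of this pipeline is shown below: leave-one-participant-out folds, Lasso-based selection of a fixed number of features, and a Naive Bayes classifier whose per-class, per-feature likelihoods are kernel density estimates. The bandwidth, the Lasso penalty, and the use of the top-k absolute coefficients as the "degrees of freedom" selection are my assumptions, and X, y (integer-coded decisions) and the participant groups are assumed to be prepared as described above.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.linear_model import Lasso
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.neighbors import KernelDensity

class KDENaiveBayes(BaseEstimator, ClassifierMixin):
    """Naive Bayes with one 1-D kernel density estimate per (class, feature) pair."""

    def __init__(self, bandwidth=0.5):
        self.bandwidth = bandwidth

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.kdes_ = {c: [KernelDensity(bandwidth=self.bandwidth).fit(X[y == c, j:j + 1])
                          for j in range(X.shape[1])]
                      for c in self.classes_}
        self.log_priors_ = {c: np.log(np.mean(y == c)) for c in self.classes_}
        return self

    def predict(self, X):
        # Log-posterior per class: prior plus the sum of per-feature log densities
        scores = np.column_stack([
            self.log_priors_[c] + sum(kde.score_samples(X[:, j:j + 1])
                                      for j, kde in enumerate(self.kdes_[c]))
            for c in self.classes_])
        return self.classes_[np.argmax(scores, axis=1)]

def lasso_top_k(X, y, k, alpha=0.01):
    """Select the k features with the largest absolute Lasso coefficients."""
    coefs = Lasso(alpha=alpha).fit(X, y).coef_
    return np.argsort(np.abs(coefs))[::-1][:k]

# Leave-one-participant-out evaluation (X, y, groups assumed prepared)
predictions = np.empty_like(y)
for train, test in LeaveOneGroupOut().split(X, y, groups):
    selected = lasso_top_k(X[train], y[train], k=5)   # the thesis tested k in {3, 5, 7, 10, 15, 20}
    model = KDENaiveBayes().fit(X[train][:, selected], y[train])
    predictions[test] = model.predict(X[test][:, selected])
```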
5.2.3 Model Performance
Table 5.3 summarizes the performance of each model in terms of F1 score and overall accuracy, as well as
the degrees of freedom that gave the best results.
As shown in the table, a model trained on actions alone can outperform the naive baseline and the
tit-for-tat model, confirming the hypothesis that actions factor into the decision in the next round. While
a model based on expressions alone does not manage to outperform the tit-for-tat strategy, it still improves on the naive baseline in terms of F1 score. The best performing model used both
Table 5.3: Performance of baselines and different models

Model                     F1 score   Accuracy   DoF
Actions and expressions   0.742      0.744      5
Actions alone             0.739      0.742      5
Expressions alone         0.521      0.581      7
Tit-for-tat               0.691      0.708      -
Baseline                  0.375      0.601      -
context and facial expressions, confirming the hypothesis that opponent expressions enhance predictive
accuracy. This model yielded correct decisions almost 75 percent of the time.
The difference in performance between the model using both actions and expressions and the model using actions alone is small; however, a logistic regression analysis shows that the action and expression features each make an independent contribution to the prediction. Using this approach, both the actions-only model (p<0.001, coeff=-2.269) and the expressions-only model (p=0.008, coeff=-0.280) add a significant unique contribution to the prediction.
From the action features, the most commonly selected were oppLastPick, selfCoopRate, CC% and DD% (referencing Table 5.2). From the set of expressions, all AUs were selected uniformly across the different models. The independent-contribution result above was obtained by using the predictions of the models trained on either game or expression features as inputs to a logistic regression that predicts the ground truth.
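This check could be implemented along the following lines with statsmodels, assuming per-round out-of-fold predictions from the two single-modality models are available as arrays; the exact regression specification is my assumption.

```python
import numpy as np
import statsmodels.api as sm

# p_actions, p_expressions: out-of-fold predictions of the actions-only and
# expressions-only models; y: the opponent's actual next decision (0/1)
design = sm.add_constant(np.column_stack([p_actions, p_expressions]))
fit = sm.Logit(y, design).fit(disp=0)
print(fit.summary())  # significant coefficients for both columns indicate that each
                      # modality makes an independent contribution to the prediction
```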
5.2.4 Model Decision Process
I performed additional analysis to give insight into how the perlocutionary function of a player’s facial expressions can influence their partner’s decisions. In particular, I wanted to test whether specific expressions were more important at different points in the game. Recall that the majority of participants cooperated in the game; as a result, most of the predictions in the corpus are made when the previous round involved joint
cooperation (CC). I had considered that emotion may be most valuable when players deviate from this
most-common state. Therefore I partitioned the corpus into four subsets based on the four basic contexts
that occur in the prisoners’ dilemma game: joint cooperation (CC), joint defection (DD), the participant
exploits their partner (DC) and the participant gets exploited by their partner (CD). In other words, the
participant being exploited (CD) subset only contained decisions that followed the situation where the
partner has just exploited the participant. I then trained individual models from these four subsets and
examined how these models performed. As before, for each context I trained three types of models: a combined model (using both action and expression features), an action-only model, and an expression-only model, and contrasted these with the two baselines.
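The state-partitioned models can be obtained by simply grouping the rounds on the previous joint decision and fitting one classifier per subset. The sketch below reuses the KDENaiveBayes class from the earlier sketch; the DataFrame df, its prev_state and other_next_pick columns, and feature_columns are assumed names, and cross-validation is omitted for brevity.

```python
# df: one row per round with context and AU features (see Table 5.2), a 'prev_state'
# column in {"CC", "CD", "DC", "DD"}, and the opponent's next decision in
# 'other_next_pick'. All of these names are illustrative.
state_models = {}
for state in ["CC", "CD", "DC", "DD"]:
    subset = df[df["prev_state"] == state]
    state_models[state] = KDENaiveBayes().fit(
        subset[feature_columns].to_numpy(dtype=float),
        subset["other_next_pick"].to_numpy(),
    )

def predict_next_pick(row):
    """Route each round to the model trained for its previous game state."""
    model = state_models[row["prev_state"]]
    features = row[feature_columns].to_numpy(dtype=float).reshape(1, -1)
    return model.predict(features)[0]
```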
As shown in Table 5.4, the action-only model performs best in the context of a joint cooperation (CC) state, whereas the combined model performs best in the being exploited (CD), participant exploits opponent (DC) and joint defection (DD) contexts. Additionally, I show that the action unit-only model
does better when a participant exploits or gets exploited (the CD and DC states) than it does in the joint
cooperation state, indicating that these states in particular might benefit from using facial displays when
modeling them, rather than using them regardless of context. These results indicate that expressivity might
be more important in these states compared to the much more common joint cooperation state. One final thing to note is that the performance of the state models appears lower than that of the overall models in Table 5.3. However, this is only the case because of the weighted nature of the F1-score: when the specific state models are combined into a “hybrid” model, its F1-score is comparable to that of the full-game models.
I next tried to visualize what the models learned. Although the model is nonlinear, the way the Naive Bayes model uses these descriptors as kernels can be displayed by plotting their probability distributions. As an illustration of the effects of facial expressions, AU12 has a different effect based on
context when shown while the player is being exploited (CD) than when the player exploits (DC). Figure
5.1 shows the probability distribution used by two different models for this action unit. What this graph
indicates is that when the player exploits the opponent (DC) it is better not to smile, a finding in line with previous research, as smiling might indicate that one is more willing to concede [62]. Additionally, by
Figure 5.1: Probability distribution of AU12 (x-axis: AU12 evidence) used for the “player being exploited” (CD) and “player exploits” (DC) models. A circle notes the AU12 evidence value where max{P(C) − P(D)} is observed.
smiling after choosing to defect, a participant can signal that they enjoy taking advantage of their opponent and are therefore more likely to be punished for it. In contrast, if the player is the one being exploited (CD), it is beneficial for them to display AU12 (a smile) in order to bring their opponent back to cooperating, perhaps as a means of showing that they forgive the transgression. Smiling appears to be beneficial only in the specific state where the opponent has just betrayed the player.
When looking at the specific action units used as descriptors by each model, differences can be found
as well. The main action units used when the player is exploited (CD) are AU12 (lip corner puller) and
AU24 (lip pressor); AU12 is often related to joy, while AU24 can often be interpreted as negative. If AU12
is active it is more likely that the opponent will cooperate, whereas AU24 should be inactive in order to
maximize cooperation likelihood. AU24, the lip pressor, can be seen as a type of mouth control; if it is active at the same time as AU12, this could indicate that the smile is not as sincere as when AU24 is absent. This seems to indicate that after an opponent defects it would be best to smile sincerely
Table 5.4: F1-score performance of state based models

State   Combined   Action   AUs     Baseline   Tit-for-tat
CC      0.662      0.670    0.459   0.460      0.460
CD      0.617      0.594    0.527   0.308      0.308
DC      0.615      0.607    0.514   0.301      0.363
DD      0.627      0.598    0.504   0.227      0.414
in order to have them cooperate again. When the player exploits their opponent (DC), AU6 (cheek raiser) and AU17 (chin raiser) are used; again, one of these action units is related to positivity (AU6) and one to negativity (AU17). AU6 being active only seems to slightly improve cooperation likelihood, but when it is inactive it greatly increases the likelihood of the opponent defecting (and therefore punishing the
participant).
5.3 Social Planning
My analysis of the model’s performance indicated that the facial expressions of one participant influence their opponent’s decisions. This raises the question of whether it is possible to strategically influence the opponent’s decisions by choosing different facial expressions (I will refer to this concept as social planning). Second, even if one can determine which expression will influence the opponent, is this
expression practically significant (i.e., would it make a meaningful difference in outcomes)? In this section,
I will provide preliminary evidence that the answer to both of these questions is yes.
In its broadest sense, social planning would allow a player to generate a sequence of actions and expressions that influence the opponent to achieve a goal, such as maximizing the player’s individual reward.
To describe how this is possible, consider the simple case of influencing the opponent’s decision on the
very next round.
More specifically, consider that player A has just been exploited by their opponent (player B) and would
like to influence their opponent to start cooperating again.
This situation is illustrated by the probability distributions on the left-hand side of Figure 5.1. This
graph indicates what player B will do after they just exploited player A and they observe player A showing
AU12. If player A fails to show AU12 (i.e., evidence is zero or lower), then player B is most likely to defect, because the probability of defection (the red line) is higher than the probability of cooperation (the blue line). In contrast, if player A clearly shows AU12 (i.e., evidence is greater than 1), then player B is most
likely to cooperate a second time. Therefore, if player A wants to exploit player B again, they should smile.
This concept can be extended to multi-step plans by using forward chaining.
Although the best test of this idea would be to run this social planning model against actual human
opponents, here I provide a feasibility test by using a simulated opponent. In particular, I use the model that was constructed in Section 5.2.4 as a proxy for an actual human opponent. I
consider whether a player could increase their individual score by showing specific facial expressions and contrast this to the score that could be obtained by showing only a neutral expression. If a judicious choice of expressions can increase a player’s score, then this provides evidence that such expressions are practically
significant (in that they meaningfully shape outcomes).
Specifically, I attempt to derive an action and expression policy for player A that maximizes player
A’s score. Because of the nature of the payoff matrix for prisoner’s dilemma, player A can maximize their
score by coercing player B to cooperate as much as possible while, further, exploiting player B as much
as player A can get away with. (By inducing player B to cooperate, player A will have the opportunity to
score five points when cooperating and 10 points when exploiting, whereas a defecting player B will afford
only zero or one points per round). I derive this policy through brute-force forward-simulation† with the
simulated opponent. At each round in the game the model branches on player A’s action (cooperate vs.
defect) and identifies the expression that maximizes B’s probability of cooperation. This can then be compared against an agent that shows only neutral expressions. For the no-expression simulation I assume that none of the action units are active (using a FACET value of -1).
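A sketch of the one-step search is shown below. The opponent model is wrapped behind an assumed predict_coop_probability(context, action, expression) interface, and the evidence grid and neutral values are illustrative. Because the Naive Bayes opponent model treats its features as independent, each action unit can be optimized on its own, and forward-chaining this step yields multi-round plans.

```python
import numpy as np

AUS = ["AU6", "AU10", "AU12", "AU14", "AU17", "AU24"]
EVIDENCE_GRID = np.linspace(-2.0, 2.0, 9)   # candidate FACET evidence values to try
NEUTRAL = {au: -1.0 for au in AUS}          # "no expression" baseline used in the simulation

def best_expression(context, action, opponent_model):
    """Greedily pick, per action unit, the evidence value that maximizes the
    simulated opponent's probability of cooperating on the next round."""
    expression = dict(NEUTRAL)
    for au in AUS:
        def p_coop(value, au=au):
            trial = dict(expression, **{au: value})
            return opponent_model.predict_coop_probability(context, action, trial)
        expression[au] = max(EVIDENCE_GRID, key=p_coop)
    return expression

def expression_options(context, opponent_model):
    """For each of player A's possible actions, return the best expression and the
    resulting cooperation probability; a full planner forward-chains over these
    branches to maximize player A's cumulative score."""
    options = {}
    for action in ("C", "D"):
        expr = best_expression(context, action, opponent_model)
        options[action] = (opponent_model.predict_coop_probability(context, action, expr), expr)
    return options
```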
By plotting the probability distributions I was able to identify the optimal strategy for using expressions. This strategy involves displaying the action units that are often related to happiness, and in particular the Duchenne smile: AU6, the cheek raiser, should be active (with a value of 1.76) and AU12, the lip corner puller, had an optimal value of 1.45. AU10, the upper lip raiser, can be detrimental to the result when active and has an optimal value of -1.89; AU10 is often related to negative emotions such as anger. AU24 should also be inactive (-0.87), while the last two action units used in the model can generally have
some low form of activation: AU14 has a value of 0.49 and AU17’s optimal value is 0.65.
Although this model shows that it is possible to achieve a better result when one is able to predict the opponent’s behavior, it is likely that actual human opponents will change their strategy when they
†For this simulation I used a well-performing model that uses the six action units with an activation rate over 25% as descriptors. Since the model does not simulate the first round, it assumes that participants start in the CC state, as this is the most common game state in the first round of the dataset.
Table 5.5: Examples of optimal strategies when using a neutral expression and manipulating certain decisions

           Maximum Score   Example decision
Neutral    70              C-C-D-C-D-C-D-C-C-D
                           C-C-C-D-D-C-D-C-C-D
                           C-C-C-C-C-C-D-D-D-D
Manip. C   90              C-D-D-D-D-D-D-C-D-D
                           C-C-D-D-D-D-D-D-D-D
are confronted with an approach such as this. Additionally, it is possible to deceive a model such as this, for instance by presenting behavior that is completely different from the behavior that was used to learn the model. Therefore, the results of the simulation should not be considered an optimal strategy when playing against human opponents. Nonetheless, the simulation illustrates that a better understanding of
action and emotion can inform social planning, although the effectiveness of this approach will need to be
demonstrated in subsequent research.
Table 5.5 summarizes the score obtained and the policy that results from these social planning simulations. When using the neutral expression strategy, the maximum possible score that can be achieved is
70 points (out of a maximum of 100). This score can be obtained by defecting four times throughout the
game, with one of these defections occurring in the final round as this leaves the opponent unable to
respond. This pattern is sufficient to induce the simulated opponent to always cooperate and offers one possible explanation for why the model performs better than a tit-for-tat approach, as a tit-for-tat opponent would defect at least three times when faced with four defections. In contrast, by using appropriate facial
expressions, it is possible to defect up to eight times, resulting in a score of 90 points. Again, these results
are suggestive and need to be verified against human opponents.
5.4 Conclusion
In this chapter I have shown that both context and the facial displays of a player have predictive power
in a classification model for decision-making behavior in an iterated prisoner’s dilemma. Secondly, I have shown how an individual or agent can use this model for social planning by employing not only strategic
game decisions but also emotions.
I have shown that the context of the game alone, even without the participants’ expressive behaviors, can have a strong effect on the decision that a participant may make. A simple interpretation is that, because of the iterative nature of this task, previous actions speak “louder than words”; however, combining both game acts and emotional displays still yields the best performance. This finding validates previous research showing that emotional signals influence game decisions in joint tasks and, moreover, I provided a model, a mechanism, for how emotion contributes to the opponent’s decision in a given round.
To dissect the impact of emotion in more detail, I also modeled the opponent decision separately in
the four contexts that are possible within the game (joint cooperation, joint defection, player exploits and
player being exploited), as discussed in Section 5.2.4. By adding this context information I am able to
highlight the impact of emotion on the decision of the opponent by coupling it with specific game events (e.g., being exploited, or exploiting the opponent).
This analysis revealed that emotion input has more impact on the opponent’s decision following a state that involved defection (joint defection, player exploits or opponent exploits) than following joint cooperation (Table 5.4). This observation makes sense considering that the majority of people chose cooperation (joint cooperation makes up ∼46% of the data); with continuous cooperation being the default scenario, there is perhaps less need to signal or communicate intentions for the next round. Defecting, however, breaks the loop of default expectations (continuous cooperation), triggering more emotional reactions and also creating more uncertainty, and thus a need to plan the next move, which includes estimating the opponent’s intentions from all available input (actions and emotions). This also
suggests that the impact of emotion may in fact be underrepresented in the overall model, since most of
the population in this scenario opted for continuous cooperation.
Chapter 6
Conclusion
Within this dissertation, I have detailed the work I did during my doctoral studies on the subject of facial expression modeling. Specifically, I looked at how facial expression perception is influenced by the context in which these expressions are displayed. This situates my work within the growing body of
research that questions the utility of purely context-free methods for the automatic recognition of facial
expressions.
I described three different studies in this dissertation, in which I investigated context and expressions through the lens of Scarantino’s Theory of Affective Pragmatics [76]. These studies highlight
how context shapes emotion perception and the illocutionary and perlocutionary functions of expressions
in natural social interactions.
The first study used automated methods to examine how context and expressions interact to shape first- and second-person interpretations of facial expressions, and the implications of this interaction for context-free recognition approaches. Specifically, I contrasted two possible ways that expression and context might combine. The first hypothesis stated that expressions and context are independent sources of information, while the second stated that there may be an interaction between the two. The first hypothesis held for the participants’ self-reported valence ratings, but I found an interaction between facial cues and context when looking at the ratings of participants who observed the expressions. In particular, observers
used smiles as a cue to predict positive feelings following mutual cooperation, but smiles were not used when the expresser had defected on their partner. Rather than smiles, it appeared that observers used frowns to predict positive feelings (i.e., if a participant frowned after being exploited, they were seen as feeling more positive than if they did not frown). At the same time, frowns were not relied on in other contexts. As such, the second hypothesis held for the observer ratings, as context appeared to influence how the expressions were interpreted, with smiles not being used to determine valence when a participant was exploited, and frowns being used as an indicator of valence instead. Both findings reinforce the need to incorporate context into automatic recognition and suggest that this might be especially difficult when predicting second-person inferences.
The second study investigated the effect of context on expressive intent (i.e., the illocutionary function
of an expression). Results reinforce that automatic intention recognition must incorporate information on
context. Specifically, I found that expressions differed both in their form and in their dynamics across different contexts. Participants were more expressive towards speech acts relating to the negotiation at hand, such as “request preference,” “agree offer,” “reject offer” and, most notably, following a “make offer” act. Participants were furthermore more expressive in acts aimed at building rapport, while being far less expressive in response to “turn keeping” and “backchannel” acts. Looking closer at the “make offer” act, the dynamics of expressivity furthermore differed based on the perceived value of the offer, with a curvilinear pattern in which offers on either the lower or the higher side evoked more expressive responses. For example, participants would smile more both when they received more items and when they received fewer items than usual. One clear example of participants using their expressions as an illocutionary act was when they rejected offers, as they showed far more expressivity in these situations and often seemed upset. Finally, I looked at using both expressivity and offer value (context) as features in a random
forest classifier. In terms of F1-score, the RFC outperformed the baseline majority vote approach. Furthermore, the best performing RFC was the one that used both facial features and offer value to predict participant behavior, showing that context is a factor in predicting expressive intent that should be taken into consideration.
The third study investigated the influence of context on predicting expressive consequences (i.e., the
perlocutionary function of expressions). Consistent with the other studies in this thesis, automatic recognition performs best when it incorporates information about the context. In the study, I used machine
learning to predict how the partner’s facial expressions shaped the decisions of players in the iterated
prisoner’s dilemma. Models explored the relevance of various features, including facial expressions and prior game actions by the players (my operationalization of “context”). Results showed that the best performing
model was one that used both the partner’s expressions and the game state information. This emphasizes
that context plays a significant role for the perlocutionary function of facial expressions. I further showed
the potential utility of this result for informing human-computer interaction. Specifically, I showed how
an artificial player might generate specific expressions to shape team outcomes.
Throughout this dissertation, I have looked at the effects of context on the interpretation of expressions.
I have shown that there is an overall influence of context on expressions in general. By following the
Theory of Affective Pragmatics as a guideline, I furthermore looked at the influence of context on the
illocutionary and perlocutionary acts of expressions and have shown in both cases that context plays an
important role. By demonstrating the importance of context, I hope this work will contribute to further
development of context-aware emotion recognition methods and potentially lead to more emotionally
robust affective computing systems by taking situational perspective into account.
Bibliography
[1] Affectiva and Emotion AI. url:
https://web.archive.org/web/20240616004854/https://www.affectiva.com/emotion-ai/ (visited on
06/21/2024).
[2] Dario Amodei, Rishita Anubhai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro,
Jingdong Chen, Mike Chrzanowski, Adam Coates, Greg Diamos, Erich Elsen, Jesse Engel,
Linxi Fan, Christopher Fougner, Tony Han, Awni Hannun, Billy Jun, Patrick LeGresley, Libby Lin,
Sharan Narang, Andrew Ng, Sherjil Ozair, Ryan Prenger, Jonathan Raiman, Sanjeev Satheesh,
David Seetapun, Shubho Sengupta, Yi Wang, Zhiqian Wang, Chong Wang, Bo Xiao,
Dani Yogatama, Jun Zhan, and Zhenyao Zhu. Deep Speech 2: End-to-End Speech Recognition in
English and Mandarin. 2015. arXiv: 1512.02595 [cs.CL].
[3] J. L. Austin. How to do things with words. Oxford University Press, 1962.
[4] Tadas Baltrušaitis, Peter Robinson, and Louis-Philippe Morency. “Openface: an open source facial
behavior analysis toolkit”. In: 2016 IEEE winter conference on applications of computer vision
(WACV). IEEE. 2016, pp. 1–10.
[5] Lisa Feldman Barrett, Ralph Adolphs, Stacy Marsella, Aleix M. Martinez, and Seth D. Pollak.
“Emotional Expressions Reconsidered: Challenges to Inferring Emotion From Human Facial
Movements”. In: Psychological Science in the Public Interest 20.1 (2019), pp. 1–68.
[6] Lisa Feldman Barrett, Batja Mesquita, and Maria Gendron. “Context in Emotion Perception”. In:
Current Directions in Psychological Science 20.5 (2011), pp. 286–290.
[7] Janet B. Bavelas, Linda Coates, and Trudy Johnson. “Listeners as Co-narrators”. In: Journal of
Personality and Social Psychology 79.6 (2000), pp. 941–952.
[8] J. Blascovich, J. Loomis, A. Beall, K. Swinth, C. Hoyt, and J. N. Bailenson. “Immersive virtual
environment technology as a methodological tool for social psychology”. In: Psychological Inquiry
13 (2002), pp. 103–124.
[9] Cynthia Breazeal. “Toward sociable robots”. In: Robotics and autonomous systems 42.3-4 (2003),
pp. 167–175.
[10] Joost Broekens, Maaike Harbers, Willem-Paul Brinkman, Catholijn M Jonker, Karel Van den Bosch,
and John-Jules Meyer. “Virtual reality negotiation training increases negotiation knowledge and
skill”. In: Intelligent Virtual Agents. Springer. 2012, pp. 218–230.
[11] Joost Broekens, Catholijn M Jonker, and John-Jules Ch Meyer. “Affective negotiation support
systems”. In: Journal of Ambient Intelligence and Smart Environments 2.2 (2010), pp. 121–144.
[12] Jeannette Brosig. “Identifying cooperative behavior: some experimental results in a prisoner’s
dilemma game”. In: Journal of Economic Behavior & Organization 47.3 (2002), pp. 275–290.
[13] Egon Brunswik. Perception and the representative design of psychological experiments. Univ of
California Press, 1956.
[14] Carlos Busso, Murtaza Bulut, Chi-Chun Lee, Abe Kazemzadeh, Emily Mower, Samuel Kim,
Jeannette N Chang, Sungbok Lee, and Shrikanth S Narayanan. “IEMOCAP: Interactive emotional
dyadic motion capture database”. In: Language resources and evaluation 42.4 (2008), pp. 335–359.
[15] Nitesh V Chawla, Kevin W Bowyer, Lawrence O Hall, and W Philip Kegelmeyer. “SMOTE:
synthetic minority over-sampling technique”. In: Journal of artificial intelligence research 16 (2002),
pp. 321–357.
[16] Alan S Cowen and Dacher Keltner. “Self-report captures 27 distinct categories of emotion bridged
by continuous gradients”. In: Proceedings of the national academy of sciences 114.38 (2017),
E7900–E7909.
[17] Carlos Crivelli and Alan J. Fridlund. “Facial Displays Are Tools for Social Influence”. In: Trends in
Cognitive Sciences 22.5 (2018), pp. 388–399.
[18] Celso M De Melo, Peter Carnevale, and Jonathan Gratch. “The influence of emotions in embodied
agents on human decision-making”. In: IVA. Springer. 2010, pp. 357–370.
[19] David DeSteno, Cynthia Breazeal, Robert H Frank, David Pizarro, Jolie Baumann, Leah Dickens,
and Jin Joo Lee. “Detecting the trustworthiness of novel partners in economic exchange”. In:
Psychological science 23.12 (2012), pp. 1549–1556.
[20] Paul Ekman. “An argument for basic emotions”. In: Cognition and Emotion 6.3-4 (1992),
pp. 169–200.
[21] Megan Farokhmanesh. “Stop arguing with Comcast and let this bot negotiate for you”. In: The
Verge (2016). url: https://www.theverge.com/2016/11/17/13656264/comcast-bill-negotiator-botargue-money-customer-service.
[22] Ernst Fehr and Klaus M Schmidt. “A theory of fairness, competition, and cooperation”. In: The
quarterly journal of economics 114.3 (1999), pp. 817–868.
[23] José-Miguel Fernández-Dols, Pilar Carrera, and Carlos Crivelli. “Facial Behavior While
Experiencing Sexual Excitement”. In: Journal of Nonverbal Behavior 35.1 (2011), pp. 63–71.
[24] R Frank. “Introducing moral emotions into models of rational choice”. In: Feelings and emotions:
The Amsterdam symposium. Cambridge University Press New York. 2004, pp. 422–440.
[25] Robert H Frank, Thomas Gilovich, and Dennis T Regan. “The evolution of one-shot cooperation:
An experiment”. In: Ethology and sociobiology 14.4 (1993), pp. 247–256.
[26] Nico H Frijda and Batja Mesquita. “The social roles and functions of emotions.” In: (1994).
[27] Adam D Galinsky and Thomas Mussweiler. “First offers as anchors: the role of perspective-taking
and negotiator focus.” In: Journal of personality and social psychology 81.4 (2001), p. 657.
[28] Erving Goffman. Interaction ritual: Essays in face to face behavior. AldineTransaction, 2005.
[29] John M Gottman and Robert W Levenson. “A valid procedure for obtaining self-report of affect in
marital interaction.” In: Journal of consulting and clinical psychology 53.2 (1985), p. 151.
[30] Jonathan Gratch, David DeVault, and Gale Lucas. “The benefits of virtual humans for teaching
negotiation”. In: International Conference on Intelligent Virtual Agents. Springer. 2016, pp. 283–294.
[31] Jonathan Gratch and Celso M de Melo. “Inferring intentions from emotion expressions in social
decision making”. In: The social nature of emotion expression: what emotions can tell us about the
world (2019), pp. 141–160.
[32] Jonathan Gratch, Zahra Nazari, and Emmanuel Johnson. “The Misrepresentation Game: How to
win at negotiation while seeming like a nice guy”. In: Proceedings of the 2016 International
Conference on Autonomous Agents & Multiagent Systems. International Foundation for Autonomous
Agents and Multiagent Systems. 2016, pp. 728–737.
[33] Mitchell Green. “Speech acts”. In: Oxford Research Encyclopedia of Linguistics. 2017.
[34] James J Gross and Oliver P John. “Individual differences in two emotion regulation processes:
implications for affect, relationships, and well-being.” In: Journal of personality and social
psychology 85.2 (2003), p. 348.
[35] James J Gross and Oliver P John. “Revealing feelings: facets of emotional expressivity in
self-reports, peer ratings, and behavior.” In: Journal of personality and social psychology 72.2 (1997),
p. 435.
[36] Jessie Hoegen, David DeVault, and Jonathan Gratch. “Exploring the Function of Expressions in
Negotiation: The DyNego-WOZ Corpus”. In: IEEE Transactions on Affective Computing (2022).
[37] Jessie Hoegen, Jonathan Gratch, Brian Parkinson, and Danielle Shore. “Signals of emotion
regulation in a social dilemma: Detection from face and context”. In: 2019 8th International
Conference on Affective Computing and Intelligent Interaction (ACII). IEEE. 2019, pp. 1–7.
[38] Jessie Hoegen, Gale Lucas, Danielle Shore, Brian Parkinson, and Jonathan Gratch. “How
Expression and Context Determine Second-person Judgments of Emotion”. In: 2023 11th
International Conference on Affective Computing and Intelligent Interaction (ACII). IEEE. 2023,
pp. 1–7.
[39] Jessie Hoegen, Giota Stratou, and Jonathan Gratch. “Incorporating emotion perception into
opponent modeling for social dilemmas”. In: Proceedings of the 16th Conference on Autonomous
Agents and MultiAgent Systems. 2017, pp. 801–809.
[40] Sean Dae Houlihan, Max Kleiman-Weiner, Luke B Hewitt, Joshua B Tenenbaum, and Rebecca Saxe.
“Emotion prediction as computation over a generative theory of mind”. In: Philosophical
Transactions of the Royal Society A 381.2251 (2023), p. 20220047.
[41] Herman Ilgen, Jacob Israelashvili, and Agneta Fischer. “Personal Nonverbal Repertoires in facial
displays and their relation to individual differences in social and emotional styles”. In: Cognition
and Emotion 35.5 (2021), pp. 999–1008.
[42] ISO/DIS 24617-2: Language resource management – Semantic annotation framework (SemAF) – Part
2: Dialogue acts. 2010. url: http://semantic-annotation.uvt.nl/DIS24617-2.pdf.
[43] Julia Kim, Randall W. Hill, Paula Durlach, H. Chad Lane, Eric Forbell, Mark Core, Stacy C. Marsella,
David V. Pynadath, and John Hart. “BiLAT: A Game-Based Environment for Practicing Negotiation
in a Cultural Context”. In: International Journal of Artificial Intelligence in Education 19.Issue on
Ill-Defined Domains (2009), pp. 289–308. url: http://ict.usc.edu/pubs/BiLAT-%20A%20GameBased%20Environment%20for%20Practicing%20Negotiation%20in%20a%20Cultural%20Context.pdf.
[44] Hanae Koiso, Yasuo Horiuchi, Syun Tutiya, Akira Ichikawa, and Yasuharu Den. “An Analysis of
Turn-Taking and Backchannels Based on Prosodic and Syntactic Features in Japanese Map Task
Dialogs”. In: Language and Speech 41.3-4 (1998), pp. 295–321.
[45] Peter Kollock. “Social dilemmas: The anatomy of cooperation”. In: Annual review of sociology
(1998), pp. 183–214.
[46] R. Kosti, J. M. Alvarez, A. Recasens, and A. Lapedriza. “Context Based Emotion Recognition Using
EMOTIC Dataset”. In: IEEE Transactions on Pattern Analysis and Machine Intelligence 42.11 (2020),
pp. 2755–2766.
[47] Eva Krumhuber, Anthony Manstead, D Cosker, D Marshall, P. L. Rosin, and A. Kappas. “Facial
dynamics as indicators of trustworthiness and cooperative behavior”. In: Emotion 7.4 (2007),
pp. 730–735.
[48] Jens Lange, Marc W. Heerdink, and Gerben A. van Kleef. “Reading emotions, reading people:
Emotion perception and inferences drawn from perceived emotions”. In: Current Opinion in
Psychology 43 (2022), pp. 85–90.
[49] Petri Laukka, Hillary Anger Elfenbein, Wanda Chui, Nutankumar S Thingujam, Frederick K Iraki,
Thomas Rockstuhl, and Jean Althoff. “Presenting the VENEC corpus: Development of a
cross-cultural corpus of vocal emotion expressions and a novel method of annotating emotion
appraisals”. In: Proceedings of the LREC 2010 Workshop on Corpora for Research on Emotion and
Affect. European Language Resources Association Paris, France. 2010, pp. 53–57.
[50] Geoffrey N Leech. Principles of pragmatics. Routledge, 2016.
82
[51] S. Lei and J. Gratch. “Emotional Expressivity is a Reliable Signal of Surprise”. In: IEEE Transactions
on Affective Computing (2023), pp. 1–12.
[52] Su Lei, Kalin Stefanov, and Jonathan Gratch. “Emotion or Expressivity? An Automated Analysis of
Nonverbal Perception in a Social Dilemma”. In: 2020 15th IEEE International Conference on
Automatic Face and Gesture Recognition (FG 2020)(FG). 2020, pp. 770–777.
[53] Gwen Littlewort, Jacob Whitehill, Tingfan Wu, Ian Fasel, Mark Frank, Javier Movellan, and
Marian Bartlett. “The computer expression recognition toolbox (CERT)”. In: Automatic Face &
Gesture Recognition and Workshops (FG 2011), 2011 IEEE International Conference on. IEEE. 2011,
pp. 298–305.
[54] George Loewenstein and Jennifer S Lerner. “The role of affect in decision making”. In: Handbook of
affective science 619.642 (2003), p. 3.
[55] Gale Lucas, Giota Stratou, Shari Lieblich, and Jonathan Gratch. “Trust me: multimodal signals of
trustworthiness”. In: Proceedings of the 18th ACM international conference on multimodal
interaction. 2016, pp. 5–12.
[56] Xuezhe Ma and Eduard Hovy. “End-to-end Sequence Labeling via Bi-directional
LSTM-CNNs-CRF”. In: Proceedings of the 54th Annual Meeting of the Association for Computational
Linguistics (Volume 1: Long Papers). Berlin, Germany: Association for Computational Linguistics,
Aug. 2016, pp. 1064–1074. doi: 10.18653/v1/P16-1101.
[57] Michael W Macy and Andreas Flache. “Learning dynamics in social dilemmas”. In: Proceedings of
the National Academy of Sciences 99.suppl 3 (2002), pp. 7229–7236.
[58] Stacy Marsella, Jonathan Gratch, and Paolo Petta. “A blueprint for an affectively competent agent:
Cross-fertilization between Emotion Psychology, Affective Neuroscience, and Affective
Computing”. In: by KR Scherer, T. Bänziger, and E. Roesch. Oxford: Oxford University Press. Chap.
Computational Models of Emotion (2010).
[59] Celso de Melo, P J Carnevale, S. J. Read, and Jonathan Gratch. “Reading people’s minds from
emotion expressions in interdependent decision making”. In: Journal of Personality and Social
Psychology 106.1 (2014), pp. 73–88.
[60] Celso M de Melo, Peter Carnevale, and Jonathan Gratch. “The effect of expression of anger and
happiness in computer agents on negotiations with humans”. In: The 10th International Conference
on Autonomous Agents and Multiagent Systems-Volume 3. 2011, pp. 937–944.
[61] Celso M de Melo, Jonathan Gratch, and Peter J Carnevale. “The Importance of Cognition and
Affect for Artificially Intelligent Decision Makers”. In: Twenty-Eighth AAAI Conference on Artificial
Intelligence. 2014.
[62] Celso M. de Melo, Peter Carnevale, and Jonathan Gratch. “The Effect of Expression of Anger and
Happiness in Computer Agents on Negotiations with Humans”. In: The 10th International
Conference on Autonomous Agents and Multiagent Systems - Volume 3. AAMAS ’11. Taipei, Taiwan:
International Foundation for Autonomous Agents and Multiagent Systems, 2011, pp. 937–944.
isbn: 978-0-9826571-7-1. url: http://dl.acm.org/citation.cfm?id=2034396.2034402.
83
[63] Trisha Mittal, Uttaran Bhattacharya, Rohan Chandra, Aniket Bera, and Dinesh Manocha.
“Emotions don’t lie: An audio-visual deepfake detection method using affective cues”. In:
Proceedings of the 28th ACM international conference on multimedia. 2020, pp. 2823–2832.
[64] David A Morand and Rosalie J Ocker. “Politeness theory and computer-mediated communication:
A sociolinguistic approach to analyzing relational messages”. In: 36th Annual Hawaii International
Conference on System Sciences, 2003. Proceedings of the. IEEE. 2003, 10–pp.
[65] Michael W Morris and Dacher Keltner. “How emotions work: The social functions of emotional
expression in negotiations”. In: Research in organizational behavior 22 (2000), pp. 1–50.
[66] Ryan O Murphy, Kurt A Ackermann, and Michel Handgraaf. “Measuring social value orientation”.
In: Judgment and Decision making 6.8 (2011), pp. 771–781.
[67] RadosŁaw Niewiadomski and Catherine Pelachaud. “Affect expression in ECAs: Application to
politeness displays”. In: International journal of human-computer studies 68.11 (2010), pp. 851–871.
[68] Desmond C. Ong, Jamil Zaki, and Noah D. Goodman. “Affective cognition: Exploring lay theories
of emotion”. In: Cognition 143 (2015), pp. 141–162.
[69] Sunghyun Park, Jonathan Gratch, and Louis-Philippe Morency. “I already know your answer:
Using nonverbal behaviors to predict immediate outcomes in a dyadic negotiation”. In: Proceedings
of the 14th ACM international conference on Multimodal interaction. 2012, pp. 19–22.
[70] Brian Parkinson and Gwenda Simons. “Affecting others: Social appraisal and emotion contagion in
everyday decision making”. In: Personality and social psychology bulletin 35.8 (2009), pp. 1071–1084.
[71] Brian Parkinson, Gwenda Simons, and Karen Niven. “Sharing concerns: Interpersonal worry
regulation in romantic couples.” In: Emotion 16.4 (2016), p. 449.
[72] Rosalind W Picard. Affective computing. MIT press, 2000.
[73] Isabella Poggi and Catherine Pelachaud. “Performative facial expressions in animated faces”. In:
Embodied conversational agents (2000), pp. 155–189.
[74] Byron Reeves and Clifford Nass. “The media equation: How people treat computers, television, and
new media like real people”. In: Cambridge, UK 10.10 (1996).
[75] Rainer Reisenzein, Sandra Bördgen, Thomas Holtbernd, and Denise Matz. “Evidence for strong
dissociation between emotion and facial displays: The case of surprise”. In: Journal of Personality
and Social Psychology 91.2 (2006), pp. 295–315.
[76] Andrea Scarantino. “How to do things with emotional expressions: The theory of affective
pragmatics”. In: Psychological Inquiry 28.2-3 (2017), pp. 165–185.
[77] Leonhard Schilbach, Bert Timmermans, Vasudevi Reddy, Alan Costall, Gary Bente, Tobias Schlicht,
and Kai Vogeley. “Toward a second-person neuroscience”. In: Behavioral and Brain Sciences 36.4
(2013), pp. 393–414.
84
[78] Joanna Schug, David Matsumoto, Yutaka Horitaa, Toshio Yamagishi, and Kemberlee Bonnet.
“Emotional expressivity as a signal of cooperation”. In: Evolution and Human Behavior 31 (2010),
pp. 87–94.
[79] John R Searle. Expression and meaning: Studies in the theory of speech acts. Cambridge University
Press, 1979.
[80] Danielle Shore, Olly Robertson, Ginette Lafit, and Brian Parkinson. “Facial regulation during dyadic
interaction: Interpersonal effects on cooperation”. In: Affective Science 4.3 (2023), pp. 506–516.
[81] Gabriel Skantze. “Towards a General, Continuous Model of Turn-taking in Spoken Dialogue using
LSTM Recurrent Neural Networks”. In: Proceedings of the 18th Annual SIGdial Meeting on Discourse
and Dialogue. Saarbrücken, Germany: Association for Computational Linguistics, Aug. 2017,
pp. 220–230. doi: 10.18653/v1/W17-5527.
[82] Ian Sneddon, Margaret McRorie, Gary McKeown, and Jennifer Hanratty. “The belfast induced
natural emotion database”. In: IEEE Transactions on Affective Computing 3.1 (2011), pp. 32–41.
[83] Adam Sparks, Tyler Burleigh, and Pat Barclay. “We can see inside: Accurate prediction of
Prisoner’s Dilemma decisions in announced games following a face-to-face interaction”. In:
Evolution and Human Behavior 37.3 (2016), pp. 210–216. issn: 1090-5138. doi:
http://dx.doi.org/10.1016/j.evolhumbehav.2015.11.003.
[84] Giota Stratou, Jessie Hoegen, Gale Lucas, and Jonathan Gratch. “Emotional Signaling in a Social
Dilemma: an Automatic Analysis”. In: Proceedings of ACII 2015. Xi’an, China: IEEE, Sept. 2015. url:
http://ict.usc.edu/pubs/Emotional%20Signaling%20in%20a%20Social%20Dilemmaan%20Automatic%20Analysis.pdf.
[85] Giota Stratou, Job Van Der Schalk, Jessie Hoegen, and Jonathan Gratch. “Refactoring facial
expressions: An automatic analysis of natural occurring facial expressions in iterative social
dilemma”. In: 2017 Seventh international conference on affective computing and intelligent interaction
(ACII). IEEE. 2017, pp. 427–433.
[86] Kazunori Terada and Chikara Takeuchi. “Emotional expression in simple line drawings of a robot’s
face leads to higher offers in the ultimatum game”. In: Frontiers in psychology 8 (2017), p. 252551.
[87] Gerben A Van Kleef. “How emotions regulate social life: The emotions as social information (EASI)
model”. In: Current directions in psychological science 18.3 (2009), pp. 184–188.
[88] Gerben A Van Kleef, Carsten KW De Dreu, and Antony SR Manstead. “An interpersonal approach
to emotion in social decision making: The emotions as social information model”. In: Advances in
experimental social psychology 42 (2010), pp. 45–96.
[89] Gerben A Van Kleef, Carsten KW De Dreu, and Antony SR Manstead. “The interpersonal effects of
anger and happiness in negotiations.” In: Journal of personality and social psychology 86.1 (2004),
p. 57.
85
[90] David M Watson, Ben B Brown, and Alan Johnston. “A data-driven characterisation of natural
facial expressions when giving good and bad news”. In: PLOS Computational Biology 16.10 (2020),
e1008335.
[91] Chris Welch. “Google just gave a stunning demo of Assistant making an actual phone call”. In: The
Verge (2018). url: https://www.theverge.com/2018/5/8/17332070/google-assistant-makes-phonecall-demo-duplex-io-2018.
[92] Matthias J Wieser and Tobias Brosch. “Faces in context: A review and systematization of
contextual influences on affective face processing”. In: Frontiers in psychology 3 (2012), p. 471.
[93] Jamil Zaki and W Craig Williams. “Interpersonal emotion regulation.” In: Emotion 13.5 (2013),
p. 803.
86
Abstract
The work I performed during my doctoral studies concerns facial expressions, one of the most widely studied subjects in the field of Affective Computing. State-of-the-art facial expression and emotion perception models have improved dramatically and now match the accuracy of professionally trained expression coders. Given these improvements, one might expect automatic emotion recognition to have become an indispensable tool for analyzing and predicting human social behavior. Indeed, psychological theories argue that emotional expressions serve crucial social functions, such as revealing intentions and shaping partner behavior. Yet these theoretical benefits have largely failed to materialize within affective computing.
There is now a growing understanding that one obstacle to the advancement of affective computing is how the concept of emotion is typically represented within the field. Influenced by early psychological theories, expressions are often treated as universal, context-independent signifiers of an underlying emotional state, and this latent state is then assumed to shape subsequent human behavior. More recent psychological theories argue instead that expressions should be seen more like words, functioning to coordinate social behavior. My dissertation embraces this latter view and explores its consequences for affective computing.
Following recent "pragmatic" theories of emotional expressions, I adopt the perspective that expressions in social settings are best treated like words. Like words, the meaning of expressions is context-dependent: just as "bank" might refer to the side of a river or a financial institution, a smile might signal pleasure or anger depending on the surrounding context. And like words, expressions can be examined from multiple perspectives. We can consider the "author's" perspective (why did this person produce this expression? what was their intent? what does it signal they will do next?) as well as the "reader's" perspective (how does this expression shape the observer's emotions, intentions, and actions?).
I illustrate the utility of this perspective for analyzing human social behavior. Focusing on a series of social tasks, such as social dilemmas and negotiations, I show how the interpretation of facial expressions is shaped by context, and that expressions, when combined with context, can usefully predict the author's intentions and their consequences for the reader. Together, this body of research makes several important contributions. First, I add to the growing body of research questioning the utility of context-free methods for automatically recognizing emotional expressions. Second, from the author's perspective, I show that both expressions and context are necessary to predict an author's subsequent actions in a face-to-face negotiation. Third, from the reader's perspective, I show how emotional expressions shape the reader's actions in a social dilemma. Finally, I show how these models could inform the behavior of interactive synthetic agents, for example by allowing them to strategically select emotional expressions that benefit a team task. More broadly, my dissertation illustrates the potential benefits of incorporating a pragmatic perspective on the meaning of emotional expressions into the field of affective computing.
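To make this concrete, the sketch below is a minimal illustration, on synthetic data, of how expression features and a context feature can be combined in a standard classifier to predict a counterpart's intention, such as cooperation in a social dilemma. The action-unit features, the context coding, and the data-generating process are hypothetical assumptions made for illustration only, not the models or datasets used in this dissertation.

# Minimal illustrative sketch (synthetic data): predicting a partner's intention
# from facial action-unit intensities combined with a simple context feature.
# AU6/AU12 and the context coding are assumptions made for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400

# Hypothetical expression features: AU6 (cheek raiser) and AU12 (lip corner
# puller) intensities, plus a binary context flag (e.g., 1 = the smile follows
# a mutually beneficial outcome, 0 = it follows an exploitative one).
au6 = rng.uniform(0, 5, n)
au12 = rng.uniform(0, 5, n)
context = rng.integers(0, 2, n)

# Synthetic ground truth with an interaction effect: the same smile predicts
# cooperation only when the surrounding context is favorable.
logits = 0.8 * au12 * context - 0.6 * au12 * (1 - context) + 0.3 * au6 - 1.0
cooperate = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

# Give the classifier the expression, the context, and their interaction.
X = np.column_stack([au6, au12, context, au12 * context])
X_train, X_test, y_train, y_test = train_test_split(
    X, cooperate, test_size=0.25, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)
print("held-out accuracy with context:", clf.score(X_test, y_test))

# Dropping the context columns shows how a context-free model degrades.
clf_no_ctx = LogisticRegression().fit(X_train[:, :2], y_train)
print("held-out accuracy without context:", clf_no_ctx.score(X_test[:, :2], y_test))

Comparing the two fits illustrates the central point of the argument: the same expression features become substantially more informative once the surrounding context, and its interaction with the expression, is made available to the model.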