NATURAL LANGUAGE DESCRIPTION OF EMOTION

by

Abe Kazemzadeh

A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, in Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (COMPUTER SCIENCE)

August 2013

Copyright 2013 Abe Kazemzadeh

Acknowledgements

From my parents and grandparents while growing up, to my professors in grad school, I have had many excellent teachers. It is not a stretch to say that without any one of them I might not have stayed in school so long or accomplished the task of writing this dissertation.

First of all, I would like to thank my current teachers: my advisor, Shri Narayanan, and co-advisors, Sungbok Lee and Panos Georgiou. I have worked with Shri a long time, since my junior year as an undergrad. In this time, I have accomplished much and I have also learned from several failures. Both have provided valuable lessons that surpass the specific topics of research. I have worked with Sungbok on many projects. Beyond the great deal of information I have learned from him, he has taught me his patient and honest approach to research and collaboration. Though I have worked with Panos directly for a shorter time than Shri and Sungbok, I have benefited greatly from his discussions and advice; his input and support have been critical in the later stages of my PhD research when I was formulating computational models for EMO20Q.

Next I would like to thank my parents and family. My mother, Mary Weible, has always been supportive of everything I have ever undertaken. She instilled a love of books and learning by reading to my brother and me when we were little (many parents read to their kids, but most do not read whole novels out loud, and fewer still all seven Chronicles of Narnia, among many others). My grandmother, Bernele Schultz, always reminds me that I can lose money but not an education, a mantra that helped keep me going all these years. She also helped fund my undergraduate education, and her unexpected gift enabled me to buy the sailboat that improved my morale in the last year of my PhD work. My father, Farhad Kazemzadeh, has always pushed me to achieve more, gave me an early appreciation of math, science, and computers, introduced me to the GopherNet, and helped fund my undergraduate education. My stepfather, Ralph Weible, has always been a good role model of hard work and patience, and introduced me to the World Wide Web. My brothers and sisters, Yusef Kazemzadeh, Vincent Weible, Mark Weible, Kurt Weible, Claire Weible, Christina Kazemzadeh, and Erika Kazemzadeh, have helped me stay grounded throughout my school work. They inspired me with their endeavors and families, visited me on several occasions, and have helped me stay in touch with my hometown. My girlfriend, Elly Setiadi, has been a constant source of cheerfulness and love. The Setiadi family has been incredibly hospitable to me. Thanks to the Setiadi family and also to the generosity of the Martinez family, with whom I roomed and boarded earlier in my studies, I have never been a starving student.

I would like to thank my dissertation and proposal committees: Dennis McLeod, Jerry Mendel, Andrew Gordon, and Jonathan Gratch. They have given honest and constructive feedback. Andrew Gordon was a research mentor early in my graduate studies and gave me a taste for novel experimental design for studying natural language description of human psychology.
Jerry Mendel taught me fuzzy logic both in and out of the classroom, and his formulations of Computing with Words inspired a large part of this dissertation. Dennis McLeod's comments in the quals helped me clarify my arguments, and his suggestions about RDF helped in the serialization of EMO20Q dialog data (even though JSON was used instead of XML). Jonathan Gratch's class and comments in the quals gave me a broad background knowledge of affective computing.

I received input, feedback, and support on my dissertation and related research from the following people: Nassos Katsamanis, Thomas Forster, Eduard Hovy, Sridhar Ramakrishnan and the Milibo seminar participants, Jimmy Gibson, Michael Proctor, Emily Mower, Matteo Gerosa, Michael Arbib, Zornitsa Kosareva, Kevin Knight, the NLG seminar participants, Carbon 5 and their Hack Night participants, Dongrui Wu, Mohammad Reza Rajati, Daylen Riggs, Rob and Naanta Salgado, Erin Reilly, Juanchen Li, Michael Johnston, and anonymous reviewers, in particular from the Interspeech and ACII conferences, as well as journal reviewers from IEEE Computational Intelligence Magazine, IEEE Transactions on Affective Computing, and Elsevier Computer, Speech, and Language.

I would also like to thank my friends. I have learned a lot from them and they have also been a great source of support. Many of my friends have gone before me as grad students, so I would like to thank them: Rebeka Campos Astorkiza, Jongho Shin, Markus Iseli, Joseph Tepperman, Selina Chu, Matt Black, Jeremy Chih-Chun Lee, Ozlem Kalinli, Dagen Wang, Emily Mower, Carlos Busso, Viktor Rozgic, Shiva Sundaram, Laurie Gerber, Serdar Yildirim, Guanjun "Bella" Feng and Yan Li (who were both my classmates and my professors at different times), Erik Bresch, Prasantha Ghosh, Chartchai Meesookho, Kyu Han, Murtaza Bulut, Jorge Silva, Emil Etiliae, Tom Murray, Abhinov Sethy, Naveen Srinivasamurthy, Erdem Unal, Nick Mote, Reid Swanson, Miguel-Angel Oliva Aijon, Ana Sanchez Muñoz, Roberto Mayoral Hernandez, Sheila Lulastra, Vivek Rangarajan, Michael Rushforth, Karla Lagunas, Yijing "Amy" Zheng, Asier Alcazar, Leila Kazemi, Shankar Ananthakrishnan, Mi Zhang, Monica Cabrera, Sasa Tomosasa, Jerry Liu, Asli Ciger, and Alvaro Cerron Palomino. I would also like to offer thanks to the current students of the SAIL lab and other current students who have encouraged me: James "Jimmy" Gibson, Derya Ozkan, Ozan Canbaz, Daylen Riggs, Dogan Can, Mohammad Reza Rajati, Kartik Audhkhasi, Jangwon Kim, Qun Tan, Angeliki Metallinou, Adam Lammert, Ming Li, Bo Xiao, Danny Bone, Vikram Ramanarayanan, Naveen Kumar, Andreas Tsiartas, Andreas Zavou, Rahul Gupta, Qing Dou, and many others that I cannot list. Best of luck in your studies! I would also like to thank old friends who have managed to stay in touch even though I haven't always had much free time: Garrett Smith, Bryant Davis, Scott Hagen, Michael Morrison, Monica Mueller, Jeff Halbleib, Neftali "Lefty" Acosta, Bill Hamilton, Todd Adams, Yin "Angela" Huang, Daniel Olds, Michael Miller, Ken Weatherwax, and Scott Wu. Finally, thanks to Anna "Cupcake" Choi for her help in proofreading and editing this dissertation.

I would also like to thank the following teachers from grade school, high school, and undergraduate studies: Mrs. Mack, Mr. Menard, Mrs. Hosley, Sr. Diane, Mrs. Bissell, Mr. McNamarra, Mr. Snyder, Mr. Campbell, Señora Hicks, Prof. McCann, Prof. Jean-Roger Vergnaud, Prof. Haas, Prof. Dani Byrd, Prof. Bonnie Glover Stalls, and Prof. Roumyana Pancheva.
Finally, I would like to thank the people who helped support me in my studies through guidance, mentoring, technical support, and personal support: Lizsl De Leon, Gloria Halfacre, Mary Francis, Talyia Veal, Allan Weber, Torre Brannon Reese, Regina Morton, Joyce Perez, and Maria, the RTH janitor, and Cecilia, the EEB janitor.

Contents

Acknowledgements
List of Tables
List of Figures
Abstract
1 Introduction
  1.1 Problem Statement
  1.2 Thesis Statement
  1.3 Theoretical Approach
  1.4 Experimental Approach
    1.4.1 Interval Approach Surveys
    1.4.2 Emotion Twenty Questions
  1.5 Computational Approach
  1.6 Additions To and Differences From the Dissertation Proposal
  1.7 Related Work, Open Challenges, and Novelty of this Dissertation
  1.8 Previous Contributions
2 The Meaning of Emotion Words
  2.1 The Meaning of Meaning
    2.1.1 Meaning in General
    2.1.2 The Meaning of Emotion Words and the Dimensional Model of Emotions
  2.2 Interval Type-2 Fuzzy Set Model for the Meaning of Emotion Words
    2.2.1 Emotion Space and Emotional Variables
    2.2.2 Fuzzy Logic and emotion concepts
      2.2.2.1 Fuzzy logic basics
      2.2.2.2 Fuzzy logic for emotional variables
      2.2.2.3 Interval Type-2 Fuzzy Logic
    2.2.3 Similarity and Subsethood
    2.2.4 Interval Surveys using the Interval Approach
      2.2.4.1 Data part
      2.2.4.2 Construction of the Interval Type-2 Fuzzy sets
  2.3 Methodology
    2.3.1 Emotion Vocabularies
    2.3.2 Valence, Activation, and Dominance Model (Model 1)
    2.3.3 Propositional Model (Model 2)
  2.4 Experimental Results
    2.4.1 Valence, Activation, and Dominance Model (Model 1)
    2.4.2 Propositional Model (Model 2)
  2.5 Discussion
  2.6 Conclusions
3 Science vs. Natural Lang. Descr. of Emotion
  3.1 EMO20Q, Crowd-Sourcing, and Experimental Design
  3.2 EMO20Q
    3.2.1 Rules
    3.2.2 Motivation
  3.3 Constructing a Theory from EMO20Q
  3.4 From Theory to Epistemology
  3.5 Conclusion
4 Natural Lang. Descr. of Emo. in Humans
  4.1 Natural Language Description of Emotion From EMO20Q
  4.2 Human-Human Data from EMO20Q
    4.2.1 Successful Game Outcome Rate
    4.2.2 Emotions
    4.2.3 Questions
  4.3 Discussion
  4.4 Emotion Description in the Brain
    4.4.1 Spatial Metaphors
    4.4.2 From Spatial Metaphor to Rewards and Reinforcement
  4.5 Emotion as Behavior Motivation and Its Role in Language Learning
  4.6 Four Stages of Language Learning from Quine
  4.7 Cognitive Subcomponents
5 Natural Lang. Descr. of Emo. in Computers
  5.1 Simulating Cognitive Components with a Generalized Pushdown Transducer
  5.2 From Truth to Epistemology
    5.2.1 Computational Socratic Epistemology
    5.2.2 Human-Computer EMO20Q Experimental data
    5.2.3 Results
    5.2.4 Discussion
    5.2.5 Conclusion
  5.3 Pursuing Knowledge Goals
    5.3.1 Constructing a Graph from Question-Object Pairs
    5.3.2 Identifying Unconnected Knowledge
    5.3.3 Determining Which Questions to Ask
    5.3.4 Data
    5.3.5 Results
    5.3.6 Discussion
    5.3.7 Conclusion
  5.4 Reward-based Shortest Path Description Generation Model
    5.4.1 Results
6 Conclusion and Future Work
Bibliography
Index

List of Tables

Table 1.1 Differences in communication of emotion in animals, humans, and computers.
Table 1.2 Experimental approaches and corresponding computational models for simulations.
Table 1.3 Subcomponents of human cognition and their simulated models.
Table 2.1 Examples of intension and extension.
Table 2.2 Similarity between words of the Blog Moods vocabulary and the Emotion Category Word vocabulary.
Table 2.3 Similarity between Spanish and English emotion words.
Table 4.1 Emotion words from human-human EMO20Q matches.
Table 4.2 Examples of some of the questions that occurred multiple times (disregarding case and punctuation).
Table 4.3 Examples of question standardization.
Table 4.4 Examples of question categories.
Table 5.1 Chomsky hierarchy.
Table 5.2 Example dialog from EMO20Q questioner agent where the human interlocutor has chosen "relaxation".
Table 5.3 Experimental results for 15 subject pilot study (45 EMO20Q games).
Table 5.4 Experimental results for 100 subject Mechanical Turk study (300 EMO20Q games).
Table 5.5 Observed emotion words by difficulty. Words that were attested but which did not fit the authors' broad intuitions are marked with '?'.
Table 5.6 Data processing.
Table 5.7 Generated description for "happiness".
Table 5.8 Generated description for "anger".
Table 5.9 Generated description for "enthusiasm".
Table 5.10 Generated description for "devastation".
Table 5.11 Generated description for "soberness".

List of Figures

Figure 1.1 Interpretation from emotion descriptions to behavior representation.
Figure 1.2 Interpretation diagram for resolving emotion descriptions. A natural language description of emotion "d" is resolved to a virtual object in the conceptual level via the interpretation function cI_d. Taking this function to output a fuzzy set allows us to consider it a proper function. However, the interpretation from a concept to observations in the world, pI_c (or a direct mapping from descriptions to observations in the world, pI_d), will be a one-to-many mapping as it does not uniquely designate world states.
Figure 1.3 Interpretation: model implementation vs simulation.
Figure 1.4 Interval Approach surveys for describing emotions in terms of valence, activation, and dominance scales.
Figure 1.5 Human-human EMO20Q.
Figure 1.6 Human-computer EMO20Q.
Figure 2.1 Translation as a perceptual computer.
Figure 2.2 Example of a trapezoidal interval type-2 membership function (IT2 MF). A normalized trapezoidal IT2 MF can be specified with nine parameters, (a, b, c, d, a', b', c', d', e'). The trapezoidal height of the upper membership function (e) can be omitted in normalized IT2 FSs because it is always equal to 1.
Figure 2.3 The data part of the interval approach. Figure adapted from [73]. This figure shows the data part of the interval approach highlighting the valence layer. This framework is repeated for each word in the vocabulary.
Figure 2.4 The fuzzy set part of the interval approach. Figure adapted from [73]. Layers denote individual fuzzy sets for valence, activation, and dominance. This framework is repeated for each word in the vocabulary.
Figure 2.5 Fuzzy answers to yes/no questions obtained by presenting the answer phrase (x-axis labels) to users of Amazon Mechanical Turk, who responded by using a slider interface to indicate the truth-degree (y-axis). This plot was based on a single handle slider, in contrast to the interval approach surveys, in order to show an overview of the data. The results presented below are for the double handle slider and interval approach analysis.
Figure 2.6 Example membership functions (MFs) calculated with the interval approach for happy, neutral, angry, and sad emotions. All the membership functions shown here, except the valence for neutral, are shoulder MFs that model the edges of the domain of µ. The region between the upper and lower MFs, the footprint of uncertainty, is shaded. The variables Val., Act., and Dom. stand for valence, activation, and dominance.
Figure 2.7 Multidimensional Scaling (2-D) representation of the emotion words' similarity. This visualizes the case when the similarities of the individual valence, activation, and dominance dimensions were combined by taking their product. The words in the categorical emotion vocabulary are marked in bold.
Figure 2.8 Performance of translating from the Spanish emotion vocabulary to the categorical emotion vocabulary, which was the set of emotion labels used for annotating the IEMOCAP corpus [8].
Figure 2.9 Performance of translating Spanish emotion words to liveJournal mood labels (colloquial emotion words).
Figure 2.10 Example IT2 FSs calculated with the Enhanced interval approach for answers to yes/no questions.
Figure 5.1 Generalized pushdown transducer (GPDT).
Figure 5.2 GPDT simulation of an EMO20Q questioner.
Figure 5.3 Dialog graph for the EMO20Q questioner agent. The loop labelled "asking" represents the functionality described by the sequential Bayesian model of Equation 5.2 and Algorithm 1.

Abstract

This dissertation studies how people describe emotions with language and how computers can simulate this descriptive behavior. Although many non-human animals can express their current emotions as social signals, only humans can communicate about emotions symbolically. This symbolic communication of emotion allows us to talk about emotions that we may not currently be feeling, for example describing emotions that occurred in the past, gossiping about the emotions of others, and reasoning about emotions hypothetically. Another feature of this descriptive behavior is that we talk about emotions as if they were discrete entities, even though we may not always have necessary and sufficient observational cues to distinguish one emotion from another, or even to say what is and is not an emotion. This motivates us to focus on aspects of meaning that are learned primarily through language interaction rather than by observations through the senses. To capture these intuitions about how people describe emotions, we propose the following thesis: natural language descriptions of emotion are definite descriptions that refer to intersubjective theoretical entities. We support our thesis using theoretical, experimental, and computational results. The theoretical arguments use Russell's notion of definite descriptions, Carnap's notion of theoretical entities, the question-asking period in child language acquisition, and Socratic epistemology. The experimental data we collected include dialogs between humans and computers and web-based surveys, both using crowd-sourcing on Amazon Mechanical Turk. The computational models include a dialog agent based on sequential Bayesian belief update within a generalized pushdown transducer, as well as a fuzzy logic model of similarity and subsethood between emotion terms. For future work, we propose a research agenda that includes a continuation of work on the emotion domain as well as new work on other domains where subjective descriptions are established through natural language communication.

Chapter 1
Introduction

Thus who would undertake to translate 'Neutrinos lack mass' into the jungle language?
-Willard Van Orman Quine, Word and Object

Imagine that you find yourself on a remote island among islanders whose language you know nothing of [77], and that you wish to communicate about emotions with them. You might imagine that it is easy to recognize and convey some basic emotions without language by using facial expressions, gestures, and pantomime. For example, from their behavior you could tell if they are scared, angry, curious, or friendly, and you could reciprocally communicate some emotions to them. However, there are some aspects of communicating about emotions that might be difficult. Imagine that for some reason you want to convey more complex emotions like jealousy, pride, vindication, or hope. These emotions seem to be more defined through language and context than through immediate, observable behavior. For these emotions, you would expect that it would be harder to communicate about them without first learning the natives' language.

In this dissertation, we look at this scenario through the eyes of a computer: imagine instead that you are a computer that finds itself among a population of humans and that you wish to communicate with them about their emotions. Recalling the introductory quote, one might think that it would be as difficult to translate emotion descriptions for a computer as it would be to translate "neutrinos lack mass" to a jungle tribe. However, we show that the second form of communication about emotions, when emotions are defined in language, is not actually that difficult for computers. Computers are inherently symbolic creatures, and we demonstrate that they can in fact communicate about emotions symbolically through natural language dialog behavior.

Emotional understanding, or empathy, and natural language communication are two characteristic capabilities of human beings. Using these two capabilities, we can understand the emotions of others and transmit this information to other people who are not firsthand experiencers of emotional data. While other animals may be able to display and recognize immediate emotions as social signals, humans are arguably the only animals that can communicate about emotions symbolically, i.e., using words that are combined systematically into phrases, sentences, and dialogs. This symbolic communication about emotions, where a word refers to an emotion rather than expressing emotion, is what we call natural language description of emotion. Natural language description of emotion allows us to talk about emotions we felt in the past, imagine emotions that could happen in the future, ask people about their emotions, and gossip about their emotions to other people. In this dissertation, we study this fundamental capability of humans to describe their emotions in language. (Footnote 1: I will refer to myself in third person in this dissertation because, although I am the sole author of this work, I have done much of the work in collaboration with my colleagues at SAIL lab. Nevertheless, all inaccuracies, omissions, or mistakes are due to no one other than myself.)

While some animals can arguably feel, express, and recognize emotions but not describe them symbolically, computers cannot feel emotions, but they are quite adept at symbolic computation and communication. Table 1.1 shows an overview of communication and emotions in animals, humans, and computers. It is an open question how much of our human ability to communicate about emotions is due to our ability to feel these emotions, and how much is due to our ability to communicate linguistically. By designing computer systems that can communicate about emotions, which cannot actually feel emotions, we can gain insight into this question.
Table 1.1: Differences in communication of emotion in animals, humans, and computers.

          | Communication                      | Emotions
Animals   | social signals                     | expression
Humans    | social signals + discrete symbols  | expression + description
Computers | discrete symbols                   | (approximate simulation)

This design of computers that can imitate the human ability to describe emotions is what we call simulation, as defined in Section 1.3. There has been much research simulating emotional expression in virtual embodied agents. One of the novel aspects of this dissertation is that it looks at simulating descriptions of emotion, rather than expression of emotion.

In this dissertation, we will study the capabilities of humans to describe their emotions in several ways. First, we will theoretically formulate what exactly we mean by "natural language description of emotion". Second, we will study how people describe their emotions by conducting various experiments. Third, we will design and test computational models that simulate the ability to describe emotions.

Creating models that simulate natural language description of emotion has both scientific and practical applications. Past research in emotional computing has shown that theoretical scientific advances and computational models go hand-in-hand. The scientific value of such models lies in the insight they provide about how people describe emotions using natural language. By programming this ability in computers, we not only deal with important computational details, but we also create an interactive example of the model's strengths and shortcomings. Practical applications, such as programs that can serve as artificial, human-like agents, come from computational implementations of such a model.

1.1 Problem Statement

This work proposes to tackle the general problem of computationally modeling how people describe emotions using natural language. To focus the research, the following sub-problems are identified:

1. How can we represent the meaning of natural language emotion descriptions in a way that is understandable by humans and that is computable by computers?
2. How can we represent both inter- and intra-subjective variability about the meaning of emotion descriptions?
3. How do we use the computational model in various tasks and test it against human performance?
4. What is the information content of emotion descriptions and how can it be measured?
5. What does this view of emotion descriptions entail about processing such information by the human organism?

1.2 Thesis Statement

In making a model of how humans understand natural language emotion descriptions, we make strong claims about human understanding and how to simulate it in engineering applications. This thesis proposes that while human understanding of natural language descriptions of emotion can be grounded with reference to specific human behaviors and situations in the world, there is also an intermediate, conceptual level that can be the referential grounding of emotion descriptions in lieu of reference to physically observable stimuli. This conceptual representation can be modeled by the logical notions of definite descriptions and theoretical entities. Descriptions of emotions that are not grounded in behavior or witnessed first hand are understood using this intermediate representation.
Because it is not directly referential to physical data, nor shared directly with other agents, an intermediate representation of this type is inherently vague, subjective, and potentially ambiguous. However, this uncertainty does not hinder the human ability to reason with this representation. Indeed, this intermediate representation is robustly communicated in a natural language dialog setting. This representation of emotions contains information that helps people talk and reason about emotion abstractly, and it includes logical relations between emotions, such as similarity and subsethood, and information about possible behavior and situations associated with the emotions. The concise statement of the thesis is the following:

Thesis: Natural language descriptions of emotion are definite descriptions that refer to intersubjective theoretical entities.

We call upon Russell's logical notion of definite descriptions [100] to capture how people talk about specific unique objects, which may or may not physically exist. By claiming that natural language descriptions of emotion are definite descriptions, we are trying to formally account for the intuition that people talk about emotions as distinct entities that are distinguishable from other emotions. By claiming that these descriptions refer to theoretical entities, we are using Carnap's formulation [11] of how science can deal with hypothetical objects like atoms or infinite cardinals. Therefore, we remain agnostic as to whether or not certain emotions are natural kinds [5], that is, whether there exist some discrete, unique neurobiological criteria that can test for a certain emotion. By claiming that natural language descriptions of emotion refer to intersubjective theoretical entities, we define emotion descriptions in terms of learned linguistic conventions rather than in terms of physically observable sensory data. According to this claim, one need not experience or witness an emotion first hand in order to be able to understand a description of that emotion.

The proposed thesis regarding an engineering solution to the modeling problem is that interval type-2 fuzzy logic is appropriate to model the smooth or continuous subjectivity along a scale, and that a dialog agent is appropriate to model the epistemological representation and social aspects involved in communicating emotion descriptions. The uncertainty in our model comes from several sources. First, our model aims to approximate the human faculty of understanding natural language descriptions of emotions, so there will be some uncertainty due to the approximate nature of a modeling task. Second, there is inter- and intra-subjective variability about the meaning of natural language descriptions of emotions due to the vagueness of the meaning of emotion descriptions. Finally, there is ambiguity when grounding emotion descriptions in the real world. There may be perceptual limitations of the observer, such as limited modality or context, and there may be conflicting evidence suggesting multiple possible emotion descriptions [88]. Many of these sources of uncertainty are well suited to fuzzy logic. The interpretation of natural language descriptions of emotions as logical propositions is a starting point, which we refine through fuzzy logic, epistemology, and abduction.
Incompleteness in our model comes from the fact that we cannot hope to observe every possible natural language description of emotions, nor even to observe every emotion. A fortiori, we are unlikely to observe every combination of emotion and associated descriptions for each communicative agent. Therefore, we need a way to handle sparse samples of emotion descriptions. Graph connectivity tests, such as finding the eigenvalues of the graph Laplacian and power series analysis of adjacency graphs, provide a way to identify gaps in knowledge and to model behavior, like curiosity, that aims to find these gaps and fill them in.
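To make the connectivity test concrete, here is a minimal sketch (ours, not from the dissertation; the toy graph and tolerance are illustrative) of counting connected components from the spectrum of the graph Laplacian: the multiplicity of eigenvalue zero equals the number of components, so a count greater than one flags unconnected knowledge that a curious agent could target.

```python
# Sketch: detecting gaps in an emotion-question graph via the graph Laplacian.
# Assumptions: an undirected graph whose nodes are emotion words and questions,
# with an edge when a question was asked about an emotion. Names are illustrative.
import numpy as np

def num_connected_components(adjacency: np.ndarray, tol: float = 1e-9) -> int:
    """The multiplicity of the Laplacian eigenvalue 0 equals the number of
    connected components of an undirected graph."""
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    eigenvalues = np.linalg.eigvalsh(laplacian)  # symmetric, so eigvalsh
    return int(np.sum(np.abs(eigenvalues) < tol))

# Toy graph: nodes 0-2 are linked; node 3 (say, a new emotion word) is isolated,
# so no question yet connects it to the rest of the agent's knowledge.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 0, 0, 0]], dtype=float)
print(num_connected_components(A))  # -> 2: a gap that curiosity should fill
```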
A natural language descrip- tionofemotion ”d”isresolvedtoavirtualobject"intheconceptuallevelviainterpretationfunction c I d . Taking this function to output a fuzzy set allows us to consider it a proper function. However, the interpretation from a concept to observations in the world p I c (or a direct mapping from de- scriptions to observations in the world p I d ) will be a one-to-many mapping as it does not uniquely designate world states. [26]) so it provides a good theoretical basis for a theory of natural language descriptions of emotions. The process of resolving the reference of emotion terms can be seen as an interpretation function that resolves a description to some conceptual representation, as in Figure 1.2. The approach we take views the interpretation of natural language descriptions of emotions as a way of describing behavior that is mediated by a conceptual representation. This conceptual representation can serve as the referential meaning of a natural language description of emotion if the physical, denotational meaning is unspecified, as in the case when someone is communicating emotional information to someone who is not a first hand witness of the emotional situation. The process of interpreting a natural language description of emotion is done by human beings when they understand language about emotions. In this dissertation, we assume that interpretation of natural language descriptions of emotion can be done by computers as well, at least in some approximate form. We use the terminology of [26] in which the first type of interpretation fuction, by humans, is called the implementation of theinterpretation, and thesecond type of interpretation, 8 I simulation verification I implementation Physical Representation, Human agreement Natural Language Description Brain Computer Figure 1.3: Interpretation: model implementation vs simulation. by computer, is called the simulation of the interpretation, as illustrated in Figure 1.3. It should be noted that this terminology is slightly different from colloquial usage in critical ways. Colloquially, wetalkaboutimplementationsasprogramsthatareimplementedonacomputer. Thesensethatwe use is most similar to when a programming language is implemented, but in our case the language is anaturallanguage,andtheimplementationisbiological. Similarly,incolloquialtermswetalkabout simulation as expermiments using generated data as input, rather than real data but in our defined usage of “simulation” we mean an interpretation function that approximately emulates a canonical implementation. This terminology borrows from mathematical logic, where statements in a formal language are interpreted to determine their truth in a particular structure or model. The formal definition of a theory, in mathematical logic, is a set of statements that are true of some model [22]. We extend this definition of a theory to observed natural language behavior from experiments used to elicit truth judgments about emotions. However, looking at the interpretation as being true or false is only part of the story. As humans, wemaybelievesomethingtobetrueorfalse,butwemaybemistaken. Also,weknowwhywebelieve something as true or false (at least after about three years of age [33]). The ability to justify, test, and revise our beliefs is another difference between purely deductive logical inference and human abilities. 
1.4 Experimental Approach

In this dissertation, we use two experimental approaches: Interval Approach surveys [85, 86] and the Emotion Twenty Questions (EMO20Q) game [49, 53, 48, 50, 52, 51]. These experiments parallel the computational models used to simulate natural human interpretation of natural language descriptions of emotion: fuzzy logic and graphical Bayesian models within a generalized pushdown transducer (GPDT), respectively, as summarized in Table 1.2. Interval Approach surveys are a way to elicit truth judgments in the form of interval selections on abstract, Likert-type scales. EMO20Q is a game to induce language behavior in the form of truth judgments of assent or dissent about descriptions that are negotiated in a dialog context.

Table 1.2: Experimental approaches and corresponding computational models for simulations.

Experimental Methodology           | Computational Model (Simulation)
Emotion Twenty Questions (EMO20Q)  | graphs, sequential Bayesian belief update, GPDT
Interval Approach surveys          | interval type-2 fuzzy logic, fuzzy similarity metrics

1.4.1 Interval Approach Surveys

Interval Approach surveys [72, 73] can be thought of as extensions of Likert surveys that allow researchers to capture intra-subject variability. Whereas in Likert scales each subject rates the prompt as a single point on a scale, Interval Approach surveys allow subjects to indicate a range of the scale, which allows uncertainty within users to be quantified. Applying the Interval Approach to emotions [56, 47, 57] is one of the novel aspects of the research that led up to this dissertation topic, and the work has progressed since the proposal [9]. Applying the interval approach to emotion description relies on the scale-based model of emotions [89, 39, 24, 102, 101], which represents emotions in terms of semantic factors. The three most widely used scale-based factors are valence, activation, and dominance. Valence refers to how positive or negative an emotion is, activation refers to the strength of an emotion, and dominance refers to how aggressive or submissive an emotion is. The Interval Approach survey methodology for emotions prompts subjects with an emotion word and asks them to select ranges on these scales that correspond to the meaning of the emotion word prompt, as in the example for the prompt "fearful" shown in Figure 1.4.

[Figure 1.4: Interval Approach surveys for describing emotions in terms of valence, activation, and dominance scales.]

The Interval Approach surveys use an iconic representation of emotions called a self-assessment manikin [7]. This iconic representation has the benefit of providing a language-independent basis for the meaning of emotion words. This experiment has the drawback that the list of emotion word stimuli is dependent on the experimenters, and the natural language descriptions of emotion that are modeled are at the word or phrase level. The other experimental approach we used, the EMO20Q game, allowed subjects to contribute emotion words.

We tested several vocabularies, as described in Chapter 2. We found that as the vocabulary size increased, the performance of the computational model derived from the Interval Approach experiments degraded. As the space induced by these three scales became crowded, distinguishing between neighboring emotions became difficult, which led to decreased performance when using this model to translate words.

The computational model associated with this experiment is a fuzzy logical model called Computing With Words (CWW). This model builds an interval type-2 fuzzy set representation of emotion words in terms of the three scales. The implementation of CWW we use is known as Perceptual Computing. We argue that this method is best for capturing the meaning of emotion words that is language-independent and based on perception, whereas the next experimental method and its associated computational models are best for capturing the meaning of words that is based on linguistic descriptions in a dialog setting.
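As a sketch of the representation this model uses (our code, with invented parameter values; the nine-parameter trapezoidal form follows the description of Figure 2.2), an interval type-2 fuzzy set on one scale can be stored as upper and lower trapezoidal membership functions, so that a point's membership is an interval rather than a single number:

```python
# Sketch: an interval type-2 fuzzy set (IT2 FS) for one emotion scale, stored as
# upper and lower trapezoidal membership functions (MFs). Parameter values are
# hypothetical; a normalized trapezoidal IT2 MF needs nine parameters
# (a, b, c, d for the upper MF; a', b', c', d', e' for the lower MF).

def trapezoid(x, a, b, c, d, height=1.0):
    """Trapezoidal MF: rises on [a, b], flat at `height` on [b, c], falls on [c, d]."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return height
    if x < b:
        return height * (x - a) / (b - a)
    return height * (d - x) / (d - c)

def it2_membership(x, upper, lower):
    """Membership in an IT2 FS is an interval [lower MF(x), upper MF(x)];
    the gap between the two MFs is the footprint of uncertainty."""
    return trapezoid(x, *lower), trapezoid(x, *upper)

# Hypothetical IT2 FS for the valence of "happy" on a 0-10 scale.
happy_valence_upper = (6.0, 8.0, 9.5, 10.0)        # a, b, c, d (height 1)
happy_valence_lower = (7.0, 8.5, 9.5, 10.0, 0.8)   # a', b', c', d', e'
print(it2_membership(7.5, happy_valence_upper, happy_valence_lower))
```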
1.4.2 Emotion Twenty Questions

The second experimental approach is a game called Emotion Twenty Questions (EMO20Q). EMO20Q is played like the traditional game of twenty questions, except that one player, the answerer, must choose an emotion term. The other player, the questioner, must try to guess the emotion that the answerer picked by posing a series of twenty or fewer questions. Importantly, we do not restrict the set of emotion terms that the players can choose nor the questions they can ask: the set of emotion terms and questions is not limited except by the players' judgment and cooperation.

The EMO20Q experiment was inspired by the fact that in the twenty questions game, players can guess a large number of objects using twenty questions. The previous experimental approach and associated computational model had difficulty representing large emotion vocabularies, but the EMO20Q experiment and its related computational models allowed for larger vocabularies, and the vocabularies were generated by the subjects, i.e., the players, rather than the experimenters.

There are two forms of the EMO20Q experiment: first, when humans play both roles, and second, when the computer plays one of the roles. The data from the human-human EMO20Q games (Figure 1.5) allow us to observe what terms people consider to be emotions, what questions people ask about emotions, what number of questions is necessary to successfully guess emotion words, and what the probability is that the emotion words will be correctly guessed.

[Figure 1.5: Human-human EMO20Q. Two XMPP clients connect through an XMPP server (Ejabberd); example exchange: "Would I feel this emotion if I stubbed my toe?" "No."]

The results of the human-human experiments are used to create the computer agent that simulates human behavior. Then subsequent human-computer matches (Figure 1.6) can be used to test the agent's performance with respect to human performance and further improve the agent. Here we consider the case where the agent plays the role of the questioner.

1.5 Computational Approach

As described in Section 1.3, we aim to computationally simulate the verbal behavior of humans describing emotions. To do this, we will simulate several discrete subcomponents of human cognition: dialog-structured language input and output, belief, long term memory, short term memory (a.k.a. episodic buffer or working memory), inference, learning, and perceptual subjectivity. The computational simulations for these components are, respectively: a generalized pushdown transducer's (GPDT) input and output symbols, a Bayesian prior, an emotion vs. question/answer matrix, the GPDT's stack, sequential Bayesian update, positive reinforcement, and fuzzy logic. These subcomponents of human cognition related to emotion description and their simulated models are listed in Table 1.3.
[Figure 1.6: Human-computer EMO20Q. A human answerer interacts with the questioner agent through a browser or terminal (HTTP GET / raw_input) against a web server or standalone script; example exchange: "Would I feel this emotion if I stubbed my toe?" "No."]

Table 1.3: Subcomponents of human cognition and their simulated models.

Cognitive Component     | Computational Simulation
Language input/output   | GPDT input/output symbols
Belief                  | Bayesian prior
Long term memory        | Emotion vs. Q-A matrix
Short term memory       | GPDT stack
Inference               | Sequential Bayesian update
Learning                | Positive reinforcement
Perceptual subjectivity | Fuzzy logic
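To make the inference row of Table 1.3 concrete, the following is a minimal sketch (ours; the emotions, question, and probabilities are invented, and the real agent estimates likelihoods from human-human EMO20Q data) of sequential Bayesian belief update over a small emotion vocabulary, where each observed question-answer pair rescales the agent's belief:

```python
# Sketch: sequential Bayesian belief update for an EMO20Q-style questioner.
# All numbers are hypothetical. belief[e] ~ P(e); the likelihood dict gives
# P(answer | e, question), which would be estimated from dialog data in practice.

def update(belief, likelihood_given_emotion):
    """One sequential step: posterior(e) ∝ P(answer | e, question) * prior(e)."""
    posterior = {e: likelihood_given_emotion.get(e, 0.01) * p  # floor for unseen pairs
                 for e, p in belief.items()}
    total = sum(posterior.values())
    return {e: p / total for e, p in posterior.items()}

belief = {"happiness": 1/3, "anger": 1/3, "sadness": 1/3}  # uniform prior

# Hypothetical likelihoods of answering "yes" to "Is it a positive emotion?"
belief = update(belief, {"happiness": 0.9, "anger": 0.1, "sadness": 0.05})
print(max(belief, key=belief.get))  # -> "happiness"
```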
1.6 Additions To and Differences From the Dissertation Proposal

Here we describe the additions to and differences from the thesis proposal that make up the final dissertation.

The main additions, following the proposed work, are the EMO20Q experiments and the computational models derived from them. In the proposal we had a small amount of human-human data, but now we have both more human-human data as well as large amounts of human-computer data where the computer plays the questioner role. From these, we establish human performance metrics as well as metrics for the computational agent, which was designed based on a sequential Bayesian belief update algorithm within a generalized pushdown transducer. We also present a graphical model to simulate curiosity in the agent, as well as a natural language generation algorithm based on shortest path decoding of a game graph using a reward function for certain transitions.

Another addition is more work on the interval type-2 fuzzy logic model presented in the proposal. In the proposal, the languages that we focused on were English and Spanish. In the dissertation, this work is extended to Turkish. We have also examined using the interval type-2 fuzzy logic model for truth value quantification on the EMO20Q data, which was not part of the proposal.

In the proposal, we identified some specific brain regions to study to see if the descriptive behavior or mental representations could be localized functionally. After taking a course with Prof. Arbib, we shifted the focus from specific details in the brain to a more general exploration of linguistic behavior. In particular, we examined the question-asking period in child language acquisition and the characteristics of human language that distinguish it from animal communication. This was also motivated by an admonition of Quine [95]: "to project non-sensory attributes purely on the analogy of sense qualities, hence as recurrent characters somehow of a subjective show within the mind, betrays surely a cavalier attitude towards psychological processes and a lack of curiosity about the mechanisms of behavior". In particular, we had been examining the use of spatial reasoning in the hippocampus with respect to abstract emotional scales like valence, activation, and dominance. While this may be a fruitful exploration in the future, we found that the reward and punishment responses in other limbic areas fit better with emotions and the game playing model of EMO20Q.

We have also studied in more depth the philosophical literature on subjectivity. After these studies we chose to reword the thesis statement from "natural language descriptions of emotion are definite descriptions that refer to subjective theoretical entities" to "natural language descriptions of emotion are definite descriptions of emotion that refer to intersubjective theoretical entities". The notion of intersubjectivity, in particular as it is presented in the work of Quine [95], best captures the aspect of subjectivity expressed in Prof. Mendel's maxim that "words mean different things to different people, yet words also mean similar things to different people" [78]. The word "subjectivity" carries with it an informal meaning that expresses judgments that are based on whims. However, interpreting the meaning of a word is not subjective in this sense, i.e., one cannot subjectively prefer that "happy" is interpreted as a negative emotion, as one would subjectively prefer coffee over tea. "Intersubjectivity" is a more precise technical term that captures a more constrained form of subjectivity that could be realized as gradient perception or as mutually compatible but different ways of describing the same object.

Another improvement has been to move from a propositional truth-based model to an epistemological model. Whereas in the propositional model the focus was on collecting true and false statements about emotions, the epistemological approach puts more focus on justification of belief. This translates to analyzing the game playing, conversational behavior in EMO20Q, rather than simply collecting true and false statements about emotions. This form of truth justification through a conversational game closely resembles the Socratic method of question-asking [40], as well as the question asking behavior of children during language acquisition. Also, the formulation of definite descriptions has been improved by including discussion about stimulus meaning vs. linguistic meaning.

One shortcoming, where the proposed work was not realized, was in making an answerer agent for EMO20Q. Although this was proposed, it fell out of the scope of the dissertation, but it remains a topic for future work.

Another addition to the dissertation that was not proposed is the use of the Emotion Markup Language (EmotionML), a W3C standard for machine readable specification of emotional information for annotation, recognition, and generation. Contributing a reference implementation for this standard using the EMO20Q agent shows a real-world application of the work in this dissertation.

Another addition to the dissertation that was not proposed is the use of the generalized pushdown transducer for generation of emotion descriptions using the EMO20Q dialog data.
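For orientation, an EmotionML annotation produced by such a reference implementation might look roughly like the following sketch (ours, not the dissertation's implementation; the vocabulary URI and values are placeholders), here emitted from Python to keep the example self-contained:

```python
# Sketch: emitting a minimal EmotionML annotation for an EMO20Q guess.
# The category-set URI is a placeholder; EmotionML (W3C) requires category
# names to come from a declared vocabulary. All values are illustrative.
import xml.etree.ElementTree as ET

EMOTIONML_NS = "http://www.w3.org/2009/10/emotionml"

def emotionml_for_guess(emotion: str, confidence: float) -> str:
    root = ET.Element("emotionml", attrib={
        "xmlns": EMOTIONML_NS,
        "category-set": "http://www.example.org/vocab#emo20q",  # placeholder
    })
    emo = ET.SubElement(root, "emotion")
    ET.SubElement(emo, "category", name=emotion, confidence=str(confidence))
    return ET.tostring(root, encoding="unicode")

print(emotionml_for_guess("relaxation", 0.8))
```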
1.7 Related Work, Open Challenges, and Novelty of this Dissertation

Although the field of affective computing is maturing and great progress has been made, many open challenges remain. In the field of sentiment analysis and affect recognition, there has been progress in recognizing the general expression of emotions in natural language, i.e., the emotions that are felt and then conveyed by the speaker/writer's language production and received by the listener or reader. In this case, the emotional data is directly available for experience by an observer or for analysis. However, the natural language description of emotions is a more specific, open area waiting to be tapped. In natural language description of emotion, some agent, not necessarily the experiencer nor even a first hand witness of the emotional data, conveys emotional information to another agent. The reason why natural language description of emotion is interesting and important is that we can talk about sadness, despair, resentment, jealousy, rage, vindication, pride, ecstasy, relief, and cheerfulness without being or feeling any of these. The previous sentence would have confused any naïve algorithm that tries to determine the emotional state of the writer, or the emotion that is inspired in the reader, because it is not expressing emotion but rather referring to emotion abstractly. This fact is particularly salient because many non-human animals are able to express their emotions, but only humans are known to be able to describe and name emotions regardless of their current emotional state. This is an important ability of humans because it allows a great deal of introspection and allows us to think about emotions hypothetically (on the downside, we humans do not only feel emotions "in the moment" like animals do, but we also imagine, gossip about, and project emotions). Of course, this ability is derived from the human language ability and its differences from animal communication: human language can create a large repertoire of meaningful signs (i.e., words) out of a small number of meaningless units (i.e., phonemes or letters); human language can be in response to a temporally or spatially removed stimulus; and human language is culturally transmitted rather than instinctual [42].

The study of natural language description of emotion promises significant results. This specific type of emotional language highlights the social context of learning language about emotions and communicating information about emotion through social networks. Anthropologically, the social network structure of information flows has been studied, as have culture specific models of emotion [103, 60]. However, technological developments offer new phenomena to be studied as well as novel experimental methodologies to observe natural language descriptions of emotions as social behavior. Previously, researchers who defined emotion terms may have glossed over the subtleties of how meaning is established informally in natural human populations. The We Feel Fine project looked at collecting phrases that follow the word "feel" in blogs and other social media. Our work similarly aims to gather informal descriptions of emotions, but we focus on dialog interaction and on precisely capturing what people intuitively consider to be an emotion.

Although many emotion models have been proposed, none of them looks at the logical structure of description of emotion in terms of definite descriptions and theoretical entities [100, 11, 26]. Furthermore, none of the previous models except [56, 47] has examined higher levels of uncertainty, such as type-2 fuzzy logic, for representing the meaning of emotion words. Reconciling the logical structure of natural language description in general with language about emotions remains an area where progress has only recently started [41]. The implications of these findings have not yet been translated to experimental procedures and the design of intelligent dialog agents. To be able to translate vague, theoretical concepts, such as descriptions of emotions using natural language, into a logical language that can be interpreted by a computer is the technical challenge that we tackle in this dissertation, and such an endeavor has implications for our understanding of human capabilities.

Some previous research has used fuzzy logic to model the vague, imprecise nature of emotions. However, these approaches do not consider the intra- and inter-subject uncertainty inherent in emotions. In this dissertation we directly treat inter- and intra-subject variation and consider it as a fundamental component of emotions.
The formulation of natural language descriptions as referring to intersubjective theoretical objects is one of the key theoretical contributions of this dissertation. Furthermore, our prior work in fuzzy logic models of emotions [56, 47], and other descriptive models of other phenomena, such as signal descriptions [108, 59, 13], are word-level models. To be able to express emotions in natural language requires sentence-level and dialog-level models, so this presents many new challenges. The research into descriptions of signals approaches natural language capabilities, but one serious omission of this research is that it has not considered natural language descriptions in social contexts like dialog. These social contexts are necessary for modeling the social conventions that are required for description of social behavior and abstract concepts.

Creating models with such capabilities begs the question of how they will be used in practical applications. Currently there are toolkits that allow for emotion recognition and synthesis [18], but these do not have robust emotional understanding, and their responses are limited to backchannels, mimicry, and canned responses. The design of the EMO20Q questioner agent is an advance that furthers the state of the art in emotionally intelligent agents. Furthermore, with applications that have advanced abilities to understand and describe emotions, new ways of evaluating system performance need to be devised. In this dissertation, we look at new ways to assess these new capabilities, namely by evaluating the EMO20Q agent's game playing performance and the fuzzy logic model's performance in translating emotion words.

1.8 Previous Contributions

Beginning with [104, 54, 55], early attempts at user modeling of human-computer dialogs were undertaken. In particular, [55] developed a scale of user activation level that was obtained by a machine learning technique known as model trees. Activation is one component in the model described in Section 2.2, which also includes valence and dominance. Furthermore, evaluation of a user's satisfaction when using a dialog system often uses a Likert scale survey. User satisfaction can be seen as a measure of valence.

In [35] we explored the natural language expressions of common sense psychology concepts. This work aimed to recognize when concepts like "planning", "execution", "causation", "belief", etc., were expressed in text. It used a finite state transducer based approach built on the lexical resource Intex.

Another previous contribution presented a scale-based measure of behavior for psychometric performance measures. In reading assessment of non-native speakers, we used a pair of scales, namely reading ability and pronunciation ability. These were first annotated manually using expert knowledge (in this case, teachers and linguists) and then automated by training speech recognition to detect correct pronunciations, common mispronunciations, and disfluencies [58, 2].

In [56, 47] we created the first interval type-2 fuzzy logic model of emotion words. The benefit of this type of fuzzy logic model is that it is possible to represent both inter- and intra-subject uncertainty. This model represents the meaning of words in an emotional vocabulary as interval type-2 fuzzy sets on scales of valence, activation, and dominance. This work enabled two applications. One application was converting between different emotion vocabularies. This case arises when different researchers use different categorizations to label emotional data.
The other application was translating emotion words from Spanish to English. This application explored multilingual variation in the meaning of emotion words.

The performance of the fuzzy logic model for emotion words degraded when used to represent larger vocabularies. While presenting this work at the World Conference on Computational Intelligence (WCCI 2010) in Barcelona, the idea occurred, while passing the time on a train ride by playing twenty questions, to use the twenty questions game as a way to deal with larger vocabularies.

Chapter 2
The Meaning of Emotion Words

(Footnote 1: Parts of this chapter are derived from [57].)

What makes an occasion sentence low on observationality is, by definition, wide intersubjective variability of stimulus meaning. Language as a socially inculcated set of dispositions is substantially uniform over the community, but it is uniform in different ways for different sentences. If a sentence is one that (like 'Red' and 'Rabbit') is inculcated mostly by something like direct ostension, the uniformity will lie at the surface and there will be little variation in stimulus meaning; the sentence will be highly observational. If it is one that (like 'Bachelor') is inculcated through connections with other sentences, linking up thus indirectly with past stimulations of other sorts than those that serve directly to prompt assent to the sentence, then its stimulus meaning will vary with the speakers' pasts, and the sentence will count as very unobservational. The stimulus meaning of a very unobservational occasion sentence for a speaker is the product of two factors, a fairly standard set of sentence-to-sentence connections and a random personal history; hence the largely random character of the stimulus meaning from speaker to speaker.

-Willard Van Orman Quine, Word and Object
While it is true that words, whether auditory or written, are interpreted through the senses, we argue that they should be taken as discrete symbols rather than signals. Words and natural language play a central role in how we describe and understand emotions. One can learn about emotions first-hand by observing physiological or behavioral data, but to communicate emotional information to others who are not first-hand observers, one must use natural language descriptions. The field of affective computing deals with creating computer systems that can recognize and understand human emotions. To realize the goals of affective computing, it is necessary not only to recognize and model emotional behavior, but also to understand the language that is used to describe such emotional behavior. For example, a computer system that recognizes a user's emotion from speech should not only recognize the user's emotion from expressive speech acoustics, but also understand when the user says "I am beginning to feel X", where "X" is a variable representing some emotion word or description. The ability to understand descriptions of emotions is important not only for human-computer interaction, but also in deliberative decision making activities where behavioral analytics are derived from natural language (for example, in mental health assessments). Such analytics often rely on abstract scales that are defined in terms of natural language.

This dissertation looks at the problem of creating computational models for the conceptual meaning of words used to name and describe emotions. To do this, we represent the meaning of emotion words as interval type-2 fuzzy sets (IT2 FSs) that constrain an abstract emotion space. We present two models that represent different views of what this emotion space might be like. The first model consists of the Cartesian product of the abstract scales of valence, activation, and dominance. These scales have been postulated to represent the conceptual meaning of emotion words [102]. The second model is based on scales derived from answers to yes/no questions, where each scale can be seen as the truth value of a proposition. In each model, the meaning of an emotion word is represented as a fuzzy set in an emotion space, but the two models represent different theoretical organizations of emotion concepts. In the first, a spatial metaphor is used to organize emotion concepts on valence, activation, and dominance scales. In the second model, emotion concepts are represented as lists of propositions and associated truth values. In both models, the algebraic properties of fuzzy sets can be used as a computational model for the meaning of an emotion word.

We outline the properties of these models and describe the methodology that estimates the fuzzy set shape parameters from data collected in interval approach surveys [72, 73]. In an interval approach survey, subjects rate words on abstract scales, but instead of picking a single value on the scales (as in a Likert scale survey), users select interval ranges on these scales. In the two models we present, the survey results are aggregated into fuzzy sets for words in an emotion vocabulary. The fuzzy set representation allows one to compute logical relations among these emotion words. By using the relations of similarity and subsethood as measures of mappings between items of two vocabularies, one can translate between these vocabularies.
This allows us to use our first model for several applications that involve mapping between vocabularies of emotion words. We can map labels from one vocabulary of emotion words to another when the labels are in the same language, for example, across different emotion annotation schemes in annotated corpora. We can also convert emotion labels from one vocabulary to another when they are in different languages, such as when translating emotion words from one language to another (here, Spanish and English). These applications show one way our proposed model may be used and provide experimental evidence by which we can evaluate the model. For evaluation of the first model, we compare the translation applications with human performance in these tasks as a benchmark.

Our results show that performance of the first model decreases as the vocabulary size gets larger, which indicates that a three-scale representation for emotions is adequate only for small vocabularies. To address this limitation, our second model draws inspiration from the game of twenty questions, in which players can identify an object from a large set by asking questions. Because people's beliefs about emotions can be subjective, many of the answers to questions about emotions are vague and can be represented as fuzzy sets. For evaluation of this model, we test the estimated IT2 FSs on data from different subjects who took a single-value survey by finding the membership of these points in the estimated IT2 FSs.

Other research has presented related methodologies: using fuzzy logic for affective computing, emotion lexical resource development, and representing emotions using valence, activation, and dominance dimensions. We begin by describing some of these works and the novelties of our dissertation.

There are many examples where fuzzy logic has been applied to the task of recognizing and representing observed emotional behavior. [76] gives an example where fuzzy logic is applied to multimodal emotion recognition. Other examples of fuzzy logic in emotion recognition are [38, 70, 122], which use fuzzy logic rules to map acoustic features to a dimensional representation in valence, activation, and dominance. [65] uses an IT2 FS model for emotion recognition from facial expressions. The model of [21] uses fuzzy logic for emotional behavior generation.

Another related trend of research is the development of lexical resources. Our work can be seen as a lexical resource framework like the Dictionary of Affective Language (DAL) [117]. In that work, 8745 common English words were evaluated for valence and activation (as well as a third dimension, imagery). The methodology for collecting the data was similar to our survey in presenting subjects with words as stimuli, but in the DAL the values of each word's dimensions are the mean across all subjects, so there is no estimate of the intra-subject variation. Also, compared with the DAL, we focus on words that are names of emotions, rather than words that might have emotional connotation. As such, our approach is more geared toward analyzing the meaning of short utterances explicitly referring to emotions, i.e., natural language descriptions of emotion [48]; the DAL would be more appropriate for characterizing emotional tone at the document level. Another related research trend outside the domain of affective computing is the study of linguistic description of signals [59, 108], which aims to associate words with the signals they describe.
One of the distinguishing traits of our research is that we use the dimensional approach and fuzzy logic to model emotion concepts used in natural language descriptions of emotions [48], rather than to characterize data from emotional human behavior [37, 38, 76, 70, 122]. Focusing on the conceptual meaning of emotion words allows us to consider cases where emotion is communicated through linguistic meaning, as opposed to paralinguistics or body language. The dimensional approach has been used both to describe emotional data and to describe emotion concepts, but more often than not this distinction is not made clear. By describing our model of the meaning of emotion words in terminology established by the philosophy of language, we hope to clarify this issue. Furthermore, by rigorously defining the idea of an emotional variable and operations on such variables in terms of fuzzy logic, we can establish general relations such as similarity and subsethood that can be applied even if the underlying representation of valence, activation, and dominance is changed.

Another contrast between this work and other research using fuzzy logic to represent emotional dimensions is that we use IT2 FSs [71] and the interval approach [73]. This allows our model to account for both inter- and intra-subject variability. Compared with the earlier developments of [56, 47, 9], this chapter offers a more detailed description of the theoretical framework and analysis of experimental results by incorporating subsethood and applying newer developments to the interval approach [14] (Section 2.2.4). It also extends these results by proposing a second model to deal with larger emotion vocabularies.

By constraining our focus to a conceptual level, we focus on input/output relations whose objects are words, rather than observations of stimuli and behavior. As such, this work can be seen as an instance of Computing with Words (CWW) [82, 80, 124]. CWW is a paradigm that considers words as the input and output objects of computation. Perceptual computing [83, 84] is an implementation of the CWW paradigm that we draw upon in this work.

2.1 The Meaning of Meaning

What does it mean to say that our model represents the meaning of emotion words? We believe this is an important question, so we will briefly discuss meaning in general in Section 2.1.1 and then explain how it relates to the meaning of emotion words in Section 2.1.2.

2.1.1 Meaning in General

In an influential paper around the end of the 19th century, the philosopher of language Gottlob Frege described two components of meaning: extension and intension [28]. The extensional component of meaning is a mapping from words to things in the world, whereas the intensional meaning is a mapping from words to concepts. The stereotypical example of this is illustrated by the terms "morning star", "evening star", and "Venus". The extensional meaning of these three terms is the same, namely the second planet in the solar system. However, the intensional meaning of these three terms is different, which explains why the three terms cannot be freely substituted in an arbitrary sentence without changing the meaning of the sentence. In this chapter, we focus on the meaning of individual words, but we touch upon the meaning of phrases in the second model.
Although the notion of extension and intension is most frequently associated with the field of philosophy of language, the idea can also be described in mathematical terms [25]. One can think of the extension of a function as a set of ordered pairs, where the first item of the pair is an input to the function and the second item is the corresponding output. The intensions of a function are described by their symbolic or algorithmic representations. Therefore we can have "f(x) = x^2" or "f(x) = x * x" as intensions of the extensional set of pairs {⟨1,1⟩, ⟨2,4⟩, ⟨3,9⟩, ...}. A summary of the distinction between intension and extension can be seen in Table 2.1.

Table 2.1: Examples of intension and extension.

              Astronomy (domain)          Mathematics (domain)
  extension   <2nd planet>                ⟨1,1⟩, ⟨2,4⟩, ⟨3,9⟩, ...
  intension   "Venus", "Morning Star"     "f(x) = x^2", "f(x) = x * x"

Extension and intension have been formally described in the study of formal concept analysis [30].
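To make the function example concrete, here is a toy sketch (in Python, purely illustrative and not from the dissertation) of two intensions sharing one extension:

```python
# Two syntactically different definitions (intensions) of the squaring function.
f1 = lambda x: x ** 2
f2 = lambda x: x * x

# Enumerating input/output pairs over a finite domain yields their extensions,
# which coincide even though the intensions differ.
extension_f1 = {(x, f1(x)) for x in range(1, 4)}
extension_f2 = {(x, f2(x)) for x in range(1, 4)}
assert extension_f1 == extension_f2 == {(1, 1), (2, 4), (3, 9)}
```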
We believe that by defining meaning in this way, we can describe our model more precisely. Without explicitly defining "meaning", whether in terms of extension and intension or otherwise, this important concept tends to get blurred. Although this topic is complex, the intuition behind it is rather simple: similar, intuitive distinctions along the lines of intension and extension are common. Extension-related terms include: referent, percept, object, empirical data, Aristotelian worldview, and stimulus meaning. Intension-related terms include: signified, concept, subject, knowledge structure, schema, Platonic worldview, and linguistic meaning. The process of understanding a word is a mapping, or interpretation, from the word itself to the word's meaning, whether intensional or extensional. We argue that, when understanding natural language in the absence of first-hand perceptual evidence, people refer to intensional meaning rather than extensional meaning. It is intensional meaning that we associate with the theoretical entities described in the thesis statement.

2.1.2 The Meaning of Emotion Words and the Dimensional Model of Emotions

According to the definition of meaning described above, the extensional meaning of an emotion word is the set of human behaviors and states of the world that the word refers to. The intensional meaning of an emotion word is the concept that people have when using it to communicate. Although most other examples of emotion research do not make an explicit distinction between intensional and extensional meaning, it seems that many tend toward extensional meaning, especially when dealing with the analysis of emotional data that has been annotated with emotional labels. In this view, the extensional meaning of an emotion word used as an annotation label refers to the set of all data to which it has been applied. The focus on intensional meaning in this work can therefore be seen as one of its distinguishing features, though it could be said that machine learning that generalizes from training data is in fact a way to infer intensional meaning.

The question then arises about the form of this intensional meaning; in particular, how we can simulate this subjective form of meaning, with respect to emotion words, in a computer. The two computational models we describe mirror two different theoretical views of intensional meaning. One view seeks to represent the intensional meaning of emotion words as points or regions of an abstract, low-dimensional semantic space of valence, activation, and dominance. The other view seeks to represent the intensional meaning of emotion words in relation to other propositions. This latter perspective is exemplified in the Emotion Twenty Questions (EMO20Q) game. EMO20Q is played like the normal twenty questions guessing game except that the objects to be guessed are emotions. One player, the answerer, picks an emotion word and the other player, the questioner, tries to guess the emotion word by asking twenty or fewer yes-no questions. Each question can be seen as a proposition about the emotion word, which prompts an answer that ranges on a scale from assent to dissent.

Scale-based models of emotion have an interesting history that goes back to Spearman's attempts to measure general intelligence using factor analysis. At first Spearman hypothesized that there was one underlying scale that could represent a person's intelligence, but later it came to be realized that intelligence was a complex concept that required multiple scales. Factor analysis was the method used to isolate these scales. In turn, factor analysis was used in the pioneering work [89] that first identified valence, activation, and dominance as factors in the connotative meanings of words. In [89], psychologists, aided by one of the early computers, conducted semantic differential surveys that tried to measure the meaning of words on Likert scales whose endpoints were defined by thesaurus antonyms. Valence, activation, and dominance were identified as interpretations of the factors that were encountered. Some of the early applications of this emotional model to language are [39, 102, 101, 117]. The pictorial representation of these dimensions, which we use in the interval surveys, was developed by [7]. It should be noted that the valence, activation, and dominance representation is merely a model for emotional meaning, and these scales most likely do not exhaustively describe all emotional concepts. In [24], it is argued that four dimensions are required: "unpredictability" in addition to the three scales we use. The approach we advocate here is based on an algebraic model that is generalizable to any scales. Our choice of the three scales was motivated by their wide usage and by a balance of theoretical and practical concerns.

The idea of using abstract scales for measuring complex concepts is attested in many areas. Some physical examples include the Richter scale for earthquakes and the Beaufort wind scale. Some examples in psychology and sociology include: intelligence quotient (IQ), the Likert scale (agreement toward or against issues rated on a scale of 1-5), the Thurstone scale (a scale of attitudes induced from average counts of binary questions of comparative judgment), the Rasch model (scales that include reading ability, personality traits, and attitudes), the Bogardus social distance scale (which measures the closeness of social relations), and the Guttman scale (an abstract scale derived from a questionnaire of binary ordered questions). Even probability, under the Bayesian interpretation, can be seen as an intensional subjective belief scale [111] (the frequentist interpretation would be the extensional meaning of such a scale). A taxonomy of scales in frequent use in experimental psychology is described in [107], and scale construction is covered in [16]. In the examples mentioned above, one can see that we, as humans, use scales to conceptualize various phenomena.

The amount of weight we give to the role of abstract scales has changed from the proposal to the dissertation. In the proposal, the scale-based representation was the main proposed model for emotion concepts, and it was argued that such scales have a basis in human mental representations. There is a basis in cognitive science for such a hypothesis.
In [68], it is argued that metaphors for physical objects, such as distance, weight, and containment, are applied to abstract concepts. Applying this view to natural language description of emotion leads to the hypothesis that people apply the physical notion of distance to emotion concepts in terms of abstract scales of valence, activation, and dominance. Similarly, the notion of physical containment could be applied to concepts; for example, the concept of happiness would contain the concept of joy. In [31], such a hypothesis of using geometrical space to represent concepts is explored. For example, in [31, Chapter 5], properties are distinguished from concepts in that properties are represented along one scale while concepts span several domains. However, despite the evidence for a spatial metaphor for concepts, it is difficult to observe concepts. In [95], Quine gives a strong admonition against projecting non-sensory concepts onto sensory metaphors while ignoring observable behavior:

To project non-sensory attributes purely on the analogy of sense qualities, hence as recurrent characters somehow of a subjective show within the mind, betrays surely a cavalier attitude towards psychological processes and a lack of curiosity about the mechanisms of behavior.

The second model we propose takes this different perspective. Rather than having theoretically motivated scales for various characteristics of emotions, the second model aims to represent the intensional meaning of emotion words in terms of natural language propositions that can be assented to or dissented from in verbal behavior. This scale of assent or dissent could also be construed as an abstract scale of truth with respect to various propositions (which has been considered in the study of veristic fuzzy sets [110, 125, 96]), but we see this view as qualitatively different from the first model.

The propositional model can be seen as different from the scale-based model for several reasons. First, the number of propositions about emotions will generally be far larger than the number of emotion words, whereas in the scale-based representation the number of scalar dimensions is smaller than the emotion vocabulary size. Second, the propositional model can be considered qualitatively different from the scale-based model because propositions can be verbally (or orthographically) expressed as linguistic stimuli, whereas abstract scales carry more cognitive implications and are language independent. Some questions from EMO20Q closely correspond to the scales in the first model, e.g., "is it positive?" is similar to valence, "is it a strong emotion?" is similar to activation, and "is it related to another person?" hints at dominance. However, model 2 contains many questions that are very specific, such as "would you feel this emotion on your birthday?".

The models we propose can be seen as an algebraic representation where theoretical entities like emotion concepts are considered virtual objects [26] with abstract scales. In this view, a collection of scales that describe an object can be seen as a suite of congruence relations. Recall that a congruence relation ≡ (mod P) is an equivalence relation that holds given some property or function P. A suite of congruence relations is a bundle of equivalence relations {∼_i : i ∈ I}, again, given some property P. In both of the models we present, the P are fuzzy sets in an emotion space. In the case of the first model, I is a set which can contain valence, activation, and/or dominance. In the case of the second model, I is a set of propositions derived from the EMO20Q game [49, 50, 53, 52]. For example, for the statement "ε makes you smile", we can say that happy and amused are congruent given this statement about smiling behavior. In terms of the scales, the equivalence relations on each scale divide the scale space into equivalence classes. In the next section, we describe this space of emotions in more detail.
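Before moving on, a toy sketch of the congruence idea in Python (the truth values and the threshold are hypothetical, not data from EMO20Q):

```python
# A "suite of congruence relations" over emotion words, given one proposition P.
# Two words are treated as congruent (mod P) when their truth values for P fall
# into the same equivalence class, illustrated here with a simple 0.5 threshold.
truth_P = {"happy": 0.9, "amused": 0.8, "sad": 0.1}  # P = "epsilon makes you smile"

def congruent_mod_P(w1, w2, threshold=0.5):
    # Both words land on the same side of the threshold -> same equivalence class.
    return (truth_P[w1] >= threshold) == (truth_P[w2] >= threshold)

assert congruent_mod_P("happy", "amused")   # congruent given the smiling proposition
assert not congruent_mod_P("happy", "sad")
```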
2.2 Interval Type-2 Fuzzy Set Model for the Meaning of Emotion Words

2.2.1 Emotion Space and Emotional Variables

Let E be an emotion space, an abstract space of possible emotions (this will be explained later in terms of valence, activation, and dominance, but for the time being we will remain agnostic about the underlying representation). An emotion variable ε represents an arbitrary region in this emotion space, i.e., ε ⊂ E, with the subset symbol ⊂ used instead of set membership (∈) because we wish to represent regions in this emotion space in addition to single points. (The symbol "ε" is used instead of a simple "e" because in future work we plan to look at the relation of emotions with event semantics, which has already reserved the letter "e" [41]. We were also motivated by an analogy with the Euler-Mascheroni constant: it is not known whether that constant is rational, irrational, algebraic (the root of a polynomial), or transcendental (like π), yet despite its uncertain identity it can be represented by a variable, which brought to mind the uncertainty regarding the underlying identity of our emotion variable.)

The intensional meaning of an emotion word can be represented by the region of the emotion space that is associated with that word. An emotion codebook C = (W_C, eval_C) is a set of words W_C and a function eval_C that maps words of W_C to their corresponding region in the emotion space, eval_C : W_C → E. Thus, an emotion codebook can be seen as a dictionary for looking up the meaning of words in a vocabulary. Words in an emotion codebook can also be seen as constant emotion variables. The region of the emotion space that eval_C maps words to is determined by interval surveys, as described in Section 2.2.4.
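As a data-structure sketch, an emotion codebook can be thought of as a lookup table from words to regions. The following Python fragment is illustrative only: the region type is left abstract (it becomes an IT2 FS in Section 2.2.2), and all names are hypothetical rather than taken from the dissertation's released code.

```python
class EmotionCodebook:
    """C = (W_C, eval_C): a vocabulary plus a map into the emotion space E."""

    def __init__(self, entries):
        # entries: dict from each word in W_C to its region of the emotion space
        self.entries = dict(entries)

    @property
    def words(self):
        return set(self.entries)      # W_C

    def eval(self, word):
        return self.entries[word]     # eval_C : W_C -> E

# With a placeholder region representation (per-scale intervals on valence,
# activation, and dominance), a lookup might read:
codebook = EmotionCodebook({
    "happy": {"valence": (7, 10), "activation": (5, 9), "dominance": (5, 8)},
})
region = codebook.eval("happy")
```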
We consider two basic relations on emotion variables: similarity and subsethood. Similarity, sm : E × E, is a binary equivalence relation between two emotion variables (we will see that the fuzzy logic interpretation of similarity is actually a function, sm : E × E → [0,1], which measures the amount of similarity between the variables rather than being true or false). Subsethood, ss : E × E, is a binary relation between two emotion variables that is true if the first variable of the relation is contained in the second. Like similarity, the fuzzy logic interpretation of subsethood is a value between zero and one. Further details are provided in Section 2.2.3, where we define the fuzzy logic interpretation of these relations.

Finally, a translation is a mapping from the words of one vocabulary to another, as determined by the corresponding codebooks:

translate : W_1 × C_1 × C_2 → W_2    (2.1)

which is displayed schematically in Figure 2.1 (Figure 2.1: Translation as a perceptual computer). This can be decomposed by thinking of C_1 × C_2 as a similarity or subsethood matrix, which is denoted as the CWW engine in the figure. Translation can be seen as selecting the word from the output language w_output ∈ W_2 such that the similarity or subsethood is maximized for a given w_input ∈ W_1. In the case of similarity, the translation output is

w_output = argmax_{w_2 ∈ W_2} sm(eval_{C_2}(w_2), eval_{C_1}(w_input))    (2.2)

where the argmax functions as the decoder in Figure 2.1. The formulation of similarity and subsethood in terms of IT2 FSs will be described in Section 2.2.3, and we will empirically evaluate the use of similarity and subsethood for translation in Section 2.4.

2.2.2 Fuzzy Logic and Emotion Concepts

2.2.2.1 Fuzzy logic basics

In traditional set theory (a.k.a. "crisp logic" or type-0), a set A in a universe of discourse D_A can be defined by a bivalent membership function μ : D_A → {0,1} that maps objects of the universe of discourse to 0 or 1 (false or true) depending on whether an object x ∈ D_A is in A or not. Membership functions of crisp sets are also referred to as characteristic functions, indicator functions, or discriminant functions [83]. The insight of fuzzy logic is to generalize the notion of membership to a real number between 0 and 1, which is termed the degree of membership or membership grade to indicate its gradual nature. A type-1 fuzzy set A is defined by its universe of discourse D_A together with a membership function μ_A : D_A → [0,1] that maps objects to a point between 0 and 1, inclusive. When μ_A(x) = 1 or μ_A(x) = 0 for all x ∈ D_A, the set A is actually a type-0 set, so fuzzy sets can be said to generalize traditional sets. In terms of the earlier discussion about intension and extension, we can say that the membership function μ_A(x) is a fuzzy set's intensional meaning, while the set of ordered pairs A = {⟨x, μ_A(x)⟩ | x ∈ D_A} is its extensional meaning.

In fuzzy logic, the traditional logical relations of union and intersection are given new interpretations, such as maximum and minimum. A comprehensive list of the different interpretations of these relations can be found in [64]. The general classes of operations corresponding to disjunction and conjunction are called t-conorms and t-norms, respectively. In this work, we use max for union and min for intersection: μ_{A∪B}(x) = max(μ_A(x), μ_B(x)) and μ_{A∩B}(x) = min(μ_A(x), μ_B(x)) for all x ∈ X. Furthermore, the fuzzy complement is μ_Ā(x) = 1 − μ_A(x) for all x ∈ X.
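The max/min interpretation chosen here is easy to state in code. A small sketch on a sampled universe of discourse, with made-up triangular membership functions for illustration:

```python
import numpy as np

x = np.linspace(0, 10, 101)                    # sampled universe of discourse
mu_a = np.clip(1 - np.abs(x - 3) / 3, 0, 1)    # a triangular type-1 MF
mu_b = np.clip(1 - np.abs(x - 6) / 3, 0, 1)    # another triangular type-1 MF

mu_union = np.maximum(mu_a, mu_b)              # t-conorm: pointwise max
mu_intersection = np.minimum(mu_a, mu_b)       # t-norm: pointwise min
mu_complement_a = 1 - mu_a                     # fuzzy complement
```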
2.2.2.2 Fuzzy logic for emotional variables

In Section 2.2.1, the definition of an emotion space E followed a traditional set-theoretic formulation. Traditional, non-fuzzy sets have crisp boundaries, which means that we can precisely determine whether a region in the emotion space is a member of any given set representing an emotion word. However, this seems to contradict the intuition and evidence that emotion concepts are somewhat vague and not precisely defined sets [5]. There are several sources of uncertainty that theoretically preclude precise set boundaries in either of the two models we present. There is modeling uncertainty because a computational model is necessarily an approximation of human thought processes. There is measurement uncertainty because the precision on these scales may be limited by the perceptual processes of mapping sensory data to concepts and of distinguishing between concepts. Finally, there is uncertainty due to inter- and intra-subject variation. Postulating a blurred boundary between emotion concepts leads us to use fuzzy logic, in particular IT2 FSs. When we use fuzzy logic to represent emotions, the formulations in Section 2.2.1 remain the same but receive a different interpretation as fuzzy sets.

For example, in a crisp representation of "happy", the membership function would be a step function, i.e., happy(ε) = 1 if ε ⊆ "happy" and happy(ε) = 0 if ε ⊄ "happy" (note that we use the convention happy(ε) rather than μ_happy(ε), the notation of Section 2.2.2.1, for convenience and ease of reading). With a fuzzy logic interpretation, the truth value is a quantity between 0 and 1 rather than a step function. This aspect of the model is based on the intuition that there can be a particular emotion that is, say, partially happy. In this view an emotion ε may simultaneously be partially happy and partially not happy when ε is not a proper subset of either happy or not happy. Due to these partial truth values and the new interpretations of the logical operators, the law of the excluded middle and the law of non-contradiction no longer hold. (This is not necessarily a drawback: in the field of logic proper, i.e., apart from fuzzy logic, constructive logic eschews the use of the excluded middle in proofs [25, p. 84].) For example, if happy(x) = 0.2 for some x, then ¬happy(x) = 0.8. In this case, the law of the excluded middle holds only partially: happy(x) ∨ ¬happy(x) = max(happy(x), ¬happy(x)) = 0.8. When such a structure behaves like the Boolean lattice of traditional set theory except for the law of the excluded middle and the law of non-contradiction, it is called a pseudo-complemented distributive lattice [63]. There are other interpretations of the laws of logic for fuzzy sets that do respect these two laws, but at the expense of other logical laws: the laws of idempotence (A ∧ A = A ∨ A = A) and distributivity (A ∧ (B ∨ C) = (A ∧ B) ∨ (A ∧ C) and A ∨ (B ∧ C) = (A ∨ B) ∧ (A ∨ C)) are the trade-offs.

If we deem that emotion concepts can be represented as fuzzy sets in either of these two models, then how do we determine the shapes of sets in this space? As we describe later in Section 2.2.4, we use the interval approach survey methodology. One can think of a Likert type of survey where the scales represent valence, activation, and dominance, and subjects are queried with emotion words as stimuli on these scales. However, subjects may be unsure about picking a specific point on the scale due to vagueness in the meaning of emotion words, especially broadly defined emotion words like those typically used as primary emotions. To deal with this intra-subject uncertainty, we turn to interval surveys and IT2 FSs.

2.2.2.3 Interval Type-2 Fuzzy Logic

Just as type-1 fuzzy sets extend classical sets by postulating the set membership grade to be a point in [0,1], type-2 fuzzy sets further extend this generalization by defining a membership function's membership grade at a given point in the domain to be a distribution in [0,1] rather than a single point, which allows for uncertainty in the membership grade [123]. The rationale for type-2 fuzzy logic is that even if a membership function takes a value between 0 and 1, no uncertainty is being represented, because the membership value is a fixed point. What is represented by type-1 fuzzy sets is partial membership, not uncertainty. Whenever there is uncertainty, type-2 fuzzy logic is motivated on theoretical grounds [80]. The region of uncertainty in the membership grade with respect to the domain is known as the footprint of uncertainty. While general type-2 fuzzy logic systems account for uncertainty, they are conceptually and computationally more complex, and methods to estimate them directly from human input are still ongoing areas of research [46]. IT2 FSs use intervals to capture the uncertainty of the membership grade [71].
Instead of an arbitrary distribution in [0,1], as is the case for general type-2 fuzzy sets, IT2 FSs use an interval [l,u] in [0,1] to represent an area of uniform uncertainty in the membership function's value, where 0 ≤ l ≤ u ≤ 1 are the lower and upper bounds of the uncertainty interval, respectively. IT2 FSs can be regarded as a first-order representation of uncertainty because they are the simplest type of fuzzy set that will account for uncertainty in the membership function. Also, as will be discussed in Section 2.2.4, there is a method for constructing IT2 FSs from human input, which makes the use of IT2 FSs practical for human-computer interaction.

IT2 FSs have been widely used because they approximate the capability of general type-2 fuzzy set models to represent uncertainty while still using many of the same techniques used for type-1 fuzzy sets. An IT2 FS can be represented as two type-1 membership functions: an upper membership function, which defines the upper bound of membership, and a lower membership function, which defines the lower bound. When these coincide, the IT2 FS reduces to a type-1 fuzzy set [79, 81]. If the difference between the upper and lower membership functions is wide, this means that we have much uncertainty about the membership grade.

Figure 2.2: Example of a trapezoidal interval type-2 membership function (IT2 MF). A normalized trapezoidal IT2 MF can be specified with nine parameters, (a, b, c, d, a', b', c', d', e'). The trapezoidal height of the upper membership function (e) can be omitted in normalized IT2 FSs because it is always equal to 1.

An example of an interval type-2 membership function can be seen in Fig. 2.2. The area between the upper and lower membership functions is the footprint of uncertainty. In this work, as an engineering decision, we have restricted ourselves to trapezoidal membership functions, which can be specified concisely using a 5-tuple (a, b, c, d, e). The first number of the tuple, a, represents the x-value of the left side point of the base of the trapezoid; b represents the x-value of the left side point of the top of the trapezoid; c represents the x-value of the right side point of the top of the trapezoid; d represents the x-value of the right side point of the base of the trapezoid; and e represents the height of the trapezoid (i.e., the y-value of the top of the trapezoid). Since IT2 FSs consist of an upper and a lower membership function, they can be represented as a 10-tuple. However, in the case of normalized interval type-2 membership functions, those whose upper membership function reaches 1, we can leave out the height of the upper membership function and specify the fuzzy set as a 9-tuple consisting of a 4-tuple for the upper membership function, with the fifth value assumed to be 1, and a 5-tuple for the lower membership function (we must include the fifth value, e', because in general the height of the lower membership function can be anywhere between 0 and 1).
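A minimal sketch (not the dissertation's code) of the normalized trapezoidal IT2 MF parameterization just described, with a 4-tuple for the upper MF (height 1 implicit) and a 5-tuple for the lower MF; the example parameters for "happy" are hypothetical:

```python
import numpy as np

def trapezoid(x, a, b, c, d, e=1.0):
    """Evaluate a trapezoidal type-1 MF with base [a, d], top [b, c], height e."""
    x = np.asarray(x, dtype=float)
    rising = (x - a) / (b - a + 1e-12)    # left slope; shoulders handled by clip
    falling = (d - x) / (d - c + 1e-12)   # right slope
    return e * np.clip(np.minimum(rising, falling), 0.0, 1.0)

class TrapezoidalIT2FS:
    def __init__(self, upper, lower):
        self.upper = upper   # (a, b, c, d); height 1 is implicit (normalized)
        self.lower = lower   # (a', b', c', d', e')

    def membership(self, x):
        # Returns the uncertainty interval [lower(x), upper(x)] at each x.
        return trapezoid(x, *self.lower), trapezoid(x, *self.upper, 1.0)

# e.g., a hypothetical right-shoulder IT2 FS for "happy" on a 0-10 valence scale:
happy_val = TrapezoidalIT2FS(upper=(5, 7, 10, 10), lower=(6, 8, 10, 10, 0.8))
lo, hi = happy_val.membership(np.linspace(0, 10, 101))
```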
2.2.3 Similarity and Subsethood

Similarity and subsethood form an important part of our model of emotions. The notion of similarity allows us to express that some pairs of emotional concepts are more or less similar than others. For example, we would say that anger is more similar to frustration than it is to happiness. When we make this judgment, we do not explicitly consider specific experiential examples of angry, frustrated, and happy data. Rather, we argue that one can make similarity judgments based on a mental representation of emotions. Two people could have disjoint sets of formative emotional stimuli, but still largely agree on the emotional concepts which form the intensional meaning of emotion words. In the fuzzy logic interpretation, similarity ranges from 0 to 1, where 1 indicates equality of two membership functions and 0 indicates that the membership functions have no overlap.

The notion of subsethood allows us to capture that some general emotions might encompass other emotions. For example, "elated" might be a subset of "happy". Subsethood is defined for traditional sets as an either-or value, but for fuzzy sets it takes a value between 0 and 1.

Similarity and subsethood are closely related. For clarity, we present the definitions of similarity and subsethood in terms of crisp sets, then type-1 and type-2 fuzzy sets. The definitions of fuzzy set similarity and subsethood follow naturally from crisp sets. The general form of similarity is based on the Jaccard index, which states that the similarity of two sets is the cardinality of the intersection divided by the cardinality of the union, i.e.,

sm_J(A, B) = \frac{|A \cap B|}{|A \cup B|}

For fuzzy sets, the set operations of intersection and union (∩ and ∪) are realized by the min and max functions, as described in Section 2.2.2.1, and the cardinality operator (|·|) is realized by summing along the domain of the variable. Thus for type-1 fuzzy sets,

sm_J(A, B) = \frac{\sum_{i=1}^{N} \min(\mu_A(x_i), \mu_B(x_i))}{\sum_{i=1}^{N} \max(\mu_A(x_i), \mu_B(x_i))}

For interval type-2 fuzzy sets, the right-hand side of this equation becomes

\frac{\sum_{i=1}^{N} \min(\overline{\mu}_A(x_i), \overline{\mu}_B(x_i)) + \sum_{i=1}^{N} \min(\underline{\mu}_A(x_i), \underline{\mu}_B(x_i))}{\sum_{i=1}^{N} \max(\overline{\mu}_A(x_i), \overline{\mu}_B(x_i)) + \sum_{i=1}^{N} \max(\underline{\mu}_A(x_i), \underline{\mu}_B(x_i))}

where \overline{\mu}(x) and \underline{\mu}(x) are the upper and lower membership functions, respectively. The formulas for similarity are symmetric (sm_J(A,B) = sm_J(B,A)) and reflexive (sm_J(A,A) = 1) [83].

We also examined a different, earlier similarity method called the Vector Similarity Method (VSM) [120]. This method was used in earlier experiments [56], so we tested it in addition to the newer Jaccard-based method. The VSM uses the intuition that the similarity of fuzzy sets is based on two notions: similarity of shape and similarity of proximity. Thus, the similarity of two fuzzy sets can be seen as a two-element vector: sm_V(A,B) = (sm_shape(A,B), sm_proximity(A,B))^T. The similarity measure of proximity is based on the Euclidean distance between the fuzzy set centroids. The similarity measure of shape is based on the Jaccard similarity between the two fuzzy sets once their centroids have been aligned. To convert the vector similarity to a single scalar, the product of sm_shape and sm_proximity is taken.

The subsethood measure is closely related to similarity and is based on Kosko's subsethood measure [66]. The measure of subsethood of a set A in another set B is defined as:

ss_K(A, B) = \frac{|A \cap B|}{|A|}

As with the similarity metric, when the set and cardinality operators are replaced by their fuzzy logic realizations, one obtains

ss_K(A, B) = \frac{\sum_{i=1}^{N} \min(\mu_A(x_i), \mu_B(x_i))}{\sum_{i=1}^{N} \mu_A(x_i)}

for the case of type-1 fuzzy sets, and for type-2 fuzzy sets the right-hand side of the equation becomes

\frac{\sum_{i=1}^{N} \min(\overline{\mu}_A(x_i), \overline{\mu}_B(x_i)) + \sum_{i=1}^{N} \min(\underline{\mu}_A(x_i), \underline{\mu}_B(x_i))}{\sum_{i=1}^{N} \overline{\mu}_A(x_i) + \sum_{i=1}^{N} \underline{\mu}_A(x_i)}

As opposed to similarity, subsethood is asymmetric, i.e., in general ss_K(A,B) ≠ ss_K(B,A). These equations give the similarity and subsethood measures for fuzzy variables of one dimension.
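A sketch of these two measures for IT2 FSs, operating on upper and lower membership functions sampled at N common domain points (e.g., as produced by the membership sketch above):

```python
import numpy as np

def jaccard_similarity_it2(a_lower, a_upper, b_lower, b_upper):
    # Symmetric Jaccard similarity: fuzzy intersection over fuzzy union,
    # summed over both the upper and lower membership functions.
    num = np.minimum(a_upper, b_upper).sum() + np.minimum(a_lower, b_lower).sum()
    den = np.maximum(a_upper, b_upper).sum() + np.maximum(a_lower, b_lower).sum()
    return num / den

def kosko_subsethood_it2(a_lower, a_upper, b_lower, b_upper):
    # Asymmetric Kosko subsethood: degree to which A is contained in B.
    num = np.minimum(a_upper, b_upper).sum() + np.minimum(a_lower, b_lower).sum()
    den = a_upper.sum() + a_lower.sum()
    return num / den
```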
To aggregate the similarity over the three dimensions of the emotion words, we tried several methods: averaging the similarities of the individual dimensions,

sm_avg(A,B) = \frac{1}{3} \sum_{i \in \{Val., Act., Dom.\}} sm_i(A_i, B_i)

taking the product of the similarities of the individual dimensions,

sm_prod(A,B) = \prod_{i \in \{Val., Act., Dom.\}} sm_i(A_i, B_i)

and taking the linguistic weighted average [121],

sm_lwa(A,B) = \frac{\sum_{i \in \{Val., Act., Dom.\}} w_i \, sm_i(A_i, B_i)}{\sum_{i \in \{Val., Act., Dom.\}} w_i}

The results of these different choices are described in Section 2.4. As mentioned in Section 1.3, the extension of traditional, "crisp" logic to fuzzy logic, here for similarity and subsethood, can be seen as a general theme of adapting a logical model to better represent human language.
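In code, the three aggregation choices are short one-liners. The similarity and weight values below are illustrative, and the weighted average is shown with crisp weights for brevity (the linguistic weighted average of [121] allows the weights themselves to be fuzzy):

```python
import numpy as np

sims = {"valence": 0.7, "activation": 0.5, "dominance": 0.6}   # illustrative values
weights = {"valence": 1.0, "activation": 1.0, "dominance": 0.5}  # hypothetical weights

sm_avg = np.mean(list(sims.values()))    # average of the scale-wise similarities
sm_prod = np.prod(list(sims.values()))   # product; any low scale pulls the result down
sm_lwa = sum(weights[k] * sims[k] for k in sims) / sum(weights.values())
```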
2.2.4 Interval Surveys Using the Interval Approach

To estimate the interval type-2 fuzzy sets over the valence, activation, and dominance scales, we used the interval approach [72, 73]. This survey methodology uses a Likert-like scale, but the subjects select interval ranges instead of single numbers on the scale, which results in IT2 FSs. One of the novelties of our work that adds to [72, 73] is that we model a phenomenon where the underlying variable is composed of multiple scales: three separate scales (valence, activation, and dominance) in the case of our first model, and an open-ended number of scales in our second model.

The interval approach assumes that most people will be able to describe words on a scale, similar to a Likert scale. However, while the Likert scale approach allows the subject to choose only a single point on the scale, the interval approach allows the subject to select an interval that encloses the range on the scale that the word applies to. Thus, while a Likert scale can capture direction and intensity on a scale, the interval approach also captures uncertainty. The uncertainty that an individual user has about a word can be thought of as intra-user uncertainty. The user does not need to know about the details of interval type-2 fuzzy logic; they can indicate their uncertainty as an interval, which is then aggregated into IT2 FSs by the interval approach.

After collecting a set of intervals from an interval approach survey, the interval approach estimates an IT2 FS that takes into account the collective uncertainty of a group of subjects. This type of uncertainty can be thought of as inter-user uncertainty. The interval approach consists of a series of steps to learn the fuzzy sets from the survey data, which can broadly be grouped into the data part and the fuzzy set part. The data part takes the survey data, preprocesses it, and computes statistics for it. The fuzzy set part creates type-1 fuzzy sets for each subject and then aggregates them with the union operation to form IT2 FSs. A new version of the interval approach, the enhanced interval approach, was proposed in [14]. This enhancement aims to produce tighter membership functions by placing new constraints on the overlapping of subject-specific membership functions in the reasonable-interval processing stage.

2.2.4.1 Data part

Figure 2.3: The data part of the interval approach. Figure adapted from [73]. This figure shows the data part of the interval approach, highlighting the valence layer. This framework is repeated for each word in the vocabulary.

The flow chart for the data part is given in Fig. 2.3. The data part preprocesses the intervals from the survey data, and the intervals that survive the preprocessing are used to compute statistics, as described in [73]. The input is a series of intervals [a_i, b_i] for each subject i = 1 ... n. Preprocessing consists of four stages: bad data processing, outlier processing, tolerance-limit processing, and reasonable-interval processing.

The bad data processing stage removes nonsensical results, such as data from subjects who did not take the survey seriously. The intervals must be in the domain of the scale, and the upper bound must be higher than the lower bound. In our implementation, the web-based survey system is operationally constrained by the JavaScript user interface to disallow such bad data, so this stage is more important if the survey is carried out with pen and paper. The outlier processing stage uses a box-and-whisker test to remove intervals that are deemed outliers [115]; it rejects intervals whose endpoints or lengths exceed the first and third quartiles by more than 1.5 times the interquartile range. The tolerance-limit processing stage calculates the mean and standard deviation of the interval endpoints and lengths for the intervals that remain after the outlier processing. This stage rejects any intervals whose endpoints or lengths fall outside the 95% tolerance limits, which correspond to plus or minus 2.549 times the standard deviation. Reasonable-interval processing is the final stage of the data part preprocessing. This step aims to discard any intervals that do not overlap with each other, based on the intuition that even though subjects may differ about the meaning of a word, the meaning should overlap across subjects. To do this, one takes the mean and standard deviation of the endpoints and determines where the Gaussian distributions of the two endpoints meet. All intervals whose lower bound is greater than this point are discarded, as are all intervals whose upper bound is lower than this point.

After this preprocessing comes the stage named "probability distribution assignment", where probability distributions S_i = (m_i, σ_i) are assigned to the remaining intervals i from 1 to m (m ≤ n) using the formulas for a random variable with a uniform distribution: m = (a+b)/2 and σ = (b−a)/√12. This can be seen as a different representation of an interval, as a midpoint and spread instead of its endpoints. These S_i are then input to the next stage, the fuzzy set part.
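A sketch of two of these data-part steps in Python, applied to a list of survey intervals (a_i, b_i) for one word on one scale. Only the outlier stage and the probability distribution assignment are shown; the tolerance-limit and reasonable-interval stages follow [73] and are omitted here:

```python
import numpy as np

def outlier_filter(intervals):
    # Box-and-whisker test: keep intervals whose endpoints and lengths lie
    # within 1.5 interquartile ranges of the first/third quartiles.
    iv = np.asarray(intervals, dtype=float)
    mask = np.ones(len(iv), dtype=bool)
    for values in (iv[:, 0], iv[:, 1], iv[:, 1] - iv[:, 0]):
        q1, q3 = np.percentile(values, [25, 75])
        iqr = q3 - q1
        mask &= (values >= q1 - 1.5 * iqr) & (values <= q3 + 1.5 * iqr)
    return iv[mask]

def assign_uniform_stats(intervals):
    # Probability distribution assignment: represent each surviving interval
    # [a, b] by the mean and std of a uniform random variable on [a, b].
    return [((a + b) / 2, (b - a) / np.sqrt(12)) for a, b in intervals]

surviving = outlier_filter([(6, 9), (5, 8), (0, 10), (7, 9)])
stats = assign_uniform_stats(surviving)   # the S_i = (m_i, sigma_i)
```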
Figure 2.4: The fuzzy set part of the interval approach. Figure adapted from [73]. Layers denote individual fuzzy sets for valence, activation, and dominance. This framework is repeated for each word in the vocabulary.

2.2.4.2 Construction of the Interval Type-2 Fuzzy Sets

The fuzzy set part, illustrated in Fig. 2.4, takes the resulting S_i from the data part and eventually creates an interval type-2 fuzzy set. This process, which closely follows the algorithm described in [73], consists of nine steps, corresponding to the boxes in Fig. 2.4.

The first two steps, "choose a type-1 fuzzy set model" and "establish fuzzy set uncertainty measures", refer to the facts (1) that there are many possible types of type-1 fuzzy set shapes and one must be chosen, and (2) that for a chosen type-1 fuzzy set model, there are several different uncertainty measures to choose from [62]. In our case, the type-1 fuzzy set models are a symmetrical interior triangle and right-angled trapezoidal left and right shoulders. The uncertainty measures on these shapes are the mean and standard deviation, which can be derived geometrically using formulae for the mean and deviation of triangular and uniform distributions (a trapezoidal membership function can be cut into a rectangular part with a uniform distribution and a triangular part with a triangular distribution; for more information, see Table II of [73]).

In the step labeled "compute general formulas for parameters of type-1 fuzzy set models", we take the S_i from the data part and equate the means and standard deviations from these interval models with the corresponding parameters from the previous step. In the step of Fig. 2.4 labeled "establish the nature of the FOU (footprint of uncertainty)", we use the parameters of the models associated with each interval to classify whether an interval should be mapped to an interior membership function or to a left or right shoulder membership function. In this step, if the parameters show that the distribution is out of the range of the scale, then the input interval will be mapped to a shoulder membership function.

Next, with the intervals and the decision of whether they are interior or shoulder membership functions, we can compute the realized fuzzy sets for each interval ("compute embedded T1 FSs" in Fig. 2.4). These correspond to person-specific membership functions. Because these are based on statistics and not the raw intervals themselves, there needs to be another preprocessing stage (labeled "delete inadmissible T1 FSs") to remove inadmissible type-1 fuzzy sets, namely, those whose range falls outside the limits of the scale of the variable of interest. Finally, we aggregate the subject-specific type-1 fuzzy sets to compute the resulting interval type-2 fuzzy set that contains all the subject-specific type-1 fuzzy sets in its FOU (the last two steps in Fig. 2.4). This aggregation can be seen as a type-2 union of type-1 fuzzy sets, where the embedded type-1 fuzzy sets trace the FOU of an interval type-2 fuzzy set.

A new version of the interval approach, the enhanced interval approach, was proposed recently [14]. This enhancement aims to produce tighter membership functions by placing new constraints on the overlapping of subject-specific membership functions. We tested this method as well as the original interval approach and found that the original produced better results (cf. Section 2.4). Since the original method produced better results and the enhanced interval approach is derived from the original, we present the original here, described with respect to our task.
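A sketch of the final aggregation step described above: the union of the admissible subject-specific (embedded) type-1 MFs traces the FOU of the IT2 FS. Each embedded MF is assumed to be sampled on a common grid, and the trapezoid helper from the earlier sketch is reused:

```python
import numpy as np

def aggregate_fou(embedded_mfs):
    mfs = np.vstack(embedded_mfs)   # one row per subject-specific T1 MF
    upper = mfs.max(axis=0)         # upper MF: pointwise max (type-2 union)
    lower = mfs.min(axis=0)         # lower MF: pointwise min
    return lower, upper

x = np.linspace(0, 10, 101)
embedded = [trapezoid(x, 5, 7, 10, 10), trapezoid(x, 6, 8, 10, 10, 0.9)]
lower_mf, upper_mf = aggregate_fou(embedded)
```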
2.3 Methodology

This section describes the experimental methodologies that were used to create the two models for emotion codebooks. For the first, we use an interval approach survey of emotion words, and we adapt the CWW paradigm to account for three-dimensional fuzzy scales, specifically by implementing similarity and subsethood measures for fuzzy sets that have three dimensions. In the case of the second model, the interval survey is separate from the elicitation of emotional information. The emotional information is collected from the EMO20Q game, and the fuzzy sets are then calculated from the answers to the questions in the game.

2.3.1 Emotion Vocabularies

In our experiments, we examined four different emotion vocabularies. The first vocabulary consisted of seven emotion category words: angry, disgusted, fearful, happy, neutral, sad, and surprised. These are commonly used emotion categories for labeling emotional data. We refer to this vocabulary as Emotion Category Words. These emotions are posited to be basic in that they are reliably distinguishable from facial expressions [20].

The second vocabulary consisted of 40 words taken from the top 40 emotion mood labels used by the bloggers of LiveJournal (this blogging site lets users label each post with a mood label, which has been used as an annotated corpus for studying emotional text [87]). The words in this vocabulary are: accomplished, aggravated, amused, angry, annoyed, anxious, awake, blah, blank, bored, bouncy, calm, cheerful, chipper, cold, confused, contemplative, content, cranky, crazy, creative, curious, depressed, disgusted, drained, ecstatic, excited, exhausted, fearful, frustrated, good, happy, hopeful, hungry, neutral, okay, pissed off, sad, sick, sleepy, stressed, thoughtful, and tired. We refer to this vocabulary as Blog Moods.

The third vocabulary was a list of 30 Spanish emotion words that was taken from the mental health initiative of a Southern California medical service provider. The words in the Spanish emotion vocabulary are: aburrido, agobiado, agotado, ansioso, apenado, asqueado, asustado, avergonzado, cauteloso, celoso, cómodo, confiado, confundido, culpable, deprimido, enamorado, enojado, esperanzado, extático, feliz, frustrado, histérico, malicioso, pasmado, rabioso, solitario, sorprendido, sospechoso, tímido, and triste (see Table I in [47] for glosses of these words from a Spanish-English dictionary). We refer to this vocabulary as Spanish Emotion Words.

The fourth vocabulary was elicited from subjects playing EMO20Q, both between two humans and between a human and a computer with the computer in the questioner role [49, 50, 52]. These data sources resulted in a set of 105 emotion words.

2.3.2 Valence, Activation, and Dominance Model (Model 1)

The data collected from the interval surveys for the first model consists of four experiments: three surveys of 32 subjects for English and one survey of eight subjects for Spanish. All surveys had a similar structure. First, the surveys gave the subject instructions. Then the surveys sequentially presented the subject with emotion words, which we will refer to as the stimuli, one word per page. For each stimulus there were sliders for each of the three emotion dimensions. The sliders had two handles, which allowed the subjects to select the lower and upper points of ranges. The range of the sliders was 0-10. The maximum range allowed was 10 and the minimum range was 1, because the steps were integer values and the implementation imposed a constraint that the upper and lower endpoints could not be the same. Above each scale was a pictorial representation known as a self-assessment manikin [7] that aimed to illustrate the scale non-verbally.

The overall structure of the Spanish survey was the same as the English one, but special care was required for the translation of the instructions and user interface elements. The first version of the translation was done by a proficient second-language Spanish speaker, and later versions were corrected by native Spanish speakers. The subjects of the surveys were native speakers of Spanish with Mexican and Spanish backgrounds.

In the surveys, each subject was presented with a series of randomized stimuli from one of the emotion vocabularies. The description of the stimuli regimen and other implementation details for the experiments can be found in [56] for English and [47] for Spanish.
Links to the surveys can be found at http://sail.usc.edu/~kazemzad/emotion_in_text_cgi/.

One final issue was deciding whether similarity or subsethood was best for our task, and how to aggregate these metrics over three dimensions. Both similarity and subsethood can be used as an objective function to be maximized in translation. [83, Chapter 4] recommends using subsethood when the output is a classification and similarity if the input and output vocabularies are the same, but it was not immediately clear what would be preferable for our tasks, so we tested the different methods empirically. Since this is one of the first studies that uses fuzzy sets that range over more than one dimension, we tested several ways of combining the similarities and subsethoods of the individual scales, using the average, product, and linguistic weighted average as described in Section 2.2.3. We also tried leaving dominance out, as it is a distinguishing feature in only a few cases.

The mapping from one vocabulary to another is done by choosing the word from the output vocabulary that has the highest similarity or subsethood with the input word. Here, similarity and subsethood are the aggregated scale-wise similarities and subsethoods for valence, activation, and dominance. We examined several different mappings. In [56], we examined mapping from the blog mood vocabulary to the more controlled categorical emotion vocabulary, which simulates the task of mapping from a large, noisy vocabulary to a more controlled one. In this chapter, we use mapping tasks that involve translation from Spanish to English to evaluate the estimated IT2 FSs.

To empirically evaluate the performance of the mapping, we used a human translator to complete a similar mapping task. We instructed the translator to choose the best word or, if necessary, two words from the output vocabulary that matched the input word. A predicted result was considered correct if it matched one of the output words chosen by the evaluator.
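A sketch of the mapping and its evaluation protocol just described. The codebooks are assumed to hold region representations compatible with an aggregated pairwise similarity (or subsethood) function such as the ones sketched earlier; all names are illustrative:

```python
def translate(word, codebook_in, codebook_out, pairwise_sim):
    # Map an input word to the output-vocabulary word of maximum aggregated
    # similarity (the argmax decoder of Equation 2.2).
    region_in = codebook_in[word]
    return max(codebook_out, key=lambda w: pairwise_sim(region_in, codebook_out[w]))

def evaluate(inputs, codebook_in, codebook_out, pairwise_sim, gold):
    # gold: dict mapping each input word to the set of acceptable translations
    # (the one or two words chosen by the human translator).
    correct = sum(
        translate(w, codebook_in, codebook_out, pairwise_sim) in gold[w]
        for w in inputs
    )
    return correct / len(inputs)
```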
Although the questions asked in the EMO20Q game are required to be yes-no questions, the answers are not just “yes” or “no”. Often the answer contains some expression of uncertainty. Here we focus on the fuzzy logical representation of answers to questions in the game. Just as the first model uses valence, activation, and dominance scales to represent emotions, the second model uses the questions from EMO20Q as scales that can be interpreted on axes that range from “yes” to “no”. In this case, the interval surveys we performed were not overtly about emotions, but rather to evaluate the answers on the scale from “no” to “yes”, which we defined as a domain for fuzzy sets that range from 0 to 100. Using data from EMO20Q games, we collected a set of questions and answers about emotions. We sampled a set of answers based on frequency of occurrence and how well the set covered the 50 space from affirmative to negative answers. We also included some control stimuli not observed in the data but included to provide insight on how people would interpret negation. For example, we included phrase groups like “certainly”, “not certainly” and “certainly not” that would allow us to calibrate how the subjects would interpret phrases that might have a logical interpretation. The final set of stimuli consisted of 99 answers. These were presented to subjects along with either a single or double handle slider. Below in Figure 2.5, we plot the responses for single sliders, which are easier to visualize than double sliders. In what follows, we present the double handle slider results, which form the input to the interval approach methodology described above. We conducted the interval approach survey on Amazon Mechanical Turk (AMT), an internet marketplace for crowd-sourcing tasks that can be completed online. The survey was conducted in sets of 30 stimuli to each of 137 subjects on AMT who were ostensibly English speakers from the U.S. The average amount of ratings per stimulus was 38.5. 2.4 Experimental Results In this section, we present the results of experiments that used the two models and the survey methodology described in Sections 2.2.4, 2.3.2, and 2.3.3 to estimate fuzzy set membership func- tions for the emotion vocabularies presented in Section 2.3.1, to calculate similarity and subsethood between emotion words as described in Section 2.2.3, and to map between different emotion vocab- ularies. 2.4.1 Valence, Activation, and Dominance Model (Model 1) Examplesofthemembershipfunctionsthatwerecalculatedfortheemotioncategoryvocabularycan be seen in Fig. 2.6. The distances between these membership functions and those of the blog moods vocabulary can be seen in Table 2.2, as calculated using the product of the individual scale-wise similarities as the aggregation method. In Fig. 2.7 we display the results of calculating a similarity 51 Figure 2.5: Fuzzy answers to yes/no questions obtained by presenting the answer phrase (x-axis labels) to users of Amazon Mechanical Turk, who responded by using a slider interface to indicate the truth-degree (y-axis). This plot was based on a single handle slider, in contrast to the interval approach surveys, in order to show an overview of the data. The results presented below are for the double handle slider and interval approach analysis. 52 0 5 10 0 0.5 1 Happy Val. 0 5 10 0 0.5 1 Happy Act. 0 5 10 0 0.5 1 Happy Dom. 0 5 10 0 0.5 1 Neutral Val. 0 5 10 0 0.5 1 Neutral Act. 0 5 10 0 0.5 1 Neutral Dom. 0 5 10 0 0.5 1 Angry Val. 0 5 10 0 0.5 1 Angry Act. 
Figure 2.6: Example membership functions (MF's) calculated with the interval approach for happy, neutral, angry, and sad emotions. All the membership functions shown here, except the valence for neutral, are shoulder MF's that model the edges of the domain of µ. The region between the upper and lower MF's, the footprint of uncertainty, is shaded. The abbreviations Val., Act., and Dom. stand for valence, activation, and dominance. (Each panel plots membership from 0 to 1 over the 0-10 scale.)

Table 2.2: Similarity between words of the Blog Moods vocabulary and the Emotion Category Word vocabulary.

                Angry  Disgusted  Fearful  Happy  Neutral  Sad    Surprised
Amused          0.004  0.003      0.005    0.060  0.004    0.005  0.053
Tired           0.006  0.003      0.034    0.001  0.038    0.196  0.001
Cheerful        0.003  0.003      0.003    0.109  0.001    0.002  0.088
Bored           0.015  0.012      0.075    0.004  0.064    0.335  0.004
Accomplished    0.015  0.013      0.008    0.151  0.006    0.008  0.139
Sleepy          0.007  0.005      0.018    0.009  0.172    0.128  0.010
Content         0.005  0.004      0.007    0.044  0.015    0.012  0.040
Excited         0.015  0.017      0.006    0.255  0.002    0.002  0.213
Contemplative   0.006  0.004      0.012    0.006  0.161    0.075  0.007
Blah            0.014  0.010      0.049    0.005  0.166    0.359  0.007
Awake           0.020  0.017      0.016    0.061  0.015    0.014  0.068
Calm            0.003  0.002      0.011    0.007  0.137    0.069  0.008
Bouncy          0.009  0.012      0.002    0.361  0.000    0.001  0.311
Chipper         0.002  0.002      0.001    0.066  0.002    0.003  0.059
Annoyed         0.393  0.380      0.080    0.041  0.002    0.023  0.076
Confused        0.026  0.020      0.064    0.014  0.046    0.170  0.017
Busy            0.068  0.079      0.049    0.111  0.013    0.012  0.116
Sick            0.008  0.004      0.032    0.001  0.023    0.204  0.001
Anxious         0.207  0.181      0.091    0.028  0.003    0.025  0.038
Exhausted       0.015  0.011      0.048    0.003  0.046    0.298  0.004
Depressed       0.008  0.005      0.050    0.001  0.015    0.218  0.001
Curious         0.038  0.042      0.014    0.203  0.011    0.006  0.176
Drained         0.009  0.007      0.039    0.002  0.061    0.280  0.003
Aggravated      0.578  0.618      0.114    0.047  0.002    0.020  0.087
Ecstatic        0.000  0.000      0.000    0.108  0.000    0.000  0.117
Blank           0.006  0.004      0.017    0.005  0.133    0.137  0.006
Okay            0.016  0.013      0.035    0.017  0.076    0.057  0.020
Hungry          0.084  0.082      0.029    0.045  0.013    0.034  0.052
Hopeful         0.009  0.007      0.007    0.047  0.010    0.009  0.050
Cold            0.005  0.003      0.026    0.001  0.047    0.123  0.002
Creative        0.027  0.037      0.007    0.524  0.001    0.002  0.462
pissed_off      0.383  0.363      0.052    0.016  0.000    0.008  0.035
Good            0.004  0.003      0.004    0.067  0.005    0.006  0.060
Thoughtful      0.005  0.003      0.004    0.011  0.079    0.029  0.012
Frustrated      0.186  0.233      0.068    0.022  0.001    0.012  0.030
Cranky          0.325  0.351      0.099    0.045  0.002    0.022  0.060
Stressed        0.288  0.304      0.158    0.044  0.003    0.026  0.053

Figure 2.7: Multidimensional Scaling (2-D) representation of the emotion words' similarity. This visualizes the case in which the similarities of the individual valence, activation, and dominance dimensions were combined by taking their product. The words in the categorical emotion vocabulary are marked in bold.

MDS is a statistical approach in the same family as principal components analysis (PCA) and factor analysis.
We use MDS in this case because factor analysis has unwanted assumptions (namely, a multivariate normal distribution with linear relationships) and because PCA operates on feature vectors as opposed to similarity matrices (and also assumes linear relationships). We performed MDS on the aggregated similarity measurements to qualitatively visualize the emotion space as derived from the similarity matrix. The result of combining the similarities of the valence, activation, and dominance dimensions was slightly different using sum versus product aggregation. The sum aggregation produced a more spread-out distribution of the words in the space induced by MDS, while the product aggregation produced a space where the emotions are more tightly clustered. This was because the product aggregation method was less sensitive to small dissimilarities. The multidimensional scaling plot also allows one to see which emotions are close and potentially confusable. For example, "happy" and "surprised" are very close, as are "angry" and "disgusted". Since mapping between vocabularies, like MDS, is done using similarities, this implies that these pairs are confusable. Since the components derived from MDS are calculated algorithmically, they are not directly interpretable as in the case of factor analysis.

To check the mapping induced by the similarity matrices, we show in Table 2.2 the similarity matrix for the product aggregation of the dimension-wise similarity measures of the valence, activation, and dominance scales. The location of the maximum of each row shows the final translation from the larger vocabulary (rows) to the smaller vocabulary (columns). The most glaring error is that "fearful" is not in the range of the mapping from the large vocabulary to the small vocabulary, due to its relatively low similarity to any word in the blog mood vocabulary. Cases where one would expect a mapping to "fearful" (e.g., "anxious", "stressed") do show elevated similarity to "fearful", but "angry" or "disgusted" are higher. Observing that most of the values in the "fearful" column are lower than those in the other columns, we normalized each column by its maximum value. Doing this does in fact produce the intuitive mapping of "anxious" and "stressed" to "fearful", but it also changed other values.
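The column normalization just described is a one-line operation. The sketch below assumes `sim` is an aggregated similarity matrix such as the one in Table 2.2; the function and argument names are illustrative, not from any released code.

    import numpy as np

    def normalized_translation(sim, in_words, out_words):
        """Column-normalize an aggregated similarity matrix before taking
        the row-wise argmax, so that a uniformly low column (such as
        "fearful" in Table 2.2) can still be selected as a translation."""
        col_max = sim.max(axis=0)
        col_max[col_max == 0] = 1.0      # guard against all-zero columns
        sim_norm = sim / col_max         # each column's maximum becomes 1
        return {w: out_words[int(j)]
                for w, j in zip(in_words, sim_norm.argmax(axis=1))}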
Table 2.3: Similarity between Spanish and English emotion words.

             angry   disgusted  fearful  happy   neutral  sad     surprised
aburrido     0.2284  0.2335     0.6370   0.1965  0.3196   0.4610  0.1230
agobiado     0.4762  0.5696     0.4611   0.3122  0.1495   0.2895  0.2175
agotado      0.2250  0.2344     0.4883   0.1425  0.4081   0.5135  0.1012
ansioso      0.4579  0.4748     0.2837   0.3655  0.2703   0.1728  0.3598
apenado      0.2915  0.2928     0.7711   0.3128  0.1219   0.4065  0.1211
asqueado     0.5445  0.5969     0.3885   0.4538  0.2045   0.2784  0.3199
asustado     0.4610  0.5324     0.3209   0.3508  0.2213   0.2141  0.3489
avergonzado  0.2701  0.2663     0.6345   0.2393  0.0660   0.4737  0.0713
cauteloso    0.0918  0.0957     0.5357   0.1848  0.3784   0.3126  0.0958
celoso       0.7396  0.6880     0.3335   0.1832  0.0515   0.2444  0.2390
cómodo       0.0436  0.0510     0.3363   0.3686  0.3963   0.3518  0.2240
confiado     0.2835  0.3307     0.2382   0.4753  0.1393   0.0562  0.2821
confundido   0.2488  0.2531     0.7690   0.2202  0.1286   0.4498  0.0878
culpable     0.3275  0.3445     0.7051   0.2916  0.1375   0.3921  0.1401
deprimido    0.2893  0.2914     0.5585   0.1529  0.3380   0.7058  0.0978
enamorado    0.4371  0.5611     0.0942   0.4572  0.1055   0.0351  0.5774
enojado      0.8732  0.7125     0.3596   0.1940  0.1054   0.2654  0.3494
esperanzado  0.0929  0.0987     0.4023   0.5903  0.1798   0.1625  0.3270
extático     0.3140  0.3108     0.0611   0.4305  0.1337   0.0268  0.7222
feliz        0.1329  0.1655     0.2293   0.6020  0.1796   0.0770  0.5046
frustrado    0.6414  0.7271     0.3003   0.3021  0.1677   0.3026  0.3337
histérico    0.6522  0.6566     0.2804   0.2340  0.1550   0.1874  0.4272
malicioso    0.3347  0.4270     0.3427   0.3540  0.2273   0.1322  0.2325
pasmado      0.3102  0.3480     0.3910   0.2544  0.1931   0.3231  0.2654
rabioso      0.5416  0.4616     0.2190   0.0945  0.0018   0.1402  0.3598
solitario    0.2657  0.2672     0.6091   0.0904  0.2549   0.5565  0.0396
sorprendido  0.3405  0.3803     0.1229   0.3336  0.1706   0.0746  0.3675
sospechoso   0.3026  0.3497     0.5129   0.3883  0.2084   0.2425  0.2900
timido       0.0844  0.0857     0.3925   0.1092  0.3578   0.4436  0.0515
triste       0.3376  0.3396     0.6502   0.1477  0.2389   0.5882  0.0852

To better quantify the intuitive goodness of the mapping from one vocabulary to another, we undertook an evaluation based on human performance on the same mapping task. We found that at least one of the subject's choices matched the predicted mapping except in the following five cases (i.e., a performance of approximately 84%): "confused," "busy," "anxious," "hungry," and "hopeful." Filtering out clearly non-emotion words like "hungry" might have improved the results here, but our aim was to use a possibly noisy large vocabulary, since the data came from the web.

To see if the fuzzy logic approach agreed with a simpler approach, we converted the survey interval end-points to single points by taking the midpoints of the subjects' intervals and then averaging across all subjects. Treating these as points in the 3-D emotion space, the mapping performance of Euclidean distance was essentially the same as that determined by the fuzzy logic similarity measures. However, a simple Euclidean distance metric loses some of the theoretical benefits we have argued for, as it does not account for the shape of the membership functions and cannot account for subsethood.
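A sketch of this crisp baseline follows. The data layout assumed here (a per-word array of subjects' interval endpoints on the three scales) is hypothetical, chosen only to make the computation concrete.

    import numpy as np

    def midpoint_baseline(intervals_in, intervals_out):
        """Crisp baseline: collapse each word's survey intervals to one 3-D
        point (valence, activation, dominance) and translate by the nearest
        Euclidean neighbor. Each dict maps a word to an array of shape
        (n_subjects, 3, 2) holding [start, end] per scale."""
        def to_point(ivals):
            # midpoint of each subject's interval, then mean across subjects
            return ivals.mean(axis=2).mean(axis=0)
        pts_in = {w: to_point(v) for w, v in intervals_in.items()}
        out_words = list(intervals_out)
        out_mat = np.stack([to_point(intervals_out[w]) for w in out_words])
        return {w: out_words[int(np.argmin(np.linalg.norm(out_mat - p, axis=1)))]
                for w, p in pts_in.items()}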
Based on the membership functions from the Spanish survey and the previous English surveys, we constructed similarity matrices between the Spanish words as input and the English words as output. The similarity matrix between the Spanish words and the Emotion Category Word vocabulary is shown in Table 2.3. Overall, the best performance of 86.7% came from mapping from the Spanish vocabulary to the Emotion Category Word vocabulary using similarity (rather than subsethood) and aggregating the scale-wise similarities using the multiplicative product of the three scales. The performance of mapping from Spanish to the Blog Mood vocabulary was worse than with the Emotion Category Word vocabulary as output because the much larger size of the Blog Mood vocabulary resulted in more confusability. The best performance for this task was 50%, using similarity and the linguistic weighted average for aggregating the similarities. A comparison of the different similarity and aggregation methods can be seen in Fig. 2.8 for mapping from Spanish to the Emotion Category Word vocabulary and in Fig. 2.9 for mapping from Spanish to the Blog Moods vocabulary.

Figure 2.8: Performance of translating from the Spanish emotion vocabulary to the categorical emotion vocabulary, which was the set of emotion labels used for annotating the IEMOCAP corpus [8]. (Bar chart grouping VSM similarity, Jaccard similarity, and subsethood, each under sum/average, product, sum with valence and activation only, product with valence and activation only, and linguistic weighted average aggregation.)

Figure 2.9: Performance of translating Spanish emotion words to LiveJournal mood labels (colloquial emotion words). (Same similarity measures and aggregation methods as Fig. 2.8.)

2.4.2 Propositional Model (Model 2)

For the propositional model, we collected a set of 1228 question-answer pairs from 110 human-human EMO20Q matches, in which 71 unique emotion words were chosen. In these matches, the players successfully guessed the other players' emotion words in 85% of the matches, requiring on average 12 turns. In the set of question-answer pairs, there were 761 unique answer strings. We selected a set of 99 answers based on frequency of occurrence and how well the set covered the space from affirmative to negative answers. We used the interval approach to obtain fuzzy sets for the answers to yes/no questions. A sample of these is shown in Figure 2.10.

To evaluate these, we determined the extent to which the medians from the single handle slider survey were full or partial members of the fuzzy sets derived from the interval approach's double handle slider survey, which used different subjects but the same stimuli. We found that the IT2 FSs from the interval approach surveys corresponded well with the single-slider data. All of the estimated IT2 FSs except one, i.e., 99%, contained the median of the single-slider values. The exception, "NO!", was a singleton IT2 FS at zero, while the median from the single slider was at one (on the scale from 0 to 100). The average value of the IT2 FS membership functions (which is an interval-valued range) at the points corresponding to the medians of the single-slider values was (0.41, 0.84). Evaluating the enhanced interval approach (EIA) in the same way, we found that the EIA-derived IT2 FSs performed nearly as well: they contained all but two of the single-slider medians (~98%), and the average membership at the medians was (0.12, 0.89).

Figure 2.10: Example IT2 FSs calculated with the enhanced interval approach for answers to yes/no questions.
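The containment check behind these figures can be sketched as follows. The interfaces are hypothetical (each IT2 FS is taken to expose its lower and upper membership functions as callables on the 0-100 domain); only the logic of the evaluation is intended to match the description above.

    def evaluate_containment(it2_mfs, medians):
        """For each answer phrase, check whether the single-slider median
        falls inside the estimated IT2 FS and record its interval-valued
        membership [lower(x), upper(x)]. `it2_mfs` maps a phrase to a
        pair of callables (lower_mf, upper_mf); `medians` maps a phrase
        to its single-slider median on the 0-100 scale."""
        grades, contained = {}, {}
        for phrase, (lower_mf, upper_mf) in it2_mfs.items():
            x = medians[phrase]
            contained[phrase] = upper_mf(x) > 0.0   # inside the FOU's support
            grades[phrase] = (lower_mf(x), upper_mf(x))
        coverage = sum(contained.values()) / len(contained)
        return coverage, grades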
Beyond these quantitative measurements, the membership functions from model 2 are qualitatively tighter than those of model 1, especially with the enhanced interval approach. Though some of the membership functions span large portions of the domain, these correspond to answers that signify uncertainty (such as "kind of", "I think so", and "perhaps" in Figure 2.10). This is in contrast to model 1, which more frequently resulted in broad membership functions with wide footprints of uncertainty. The data and code for the experiments of model 2 can be accessed at http://code.google.com/p/cwwfl/.

2.5 Discussion

Variables that range over sets and functions rather than individual numbers were important developments for modern mathematics. Further, variables that range over proofs, automata, languages, and programs add to the richness of objects that can be represented with variables. This paper looked at expanding the domain of variables to include emotions. To model a seemingly non-mathematical object in such a way, we used fuzzy sets, another relatively new type of variable. This paper proposed two models for emotion variables: one that represents the meaning of emotion words on three-dimensional axes of valence, activation, and dominance, and another that represents emotions as a sparse vector of truth values over propositions about emotions.

First, we examine the relative benefits and drawbacks of the two models we proposed: the first model based on valence, activation, and dominance scales, and the second model based on questions about emotions whose answers are rated on a scale from true to false. The first model captures intuitive gradations between emotions. For example, the relation of "ecstatic" and "happy" can be seen in their values on the scales: "ecstatic" will be a subset of "happy", with valence and activation values closer to the extreme periphery. Also, the scales used by the first model are language-independent, iconic representations of emotion, which enables researchers to use the same scales for multiple languages. However, for the first model, each word needs an interval survey on the three scales to calculate its membership function, which is laborious and limits the model to words whose membership functions have already been calculated. Also, as we have seen, performance degrades with the size of the vocabulary. Some of this degradation can be expected due to the inherent difficulty of making a decision with more choices. However, limiting the representation to three scales also limits the resolution and expressiveness of the model.

The second model, on the other hand, gives better resolution when there is a large number of emotions. With more emotions, more expressivity is needed than just valence, activation, and dominance. "Pride", "vindication", and "confidence" are examples of emotion words from EMO20Q that are difficult to represent with only valence, activation, and dominance. These emotion words might have similar valence, activation, and dominance values, making it difficult to distinguish them on the basis of the three scales alone. By representing emotions with propositions based on questions from EMO20Q, we can use a single fuzzy scale for any arbitrary proposition. Once the scales are established, the bulk of the data can be collected purely in natural language. Moreover, the propositional truth-value scale can be used for other domains besides emotions.
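For concreteness, the second model's representation can be sketched as a sparse mapping from propositions to fuzzy truth values (here collapsed to single numbers rather than full IT2 FSs). The question keys and truth values below are invented for illustration.

    # An emotion as a sparse vector of truth values over propositions.
    happiness = {
        "e.valence == positive": 0.95,  # "Is it a positive emotion?" -> "yes"
        "e.activation == high":  0.70,  # "Is it high-energy?" -> "mostly"
        "e.cause == loss":       0.05,  # "Is it caused by a loss?" -> "no"
    }

    def overlap_similarity(theory_a, theory_b):
        """Compare two emotions on the questions they share; unasked
        questions are simply absent from the sparse vectors."""
        shared = set(theory_a) & set(theory_b)
        if not shared:
            return None  # incomparable without further data collection
        return 1.0 - sum(abs(theory_a[q] - theory_b[q])
                         for q in shared) / len(shared)

When two words share no asked questions, the sketch returns None, which anticipates the comparability issue discussed next.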
However, with the second model, there is no clear way to compare emotions that were not asked the same set of questions. In the EMO20Q game, the questions are seen only as they occur in the game. It will be necessary to collect more data outside of the game to make sure that all the prevalent questions are asked about each emotion. Even though we can use a single fuzzy scale for each proposition's truth-value, the set of all propositions about emotions is a vast open set, so data collection is still an issue. Moreover, since the propositions are based on a specific human language, the equivalence of different propositions in different languages is not as apparent as in the first model.

There were several modifications that we made to the interval approach to make it more robust for the case when all intervals are discarded by the preprocessing. We determined that the final removal of all intervals took place in the reasonable-interval processing stage. The modification to the original interval approach involved keeping the intervals in this stage if all of them would otherwise have been removed. This had the effect of creating a very broad membership function with a lower membership function that was zero at all points. The enhanced interval approach improved the rejection of intervals in the various stages by separately considering interval endpoint criteria and interval length criteria. For the first model, the enhanced interval approach yielded worse results when using the translation task as an evaluation metric. This was due to the narrower membership functions that the enhanced interval approach was designed to produce. In the calculation of similarity and subsethood, the narrower membership functions led to more zero entries. In the translation task, this resulted in a less robust translation because small variations in the membership function would yield a disproportionate change in similarity and subsethood values. In the case of the second model, where the fuzzy sets are used in a more traditional fashion, i.e., as propositional truth quantifiers, the enhanced interval approach did in fact yield membership functions that appeared to more tightly contain the single slider results, and it performed as well on the evaluation metric we used for this task.
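The fallback modification can be sketched as below. The crossover criterion is sketched from the usual formulation of the reasonable-interval test rather than quoted from the exact method, so the details should be taken as an assumption; only the keep-everything fallback is the modification described above.

    import numpy as np

    def reasonable_interval_stage(intervals):
        """Reasonable-interval test of the interval approach, with the
        robustness fallback: if the test would discard every interval,
        keep them all (yielding a very broad MF whose lower MF is zero
        everywhere). `intervals` is an (n, 2) array of [start, end]
        survey responses."""
        a, b = intervals[:, 0], intervals[:, 1]
        ml, sl = a.mean(), a.std()
        mr, sr = b.mean(), b.std()
        denom = sl + sr
        xi = (mr * sl + ml * sr) / denom if denom > 0 else (ml + mr) / 2.0
        keep = (a < xi) & (b > xi)   # "reasonable" intervals straddle xi
        if not keep.any():
            return intervals          # fallback: keep everything
        return intervals[keep]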
The two models both use IT2 FSs, but beyond that, they take different approaches to the representation of emotion descriptions. This difference in approach, and the resulting format of each model, made it difficult to evaluate both in the same way. For the first model, the fuzzy scales of valence, activation, and dominance are directly tied to the emotion representation, and because the scales are non-linguistic in nature (they are labeled with a cartoon manikin), the cross-language translation task was a possible evaluation metric. However, the fuzzy scales used in the second model are indirectly linked to emotions via linguistic propositions about emotions. Since the propositions about emotions are specific to a given language, the translation task is not directly facilitated by this model.

From the comments given by the subjects of the survey for model 1, we found that subjects reported confusion with the scale of dominance, despite the pictorial representation in the survey. For model 2, we found that the interpretation of linguistic truth values was a source of reflection for the subjects, and this provided insight into variation that might otherwise have been attributed to lack of cooperation on the part of the Amazon Mechanical Turkers. For example, the stimulus "definitely" would, from a logical point of view, be assumed to be a strong "yes". However, several Turkers mentioned that they realized that, when they use the word "definitely", they do not mean "definitely" in the logical sense, but rather that the colloquial meaning is somewhat more relaxed. From the fuzzy set representation point of view, it may be advantageous to recognize distinct senses for the meaning of words and phrases. In the case mentioned, the word "definitely" could have a colloquial sense and a logical sense. Another example of this was in the control phrases we used in the second model: "not certainly" was often confused with "certainly not". This is not to say that all the Turkers were cooperative and took the time to understand the task, but it shows that there are many factors involved in measuring uncertainty. From Figure 2.5, we can see that the default value of the slider (in this case, a single slider at the middle of the scale) was a salient point of outliers. Modeling the effects of uncooperative users who may click through as quickly as possible is one possible improvement that could be made to the interval approach from the data processing point of view.

Our conclusion in comparing the two models is that for basic emotions the valence, activation, and dominance scales of model 1 suffice. An example use-case for the first model would be converting a large, expressive set of emotion labels to a smaller set for the purpose of training a statistical classifier. However, for the class of all words used to describe emotions in natural language, the representational power of the first model's valence, activation, and dominance scales is not sufficient. To fully understand what a given emotion word means to someone, our work indicates that the second model is better if the modeling goal is to represent a larger vocabulary and finer shades of meaning.

2.6 Conclusions

In this chapter, we presented two models to represent the meaning of emotion words. We gave an explicit description of meaning in our models. The first model involved interpreting the emotion words as three-dimensional IT2 FSs on the dimensions of valence, activation, and dominance. This model allowed us to map between emotion vocabularies of different sizes and different languages. The mapping was induced by picking the most similar word of the output vocabulary given the input vocabulary word. The similarity used for this mapping was derived from similarity or subsethood measures of the individual dimensions, which were aggregated into a single measure for each pair of input and output vocabulary words. We devised a second model that addresses the challenges that arise when the vocabulary of emotion words is large. Instead of the lower dimensional representation in terms of valence, activation, and dominance scales, the second model used a high dimensional representation. The emotion words in the second model were represented in terms of answers to questions about emotions, as determined from data from the EMO20Q game. In the second model, IT2 FSs were used to represent the truth values of answers to questions about emotions. We found that the second model was necessary to capture more highly nuanced meaning when the vocabulary of emotion words was large. This chapter looked at the meaning of words. In the next chapters, we will look at the meaning of arbitrary descriptions that occur in a dialog setting, that of the EMO20Q game.

Chapter 3

Scientific vs.
Natural Language Description of Emotions 1

Words, or their inscription, unlike points, miles, classes, and the rest, are tangible objects of the size so popular in the marketplace, where men of unlike conceptual schemes communicate at their best.
-Willard Van Orman Quine, Word and Object

One may wonder whether the theories of a scientist are fundamentally different from the beliefs and opinions of a non-scientist, or whether they are fundamentally the same. The questions a scientist might ask about a new phenomenon are formed by their prior theories. Similarly, an ordinary person will ask questions about something unknown based on their prior beliefs. In the quote above, Quine affirms the role of language in facilitating communication despite the fact that the communicants may not have the same conceptual system. In new fields, like the study of emotions, there may be several different theories that coexist in the scientific community. As these theories are mediated by the language used to formulate them, it is useful to look at language and to compare the language of scientists and lay speakers. This chapter looks at modeling the beliefs of ordinary people as formal theories for the purpose of studying natural language descriptions of emotions. This type of study falls under the umbrella terms of commonsense knowledge or folk ontology [35, 105].

1 This chapter is based on work published in [49, 51].

In the proposal of the dissertation topic, we contrasted the scientific approach to describing emotions with a commonsense knowledge of emotions. This commonsense view, we argued, is useful for studying the non-scientific meaning of emotion words as they are used in social contexts. However, upon more thought and examination of the topic, we now focus on the commonalities. The study of emotions is relatively recent and, as of yet, there is still a lack of consensus on some fundamental terms. Thus, some of the same issues that arise in studying emotion description from a natural language perspective also come up when analyzing a more scientific approach. The main insight that led to this change of position was the commonality between the fundamental notions of making a scientific hypothesis and asking a question in natural language. The process of asking a question occurs in a scientist's formulation of a hypothesis as well as in a child in the question-asking period of rapid language learning.

One function of natural language is to reference objects, whether real or virtual, using patterns of sound or writing. This system of reference provided by natural language is mainly socially constructed (as opposed to innate). The names of things are established by the conventions of shared tradition and consensus of a community of speakers, e.g., speakers of a given language or workers in a specific profession. In some such communities, such as a scientific field, the process of socially constructing a terminology is deliberate, systematic, and precise; in other communities it is not. For example, the world-wide community of English speakers has no editorial board or royal academy to arbitrate the correct usage of words. Though it may seem serendipitous that such a loosely regulated system can function as well as it does, it is in fact due to this loose organization that natural language achieves its flexibility and scalability. On the other hand, general natural language lacks the precision and conciseness that is necessary for specific purposes, such as scientific discussion.
The linguistic division of labor between everyday language and technical jargon is all very fine and well when the technical terminology is disjoint from common language, but it can become problematic when there is considerable overlap. The field of affective computing is one such example; the scientific terms for emotions overlap with common linguistic usage. Since one of the aims of affective computing is to process natural language, it is necessary to be mindful of the difference between the scientific and common definitions of emotional terms. Much of the previous work in affective computing has focused on the scientific definition of emotional terms. Scientific definitions must have a high degree of similarity with the common definitions for them to be meaningful, but the scientific definitions are necessarily limited and may not correspond to everyday usage. This distinction can be seen as a prescriptive versus descriptive focus. However, currently there is a movement towards studying more naturalistic and non-prototypical emotion expression [17, 88], which may benefit from a corresponding movement towards using the more natural, socially-defined terminological basis that we aim to discover in the research described by this paper. In the words of Osgood et al., "we wish to find a kind of measurable activity or behavior of sign-using organisms which is maximally dependent on and sensitive to meaningful states, and minimally dependent on other variables" [89, p. 10]. The semantic differential studies of [89] pioneered the use of gradient scales to represent the meaning of emotion words. In this dissertation, we extend the scale-based representation to include fuzzy scales, and we develop the EMO20Q game as an experimental instrument for objectively measuring the (possibly subjective) social understanding of emotion words.

To move from the scientific theory of emotions to a theory that explicitly accounts for subjectivity and social construction, we made use of crowd-sourcing [44] and games with a purpose [112]. Crowd-sourcing aims to gather the collective knowledge of a group and is closely tied to the emergent properties of online social communities and web intelligence [126]. Games with a purpose are a particular way of crowd-sourcing. In our case, we devised a game that we call Emotion Twenty Questions (EMO20Q). This game is similar to the ordinary Twenty Questions game except that it is limited to guessing about emotion words. Furthermore, it is implemented as an online chat application using the Extensible Messaging and Presence Protocol (XMPP) so that the games can be recorded and studied. As the introductory quote alludes, we carefully aim to observe how people who may differ in beliefs can communicate and reach shared understanding. Fundamental to this is the question-asking process.

3.1 EMO20Q, Crowd-Sourcing, and Experimental Design

By relying on the wisdom of the masses, we venture a simple answer to the difficult question, "what is emotion?". The answer, according to crowd-sourcing, is that emotion is what people say it is. Although this answer side-steps many important issues, such as physiological and psychological descriptions of emotions, it does bring other issues into sharper focus. Currently, there is a movement toward studying non-prototypical emotional data [88]. Non-prototypical emotional data is exemplified by disagreement among annotators when assigning emotional labels to data. We argue that EMO20Q provides a crowd-sourced description of emotions that can effectively deal with non-prototypical emotions.
To avoid falling into the ad populum logical fallacy, we formulate the answer to the question "what is emotion?" not as a question of truth, but as a question of knowledge and belief, i.e., an issue of epistemology. This skirts the question of ground truth, but raises other interesting questions: what do people believe about emotions, how do they express these beliefs in language, and how do they justify their knowledge through question-asking behavior?

When considering people's beliefs about emotions from the point of view of an annotation task, some of the disagreement between annotators with respect to non-prototypical emotion data can actually be seen as an artifact of being forced to choose from a set of "basic" emotions. Moreover, annotation tasks typically ask annotators to assign natural language descriptors to data without the context of natural language interaction, i.e., without the implied communicative goals that are shared with another interlocutor. When the annotation task is set up as a forced choice between one of several labels, it is obvious that any emotional data that does not fit into the preassigned labels will be seen as non-prototypical. However, more fundamentally, even when there is an open choice of emotion labels, the task of annotation is divorced from the full context of natural language interaction, so agreement is defined as a hit-or-miss event. For this reason, many annotation methodologies use a manual or training procedure (e.g., ADOS for autism diagnosis [74]) to establish the context for the shared meaning of the annotation vocabulary. However, in the case of annotating emotions, the technical vocabulary overlaps with the natural language terms, so there is the potential for ambiguity between the meaning established by the annotation standards and the meaning of everyday language [49].

The game of EMO20Q is an effective experimental methodology to elicit language in which people express their beliefs about emotions in an interactive context. In terms of the utility of EMO20Q as an experimental design, we distinguish the human-human version of the game, which is played by two human players, and the human-computer version of the game, in which one player is a computer.

In terms of experimental design, the human-human EMO20Q is a quasi-experiment or natural experiment, as opposed to a controlled experiment, which means that there is no manipulation of variables by the experimenters; rather, these variables are observed as they vary naturally within the system. Much past work [117, 89, 90, 102, 56, 47, 103] has focused on controlled experiments for studying emotional language, usually eliciting responses from subjects who are presented with words as stimuli. However, the stimuli are predetermined and the responses are constrained, often as a Likert scale or a prescribed set of emotion categories. In EMO20Q, potentially any word can be chosen as an emotion word and any question can be asked of it; it is limited only by the game's rules, the subjects' good-faith sportsmanship, and the extent to which one player's judgements about emotions overlap with the other player's judgements. We argue that, in contrast to controlled experiments, EMO20Q can be presumed to have higher experimental validity and less experimental bias. The dialog context of the game is closer to natural language interaction than elicitation in repeated trials.
Also, the players/subjects of EMO20Q are less constrained by the elicitation methodology and, all other factors being equal, we can assume that their honesty and cooperation are comparable to elicitation experiments, or perhaps improved due to the presence of the other player and the players' shared communicative goals. There is, however, the possibility of experimental biases in EMO20Q due to interactions between players, which we did observe in longitudinal experiments where two players played repeatedly.

One drawback of the experimental methodology of EMO20Q is that it is hard to quantify reliability in the unconstrained interactions of EMO20Q. Reliability can be measured as the amount of agreement between subjects, but this can be difficult because we do not force subjects to pick any particular words (in the human-human experiments), so the words that are in common between users are determined by chance and hence sparse. Determining an appropriate sample size, as well as methods to deal with sparsity, is thus a critical issue that we explore in this paper but do not solve.

Another advantage of the EMO20Q methodology is the potential to provide more experimental sensitivity. Receiving stimuli and giving responses using the same modality, natural language, has the potential to be much more sensitive than Likert scales or restricting user input to fixed choices. This is because we can assume that natural language has the capability of expressing most, if not all, of the nuanced distinctions between emotions. Even in cases where one is literally "at a loss for words", there are natural language descriptions, like the quoted phrase, that people use to approximate such an emotion. One exception where the natural language modality could be less sensitive is in the case of children and non-native speakers. In these cases, we can imagine less fluent subjects who have conceptual distinctions in their beliefs about emotions that they are not able to verbally express without the aid of elicitation.

The methodological utility of the natural language modality in EMO20Q can be seen in the productivity, ubiquity, and social aspects of language, as well as in the relation to engineering solutions, such as natural language processing. From the perspective of natural language processing, the EMO20Q game experiment can be seen as a Wizard of Oz experiment that can be used to collect human behavior for training the behavior of an automated agent. Games like EMO20Q can be seen as games with a purpose [112] whose purpose is crowd-sourcing [44] the collective knowledge and beliefs of the players [49]. The phenomenon of crowd-sourcing is closely tied to the emergent properties of online social communities [126]. The human-computer version of EMO20Q offers more possibilities for experimental control. In Section 2.3, we explore the possibility of controlling the EMO20Q game using a computer questioner agent based on a sequential Bayesian update algorithm.

3.2 EMO20Q

3.2.1 Rules

In the game of Emotion Twenty Questions (EMO20Q) there are two players who interact using natural language. In Section 4, we look at EMO20Q played between two humans, and in Section 5 we look at the case of EMO20Q played between humans and computers. A game consists of a series of matches; in each match one of the players assumes the role of answerer and the other, the role of questioner. At the beginning of the match, the answerer picks an emotion word, unbeknownst to the questioner, which we call the emotion in question.
The emotion in question need not be the emotion of the answerer's current state and, importantly, the answerer is not forced to choose from a prescribed list of emotions. Once the answerer has picked an emotion, the questioner has twenty turns to guess the emotion in question, where a turn is a question posed by the questioner in natural language followed by an answer by the answerer. If the identity of the emotion in question is guessed in twenty or fewer turns, the questioner wins. On the other hand, if the emotion is still unknown to the questioner after twenty turns, then the answerer wins.

Although questioning and answering are the two basic plays of EMO20Q, there are other types of game actions that would best be described as dialog acts. For example, a player may ask about rules, give directions, give hints, make clarifications, give up/resign, or engage in small talk. Also, as in other games, one player may cheer or jeer the other player to express sportsmanship or competitiveness. At the end of a match, there is usually a recapitulation of especially salient turns, and at that time the questioner may debate the answers given or whether the emotion in question is actually an emotion. This end-game behavior can be useful for determining fringe cases of words that may or may not be actual emotions, and it can identify erroneous answers.

There were some rules that were proposed in hindsight after our first pilot tournament. One particular rule is that synonyms should be counted as a correct, game-ending guess if the answerer cannot come up with a distinguishing characteristic (e.g., "brave" and "courageous", or "awe" and "amazement"). As these additional rules may improve the gameplay, the EMO20Q rules committee is considering them for future seasons.

3.2.2 Motivation

The game of EMO20Q serves to address several issues in experimental methodology that arise when considering theories of emotion. First, this theory seeks to understand emotional behavior at the level of the individual player/subject. In Sect. 3.3 we formulate a general theory of emotions as an aggregation of the personal theories of each player. The questioner in EMO20Q can be seen as testing a hypothesis given his or her personal theory, and the person-independent theory can be seen as the union of the individual person-specific theories, with contradictions thrown out.

Another motivation of the EMO20Q game is that it allows for spontaneous natural language behavior while constraining the language to the targeted domain we wish to study. Much past work [117, 89, 90, 102, 56, 47] has focused on eliciting responses from subjects who are presented with words as stimuli. However, the stimuli are predetermined and the responses are constrained, often as a Likert scale or a prescribed set of emotion categories. In EMO20Q, potentially any word can be picked as an emotion word and any question can be asked of it; it is only limited by the game's rules and the subject's good-faith judgment. Thus, in contrast with purely elicited experiments, EMO20Q can be presumed to have higher experimental validity and less experimental bias. The players/subjects of EMO20Q are less constrained by the elicitation methodology, and we can assume that their honesty and cooperation are comparable to elicitation experiments, or improved due to the supervision of the other player and their shared communicative goals. There is the possibility of experimental effects due to interactions between players.
Even so, we can assume these to be smaller than the experimental effects in experiments with guided/elicited responses, which are biased by using a fixed set of stimuli and responses. One drawback of this approach is that it will be hard to quantify reliability in the unconstrained interactions of EMO20Q. Reliability can be measured as the amount of agreement between subjects, but this can be difficult because we do not force subjects to pick any particular words, so the words that are in common between users are determined by chance and hence sparse. An alternative way to measure reliability could be offline question-answering of specific question-emotion pairs, which could be repeated across subjects.

The experimental methodology of EMO20Q has other advantages. Receiving stimuli and giving responses using the same modality, natural language, has the potential to be more sensitive than Likert scales or restricting user input to fixed choices. This is because we can assume that natural language has the capability of expressing most, if not all, of the communicable distinctions between emotions. Even in cases where one is literally "at a loss for words", there are descriptions, like the quoted phrase, that people use to approximate such an emotion. One exception where the natural language modality could be less sensitive is in the case of less fluent subject/players, such as children and non-native speakers. In this case, we can imagine subjects who have conceptual distinctions in their theories of emotions that they are not able to verbally express without the aid of elicitation. The utility of the natural language modality can be seen in the productive and social aspects and the ubiquity of language, as well as in the relation to engineering solutions, such as natural language processing.

3.3 Constructing a Theory from EMO20Q

As described in the introduction, our first model from the proposal looked at building a person-specific theory from the data collected in the EMO20Q game. This model formed the basis of the graph-based model and the sequential Bayesian model. However, the theoretical developments after the quals led us to refine this definition of truth in a model by considering epistemology.

Our approach to modeling the meaning of emotion descriptions from EMO20Q uses a stripped-down definition of the term theory from mathematical logic, which states that a theory is simply a set of sentences in some language L that is true of a model M [22]. In the case of EMO20Q, the theory is the set of questions that were answered with "yes", together with the negations of the questions that were answered with "no", for a given emotion; L is the language of propositional logic; and M is a mental model of emotions, which we assume is only accessible through natural language communication. In this view, each question in EMO20Q can be represented as a proposition p ∈ P that can be judged true or false of a given emotion ε ∈ E by player i ∈ I according to that player's mental model, M_{ε,i}, which we assume is unobservable but inferable through communicative behavior. Assuming now that player i is the answerer who has just been asked question p, we can say that ⊨_{M_{ε,i}} p if the player answers "yes" or ⊨_{M_{ε,i}} ¬p if the player answers "no". The previous notation is read "p/¬p is satisfied by M_{ε,i}", or equivalently "M_{ε,i} is a model of p/¬p".
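Operationally, this satisfaction bookkeeping can be sketched as follows; the names and the dict-of-dicts data layout are hypothetical, and the formal construction is given next.

    def record_answer(theories, emotion, player, question, answer):
        """Record one EMO20Q turn: a "yes" asserts the proposition p and
        a "no" asserts its negation ¬p into the theory for the pair
        (emotion, player). Contradictory answers remove the question
        entirely, as described below."""
        props = theories.setdefault((emotion, player), {})
        value = (answer == "yes")
        if question in props and props[question] != value:
            del props[question]   # both p and ¬p asserted: remove both
        else:
            props[question] = value
        return theories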
A theory for a specific emotion ε and a specific player i, denoted Θ_{ε,i}, is constructed as a Boolean vector of length |P|, where |P| is the total number of questions. For every question p_n asked of player i, the n-th position of the vector Θ_{ε,i} will be true, or 1, if the player has answered yes to p_n when the emotion in question was ε. In this case we can say that p_n is a theorem of Θ_{ε,i}. Similarly, false, or 0, is assigned to element n of Θ_{ε,i} if p_n received no as an answer while ε was the emotion in question. In this case, ¬p_n is a theorem of Θ_{ε,i}. If a contradiction is reached, i.e., if both p_n and ¬p_n are members of the set Θ_{ε,i}, then both are removed. In practice, this has not happened, and we assume that it will not, because a player is consistent with him or herself in normal situations, so such a contradiction would be a warning flag that a player may not be playing consistently.

One can proceed in a similar manner for a person-independent theory Θ_ε of a particular emotion ε. In this case, the proposition p_n associated with a particular question is added to Θ_ε if any player has answered yes to p_n when questioned about emotion ε, and conversely for ¬p_n when any player has answered no. In the case of a person-independent theory, one can generally expect some contradictions, and these signify disagreement about whether a proposition is true or false of that particular emotion. In this case, as before in the person-specific case, both propositions should be removed from the theory to prevent contradictions.

If a theory for a specific emotion can be seen as a long list of propositions that are true of it, the theory of a set of emotions can be seen as a matrix indexed by the emotions in one dimension and the questions in the other dimension. If the theory Θ contains emotions ε_m for 1 ≤ m ≤ |E| and propositions p_n for 1 ≤ n ≤ |P|, then Θ will be an |E| × |P| matrix. Ordinarily, Boolean algebra would dictate that this matrix consist of ones and zeros. Such a representation has been explored under the aegis of formal concept analysis [30]. However, we need the matrix to be sparse to represent the fact that not all of the combinations of questions and emotions have been encountered and that there may be some contradiction among subjects. To this end, we propose that the matrix be a (1, 0, -1)-matrix, or a signed matrix/graph [67], where 1 indicates that the proposition of column n is true for the emotion of row m, -1 indicates that it is false, and 0 indicates that it has not been seen or that a contradiction has been encountered.

3.4 From Theory to Epistemology

In [49] we described the construction of person-specific theories of emotions based on data from EMO20Q, inspired by the definition of a theory from mathematical logic, where a theory is defined as the set of propositions in some language that is true of a model. In the view proposed in [49], a person's verbal intelligence was represented as a truth-table model that determined whether natural language statements about emotions were true or false with respect to a given emotion and a particular person. The person-specific truth-table that was proposed in [49] can be thought of as a representation of a person's knowledge or beliefs. However, when viewed as belief, a person-specific theory of emotions is more appropriately considered an issue of epistemology rather than formal logic. Moving from a logical theory to one based on epistemology is the theoretical contribution of this paper. This shift in focus from logical truth tables to epistemology was preceded by the development of a model for belief based on sequential Bayesian update [52].
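That sequential Bayesian update can be sketched as follows. This is a sketch under the assumption that answer likelihoods are estimated from EMO20Q game data; the details of [52] may differ, and all names here are illustrative.

    def bayes_update(belief, question, answer, likelihood, eps=1e-3):
        """One turn of a sequential Bayesian questioner: re-weight the
        belief over candidate emotion words by the probability of the
        observed answer, then renormalize. `belief` maps emotion ->
        probability; `likelihood[(question, emotion)][answer]` is the
        estimated answer distribution, with eps as unseen-event mass."""
        posterior = {}
        for emotion, p in belief.items():
            p_ans = likelihood.get((question, emotion), {}).get(answer, eps)
            posterior[emotion] = p * p_ans
        z = sum(posterior.values())
        return {e: p / z for e, p in posterior.items()}

A questioner agent built this way would pick the next question expected to reduce uncertainty the most and venture an identity guess once some emotion's posterior is high enough.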
Epistemology is the branch of philosophy that deals with knowledge. According to basic results in epistemology that date back to Plato, knowledge is defined as true, justified belief. As an example, one can believe in God, and God can actually exist, but one cannot call this true belief "knowledge" unless it is justified in some way. In this view, we shift our focus from the truth of statements in a model [49] and belief [52] to the process of justification of beliefs. The particular way in which we try to observe how people justify their knowledge about emotions is inspired by the Socratic method of asking questions, which is why we call it a Socratic epistemology.

The question-asking process can be seen as central to how people understand and describe the world. Question-asking allows people to learn about things without directly experiencing them. Hypothetically, the question-asking period in children is a mechanism that accounts for a child's rapid learning and language acquisition [27]. The logician Charles S. Peirce identified three types of thought processes by which a person can acquire knowledge: deduction, induction, and hypothesis [93]. The third of these, also known as abduction [19], has been compared with the Socratic method of questioning [40].

The precise point at which humans or their forebears began to use language to name and describe emotions most likely predates writing and certainly predates modern science. However, the early writings quoted below show that description of emotions using language goes back to antiquity. Similarly, attempts to define this language via question-asking can also be traced back to ancient times:

[Gorgias:] Just as different drugs draw forth different humors from the body – some putting a stop to disease, others to life – so too with words: some cause pain, others joy, some strike fear, some stir the audience to boldness, some benumb and bewitch the soul with evil persuasion" (Gorgias, Encomium of Helen, c. 415 B.C.).

Socrates: You, Gorgias, like myself, have had great experience of disputations, and you must have observed, I think, that they do not always terminate in mutual edification, or in the definition by either party of the subjects which they are discussing; ... Now if you are one of my sort, I should like to cross-examine you, but if not I will let you alone. And what is my sort? you will ask. I am one of those who are very willing to be refuted if I say anything which is not true, and very willing to refute any one else who says what is not true, and quite as ready to be refuted as to refute. (Plato, Gorgias, Transl. Benjamin Jowett, 380 B.C.)

In the first quote above, Gorgias, a Sophist rhetorician, describes the effects of words on a person's emotions. Gorgias uses his rhetorical skills to describe emotions by making reference to the theory of physiological humors. Humankind's conception of emotions has changed since the time of these ancients, who believed that emotions were generated from bodily "humors", which in turn were derived from alchemical elements. However, our conception of emotion is still largely expressible through language. In the second quote, Socrates (as quoted by Plato) cross-examines Gorgias to determine Gorgias' beliefs. Socrates applied his method of inquiry to examine concepts that seem to lack any concrete definition, in particular some of the complex moral and psychological concepts of his time, like "justice", "knowledge", "piety", "temperance", and "love".
We claim that this method of inquiry can shed light on how people define emotional concepts, which also seem to defy concrete definition. By examining the data from the EMO20Q game in light of Socratic epistemology, we not only aim to collect an increasing list of propositions that are true or false of a given emotion for a particular person, as in [49], but also to observe the ordering of the questions. This ordering shows a conceptual continuity that reflects a person's justification of belief and the relations of similarity and subsethood between emotion concepts for that person [57]. Moreover, while some questions are better than others and can be evaluated on a per-question basis [53], questioning can also be evaluated at the level of a policy [45] or strategy. Although Gorgias' description of emotion is based on the scientific theories of his time, it is a plausible hypothesis that differs from our current scientific hypotheses more in content than in form.

3.5 Conclusion

In this chapter, we proposed the EMO20Q game as an experimental methodology to collect natural language descriptions of emotions. These descriptions were extracted from the questions asked in the game and were used to construct player-specific theories of emotions. Although we made the distinction of scientific-versus-natural-language descriptions of emotions, our approach can be used to scientifically study the social behavior of communicative agents, whether they be humans or computers. These agents may have different knowledge, memory, and experience that they draw upon when conveying emotional information to each other.

We think that automating a computer player of EMO20Q poses many interesting challenges and opportunities. The ultimate test of whether the theories collected with our methodology are valid is whether they can enable a program to simulate a human player, as in a Turing test. From our experience, it seems that automating the questioner role would be easier than automating the answerer role. The main challenge to automating a player of EMO20Q is robust natural language understanding. The questioner would need to understand answers to yes/no questions, but the answerer would need to understand the larger set of questions that can be asked of emotions. Initial work on designing a questioner agent is described in [53]. For automating the answerer role, the computer would need a way to understand a wide variety of questions. We have undertaken previous research about how to answer questions about dimensional representations of emotions and similarity and subsethood judgments [56, 47, 57], which, according to our pilot data, make up nearly 25% of the total questions. Questions about the unknown emotion's identity accounted for a significant portion (42%) of the questions we observed. These emotion identity questions could be understood by template matching questions of the form "is it X?" or simply "X?". However, the remaining question types in Tab. 4.4 will be relatively more difficult because they require real-world knowledge.
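A minimal sketch of such template matching follows; the pattern set is illustrative rather than the deployed agent's grammar, which would need broader coverage and normalization.

    import re

    # Patterns for emotion-identity questions like "is it happiness?",
    # "could it be anger?", or the bare guess "sadness?".
    IDENTITY_PATTERNS = [
        re.compile(r"^(?:is|could|might)\s+it\s+(?:be\s+)?(\w+)\s*\?*$", re.I),
        re.compile(r"^(\w+)\s*\?$", re.I),
    ]

    def parse_identity_guess(question):
        """Return the guessed emotion word, or None if the question is
        not an identity question."""
        q = question.strip()
        for pattern in IDENTITY_PATTERNS:
            m = pattern.match(q)
            if m:
                return m.group(1).lower()
        return None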
Another area of further research opportunities is to study the implications for the processing done by the brain when playing EMO20Q. Recent research has found connections between episodic memory and emotions (fear, in particular), realized as interactions between the hippocampus and the amygdala [94]. In contrast, this work focuses on a mixture of semantic and episodic memory and a wider variety of emotions, so it is an open question whether similar neural processing is at work.

In conclusion, this experiment examined how one can collect a theory of emotions from relatively unconstrained natural language interaction that is focused on the domain of emotions using the EMO20Q game. Our pilot data led to descriptive results that can inform practitioners of affective computing about how people understand emotions and communicate about them with natural language.

Chapter 4

Natural Language Description of Emotions in Humans

In learning words, we have to learn to send and to receive. We have imagined a child learning to send 'Mama', and also to parrot the word on hearing it, but we have not considered intelligent hearing. What would count as intelligent response to the heard word 'Mama', and be observable enough for onlookers to appreciate and reinforce? Prompted assent is no game for such small children. Perhaps something rather like this: the child hears 'Mama' (say from the father) while sensing the mother in the periphery of his visual field, and then turns visibly on the mother. [...] It is the same old pattern of reinforcement, only this time the child's initial act is that of turning rather than babbling. Turning toward the mother while happening to hear 'Mama', the child is applauded and so is confirmed in the routine. But learning thus to face a named object does not have to await the whims of operant behavior; for the child can be directed.
W.V.O. Quine, Word and Object [95]

The introductory quote provides a Skinnerian view of learning language, a behavioral view that focuses on stimulus-response and reward-based learning. It also alludes to the "game of prompted assent". In this chapter, we consider how playing a game of prompted assent can provide an experimental paradigm for a computer to observe the language that humans use to talk about emotions. Being able to understand language about emotions, i.e., natural language description of emotion, as opposed to the general class of language that expresses emotions, reflects the important human ability to talk about emotions referentially. Thus we ask the question "what words and phrases do people use to describe emotion in natural language?" rather than studying how they express emotions in language. To see the difference between expression and description, imagine an algorithm that examines lexical choice to classify a user's emotion. If the word "happy" is observed, it would most likely be correlated with the user expressing happiness, except when the user applies the word "happy" to other people. In that case, we say that the usage of "happy" is descriptive, rather than expressive. To be sure, there are cases where this distinction is blurred, especially when the emotion words are adjectives, but in the form of abstract nouns, such as "happiness" and "frustration", the descriptive usage is clearer.

We take the position that the meaning of natural language descriptions of emotion must be defined in a way that can account for a system of heterogeneous agents that communicate symbolically in a dialog setting similar to the game of prompted assent mentioned in the quote. We call this game Emotion Twenty Questions, or EMO20Q for short. In this game, one player picks an emotion word and the other player tries to guess it by asking twenty or fewer questions. There is no preset list of emotion words because we aim for a descriptive, rather than prescriptive, model.
In previous research we have formulated this game, experimented on humans playing it via text chat, created software agents that can play it, and observed basic performance characteristics of both humans and software agents. In this chapter, we describe this work and show how we can use the data that we have collected to generate natural language descriptions of emotions. This chapter is organized as follows. In the rest of this introduction, we discuss related work and the general topics that contextualize our work. The following two sections present the human-human results of our experiments using EMO20Q.

4.1 Natural Language Description of Emotion From EMO20Q

Focusing on natural language descriptions of emotion highlights the human ability to describe emotions as unique individual entities using language. It is useful to differentiate natural language description of emotion from other types of verbal expression of emotion [29] because doing so highlights the ability to externalize and communicate the idea of emotions as individual entities. It is an open question whether the emotions that we call "angry", "happy", "sad", etc. are unique, biologically speaking [5]. For example, even a basic emotion like anger may not be reducible to a single, unique biological response [106, ch. 10]. However, when talking about emotions informally, or even when annotating emotional data in controlled settings, we talk about individual emotions as unique entities. The uniqueness of reference established by certain natural language expressions is known as definite description [100], but in the case of descriptions of emotions, the reference or denotation of these expressions is not clear. The initial difficulty in defining expressions like "happiness", "vindication", or "a negative emotion" in some respects mirrors the difficulty in defining a word like "game" [119]. The difficulty is that what a person means by the word "game" appears to be more than an exhaustive list of the games that he or she has seen in the past (that is, the extensional meaning). Similarly, we assume that emotion words used as labels of data mean more than the data they are applied to. We use feature representations and machine learning techniques to generalize from the observed training data to make predictions about unseen data. Hypothetically, at some point scientists may discover an underlying neurobiological basis for the emotions that we talk about informally and, in this case, we may say that this neurobiological basis is what the natural language terms for emotions refer to. However, we claim that when people use emotion words informally, these words refer to theoretical objects [11, 26], or more colloquially "concepts", that are defined in relation to other words and expressions, and that this meaning is arbitrated by conversational interaction that can be modeled as a game. We use the EMO20Q game experiment to define this conceptual level of theoretical objects by making reference to observed behavior. This framework for meaning is implicit in the distributional hypothesis, which forms the basic assumptions of statistical natural language processing, but the game metaphor focuses on the interactive use of language, rather than a document-based approach.
Though the game-theoretic formulation of the statistical usage patterns of language goes back to Shannon's language game, more recently the ability to gather interactive usage data on the World Wide Web has made statistical learning from games a possibility. Such games [112, 113], called games with a purpose, try to harness the interactive patterns of human behavior in order to do tasks that are difficult for computers. At a basic level, these games-with-a-purpose can be seen as simply a way to collect training data for machine learning. However, at a deeper level, one can surmise that making a computer more intelligent should also include making the computer a skilled, interactive game player, rather than simply a way to collect data.

4.2 Human-Human Data from EMO20Q

We collected a total of 110 matches from 25 players in the human-human EMO20Q experiments. On average this is about 8 matches per player. However, we had two main types of subject: volunteers and paid participants. Most volunteers played two matches, one match per role, so two matches per subject was the median. The paid participants were part of a longitudinal experiment that aimed to see the effect of playing the game with each other over a longer period of time, which resulted in a maximum of 57 matches in the two longitudinal experiments. The EMO20Q experiment is implemented as an online chat application using the Extensible Messaging and Presence Protocol (XMPP) so that the games can be easily recorded and studied.

Early in our pilot studies, we realized that it was difficult to successfully terminate the game when the questioner guessed words that were synonyms of the word the answerer picked. This led us to treat the phenomenon of synonyms with an additional rule that allowed the game to terminate if the answerer could not verbally explain any difference between the two words. In this case, we considered the game to terminate successfully, but we flagged these matches and kept track of both words.

Since the surface forms of the questions vary widely, we used manual preprocessing to standardize the questions to a logical form that is invariant to wording. This logical form converted the surface forms to a pseudo-code language with a controlled vocabulary by converting the emotion names to nouns where possible and by standardizing attributes of emotions and the relations of emotions to situations and events. Examples of the standardized questions are shown in Tab. 4.3. After this semantic standardization, there were a total of 727 question types.

To get a better idea of the relative frequencies of general types of questions, we made a high-level characterization of the questions using the following broad categories: identity questions (guessing an emotion), attribute questions (asking about a dimensional attribute like valence or activation), similarity/subsethood questions (asking if the emotion in question is similar to or a type of another emotion), situational questions (asking about specific situations associated with the emotion in question), behavior questions (asking about the behavior associated with the emotion in question), causal questions (asking about the cause, effect, or dependency of the emotion in question), social questions (asking about other parties involved in the emotion; this overlaps somewhat with causal and situational questions), and miscellaneous questions (questions that defied classification or had categories with too few examples).
Some examples of these categories are given in Tab. 4.4.

4.2.1 Successful Game Outcome Rate

Of the 110 matches played between human players, 94 (approximately 85%) terminated successfully, with the questioner correctly identifying the emotion that the answerer picked or a word that the answerer felt was a synonym. The mean and median number of questions asked per game were 12.0 and 10, respectively, when failures to correctly guess the emotion were averaged in as 20 questions. Of the 94 successfully terminated matches, 22 terminated with synonyms. The 16 unsuccessfully terminated matches that were considered failures consisted of several distinct cases. The questioner could give up early if they had no clue (5/16), they could give up at twenty questions (1/16), or they could pass twenty questions due to losing count or as a matter of pride (6/16). The four remaining cases were considered failures because the answerer inadvertently gave away the answer due to a typing error or by giving an unnecessarily generous hint. These four cases were all in the longitudinal experiment.

4.2.2 Emotions

There were 71 unique words that players chose in the human-human games, 61 of which were correctly identified. These are listed in Table 4.1.

Table 4.1: Emotion words from human-human EMO20Q matches.

emotion (synonyms) | count | #correct
admiration | 1 | 1
adoration | 1 | 0
affection (love) | 2 | 2
amusement | 1 | 1
anger | 2 | 1
annoyance (irritated) | 2 | 2
anxiety | 3 | 3
apathy (uninterested) | 1 | 1
awe | 1 | 0
boredom | 2 | 2
bravery | 1 | 1
calm | 2 | 2
cheerfulness | 1 | 1
confidence | 1 | 1
confusion | 2 | 1
contempt | 1 | 1
contentment (calm) | 2 | 1
depression (misery) | 2 | 2
devastation | 1 | 0
disappointment | 1 | 1
disgust | 2 | 2
dread (hopelessness) | 1 | 1
eagerness (determination) | 1 | 1
embarrassment | 2 | 2
enthusiasm (eagerness) | 3 | 1
envy (jealousy) | 3 | 3
exasperation | 1 | 1
excitement | 1 | 1
exhilaration (thrill) | 1 | 1
exhaustion | 1 | 1
fear (distress, scared) | 2 | 2
frustration | 2 | 2
fury | 1 | 1
glee | 1 | 0
gratefulness | 1 | 1
grumpiness | 1 | 1
guilt | 4 | 4
happiness | 1 | 1
helplessness | 1 | 1
hope (feeling lucky) | 3 | 3
insecurity (shyness) | 1 | 1
jealousy (envy) | 3 | 3
joy | 1 | 0
loneliness | 1 | 1
love | 2 | 2
madness (anger) | 1 | 1
melancholy | 1 | 1
pity (sympathy) | 1 | 1
pride | 2 | 2
proud | 1 | 1
regret | 2 | 2
relief | 5 | 5
sadness | 2 | 2
satisfaction | 1 | 0
serenity | 1 | 1
shame | 1 | 1
shock | 1 | 1
shyness | 1 | 1
silly | 1 | 1
soberness | 1 | 0
sorrow (sadness) | 1 | 1
stress | 1 | 1
suffering | 1 | 0
surprise | 3 | 3
tense (uncomfortable) | 1 | 0
terror | 1 | 1
thankful | 1 | 0
thrill (entrancement) | 2 | 1
tiredness | 2 | 2
wariness | 1 | 0
worry (anxiety, scared) | 3 | 3
total | 110 | 94

4.2.3 Questions

There was a total of 1228 question-asking events. Of the questions, 1102 are unique (1054 after normalizing the questions for punctuation and case). In Table 4.2 we list some of the questions that occurred more than once.

Table 4.2: Examples of some of the questions that occurred multiple times (disregarding case and punctuation).

question | count
is it positive? | 16
ok is it a positive emotion? | 15
is it a positive emotion? | 14
is it intense? | 13
ok is it positive? | 10
is it a strong emotion? | 7
is it like sadness? | 6
is it sadness? | 5
is it pride? | 5
is it neutral? | 5
is it like anger? | 5
is it surprise? | 4
is it an emotion that makes you feel good? | 4
thrilled? | 3
regret? | 3
pleased? | 3
is it very intense? | 3
is it love? | 3
is it kinda like anger? | 3
is it associated with sadness? | 3
... | ...
ok is it a negative emotion? | 2
ok is it a good emotion? | 2
okay is it a strong emotion? | 2
is it highly activated? | 2
is it directed towards another person? | 2
is it directed at another person? | 2
is it associated with satisfaction? | 2
is it associated with optimism? | 2
is it associated with disappointment? | 2
is it an emotion that lasts a long time | 2
does it vary in intensity? | 2

In preliminary experiments, we collected a total of 26 matches from 12 players. Since each match has two players, this averaged about 4 matches per player. The number of matches played by a player ranged from 2 to 12. The mean and median number of questions were 12.04 and 15.5, respectively, when failures to correctly guess the emotion were averaged in as 20 questions. In the pilot data, a total of 23 unique emotions were chosen, i.e., only three emotions were observed more than once, and these involved related word forms (e.g., "confused" and "confusion"; "frustrated" and "frustration"). The emotions that were played are: admire, adore+, anger, awe+, boredom, bravery*, calm, confidence*, confusion+, contempt, disgust, enthusiasm+, frustration, gratefulness,
jealousy, love, proud, relief, serenity, shame, silly, surprised, and thankful. The starred emotion words were disputed by players after the games, and those marked with a plus were emotions that were not successfully guessed. In addition, there were 66 additional emotion words that were referred to in questions that attempted to identify the emotion in question.

There was a total of 313 question-asking events that received unambiguous yes/no answers. After normalizing the questions for punctuation and case, there was a total of 297 unique question types, with 13 question types (29 tokens) seen more than once and no question seen more than 3 times. As with the full data set, and because at that stage we had not developed natural language processing techniques to extract the underlying semantics of the questions, we used manual preprocessing to standardize the widely varying surface forms of the pilot questions to the logical form described above. Examples of the standardized questions are shown in Tab. 4.3. After the standardization, there were a total of 222 question types in the pilot data.

Table 4.3: Examples of question standardization.

standardized question | example surface forms
cause(emptySet,e) | can you feel the emotion without any external events that cause it? / is it an emotion that just pops up spontaneously (vs being triggered by something)?
cause(otherPerson,e) | is it caused by the person that it's directed at? / Do you need someone to pull this emotion out of you or evoke it? if so, who is it?
e.valence==negative | is it considered a negative thing to feel? / 2) so is it a negative emotion?
situation(e,birthday) | would you feel this if it was your birthday? / is it a socially acceptable emotion, say, at a birthday party?
e==frustration | oh, is it frustrated? / frustration?

We also manually mapped the surface forms of the answers to yes/no/other answers. In the future, we will explore how to automate these steps and how to represent fuzziness in answers that are not clearly "yes" or "no". After the manual normalization, we found that there were 37 question types that had been asked at least twice, eight question types that were asked three times, and four that were asked at least four times (a total of 90 question tokens were repeated at least twice). Examining the normalized questions revealed interesting patterns of question reuse through the social interactions of the players, though more longitudinal data will be needed to rigorously characterize these interactions. Approximately half of the pilot questions were emotion identity questions. As with the full data set, we classified the pilot questions using the broad categories introduced above: identity, attribute, similarity/subsethood, situational, behavior, causal, social, and miscellaneous questions. Some examples of these categories are given in Tab. 4.4.

Table 4.4: Examples of question categories.

category | examples
identity (42%) | is it angry? / guilt?
attribute (13%) | is it something one feels for long periods of time? / is it a strong emotion?
similarity/subsethood (10%) | is the emotion a type of or related to content or zen contentment (is that a word?_) / so it's similar to excited?
situational (14%) | is the emotion more likely to occur when you are tired? / would i feel this if my dog died?
behavior (3%) | you can express it in an obvious way by sighing? / do adults usually try to discourage children from feeling this?
causal (7%) | yes. mynext question is can it harm anyone besides the feeler? / I think I know, but I'll ask one more question...does it ever cause children to wake up and cry?
social (8%) | are you less likely to experience the emotion when around good firiends? / 13) would you feel that towards someone who is superior to you?
miscellaneous (3%) | i dont' know if this is a valid question, but does it start with the letter D? / or an aspirational emotion? / does the word function or can be conjugated as anything eles? i.e. can it be a verb too?
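Returning to the standardization step, the mapping from surface questions to logical forms can be approximated with a handful of hand-written patterns. The following Python sketch is a hypothetical simplification of the manual process described above; the pattern list, function name, and coverage are our own illustration, not the annotation tooling actually used in the study.

import re

# Illustrative (hypothetical) patterns mapping surface questions to the
# pseudo-code logical forms of Tab. 4.3.
PATTERNS = [
    (re.compile(r"is it (?:a |an )?positive(?: emotion)?\?"), "e.valence==positive"),
    (re.compile(r"is it (?:a |an )?negative(?: emotion)?\?"), "e.valence==negative"),
    (re.compile(r"is it (?:a )?strong emotion\?|is it (?:very )?intense\?"), "e.activation==high"),
    (re.compile(r"is it (\w+)\?|^(\w+)\?$"), "e=={0}"),  # identity guesses like "frustration?"
]

def standardize(question):
    """Map a surface-form question to a standardized logical form.

    Returns the raw question when no pattern applies; in the actual study,
    such cases were standardized manually.
    """
    q = question.strip().lower()
    for pattern, logical_form in PATTERNS:
        match = pattern.search(q)
        if match:
            # Fill in the emotion word for identity guesses.
            groups = [g for g in match.groups() if g]
            return logical_form.format(*groups) if groups else logical_form
    return question

print(standardize("Is it a positive emotion?"))  # -> e.valence==positive
print(standardize("frustration?"))               # -> e==frustration

In practice the pattern order matters: attribute patterns are tried before the catch-all identity pattern so that "is it positive?" is not misread as a guess of the emotion word "positive".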
4.3 Discussion

One of the claims we made in Sect. 3.2.2 was that the methodology of EMO20Q allowed for fewer experimental effects than other types of elicitation. Another benefit is that one class of experimental effect is easily measurable: the influence of one subject's guessing strategies on another subject can be quantified in terms of question reuse. Examining the question reuse revealed interesting patterns that, in a different context, could be studied in their own right as a social phenomenon over time, as opposed to being considered an extraneous effect. More longitudinal data will be needed to rigorously characterize these interactions.

There were some experimental effects with hinting behavior. In after-action reviews with the players, there was anecdotal evidence of retribution-type experimental effects, where a player who gives only yes/no answers without any hints will be given fewer hints when it is their turn to be questioner. Furthermore, there are also mercy-type experimental effects, whereby more hints are given by the answerer as the questioner approaches the 20-question limit. Since our main focus was the yes/no questions, we did not attempt to quantify the hinting behavior, as it did not affect the collection of yes/no questions and answers for emotion words.

By analyzing the distribution of question types, we can begin to see some general trends in how people describe emotions. The categories we used are a descriptive way of generalizing the data, not precise classes. Certain questions could have been classified into multiple categories. In particular, the social, behavioral, causal, and situational questions all referred to various types of real-world knowledge, so these categories tended to run together. For example, "are you more likely to experience the emotion when around good friends?" was labeled social, whereas "Would you feel this emotion at a birthday party?" was labeled situational, even though these two examples are very closely related. Therefore, these categories should be interpreted in a general, descriptive sense apart from the theory we aim to crowd-source.

The EMO20Q game requires that an emotionally intelligent agent understand emotions without observing them, thanks to natural language and empathy. Empathy, which we consider to be the ability to understand emotions that are not one's own, is an important aspect of this game. The game explores a computational type of empathy that can be thought of as empathy removed of its humane connotations. Just as intelligence tests often seek to isolate verbal intelligence from mathematical or creative intelligence, emotional intelligence may also have a verbal aspect, and EMO20Q could provide a way to test for such a component of emotional intelligence.

There were a few behaviors and game states that we did not anticipate when formulating the rules, and these will be incorporated in future data collections. One issue was synonymy. Several times, a questioner would get a very close word and oscillate between related synonyms, for example "mad" and "angry". To deal with this we added an additional rule that allows players to accept a synonym instead of the exact word if the synonym cannot be distinguished by any type of explanation. However, even though two words may seem closely related, there may still be some differences.
For example, one player was unsure about whether "proud" and "pride" would be considered equal (native speakers of English generally feel that "pride" has a negative connotation, whereas "proud" has a positive connotation).

4.4 Emotion Description in the Brain

One of the early theories of emotions is due to William James, who also discussed the phenomenon known as tip-of-the-tongue (TOT) or presque vu. Although James' work is largely speculative and dated, recent work has focused on experimental and clinical data. The work of Broca and Wernicke demonstrated loci of specific language abilities, and these findings have been elaborated in work on lexical access [10]. The game of EMO20Q can be seen as a TOT-type phenomenon where the word is in the mind of the other player, rather than "on the tip of the tongue". Indeed, the anecdotal affective response of players of EMO20Q when they are stuck is similar to that of not being able to think of a word and can be described as "mild mental anguish". If it is frustrating to not be able to come up with a word, imagine suffering aphasia from a stroke that affects Broca's area of the brain. According to an account of John Horan, a stroke victim, "It was four weeks after his stroke before he spoke his first word - which, understandably, was f***" [116].

There are different manifestations of aphasia, likely due to the areas of the brain that were affected, that can involve specific problems such as not being able to remember words of a certain part of speech, or words in a certain semantic category. It is these results that allow researchers to make claims about lexical access in different parts of the brain. There are results that show that emotion words may be spared in certain cases of aphasia [69]. It was observed in clinical data that when aphasia had affected the dominant cerebral hemisphere that deals with language (usually the left hemisphere for right-handed people) but had spared the minor (right) side of the brain, patients had better reading performance with emotional words, which suggests that emotional and pragmatic language are localized to the minor hemisphere. The word-reading tests were repeated on healthy subjects by showing words to selective halves of the visual field, and it was confirmed that reading with the left visual field (which is controlled by the right/minor hemisphere) showed a significant effect on reading emotional words in males. However, these words were not necessarily emotion names, but rather more often emotionally charged words, like "fear", "kill", "pain", "dead", "rape", "slap", "stab", etc. In this list, only "fear" and perhaps "pain" would be considered emotion descriptions. We cannot be sure that the same results would apply when dealing with natural language descriptions of emotions, which have a more conceptual, abstract character. In fact, other results attribute this lateralization effect to "imageability", finding that concrete words were recognized better than abstract words in a patient (note that this result only involved a single patient) with left-brain damage [98].

4.4.1 Spatial Metaphors

One of the claims made in the proposal of this dissertation was that emotion descriptions are often understood in terms of dimensions and make use of spatial metaphors, such as closeness and containment for similarity and subsethood of concepts. This led us to consider the brain's spatialization and wayfinding capabilities as a component in understanding abstract descriptions of emotions.
There is evidence that concrete words are easier to remember, regardless of brain pathology [114]. One function of spatial metaphors for abstract concepts could be to make these concepts more concrete using existing brain hardware. One mnemonic technique used for improving memory is the method of loci, which dates back to antiquity and has been applied to memorizing abstract and unrelated items such as the digits of π and the order of a deck of playing cards. This method emphasizes putting the arbitrary and perhaps abstract items to be memorized into an imaginary background:

And these backgrounds ought to be of moderate size and medium extent, for when excessively large they render the images vague, and when too small often seem incapable of receiving an arrangement of images. Then the backgrounds ought to be neither too bright nor too dim, so that the shadows may not obscure the images nor the lustre make them glitter. I believe that the interval between the backgrounds should be of moderate extent, approximately thirty feet; for, like the external eye, so the inner eye of thought is less powerful when you have moved the object of sight too near or far away. [3]

Word imageability has also been connected with the hippocampus [61], which is known as a brain structure that is involved with spatialization and navigation. After the proposal, we more deeply studied spatial metaphors as a cognitive model for how natural language descriptions of emotion are interpreted, and we considered these metaphors in light of our experimental results. In this dissertation, we have progressed from this view to a more nuanced and realistic one. We still use spatial metaphors in the scale-based fuzzy models. In the results of the fuzzy logic experiments, we showed that these scale-based models work well with smaller emotion vocabularies, such as basic emotions that may be learned through direct experience. Also, spatial metaphors are still useful computational frameworks, allowing methods like principal components analysis (PCA), multidimensional scaling (MDS), factor analysis, cosine distance, and graph embeddings. However, our research into elucidating how interpretation of natural language descriptions is actually implemented in humans, together with our experimental results in simulating this interpretation in computers, led us to consider a more behavior-based model using rewards and reinforcement. We also look at subcomponents of human cognition, such as language input and output, long and short term memory, belief, learning, and inference, as a way to ground the Socratic epistemology in human abilities.

4.4.2 From Spatial Metaphor to Rewards and Reinforcement

In the proposal, we identified some specific brain regions to study to see if the descriptive behavior or mental representations could be localized functionally. After taking a course with Prof. Arbib, we changed the focus from specific details in the brain to a more general exploration of linguistic behavior, in particular, the question-asking period of child language learning and the characteristics of human language that set it apart from animal communication. This was also motivated by an admonition of Quine [95]: "to project non-sensory attributes purely on the analogy of sense qualities, hence as recurrent characters somehow of a subjective show within the mind, betrays surely a cavalier attitude towards psychological processes and a lack of curiosity about the mechanisms of behavior".
In particular, we had been examining the use of spatial reasoning in the hippocampus with respect to abstract emotional scales like valence, activation, and dominance. After more examination, we found that, while this may be a fruitful exploration, the reward and punishment responses in other limbic areas fit better with the emotions and the game-playing model of EMO20Q, which could also explain how these descriptions are learned.

4.5 Emotion as Behavior Motivation and Its Role in Language Learning

One of the main claims of this chapter is that a game-based approach can help connect two different aspects of human emotions: (1) emotional motivation of behavior and (2) natural language description of emotion. The first has seen much focus in past research, from biological and neurological models of emotion to computational models of intelligent agents. The second is an area of research that has been treated more indirectly in annotation studies and some ethnographic and theoretical research. We claim that looking at language from a behavioral learning point of view can bring these two aspects together using a game-based experimental methodology.

The first aspect, emotional behavior motivation, is derived from the observation that emotions play a large role in learned behavior, both consciously and unconsciously. At a very basic, biological level, positive and negative emotions can be seen as a feedback mechanism whereby certain behaviors are reinforced or punished [99, 23]. This view contrasts operant conditioning [106], where behaviors are rewarded or punished after the behavior, with classical conditioning [92], where behaviors are conditioned with antecedent or concurrent stimuli. In operant conditioning, conditioned responses tend to be more voluntary and can produce learned behaviors more robustly. It has been posited that emotional motivation of behavior causes learning not of specific behavior patterns but rather of more abstract goals [99]. Thus, in lower animals feeding behavior might be specified directly in genes, whereas in higher animals feeding is an abstract goal which is genetically encoded and physiologically implemented through pleasure from feeding or pain from hunger. This indirect encoding is a more robust and adaptable way to achieve the goal of nourishment.

There is a growing body of research from cognitive science and developmental psychology that shows that emotional behavior motivation plays a role in language learning behavior. One such area of research is the question-asking period of child language acquisition. The two major hypotheses about this period of child development are that children ask questions for attention and that they are constructing explanations about the world. At first it would seem that the attention-seeking hypothesis would be a more emotional motivator of behavior and that the construction of explanations would be more cognitive. However, results show that it is not attention that satisfies children, in terms of number of questions asked, but rather receiving meaningful explanations [27]. The satisfaction of receiving an explanation can be seen in the same evolutionary sense of emotion as behavior motivation. A succinct and provocative way of stating this hypothesis is the analogy that "explanation is to cognition as orgasm is to reproduction" [32]. However, both cases, attention-seeking and explanation-seeking, can be seen as positive reinforcement that can be modeled by question-asking games.
The question of whether emotional behavior motivation can account for intelligent behavior more broadly, e.g., learning and problem solving, has been explored widely in intelligent agent architectures [75] and in intelligent tutoring systems [36]. In our work using EMO20Q, we take two types of consequent feedback: receiving a "yes" answer to a question, and successfully completing the game by correctly guessing the emotion chosen by the other player. The idea is that games like EMO20Q can capture both the emotional utility that drives game-playing behavior and the cognitive utility of understanding natural language descriptions that explain emotions.

4.6 Four Stages of Language Learning from Quine

Below, we quote from [95] an excerpt that describes four major phases of learning language: first, to apply words given stimuli; second, to refer to objects distinctly; third, to refer to new and possibly imaginary objects by combining attributes of different objects; and fourth, to refer to relations between objects:

Let us look back over the phases that we have already distinguished in the referential function of language, so as to see the significance of this new one in the proper setting. In the first phase, terms like 'mama' and 'water' were learned which may be viewed retrospectively as names of an observed spatiotemporal object. Each such term was learned by a process of reinforcement and extinction, whereby the spatiotemporal range of application of the term was gradually perfected. The object named is assuredly an observed one, in the sense that the reinforced stimuli proceeded directly from it. Granted, this talk of name and object belongs to a later phase of language learning, as does the talk of stimulation.

The second phase is marked by the advent of general terms and demonstrative singular terms; also, as degenerate cases of the latter, singular descriptions. The general terms are learned still by ostension, but they differ from their precursors in their divided reference. The division of reference is such as to give prominence to temporal continuities of substance and of objective position. Already it may be possible to learn a general term such as 'unicorn' by ostension of pictures and be quite prepared to find that it refers to nothing; for one soon appreciates that terms are intended less commonly for pictures than for things they depict. And in any event, failures of reference of a sort become possible on the demonstrative singular terms and singular descriptions, as when 'this apple' or 'the apple' is used in connection with something that proves to lack its back half or to be a tomato. But despite the scope for failure of reference, the objects amenable to reference are still substantially the same old ones. They are the objects from which the reinforced stimuli proceeded in our ostensive learning of the general terms concerned, or objects enough like them to invite application of the same terms.

The third phase brings compound general terms, through the attributive joining of general terms. Here we are more clearly assured than before of cases of failure of reference on the part of general terms; we get compounds like 'square apple' and 'flying horse' that are not true of anything, because of non-duplication among the objects of which the component terms are true. The attributive joining of terms can also issue directly in singular terms that name nothing; viz., such compound mass terms as 'dry water'.
Moreover, from attributively compounded general terms we can get demonstrative singular terms and singular descriptions whose failure of reference is categorically assured: 'this square apple', 'the flying horse'.

This third phase, for all the possibilities of failure of reference that it opens up, still provides no reference to new sorts of objects on the part of general terms. When there are things at all for the newly compounded general terms to be true of, they are just some among the old things to which the component terms apply. The third phase brings mass production of general terms, far outrunning the objects of reference; but those objects are the same old ones.

One may still ask whether the third phase does not open the way to new objects for singular terms. For the attributive compounding of mass terms yields a singular term that refers to just the common part of the two masses or scattered totalities named by the components. May it not happen that that common part is something that no previous singular term refers to and none of the general terms at hand is true of either? Not so. Each of the components, being a mass term, has its predicative use as a general term, and, so used, will already have been true of that common part among other things.

It is reserved rather to the fourth phase to give access to new objects. This is the phase ushered in just now by applying relative terms to singular or general terms to form general terms. This phase brings a new mode of understanding, by generating such compounds as 'smaller than that speck'. Such a compound is unlike 'square apple' in that it does not even purport to denote things to which we could point and give individual names if they came our way. The relative term 'smaller than' has enabled us to transcend the old domain, without a sense of having fallen into gibberish. The mechanism is of course analogy, and more specifically extrapolation.

The relative terms of the fourth stage in the quote refer to relations between objects. This type of linguistic construction is particularly useful when referring to emotions because many emotions are types of de-relativized relations. For example, the relation "x loves y" can be de-relativized to create the abstract emotion "love". This process is often termed "reification", from the Latin "res", meaning thing or object. What Quine refers to in the quote as "giving access to new objects" is the idea of theoretical entities that are extrapolated from concrete relations between objects by reifying the relation into something discussed as if it had a definite, unique existence. It is this class of objects that has been studied as theoretical (or virtual) objects (or entities).

We argue that these types of descriptions of emotion that are based on reified relations are not well suited to a scale-based representation of emotions; rather, a linguistic, propositional model is better. In the next chapter we look at computational models that can simulate how humans understand this type of linguistic meaning, but first we describe the subcomponents of human cognition that make the interpretation of language possible.

4.7 Cognitive Subcomponents

In Section 1.5, we identified several subcomponents of human cognition that were at play in interpreting natural language descriptions of emotion and that should be simulated in computational approaches. These were: dialog-structured language input and output, belief, long term memory, short term memory (a.k.a. episodic buffer or working memory), inference, learning, and perceptual subjectivity.
Chapter 5

Natural Language Descriptions of Emotion in Computers

Neurath has likened science to a boat which, if we are to rebuild it, we must rebuild plank by plank while staying afloat in it. The philosopher and scientist are in the same boat. If we improve our understanding of ordinary talk of physical things, it will not be by reducing that talk to a more familiar idiom; there is none. It will be by clarifying the connections, causal or otherwise, between ordinary talk of physical things and various further matters which in turn we grasp with the help of ordinary talk of physical things.

-Willard Van Orman Quine, Word and Object

In the quote above, the metaphor of rebuilding a boat while it floats represents the process of building theories by making slight adjustments and dealing with a provisional understanding that will certainly be revised in the future. In this respect, the engineer is in the same boat as the philosophers and scientists noted in the quote. In this chapter we look at building an agent that can understand natural language descriptions of emotion. To do this, we aimed to design an agent around the cognitive components identified in Section 4.7: one that can update its belief, learn from mistakes, identify and act upon uncertain knowledge, and generate descriptions from its knowledge.

5.1 Simulating Cognitive Components with a Generalized Pushdown Transducer

In Section 1.3 and Section 4.7 we described the cognitive components that hypothetically enable human natural language description. In Section 5.2.1 we design a statistical framework to model long term (semantic) memory, short term memory (episodic buffer), sequential observations, and update of a belief state using a sequential Bayesian model. In this section, we give a computational data structure that implements this model. This implementation is based on a generalized pushdown transducer (GPDT), a type of computational automaton.

Computational automata are used in theories about acceptance of formal languages. Recall the Chomsky hierarchy [12], which specifies nested sets of languages of varying complexity. Briefly, the four levels of language complexity, from simple to complex, are regular languages, context free languages, context sensitive languages, and recursively enumerable languages. Determining the membership of a particular string in any of these languages requires an automaton of a corresponding complexity. Ignoring some specific details (of determinism), finite state automata can determine the strings of regular languages, pushdown automata can determine the strings of context free languages, linear bounded automata determine the strings of context sensitive languages, and Turing machines determine the strings of recursively enumerable languages. When we say that these automata "determine" the corresponding language, we mean that they can both recognize and generate these languages. The topic of generation is taken up in Section 5.4. These results of formal language and automata theory are summarized in Table 5.1.

An automaton recognizes a language by accepting input symbols and returning a decision of whether the string of symbols is in the language specified by the automaton. Transducers are automata that also output symbols. Pushdown automata and transducers have been used in programming
language parsers (e.g., [91]) and also in parsing natural language (e.g., [1]).

Table 5.1: Chomsky hierarchy.

language type | automaton
regular | finite state automaton
context free | pushdown automaton
context sensitive | linear bounded automaton
recursively enumerable | Turing machine

Our approach has been slightly different in that the unit of analysis is not a programming language term or natural language word, but rather whole question and answer utterances. Our formulation of a GPDT can be specified by a 7-tuple (Σ, Λ, Q, Γ, δ, q0, F): Σ and Λ are the input and output alphabets, respectively, Q is a set of states, Γ is a set of stack symbols, δ is a transition relation in Q × Σ × Γ* × Q × Γ*, q0 is the start state, and F is a set of terminal final states. Our model uses a generalized pushdown transducer, which means that it can look at the whole contents of the stack instead of just the topmost symbol (hence the Kleene closure "*" on the first Γ* in the transition relation, which allows transitions to consider a string of symbols from the stack alphabet). This is important because the stack becomes the feature vector f1, ..., ft in the sequential Bayesian statistical model. An example GPDT is shown in Figure 5.1; Q together with δ define the directed graph in the figure.

In the GPDT that we use to simulate the behavior of a human player of EMO20Q, the components of the GPDT map transparently onto the cognitive components that we identified earlier: Σ and Λ are the answer and question utterance types, respectively, Q is the set of states in the dialog graph, Γ is a set of features composed of question-answer pairs, and δ is a transition relation between dialog states. In this implementation, the start and end states correspond to the user starting and quitting. The loop back from the "between matches" state to the "asking" state corresponds to accepting the dialog, in the sense of an accepted string in a formal language, if we wished to separate multiple EMO20Q matches into individual matches about single emotions. The GPDT simulation of a human questioner in EMO20Q is shown in Figure 5.2.

Figure 5.1: Generalized pushdown transducer (GPDT).

5.2 From Truth to Epistemology

In this section, we look at the computational implications of the theoretical shift from the earlier model of our dissertation proposal to the current model. In the proposal, the understanding of natural language description of emotion was simulated by the truth of propositions. In the current model, we focus on simulating the justification of belief.

5.2.1 Computational Socratic Epistemology

The model we use for the EMO20Q questioner agent is a sequential Bayesian belief update algorithm. This model fits the framework of Socratic epistemology, as described in the introduction, because it combines the notions of belief and question-asking. Intuitively, this algorithm instantiates an agent with data from previous EMO20Q matches assumed as semantic knowledge; the agent begins a match of EMO20Q with a uniform belief about the emotion word to be guessed. Based on the previous semantic knowledge, the agent asks questions and updates its belief based on each observation of the user's answer to a question. While the match is played, the observations are aggregated into an episodic buffer [4], also known as working memory. After the match, the agent updates its knowledge using the results of the match, clears its episodic buffer, and is then ready to play again.

Figure 5.2: GPDT simulation of an EMO20Q questioner.
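As a concrete illustration of this mapping, the GPDT can be rendered as a small data structure in which the stack is the episodic buffer of question-answer features and the transition function may inspect the whole stack. This is a minimal Python sketch under our own naming, simplified from Figure 5.2; it is not the dissertation's actual implementation.

from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Feature = Tuple[str, str]  # a (question, answer) pair; the stack alphabet Gamma

@dataclass
class GPDT:
    """Generalized pushdown transducer for an EMO20Q questioner.

    Unlike an ordinary pushdown transducer, the transition relation here
    may read the entire stack contents, not just the topmost symbol.
    """
    states: List[str]         # Q, the dialog states
    start_state: str          # q0
    final_states: List[str]   # F
    # delta: (state, input answer, whole stack) -> (next state, output question)
    delta: Callable[[str, str, List[Feature]], Tuple[str, str]]
    state: str = ""
    last_question: str = ""
    stack: List[Feature] = field(default_factory=list)

    def __post_init__(self):
        self.state = self.start_state

    def step(self, answer):
        """Consume one input symbol (an answer) and emit one output (a question)."""
        if self.last_question:
            # Pair the previous question with the incoming answer; the stack
            # plays the role of the episodic buffer of observed features.
            self.stack.append((self.last_question, answer))
        self.state, self.last_question = self.delta(self.state, answer, self.stack)
        return self.last_question

    def accepted(self):
        return self.state in self.final_states

A delta function built on the sequential Bayesian model of Section 5.2.1 would, for the "asking" state, return the next best question given the features accumulated on the stack.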
The italicized terms above (semantic knowledge, episodic buffer, belief) are programming-level abstractions used to create a human-centered model for the agent, which is underlyingly represented as a sequential Bayesian statistical model.

The semantic knowledge described above is the conditional probability of observing a set of question-answer pairs given a hidden variable ranging over emotion words. This conditional probability distribution is estimated from the corpus of past human-human and human-computer EMO20Q matches as follows. Let E be the set of emotion words and let ε ∈ E be this categorical, Bayesian (i.e., unobserved) random variable distributed over the set E. The probability of ε, P(ε), is the belief about the emotion word to be guessed. Each question-answer pair from the match of EMO20Q is considered as an observation or feature of the emotion being predicted. Thus if Q is the set of questions and A is the set of answers, then a question q ∈ Q and an answer a ∈ A together compose the feature f = (q, a), i.e., f ∈ Q × A. The conditional probability distribution P(f | ε), which represents semantic knowledge, is estimated from the training data using a smoothing factor of 0.5 to deal with sparsity.

In this model we stipulate that the set of answers A consists of four discrete cases: "yes", "no", "other", and "none". When the answer contains either "yes" or "no", it is labeled accordingly; otherwise it is labeled "other". The forms of 'yes' are 'yes', 'yeah', 'yea', 'yep', and 'aye'; the forms of 'no' are 'no' and 'nope'. (These forms of 'yes' and 'no' were determined from the data. The case of 'aye' is an example of how some users have tried to trick the agent, in this case by talking like a pirate. From the agent's point of view, it will be difficult to distinguish the language of a population that actually includes pirate demographics from language containing experimental artifacts like this.) The feature value "none" is assigned to all the questions that were not asked in a given dialog. "None" can be seen as a missing feature in a setting where the absence of a feature may be important; for example, the fact that a certain question was not asked about a particular emotion may be due to that question not being relevant at a given point in a dialog. In [57] we explore a fuzzy logic model for answers, but incorporating it into the sequential Bayesian model is still an open research topic.

Similarly, we stipulate that the questions can be classified into discrete classes specified through semantic expressions as described in Section 4.2. For example, the question "is it a positive emotion?" is represented as the semantic expression "e.valence==positive". If the answer to this question was "maybe", the resulting feature would be represented as ('e.valence==positive', 'other').

Using Bayes rule and the independence assumption of the naïve Bayes model, we can formulate the agent's belief about the emotion variable ε after observing features f_1, ..., f_t, in one single batch, as

P(ε | f_1, ..., f_t) = [ ∏_{i=1}^{t} P(f_i | ε) ] P(ε) / ∏_{i=1}^{t} P(f_i).   (5.1)

This is simply the formulation of naïve Bayes, where in this case P(ε) is the prior probability of a player choosing a specific emotion word, ∏_{i=1}^{t} P(f_i | ε) is the likelihood of seeing question-answer pairs given specific emotion words, and ∏_{i=1}^{t} P(f_i) is the probability of observing question-answer pairs in general.
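To make the estimation of the semantic knowledge concrete, the following sketch shows how P(f | ε) could be computed from match transcripts with the additive smoothing factor of 0.5 mentioned above. The data layout and function names are our own assumptions for illustration, not the actual training code.

from collections import Counter, defaultdict

YES_FORMS = {"yes", "yeah", "yea", "yep", "aye"}
NO_FORMS = {"no", "nope"}

def normalize_answer(utterance):
    """Map a raw answer utterance to one of {'yes', 'no', 'other'}."""
    tokens = set(utterance.lower().replace("?", " ").replace(".", " ").split())
    if tokens & YES_FORMS:
        return "yes"
    if tokens & NO_FORMS:
        return "no"
    return "other"

def estimate_semantic_knowledge(matches, questions, smoothing=0.5):
    """Estimate P(f | emotion) with additive smoothing.

    `matches` is assumed to be a list of (emotion, [(question, raw_answer), ...])
    and `questions` the set of standardized question types. Questions that
    were not asked in a match get the feature value 'none'.
    """
    answers = ["yes", "no", "other", "none"]
    counts = defaultdict(Counter)   # counts[emotion][(question, answer)]
    emotion_totals = Counter()      # number of matches per emotion
    for emotion, qa_pairs in matches:
        emotion_totals[emotion] += 1
        asked = {q for q, _ in qa_pairs}
        for q, raw in qa_pairs:
            counts[emotion][(q, normalize_answer(raw))] += 1
        for q in questions - asked:  # missing features are informative too
            counts[emotion][(q, "none")] += 1

    def p_f_given_e(f, e):
        # Each question receives exactly one answer value per match, so the
        # smoothed counts normalize over the four possible answer values.
        return (counts[e][f] + smoothing) / (emotion_totals[e] + smoothing * len(answers))

    return p_f_given_e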
In our programming-level abstraction, the set of observations f_1, ..., f_t is the feature vector, which was described as the episodic buffer; P(f | ε) is the semantic knowledge; f_t is the current observation; and P(ε) and P(ε | f_1, ..., f_t) are the initial/prior and final/posterior beliefs, respectively.

In Equation 5.1, the posterior belief of the agent about emotion e_k at time t, P(ε = e_k | f_1, ..., f_t), is computed only after the agent has asked all t questions. This batch model is known as naïve Bayes. In contrast, the sequential Bayes model that we use is dynamic: the agent updates its belief at each time point based on the posterior probability of the previous step, i.e., at time t,

P(ε | f_1, ..., f_t) = P(f_t | ε) P(ε | f_1, ..., f_{t-1}) / P(f_1, ..., f_t).

When the game begins, the agent can start with a uniform prior on its belief of which emotion is likely, or it can use information obtained in previously played games. In the experiments reported here, we use a uniform prior, P(ε = e_k) = 1/|E|, for all k = 1, ..., |E|. We chose the uniform prior to initialize the agent because our training data contains many single-count training instances and because we want to examine how the system performs with fewer constraints. We introduce a new variable β_{t,k} = P(ε = e_k | f_1, ..., f_t) for the agent's belief about emotion k at time t and postulate that the agent's current prior belief is the posterior belief of the previous step. Then, the agent's belief unfolds according to the formula:

β_{0,k} = P(ε = e_k) = 1/|E|
β_{1,k} = [ P(f_1 | ε = e_k) / P(f_1) ] β_{0,k}
...
β_{t,k} = [ P(f_t | ε = e_k) / P(f_1, ..., f_t) ] β_{t-1,k}   (5.2)

Decomposing the computation of the posterior belief allows the agent to choose the best question to ask the user at each turn, rather than having a fixed battery of questions. In this case, we define the best question at time t as the question that is most likely to receive a "yes" answer given the posterior belief at time t-1, P(ε | f_1, ..., f_{t-1}):

argmax_{q ∈ Q} ∑_{ε ∈ E} P((q, 'yes') | ε) P(ε | f_1, ..., f_{t-1})

This next-question criterion is a heuristic motivated by seeing "yes" answers as positive feedback that the agent is on the right track. While this heuristic worked well in practice, other next-question criteria are certainly possible and this is an area for future research. The agent asks the best question and takes the user's response as input. It then parses the input to classify it into one of {"yes", "no", "other"}. This information is then used to update the agent's belief as to which emotion in E is most likely. The unfolding of the variable β in Equation 5.2 models the update of belief as it is justified by the agent's question-asking and the user's answers. It is this computational model of question-asking and belief update that represents the Socratic epistemology for verbal emotional intelligence in a software agent.

Identity questions are a special type of question where the agent makes a guess about the emotion. An affirmative answer to an identity question (e.g., "is it happy?") means that the agent successfully identified the user's chosen emotion. Any other answer to an identity question will set the posterior probability of that emotion to zero, because the agent can then be sure it is not the emotion of interest. Also, because the agent is playing a twenty questions game, the maximum number of questions d is set to 20, but this could be changed for the agent to generalize to different question-asking tasks. The pseudo-code for the main loop of the adaptive Bayesian agent is shown in Algorithm 1.
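As a complement to the pseudo-code of Algorithm 1 below, the belief update of Equation 5.2 and the next-question heuristic can be sketched in a few lines of Python. This is a minimal illustration under our own naming; renormalizing the belief after each step plays the role of dividing by P(f_1, ..., f_t).

def uniform_belief(emotions):
    """beta_0: a uniform prior over the emotion vocabulary E."""
    return {e: 1.0 / len(emotions) for e in emotions}

def update_belief(belief, question, answer, p_f_given_e):
    """One step of the sequential Bayesian update of Equation 5.2."""
    f = (question, answer)
    new_belief = {e: b * p_f_given_e(f, e) for e, b in belief.items()}
    z = sum(new_belief.values())  # renormalization constant
    return {e: b / z for e, b in new_belief.items()}

def next_question(belief, questions, p_f_given_e):
    """Choose the question most likely to receive a 'yes' under the belief."""
    def expected_yes(q):
        return sum(p_f_given_e((q, "yes"), e) * b for e, b in belief.items())
    return max(questions, key=expected_yes)

An identity question answered "no" would additionally zero out that emotion's belief before renormalizing, as in Algorithm 1.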
Algorithm 1: adaptive Bayesian EMO20Q agent

Input: F = Q × A, E, and P(f | ε)
β_{0,k} ← 1/|E|, for all k = 1, ..., |E|
for i = 1 to d do
    q^(i) ← argmax_{q ∈ Q} ∑_{ε ∈ E} P((q, 'yes') | ε) P(ε | f_1, ..., f_{i-1})
    print q^(i)
    a^(i) ← user's input answer
    f_i ← (q^(i), a^(i))
    β_{i,k} ← β_{i-1,k} · P(f_i | ε = e_k) / P(f_1, ..., f_i), for all k = 1, ..., |E|
    if q^(i) is the identity question for e_k and a^(i) = 'yes' then
        return e* = e_k
    end if
    if q^(i) is the identity question for e_k and a^(i) = 'no' then
        β_{i,k} ← 0
    end if
end for
k* ← argmax_{k ∈ 1...|E|} β_{i,k}
e* ← e_{k*}
return most likely emotion given observations: e*

This automated, data-driven component was framed within a manually designed dialog graph, as shown in Figure 5.3. An agent with this model and algorithm was used for the human-computer EMO20Q experiments. The exposition of the model and algorithm is an expanded version of [52]. Table 5.2 shows an example of an interaction between the automated EMO20Q questioner agent and a human user, along with a trace of the agent's belief state that shows the justification of beliefs by question-asking.

Figure 5.3: Dialog graph for the EMO20Q questioner agent. The loop labelled "asking" represents the functionality described by the sequential Bayesian model of Equation 5.2 and Algorithm 1.

5.2.2 Human-Computer EMO20Q Experimental Data

Using the human-human data described in Chapter 4 and the computational model and algorithm described in Section 5.2.1, we built a computer agent to play the questioner role in EMO20Q games. To test this model, we conducted two experiments to assess the performance of the agent. The first experiment was a pilot study of 15 subjects who played three matches against the agent [52]. In this pilot study, the subjects were recruited locally. Subjects were asked to pick three emotion words: one that they thought was "easy", one that was "medium", and a third that was "difficult". These difficulty ratings were described in terms of a person's maturity and vocabulary: an "easy" emotion word was one that a child could guess, whereas a "difficult" word was one that would require maturity and a sophisticated vocabulary to guess.

The second experiment was larger. It followed the same methodology as the pilot study, but with 101 subjects recruited from Amazon Mechanical Turk. These subjects were selected to come from the United States, speak English fluently, and have high past acceptance rates as Mechanical Turkers. Of the 101 subjects, only one was rejected, due to misunderstanding the task by choosing the words "easy", "medium", and "difficult" instead of emotion words. This level of acceptance, approximately 99%, is rather high for Mechanical Turk, showing a high degree of cooperation. Several users commented that we could have paid less because the task was fun. In the second experiment, the parameters of the model were updated every ten subjects. Thus, there were ten waves of ten subjects, each playing 3 matches against the automated agent, which yielded 300 matches.

5.2.3 Results

As reported in [52], the results of our pilot experiments on fifteen subjects are summarized in Table 5.3. To compare the agent's performance with human performance, we used two objective measures and one subjective measure. The success rate, shown in column two of Table 5.3, is an objective measure of how often the EMO20Q matches ended with the agent successfully guessing the user's emotion. The number of turns it took for the agent to guess the emotion is the other objective measure.

Table 5.2: Example dialog from the EMO20Q questioner agent where the human interlocutor has chosen "relaxation".
speaker/event | utterance/content
Agent | Let me know when you are ready...
User | ok
Belief update | (uniform distribution)
Agent | Is it very intense?
User | no
Belief update | relaxation: 0.027, hope: 0.027, boredom: 0.027, ..., anger: 0.0024
Agent | is it positive?
User | yes
Belief update | hope: 0.15, relaxation: 0.12, cheerfulness: 0.091, ...
Agent | is it an emotion that just pops up spontaneously (vs being triggered by something)?
User | depends
Belief update | hope: 0.33, relaxation: 0.23, cheerfulness: 0.15, ...
Agent | is it caused by something in particular?
User | well, yes, you do need the right conditions
Belief update | relaxation: 0.50, hope: 0.29, cheerfulness: 0.097, ...
Agent | Is it relaxation?
User | Yes
Agent | so did I get it right?
User | Yes, you got it
Agent | Awesome! Would you like to play again?
User | No, thanks.
Agent | Thank you for playing

The last column, naturalness, is a subjective measure where users rated how human-like the agent was, on a 0-10 scale. Overall, the performance of 44% successful outcomes (where the emotion word was correctly guessed) was much lower than in the human-human experiments, where successful outcomes occurred in 85% of EMO20Q matches. The emotion words chosen by the subjects as "easy" were recognized by the agent with a success rate and number of required turns similar to human-human matches. Some examples of "easy" emotions are anger, happiness, and sadness. However, successful outcomes were fewer for emotions chosen as "medium" and "difficult". Some examples of "medium" emotions are contentment, curiosity, love, and tiredness. Pride, frustration, vindication, and zealousness are examples of "difficult" emotions.

The results in terms of successful outcomes and number of turns required to guess the emotion word are roughly reflected in the percentage of words that are in-vocabulary. Despite the low performance on emotion words rated "medium" and "difficult", there was not a corresponding decrease in the perceived naturalness of the questioner agent. This led us to believe that the model could reproduce somewhat natural behavior, but that the data we had was insufficient due to the number of out-of-vocabulary words in the medium and difficult classes, which motivated us to perform the second, larger-scale experiment with 100 players from Mechanical Turk.

Table 5.3: Experimental results for the 15-subject pilot study (45 EMO20Q games).

difficulty | % success | avg. turns | % in vocab. | naturalness
easy | 73% | 11.4 | 100% | 6.9
medium | 46% | 17.3 | 93% | 5.5
difficult | 13% | 18.2 | 60% | 5.8
total | 44% | 15.6 | 84% | 6.1

Table 5.4: Experimental results for the 100-subject Mechanical Turk study (300 EMO20Q games).

difficulty | % success | avg. turns | % in vocab.
easy | 90% | 10.7 | 100%
medium | 56% | 15.7 | 91%
difficult | 25% | 18.0 | 60%
total | 57% | 14.8 | 83.7%

In the larger-scale Mechanical Turk experiment, we aimed to improve performance by retraining the model after every 10 subjects. This strategy did in fact increase the successful outcome rate and reduce the length of the EMO20Q dialogs (number of questions), as can be seen by comparing Tables 5.3 and 5.4. Across the three difficulty classes, the successful outcome rate improved. The difficult class had the largest relative improvement in successful outcomes, increasing from 13% to 25%, and the overall successful outcome rate increased from 44% to 57%. The lengths of the EMO20Q dialogs decreased most for the medium difficulty class, resulting in an average of 1.6 fewer turns for this class. Overall, the dialog length decreased from 15.6 to 14.8 turns. One surprising result was that even after collecting data from 300 EMO20Q dialogs (more than doubling the earlier human-human data), the out-of-vocabulary rate stayed nearly the same. We had expected out-of-vocabulary words to become fewer as more data had been seen. However, with each round of the
One surprising result was that even after collecting data from 300 EMO20Q dialogs (more than doubling the earlier human-human data), the out-of-vocabulary rate stayed nearly the same. We had expected out-of-vocabulary words to become fewer as more data was seen. However, with each round of the Mechanical Turk experiment, we received new words. For the Mechanical Turk experiment, we did not ask subjects about the perceived naturalness of the agent because in the pilot study it had remained mostly constant.

A complete listing of the words chosen by the subjects of the experiment is given in Table 5.5. It can be seen that there is a wide variety of words. A few (those marked by "?") were questionable in the authors' intuition, but otherwise the words showed a high level of understanding and cooperation by the Mechanical Turkers. The different classes of words were not disjoint: some words, like anger, disgust, love, and confusion, spanned several categories. It can be concluded that these three difficulty levels do not form precise classes of emotion words, but the levels do show a trend from a smaller basic vocabulary toward a wider open vocabulary. The difficulty levels also served as a method to elicit diverse words.

Table 5.5: Observed emotion words by difficulty. Words that were attested but which did not fit the authors' broad intuitions are marked with "?". Subjects' spellings are preserved verbatim.

easy: happiness, anger, sadness, calm, confusion, love, mad, hate, joy

medium: anger, confusion, contentment, curiosity, depression, disgust, excitement, fear, hate, irritation, love, melancholy, sorrow, surprise, tiredness, envy, outrage, elation, suffering, jealousy, nervousness, sympathy, thrill, upset, joy, anxiety, frustration, flustered, enjoyment, exhaustion, fury, bordom, delight, cold, apathy, hostility, loved, annoyance, playfulness, downtrodden, stupor, despair, pissed, nostalgia, overjoyed, indifference, courage

difficult: devastation, disgust, ecstasy, ennui, frustration, guilt, hope, irritation, jealousy, morose, proud, remorse, vindication, zealousness, elation, mischievous, usure, angst, patience, despise, inspired, euphoria, exuberance, worrying, melancholy, ambivalence, love, loneliness, exacerbated(?), avarace, stress, envy, disillusionment, maudlin, depression, confusion, maniacal, ambiguity, concern, pleasure, shame, indifference, anger, suicidal, pessimism, annoyance, sense of failure, educated(?), manic, overwhelmed, astounded, discontent, energetic, introspective, appalled, serenity, dissatisfaction, anxiety, lust, conflicted, perplexed, jubilance, disappointment, satisfaction, remorse, embarrassment, downcast, guilty, enamored, alienation, exotic(?), hate, caring, resentment, pity, aversion, quixotic, infuriation

5.2.4 Discussion

The reader may ask what use it is to study the variety of ways that emotion can be described in natural language. For example, what use is it to know that for some people the words "pride" and "proud" may have different connotations?

    [regarding "proud" vs. "pride"] because my intuition was that they're different... you know pride sometimes has a negative connotation

or that "anger" might not always be described as "negative"?

    [questioner:] so is it a negative emotion?
    [answerer:] sort of, but it can be righteous

For some purposes it might be sufficient to consider "anger" to be simply a negative emotion, or the words "pride" and "proud" to refer to the same emotion.
However, from nearly every line of EMO20Q data we can see that the ways that humans describe emotions are much more nuanced. One of the main results of this study is that when asking questions about emotions at this level of detail, we can, in Socrates' words from the quote in Section 3.4, reach mutual edification about descriptions of emotion. More precisely, among humans playing EMO20Q, mutual edification happens about 85% of the time after about 12 dialog turns. This represents a high amount of agreement for an emotional classification task, especially considering that the set of emotional classes was unbounded. Although analyzing this level of detail is beyond the scope of many current systems, we have shown that it is a task that humans can do with success rates that beat agreement rates on emotional annotations at a much coarser level, such as labeling data with nine basic emotion labels [8].

The ground truth issue involved in annotating recorded data with descriptive labels is a challenge that the Socratic epistemology can shed light on. The traditional annotation task seeks to have human annotators assign one of a number of labels to data. In the case of emotion research, the labels are usually a controlled vocabulary of several emotion descriptors, like "angry", "happy", "sad", "disgusted", "fearful", "surprised", and "neutral". The problem with this approach is that these labels often do not fit realistic emotional data. In affective computing, the issue of non-prototypical emotions [88] arises when observed emotional data does not fit into the controlled vocabulary of emotional annotations, resulting in low annotator agreement and difficulty establishing a ground truth annotation. Theoretically, our approach addresses the issue of ground truth in the annotation task with the notion of epistemology, which frames the issue as justification of belief rather than ground truth. Practically, our approach addresses the issue of non-prototypical emotions by enabling a more nuanced representation where the description is not a closed set of alternatives but rather an interactive process of communication using an open set of natural language descriptions. Though this more nuanced view brings with it new challenges, we have shown that the design of an intelligent dialog agent is a feasible way of dealing with these challenges.

5.2.5 Conclusion

The main goals of this paper were to use question-asking to observe natural language descriptions of emotion in a natural context and to computationally model the social processes that support the referential link between language and emotions. We presented the emotion twenty questions (EMO20Q) game and showed that the level of human performance was high despite not limiting the players to any specific emotion vocabulary. We also presented an automated agent that can play the question-asking role of EMO20Q. This agent uses a sequential Bayesian belief update algorithm to model a player's belief state over time. This framework was inspired by a method of question-asking that was proposed by the ancient philosopher Socrates and by the field of epistemology.

We deem that future advances in affective computing can come from studying emotions at a finer grain, as they are described in natural language. We plan to continue studying this phenomenon in several ways. First, we hope to see the effect of modality on how people describe emotions in natural language. The current work was limited to text-based chat, so paralinguistic information that may help to convey emotional data was minimized.
Including audio and video data may allow greater convergence of the players in agreeing upon the unknown emotion in EMO20Q. Another area of future research will be to examine the answerer role. The current research focused on the questioner role, but the answerer role will offer additional challenges and insights. In particular, automating the answerer role will require more robust natural language understanding, and it will allow a closed-loop simulation where both roles of EMO20Q are played by computer. There are also further areas to explore for the questioner agent, in particular the criterion for choosing each question. Finally, we think that this approach can improve emotion annotation and other annotation tasks, such as coding behavioral data for psychological assessment. In these tasks, human annotators are asked to label data using a controlled vocabulary of words, and agreement is established statistically between isolated annotators. However, we have shown that humans are able to communicate with high accuracy using a large, subjective vocabulary, and we feel that allowing natural language descriptions in an interactive, question-asking setting will allow for more accurate and less constrained annotations.

Natural language presents many difficulties for describing emotions in a way that can be understood by a computer: synonymy, ambiguity, subjectivity, and data sparsity, among others. We feel that the additional complexity of using natural language as a way of describing emotions in computational frameworks will be challenging, but will offer insight into problems that continue to vex the field of affective computing, as well as other tasks where descriptions are subjective and established through natural language interaction.

5.3 Pursuing Knowledge Goals

(This section is based on work published in [53].)

In this section, we use the theoretical formulation of a truth table developed in Section 3.3 to construct the adjacency matrix that specifies a graph, which can be used as a computational model of the agent's knowledge. In the next section, we refine this model to a more nuanced model of knowledge based on epistemology.

Consider a dialog agent that must ask a series of questions in order to identify some unknown object. Whether it be a real, physical object, such as a seat on a flight, or a virtual object, such as an emotion or a disease, this object will have a set of attributes that can be described and asked about using natural language. If the agent knows the complete mapping of objects to attributes, it will be able to identify the unknown object by asking a series of questions in the form of a decision tree, where each successive question aims to evenly partition the space, i.e., to "divide and conquer" the space of attributes as quickly as possible. These types of questions satisfy what we call a task goal. However, if the agent is acting in a state of incomplete knowledge, it may ask extraneous questions, not to identify a particular object, but rather to satisfy a knowledge goal [97, 34], which can be seen as a form of curiosity. For example, assume that the agent has asked a series of questions that uniquely identify an object, but has incomplete knowledge about some attributes of this object. This agent would continue to ask questions to satisfy a knowledge goal even though the task goal has been completed. One could imagine a conversational diagnostic agent that knows attributes of the flu and attributes of diabetes, but whose sets of known attributes do not completely overlap.
For example, it might know that fevers are associated with the flu, but the association of fever with diabetes is unspecified. Even if the agent has asked enough questions to correctly diagnose diabetes, it may ask another question about fever in order to increase the coverage of its knowledge. It is this type of question-asking behavior, questions motivated by an agent's knowledge goals, that we turn our attention to in this chapter.

This chapter uses the game of Twenty Questions as a domain to study the knowledge goals of a dialog agent. Although this game is familiar to many, and we have observed evidence that it is played in many cultures, we briefly explain the game to establish our terminology. One player, the answerer, picks a word unbeknownst to the other player, the questioner, who attempts to guess the word with twenty or fewer questions. In human-human versions of this game, the two players switch roles at the end of each match. In the formulation we present in this chapter, however, we consider the case where the questioner is a computer agent instead of a human. We could also think of a computer agent for the answerer role, but we deem this to be the harder role to automate and look forward to tackling such a problem in future research.

The data we analyze is from a specific version of the game of twenty questions that we devised as an experiment to study natural language descriptions of emotions [49]. The basic idea of the Twenty Questions game is still the same except that the answerers must pick words that denote emotions. Despite the limitation of the game to emotion words, the players were not otherwise limited to a fixed emotion vocabulary, in distinction to other emotion research [56, 47, 57], where fixed emotion vocabularies are typically used. This fact, combined with the vague, ambiguous nature of emotion terms, led to a difficult task even for the human players. We found that players required on average 12 questions to correctly guess the unknown emotions, when failures to guess correctly are averaged in as twenty questions. We also feel that the limitation of only twenty questions is immaterial from a theoretical and experimental point of view, but practically necessary to prevent undue frustration and expense of time by the players. From other perspectives, the fact that we chose emotions as a domain for questioning is a relevant topic of research, which is treated in [49]; in this chapter, however, we look at the problem more abstractly. If one considers emotions to be subjectively defined physiological states, such questions could be applied to verbal examinations by doctors, for example. Furthermore, if we consider objects to be general things that can be represented by a vector of attributes, then this model can be applied to a wide range of objects. Emotions have the characteristics of theoretical objects [26], i.e., objects whose existence is predicated using natural language that refers to categories of objects rather than actual physical objects. These categories can be seen as equivalence classes formed by the objects' attributes.

The main claim of this chapter is that knowledge goals for a question-asking agent can be formulated in terms of completing connections in a graph structure that connects questions with objects. Questions can be thought of as propositions that are assigned true or false values by the answers. These propositions satisfy, in a model-theoretic sense, a model of the objects' attributes.
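As a toy illustration of the task-goal questioning described above, the following sketch greedily picks the attribute whose yes/no answer would most evenly split the remaining candidate objects; the object representation and the flu/diabetes data are hypothetical, and the greedy balance criterion is one simple stand-in for the "divide and conquer" strategy.

```python
# Illustrative task-goal question selection: greedily choose the attribute
# that most evenly splits the set of candidate objects ("divide and conquer").
# Objects are represented as dicts mapping attribute name -> bool (assumed).

def best_question(candidates, attributes):
    def imbalance(attr):
        yes = sum(1 for obj in candidates if obj["attrs"].get(attr, False))
        # a perfectly balanced split has |yes - no| == 0
        return abs(2 * yes - len(candidates))
    return min(attributes, key=imbalance)

def filter_candidates(candidates, attr, answer):
    return [obj for obj in candidates
            if obj["attrs"].get(attr, False) == answer]

# Example: a toy diagnostic domain (hypothetical data)
objects = [
    {"name": "flu",      "attrs": {"fever": True,  "thirst": False}},
    {"name": "diabetes", "attrs": {"fever": False, "thirst": True}},
]
q = best_question(objects, ["fever", "thirst"])
remaining = filter_candidates(objects, q, True)
print(q, [o["name"] for o in remaining])
```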
However, naively asking questions to connect every question to every object is inefficient, and we are interested in an agent that asks questions that cannot be inferred from others; we determine these question/object pairs using a graph-based approach. To make the distinction between a lack of knowledge that cannot be inferred and one that can, we define two terms for this purpose. We call a lack of knowledge that cannot be inferred from other knowledge unconnected knowledge, to highlight the fact that it is represented by an unconnected graph. In the case where the lack of knowledge is just due to missing attributes of some objects, we call it incomplete knowledge, to distinguish the graphical representation of this knowledge from a complete graph, where all vertices are connected to every other vertex. Using this graph-based representation, we answer two particular questions: 1) how can an agent determine if its knowledge is unconnected, and 2) how can the agent ask targeted questions that will "connect" the disconnected sets of propositions in its knowledge?

Other work has looked at dealing with uncertainty in dialog systems regarding automatic speech recognition (ASR) output, which results in uncertain knowledge. In [118], the problem of determining additional questions to ask is posed in terms of statistical uncertainty about the ASR output and hence the dialog state. In this chapter, we simplify the problem by assuming that the dialog system has only one state, that of asking questions about a single object. We also assume that ASR is reliable, since it only needs to process answers to yes/no questions. The range of user input is thus limited, so we can assume that ASR only needs to recognize affirmative answers, negative answers, and uncertain answers. At a meta level, the framework of twenty questions to explore a conceptual space has been applied to dialog act taxonomies [109].

5.3.1 Constructing a Graph from Question-Object Pairs

In this section, we use the notion of a theory from mathematical logic, which states that a theory is simply a set of sentences in some language L that is true of a model M [25, 22]. In our case, the theory Θ is the set of questions that were answered with "yes", together with the negations of the questions that were answered with "no", for given objects; L is the language of propositional logic; and M is a model of the objects. Using this formulation, we describe how to construct a graph that can be used to identify important gaps in the agent's knowledge.

In this view, each question can be represented as a proposition p that can be judged true or false of a given object o. The model of a specific object o is denoted M_o. Assuming now that the agent has just asked question p, we can say that M_o |= p if the user answers "yes" or M_o |= ¬p if the player answers "no". The previous notation is read "p/¬p is satisfied by M_o", or equivalently "M_o is a model of p/¬p". If a proposition p satisfies the model of object o, M_o, then p ∈ Θ_o, where Θ_o is the theory of object o. If we can enumerate a complete set of propositions p_n indexed by n = 1...|P|, then we can represent Θ_o as a Boolean vector of length |P|. For every question p_n asked, the n-th position of Θ_o will be true, or 1, if the user has answered yes to p_n when the object was o. In this case we can say that p_n is a theorem of Θ_o. Similarly, false, or 0, is assigned to element n of Θ_o if p_n received no as an answer when asked while o was the object in question. In this case, ¬p_n is a theorem of Θ_o.
If the theory for a specific object can be seen as a long list of propositions that are true of it, the theory of a set of emotions can be seen as a matrix Θ where the rows are indexed by the objects and the columns are indexed by the questions/propositions. If the theory contains objects o_m for 1 ≤ m ≤ M and propositions p_n for 1 ≤ n ≤ N, then Θ will be an M × N matrix. Ordinarily, Boolean algebra would dictate that this matrix would consist of ones and zeros. Such a representation has been explored under the aegis of formal concept analysis [30]. However, we need the matrix to be sparse to represent the fact that not all of the combinations of questions and emotions have been encountered, due to incomplete knowledge. To this end, we propose that the matrix be a (1, 0, −1)-matrix, or a signed matrix/graph, where 1 indicates that the proposition of column n is true for the object of row m, −1 indicates that it is false, and 0 indicates that the pair has not been seen or that a contradiction has been encountered. To make the matrix a square, symmetric adjacency matrix, we define the adjacency matrix of Θ, A = A(Θ), to be an (M+N) × (M+N) matrix as follows:

A(Θ) = [ zeros(M×M)   Θ           ]
       [ Θ^T          zeros(N×N)  ]

This can be seen as saying that questions and the objects they are asked of are both nodes in a bipartite graph. This graph connects questions to objects, and vice versa, but does not connect questions with questions nor objects with objects. The absolute value |A| of A describes whether questions have been asked of objects, regardless of whether the answer was yes or no. It is this graph |A| that gives us information about the connectivity of an agent's knowledge.
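A minimal sketch of this construction, assuming the observed games are available as (object, question, answer) triples; the triple format and function names are assumptions for illustration.

```python
# Sketch: build the signed theory matrix Theta and the bipartite adjacency
# matrix A(Theta) from (object, question, answer) triples. Only yes/no
# answers produce signed entries; anything else leaves a 0.
import numpy as np

def build_adjacency(observations):
    objects = sorted({o for o, q, a in observations})
    questions = sorted({q for o, q, a in observations})
    M, N = len(objects), len(questions)
    theta = np.zeros((M, N), dtype=int)          # the (1, 0, -1)-matrix
    for o, q, a in observations:
        m, n = objects.index(o), questions.index(q)
        if a == "yes":
            theta[m, n] = 1
        elif a == "no":
            theta[m, n] = -1                     # unseen/unclear stays 0
    # A(Theta) = [[0_MxM, Theta], [Theta^T, 0_NxN]]
    A = np.block([[np.zeros((M, M), dtype=int), theta],
                  [theta.T, np.zeros((N, N), dtype=int)]])
    return A, objects, questions

obs = [("relaxation", "is it positive?", "yes"),
       ("anger", "is it positive?", "no"),
       ("anger", "is it intense?", "yes")]
A, objs, qs = build_adjacency(obs)
print(A)
```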
5.3.2 Identifying Unconnected Knowledge

Converting the theory Θ to the graph A, as described above, allows us to use methods from collaborative filtering, social network analysis, and spectral graph theory [67, 43]. In this chapter, we use the number of zero eigenvalues of the Laplacian of the graph A to determine the number of connected components of the graph. This can be seen as a measure of the sparsity of our data and can be used to identify the questions that must be asked of certain emotions in order to connect the graph components. The Laplacian L of a signed graph is calculated by subtracting the absolute adjacency matrix |A| from the diagonal absolute degree matrix D̄, whose entries are D̄_ii = Σ_j |A_ij|:

L = D̄ − |A|

From the matrix L we can tell the number of connected components of A by counting the number of zero eigenvalues. Thus, if there are three eigenvalues that equal zero, the graph is composed of three separate connected components. A graph Laplacian with one zero eigenvalue is a single connected graph.

5.3.3 Determining Which Questions to Ask

If, from Section 5.3.2, an agent has identified that it has unconnected knowledge, how can it then plan questions to address the knowledge goal of connecting the components of A? To answer this, we must define the notion of a walk on a graph. A walk of length l on graph A that joins vertices v_i and v_j is a sequence of vertices u_0 ... u_l of A such that v_i = u_0, v_j = u_l, and u_{t−1} and u_t are adjacent for 1 ≤ t ≤ l. According to [6, Lemma 2.5], the number of walks of length l in A that join v_i to v_j is the entry in cell (i, j) of the matrix A^l. Thus, by taking repeated powers of the absolute adjacency matrix |A|, we can determine if nodes v_i and v_j are connected by walks of length l. Since the graph is bipartite, the walks from question nodes to other question nodes, or from object nodes to other object nodes, will always be of even length, and conversely, walks between question and object nodes will be of odd length. This behavior is undesirable because we wish to preserve connectedness properties across repeated powers of the matrix. To remedy this undesirable behavior, we can augment the adjacency matrix |A| by adding the identity matrix I to it. At this point, we can say that vertices v_i and v_j are connected by a walk of length l or less if the entry (i, j) of (A+I)^l is non-zero. The proof of this is by contradiction: if we imagine that vertices v_i and v_j are connected by some walk of length k < l, but not of length l, then there must not be a self-loop from v_j to itself that could extend the walk of length k. However, since we added the identity matrix I to A, we know that there are in fact self-loops on all of the vertices.

The preceding fact allows us to state an alternative test for connectedness and also allows us to identify the question-object pairs that need to be asked to complete the agent's knowledge. This test can be stated as follows: the graph A is connected if and only if (|A|+I)^{M+N−1} has no zero entries. This is because the length of a walk with distinct steps, a path, is at most one less than the number of vertices in the graph, i.e., M+N−1, which would be the case if the graph were a linked list. The question-object pairs that correspond to zero entries in this matrix are precisely the set of candidate questions that need to be asked to connect the agent's knowledge.

The reader may wonder what the purpose is of using the graph Laplacian method to determine whether the graph is connected when this can be accomplished using the method of taking powers of (A+I). While it is true that the latter method can accomplish the same objective of determining whether the graph is connected, the Laplacian eigenvalue method tells how many connected components there are. This information is useful because it can tell us how many questions need to be asked. For example, if zero is an eigenvalue of the graph Laplacian, as described in Section 5.3.2, and this eigenvalue has multiplicity m (i.e., there are m eigenvalues equal to zero), the minimum number of questions that need to be asked is the number of edges needed to create a spanning tree on m nodes. In this chapter, we do not consider any type of weighting on the added edges, so any minimal set of questions that connects the disconnected components of the graph is satisfactory for our purposes. This minimal set of edges is simply any tree that connects the m disconnected components. Thus, m−1 questions must be asked, since there are m−1 edges in a tree of m nodes. One could imagine additional constraints that further identify a "best" set of question-object pairs, for example, the set of questions whose added edges minimizes the diameter of the resulting graph.
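Both tests described in this section can be sketched in a few lines of numpy; the function names and the numerical tolerance are illustrative, and for large graphs the integer entries of the matrix power grow quickly, so a boolean reachability variant would be preferable in practice.

```python
# Sketch of the two connectivity tests: zero eigenvalues of the Laplacian,
# and zero entries of (|A| + I)^(M+N-1).
import numpy as np

def n_components(A, tol=1e-9):
    """Count connected components as zero eigenvalues of L = Dbar - |A|."""
    absA = np.abs(A)
    L = np.diag(absA.sum(axis=1)) - absA
    eigvals = np.linalg.eigvalsh(L)      # L is symmetric
    return int(np.sum(np.abs(eigvals) < tol))

def unconnected_pairs(A):
    """Vertex pairs with no connecting walk: zero entries of (|A|+I)^(n-1)."""
    n = A.shape[0]
    reach = np.linalg.matrix_power(np.abs(A) + np.eye(n, dtype=int), n - 1)
    return list(zip(*np.where(reach == 0)))

# Example on a graph with two components: vertices {0, 1} and {2}
A = np.array([[0, 1, 0],
              [1, 0, 0],
              [0, 0, 0]])
print(n_components(A))       # -> 2
print(unconnected_pairs(A))  # -> pairs involving the isolated vertex 2
```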
5.3.4 Data

We collected training data for a question-asking agent using a Wizard of Oz experiment, where humans played both the questioner and answerer roles of the emotion twenty questions game, as described in Section 3.2 and in more detail in [49]. We collected a total of 26 matches from 13 players. Since each match has two players, this averaged 4 matches per player and ranged from 2 to 12 matches. In the data, a total of 23 unique objects (emotions) were chosen, i.e., only three objects were observed more than once. Table 5.6 describes how we processed the questions from raw text (of which there was a total of 313 unique questions asked) to a logical representation, which resulted in 222 unique question nodes.

Table 5.6: Data processing.

Preprocessing step          Number of questions
0. Raw text                 313
1. Text normalization       297
2. Logical representation   222

Currently, we have not developed natural language processing techniques to extract the underlying semantics of the questions. We used manual preprocessing to normalize the questions to a logical form that is invariant to the wording, because surface forms varied widely. This logical form converted the surface forms to a pseudo-code language with a controlled vocabulary. This standardization involved converting the emotion names to nouns, if possible, and standardizing attributes of emotions and the relations of emotions to situations and events. After the standardization, there were a total of 222 question types.

In a basic conversion of our data to a graph, there are a total of 23 + 222 = 245 nodes, so our data results in an adjacency matrix A of size 245 × 245. We will call this the basic graph. However, the object identity questions (e.g., "Is it embarrassment?" for our domain of emotion guessing) identify additional objects. These objects were never picked as an object but were referred to in a question. Although we have not seen the object "embarrassment" in our data as an emotion picked by the answerer, it was referred to in one of the questioner's questions. To account for this, we augmented our graph with new vertices for these objects that were heard of but not observed. This simply involved adding more object vertices to the graph and connecting them to the object identity question that referred to them. We call this the derived graph. In this case, there were 99 objects, which resulted in an adjacency matrix of size 321 × 321.

5.3.5 Results

The Laplacian eigenvalue analysis of the basic object-question graph showed us that there were 35 zero-valued eigenvalues, and hence 35 separate subgraphs of our 245 × 245 adjacency matrix A. Although this shows a high degree of sparsity in the graph, analyzing the repeated powers of A+I showed that the disconnected components were all single question vertices that represented infrequently asked questions that were answered without a clear yes or no. Since these were not connected to any of the emotions, asking them of any emotion could serve to connect them to the main part of the graph. However, since these questions were already asked at least once without receiving a clear yes or no answer, this might actually suggest questions not to ask. There were no unconnected object nodes, due to four highly used questions that connected all the object nodes. Thus, the power series analysis of A+I showed that there were rows and columns of zeros in elements (i, j) and (j, i) for each question i that was unconnected, and the set of vertices corresponding to objects was connected.

The Laplacian eigenvalue analysis of the 321 × 321 derived graph showed us that this expanded graph had the same number of disconnected components, although these components were larger because the unconnected fragments included question-object pairs instead of single question vertices. The fact that the number of unconnected components did not increase from the basic graph to the derived graph is an important result.
Even though the number of objects more than tripled (from 23 to 99) and no additional questions were asked, the derived graph has similar connectivity characteristics.

5.3.6 Discussion

In studying the spectra of our question-object graphs, we were expecting to find large subgraphs that it would be advantageous to connect with strategically chosen questions. What we actually found was that, in the basic graph, the disconnected subgraphs were trivially single question vertices, which represent knowledge that is unconnected only at the question vertices. The main body of the graph was connected due to several frequently used questions. Therefore, the answer to the question that we set out to answer, "what questions should an agent ask when dealing with unconnected knowledge?", is that the agent could ask any of these questions of any of the objects, since in the basic graph the objects are all connected. One possibility that must be considered, though, is that these questions may be disconnected for a reason: they could be irrelevant. However, the data we based our graphs on is from human-human interaction, so we must assume that these disconnected questions were relevant for the players who asked them. We discuss player-specific theories at greater length in [49], the conclusion of which is that it is useful to model individual players, who may differ in knowledge and question-asking strategies. Thus, if an agent is designed to play Twenty Questions as a human would, then it makes sense to ask these questions. From an omniscient perspective, these questions might be irrelevant, but they make sense for an agent with limited knowledge.

In the derived graph, which had additional vertices added for unseen objects that were referred to in questions, the situation was different. The derived graph had disconnected object vertices as well as question vertices. If the agent is free to pick any object-question pair to generate a question, then whether the unconnected vertices are questions or objects is not important. However, in the Twenty Questions game, the other player chooses the object, so the agent will have fewer opportunities to ask questions about specific objects for which its knowledge is unconnected. However, if the object is connected to other questions that are also disconnected from the main body of knowledge, then the agent can ask these questions about any other object that is connected with the main body of knowledge, because doing so will result in the unconnected object being connected via the question.

5.3.7 Conclusion

This section presented a way for a question-asking agent to deal with incompleteness in its knowledge. We found that the eigenvalues of the graph Laplacian and the power series of the adjacency matrix give information that an agent can use to determine which questions to pick in order to complete its knowledge. In particular, we found that the knowledge that was unconnected in the basic object-question adjacency matrix was knowledge about individual questions. When we included data about unobserved objects (those that had been asked about but not seen), we found that the object nodes were also disconnected, but that the number of disconnected components remained the same.

In this work, we considered only questions that were observed in our data. However, questions are often of a type that can be generalized. For example, "Is the emotion similar to happy?" can be generalized to "Is the emotion similar to X?", where X could be any word for an emotion.
Expanding questions of these types would lead to even more extensive incomplete knowledge, but could also lead to better abilities to make inferences between the questions, an issue that we will examine in future work.

5.4 Reward-based Shortest Path Description Generation Model

To model how natural language description of emotion can be learned from previously played games, we use a reward-based learning paradigm. An agent with reward-based learning attempts to maximize some reward function by interacting with an uncertain environment. In the reinforcement learning literature, actions are typically referred to by the letter "a" in mathematical formulae. However, in our case the actions are the questions, so to preclude any confusion between actions and answers, we will use "q" instead of "a" to refer to the actions. We will assume that answers are a form of reward, and that certain answers, such as affirmative answers, will receive more reward than others. This reward is the sum of the individual rewards at each time step, or turn, in the twenty questions game:

R = Σ_{t=1}^{T} r_t

where the individual reward at each time step is defined as a function of the current state, the action, and the resulting state. In this case, the state is the observed sequence of question-answer pairs at a certain point in a dialog. For the dialog state, we use a feature-based representation using discretized versions of the questions and answers, as described in Section 4.2. Let Q be the set of questions and A the set of answers. Then a question q ∈ Q and an answer a ∈ A together compose the feature f = (q, a), where f ∈ Q × A. The dialog game state is represented as a sequence of features f_0, ..., f_t. Based on the assumption that the agent's state can be uniquely determined by its current belief state and the sequence of questions and answers in its episodic buffer, we will assume that the reward at a given time r_t is specified as

r_t = r({f_0, ..., f_t}, q, a)

Under the assumption that the reward amount is parametrized by the answers to the questions, r_t can vary depending on the answer type:

r_t = r_y   if a = "yes"
      r_n   if a = "no"
      r_o   otherwise

For the values of r_y, r_n, and r_o, we used 1, 0, and −1, respectively, to capture the intuition that "yes" answers give the greatest reward, while "no" answers do not receive a reward but also do not incur a cost, and unsure answers incur a cost without any reward.

To model the generation of emotion descriptions, we consider a good description to be a series of declarative sentences that would result from a high-reward strategy for playing EMO20Q. To do this, we construct a game graph of transitions from one question to another, with probabilities derived from the observed data and rewards as established by r_t, and find the shortest path using Dijkstra's algorithm.
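The following sketch illustrates this generation step, assuming the game graph is given as question-to-question transitions labeled with the observed answers. Since Dijkstra's algorithm requires non-negative edge weights, the sketch maps each reward r to a cost of 1 − r; this mapping, like the toy graph and function names, is an assumption for illustration rather than the exact formulation used here.

```python
# Sketch: turn answer-dependent rewards into edge costs and find a
# high-reward (low-cost) question sequence with Dijkstra's algorithm.
import heapq

REWARD = {"yes": 1, "no": 0}          # r_y, r_n; any other answer gets r_o = -1

def edge_cost(answer):
    r = REWARD.get(answer, -1)
    return 1 - r                      # assumed mapping: reward 1 -> cost 0

def dijkstra(graph, start, goal):
    """graph: question -> list of (next_question, observed_answer) edges."""
    heap, seen = [(0, start, [start])], set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, answer in graph.get(node, []):
            if nxt not in seen:
                heapq.heappush(heap, (cost + edge_cost(answer), nxt, path + [nxt]))
    return float("inf"), []

# Toy game graph (hypothetical): questions as nodes, observed answers on edges
graph = {
    "is it positive?": [("is it intense?", "yes"), ("is it like sadness?", "no")],
    "is it intense?": [("is it happiness?", "yes")],
    "is it like sadness?": [("is it happiness?", "no")],
}
print(dijkstra(graph, "is it positive?", "is it happiness?"))
```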
The conversion of questions to declarative sentences was done by manually converting questions from the EMO20Q data into answer templates and then substituting strings for the templates using a mixture of automatic and manually created regular expression substitution rules. For example, the frequently used question "Is the emotion you picked a positive emotion?" was represented as the answer template "{e:the emotion you picked} is a positive emotion". Another example of a question/answer template pair is "Can you feel the emotion about yourself?" and "{experiencer:you} can feel {e:the emotion} about {experiencer-reflexive:yourself}". The regular expression substitution rules interpolate the strings delimited by curly brackets ("{}") with a random value from the values indexed by the string preceding the colon in the curly-bracket-delimited strings. For example, the bracketed strings indexed by "e" could be replaced by "the emotion you picked", "the emotion", or simply "it". More complex substitutions, like the reflexive pronoun in the example "Can you feel the emotion about yourself?", were realized as manually created substitution rules.

The conversion of an affirmative declarative sentence into a negative or uncertain one was implemented as manually created regular expression substitution rules. The simplest rule involved substituting the copula "is" with "is not". Slightly more complicated, sentences with the modal helping verbs "can", "could", and "would" involved negating these verbs as "can't", "couldn't", and "wouldn't". If these rules did not match, there were three main exceptions. One case involved an arbitrary verb predicated of the emotion, e.g., "the emotion makes you smile." In this case, the verb was stripped of its inflection and preceded by "doesn't," as in "the emotion doesn't make you smile." Another case involved predicating a person or group of people experiencing the emotion, e.g., "people feel it at Disneyland." In this case, the verb was preceded by "don't", as in "people don't feel it at Disneyland." The third case of exceptions to the general rule was when there was a quantifier like "only" expressed in the sentence, e.g., "you only feel this emotion when you are betrayed." In this case, the sentence's negation would be generated by preceding the quantifier with "not," as in "you not only feel this emotion when you are betrayed." This third exception was applied before all the other rules, and the other two were applied after. Also, the application of the rules was exclusive; that is, only one rule, the first rule that matched, would be applied. For questions that were answered in a way other than "yes" or "no", we simply preceded the whole sentence with "maybe." This design choice reflected the fact that the "other" category is a catch-all that could not be generated any more precisely. A finer-scale categorization of answers that were not answered with a "yes" or "no" is a current area of our research.
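The ordered, mutually exclusive application of these rules can be sketched as follows; the regular expression patterns shown are simplified stand-ins for the manually created rules described above.

```python
# Sketch of the ordered, mutually exclusive negation rules (simplified).
import re

NEGATION_RULES = [
    # quantifier exception, applied before all other rules
    (r"\bonly\b", "not only"),
    # general copula and modal rules
    (r"\bis\b", "is not"),
    (r"\bcan\b", "can't"),
    (r"\bcould\b", "couldn't"),
    (r"\bwould\b", "wouldn't"),
    # exception: third-person verb predicated of the emotion
    (r"\bmakes\b", "doesn't make"),
    # exception: plural experiencers, e.g. "people feel ..."
    (r"\b(people) (feel)\b", r"\1 don't \2"),
]

def transform(sentence, answer):
    if answer == "yes":
        return sentence
    if answer == "no":
        for pattern, repl in NEGATION_RULES:
            if re.search(pattern, sentence):
                # exclusive application: only the first matching rule fires
                return re.sub(pattern, repl, sentence, count=1)
        return sentence                 # no rule matched; would need a new rule
    return "maybe " + sentence          # catch-all for uncertain answers

print(transform("it is a positive emotion", "no"))
print(transform("the emotion makes you smile", "no"))
print(transform("you only feel this emotion when you are betrayed", "no"))
print(transform("people feel it at Disneyland", "depends"))
```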
5.4.1 Results

In Tables 5.7, 5.8, 5.9, 5.10, and 5.11 we give several examples of the descriptions generated from the reward-based shortest path model described above, which illustrate several aspects of the generation algorithm. Because the data come from the EMO20Q game, there is a logical progression from general to specific that can be seen in all the examples.

Table 5.7: Generated description for "happiness".
it is not necessarily directed at someone
it is more on the positive side than the negative side
it is not a negative emotion
it is an emotion that can just pop up spontaneously as opposed to being caused by something
you would feel this if it was your birthday
it is not more about how one feels about himself
it is an emotion that normally makes you feel good
it is associated with happiness

In the example for "happiness" in Table 5.7, one can see the generated description for a word that was used frequently in the EMO20Q data. In general, this emotion word was also guessed with a small number of questions, so the path lengths in the graph tend to be short. However, the bias towards affirmative answers can cause frequently chosen emotion words to have longer descriptions. The bias toward questions that were answered affirmatively can be seen in the example for "anger" in Table 5.8. In this case, the generated description is long because the path received more reward for a longer path connecting question-answer nodes with affirmative answers, not because the EMO20Q dialogs were long for this emotion word.

In general, when frequently chosen words received many uncertain, non-yes/no answers in EMO20Q, the resulting description was short and left out most of the descriptions derived from questions answered by "other". Tables 5.9 and 5.10 show the emotion words "enthusiasm" and "devastation", which resulted in relatively short descriptions, while the matches of the EMO20Q games for these words tended to be longer.

The result for "soberness" in Table 5.11 shows an emotion word that was only seen once in the experimental data and was not successfully guessed by the other player. This emotion word only had one match, so there was only a single path through the game graph, i.e., the description derived from the exact sequence of question-answer pairs in the game. Though the dialog reached twenty questions, the generated description only has sixteen sentences because some of the questions were non-yes/no questions, so they were excluded.

Table 5.8: Generated description for "anger".
it is an emotion that can be directed at another person
it is an emotion that you can feel for long durations
maybe it can occur without any particular cause
it is not positive emotion
if you had to give the emotion a color from red to blue it would be closer to red
maybe waiting in line make you feel like this
it is not neutral
you don't feel it when someone dear has passed away
it is not similar to surprise
maybe it is something unexpected
it is pretty intense
it is considered a negative thing to feel
maybe it is more about how one feels about himself
maybe it can be spontaneous
it is not a synonym to happiness joy etc
it is like anger
it is not more on the positive side than the negative side
you wouldn't feel this if your best friend died
this is an emotion which some people seem to be incapable of feeling
it is a negative emotion
it can't be related to loving someone
the emotion is usually caused by certain situations (as opposed to just popping up in your mind)
it can't be associated with happiness
it is a personality attribute as in you could describe someone with this word
it is not an emotion that makes you feel good
the emotion is not directed at one's self
maybe this emotion involves a verbal outburst or utterance
it is not associated with happiness
i wouldn't feel this if it were my birthday
maybe people feel that when they are helpless
it is like anger

Table 5.9: Generated description for "enthusiasm".
it is a positive emotion
maybe you can feel the emotion about yourself
it is pretty intense
it is similar to excited
it is not similar to surprise

Table 5.10: Generated description for "devastation".
it is not a positive emotion
maybe a person who experiences this emotion is less likely to spring into action to do something about this emotion
a person experiencing this emotion is less likely to get off their butt and take measures to improve their condition
it is an emotion that lasts a long time
generally the emotion is felt in relation to another person
it is not a common everyday emotion to experience

Table 5.11: Generated description for "soberness".
maybe you feel this way when you are frustrated
maybe do you feel this way when you do things you enjoy
you don't associate this emotion with femininity
you feel this way when you are sad
maybe this is an emotion which is perceived positively in american society
this is not an emotion which some people seem to be incapable of feeling
this emotion is not associated with self esteem
this emotion is associated with the ability to concentrate
this emotion is associated with distraction
this emotion is associated with level of interest in an activity
this emotion is associated with thoughtfulness
maybe this emotion is assocated with anticipation
it begins with a consonant
this emotion's name doesn't ends with -ful
the spelling of this emotion doesn't end with -ive
the spelling of this emotion ends with -ness

In general, these results on emotion description generation show the possibilities of representing natural language descriptions of emotions using game-like models with reward functions for various actions. Currently, the semantics used are fairly shallow, so the generated descriptions have redundancy that could be made more concise using a deeper semantic model. Also, the generation of a single path is only one possible application. For other applications, such as recognizing descriptions of emotion in text, it may be better to keep the information in a lattice data structure that would maintain the representation of multiple paths.

Chapter 6

Conclusion and Future Work

Out of the trunk, the branches grow; out of them, the twigs. So, in productive subjects, grow the chapters.
–H. Melville, Moby Dick

This chapter concludes this dissertation and provides some future directions for research in the area. The future directions can serve as a starting point for others who would like to do research in the area of natural language description of emotion.

The thesis of this dissertation is that natural language descriptions of emotion are definite descriptions that refer to intersubjective theoretical entities. The intuition behind this thesis is that emotions are talked about as if they were distinct entities, but when studying the physical cues there are no necessary nor sufficient observational cues to establish this distinctness. The theoretical solution is that the meaning of natural language descriptions of emotion is established through language, as linguistic meaning in addition to physical stimulus meaning. Stimulus meaning, the component of meaning that is established through knowledge by acquaintance, is one source of subjectivity; linguistic meaning, the component of meaning that is established through knowledge by description, is another source of subjectivity. We call these types of subjectivity "intersubjectivity" because they are tightly constrained forms of subjectivity that capture the observation that "words mean different things to different people, yet words also mean similar things to different people" [78].

We supported this thesis by conducting experiments that show that people are able to describe emotions in ways that are definitely identified 85% of the time in EMO20Q matches. We also showed that interval type-2 fuzzy sets on valence, activation, and dominance scales from Interval Approach surveys capture some of the definiteness of descriptions, but this model did not extend to larger vocabularies.

The goal of creating an automated agent around human-inspired cognitive components resulted in a sequential Bayesian model inside a generalized pushdown transducer (GPDT).
This model extended an earlier truth table formulation. The sequential Bayesian model embedded inside the GPDT allowed us to simulate cognitive components, such as long and short term memory, belief, inference, and learning. The learning and inference components were inspired by the question-asking behavior of children and by Socratic epistemology, which deal with justification of belief through abductive verbal behavior.

Creating models that simulate natural language description of emotion has both scientific and practical applications. The scientific value of such a model lies in learning how people understand and communicate emotional information using natural language. Practical applications, such as programs that can serve as artificial, human-like agents, come from computational implementations of such a model. Past research in emotional computing has shown that computational models and theoretical scientific advances go hand-in-hand. Furthermore, natural language description of emotion has much in common with natural language description of other human behavior. In particular, using a Likert-type scale to measure a virtual variable that is described using natural language is common in other types of behavioral coding, with examples in psychology that include autism research and quantitative couples therapy. Therefore, a model for understanding natural language descriptions of emotions provides a vanguard that could lead to advances in such related areas.

Being able to computationally simulate human understanding of descriptions of emotion in natural language enables many practical applications. It provides ways for computers to describe emotional data in natural language, which will facilitate emotional human-computer interfaces. Currently, there has been much focus on recognizing emotional characteristics of natural language, whether it be speech or text. However, these approaches aim to recognize the emotions directly felt and expressed by the speaker, often unconsciously. In contrast, there has been little research that aims to recognize descriptions of emotions that the speaker wishes to communicate, whether felt by the speaker or reported about a third party, an event, or a piece of media data. The type of research we present will enable a user to query emotional data using natural language, report on emotions felt in past or hypothetical situations, and describe emotions observed in other people. These capabilities present a significant technical advance that will enable next-generation technologies. An example of this is the notion of presence in electronic communication, where a user's status is broadcast to others. Previous approaches to document-level emotion analysis, such as classification of documents into categories, may not be ideal for new media and recent trends in communications. These tend towards short documents, such as text messages and microblogs, which allow users to post status messages containing emoticons and tags that are not experimentally controlled emotion labels. Natural language descriptions of emotion, when seen as uses of emotional language that do not reflect one's current emotional state, can be seen in a similar light as sarcasm, in that the language does not reflect the speaker's emotion. With sarcasm, there is emotion expressed by the speaker, but superficial attempts to measure it will be thrown off, just as superficial attempts to measure emotion will be thrown off when they encounter a natural language description of emotion.
Bibliography

[1] Cyril Allauzen and Michael Riley. A pushdown transducer extension for the OpenFst library. In Proceedings of the Conference on Implementation and Application of Automata, 2012.

[2] Abeer Alwan, Yijian Bai, Matt Black, Larry Casey, Matteo Gerosa, Margaret Heritage, Markus Iseli, Barbara Jones, Abe Kazemzadeh, Sungbok Lee, Shrikanth Narayanan, Patti Price, Joseph Tepperman, and Shizhen Wang. A system for technology based assessment of language and literacy in young children: the role of multiple information sources. In Proceedings of IEEE International Workshop on Multimedia Signal Processing, Chania, Greece, October 2007.

[3] Anonymous. Rhetorica ad Herennium. Loeb Classical Library, 1954. Transl. Harry Caplan.

[4] Allan D. Baddeley. The episodic buffer: A new component of working memory? Trends in Cognitive Science, 4(11):417–423, 2000.

[5] Lisa Feldman Barrett. Are emotions natural kinds? Perspectives on Psychological Science, 1(1):28–58, March 2006.

[6] Norman Biggs. Algebraic Graph Theory. Cambridge University Press, 1974.

[7] Margaret M. Bradley and Peter J. Lang. Measuring emotion: The self-assessment manikin and the semantic differential. Journal of Behavior Therapy and Experimental Psychiatry, 25(1):49–59, March 1994.

[8] Carlos Busso, Murtaza Bulut, Chi-Chun Lee, Abe Kazemzadeh, Emily Mower, Samuel Kim, Jeannette Chang, Sungbok Lee, and Shrikanth Narayanan. IEMOCAP: Interactive emotional dyadic motion capture database. Journal of Language Resources and Evaluation, 42(4):335–359, 2008.

[9] Ozan Cakmak, Abe Kazemzadeh, and Serdar Yildirim. Using interval type-2 fuzzy logic to analyze Turkish emotion words. In Proceedings of APSIPA Annual Summit and Conference, Hollywood, CA, 2012.

[10] Alfonso Caramazza. How many levels of processing are there in lexical access? Cognitive Neuropsychology, 14(1):177–208, 1997.

[11] Rudolf Carnap. The methodological character of theoretical concepts. Minnesota Studies in the Philosophy of Science, I:39–76, 1956.

[12] Noam Chomsky. On certain formal properties of grammars. Information and Control, 2:137–167, 1959.

[13] Selina Chu, Shrikanth Narayanan, and C.-C. Jay Kuo. Unstructured environmental audio: Representation, classification and modeling. In Wenwu Wang, editor, Machine Audition: Principles, Algorithms and Systems. IGI Global, 2011.

[14] Simon Coupland, Jerry M. Mendel, and Dongrui Wu. Enhanced interval approach for encoding words into interval type-2 fuzzy sets and convergence of the word FOUs. In FUZZ-IEEE World Congress on Computational Intelligence (WCCI 2010), 2010.

[15] Trevor F. Cox and Michael A. A. Cox. Multidimensional Scaling. Chapman & Hall/CRC, 2nd edition, 2000.

[16] Robert F. DeVellis. Scale Development: Theory and Applications. Sage Publications, 2003.

[17] Laurence Devillers, Sarkis Abrilian, and Jean-Claude Martin. Representing real-life emotions in audiovisual data with non basic emotional patterns and context features. Affective Computing and Intelligent Interaction, Lecture Notes in Computer Science, 3784:519–526, 2005.

[18] Ellen Douglas-Cowie, Roddy Cowie, Cate Cox, Noam Amir, and Dirk Heylen. The sensitive artificial listener: an induction technique for generating emotionally coloured conversation. In Proceedings of Language Resources and Evaluation Conference (LREC), 2008.

[19] Umberto Eco and Thomas A. Sebeok, editors. The Sign of Three: Dupin, Holmes, Peirce. Advances in Semiotics. Indiana University Press, 1988.

[20] Paul Ekman. Facial expression and emotion. American Psychologist, 48:384–392, 1993.
[21] M. S. El-Nasr, J. Yen, and T. R. Ioerger. FLAME: Fuzzy logic adaptive model of emotions. Autonomous Agents and Multi-Agent Systems, 3:219–257, 2009.

[22] Herbert B. Enderton. A Mathematical Introduction to Logic. Academic Press, 2nd edition, 2001.

[23] Jean-Marc Fellous and Joseph E. LeDoux. Toward basic principles for emotional processing: What the fearful brain tells the robot. In Who Needs Emotions?: The Brain Meets the Robot, chapter 4, pages 79–115. Oxford University Press, 2005.

[24] Johnny R. J. Fontaine, Klaus R. Scherer, Etienne B. Roesch, and Phoebe C. Ellsworth. The world of emotions is not two-dimensional. Psychological Science, 18(12):1050–1057, December 2007.

[25] Thomas Forster. Logic, Induction, and Sets. Cambridge University Press, 2003.

[26] Thomas Forster. Reasoning About Theoretical Entities. World Scientific, 2003.

[27] Brandy N. Frazier, Susan A. Gelman, and Henry M. Wellman. Preschoolers' search for explanatory information within adult-child conversation. Child Development, 80(6):1592–1611, November/December 2009.

[28] Gottlob Frege. Über Sinn und Bedeutung. Zeitschrift für Philosophie und philosophische Kritik, pages 25–50, 1892. Eng.: "On Sense and Reference".

[29] Susan R. Fussell, editor. The Verbal Communication of Emotions. Lawrence Erlbaum Associates, 2002.

[30] Bernhard Ganter, Gerd Stumme, and Rudolf Wille, editors. Formal Concept Analysis: Foundations and Applications. Springer, Berlin, 2005.

[31] Peter Gärdenfors. Conceptual Spaces: The Geometry of Thought. MIT Press, 2000.

[32] Alison Gopnik. Explanation as orgasm. Minds and Machines, 8(1):101–118, 1998.

[33] Alison Gopnik. The Philosophical Baby. Picador, 2009.

[34] Andrew Gordon. Strategy Representation: An Analysis of Planning Knowledge. Lawrence Erlbaum, 2004.

[35] Andrew Gordon, Abe Kazemzadeh, Anish Nair, and Milena Petrova. Recognizing expressions of commonsense psychology in English text. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (ACL 2003), 2003.

[36] Arthur Graesser, Sidney D'Mello, Patrick Chipman, Brandon King, and Bethany McDaniel. Exploring relationships between affect and learning with AutoTutor. In Affect and Learning Workshop at the 13th International Conference on Artificial Intelligence in Education (AIED 2007), Marina Del Rey, CA, USA, July 2007.

[37] Michael Grimm and Kristian Kroschel. Rule-based emotion classification using acoustic features. In Proceedings of the International Conference on Telemedicine and Multimedia Communication, 2005.

[38] Michael Grimm, Kristian Kroschel, Emily Mower, and Shrikanth Narayanan. Primitives-based evaluation and estimation of emotions in speech. Speech Communication, 49:787–800, 2006.

[39] David E. Heise. Semantic differential profiles for 1000 most frequent English words. Psychological Monographs, 79(8), 1965.

[40] Jaakko Hintikka. Socratic Epistemology: Explorations of Knowledge-Seeking by Questioning. Cambridge University Press, 2007.

[41] Jerry R. Hobbs and Andrew Gordon. The deep lexical semantics of emotions. In Workshop on Sentiment Analysis: Emotion, Metaphor, Ontology and Terminology (EMOT-08), 6th International Conference on Language Resources and Evaluation (LREC-08), Marrakech, Morocco, May 2008.

[42] Charles F. Hockett and Stuart Altmann. A note on design features, pages 61–72. Indiana University Press, 1968.

[43] Yao Ping Hou. Bounds for the least Laplacian eigenvalue of a signed graph. Acta Mathematica Sinica, 21(4):955–960, 2005.
[44] Jeff Howe. The rise of crowdsourcing. Wired Magazine, 14.06, June 2006.

[45] Bruno Jedynak, Peter I. Frazier, and Raphael Sznitman. Twenty questions with noise: Bayes optimal policies for entropy loss. Journal of Applied Probability, 49(1):114–136, March 2012.

[46] Robert John and Simon Coupland. Type-2 fuzzy logic: Challenges and misconceptions. IEEE Computational Intelligence Magazine, 7(3):47–52, August 2012.

[47] Abe Kazemzadeh. Using interval type-2 fuzzy logic to translate emotion words from Spanish to English. In IEEE World Congress on Computational Intelligence (WCCI) FUZZ-IEEE Workshop, 2010.

[48] Abe Kazemzadeh. Precis of dissertation proposal: Natural language descriptions of emotions. In Proceedings of ACII (Doctoral Consortium), 2011.

[49] Abe Kazemzadeh, Panayiotis G. Georgiou, Sungbok Lee, and Shrikanth Narayanan. Emotion twenty questions: Toward a crowd-sourced theory of emotions. In Proceedings of ACII 2011, 2011.

[50] Abe Kazemzadeh, James Gibson, Panayiotis Georgiou, Sungbok Lee, and Shrikanth Narayanan. EMO20Q questioner agent. In Proceedings of ACII (Interactive Event), 2011. The interactive demo is available at http://sail.usc.edu/emo20q/questioner/questioner.cgi.

[51] Abe Kazemzadeh, James Gibson, Panayiotis G. Georgiou, Sungbok Lee, and Shrikanth Narayanan. A Socratic epistemology for verbal emotional intelligence. In preparation.

[52] Abe Kazemzadeh, James Gibson, Juanchen Li, Sungbok Lee, Panayiotis G. Georgiou, and Shrikanth Narayanan. A sequential Bayesian agent for computational ethnography. In Proceedings of Interspeech, Portland, OR, October 2012.

[53] Abe Kazemzadeh, Sungbok Lee, Panayiotis G. Georgiou, and Shrikanth Narayanan. Determining what questions to ask, with the help of spectral graph theory. In Proceedings of Interspeech, 2011.

[54] Abe Kazemzadeh, Sungbok Lee, and Shrikanth Narayanan. Acoustic correlates of user response to errors in human-computer dialogues. In ASRU, St. Thomas, U.S. Virgin Islands, 2003.

[55] Abe Kazemzadeh, Sungbok Lee, and Shrikanth Narayanan. Using model trees for evaluating dialog error conditions based on acoustic information. In Proceedings of the 1st ACM International Workshop on Human-Centered Multimedia, pages 109–114, Santa Barbara, California, USA, 2006.

[56] Abe Kazemzadeh, Sungbok Lee, and Shrikanth Narayanan. An interval type-2 fuzzy logic system to translate between emotion-related vocabularies. In Proceedings of Interspeech, Brisbane, Australia, September 2008.

[57] Abe Kazemzadeh, Sungbok Lee, and Shrikanth Narayanan. Fuzzy logic models for the meaning of emotion words. IEEE Computational Intelligence Magazine, 2013.

[58] Abe Kazemzadeh, Hong You, Markus Iseli, Barbara Jones, Xiaodong Cui, Margaret Heritage, Patti Price, Elaine Andersen, Shrikanth Narayanan, and Abeer Alwan. TBALL data collection: the making of a young children's speech corpus. In Proceedings of Eurospeech, Lisbon, Portugal, October 2005.

[59] Samuel Kim, Panayiotis G. Georgiou, Shrikanth S. Narayanan, and Shiva Sundaram. Supervised acoustic topic model for unstructured audio information retrieval. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2010.

[60] Brian King. The Conceptual Structure of Emotional Experience in Chinese. PhD thesis, Ohio State University, 1989.

[61] Peter Klaver, Jürgen Fell, Thomas Dietl, Simone Schür, Carlo Schaller, Christian E. Elger, and Guillén Fernández. Word imageability affects the hippocampus in recognition memory. Hippocampus, 15(6):704–712, June 2005.
Wiley-IEEE Press, 2005. [63] George J. Klir and Tina A. Folger. Fuzzy Sets, Uncertainty and Information.PrenticeHall, 1988. [64] George J. Klir and Bo Yuan. Fuzzy Sets and Fuzzy Logic: Theory and Applications.Prentice Hall, 1995. [65] Amit Konar, Aruna Chakraborty, Anisha Halder, Rajshree Mandal, and Ramadoss Ja- narthanan. Interval type-2 fuzzy model for emotion recognition from facial expression. In Proceedings of Perception and Machine Intelligence (PerMIn),2012. [66] Bart Kosko. Fuzzyness vs. probability. International Journal of General Systems,17:211–240, 1990. 138 [67] Jérôme Kunegis, Andreas Lommatzsch, and Christian Bauckhage. The slashdot zoo: Mining a social network with negative costs. In World Wide Web Conference (WWW 2009),pages 741–750, Madrid, April 2009. [68] George Lakoff and Mark Johnson. Metaphors We Live By. University of Chicago Press, 1980. [69] TheodorLandis. Emotionalwords: What’ssodifferentfromjustwords. Cortex,42(6):823–830, August 2006. [70] Chul Min Lee and Shrikanth Narayanan. Emotion recognition using a data-driven inference system. In Proceedings of Eurospeech (Geneva),2003. [71] Q. Liang and J. Mendel. Interval type-2 fuzzy logic systems: theory and design. IEEE Transaction on Fuzzy Systems,2000. [72] Feilong Liu and Jerry M. Mendel. An interval approach to fuzzistics for interval type-2 fuzzy sets. In Proceedings of Fuzzy Systems Conference (FUZZ-IEEE),2007. [73] Feilong Liu and Jerry M. Mendel. Encoding words into interval type-2 fuzzy sets using an interval approach. IEEE Transaction on Fuzzy Systems,2008. [74] C Lord, S. Risi, L. Lambrecht, E.H. Cook, B.L. Leventha, P.C. DiLavore, A. Pickles, and M. Rutter. The autism diagnostic observation schedule-generic: A standard measure of social and communication deficits associated with the spectrum of autism. J Autism Dev Disord., 30:205–223, 2000. [75] Stacy Marsella, Jonathan Gratch, and Paolo Petta. Computational models of emotion. In K.R. Scherer, T. Bänziger, and E. Roesch, editors, A blueprint for an affectively competent agent: Cross-fertilization between Emotion Psychology, Affective Neuroscience, and Affective Computing., chapter 1. Oxford University Press, 2010. In press. [76] Dominic W. Massaro and Michael M. Cohen. Fuzzy logical model of bimodal emotion percep- tion: Commment of the perseption of emotions by ear and by eye by de gelder and vroomen. Cognition and Emoition,14:313–320,2002. [77] Herman Melville. Typee: A Peep at Polynesian Life During a Four Month’s Residence in a Valley of the Marquesas.WileyandPutnam,1846. [78] Jerry M. Mendel. Computing with words, when words can mean different things to different people. In Proceedings of ICSC Symposium on Fuzzy Logic and Appications,1999. [79] Jerry M. Mendel. Uncertain Rule-Based Fuzzy Logic Systems: Introduction and New Direc- tions. Prentice Hall PTR, 2001. [80] Jerry M. Mendel. Computing with words: Zadeh, turing, popper and occam. IEEE Compu- tational Intelligence Magazine,November:10–17,2007. 139 [81] Jerry M. Mendel, Robert I. John, and Feilong Liu. Interval type-2 fuzzy logic systems made simple. IEEE Transactions of Fuzzy Systems,14(6):808–821,2006. [82] Jerry M. Mendel, Robert I. John, and Feilong Liu. Computing with words and its relations with fuzzistics. Information Sciences,177:988–1006,2007. [83] JerryM.MendelandDongruiWu. Perceptual Computing: Aiding People in Making Subjective Judgements. IEEE Press and John Wiley and Sons, Inc., 2010. [84] Jerry M. Mendel and Dongrui Wu. 
Challenges for perceptual computer applications and how they were overcome. IEEE Computational Intelligence Magazine,7(3):36–47,August2012. [85] Jerry M. Mendel and Hongwei Wu. Type-2 fuzzistics for symmetric interval type-2 fuzzy sets: Part 1, forward problems. IEEE Trans,14(6):781–792,December2006. [86] Jerry M. Mendel and Hongwei Wu. Type-2 fuzzistics for symmetric interval type-2 fuzzy sets: Part 2, inverse problems. IEEE Transaction on Fuzzy Systems,14:301–308,April2007. [87] Gilad Mishne. Applied Text Analytics for Blogs.PhDthesis,UniversityofAmsterdam,2007. [88] Emily Mower, Angeliki Metallinou, Chi-Chun Lee, Abe Kazemzadeh, Carlos Busso, Sung- bok Lee, and Shrikanth Narayanan. Interpreting ambiguous emotional expressions. In ACII Special Session: Recognition of Non-Prototypical Emotion from Speech- The Final Frontier?, Amsterdam, Netherlands, 2009. [89] CharlesE.Osgood,GeorgieJ.Suci,andPercyH.Tannenbaum. The Measurement of Meaning. University of Illinois Press, 1957. [90] P.Y.Oudeyer. Theproductionandrecognitionofemotionsinspeech: featuresandalgorithms. J. Hum. Comput. Stud.,59:157–183,2003. [91] Terence Parr. Language Implementation Patterns. The Pragmatic Bookshelf, 2010. [92] Ivan Petrovich Pavlov. Conditioned Reflexes.DoverPublications,2003. [93] Charles Sanders Peirce. Some consequences of four incapacities. Journal of Speculative Phi- losophy,2:140–157,1868. [94] Elizabeth A. Phelps. Human emotion and memory: Interactions of the amygdala and hip- pocampalcomplex. Cognitive neuroscience: Special Issue on Current Opinion in Neurobiology, 14:198–202, 2004. [95] Willard Van Orman Quine. Word and Object (Studies in Communication). MIT Press, kindle edition edition, 1964. [96] Mohammad Reza Rajati, Hamid Khaloozadeh, and Witold Pedrycz. Fuzzy logic and self- referential reasoning: A comparative study with some new concepts. Artificial Intelligence Review,March2012. 140 [97] Ashwin Ram and David B. Leake. Goal-driven learning. MIT Press, 1995. [98] JohnT.E.Richardson. Theeffectofwordimageabilityinacquireddyslexia. Neuropsychologia, 13(3):281–288, September 1975. [99] Edmond T. Rolls. What Are Emotions, Why Do We Have Emotions, and What Is Their Computational Basis in the Brain,chapter5,pages117–146. OxfordUniversityPress,2005. [100] Bertrand Russell. On denoting. Mind,14:479–493,1905. [101] James A. Russell. A circomplex model of affect. Journal of Personality and Social Psychology, 39:1161–1178, 1980. [102] James A. Russell and Albert Mehrabian. Evidence for a three-factor theory of emotions. Journal of Research in Personality,11(3):273–294,September1977. [103] PhillipR.Shaver,UpekkhaMurdaya,andR.ChrisFraley. StructureoftheIndonesianemotion lexicon. Asian Journal of Social Psychology,4:201–224,2001. [104] J. Shin, S. Narayanan, L. Gerber, A. Kazemzadeh, and D. Byrd. Analysis of user behavior under error conditions in spoken dialogues. In ICSLP,Denver,2002. [105] Push Singh, Thomas Lin, Erik T. Mueller, Grace Lim, Travell Perkins, and Wan Li Zhu. Open mind common sense: Knowledge acquisition from the general public. Lecture Notes in Computer Science: On the Move to Meaningful Internet Systems,2519:1223–1237,2002. [106] Burrhus Frederic "B. F." Skinner. Science and Human Behavior.FreePress,1965. [107] Stanley Smith Stevens. On the theory of scales of measurement. Science,103(2684):677–680, Jun 1946. [108] Shiva Sundaram and Shrikanth S. Narayanan. Classification of sound clips by two schemes: using onomatopoeia and semantic labels. 
In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME),pages1341–1344,2008. [109] David Traum. 20 questions on dialogue act taxonomies. Journal of Semantics,17(1):7–30, 2000. [110] I. B. Türkşen. Computing wiht descriptive and verisitc words. In International Conference of the North American Fuzzy Information Processing Society (NAFIPS),1999. [111] Amos Tversky and Derek J. Koehler. Support theory: A nonextentional representation of subjective probability. Psychological Review,101:547–567,1994. [112] Luis von Ahn and Laura Dabbish. Labeling images with a computer game. In Proceedings of the SIGCHI conference on Human factors in computing systems,2004. 141 [113] LuisvonAhn,MihirKedia,andManuelBlum. Verbosity: Agameforcollectingcommon-sense knowledge. In Proceedings of the SIGCHI conference on Human Factors in computing systems (CHI 2006),2006. [114] I. Walker and C. Hulme. Concrete words are easier to recall than abstract words: Evidence for a semantic contribution to short-term serial recall. Journal of Experimental Psychology: Learning, Memory, and Cognition,25(5):1256–1271,1999. [115] Ronald E. Walpole and Raymond H. Myers. Probability and Statistics for Engineers and Scientists.PrenticeHall,8edition,2007. [116] Penny Wark. I wanted to say "i’ve had a stroke" because of the aphasia i couldn’t, April 2008. [117] Cynthia M. Whissell. The Dictionary of Affect in Language,pages113–131.AcademicPress, 1989. [118] Jason D. Williams, Pascal Poupart, and Steve Young. Partially observable markov decision processes with continuous observations for dialogue management. In Proceedings of SIGDIAL Workshop on Discourse and Dialogue,Lisbon,Portugal,October2005. [119] Ludwig Wittgenstein. Philosophical Investigations.BasilBlackwell,1958. [120] Dongrui Wu and Jerry Mendel. A vector similarity measure for linguistic approximation: Interval type-2 and type-1 fuzzy sets. Information Sciences,178:381–402,2008. [121] Dongrui Wu and Jerry M. Mendel. The linguistic weighted average. In FUZZ-IEEE,pages 566–573, Vancouver, BC, 2006. [122] Dongrui Wu, Thomas D. Parsons, Emily Mower, and Shrikanth Narayanan. Speech parameter estimation in 3d space. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME),2010. [123] Lotfi A. Zadeh. The concept of a linguistic variable and its application to approximate reasoning-1. Information Sciences,8:199–249,1975. [124] Lotfi A. Zadeh. Fuzzy logic = computing with words. IEEE Transaction on Fuzzy Systems, 4(2):103–111, May 1996. [125] Lotfi A. Zadeh. From search engines to question answering systems – the problems of world knowledge, relevance, deduction and precisiation. In Elie Sanchez, editor, Fuzzy Logic and the Semantic Web,chapter9,pages163–211.Elsevier,2006. [126] Ning Zhong, Jiming Liu, Y.Y. Yao, and S. Ohsuga. Web intelligence. In Computer Software and Applications Conference,2000. 
Index

A: abduction; abstract scales; ad populem logical fallacy; agent, EMO20Q; atoms
B: Bayesian prior; behavioral coding; belief; brain, lateralization
C: Carnap, Rudolf; Chomsky hierarchy; cognition, subcomponents; common sense psychology; commonsense knowledge; connectivity, graphs; context-free languages; context-sensitive languages; crowd-sourcing; culture-specific models of emotion
D: data, EMO20Q; definite descriptions; descriptions of emotions; dialog; dialog agent; division of labor, linguistic
E: EMO20Q; EMO20Q rules; Emotion Twenty Questions (EMO20Q); emotion vs. question/answer matrix; emotions, naturalistic; emotions, non-prototypical; empathy; episodic buffer; epistemology; experimental methodology; experimental methodology (EMO20Q); Extensible Messaging and Presence Protocol (XMPP); extension
F: finite state automata; folk ontology; formal concept analysis; fuzzy logic
G: games; games with a purpose; generalized pushdown transducer (GPDT); Gorgias; graph, question-object
H: hypothesis
I: implementation; inference; infinite cardinals; intension; interpretation; intersubjectivity; Interval Approach surveys
K: knowledge; knowledge goal
L: language acquisition; language input and output; language learning; Laplacian, graph; learning; Likert scales; linear bounded automata; linguistic community; long-term memory
M: mathematical logic; meaning; meaning, emotion words; meaning, general; meaning, socially constructed; membership function; Mendel's maxim
N: natural experiment; natural kinds; natural language description of emotion; new media
O: ontology
P: Peirce, Charles Sanders; perceptual subjectivity; positive reinforcement; power series, of graph adjacency matrix; presence (instant messaging); pushdown automata
R: recursively enumerable languages; regular languages; Russell, Bertrand
S: science; sequential Bayesian update; short-term memory; signal descriptions; signed matrix/graph; simulation; social signals; Socrates; Socratic epistemology; spectral graph theory; stack, generalized pushdown transducer; subjectivity, vs. intersubjectivity; symbolic communication
T: task goal; theoretical entities; theory; theory, person independent; theory, person specific; thesis statement; Turing machines; Typee, Herman Melville
U: uncertainty; unconnected knowledge
W: We Feel Fine; working memory
X: XMPP
Abstract
This dissertation studies how people describe emotions with language and how computers can simulate this descriptive behavior. Although many non-human animals can express their current emotions as social signals, only humans can communicate about emotions symbolically. This symbolic communication of emotion allows us to talk about emotions that we may not currently be feeling, for example describing emotions that occurred in the past, gossiping about the emotions of others, and reasoning about emotions hypothetically. Another feature of this descriptive behavior is that we talk about emotions as if they were discrete entities, even though we may not always have necessary and sufficient observational cues to distinguish one emotion from another, or even to say what is and is not an emotion. This motivates us to focus on aspects of meaning that are learned primarily through language interaction rather than by observations through the senses. To capture these intuitions about how people describe emotions, we propose the following thesis: natural language descriptions of emotion are definite descriptions that refer to intersubjective theoretical entities.

We support our thesis using theoretical, experimental, and computational results. The theoretical arguments use Russell's notion of definite descriptions, Carnap's notion of theoretical entities, the question-asking period in child language acquisition, and Socratic epistemology. The experimental data we collected include dialogs between humans and computers and web-based surveys, both using crowd-sourcing on Amazon Mechanical Turk. The computational models include a dialog agent based on sequential Bayesian belief update within a generalized pushdown transducer, as well as a fuzzy logic model of similarity and subsethood between emotion terms.

For future work, we propose a research agenda that includes a continuation of work on the emotion domain as well as new work on other domains where subjective descriptions are established through natural language communication.
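To make the first computational model concrete, the following is a minimal sketch of sequential Bayesian belief update over emotion words, in the spirit of the EMO20Q questioner agent. The emotion vocabulary, the questions, and the answer likelihoods are hypothetical toy values chosen for illustration, not the dissertation's actual data or implementation.

    # A minimal sketch (assumed, for illustration) of sequential Bayesian
    # belief update over emotion words, as in an EMO20Q-style questioner.
    # The vocabulary, questions, and likelihoods are hypothetical toy values.

    EMOTIONS = ["happiness", "sadness", "anger", "fear"]

    # P(answer = "yes" | question, emotion), as might be estimated from dialogs.
    LIKELIHOOD = {
        "is it a positive emotion?": {
            "happiness": 0.95, "sadness": 0.05, "anger": 0.10, "fear": 0.05},
        "is it a high-arousal emotion?": {
            "happiness": 0.60, "sadness": 0.20, "anger": 0.90, "fear": 0.85},
    }

    def update_belief(belief, question, answer):
        """One Bayesian update: P(e | q, a) is proportional to P(a | q, e) P(e)."""
        posterior = {}
        for emotion, prior in belief.items():
            p_yes = LIKELIHOOD[question][emotion]
            posterior[emotion] = (p_yes if answer == "yes" else 1.0 - p_yes) * prior
        total = sum(posterior.values())
        return {e: p / total for e, p in posterior.items()}

    # Start from a uniform prior and play two turns of the question-asking game.
    belief = {e: 1.0 / len(EMOTIONS) for e in EMOTIONS}
    belief = update_belief(belief, "is it a positive emotion?", "no")
    belief = update_belief(belief, "is it a high-arousal emotion?", "yes")
    print(max(belief, key=belief.get))  # the agent's current best guess

Likewise, a minimal sketch of the second model, using Kosko's subsethood measure and Jaccard similarity between emotion words represented as fuzzy sets. The dissertation works with interval type-2 fuzzy sets, so this type-1 simplification, the grid of points, and the membership values are illustrative assumptions only.

    # A minimal sketch (assumed) of subsethood and similarity between two
    # emotion words modeled as type-1 fuzzy sets over a small grid of points
    # in an emotion space; the membership values are invented for illustration.

    def subsethood(a, b):
        """Kosko's subsethood of a in b: sum of min memberships over sum of a's."""
        return sum(min(x, y) for x, y in zip(a, b)) / sum(a)

    def similarity(a, b):
        """Jaccard similarity: sum of min memberships over sum of max memberships."""
        num = sum(min(x, y) for x, y in zip(a, b))
        den = sum(max(x, y) for x, y in zip(a, b))
        return num / den

    annoyed = [0.2, 0.7, 0.9, 0.4]  # membership of each grid point in "annoyed"
    angry = [0.1, 0.5, 1.0, 0.8]    # membership of each grid point in "angry"
    print(subsethood(annoyed, angry))  # ~0.86: "annoyed" largely a subset of "angry"
    print(similarity(annoyed, angry))  # ~0.70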