Managing Multi-Party Social Dynamics for Socially Assistive Robotics

by

Elaine Schaertl Short

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY (COMPUTER SCIENCE)

August 2017

Copyright 2017 Elaine Schaertl Short

Dedication

For my grandparents, Jean and Howard Farley and Arden and Gretchen Short, who taught me to value my education.

Acknowledgements

First and above all, I would like to thank my advisor, Maja Matarić. For nearly seven years, she has met with me every other week, read every paper, application, and politically tricky email I have written, taken both me and my ideas seriously, and been my biggest champion both within the university and in the academic community at large. The decision to come to USC to work with Maja is one of the best I have ever made.

I would also like to thank the members of my dissertation committee for their advice and mentorship. Gisele Ragusa, David Traum, and Gaurav Sukhatme all went far beyond the call of duty in meeting with me, listening to my ideas, and asking the types of questions that turn a half-baked notion into a proper research topic. The best of what I have done is thanks to their insight, while any errors are in spite of their efforts.

My colleagues in the Interaction Lab and Department of Computer Science have improved the day-to-day experience of graduate school in every way. David Feil-Seifer has provided invaluable advice from the day I first arrived in the lab as an undergraduate research assistant, and was the one to teach me how to say "no". I confess that while "bacon weave" makes a better story, the real reason I joined the lab was because Dave was a model of the kind of academic that I wanted to become: committed to excellence in research without sacrificing humor or the willingness to serve as a mentor.
I would also particularly like to thank Katelyn Swift-Spong for her patience and insights throughout our many collaborations, and Erica Greene for her friendship and commitment to improving the standing of women in Computer Science. Aaron St. Clair, Ross Mead, Amin Atrash, Juan Fasola, Caitlyn Clabaugh, Elizabeth Cha, Jillian Greczek, David Becerra, Eric Deng, and Samantha Chen have all made my time in the lab richer, through our collaborations, discussions of research, and social outings. I would also like to acknowledge the contributions of the many research assistants who have worked with me, especially Rhianna Lee, who has been my right hand when it comes to studies and data analysis for many years.

The path to an academic career was made smoother by many teachers, supervisors, and mentors. Greg Somers nurtured my love of math and helped launch a cohort of students from the State College Area High School into the academic life. Dana Angluin's classes convinced me to switch my major to computer science, while Brian Scassellati introduced me to the world of human-robot interaction and gave me my start in academic research. My thanks also go to the other members of the Yale Social Robotics Lab, who introduced me to the academic community and provided advice on research and life even after I moved to USC.

All of my friends are smarter than I am, which made the transition to graduate school much easier than it might otherwise have been. Ylaine Gerardin, Therese Jones, Malika Krishna, Michelle Vu, and Mengchao Wu all provided tireless emotional support and advice on the day-to-day challenges of academic life. Rahul Krishna, Galen Lynch, Cristiana Baloescu, Goran Micevic, Alex Turin, Susan Johns, Jennifer Urgilez, Christian Csar, Lauren Cannon, Peter and Laura Kemper, and Sophie Powell have all contributed to my thriving in various ways, from commiseration and advice to shared dinners and moral support.
I love you all, and I look forward to many years of friendship to come.

My parents, Pam and Toby Short, were my first teachers: from the days of "Mr. Floater's Elevator" and playing "What's wrong with the research methodology?" with popular science articles in the newspaper, to providing moral support and advice from the academic advisor's point of view, they have been with me through every step of my academic career so far. Dale Short, in addition to helping design the robot used in much of this work, has admirably fulfilled his duties as a younger brother by helping me learn how to defend my ideas and never letting me take myself too seriously. My husband, George Short Schaertl, is one of the few people in the world I could have romanced by sending copies of my academic papers. I am fortunate beyond words to have found him, and his unflagging support has made this dissertation possible.

i carry your heart with me(i carry it in my heart)i am never without it(anywhere i go you go,my dear;and whatever is done by only me is your doing,my darling)
- e. e. cummings

Table of Contents

Dedication
Acknowledgements
List of Figures
List of Tables
List of Algorithms
Abstract

Chapter 1: Introduction
  1.1 Overview
    1.1.1 Modeling Moderation
    1.1.2 Understanding Users in Assistive Domains
    1.1.3 Robot as Moderator
  1.2 Contributions
  1.3 Outline

Chapter 2: Background and Related Work
  2.1 Computer-Supported Collaboration and Learning
    2.1.1 Computer Supported Collaborative Work
    2.1.2 Understanding Multi-Party Interaction
    2.1.3 Agents in Multi-Party Interactions
  2.2 Multi-Party Human-Robot Interaction
    2.2.1 Human-Robot Teams
    2.2.2 Robots in Social Group Interactions
  2.3 Application Domains
    2.3.1 Nutrition Education
    2.3.2 Older Adults and Aging-in-Place
    2.3.3 Autism
  2.4 Socially Assistive Robotics
    2.4.1 Socially Assistive Robotics for Education
    2.4.2 Socially Assistive Robotics for Older Adults
    2.4.3 Socially Assistive Robotics for Children with Autism
  2.5 Summary

Chapter 3: Robot and Behavior Controller
  3.1 Design Goals
    3.1.1 Social Considerations
    3.1.2 Engineering Considerations
  3.2 Robot Design
    3.2.1 Hardware
    3.2.2 Software
  3.3 Discussion
    3.3.1 Expressive Movement and Affective Communication
    3.3.2 Size and Appearance
    3.3.3 Safety
    3.3.4 Cost and Performance
    3.3.5 Software
  3.4 Summary

Chapter 4: Understanding Users
  4.1 Socially Assistive Robot for Teaching Nutrition
    4.1.1 Methodology
    4.1.2 Results
    4.1.3 Discussion
  4.2 Socially Assistive Robot for Children with Autism
    4.2.1 Methodology
    4.2.2 Results
    4.2.3 Discussion
  4.3 Understanding Inter-Generational Interactions
    4.3.1 Methodology
    4.3.2 Results
    4.3.3 Discussion
  4.4 Understanding Group Social Interactions
    4.4.1 Methodology
    4.4.2 Results
    4.4.3 Discussion
  4.5 Summary

Chapter 5: Moderation Model and Core Algorithm
  5.1 Modeling Moderation
  5.2 Moderation by Timed Question-Asking
    5.2.1 General Moderation Algorithm
    5.2.2 Discussion Task Moderation Algorithm
  5.3 Validation of the Core Moderation Algorithm
    5.3.1 Methodology
    5.3.2 Results
    5.3.3 Discussion
  5.4 Summary

Chapter 6: Task Goal Moderation
  6.1 Moderation by Task Assistance
    6.1.1 Task Goal Moderation Algorithm
    6.1.2 Collaborative Task Moderation Algorithm
  6.2 Validation of Task Goal Moderation
    6.2.1 Methodology
    6.2.2 Results
    6.2.3 Discussion
  6.3 Summary

Chapter 7: Social Graph Moderation
  7.1 Moderation by Social Intervention
    7.1.1 Social Feature Moderation Algorithm
    7.1.2 Collaborative Interaction Moderation Algorithm
  7.2 Validation of Social Feature Moderation
    7.2.1 Methodology
    7.2.2 Results
    7.2.3 Discussion
  7.3 Summary

Chapter 8: Combined Task and Social Moderation
  8.1 Moderation for Social and Task Performance
    8.1.1 Combined Task Goal and Social Feature Moderation Algorithm
    8.1.2 Cooperative Learning Task Moderation Algorithm
  8.2 Validation of Task Goal and Social Feature Moderation
    8.2.1 Methodology
    8.2.2 Algorithm Validation on Synthetic Data
    8.2.3 Results
    8.2.4 Pilot Validation with Families
    8.2.5 Discussion
  8.3 Summary

Chapter 9: Summary

Bibliography

List of Figures

1.1 Overall approach used in the dissertation.
3.1 Custom SPRITE skins used in research.
3.2 Internal hardware of the SPRITE, with added "neck".
3.3 Schematic of the internal SPRITE design.
3.4 SPRITE robot movement.
3.5 SPRITE in toddler clothing for customized appearance.
4.1 Diagram of the intervention setup for the nutrition education study.
4.2 Intervention area setup for the nutrition education study.
4.3 Child evaluation of the robot in five categories after the nutrition education study.
4.4 Child response times to robot conversational queries and food selection prompts in the nutrition education study.
4.5 Response categories over time in child-robot interaction in the nutrition education study.
4.6 The three robot embodiments used in the study of children with autism interacting with a socially assistive robot. Left: mobile humanoid robot. Center: mobile box robot. Right: non-mobile toy (control).
4.7 Outcomes by individual and condition in the child-robot interaction study with children with autism.
4.8 Correlations between robot behavior and child speech in interactions with children with autism.
4.9 Child vocalization, button-pressing behaviors, and head orientation, separated into object-like and agent-like interaction groups in the child-robot interaction study with children with autism.
4.10 Parent vocalization and child interaction with bubbles, separated into object-like and agent-like interaction groups in the child-robot interaction study with children with autism.
4.11 Robot appearance in the inter-generational interaction study.
4.12 The study setup for the inter-generational interaction study.
4.13 Tablet interface for interactive games used in the inter-generational interaction study.
4.14 Two inter-generational groups interacting with the robot.
4.15 Proportion of utterances referring to the robot by personal pronouns ("she", "he") vs. neutral pronouns ("it") in the inter-generational interaction study.
4.16 Proportion of utterances directed to the robot versus other in the inter-generational interaction study.
4.17 Words per minute in the inter-generational interaction study.
4.18 Proportion of utterances in which the robot is treated as having affect by participants in the inter-generational interaction study.
4.19 Proportion of utterances in which the robot is treated as having agency by participants in the inter-generational interaction study.
4.20 Time spent speaking while holding the toy and while not holding the toy in the UTEP-ICT dataset.
4.21 Proportion of time with [n] speakers in 100ms intervals in the UTEP-ICT dataset.
5.1 The model of the moderation process.
5.2 Robot hardware used in validation of the basic moderation algorithm.
5.3 The multi-party interaction experiment setup.
5.4 Participant ratings of group cohesion (4-point scale); dashed lines indicate participants who were not asked questions by the moderator robot.
5.5 Participant speech (seconds); dashed lines indicate participants who were not asked questions by the moderator robot.
6.1 Screenshot of one user's game screen in the collaborative game.
6.2 The robot and experimental area setup for the validation study for task goal moderation.
6.3 Score and group cohesion in the task goal moderation validation study. ('*' p < 0.05)
6.4 Average standard deviation of head pitch and yaw for participants in the task goal moderation validation study. ('*' p < 0.05, '.' p = 0.06)
6.5 Changes in participant behavior as a result of changes in the number of times participants were addressed by the robot in the task goal moderation validation study.
6.6 Number of the robot's suggestions taken by the group in the task goal moderation study, out of 18. ('.' p = 0.067)
7.1 Experimental room setup for the social feature moderation validation study.
7.2 Screenshot of one user's game screen in the social feature moderation validation study.
7.3 Standard deviation of users' helpful behavior across the collaboration study conditions. Dotted lines represent family groups.
7.4 Users' helpful behavior across the collaboration study conditions. Dotted lines represent family groups.
7.5 Mean of groups' self-reported group cohesion across the collaboration study conditions. Dotted lines represent family groups.
8.1 Screenshot of the game screen for number concepts games used in the combined moderation validation study.
8.2 Experimental room setup for the combined moderation validation study.
8.3 Modeled skill and output difficulty for random-skill user.
8.4 Modeled skill and output difficulty for low-skill user.
8.5 Modeled skill and output difficulty for high-skill user.
8.6 Modeled skill and output difficulty for high-skill user.
8.7 Modeled skill and output difficulty for mixed-skill user.

List of Tables

4.1 Agreement values from data coding of features in the child-robot interaction study with children with autism.
4.2 Means and standard deviations of number of button-presses per minute by children with autism interacting with the robot.
4.3 Ages of the children with autism who interacted with the robot and a description of the sessions of the six participants included in this study (A, B, F, G, I, J). Of the ten overall participants, there were four sets of siblings (A-G, D-C, F-J, I-H). *: Robot malfunction that ended the session prematurely.
4.4 The sequence of activities in the inter-generational study session.
4.5 The coding of the experiment data in the inter-generational interaction study.
4.6 Number of utterances per group in the inter-generational interaction study. († Group only played the game for 5 minutes)
4.7 Age and gender composition of groups in the inter-generational interaction study.
5.1 Kappa scores for the voice activity detection algorithm. (* No audio due to microphone malfunction)
6.1 Robot-participant requests in the collaborative game.
6.2 The two orderings of conditions in the validation study for task goal moderation.
7.1 Robot suggestions for helpful behavior in the social feature moderation validation study.
7.2 The two orderings of conditions in the social feature moderation validation study.
8.1 Graded-cueing statements to encourage collaboration in the combined moderation validation study.
8.2 Robot speech for choosing the participant to complete the next exercise in the combined moderation validation study.

List of Algorithms

5.1 Basic Social Moderation Algorithm
5.2 Instantiated Moderation Algorithm for Storytelling Task
6.1 Goal-Oriented Moderation Algorithm
6.2 Instantiated Goal-Based Moderation Algorithm for Collaborative Tasks
6.3 Collaborative Game Task Request Generation Algorithm
7.1 Pairwise Social Feature Moderation Algorithm
7.2 Instantiated Social Feature Moderation Algorithm for Collaborative Tasks
7.3 Collaborative Game Assistance Request Generation Algorithm
8.1 Combined Task and Social Moderation Algorithm
8.2 Instantiated Social and Task Moderation Algorithm for Turn-Based Learning Games
8.3 Turn-Taking Collaborative Task Behavior Generation Algorithm

Abstract

This dissertation presents a domain-independent computational model of moderation of multi-party human-machine interactions that enables a robot or virtual agent to act as a moderator in a group interaction. A moderator is defined in this work as an agent that regulates social and task outcomes in a goal-oriented social interaction. This model has multiple applications in human-machine interaction: groups of people often require some management or facilitation to ensure smooth and productive interaction, especially when the context is emotionally fraught or the participants do not know each other well. A particularly relevant application domain for moderation is Socially Assistive Robotics (SAR), where group interactions can benefit from a moderator's participation. The evaluation of the model focuses on intergenerational interactions, but the model is applicable to various other SAR domains as well, including group therapy, informal teaching between peers, and social skills therapy.

Moderation is formalized as a decision-making problem, where measures of task performance and positive social interaction in a group are maximized through the behavior of a social moderator. This framework provides a basis for the development of a series of control algorithms for robot moderators to assist groups of people in improving task performance and managing the social dynamics of interactions in diverse domains.
Based on reliably-sensed features of the interaction such as task state and voice activity, the moderator takes social actions that can predictably alter task performance and the social dynamics of the interaction. Thus the moderator is able to support human-human interaction in unpredictable, open-ended, real-world contexts.

The model is evaluated in inter-generational applications, where the moderator supports interactions including members of multiple generations within the same family. In interactions with older adults, the moderator can support positive family interactions that lead to strong social support networks and, ultimately, better outcomes for health and quality of life. In interactions with families and siblings of children with autism, the moderator can support socially appropriate interactions, and the social integration and learning that may be as important as more traditional cognitive milestones. Simpler algorithms are validated in-lab with a convenience population, while algorithms that more fully integrate task and social goals are evaluated in inter-generational interactions with older adults and in interactions between children with autism and their families.

The model of moderation provides a framework for developing algorithms that enable robots to moderate group interactions without the need for speech recognition; it complements dialogue systems and human-computer interaction, providing conversational agents with additional strategies for managing the dynamics of group interaction. The work is intended for short- and long-term deployments of socially assistive robots and virtual agents, and can be applied across assistive domains to facilitate social interactions and improve task performance.

Chapter 1: Introduction

This chapter provides an overview of the approach to modeling moderation and motivates the use of this approach to moderation with embodied agents.
The chapter concludes with an outline of the rest of the dissertation and a list of primary and secondary contributions of this work.

1.1 Overview

In this work, moderation is defined as the process of controlling or directing a group interaction. Formally, it is the process by which a goal-directed multi-party interaction is regulated via the social behavior of the moderator, an agent whose primary role is to engage in behaviors that moderate the interaction. This dissertation presents a domain-independent computational model of moderation of multi-party human-machine interactions that enables a robot or virtual agent to act as a moderator in a group interaction. Although this work is broadly applicable to embodied social agents, the evaluation is conducted with a robot as the moderator, leveraging results in human-robot interaction (HRI) suggesting that physically embodied agents are more effective for manipulating human behavior, as well as domain-specific research indicating that physically embodied interactions have benefits for task performance (see Chapter 2 for relevant literature). Because moderation is an inherently goal-directed process, the model is evaluated in the domain of socially assistive robotics (SAR). SAR is an area of research that enables social (or sociable) robots (Breazeal, 2003) to assist users in achieving goals through hands-off social interaction (Feil-Seifer and Matarić, 2005), and is often studied in applications in health, wellness, and education. This area of research provides rich opportunities for goal-directed multi-party interactions, with a variety of ways in which the interaction might be moderated. The evaluation of the model is conducted in intergenerational groups, an area where a socially assistive robot can benefit human-human interaction, leveraging the richness of human relationships and communication while helping family groups to have more productive and positive interactions.
This work presents an iterative approach to developing the model of moderation: studies and analyses develop an understanding of the application domains for robot moderators, which in turn inform the development of algorithms based on our model of moderation. These algorithms are then evaluated in multi-party human-robot interactions. This approach is summarized in Figure 1.1. Moderation is formalized as a decision-making problem in which an agent in a multi-party interaction chooses social behaviors that change the interaction state, in order to maximize performance relative to both task goals and social goals. From this formalization, four moderation algorithms are developed and evaluated in group interactions, including peer-group interactions, inter-generational family interactions, and family interactions with children with autism.

Figure 1.1: Overall approach used in the dissertation.

1.1.1 Modeling Moderation

This work models moderation as a decision-making problem, where the moderator is attempting to maximize performance relative to two types of goals in the interaction: social goals and task goals. The social goals are task-independent but not domain-independent; they represent some property of the social interaction that the moderator is trying to maintain or increase. The task goals, on the other hand, are task- and domain-dependent, and represent the purpose of the interaction from the perspective of the participants, whether scoring points in a game or achieving high performance on an academic task. While these goals are not necessarily opposed to each other, it is often the case that the moderator's attempts to support social goals will inhibit the achievement of some task goals, for example if the robot interrupts participants' own coordinating communication.
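The two-goal decision problem described above can be sketched in code. The following is a hypothetical illustration only: the state fields, candidate actions, effect values, and weights are invented for the example and are not taken from the dissertation's implementation.

```python
# Hypothetical sketch of moderation as a decision-making problem: the
# moderator scores each candidate social action by its predicted effect
# on both task goals and social goals, then takes the best-scoring one.
from dataclasses import dataclass

@dataclass
class InteractionState:
    task_progress: float   # e.g. fraction of game goals achieved (0..1)
    social_balance: float  # e.g. evenness of speaking time (0..1)

def predict_effect(state: InteractionState, action: str):
    """Toy predictor: (task delta, social delta) an action is expected to yield."""
    effects = {
        "ask_quiet_participant": (0.0, 0.15),    # draws out a quiet member
        "suggest_task_step":     (0.10, -0.05),  # helps the task, may interrupt
        "do_nothing":            (0.0, 0.0),
    }
    return effects[action]

def choose_action(state: InteractionState, w_task=0.5, w_social=0.5):
    """Pick the action maximizing the weighted sum of predicted goal improvements."""
    actions = ["ask_quiet_participant", "suggest_task_step", "do_nothing"]
    return max(
        actions,
        key=lambda a: w_task * predict_effect(state, a)[0]
                      + w_social * predict_effect(state, a)[1],
    )

state = InteractionState(task_progress=0.4, social_balance=0.3)
print(choose_action(state))                            # -> ask_quiet_participant
print(choose_action(state, w_task=0.9, w_social=0.1))  # -> suggest_task_step
```

Note how shifting the weights reproduces the tension described above: a task-heavy weighting selects the interrupting task suggestion, while a balanced weighting favors the social intervention.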
3 The decision-making for moderation takes place at a high level in the interaction: the model does not directly address the dynamics of appropriate participation in a multi-party interaction such as turn-taking or gaze behavior, but rather integrates with other models of these behaviors by providing insight into what moderation behaviors should be used. In most of the algorithms in this dissertation work, the timing of the robot's moderation behavior is set to xed intervals, but the moderation algorithms could integrate with a lower-level model of turn-taking in order to wait for an appropriate opportunity to take the turn. Because of this exibility of the moderation model to be used with low-level social behavior controllers, the formalization is independent of the application domain and type of group. This work, however, studies on the core case where the robot serves only as the moderator, in groups with at least three human participants, so that the group interaction is maintained even in the absence of a robot. The model of moderation is used to develop a series of algorithms for controlling a robot moderator. The algorithms are as follows: 1. The core moderation algorithm, in which the robot monitors some feature of an interaction and takes actions at regular intervals. 2. A task goal based controller, in which the robot evaluates task goals and takes actions to support the optimal goal. 3. A social graph based controller, in which the robot models pairwise social features and takes actions to support the optimal edge in the social graph. 4. A task goal and social graph based controller, in which the robot chooses an optimal task goal to support, and chooses actions that support that goal. Because both the actions available to the moderator and the functions that evaluate those actions are domain-specic, for each algorithm this dissertation also presents a 4 domain-specic instantiation of that algorithm. 
For example, the task-goal controller is instantiated for collaborative tasks with individual goals, with objective functions that enable the moderator to either equalize or reinforce individual performance at the task. Task-specic behaviors for the algorithms are presented with the validation study that evaluates that algorithm. 1.1.2 Understanding Users in Assistive Domains Socially Assistive Robotics has particular promise for supporting group interactions in assistive contexts. Specically, this work focuses on intergenerational family interac- tions, especially those including children. Both the open-ended and multi-party na- ture of the interactions studied in this work and the inclusion of family groups present unique features that must be understood in order to enable a robot to moderate these interactions. Thus this dissertation includes a series of studies and analyses of human- human and human-robot interactions that inform our work on robot moderators. The moderation algorithms developed in this work are evaluated in peer-to-peer and inter- generational interactions, including participants from young children to older adults. This work towards understanding users informs the domain- and task-specic behaviors for the robot moderators, and includes two novel analyses of existing datasets and two novel studies, which address the following high-level research questions: How do participants react to a robot in the role of moderator? What unique features exist in multi-party social interactions? How do children and families interact with a socially assistive robot? The existing datasets include the UTEP-ICT Multi-Party, Multi-Cultural, Multi- Modal dataset (Herrera et al., 2010), and data from a study of child-robot interaction 5 between children with autism and a socially assistive robot (Feil-Seifer and Matari c, 2005). The UTEP-ICT dataset contains storytelling interactions that inspire the task in which the rst moderation algorithm is applied. 
The study of child-robot interaction between children with autism and a socially assistive robot provides important insights into how children with autism interact with such robots. This information is used in the development of the final moderation algorithm, which is evaluated in family interactions with children with autism and their parents and siblings.

The novel studies presented in this work are a long-term study of child-robot interaction for nutrition education, and a study of intergenerational groups interacting in game-playing scenarios with a socially assistive robot. The nutrition study provides key insight into how child-robot interactions develop over time, as well as insight into how a robot can use games to support cognitive skill development. The intergenerational study is used to understand how older-adult–adult–child triads function in an interaction with a socially assistive robot. This approach is summarized in Figure 1.1.

1.1.3 Robot as Moderator

As discussed above, modeling moderation as a goal-directed process and applying this model on a robot falls into the domain of socially assistive robotics (SAR). Prior work in SAR has primarily focused on one-on-one interactions between robots and users: in many application domains, results from the social sciences suggest that individual attention from an assistive agent will better accomplish a user's goals. There are exceptions to this, however, when fostering social interactions between people is a goal of the system. One example of this is SAR to support family interactions, where children can achieve important social developmental milestones, and strong family bonds have been shown to improve health outcomes in older adults. It is in this domain that the evaluation of the model of moderation is focused, although additional validations with adult users demonstrate the generality of the model.
Four validations are presented in this work: two studies with convenience-population peer groups and two studies with family groups. The convenience-population groups are used to evaluate the first two algorithms. The third moderation algorithm is evaluated with three types of groups: peer groups, intergenerational family groups with one older adult, one younger adult, and one child, and intergenerational family groups with two younger adults and one child. The final algorithm is evaluated with synthetic data, and an ongoing validation is presented with family groups with two siblings and one parent, including pilot sessions with family groups in which one child participant has a diagnosis of autism.

1.2 Contributions

The main contribution of this work is to enable an embodied social agent to act as a moderator in group interactions, especially intergenerational interactions, although this work has wide applications in assistive domains to facilitate learning and enable social interactions. This work contributes to research in computer supported collaborative work and multi-party human-robot interaction with the explicit formalization of the moderator role, as well as providing a novel approach to extending socially assistive robotics into multi-party interactions. Finally, the studies related to this work have resulted in a new understanding of the task of managing group dynamics in socially assistive interactions.

The following are the primary contributions of the dissertation:

1. A computational formalization of moderation wherein an agent optimizes for the achievement of multiple social and task goals in an interaction.

2. Four moderation algorithms based on the computational formalization of moderation that enable a robot to support positive social human-human interaction while taking into account task features.

3.
Four validations of robot-moderated multi-party interactions, including a validation study of moderation with adults, two studies with intergenerational interactions, one with tri-generational groups, and a computational validation and pilot study of parent-child-sibling interactions in groups including one child with autism.

The following are secondary contributions:

1. An open-source, open-hardware, child-friendly tabletop robot used in the studies.

2. An open-source dialogue system for rapidly developing high-level robot dialogue actions, including a browser-based expressive animated face and optional teleoperation interface.

3. Two annotated datasets and two studies for understanding target users. The datasets are from previously-collected video of child-robot and multi-party adult interactions. The new studies are of children in a cognitive nutrition learning task, and intergenerational groups in a game-playing task.

1.3 Outline

The remainder of this document is organized as follows:

Chapter 2 reviews related work in human-robot interaction, robot manipulation, and multi-party interaction modeling, and describes early work with children that motivates the utility of moderated group interactions with robots.

Chapter 3 describes the development of a novel socially assistive robot platform and robot-independent robot control stack for socially assistive applications.

Chapter 4 describes work done to understand target users for the socially assistive robot moderator.

Chapter 5 describes the model of moderation and the four algorithms developed based on the model.

Chapter 6 describes the validation studies of the moderation algorithms.

Chapter 7 summarizes the contributions of the dissertation.

Chapter 2

Background and Related Work

This chapter reviews relevant literature in socially assistive robotics, computer supported collaborative work, and human-robot interaction.
Additionally, this chapter summarizes literature related to the application domains in this work, including nutrition education for children, social skills therapy for children with autism, and social support for older adults. This work is situated in the field of socially assistive robotics (SAR), an area of research that enables robots to help human users achieve health-, wellness-, and education-related goals, focusing on non-physical interaction and leveraging domain-specific knowledge to create autonomous systems capable of rich, goal-directed social interaction. In order to enable multi-party socially assistive robotics, this work brings a SAR perspective to the intersection of research in computer supported collaborative work (CSCW), which studies how technologies function in and support group interactions, with research in human-robot interaction (HRI) on how robots can interact with groups of people. This work also leverages domain-specific information to inform the development of SAR interaction scenarios with ecological validity for outcomes in interactions with diverse groups of people, including peer interactions with young adults, family interactions between older adults, adults, and children, and interactions including children with autism and their parents and siblings. The following sections review domain-specific background for the target users of the robot moderator, and the relevant literature in CSCW, HRI, and SAR.

2.1 Computer-Supported Collaboration and Learning

Researchers in Computer-Supported Collaborative Work (CSCW) and Human-Computer Interaction (HCI) have long been concerned with the problem of facilitating group interactions, especially technology-mediated, workplace-focused interactions such as remote collaborative work. This dissertation leverages results in CSCW about group interactions, and contributes to this area by increasing our understanding of agents in multi-party interactions.
2.1.1 Computer Supported Collaborative Work

In workplace contexts, researchers aim to develop technologies that help to ease the challenges of facilitating remote work, such as the restriction of the facilitator's capabilities and the difficulty of coordinating across multiple sites (Niederman et al., 1993). Adkins et al. (2002) show that in a strategic planning task with members of the United States Air Force, the use of such facilitators improved performance in these remote collaboration tasks. Other researchers have identified the tasks that a group facilitator takes on in these group interactions, including managing social aspects of the interaction such as participation and relationships, and directing task-related elements of the interaction such as managing the agenda and compiling information (Lopez et al., 2002). In work by Lopez et al. (2002), this takes the form of developing tools to support a human facilitator by autonomously tracking social and task features of the interaction, compiling that information, and providing it to the facilitator. However, there are challenges to providing automated meeting support, even when the technology is not expected to engage in real-time social interaction. McGregor and Tang (2017) found that even human annotators were not able to produce useful action items from meeting transcriptions. They observed that the task outcomes of meetings were often non-deterministic relative to the spoken interactions, dependent on information not explicitly stated in the interactions, and may have been overall less important than the social functions of the meetings. Child-robot interaction presents additional challenges: in groups with multiple children, Kim et al. (2015) found that children's speech frequently overlaps, especially during times of conflict.
2.1.2 Understanding Multi-Party Interaction

To enable this work in computer-supported collaboration, researchers have turned their attention to building computational models of group interactions. Researchers such as Oertel et al. (2013), Furukawa et al. (2011), Herrera et al. (2010), and Yamasaki et al. (2012) have collected corpora of adult multi-party interactions, varying in group size, task formality, and the specific features that have been extracted. With children, Lehman (2014) collected a corpus of multi-child interactions with a virtual agent as the facilitator of the interaction, and Kim et al. (2015) collected data on turn-taking between children in a 3-dimensional puzzle task. This increasing body of data has allowed researchers to begin to model the factors affecting multi-party interactions. A significant body of work has focused on turn-taking (Bohus and Horvitz, 2011, de Kok and Heylen, 2009, Edlund et al., 2014, Furukawa et al., 2011, Thórisson et al., 2010) and gaze (Al Moubayed et al., 2013, Novick, 2005) in these interactions, while other research has examined the structure of multi-party interactions (Aoki et al., 2006, Gatica-Perez, 2006, Otsuka et al., 2005). Some work has examined unique features for understanding conversation structure, including respiration (Edlund et al., 2014), laughter (Gilmartin et al., 2013), and third-party gaze (Edlund et al., 2012). Jayagopi and Odobez (2013), Katzenmaier and Stiefelhagen (2004), and van Turnhout et al. (2005) focus on identifying the addressee of an utterance, while Morency (2009) provides insight into recognizing head gestures during multi-party interactions.

2.1.3 Agents in Multi-Party Interactions

A number of researchers have developed embodied social agents that take part in multi-party interactions.
For example, Bohus and Horvitz (2009) examine the problem of engagement in open-world scenarios, that is, detecting when someone is trying to interact with a system within an open environment where there are many things the person might be trying to do. They propose a model of multi-party engagement with four actions (engage, disengage, maintain engagement, and no action). They present a real-world deployment of the system with a screen-displayed virtual human and a multiple-choice trivia game task. In this deployment, they find that the system can initiate and maintain engagement, consistent with our results evaluating robot moderators for multi-party interaction.

Other work in human-computer interaction has developed autonomous facilitators for human-human interactions, but primarily focuses on non-copresent scenarios, including phone calls and video conferences (Kim et al., 2008, Rajan et al., 2012a, Schiavo et al., 2014, Takasaki and Mori, 2009), asynchronous message-board activity (Toriumi et al., 2013), and online programming tutorials (Silver, 2007). Zancanaro et al. (2011) use a co-located interface with children with autism and an adult facilitator. All of these studies, however, focus on intelligent interfaces that are not necessarily social. Rajan et al. (2012b) do provide an audio-only social interface; however, the behavior of the system is limited to playing a few simple phrases in order to make users aware of their own behavior. Our work contributes to this field with the development of an explicitly social moderator for multi-party interactions, as well as through validation studies that provide insight into user behavior with autonomous agent moderators.
2.2 Multi-Party Human-Robot Interaction

Building on this work in CSCW, researchers in Human-Robot Interaction (HRI) have begun to study how a robot can have a role in group interactions, motivated by results suggesting that the physical embodiment of a robot has important effects on learning and interaction. Leyzberg et al. (2012) showed that the physical presence of a robot can increase cognitive learning gains, while with children, Movellan et al. (2009) demonstrated that a social robot could be used to teach young children new words, despite other results suggesting that young children do not learn language from pre-recorded human speech (Kuhl et al., 2003). Similarly, Bainbridge et al. (2008) found that the physical presence of a robot could improve adherence to the robot's instructions, consistent with our results that a robot moderator can effectively change participants' behavior.

This use of embodied agents in group interactions causes the HRI research to diverge from the CSCW research in key respects, particularly in the emphasis on co-location and collaboration between humans and robotic agents. Prior work in group human-robot interaction has been conducted primarily from two perspectives: first, from the perspective of explicit collaborative human-robot teams (e.g., in manufacturing and the military), and second, from the perspective of implicit, loosely interacting human-robot social groups. In the first context, the primary purpose of the robot(s) is to provide unique capabilities to the team, such as a drone providing aerial footage of a scene (Beard et al., 2002). In such work, researchers consider features such as workload, task performance, situational awareness, and preferences about robot autonomy (Gombolay et al., 2017).
In the second context, the robot serves specific social functions of the interaction, such as shaping participant roles (Bohus and Horvitz, 2010), choosing an addressee (Yamazaki et al., 2012), or serving multiple users (Foster et al., 2012). In this second area, researchers consider features such as participant speech, gaze, social relationships, and preferences about robot agency. Because SAR is inherently both social and goal-oriented, our work aims to contribute to both of these approaches by developing algorithms that enable a robot to act as a moderator in multi-party human-robot interactions. In our work, the robot has unique capabilities that it shares with users, as in human-robot teaming, but uses them to manage social aspects of the interaction, as in social human-robot interactions.

2.2.1 Human-Robot Teams

In the context of human-robot teams, researchers are primarily focused on enabling structured teams to achieve explicit goals. These teams may consist of one human and many robots (Adams, 2009), many humans and one robot (Murphy, 2004), or multiple human users and multiple robots (Lewis et al., 2010). The focus of much of this work is on robots-as-tools, in applications too dangerous for a human or where a robot can provide unique capabilities, such as military (Barnes et al., 2011), search-and-rescue (Goodrich et al., 2009), and manufacturing (Gombolay et al., 2015) contexts. This work is highly task-oriented, with key outcomes relating to task performance, task efficiency, or the trade-offs between the two. Based on results suggesting that robot autonomy can improve team performance (Gombolay et al., 2015, Lewis et al., 2010), researchers have worked towards developing autonomous capabilities for robots, especially relating to motion planning and task scheduling. For example, Herlant et al.
(2016) develop an approach to improving the efficiency of a user teleoperating a robot arm by predicting the joystick mode that the user needs and autonomously switching to it. Gombolay et al. (2015) developed a scheduling approach that increases the task efficiency of a group by allowing the robot to assist in scheduling sub-tasks. Other work has enabled human users to more naturally and efficiently command robot team members, such as in work by Walter et al. (2015) where a robotic forklift can be commanded with pen-based gestures and natural speech. Clare et al. (2012) enable the human-robot team to be more flexible to changing mission requirements by allowing human operators to modify robot plans by changing objective functions.

2.2.2 Robots in Social Group Interactions

Studies of social multi-party HRI frequently consist of interactions in which the robot presents information (or in one case, drinks) to, or plays a game with, participants, but does not directly attempt to foster interaction between the participants. Presenting information to users, Mutlu et al. (2009) use autonomous gaze cues with teleoperated dialogue progression with a social robot in order to manipulate the roles of the participants. Similarly, Foster et al. (2012) present an autonomous bartender robot whose control system recognizes human behavior across modalities. High-level planning for the system is done using rule-based symbolic reasoning, and incorporates multi-party interactions primarily in order to correctly serve more than one client at the bar. Yamazaki et al. (2012) study a museum-guide robot presenting information to multiple visitors through a question-and-answer format with coordinated verbal and nonverbal behavior. In the game-playing domain, Matsuyama et al. (2010) present a framework for choosing autonomous robot behaviors in a multi-person spoken word game where the robot acts as a participant. In another game-playing interaction, Klotz et al.
(2011) present an autonomous system in which a Nao robot plays a quiz game with users. The system detects whether a person in the environment intends to engage with the system, in order to decide whether to involve the person in the game. In work by Jung et al. (2015), the robot acts as a mediator, mitigating team conflict with emotional repair strategies.

2.3 Application Domains

Work in SAR is driven by real-world needs in the domains of health, wellness, and education, for users from infants to older adults. This dissertation work takes inspiration from three application domains where group interactions could prove particularly beneficial to achieving user goals. The first of these is childhood education, particularly learning about nutrition. Research in the social sciences suggests that small-group work can improve education gains in the classroom (Parker, 1984). Second, this work is evaluated in family interactions with older adults, who may see particular health benefits from strong relationships with family and friends (Jablonski et al., 2005, Seeman, 1970). Finally, this dissertation work is evaluated in group interactions with children with autism, inspired by research suggesting that peer-based (Pierce and Schreibman, 1995) and family-based (Prizant et al., 2003) interventions can improve outcomes for children with autism. The following subsections review the relevant literature in each of these domains.

2.3.1 Nutrition Education

Childhood obesity has tripled in the United States over the past four decades (Ogden, 2012). Obesity among children and adolescents has been shown not only to lead to increased risk of being overweight in adulthood (Singh et al., 2008), but also to diseases later in life, including high cholesterol and triglycerides, hypertension, and type 2 diabetes (Freedman et al., 1999).
Educating children about healthy food and beverage choices, and motivating them to make healthier choices, can help to lower rates of obesity (Spruijt-Metz, 2011). Technological interventions lend themselves to the broad replication and personalization that will be necessary to combat this challenging problem (Tate et al., 2013). Several technology-based nutrition interventions for children have been developed, using smartphones, computers, and video games (see Hingle et al. (2013) or Hieftje et al. (2013) for a review). While these interventions make use of technologies that are widely available, there is some evidence that HRI systems could promote learning more than screen-based technologies, as discussed above. In adults, Kidd and Breazeal (2008) show that SAR systems have the potential to improve eating and exercise beyond a paper- or computer-based intervention.

The socio-constructivist theory of learning (Ormrod, 2006) holds that learners' knowledge is constructed over time and is mediated by social interactions with teachers and peers. The use of social robots in learning scenarios allows students to build their knowledge in this type of social scenario, while retaining the guidance of the robot as a more expert peer, who can also guide the dynamics of an interaction to prevent students from being left out of interactions. Other work in education suggests that group learning can have educational benefits (Dillenbourg, 1999, Hill, 1982, Pai et al., 2014), as well as suggesting that group performance is superior to individual performance due to individual learning gains (Schultze et al., 2012). However, as Kreijns et al. (2003) highlight, having multiple individuals in an interaction with a computational agent does not necessarily mean that they will interact with each other, underscoring the need for computer systems that support group interaction.
Our work aims to use SAR to leverage children's excitement about both pretend play and technology, to provide an affordable, accessible, and personalized means of delivering nutrition education and coaching. In the nutrition work presented in this dissertation, we focused on teaching first-grade children (primarily 6-7 years old, although several participants were 5 or 8 years old at the time of the study). Young children in this age range do not have significant control over their food choices, allowing a robot companion to instill healthy habits before unhealthy ones become ingrained.

2.3.2 Older Adults and Aging-in-Place

Many parts of the world are facing an aging population, with over 40.3 million people aged 65 and older in the United States, a total of 13% of the population (West et al., 2014). This number is projected to grow to 20.9 percent by 2050, and given that over 38 percent of those aged 65 and older in 2010 had one or more disabilities (West et al., 2014), this is likely to result in an increase in demand for caregiving. Simultaneously, there is a push towards supporting aging-in-place, that is, allowing older adults to remain in their homes and out of institutions for as long as possible. New technologies can provide an important source of support to allow for this aging-in-place (Mynatt et al., 2000). However, given the increase in the numbers of older adults living alone thanks to advances in technology and medicine (West et al., 2014), there is a need for companionship in addition to assistance with activities of daily living. Social isolation is a major health risk for older adults (Cornwell and Waite, 2009, Shankar et al., 2011, Tomaka et al., 2006), and strong relationships with family and friends can have a protective effect (Jablonski et al., 2005, Seeman, 1970). Socially Assistive Robotics has the potential to provide this combination of health- and wellness-supporting behaviors while providing companionship to older adults.
In this dissertation work, we focus on robot moderators that support family interactions, with the goal of fostering strong family relationships.

2.3.3 Autism

Autism is a broad category of developmental conditions resulting in a variety of atypical approaches to social interaction, behavior, and communication (Wetherby et al., 1998) that affects as many as 1 in 68 children in the United States alone (CDC, Centers for Disease Control and Prevention, 2014). Some differences seen between typically-developing children and children with ASD include reduced vocal expressions, anticipatory gestures, social reciprocity, facial expression, affection, and eye contact (American Psychiatric Association, 2013, Szatmari et al., 1995). Typical therapeutic interventions to help children with autism navigate the social world include many hours of intensive practice of social skills and related foundational behaviors (e.g., Applied Behavior Analysis (Anderson and Romanczyk, 1999, Foxx, 2008), Pivotal Response Training (Pierce and Schreibman, 1995), or the Early Start Denver Model (Dawson et al., 2010)). One challenge common to all of these interventions is maintaining engagement over the many hours of practice they require, particularly given that the interaction may not have any inherent appeal for the child. In contrast, technological interventions have been consistently found to hold a great deal of appeal for children with autism, and allow users to maintain a "spirit of play" (Colby, 1973) while building social and communicative skills (Moore et al., 2000, Sansosti and Powell-Smith, 2008, Swettenham, 1996, Tanaka et al., 2010, Wainer and Ingersoll, 2011).
As the field of computing has advanced, robots have become an increasingly accessible technology, and research in the fields of Socially Assistive Robotics (SAR) (Feil-Seifer and Matarić, 2012), autism therapy, and social communication in autism (Anagnostou et al., 2014, Kitzerow et al., 2015) has provided evidence that the technological affinity of children with autism extends to robots as well as computers (Scassellati et al., 2012). (The terms Autism Spectrum Disorder (ASD) or Autism Spectrum Condition (ASC) emphasize the heterogeneity of the condition or indicate atypical presentations of autism; in this work we will use the term "autism" to include the full spectrum of abilities and difficulties exhibited by this population.) Furthermore, as rule-based systems that can be built to be consistent and predictable, both robots and computers may be appealing to children with ASD who dislike change and surprises (Sigman et al., 1999). This aversion to unpredictability can explain the draw of technology and the positive interactions seen with children with autism, and further supports using robots as assistive partners (Chen and Bernard-Opitz, 1993, Jordan, 1998).

2.4 Socially Assistive Robotics

This dissertation work falls into the area of Socially Assistive Robotics. Researchers in SAR have developed systems that address various needs of the populations described in Section 2.3. This section reviews prior work in SAR in the domains of education, in-home support for older adults, and therapy for children with autism. This dissertation contributes to the understanding of socially assistive robotics in each of these fields, through the user studies described in Chapter 4 and through the validated algorithms for social interaction described in the rest of this dissertation.

2.4.1 Socially Assistive Robotics for Education

Children love imaginary play, and such play can make learning more engaging and effective (Jent et al., 2011, Singer and Lythcott, 2004).
There is also a need for individualized support in the classroom, allowing each child to progress at their own speed, using personalized learning strategies. Child-friendly social robots have the potential to provide this individualized support through imaginary play, allowing children to engage with their lessons in a tangible and active way.

Many studies of socially assistive robots for children are conducted in an educational context. Many of these studies use teleoperated robots; those studies focus more on understanding children's learning with robots than on the technical development of robot controllers. In one such study, Kanda et al. (2012) use a teleoperation system to explore the use of the Robovie humanoid robot teaching a class of children how to use the Lego Mindstorms robot-building kit. Their study includes individually-focused lessons, group lessons, and between-group interactions in a classroom setting. They found that a robot that included purely social behaviors was better-liked by the children, but did not improve learning. Tanaka and Matsuzoe (2012) and Ghosh and Tanaka (2011) also used a teleoperated robot to help children learn second-language vocabulary. In contrast to much other work in the area of child-robot interaction, they explored a care-receiving role for the robot. They found that the care-receiving robot helps to promote vocabulary learning, and that children display a variety of caregiving behavior towards the robot. Baroni et al. (2014) found that motivational cues from a robot improve children's self-reported healthy eating goals. Syrdal et al. (2011) use teleoperation in order to enable elementary-aged students to learn a second language from a remotely-located native speaker of that language. Short et al.
(2014), our prior work, described in more detail below, used a teleoperated social robot to teach nutrition information to first-grade children, and found preliminary evidence that a robot can support learning, mediated by the temperament of the child. Shahid et al. (2014) find that children playing a game with a social robot rate the interaction as more enjoyable than interacting alone, but not as enjoyable as interacting with a friend.

Several studies of child-robot interaction do use autonomous controllers for the interaction. Sanghvi et al. (2011) studied the iCat robot in a chess-playing interaction with children, and developed a system to automatically detect children's affect and provide appropriate affective feedback. Leite et al. (2013) studied the use of electrodermal activity to recognize affect in the same context. In mathematics learning, Brown and Howard (2014) used an autonomous robot to improve engagement in the material for adolescents 13-18 years old. In a similar context, Muldner et al. (2014) studied an autonomous social robot in a geometry learning task. In one of the largest studies of an autonomous robot for learning, Kanda et al. (2004) deployed an autonomous robot in a two-week long field trial in an elementary school. They found that the first graders interacted with the robot for longer than the sixth graders, and that interaction decreased from the first to the second week of the interaction. In a key result relevant to this work, they also found that children frequently interacted with the robot in groups (63% of the average first-grader's interaction with the robot was in a group, and 72% of the average sixth-grader's interaction was in a group). Other work has studied the use of autonomous robots for number concept learning in young children, and found that the robot was able to support learning and to detect children struggling with the task (Clabaugh et al., 2015). Finally, in a study of multi-child, multi-robot interaction, Leite et al.
(2015a) deployed an autonomous system for emotional storytelling, with two robots performing a story to an audience of either one or three children. The authors found that children interacting with the robot alone were able to recall more narrative details from the interaction, but there was no difference in emotional understanding, the primary goal of the interaction. In the same interaction, Leite et al. (2015b) found that disengagement cues for children in group interactions are different than those for children in individual interactions with robots, and that a model trained on one type of interaction does poorly at predicting disengagement in the other type of interaction. This suggests that models intended to be used in multi-party interactions need to be trained on similar interactions, motivating our focus on group rather than individual interactions.

2.4.2 Socially Assistive Robotics for Older Adults

Smart devices, sensors, and other technologies can provide immense benefit to older adults, especially those with mobility impairments and other disabilities. These tools can help people with a broad range of daily activities, from housecleaning to cognitive assistance (Forlizzi and DiSalvo, 2006, Pollack, 2005). In that context, there has been increasing interest in technologies such as Socially Assistive Robots that can also provide social support and companionship for older adults.

A wide range of non-robotic quality-of-life and pervasive computing technologies have been developed for in-home use by older adults. Some are designed for passively monitoring users' health and wellness, such as fall detection systems (Chaudhuri et al., 2014, Stone and Skubic, 2015), systems for detecting activities of daily living (Roy et al., 2016), and medication adherence monitoring systems (Hayes et al., 2006).
Others both monitor and take a more active role in trying to improve users' health, such as encouraging exercise (Bickmore et al., 2013) and providing reminders to take medications (Hayes et al., 2009). Building on this work, researchers in human-robot interaction have explored the use of robots for delivering such non-contact health and wellness services, but as physically embodied agents, robots can also engage in physical services in addition to the monitoring functions described above. Accordingly, they have been used to help people with mobility impairments with activities of daily living (Chen et al., 2013, Kapusta et al., 2016), to pick up and deliver objects to older adults (Srinivasa et al., 2010), and to help with cleaning tasks (Xu and Cakmak, 2014). Research has also shown that the physical embodiment of robots can have notable benefits for social interaction (Fasola and Matarić, 2012, Leyzberg et al., 2012). Given that social isolation is a major health risk for older adults (Cornwell and Waite, 2009, Shankar et al., 2011, Tomaka et al., 2006), researchers have begun to explore using socially assistive robots to support social and emotional as well as physical health. SAR systems have been used for cognitive and physical exercise (Fasola and Matarić, 2013, Tapus et al., 2009), reminders (Schroeter et al., 2013), and encouragement for activities of daily living (McColl and Nejat, 2013). Our work introduces the novel and important aspect of studying the role of a socially assistive robot in the context of the family, with the goal of improving intergenerational family interactions. The robot does not replace any human contact, but instead helps to nurture existing relationships.

2.4.3 Socially Assistive Robotics for Children with Autism

In the area of SAR for children with autism, our work focuses on the role of agency in shaping the child-robot relationship.
The agency of the robot has not previously been explicitly manipulated, but the robots used have varied widely in anthropomorphism and behavior. In many cases, more anthropomorphic robots with more agent-like behavior are matched to users with a higher level of social capability. For example, the 2014 study with QueBall (Salter et al., 2014) focuses on the effect of the robot on children with a high level of difficulty with social interaction, while the 2007 study with the FACE robot (Pioggia et al., 2007) includes participants with "high functioning autism". A few studies, such as with the Keepon robot (Kozima et al., 2007) or Nao robot (Tapus et al., 2012), include participants with a wider variety of comfort levels with social interaction. These studies show positive human-robot interactions across participants, but with interactions with very different characteristics: for example, in the work by Tapus et al., the nonverbal child's "attention was focused on the Nao robot, while ignoring the behavior of the experimenter", while one of the more verbal children "rapidly understood that the interaction partner was mirroring his actions and manifested vocalizations and positive affect as a consequence" (Tapus et al., 2012). The work included in this dissertation contributes to our understanding of SAR for children with autism by explicitly manipulating the agency of the robot with regards to both its morphology and behavior, and performing a within-subjects comparison of children's behavior and the characteristics of their interactions with robots of different agency.

2.5 Summary

This chapter reviewed the relevant literature in computer-supported cooperative work, socially assistive robotics, and human-robot interaction, as well as domain-specific background from the social sciences. The work described in this dissertation contributes to all of these fields, by developing algorithms that enable a robot moderator to support multi-party social interactions in assistive contexts, and through validation studies that increase understanding of how these interactions function.

Chapter 3
Robot and Behavior Controller

This chapter discusses the development of a socially assistive robot used in much of this dissertation work. The design goals for the robot are discussed, including both goals related to the socially assistive applications and engineering design goals. The final design of the robot, the Stewart Platform Robot for Interactive Tabletop Engagement (SPRITE), is described, as well as a software stack, the Co-Robot Dialogue system (CoRDial), that provides for both SPRITE-specific low-level control and robot-independent high-level dialogue with synchronized behavior.

The Stewart Platform Robot for Interactive Tabletop Engagement (SPRITE) was designed as part of this dissertation work, specifically for use in SAR applications. Based on design goals for a tabletop socially assistive robot, we developed a platform whose mechanical design and software are publicly available to the scientific community for research purposes (1, 2). The platform primarily uses off-the-shelf and 3D-printed physical components with off-board computation and platform-independent mobile phone faces that allow the computational components to be readily replaced as more powerful hardware becomes available. The total cost of the mechanical components of the robot is low enough to facilitate large-scale, multi-site research studies. Participants found the robot non-threatening and attractive, and requested a customizable appearance, suggesting that we were successful in achieving our design goals.

1. Hardware: http://robotics.usc.edu/sprite
2. Software: https://github.com/interaction-lab/cordial-public
To deploy the robot, we developed several "skins", including two professionally-designed, proprietary skins (Figure 3.1) and a category of toddler-clothing-based skins that can be shared with the research community (Figure 3.5). In addition to the robot hardware, we present CoRDial, the Co-Robot Dialogue system: a software stack implemented in the Robot Operating System (ROS) (Quigley et al., 2009) that includes controllers for the SPRITE as well as robot-independent nodes for playing synchronized speech and behaviors such as body animations and appropriate mouth movements. The robot's face, part of the CoRDial software stack, can be displayed on any device with a JavaScript-enabled browser and can be used to provide any robot with an expressive display-based face.

The appearance of the robot, especially the "Chili" skin (Figure 3.1(a)), is based in part on the concept for the DragonBot robot (Setapen, 2012), a tabletop dragon robot with an inverted-delta-platform mechanical design (Clavel, 1988). Additionally, there have been a number of other tabletop robots for social interaction research, such as iCat, used to teach children chess (Leite et al., 2008); Travis, a music-listening companion (Hoffman, 2012); Keepon, one of several small robots designed for children with autism (Kozima et al., 2007, Scassellati et al., 2012); Maki, a 3D-printed robot (Hello-robo); and Jibo, a commercial product still under development as of this publication (Jibo, 2017). Our design occupies a similar niche, but was developed specifically for SAR research, with more degrees of freedom in the body than Maki, Keepon, Travis, DragonBot, and Jibo, and more degrees of freedom in the animated face than Keepon, iCat, and Maki.

Figure 3.1: Custom SPRITE skins used in research: (a) robot as "Chili" (dragon); (b) robot as "Kiwi" (stylized bird); (c) "Chili" face; (d) "Kiwi" face.

SPRITE is designed with the needs of the human-robot interaction (HRI) research
community in mind, with off-board computation to enable the latest computing platforms to be used to control the robot and a robust and easy-to-repair physical design.

3.1 Design Goals

Based on our USC Interaction Lab's SAR research involving various target user populations and contexts, we developed a set of goals for the robot's design. Because they use social interaction to motivate, coach, and teach, socially assistive robots need to be able to express a range of emotional responses, such as being happy when a patient in physical rehabilitation finishes their exercises or sad when a child has trouble with a math problem.

3.1.1 Social Considerations

In order to engage in appropriate social behavior, the robots should also be able to engage in basic nonverbal behavior, such as nodding or using gaze to direct the user's attention. To summarize:

1. The robot must be capable of expressive movement.
2. The robot must be capable of affective communication.

A socially assistive robot used in research might be used to study interactions with a wide variety of users, from children with developmental challenges to older adults with motor impairments. For these users, the robot's basic size and shape should be non-threatening and appealing so as to motivate interaction. A robot that appeals to children might not appeal to older adults, so the ability to customize the robot's appearance to the user population is important. Finally, SAR applications often involve long-term interactions with users outside of the direct supervision of an experimenter, so the robot must be as safe as possible. To summarize:

3. The robot should have a friendly, inviting, and non-threatening appearance for both adult and child users.
4. The robot's appearance should be customizable for different target users.
5. The robot should be safe to interact with, for novice as well as experienced users.

Additionally, because SAR involves non-contact interactions, the robot does not need to manipulate objects.
3.1.2 Engineering Considerations

In addition to the requirements listed above, because SPRITE is designed to be a research platform, it needs to function through multiple development cycles, user studies, and demonstrations, and should remain useful for several years as computational and electronics component technologies improve. The robot also needs to integrate with existing software; as with the hardware, the software should be modular, allowing individual components to be replaced as better versions become available. This introduces several additional considerations:

6. The robot should be robust and easy to repair or upgrade.
7. The robot should be sufficiently inexpensive to allow replication in multi-site studies.
8. The robot should include control software that allows for rapid development of human-robot interactions.
9. The robot's software should integrate with other robotics software, such as the Robot Operating System (ROS) (Quigley et al., 2009).

3.2 Robot Design

SPRITE's materials and basic design allow for flexibility in the external appearance of the robot, facilitating customization for different target applications.

3.2.1 Hardware

The underlying design of SPRITE is based on the rotary, six degree-of-freedom Stewart platform (Stewart, 1965). On the platform is mounted a bracket that holds a mobile phone that displays the robot's face. In interactions, the robot's appearance can be customized with a skin that covers the mechanical components. The robot is powered by an external power supply. Instructions for assembling the robot and power supply, including models for 3D printing, are freely available to the research community (3). More details about the design of the robot are found in the following sections.

Figure 3.2: Internal hardware of the SPRITE, with added "neck".
Figure 3.3: Schematic of the internal SPRITE design.
3. http://robotics.usc.edu/sprite

Figure 3.4: SPRITE robot movement: (a) z = +4cm (moved up); (b) z = -4cm (moved down); (c) pitch = +30 degrees (look up); (d) pitch = -30 degrees (look down); (e) roll = +30 degrees (tilted head left); (f) roll = -30 degrees (tilted head right).

3.2.1.1 Dimensions and Workspace

The Stewart platform design gives the robot's top platform six degrees of freedom: x, y, z, roll, pitch, and yaw. The robot is approximately 30cm tall, fitting easily on a standard desk-sized table. The robot can move approximately 5cm in any direction, and can reach angles of up to around 45 degrees from the vertical axis (see Figure 3.4). The separate power supply is approximately 15cm by 30cm with a 1m cord, and can be placed away from the main robot body.

3.2.1.2 Hardware

Six metal gear-driven servo motors actuate the top platform and are housed within a 3D-printed hexagonal base. A laser-cut top plate is attached to an adjustable 3D-printed phone bracket that can accommodate most contemporary phones. The locking push-pull connector between the robot and power supply prevents the robot from being accidentally unplugged. All electrical components are enclosed in the base, away from where a user might access them. The cost of the robot's mechanical parts, with high-torque servos and 3D-printed parts, is under $1500 in 2017 U.S. dollars.

3.2.1.3 Appearance

The appearance of the robot is modifiable either through custom-built skins, as seen in Figure 3.1, or using toddler-sized (24M/2T, US sizing) hooded sweatshirts, which allow for a wide range of appearances. Purposeful choices of face and clothing colors can be used to control the robot's apparent gender, personality, and other characteristics (Figure 3.5), depending on the relevance for the target population, or to customize the robot to the preferences of an individual user.
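A controller using these workspace bounds might validate requested poses before sending them to the motors. The sketch below is a minimal illustration, not SPRITE's actual limit-checking code: it treats the approximate figures above (5cm of travel, 45 degrees of tilt from vertical) as hard limits on a commanded pose, checking translation as a combined offset and ignoring yaw, which does not tilt the platform.

```python
import math

# Approximate limits taken from the workspace description above;
# SPRITE's true per-axis limits may differ.
MAX_TRANSLATION_M = 0.05
MAX_TILT_RAD = math.radians(45)

def pose_is_valid(x, y, z, roll, pitch):
    """Check a requested platform offset (meters, radians) against the limits.

    Translation is checked as a combined offset from the home pose; tilt is
    the angle between the platform normal and vertical, so yaw (rotation
    about the vertical axis) is not constrained here.
    """
    if math.sqrt(x * x + y * y + z * z) > MAX_TRANSLATION_M:
        return False
    # z-component of the platform normal after applying roll then pitch
    nz = math.cos(roll) * math.cos(pitch)
    tilt = math.acos(max(-1.0, min(1.0, nz)))
    return tilt <= MAX_TILT_RAD
```

A kinematic controller could call a check like this before converting a pose to servo angles, rejecting out-of-envelope commands rather than letting a motor stall against its mechanical stop.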
3.2.2 Software

The SPRITE robot platform is controlled by CoRDial, the Co-Robot Dialogue system we developed, made freely available to the research community (4). The software stack is designed for the SPRITE robot, but includes a number of robot-independent components as well. The software components, described in detail in this section, are as follows:

1. Motor controller (Python)
2. ROS interface for keyframe-based animations (Python/ROS)
3. Speech with synchronized animations and speech-related mouth movements (visemes) (Python/ROS)
4. Browser-based face for mobile phone or computer (JavaScript/ROS)
5. ROS interface for face and TF tracking (Python/ROS)

Components 1 and 2 are specific to SPRITE, while the remaining components could be used across a wide variety of robots, provided nodes are implemented to translate the robot-independent messages into the correct behavior.

4. By request from the authors; released publicly as of December 2016.

Figure 3.5: SPRITE in toddler clothing for customized appearance.

3.2.2.1 SPRITE Motor Controller and Keyframe Animations

The motor control board connects via USB to a computer; all computation is done off-board. The rotary Stewart platform has a relatively simple inverse kinematic model; motor positions are calculated through the intersection of the circles formed by the servo horns and the push rods. Software limits are implemented at both the motor control level (by limiting maximum and minimum ticks) and the kinematic control level (by preventing the sending of invalid poses to the motor controller). In order to allow researchers to develop animations for the robot, we additionally provide a ROS node that takes a JSON-based keyframe specification and plays the scripted animations. Interpolation is done using Bezier curves. The keyframe player node can play facial behaviors as well as robot body movements.
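The circle-intersection calculation mentioned above can be illustrated with the standard closed-form inverse kinematics for a rotary Stewart platform: for each leg, the servo angle is the one that places the horn tip exactly one push-rod length from the platform anchor. This is a generic textbook formulation, not CoRDial's actual motor controller code, and the geometry in the usage example (horn length, rod length, anchor radii) is invented for illustration rather than taken from SPRITE.

```python
import math

def rotation_rpy(roll, pitch, yaw):
    """Rotation matrix Rz(yaw) * Ry(pitch) * Rx(roll), as nested lists."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [[cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
            [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
            [-sp, cp * sr, cp * cr]]

def platform_points(pose, platform_anchors):
    """World positions of the platform anchors for pose (x, y, z, r, p, y)."""
    x, y, z, roll, pitch, yaw = pose
    R = rotation_rpy(roll, pitch, yaw)
    t = (x, y, z)
    return [[t[i] + sum(R[i][j] * p[j] for j in range(3)) for i in range(3)]
            for p in platform_anchors]

def servo_angle(leg, beta, h, d):
    """Closed-form servo angle for one leg of a rotary Stewart platform.

    leg: vector from the servo shaft to the platform anchor; beta: angle of
    the horn's plane of rotation; h: horn length; d: push-rod length. The
    angle solves |anchor - horn_tip(alpha)| = d (the circle intersection).
    """
    lx, ly, lz = leg
    M = 2.0 * h * lz
    N = 2.0 * h * (math.cos(beta) * lx + math.sin(beta) * ly)
    L = lx * lx + ly * ly + lz * lz - (d * d - h * h)
    return math.asin(L / math.hypot(M, N)) - math.atan2(N, M)

def horn_tip(b, beta, alpha, h):
    """World position of the horn tip for servo angle alpha."""
    return (b[0] + h * math.cos(alpha) * math.cos(beta),
            b[1] + h * math.cos(alpha) * math.sin(beta),
            b[2] + h * math.sin(alpha))

def inverse_kinematics(pose, base_anchors, platform_anchors, betas, h, d):
    """Six servo angles for a desired platform pose."""
    qs = platform_points(pose, platform_anchors)
    return [servo_angle([q[0] - b[0], q[1] - b[1], q[2] - b[2]], beta, h, d)
            for b, q, beta in zip(base_anchors, qs, betas)]
```

The solution can be checked against itself: placing each horn tip at the computed angle and measuring its distance to the corresponding platform anchor should recover the push-rod length exactly.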
3.2.2.2 Synchronized Speech and Movements

CoRDial's central node is the speech player node, which takes a string that is either the text of a dialogue action (a script with tags for behaviors) or an index into a file of saved dialogue actions. When a request is received, the node plays the corresponding audio, either cached or from a local or remote text-to-speech server, sends the appropriate visemes (mouth positions associated with speech sounds) and expressions to the face, and sends the appropriate movements to the body to synchronize with the speech. To simplify development of robot interactions, a Python class interface is provided which allows the speech to be triggered with a single line of code. CoRDial can support multiple robots on a single computer, enabling multi-robot, multi-human interactions.

3.2.2.3 Browser-Based Face

The face of the robot is implemented in JavaScript, using a 3D animation framework (5) that allows the robot's eyes to be modeled as 3-dimensional spheres. Communication to the robot's face is done via ROS with the Robot Web Tools (Toris et al., 2015). In addition to sending a limited set of visemes, the user can activate action units in the face, designed to be analogous to the action units of the Facial Action Coding System (FACS) (Ekman and Friesen, 1978). The 3D model of the eyes enables the controller to direct the robot's gaze to any point in the three-dimensional workspace of the robot, with appropriate vergence, by rotating the spheres towards the point. The 2D projection of this 3D model gives the proper appearance to the eyes on the screen. If the point is provided as a ROS transform, the ROS node associated with the face will automatically maintain the robot's gaze on the desired point, allowing the eye movement to lead the body movement of the robot while maintaining the gaze target. The colors, pupil shape, and face element sizes (except for the eyes) are fully customizable on the face.

5. three.js: http://threejs.org
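The vergence behavior can be illustrated with a short sketch: each eye is rotated toward the target from its own position, so nearby targets make the eyes converge while distant targets leave them nearly parallel. The eye spacing and coordinate convention below are assumptions for illustration, not values from CoRDial's actual face implementation.

```python
import math

def eye_angles(eye_pos, target):
    """Yaw and pitch (radians) that aim one eye at a 3D target point.

    Assumed convention (not necessarily CoRDial's): the face looks along
    +z, +x is to the robot's left, +y is up; eye_pos is the eye's center.
    """
    dx = target[0] - eye_pos[0]
    dy = target[1] - eye_pos[1]
    dz = target[2] - eye_pos[2]
    yaw = math.atan2(dx, dz)                    # rotate left/right
    pitch = math.atan2(dy, math.hypot(dx, dz))  # rotate up/down
    return yaw, pitch

def gaze(target, eye_spacing=0.06):
    """Per-eye angles for both eyes; nearby targets produce vergence.

    eye_spacing is an invented placeholder, not a measured SPRITE value.
    """
    half = eye_spacing / 2.0
    left = eye_angles((half, 0.0, 0.0), target)
    right = eye_angles((-half, 0.0, 0.0), target)
    return left, right
```

For a target 20cm straight ahead, the two yaw angles come out equal and opposite (the eyes converge); as the target recedes, both angles approach zero.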
Two examples can be seen in Figure 3.1.

3.2.2.4 Using CoRDial

When the CoRDial nodes are running on a computer with access to cloud services (for text-to-speech), it takes only a few lines of code to get the robot to say "Hello world!" while doing a keyframe-scripted "happy dance" animation with synchronized visemes displayed on the face. On an offline system, a few additional steps allow the user to download speech audio with synchronized visemes. This simple interface allows for rapid development of HRI studies, including quickly iterating through wording in robot speech or changing speech online. Robot speech can also be generated prior to an interaction, allowing for deployment offline or where latency in cloud-based text-to-speech is a problem.

3.3 Discussion

The design of the SPRITE robot platform addresses a number of goals, as described in Section 3.1. These goals, developed from expertise in socially assistive robotics research, address both the capabilities and design of the robot, and the engineering considerations important for research deployments of SAR systems. The goals and insights are summarized in this section.

3.3.1 Expressive Movement and Affective Communication

SPRITE is designed to be capable of expressive movement (Requirement 1) and affective communication (Requirement 2). The Stewart platform design and high-torque servos allow the robot to move quickly and smoothly, and keyframe animation with Bezier curve interpolation allows for rapid design of new animations and behaviors.

3.3.2 Size and Appearance

The small size and colorful appearance of the robot makes it non-threatening to both adults and children (Requirement 3). The appearance is customizable (Requirement 4), allowing researchers to tailor the robot to the specific domain being studied. Our pilot studies support the importance of personalization; when asked about in-home use of the robot, many participants specifically wanted to be able to customize the robot's appearance.
3.3.3 Safety

A number of features are included to ensure that the system is safe for user interaction (Requirement 5). There is an emergency stop button on the power supply, allowing power to the motors to be quickly cut if necessary. The software is not affected by an emergency stop event, since the motor controller board remains powered via USB connection to the computer. The connector between the robot and power supply is designed to have no exposed electrical components and cannot be easily pulled out.

3.3.4 Cost and Performance

The robot's design uses off-the-shelf and 3D-printed components, making the system easy to repair and upgrade (Requirement 6). The total cost of these components is under $1500 (6), making the robot inexpensive enough for replication across multiple sites and multiple in-home deployments (Requirement 7).

3.3.5 Software

In addition to the hardware, we developed CoRDial, a software stack that includes the SPRITE control software as well as robot-independent components. At the highest level, the robot can be controlled with a few lines of Python, enabling rapid development and iteration of human-robot interactions (Requirement 8). The robot's software is integrated with the Robot Operating System (ROS) in a modular design that enables individual components to be modified or replaced depending on the deployment context and improvements in the state-of-the-art (Requirement 9).

3.4 Summary

This chapter provided an overview of the development of the Stewart Platform Robot for Interactive Tabletop Engagement (SPRITE), used in three of the studies described in this dissertation, and reviewed the ways in which this robot met the design goals set out in Section 3.1. This chapter also discussed the socially assistive robot controller, the Co-Robot Dialogue system (CoRDial), which includes both robot-specific components for controlling SPRITE and robot-independent social behavior controllers.
This robot is used in a number of the studies in this dissertation work, including one of the studies of SAR target populations in Chapter 4, and the validation studies in Chapters 5, 6, 7, and 8.

6. At the time of publication.

Chapter 4
Understanding Users

In this chapter, we present two novel studies and two analyses of pre-existing datasets that were used to inform the specific behavior of the robot used in the validation studies of the moderation algorithms. Two novel studies are presented: a study of inter-generational interactions with a socially assistive robot and a study of nutrition learning for first-grade children. Two analyses of pre-existing data sets, including data coding, were conducted: on data from a study of children with autism interacting with a socially assistive robot and on a dataset of group interactions among adults.

A key component of the development of any SAR system is understanding the target user population that the robot will be assisting. In this dissertation work, the robot moderator is intended to support inter-generational groups of participants, including both children and adults of all ages. In order to better understand the needs of these users, we conducted a series of studies and analyses of human-robot interaction in socially assistive contexts. Two novel studies were conducted: one long-term study of nutrition learning for children and one single-session study of inter-generational groups playing a series of games with a socially assistive robot. Additionally, two novel analyses were performed of existing datasets, including coding the data for relevant features and analyzing participant behavior. The datasets used for these analyses were from a multi-cultural, multi-modal study of group interactions among adults and a study of human-robot interaction with children with autism.
The core algorithms controlling the robot moderator's behavior are described in Chapters 5, 6, 7, and 8, and were developed based on the model of moderation introduced in this dissertation work. However, additional understanding of human-robot interaction is necessary to develop the interaction scenarios, robot speech and movement, and other key elements that enable the output of the moderation algorithms to be put into effect in socially assistive human-robot interactions. These studies inform the development of those elements of the interactions.

4.1 Socially Assistive Robot for Teaching Nutrition

As part of the large multi-institution National Science Foundation Expeditions in Computing project, Socially Assistive Robotics, we engaged in a study taking a SAR approach to teaching nutrition to 1st-grade children, with the goal of promoting positive habits and behavior change through human-robot interaction (HRI). The results of this study serve as the foundation for this dissertation work and provide a preliminary basis for the type of long-term deployment that would be necessary to see major gains in learning, including measuring children's engagement with the SAR system over time and evaluating whether it has the potential to facilitate nutrition information learning. The work described in this section was performed collaboratively with researchers at the Massachusetts Institute of Technology, Yale University, and Stanford University, and was published in the 23rd IEEE International Symposium on Robot and Human Interactive Communication (Short, Swift-Spong, Greczek, Ramachandran, Litoiu, Grigore, Feil-Seifer, Shuster, Lee, Huang, Levonisova, Litz, Li, Ragusa, Spruijt-Metz, and Matarić, 2014).

4.1.1 Methodology

In the sections that follow, we describe the design of a SAR-based nutrition education intervention, as well as the study used to test the effects of the initial version of the intervention.
Our study addressed five research questions, as follows:

Q1: Do children enjoy interacting with the SAR system?
Q2: Are children able to maintain engagement with a SAR system over time?
Q3: Are children able to build a relationship with the SAR system over time?
Q4: What is the impact of the SAR system on child learning of nutrition information?
Q5: What is the relationship between the child's temperament and their interaction with the SAR system?

4.1.1.1 Intervention Design

While most nutrition education interventions are 12-16 weeks long, most SAR systems are used in much shorter-term interactions (often as little as one session). In order to move towards a longer intervention, we developed a 3-week long, twice-weekly intervention for first-grade children, with a within-subjects design. Each of the six intervention sessions consists of an approximately 5- to 10-minute long one-on-one interaction between the child and the robot. We used a Wizard-of-Oz interaction and monitoring design, with a teleoperator providing dialogue selection and some perception capabilities, with pre-scripted dialogue behaviors (including both speech and movement), and autonomous control of the overall interaction flow.

Each session consisted of two parts, introduction and food selection game, as described in Section 4.1.1.4. In the first session of each week, the robot acted as an expert, giving feedback on food choices one-by-one, while in the second session of the week the child and robot collaborated toward making healthy choices together.

4.1.1.2 Robot and Experimental Setup

The intervention centered on an interaction with the DragonBot robot (Setapen, 2012), a dragon-like squash-and-stretch robot with five degrees of freedom (see Figure 4.2, center-left), covered with a plush skin designed in collaboration with an expert puppeteer. The skin includes poseable arms and tail, as well as removable wings in four sizes, allowing the robot's wings to "grow" from session to session.
The robot is approximately 18 inches tall at its full height and can be seen in Figure 4.2. The SPRITE robot described in Chapter 3 was inspired by this robot, but addresses mechanical and design limitations of DragonBot. The interaction was conducted in a small parent-teacher conference room, allowing children to interact individually with the robot. The robot was set up on a table facing the child, with realistic artificial foods arranged around it; the arrangement of foods was kept constant across participants. In order to provide the richest possible dataset for future analysis, a Microsoft Kinect sensor, an HD camera, and a USB camera are arranged as seen in Figure 4.1, in addition to the two laptops, the DragonBot base station (providing power to the robot), and set of speakers necessary to run the robot. The intervention setup can be seen in Figure 4.2 (the cameras and Kinect are out of the frame).

Figure 4.1: Diagram of the intervention setup for the nutrition education study.

4.1.1.3 Interaction Structure and Progression

The verbal interaction between the child and robot used pre-recorded speech following a script that was written with the assistance of a screenwriter experienced in children's television, and follows the story of Chili the DragonBot through several weeks of training and preparation for the "big upcoming dragon race". We employed child-centric storytelling techniques such as character development and backstory to create a richer interaction. We also increase the difficulty of the task over the three-week intervention, challenging the child within her or his zone of proximal development, using a socio-constructivist approach (Ormrod, 2006, Sanchez, 2006) to integrate the increasing difficulty and social interaction, maximizing the child's learning potential.
Over the course of a six-session, three-week experiment, we covered three nutrition topics, one per week: packing a lunch box (choosing whole grains and non-sugary drinks), choosing after-school snacks (avoiding nutritionally bankrupt junk food), and building balanced meals (choosing whole grain breakfast cereals and colorful vegetables at dinner). Each topic had two sessions devoted to it; each session built upon the previous one, using a gradual increase in challenge of content. In the first session, the robot served as an expert, sampling foods offered to it one at a time and providing nutritional
For the ES, there were two rounds of the food choice game, where the child was asked to choose a food, the robot \tasted" the food, and then provided feedback. In the CS, the child was asked to select a food item until they chose a healthy item, receiving progressively more specic advice about how to improve the selections, based on an evaluation external to both robot and child (in this case, a \magic plate"), and following the theory of graded cueing, an occupational therapy technique (Bottari et al., 2010). For both session types, some vocabulary used in the feedback was explained with a backstory-type dialogue item. 4.1.1.5 Data Collection and Associated Measures In order to build a rich dataset, we recorded teleoperator selections, pre- and post- questionnaires (modied to be administered orally by an experimenter), audio and video data, and Kinect pose data. The audio-visual data were used to transcribe child speech; deeper analysis of the audio, video and Kinect data is beyond the scope of this study. We also collected data from four questionnaires widely used either in the robotics or child development research elds as measures of child personality, child evaluation, and child interaction. The Child Behavior Questionnaire (Rothbart et al., 2001) (CBQ-S; Cron- bach's= 0.87) contains a 4-point Likert-type scale and requires parents to rate various aspects of their child's behavior and personality (aect). It contains three subscales and provides an ecient child temperament measure for school aged children (ages 5-12). The CBQ-S subscales are surgency (positive aect), eortful control (self-regulation) and negative aect. The following three were administered directly to the children, in an interview-type format to accommodate the children's limited cognitive and develop- mental abilities and to avoid biases associated with diversity in child reading abilities 47 (Wilson, 2008). 
The Perceived Value Questionnaire is adapted from an evaluative questionnaire by Lombard et al. (2000) and was also used in SAR-related research by Kidd and Breazeal (2004) (Cronbach's α = 0.95). This questionnaire required child participants to rate their interaction with the robot using an 8-point Likert-type scale. The "utility" and "value" subscales of this questionnaire were administered after the first interaction that the child had with the robot, and then again after the final interaction with the robot at the culmination of the intervention. The Social Presence Questionnaire was used to quantify the effectiveness of the robot's social capabilities (or social presence). The social presence of the robot was measured by an 8-point Likert-type scale using questionnaire items established from Jung and Lee (2004) (Cronbach's α = 0.82). We administered this questionnaire after the first intervention session and at the end of the intervention. Finally, the Adapted Companion Animal Bonding Scale (Poresky et al., 1987) asks the child to rate various features of the robot, including whether the robot is bad/good, loving/not loving, cuddly/not cuddly, and warm/cold. This questionnaire was administered twice, first before the children interacted with the robot (but after a brief introduction to the robot), and again at the culmination of the intervention.

4.1.1.6 Hypotheses

Six research hypotheses were developed that are associated with the study's research questions:

H1: Participants will have a positive reaction to the SAR system that will increase over time.

H2: Children will be more engaged with the robot over time, as measured by a decrease in their response time to the robot's verbal questions (a well-established proxy for engagement in the child development literature (Thomason and La Paro, 2009)).

H3: Children will use more complex speech with the robot over time, as measured by mean length of utterance (MLU) and a qualitative categorization of their utterances.
H4: Children's knowledge of, and comfort with, nutritional information will increase, decreasing their time to make a choice of food when prompted by the robot.

H5: Children's performance on the nutrition task will improve over time, as measured by a ratio of poor to healthy food choices.

H6: Children with positive affect and higher self-regulatory ability will have greater interaction with the robot.

4.1.1.7 Study Population

As we hope to create systems that can be deployed with children across a wide range of economic and social backgrounds, we collected data from two highly diverse sites within the United States: a West Coast site in an urban center and an East Coast site that drew primarily from suburban households. We treated these samples as one cohort in our analysis to highlight commonalities relevant to a more robust long-term deployment. There were 26 participants in the study, ranging in age from 5 to 8 (twenty-two 6-7-year-olds; two 5-year-olds; two 8-year-olds). Seventeen of the children were female (65%) and the remaining nine were male (35%). In terms of ethnicity, the sample was diverse and representative of the areas in which the participants reside. The largest ethnic group represented was children of Hispanic descent (69%), 19% were children of African American descent, and the remaining 6% were European American or had mixed ethnicities. Approximately 62% of the study participants reside in a large urban city in the western United States and the remaining study participants (38%) came from the easternmost region of the U.S. The participating children's parents' ages ranged between 20 and 49, with 25% of the parents in the 20-29 year age range, 44% between 30-39 years of age, and 31% between 40-49 years of age.
With regard to parental education, approximately 19% of the children's parents did not graduate from high school, 19% were high school graduates, 43% had completed some college, 19% of the parents had bachelor's degrees, and the remaining 6% had completed graduate or professional education.

4.1.2 Results

4.1.2.1 Evaluation and Perception of the Robot

In general, the participants in our study had positive perceptions of and reactions to the socially assistive robot before, during, and after the SAR intervention. The children's perception of the robot was high (M = 7.58, SD = 0.76, 8-pt. scale) pre-intervention and remained relatively constant through the culmination of the intervention (M = 7.45, SD = 0.80). Specific to the measures of robot evaluation, the children perceived the robot as useful (M_PRE = 7.35, SD = 1.9; M_POST = 7.31, SD = 1.7) and enjoyable (M_PRE = 8, SD = 0; M_POST = 7.85, SD = 0.54). They also rated it as exciting (M_PRE = 8, SD = 0; M_POST = 7.92, SD = 0.40), valuable (M_PRE = 7.62, SD = 1.4; M_POST = 7.85, SD = 0.63), as having strong social presence (M_PRE = 6.81, SD = 0.76; M_POST = 7.04, SD = 0.43), and as attractive (M_PRE = 6.65, SD = 1.7; M_POST = 7.89, SD = 0.35). We did not find significant differences between the pre- and post-intervention ratings, as seen in Figure 4.3, likely due to the extremely high positive evaluation. Thus the first part of H1, the positive perception of the robot, is supported, but not the second, that this perception increases over the duration of the intervention.

Figure 4.3: Child evaluation of the robot in five categories after the nutrition education study.

4.1.2.2 Child-Robot Interaction

To determine the level of engagement between the robot and the child during the intervention, we calculated the children's mean response time (in seconds) when prompted by the robot's verbal questions. Although we found that child response times ebbed
and flowed across intervention sessions, the comparative results of these calculations revealed that the mean child response time decreased from the first day of the intervention to the end of the intervention. The mean response time across children during the first intervention session was 4.3 seconds and the mean response time was 3.5 seconds for the last session, indicating a 0.8 second mean decrease in response time. A paired t-test was performed to ascertain whether there was a statistically significant change in response efficiency between the beginning and culminating weeks of the intervention; we use this pre-post comparison of means throughout the analysis of this study, because the limited length of the intervention renders mid-intervention measurements and predictive analyses non-significant (see Pedhazur and Schmelkin (1991)). The change in response efficiency was significant (M = 1.35, SD = 3.13, N = 26, t(25) = 2.206, two-tailed, p < 0.05; Figure 4.4). A Cohen's d statistic was also computed to measure the effect size (Cohen's d = 0.57), which indicates a moderate intervention effect. This provides some support for H2, that children's engagement increases over time, since this decrease in response time could be caused by children being more focused on the interaction.

Figure 4.4: Child response times to robot conversational queries and food selection prompts in the nutrition education study.

4.1.2.3 Conversation as a Means of Interaction

We were also interested in the level of conversation that the child had with the robot. To examine this, we calculated each participating child's mean length of utterance (MLU). The MLU is a widely utilized proxy for speech production by researchers and practitioners in the speech and language fields. For the purposes of this calculation, we define an utterance to be a complete response to the robot. The mean length of utterance for participants across all weeks of the intervention was 28.41 (SD = 24.57).
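As an illustrative sketch (not the analysis code used in this study), the pre-post comparison of means used throughout this analysis, a paired t-test with a Cohen's d effect size, can be computed as follows. The response-time values below are placeholders, not the study's data:

```python
import statistics as st

def paired_t_and_cohens_d(pre, post):
    """Paired t-test statistic and Cohen's d for pre/post measurements.

    With d_i = pre_i - post_i: t = mean(d) / (sd(d) / sqrt(n)),
    and Cohen's d for paired data = mean(d) / sd(d).
    """
    diffs = [a - b for a, b in zip(pre, post)]
    n = len(diffs)
    mean_d = st.mean(diffs)
    sd_d = st.stdev(diffs)            # sample standard deviation
    t = mean_d / (sd_d / n ** 0.5)
    cohens_d = mean_d / sd_d
    return t, cohens_d, n - 1         # t statistic, effect size, degrees of freedom

# Placeholder pre/post response times in seconds (NOT the study's measurements):
pre_times = [4.1, 5.0, 3.8, 4.6, 4.9, 4.2]
post_times = [3.2, 4.1, 3.9, 3.5, 3.8, 3.6]
t, d, dof = paired_t_and_cohens_d(pre_times, post_times)
```

The resulting t statistic would then be compared against the two-tailed critical value for the given degrees of freedom to obtain the p-value.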
There were successive changes in the length of responses to the robot over time: there was a mean increase in utterance length of 2.29 from the start to the end of the intervention period, in accordance with H3; however, this trend did not reach statistical significance.

In order to explore differences in the types of child utterances, we employed an empirical analysis of the transcripts. We analyzed the content of 137 hand-transcribed verbal responses from the participants, categorized these interactions, and quantified them (via frequency distribution) to measure changes in types and frequency of interaction over time. The categories we used to identify patterns in the transcripts included: (a) simple responses, one-to-three word responses to robot prompts (e.g., yes, no, huh, okay); (b) expansions that included details (e.g., it's healthy, I like it, a magic plate!); and (c) relational responses, which demonstrate evidence that the child was beginning to relate to the robot (e.g., he's hungry? you said you didn't like it). While there was great variability in response types across children, we noted robust changes in types of interactions over time. For example, one child began his interaction with "yeah" and "maybe," and later recalled what the robot had said in a previous session, saying "you said you don't like mashed potatoes!", demonstrating both relational speech and, given the tone of the interaction, humor with the robot. We computed the percentage of each category by child and then computed an average percentage per week, per category for the study sample. Figure 4.5 illustrates these changes between weeks one and three of the intervention. Thus we find some support for H3, that children use more complex speech with the robot.

4.1.2.4 SAR in the Context of Learning

Similar to our measurement of child-robot interaction, we calculated the mean time that children in the intervention took to make food choices in response to the robot's prompting.
The mean response time increased from 13.11 seconds in the first session to 15.36 seconds in the final intervention session, a 2.25 second increase. A paired t-test was performed to ascertain whether the children's food selection became more efficient (measured by decreases in response times). The mean response difference for food selection was significant (M = 2.24, SD = 5.26, N = 26; t(25) = 2.174, two-tailed, p < 0.05), with Cohen's d = 0.45 providing evidence of the effect size of this change (see Figure 4.4). Although this is in direct contradiction to H4, the child task expectations increased in difficulty each week. Accordingly, we interpreted this result as an indication that as the tasks in the intervention became successively more challenging, the children took additional time to make thoughtful selections.

Figure 4.5: Response categories over time in child-robot interaction in the nutrition education study.

As an indicator of whether the child participants had learned how to make healthy food selections, we computed the ratio of poor to healthy food selections for each child as a way to normalize choice quality across sessions and facilitate comparison. We found a trend towards improved choices, with a decrease of the mean poor choice ratio from 0.46 to 0.39, indicating that the children may have begun to make healthier food selections between the first and last intervention sessions, weakly supporting H5; however, this change was not statistically significant.

4.1.2.5 Temperament, Interaction, and Learning

As a final comparative measurement, we wished to determine whether child temperament and associated behavior influenced child interaction with the robot. Therefore, we used the CBQ-S as a comparative metric for our other research results. The mean child self-regulation score on the CBQ-S was 3.08 (SD = 0.53), the mean child positive affect level was 2.95 (SD = 0.38), and the mean negative child affect was 2.40 (SD = 0.34) for our study sample.
Specifically, using this measure, we wished to determine whether child temperament predicted child interaction time, child food selection time, or food ratio (described above). Accordingly, each of these variables was selected as a dependent variable in a multiple regression analysis, with Bonferroni error correction. A step-wise regression was selected, using our theoretical perspectives as a means of determining the order in which independent variables would be loaded into the model. Accordingly, socio-demographic variables were loaded into the model first, followed by child temperament variables. These analyses revealed that positive child affect was moderately predictive of food ratio (serving as a proxy for learning), r² = 0.359, F(3, 14) = 6.15, p < .05, supporting the first part of H6, that children with positive affect will have greater learning gains. Child-robot interaction and food selection did not contribute to the model and therefore are not predicted by child temperament (thus leaving H6 unsupported); however, child temperament (which contributed to the model) may predict child learning.

4.1.3 Discussion

The goal of this study was to examine the feasibility of a SAR-based intervention for teaching children about nutrition. We wished to examine the children's evaluation of, and engagement with, the robot; changes in their verbal interaction with the robot over time; the learning effects of the SAR system; and the effect of child temperament on interaction and outcomes.

We find that children rate the robot highly positively across all measures used, and retain a positive perception of the robot after a three-week intervention. We find that over the course of the intervention, children respond more quickly to the robot's verbal queries, suggesting that they not only maintain engagement with the SAR system over time, but likely become more comfortable with the system, perhaps even building rapport with the robot character.
While we do not find changes in their MLU, we do find that they use more complex speech with the robot over time, suggesting that the social presence of the robot encouraged child-robot relationship building. We do not find a relationship between child temperament and social interaction with the robot (H6), but this may mean that children with diverse temperaments can interact equally well with the SAR system.

In terms of the educational goal of the intervention, we find limited evidence that the children in the study learned about nutrition over the intervention (H5). We find that positive affect is moderately predictive of healthier food choices (our proxy for learning), which is consistent with the education literature. Finally, although we find that children take longer to make food selections over time, contrary to H4, we have modest evidence that children choose healthier foods over time, suggesting that the increase in time may be due to greater thoughtfulness in their responses.

The results of this study motivate our subsequent work on developing SAR interactions that support cognitive skill development in children. The validation studies in this dissertation use the graded-cueing-inspired approach to robot speech developed in this study, as well as similar game-playing interaction contexts, backstories, and robot personalities that enable richer interactions.

4.2 Socially Assistive Robot for Children with Autism

In this section, we describe the results of an analysis of interactions between children with autism and a socially assistive robot. The system design and data collection were performed by Feil-Seifer and Matarić and are described in a publication in the Human-Robot Interaction (HRI) conference (Feil-Seifer and Matarić, 2011). The annotation and analyses described in the following sections were performed as part of this dissertation work.
Prior studies of SAR for children with autism have often focused on a limited number of participants with narrow inclusion criteria, but taken in aggregate, this body of work includes a wide diversity of robot morphologies, robot behaviors, and child developmental profiles, nearly all of which show positive child-robot interactions and several of which include positive therapeutic outcomes. In this work, we study the role that a robot's agency plays in interactions with children with autism. We found limited differences in children's behavior with varying robot morphologies, but a post hoc analysis suggests that individual reactions to the robots can be classified as agent-like or object-like, and that these individual reactions drive the quality of interactions more than the properties of the robots themselves.

While robots show promise as therapeutic tools for children with autism due to their appeal alone, they contrast with purely computer-based interventions in that they are technological artifacts that can also act as embodied agents. That is, robots in interactions with humans can function as social agents as well as mechanical artifacts, depending on both their behavior and appearance. A body of research in HRI has examined the ways in which robot designers can affect users' perceptions of robots' agency. Agency, the capacity, or perceived capacity, of individuals to act as independent entities (Pickering, 1993), can be broken into a number of different types; in this experiment we focus on exploring morphological and behavioral agency. Morphological agency refers to attributes of an agent in relation to its physical form; work in socially interactive robots has resulted in four values for robot morphology: anthropomorphic, zoomorphic, caricatured, and functional (Fong et al., 2003).
Behavioral agency describes the components of an agent's agency related to its actions and demonstrated abilities, and informs the human user's role in the interaction (Scholtz, 2003). A combination of the affordances of the robot embodiment, or morphological agency, with initial demonstrations of ability, or behavioral agency, is likely a strong predictor of what type of interaction the user will engage the system in. Building systems that look like familiar social agents, such as humans or animals, and behave the way those familiar social agents behave will likely result in agent-like interaction, and vice versa. These perceptions may not always function on the conscious level: Takayama (2009) found that people react to robots as if they are agents, even though upon reflection they do not always attribute agency to those robots. Perceived agency also depends on robots' behavior: Levin et al. (2013) develop a model of how people conceptualize agency that takes into account how their perceptions of a robot's agency can vary with the robot's behavior. Researchers have found that perceptions of robot agency are sensitive to manipulation, by such means as having the robot cheat at a game (Short et al., 2010) and changing robot movement patterns (Avrunin et al., 2011) and gaze behavior (Srinivasan and Murphy, 2011).

Robots have great potential as untiring social partners whose behavior can be tuned to best benefit an individual child. However, when developing SAR systems for children with autism, researchers must consider that one atypical pattern of development in these children is a substantial delay in performance on tasks designed to measure Theory of Mind, or "the ability to infer other persons' mental states and emotions" (Brüne and Brüne-Cohrs, 2006).
This has led some researchers to claim that Theory of Mind is absent or severely impaired in children with autism (Baron-Cohen et al., 1985), although subsequent research suggests that only the magnitude of the delays is unique to this population (Yirmiya et al., 1998), and more recent work has found that deaf children from hearing families perform similarly on these Theory of Mind tasks (Peterson and Siegal, 2000). Another typical characteristic of children with autism is a decreased incidence of symbolic and pretend play (Baron-Cohen, 1987), especially in spontaneous and open play scenarios (Jarrold et al., 1996). While the difference between symbolic and functional play is not clear-cut, and less of a difference is seen between children with autism and typically developing children when researchers explicitly elicit pretend play (Jarrold, 2003), this pattern may still affect how children with autism interact with robots in play scenarios. Because of these developmental differences, we expect that children with autism may perceive robot agency differently from children with typical development, and we explore those differences in this work. A critical component, then, of developing SAR-based therapeutic interventions for children with autism is understanding how they perceive and relate to robots, particularly when researchers expect that robots will function in the interaction as agents; that is, entities with goals, mental state, and the capacity for independent action.

4.2.1 Methodology

We developed an autonomous robot with both object-like and agent-like properties, and compared it to a robot in which behavioral agency is reduced by having it act randomly, a robot that has reduced morphological agency (with the humanoid torso portion of the robot removed), and a completely non-agent-like toy.
We performed a within-subjects study of these robots, motivated by the following research questions:

- How do the robot's bubble-blowing and movement behaviors affect children's vocalization (a proxy for social/agent-like interaction)?
- How does the morphological and behavioral agency of the robot affect children's social (measured by vocalization) and non-social (measured by button-pressing) interactions with the robot?
- Do some children have more object-like or more agent-like interactions? What patterns in children's behavior are seen in object-like interactions? What patterns are seen in agent-like interactions?

4.2.1.1 Robot Behavior

A complete description of the robot controller design can be found in prior work by Feil-Seifer and Matarić (2011). In this section we provide a brief overview of the robot morphologies and behavior. In order to study how the agency of the robot's behavior affected the interactions, the embodiments of the robot were varied to enable the study of how the robot's appearance of agency (or lack thereof) affected the interaction (see Figure 4.6), including one robot with a humanoid appearance, and two with reduced morphological agency:

- Humanoid robot: A humanoid robot torso mounted on a mobile base, approximately one meter tall in total. The torso made simple gestures, and used synthesized speech or pre-recorded phrases, with simple utterances such as "Ow!" and "Woo-hoo!". The behavior of the mobile base is described below.
- Box robot: A non-humanoid mobile base that navigated around the environment, responded to the child's behavior, and activated the bubble blower in response to button presses and controller rules.
- Toy: A non-mobile toy equipped with a bubble blower and a button for activating the bubbles. The toy was placed on a table during the interactions to ensure that it was easy to reach. This morphology was used to ensure that the affordances of the toy matched its behavior; without wheels, it was clear that the toy was non-mobile.
Figure 4.6: The three robot embodiments used in the study of children with autism interacting with a socially assistive robot. Left: mobile humanoid robot. Center: mobile box robot. Right: non-mobile toy (control).

The robots' behavior was as follows:

- The robot socially orients to the child by default;
- If the child is more than 1m from the robot, the robot waves at the child, then gestures to the child to "come here" (this behavior was not implemented for the box robot);
- If the child is more than 1m from the robot for longer than 10 seconds, the robot approaches the child;
- If the child approaches the robot, the robot makes an approving vocalization;
- If the child moves away from the robot, the robot makes a disappointed vocalization;
- If the child moves behind the robot, the robot does nothing; this creates a safe zone where the child can be ignored by the robot;
- If the child presses the button on the robot, the robot blows bubbles (as mentioned above, the only behavior implemented on the bubble-blowing toy); and
- If the child makes a vocalization, the robot blows bubbles.

In order to create a robot differing from the humanoid robot in behavioral agency in the same way that the box robot differed from the humanoid robot in morphological agency, a random controller was implemented for the humanoid robot. In this controller, the robot executed the above behavior repertoire randomly, rather than in response to the child's behavior. The toy served as a highly object-like control, with neither morphological nor behavioral agency: its only behavior was to blow bubbles in response to a button press.
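The contingent (high behavioral agency) rule set above can be sketched as a simple state-to-action mapping. This is an illustrative sketch only: the rule ordering, state names, and function signature are assumptions for exposition, not the controller implementation described in Feil-Seifer and Matarić (2011):

```python
APPROACH_DISTANCE = 1.0   # meters, from the behavior description above
FAR_TIMEOUT = 10.0        # seconds the child may stay >1m away before the robot approaches

def select_action(dist, prev_dist, far_since, behind, pressed_button, vocalized, now):
    """Map sensed child state to one robot action per the rules above.

    The priority ordering of the checks is an assumption; the published
    controller may resolve conflicting rules differently.
    """
    if vocalized or pressed_button:
        return "blow_bubbles"
    if behind:
        return "idle"                       # safe zone: the robot ignores the child
    if dist > APPROACH_DISTANCE:
        if far_since is not None and now - far_since > FAR_TIMEOUT:
            return "approach_child"
        return "wave_and_beckon"            # not implemented on the box robot
    if dist < prev_dist:
        return "approving_vocalization"     # child approached
    if dist > prev_dist:
        return "disappointed_vocalization"  # child moved away
    return "orient_to_child"                # default social orienting
```

Under this sketch, the random controller would draw uniformly from the same action set, ignoring the sensed state, and the toy would implement only the button-to-bubbles rule.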
4.2.1.2 Experimental Design

In order to examine the effects of the robot's level of morphological and behavioral agency on the interactions with participating children with autism, we created a four-way within-subjects experiment design, with the following conditions presented in a randomized order:

- "MB": High morphological agency, high behavioral agency; the full robotic system described above
- "Mb": High morphological agency, low behavioral agency; the randomly-behaving full humanoid robot
- "mB": Low morphological agency, high behavioral agency; the non-humanoid mobile base with contingent behavior
- "mb": Low morphological agency, low behavioral agency; a robotic toy

Based on the idea that more agency in the robot will lead to more agent-like child-robot interaction, with more attention to the robot's head and less interaction with the button and bubbles, we developed the following hypotheses:

H1: Participants will speak the most in the MB condition, followed by the Mb and mB conditions, and the least in the mb condition.

H2: Participants will orient their head pose towards the robot's head more in the MB condition than the Mb condition.

H3: Participants will press the button the most in the mb condition, followed by the mB and Mb conditions, and the least in the MB condition.

H4: Participants will interact with the bubbles the most in the mb condition, followed by the mB and Mb conditions, and the least in the MB condition.

The study participants included nine boys and one girl between the ages of 5 and 9 years (mean age = 7.2 years), recruited from Autism Speaks' Autism Genetic Resource Exchange (AGRE) (Geschwind et al., 2001). The AGRE program provides bio-materials and phenotype and genotype information from families with two or more children with autism (multiplex families) to the scientific community. Details can be found in Table 4.3.
Most interactions for a given participant pair (parent and child) took place on a single day, except for three participants who returned on a second day to interact with the box robot and repeat the toy conditions.

4.2.1.3 Measures and Data Coding

The data from this experiment were hand-coded for 2 robot behaviors, 9 child behaviors, and one parent behavior, some of which had sub-codings (e.g., child head orientation to parent vs. child head orientation to robot). A single primary coder annotated all video data, and a second coder was used for a reliability analysis on a randomly chosen selection of 14 of the 38 sessions. We treat each annotation as a binary variable in 250 millisecond time steps, taking the OR of annotations in the step (i.e., if any part of the time step is annotated as true, the step is labeled as true), and calculate Cohen's κ on this. Because of the highly unbalanced nature of these annotations (events occurred as little as 5-10% of the time in many cases), chance agreement is extremely high, artificially depressing κ. Therefore, we take κ values of .6 or higher as acceptable. The variables for which agreement was sufficiently high, and which are used in this analysis, are listed in Table 4.1. The remaining behaviors resulted in low coder agreement, due to either appearing extremely infrequently in the data (child posture (sitting/standing/kneeling), affect (positive/negative/neutral), and avoidance (of the robot/parent)), or being impossible to detect from the available data (child movement target, which was indistinguishable in a small space; child gesture, which was indistinguishable from playing with the bubbles and stereotyped movement).
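The reliability computation described above, binarizing each annotation into 250 ms steps with the OR rule and computing Cohen's κ between the two coders, can be sketched as follows. This is an illustrative reimplementation, not the code used in the study, and the interval-based annotation format is an assumption:

```python
import math

def binarize(intervals, duration_s, step_s=0.25):
    """Convert (start, end) annotation intervals in seconds into binary time steps.

    A step is labeled True if ANY part of it overlaps an annotated
    interval (the OR rule described above).
    """
    n_steps = int(duration_s / step_s)
    steps = [False] * n_steps
    for start, end in intervals:
        first = int(start // step_s)
        last = min(n_steps - 1, math.ceil(end / step_s) - 1)
        for i in range(first, last + 1):
            steps[i] = True
    return steps

def cohens_kappa(a, b):
    """Cohen's kappa for two aligned binary annotation sequences."""
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    p_chance = pa * pb + (1 - pa) * (1 - pb)  # agreement expected by chance
    if p_chance == 1.0:                       # degenerate case: both coders constant
        return 1.0
    return (p_obs - p_chance) / (1 - p_chance)
```

This formulation also makes the imbalance problem visible: when a behavior is rare, both coders mark mostly False, so p_chance approaches 1 and even high raw agreement yields a modest κ.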
Feature                                      Cohen's κ   Percent Agreement
Robot Moving                                 0.61        91.2
Robot Blowing Bubbles                        0.90        95.9
Child Vocalization                           0.79        92.4
Parent Vocalization                          0.73        95.5
Child Touching Button                        0.72        95.8
Child Head Towards Any Part of Robot         0.65        82.5
Child Interacting with Bubbles               0.77        92.7
Child Head Towards Humanoid Part of Robot    0.70        94.7
  (Humanoid Conditions only)
Child Movement Target      Low Agreement: space too small
Child Gesture              Low Agreement: indistinguishable
Child Affect               Low Agreement: low incidence
Child Posture              Low Agreement: low incidence
Child Location             Low Agreement: space too small
Child Torso Orientation    Low Agreement: space too small

Table 4.1: Agreement values from data coding of features in the child-robot interaction study with children with autism.

Figure 4.7: Outcomes by individual and condition in the child-robot interaction study with children with autism. (a) Proportion of time child spent vocalizing. (b) Number of button presses by the child per minute. (c) Proportion of time child spent interacting with the bubbles. (d) Proportion of time child spent with head oriented to the humanoid portion of the robot.

4.2.2 Results

We provide an analysis across conditions of child-robot interaction outcomes, then divide the participants into two groups, based on the effects of robot behaviors on their vocalizations. We then analyze the differences and similarities in interaction patterns between the two groups in the interaction outcomes, and show that these patterns can be interpreted as either more object-like or more agent-like interactions with the robot.
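The per-condition outcome measures plotted in Figure 4.7 (proportions of time and button presses per minute) can be derived from the binarized annotations. As an illustrative sketch, assuming annotations are stored as lists of 250 ms binary steps (the function names here are hypothetical):

```python
def proportion_true(steps):
    """Proportion of 250 ms time steps in which a behavior was annotated."""
    return sum(steps) / len(steps)

def events_per_minute(steps, step_s=0.25):
    """Count rising edges (False -> True), e.g. distinct button presses,
    and normalize by the session length in minutes."""
    rises = sum(1 for prev, cur in zip([False] + steps[:-1], steps)
                if cur and not prev)
    minutes = len(steps) * step_s / 60.0
    return rises / minutes
```

Duration-like behaviors (vocalizing, interacting with bubbles, head orientation) would use the proportion measure, while discrete behaviors (button presses) would use the per-minute event count.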
4.2.2.1 Qualitative Results

In order to provide a qualitative understanding of the interactions, a coder wrote a description of the child's behavior in each 30-second interval, starting at the beginning of the interaction. We observed the following patterns in the child-robot interactions:

Participant A: Participant A was one of the participants with an agent-like interaction with the robot. A spoke to all of the mobile robots, giving them names and trying to get them to play games.

MB: In this condition, Participant A spoke frequently to the robot (there was speech in every 30-second interval). A specifically asked the robot what games it could play, suggested that they play "tag", and tried to encourage the robot to play the game throughout the interaction (due to the robot's social spacing, it never got close enough to "tag" A).

Mb: A saw the low behavioral agency condition after the high behavioral agency condition, and continued to try to get the robot to play "tag". A discussed the robot's behavior with his parent, and they changed the game from "tag" to a dance contest. Towards the end of the session, A expressed some frustration with the robot's lack of response to the rules he tried to set out, saying to his parent, "Now I guess I know what it feels like when you have to teach a little kid."

mb: A spent most of the session with the low morphological and behavioral agency robot playing with the bubbles, speaking less to the parent and not at all to the robot. When he did speak to his parent, he focused on the properties of the robot as an object, saying things like "What are the holes for?" and "That's it, it just blows bubbles?"

mB: A gave the low morphological and high behavioral agency robot a name ("Bebe"), and asked the parent what the robot's gender was. A also asked whether the robot "play[ed] with" the robot from the Mb and MB conditions, and again tried to play "tag" with the robot.
Participant B: Participant B had an object-like interaction with the robot, focusing on touching the robot and pressing the button to trigger the bubbles. B spoke very little to the caregiver or the robot.

MB: Participant B spent the interaction touching the robot and pressing the bubble-triggering button. B spoke only 3 times during the interaction, twice to say something unintelligible and once to say "Hi" to the robot. However, B expressed enjoyment during the interaction with frequent laughter while playing with the bubbles.

Mb: In this condition, B pressed the button again, but spent most of the time touching the robot. B continued to touch the robot, even when the robot said "ouch" and tried to move away.

mb: B pressed the button on the toy repeatedly throughout the 5-minute interaction, and spoke little, except to ask the caregiver for help with the robot towards the end when it ran out of bubble fluid.

mB: In this condition, B said "ouch" repeatedly, mimicking the robot from the Mb condition. B also spent about half of the interaction playing with the bubbles.

Participant F: Participant F had an agent-like interaction with the robot, spending most of the time trying to command the robot. F did not like the humanoid as much as the non-humanoid mobile robot, but had positive interactions with both.

MB: F continued to try to instruct the robot, and was excited whenever the robot did what he said, exclaiming, "He's listening, daddy!" whenever the robot's behavior aligned with the command. However, F did not want to get too close to the robot, moving away and saying, "He looks creepy," when it approached.

Mb: In the low behavioral agency condition, F tried to command the robot with phrases such as "Turn!" or "Blow bubbles!" (only the latter resulted in the requested behavior). F touched the robot's face several times and stood very close to it.
Towards the end, F stated that "This one scares me; the other two doesn't [sic]," but was again excited when he asked the robot to move and it did.

mb: F pressed the button and played with the bubbles in this condition, but focused on talking with his caregiver about the bubbles and other equipment in the room, such as the microphones.

mB: In this condition, F got down on his hands and knees to try to talk to the robot, including asking it, "Are you scared?" when it attempted to put appropriate social distance between itself and F by backing up. F played with the bubbles occasionally when the robot blew bubbles in response to speech, but was focused on trying to get the robot to interact with him.

Participant G: Participant G also had an agent-like interaction with the robot, expressing concern for its well-being and trying to give it commands. G enjoyed playing with the bubbles, and tried to trigger them with speech with the humanoid robots, and by pressing the button on the non-mobile bubble-blowing toy.

MB: In this condition, G spent much of the time trying to get the robot to make "the train sound". When this failed, G asked the robot to blow bubbles and was pleased to succeed at that. At one point, the robot backed into a wall, and the child asked, "Are you okay?" several times.

Mb: G began the session trying to get the robot's attention, saying "No, don't look at Mom, look at me. Over here!" In this condition the child also asked the robot to blow bubbles and was pleased when, by chance, the robot did as it was told.

mb: With the robotic toy, G was excited to play with the bubbles, saying, "Yay! Bubble time!" repeatedly and pressing the button throughout the session. However, G also spent part of the session looking at himself in a mirror in the room and ignoring the robot.
mB: G began the session by trying to get the robot to move according to directions ("Can you get closer to me, please?") and telling it to blow bubbles (which was successful due to the robot's contingent behavior). G gave the mobile robot a name and gave it requests by name, including asking it to play "chase".

Participant I: Participant I had an object-like interaction with the robot. The participant enjoyed the bubbles and spent most of the time with the humanoid robot and with the low morphological and behavioral agency robot pressing the button to trigger the bubble blower. In the condition with the non-humanoid robot, Participant I played with the robot by moving around as well as by pushing the button.

MB: Participant I spent the entire session with the humanoid robot pressing the button and playing with the bubbles. They also made 3 unintelligible comments during the session, but did not otherwise speak.

Mb: Participant I did not interact with the low behavioral agency, high morphological agency robot.

mb: Participant I spent the session pressing the button, playing with the bubbles, laughing, and dancing.

mB: In this session, Participant I spent most of the interaction looking at the robot as it moved around, then moving around to cause it to follow her, saying "I'm making the robot follow me." In the last 2 minutes of the session, the participant pressed the button and played with the bubbles repeatedly.

Participant J: Participant J refused to interact with the humanoid robot, and had an object-like interaction with the non-humanoid mobile robot. They were focused on pressing the button to trigger the bubbles, and on watching the mobile robot move.

MB: Participant J refused to interact with the humanoid robot.

Mb: Participant J refused to interact with the humanoid robot.

mb: Participant J expressed enjoyment about the bubbles, laughing and saying "Yay!" while playing with the bubbles. J also told their caregiver "I think it's cool."
and asked how much time was left to play with the robot.

mB: In this condition, J pressed the button on the robot repeatedly, playing with the bubbles and watching the robot move around. J again asked the caregiver repeatedly how much time was left.

Other Participants: Participant C had a negative reaction to the robot and had to terminate their participation early. Participant D enjoyed the bubbles, but had a negative reaction to the robot, trying to hide from it, and terminated their participation early. Participant E enjoyed playing with the bubble toy, but avoided the robot entirely, and terminated their participation early. Participant H had a negative reaction to the robot and had to terminate their participation early.

4.2.2.2 Outcomes

Aggregating results by condition across participants reveals no significant differences in the proportion of the interaction that the children spent vocalizing, one of our primary outcome measures (Figure 4.7(a)). However, using a one-way within-subjects ANOVA, a significant difference across conditions in the number of button presses per minute was found (F(3,12) = 26.76, p < .001; Figure 4.7(b)). Pairwise t-tests with Bonferroni correction revealed significant differences between the mb condition and all other conditions at the p < .001 level (with effect sizes of 0.87 for MB, 0.96 for Mb, and 0.78 for mB), and between the mB and Mb conditions at the p < .05 level (with an effect size of 0.74). There was also a difference between the mB and MB conditions with p = .10 and a medium effect size of 0.51, suggesting that with additional participants this difference might reach significance. Means and standard deviations can be found in Table 4.2. Neither the proportion of time spent interacting with bubbles nor the proportion of time spent with the head oriented to the robot's head showed a significant primary effect of condition.
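The analysis above (a one-way within-subjects ANOVA followed by Bonferroni-corrected pairwise paired t-tests) can be sketched as follows. This is a minimal illustration, not the study's actual analysis code: the per-participant rates below are hypothetical, and scipy is assumed to be available.

```python
import numpy as np
from scipy import stats

# Hypothetical button-presses-per-minute data: one row per participant,
# one column per condition (MB, Mb, mB, mb). NOT the actual study data.
rates = np.array([
    [0.5, 0.2, 2.0, 4.1],
    [0.9, 0.3, 1.8, 4.5],
    [0.1, 0.1, 3.2, 3.9],
    [2.8, 1.2, 2.5, 5.0],
    [0.2, 0.0, 0.7, 3.7],
])
conditions = ["MB", "Mb", "mB", "mb"]
n, k = rates.shape

# One-way within-subjects (repeated-measures) ANOVA, computed directly:
# partition total variance into condition, subject, and error components.
grand = rates.mean()
ss_cond = n * ((rates.mean(axis=0) - grand) ** 2).sum()
ss_subj = k * ((rates.mean(axis=1) - grand) ** 2).sum()
ss_total = ((rates - grand) ** 2).sum()
ss_error = ss_total - ss_cond - ss_subj
df_cond, df_error = k - 1, (k - 1) * (n - 1)
F = (ss_cond / df_cond) / (ss_error / df_error)
p = stats.f.sf(F, df_cond, df_error)
print(f"F({df_cond},{df_error}) = {F:.2f}, p = {p:.4f}")

# Pairwise paired t-tests with Bonferroni correction (6 comparisons).
pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
for i, j in pairs:
    t, p_raw = stats.ttest_rel(rates[:, i], rates[:, j])
    p_adj = min(1.0, p_raw * len(pairs))
    print(f"{conditions[i]} vs {conditions[j]}: corrected p = {p_adj:.3f}")
```

With five participants and four conditions, the error degrees of freedom are (4-1)(5-1) = 12, matching the F(3,12) statistic reported above.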
However, as seen in Figures 4.7(a), 4.7(b), 4.7(c), and 4.7(d), there was substantial individual variation in these measures, which, combined with the qualitative results in Section 4.2.2.1, suggests that children's treatment of the robot was more a function of the individual child than of the robot's features.

Condition   Mean   SD
MB          0.68   1.22
Mb          0.29   0.45
mB          2.04   1.04
mb          4.24   0.70

Table 4.2: Means and standard deviations of the number of button presses per minute by children with autism interacting with the robot.

4.2.2.3 Participant Sub-groups: Object-Like and Agent-Like Interactions

Dividing the children into two subgroups based on the qualitative analysis of their behavior, we conducted a post hoc analysis to begin to understand how children with autism might react to socially assistive robots. We examined the correlations between participant behavior and robot behavior, aggregating across conditions.

Figure 4.8: Correlations between robot behavior and child speech in interactions with children with autism. (a) Proportion of time children with autism spent vocalizing compared to proportion of time bubbles were blown, with linear regressions per participant. (b) Proportion of time children with autism spent vocalizing compared to proportion of time the robot was moving, with linear regressions per participant. (c) Relationship between overall child speech and the difference in the probability that the child began speech versus ended speech after the onset of robot bubble blowing.
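The per-participant correlation analysis in this subsection can be sketched as follows. The session values below are hypothetical (the study data are not reproduced here), and scipy is assumed to be available.

```python
from scipy import stats

# Hypothetical per-session data: for each participant, pairs of
# (proportion of time robot blew bubbles, proportion of time child spoke).
# Illustrative values only; NOT the actual study data.
sessions = {
    "A": [(0.2, 0.55), (0.4, 0.50), (0.6, 0.40)],
    "B": [(0.1, 0.05), (0.5, 0.15), (0.7, 0.20)],
}

# Overall correlation, pooling all sessions across participants.
xs = [x for pts in sessions.values() for x, _ in pts]
ys = [y for pts in sessions.values() for _, y in pts]
r_all, p_all = stats.pearsonr(xs, ys)
print(f"overall r = {r_all:.3f} (p = {p_all:.3f})")

# Per-participant correlations, analogous to the per-participant
# regression lines plotted in Figure 4.8.
for pid, pts in sessions.items():
    r, _ = stats.pearsonr([x for x, _ in pts], [y for _, y in pts])
    print(f"participant {pid}: r = {r:+.2f}")
```

An interaction like the one reported below would appear here as per-participant correlations of opposite sign despite a single pooled correlation.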
Figure 4.8(a) shows the correlation between the proportion of time that the participant vocalized and the proportion of time the robot was blowing bubbles. Although there was an overall significant negative correlation of -0.398 between bubbles and speech (p < .05), we found a significant interaction between participants and the correlation (p < .01), with three participants having a negative correlation and three participants having a positive correlation. Next, comparing the difference between the probability that a child would stop speaking within 1.5 seconds of the robot blowing bubbles and the probability that a child would start speaking in the same timeframe, we found a statistically significant correlation with overall speech (p = 0.01233), including a crossing of the zero point, indicating that for some children bubbles increased speech while for others they decreased speech (Figure 4.8(c)). A similar interaction effect was found in the correlation between robot movement and child speech (p < .05), as seen in Figure 4.8(b), with the same participants who had a negative correlation between bubbles and speech having a positive correlation between robot movement and speech, and vice versa.

4.2.2.4 Group Differences in Outcomes of Interest

Throughout this section, we focus on patterns in the variables of interest; we do not focus on statistical significance, given that dividing the participants into two groups results in only three participants per group. However, we find a consistent pattern in the trends across several variables of interest in the differences between the two groups. As can be seen in Figure 4.8(a), the three participants in the agent-like interaction group were also the three participants who consistently vocalized more in the interaction, seen more clearly in Figure 4.9(a).
We also observe that the agent-like interaction participants pressed the button fewer times per minute than the participants with object-like interactions (Figure 4.9(b)), consistent with them treating the robot as an agent and interacting with it via speech rather than via mechanical means (i.e., by pressing the button). We note that these group differences disappear in the mb condition, where the interaction "partner" can only be interpreted as an object.

Figure 4.9: Child vocalization, button-pressing behaviors, and head orientation, separated into object-like and agent-like interaction groups in the child-robot interaction study with children with autism. (a) Proportion of time child with autism spent vocalizing when interacting with the robot. (b) Number of times per minute the child with autism pressed the bubble-triggering button. (c) Proportion of time child with autism spent with head oriented to the robot. (d) Proportion of time child with autism spent with head oriented to the humanoid portion of the robot.

Figure 4.10: Parent vocalization and child interaction with bubbles, separated into object-like and agent-like interaction groups in the child-robot interaction study with children with autism. (a) Proportion of time parent spent vocalizing. (b) Proportion of time spent interacting with the bubbles.
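The onset-window measure used in the analysis above (whether a child began or ended speech within 1.5 seconds of the robot blowing bubbles) can be sketched as a simple event-alignment computation. The event times below are hypothetical; this is an illustration of the measure, not the study's actual analysis code.

```python
WINDOW = 1.5  # seconds after a bubble onset

def onset_response_rates(bubble_onsets, speech_intervals):
    """Return (P(child begins speech), P(child ends speech)) within
    WINDOW seconds of a bubble-blowing onset. speech_intervals is a
    list of (start, end) times of child speech, in seconds."""
    begins = ends = 0
    for t in bubble_onsets:
        if any(t <= s <= t + WINDOW for s, _ in speech_intervals):
            begins += 1
        if any(t <= e <= t + WINDOW for _, e in speech_intervals):
            ends += 1
    n = len(bubble_onsets)
    return begins / n, ends / n

# Hypothetical example: three bubble onsets, two child utterances.
p_begin, p_end = onset_response_rates(
    [10.0, 25.0, 40.0],
    [(10.5, 12.0), (26.2, 30.0)],
)
# A negative difference means bubbles tended to elicit speech;
# a positive difference means bubbles tended to suppress it.
print(p_end - p_begin)
```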
Consistent with our observation that children with object-like interactions were still interested in the robot, we observe limited differences between the groups in children's head orientation to the robot overall (Figure 4.9(c)). Finally, we find that in the humanoid robot conditions, the participants with agent-like robot interactions spent more time with their heads oriented toward the humanoid portion of the robot than the participants with object-like interactions (Figure 4.9(d)), particularly in the MB condition. To check that the differences between these groups were not caused by the bubbles having a different level of appeal or by the parents' behavior, we examined the differences in the proportion of time spent vocalizing by the parents (Figure 4.10(a)) and the amount of time the child spent interacting with the bubbles (Figure 4.10(b)), and found limited or no differences between groups.

4.2.3 Discussion

In this work, we presented an autonomous SAR system for interacting with children with autism, and used the system in a study of robot agency and children with autism. We did not find support for our hypothesis that greater agency in the robot would lead to greater vocalizations (H1) or attention to the robot's head (H2) for children with autism. We found some support that the button was most appealing in the mb condition (H3), with button presses increasing from the MB to the Mb, mB, and mb conditions, although we did not see a significant difference in interaction with the bubbles between conditions (H4). A post hoc analysis dividing the participants into two groups provides preliminary evidence for a spectrum of interpretations of a socially assistive robot among children with autism. Half of the children in the experiment both spoke more as a function of the robot's movement and spoke less as a function of the robot's bubble-blowing behavior, a pattern that can be classified as an "agent-like" interaction.
The other three children in the experiment exhibited the opposite pattern, speaking more as a function of bubble-blowing and less as a function of movement, a pattern that can be classified as an "object-like" interaction. Further analysis of the data indicates the following trends for agent-like interactions: 1) participants spoke more overall (suggesting that this pattern might be correlated with overall social motivation); 2) they pressed the robot's button fewer times; and 3) they spent more time looking at the humanoid portion of the robot (rather than at other parts of the robot). The results also reveal that participants with an object-like interaction pattern also found the robot appealing: both groups exhibited head orientation to the robot about 70% of the time, equal to or more than their head orientation to the bubble-blowing toy. Finally, the differences between the groups are minimized in the mb condition, a result that is consistent with the claim that these group differences are driven by differences in the interpretation of the robot's agency for the robots with both agent-like and object-like features, given that the toy is nearly impossible to interpret in an agent-like manner. Based on qualitative observations of the data, and given that the children with agent-like interactions with the robots had consistently greater vocalization than the children with object-like interactions, these differences in interactions may be driven by varying baseline levels of social motivation among the participants in the interaction. However, even children with less inclination to interact socially with the robot exhibited a high level of attention towards the robot (Figure 4.9(c)). We also find limited differences between the two conditions with higher morphological agency, compared to the mB condition, suggesting that interpretations of agency in this group of participants may have been driven by morphological differences in the robots rather than behavioral differences.
The primary differences in the reduced behavioral agency (Mb) condition were driven by the children with object-like interactions with the robot, who focused on button-pressing as in the MB condition. In this case, although the behavior was intended to be more agent-like, the children were focused on the mechanical button-bubbles connection rather than on the agency of the robot.

For designers of SAR systems and researchers in SAR for children with autism, these results provide important insights for future work. For scenarios where children's enjoyment of interactions (as measured by responsiveness and positive affect) is most important, these outcomes may be maximized when the behavioral and morphological agency of the robot is matched to the developmental level of the target sub-population of children with autism. If the goal is to allow children with autism opportunities to develop their social skills, a robot with flexibly agent-like and object-like features might provide children with opportunities to engage with it in the manner most comfortable to them, while also providing socially challenging interactions within their zone of proximal development.

Sessions included in this analysis: participants with positive reactions to the robots.

ID   Age (yrs)   Sessions (Condition: Time in s)
A    9.4         MB: 343; Mb: 273; mb: 309; mb (d2): 297; mB (d2): 308
B    5.8         MB: 326; mb: 303; Mb: 320; MB: 116; mb (d2): 255; mB (d2): 321
F    6.3         mB: 244; mb: 295; Mb: 292
G    9.4         MB: 299; mb: 289; Mb: 297; MB: 334; mB (d2): 316; mb (d2): 274
I    6.6         MB: 316; mb: 304; mB: 300
J    5.3         mb: 297; mB: 307

Table 4.3: Ages of the children with autism who interacted with the robot and a description of the sessions of the six participants included in this study (A, B, F, G, I, J). Of the ten overall participants, there were four sets of siblings (A-G, D-C, F-J, I-H). *: Robot malfunction that ended the session prematurely.

These semi-social interactions might give children the opportunity
to experiment with new social behaviors in a positive interaction with a robotic interaction partner that will neither judge them nor unexpectedly exhibit the complex social behaviors that come naturally to human interaction partners. The agent-like or object-like features of a robot might also be changed over time in order to provide scaffolding towards more complex social interactions. Finally, future research in SAR for children with autism that enables autonomous characterization of how agent-like or object-like a child's interaction with the robot is might provide important input for decision-making in autonomous SAR systems.

Challenges inherent in performing studies involving children with autism are well established. This study, like most, features a small number of participants and a limited amount of total interaction time with those participants (Scassellati et al., 2012). Consequently, the data analysis takes an exploratory approach, examining patterns in the data rather than focusing on statistical rigor. We posit that the results serve as evidence for the importance of discussing child-robot interactions in terms of how object-like or agent-like the children's treatment of the robot is, rather than as evidence for drawing strong conclusions about the effects of the particular robot morphologies included in this pilot study. Other limitations of the study include the use of head orientation as an imperfect proxy for attention; future work would benefit from modern gaze-tracking hardware. Additionally, in the high behavioral agency condition, the robot responds to any child speech, without regard for content.
Thus, although this condition involved a robot with higher behavioral agency than the other conditions, the robot did not have the highest possible level of behavioral agency: with a more sophisticated natural language processing system or a more constrained interaction, a robot might be able to behave in ways more contingent on the content of the children's speech. Finally, despite rigorous initial diagnoses, more current information about each child's developmental profile was not available, making it impossible to perform pre/post behavior comparisons and correlate the discovered trends with historic data, as would be desirable.

Finally, there are a variety of ethical issues raised by studies such as this, in which children with autism interact with a socially assistive robot. On the one hand, therapists and parents might be wary of child-robot interaction replacing human-human interaction, while on the other hand, activists in the Autism community might question whether it is necessary to "normalize" a child's behavior. In this context, we situate our results within a developmental perspective on child behavior: a SAR system might be able to serve as a comfortable interaction partner for a child with a low level of interest in other people, while increasing its agency over time or incorporating a human interaction partner into a triadic interaction could enable a child to naturally develop their social skills in whatever way is most natural for that child. Thus we find in these results an encouraging middle ground in which a robot can provide scaffolding to a child's natural development in the social environment.

This work presents a study of robot agency in interactions with children with autism. A SAR system with both object-like and agent-like features was developed and studied with a cohort of children with autism.
A full system with high behavioral and morphological agency was compared to three controls: a robot with reduced behavioral agency (a randomly-behaving humanoid robot), a robot with reduced morphological agency (a non-humanoid mobile robot), and a fully object-like toy. In addition to expected differences in children's behavior with the toy versus the robots, participants could be divided into two groups based on the patterns in their vocal responses to robot behavior. A post hoc analysis of the participants in these two groups shows differing patterns in their vocalization, button-pressing, and head orientation to the robot's head across the robots with various levels of morphological and behavioral agency, and those patterns are consistent with more object-like interactions in one group and more agent-like interactions in the other. We observe that participants' reactions to the conditions in this study appeared to be driven by their interpretation of the robot's agency more than by the intended levels of agency. However, despite these between-groups differences in the specifics of the human-robot interaction, participants in both groups showed a high level of interest in interacting with the robot.

Researchers and therapists may be able to harness the flexibility of the robot's role in order to scaffold children towards more frequent positive social interactions with other people, while maintaining an enjoyable and engaging interaction for the children themselves. This work also showed that an interactive agent is more capable of engaging children in agent-like social behavior than traditional toys, which further validates the use of socially assistive robots for social skill therapy with children with ASD. Future work would include developing systems that facilitate this scaffolding, as well as characterizing participants in order to predict what level of robot agency will result in the most productive and positive interactions for individual children.
The insights from this work informed the structure of child-robot interactions with children with autism, the focus of the fourth validation study included in this dissertation work, described in Section 8.2. This study also allowed us to explore how children with differing levels of social inclination interact with a SAR system, and provided support for the notion that robots can be engaging interaction partners in therapeutic interactions with children with a wide range of strengths and interests.

4.3 Understanding Inter-Generational Interactions

Another application domain of this dissertation work is in social interactions within intergenerational family groups, supporting positive interactions between older adults and their families. We describe the results and insights from a pilot study of six intergenerational family groups interacting with a socially assistive robot.

Figure 4.11: Robot appearance in the inter-generational interaction study.

4.3.1 Methodology

To better understand how multi-generational groups interact with the SAR system, we developed an exploratory pilot study with tri-generational groups of participants interacting with a tabletop robot and a set of tablet-based games.

Figure 4.12: The study setup for the intergenerational interaction study.

4.3.1.1 Robot and Physical Setup

The experimental setup consisted of a wheeled cart and a table on which the SPRITE robot with the "Kiwi" skin was placed (see Section 3.2 for a description of the robot).

Activity
Demographic and Personality Questionnaires
Card Game 1: Go Fish
Card Game 2: War
Interview about Card Games
Creative Game 1: Robot Choreography
Creative Game 2: Scrapbooking
Interview about Creative Games
Interview about System
Group Cohesion and Interaction Questionnaires

Table 4.4: The sequence of activities in the inter-generational study session.
The top of the wheeled cart held the robot, three USB cameras (one per interaction participant) for face tracking, and a 3D depth sensor with skeleton tracking. The activities took place on an 18-inch tablet placed on a lightweight table in front of the cart. Communication between the robot, tablet, computers, and sensors took place over USB and a wireless local area network. A diagram of the setup is shown in Figure 4.12.

4.3.1.2 Activities

A set of four activities was chosen for this study: two card games, scrapbooking, and a robot choreography game. The games were designed to cover a variety of interaction types: two structured rule-based games and two creative games. All games were implemented on a tablet to allow the robot to accurately sense the state of the activity.

Structured Games: We implemented two structured, turn-based games: War and Go Fish. While playing these games against the family team, the robot commented on its performance. In Go Fish, as seen in Figure 4.13(a), the family worked as a team, playing one hand together against the robot. Each player attempted to obtain books of a single card value in all four suits (e.g., four aces) by asking each other for cards. In War, as seen in Figure 4.13(b), the deck was evenly divided face-down among the players, and they took turns placing a card face-down from their decks into a center area, with the player with the highest card winning. As in Go Fish, the robot was a competitor in this game, but here the family members were also competing with each other.

Figure 4.13: Tablet interface for interactive games used in the inter-generational interaction study. (a) Go Fish. (b) War. (c) Robot Choreography. (d) Scrapbooking.

Creative Games: Additionally, we implemented two unstructured, creative games: a choreography game and a scrapbooking game.
The Robot Choreography game, as seen in Figure 4.13(c), was a game in which family members, as a team, dragged and ordered tiles representing dance moves for the robot. When the team was ready, they pressed a play button to start the music and the robot's performance of the dance they had designed. The robot made a variety of positive comments while dancing ("Yeah!", "Move it!", etc.). The family members jointly chose and ordered the dance moves for the robot on the tablet. The scrapbooking game was a creative, open-ended activity in which family members jointly decorated an image, as seen in Figure 4.13(d). The robot observed the family members' creation and made positive comments.

4.3.1.3 Recruitment

Study inclusion criteria were 55-75 year old adults with local family that included at least one 5-13 year old child, who could come to the campus study location together. Participants who could not speak and understand fluent English were excluded. The USC Healthy Minds Research Volunteers participant pool was used for recruitment. There were 344 participants in the pool who met the criteria and were sent the recruitment email. Of the 14 participants who responded, one was ineligible, four declined to participate, and five had scheduling conflicts. Of the remaining four respondents, two provided one family group each, and two provided two family groups each.

4.3.1.4 Experimental Procedure

After completing the informed consent process, adult participants completed a demographic questionnaire and the 10-item shortened Big Five Personality Inventory (Rammstedt and John, 2007). All three members of the group were then seated in front of the robot and put on the headset microphones. The participants then interacted with the War, Go Fish, and Robot Choreography games for 10 minutes each. Four of the groups also interacted with the Scrapbooking game for 10 minutes, and two for five minutes. The order of the games was held constant across groups, as seen in Table 4.4.
Groups answered interview questions about the games they had just played after the second and fourth games. The groups also underwent a debriefing interview that included questions about the SAR system in general, about other wellness-related applications for the SAR system, and time for additional feedback from the participants. Finally, the adult participants completed the 7-item Group Cohesiveness Scale (Wongpakaran et al., 2013), using modified items that referred to "we" and "us" instead of "the group" and "the members".

Code                                     Values
Target                                   Robot; Not Robot; Ambiguous
Verbal Sentiment                         Positive; Negative; Neutral/Ambiguous
Nonverbal Sentiment                      Positive; Negative; Neutral/Ambiguous
Robot Nouns                              "The robot"; Kiwi; None
Robot Pronouns                           Male; Female; Neutral; None
Does the robot have agency?              Yes; No; Ambiguous
Does the robot have emotions (affect)?   Yes; No; Ambiguous

Table 4.5: The coding of the experiment data in the inter-generational interaction study.

4.3.1.5 Data Coding

The study videos were transcribed for robot, experimenter, and participant speech (by speaker), as well as for salient non-verbal behaviors such as laughter and dancing. The transcripts of the game-playing sessions were then coded for speech target, sentiment, nouns and pronouns used for the robot, whether the participants' speech indicated that they saw the robot as an agent (e.g., "She wants to win!") or as having emotions (e.g., "It's sad."), and nonverbal sentiment, as seen in Table 4.5. Four research assistants independently coded the transcripts, with three coders for each item. Disagreement was resolved by majority rule, where possible. Out of 12,145 annotations, there were 51 instances where no two coders agreed on the annotation (0.42%). Because none of these instances totaled more than 4% of the utterances for the category, these instances of total disagreement were excluded from the analysis.
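The majority-rule resolution described above can be sketched as follows. The labels below are illustrative; this is a minimal sketch of the scheme, not the actual coding pipeline.

```python
from collections import Counter

def resolve(labels):
    """Resolve one annotation labeled by three coders: return the label
    at least two coders agreed on, or None if all three disagree."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= 2 else None

# Hypothetical annotations, each as the three coders' labels.
annotations = [
    ("Robot", "Robot", "Not Robot"),       # majority -> "Robot"
    ("Positive", "Negative", "Neutral"),   # total disagreement -> dropped
]
resolved = [resolve(a) for a in annotations]
unresolved_rate = resolved.count(None) / len(resolved)
print(resolved, unresolved_rate)
```

Applied to the study's 12,145 annotations, this scheme left 51 unresolved instances, a rate of about 0.42%.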
Transcripts of participant responses to the interview questions were categorized; the categories aligned with the questions asked. The categories discussed in this work are the game evaluations, in-home interaction, in-home setup, and privacy concerns. The responses in each category were summarized, and the major themes were identified for each question.

Group:         1    2    3    4    5    6
Go Fish        57   25   29   279  27   76
War            51   26   71   237  74   76
Choreography   26   19   68   105  41   61
Scrapbooking   8†   2†   103  158  50   66

Table 4.6: Number of utterances per group in the intergenerational interaction study. († Group only played the game for 5 minutes)

Figure 4.14: Two inter-generational groups interacting with the robot.

4.3.2 Results

A total of 18 participants participated in the study in six groups consisting of one older adult, one adult, and one child. The older adults ranged in age from 65 to 73 years old (mean = 68.17, SD = 3.13), the younger adults from 37 to 48 years old (mean = 42.17, SD = 4.12), and the children from 7 to 13 years old (mean = 9.67, SD = 2.50). 12 participants identified as white, one identified as Asian, one identified as mixed-race ("Hispanic/Latino/Asian"), and four did not specify. 11 participants identified as female, and 7 as male. All adult participants had lived in the US for at least 32 years.

Group   Older Adult   Younger Adult   Child
1       67, F         48, F           7, F
2       73, M         46, M           13, F
3       65, F         40, F           12, F
4       66, M         37, M           9, M
5       67, F         40, M           10, F
6       71, M         42, F           7, F

Table 4.7: Age and gender composition of groups in the inter-generational interaction study.

The groups are summarized in Table 4.7. To characterize the groups, the adults and two of the older children completed the Group Cohesiveness Scale (Wongpakaran et al., 2013); all groups scored very highly on group cohesion, with the lowest individual participant score being 3.0 (out of 4.0) and a mean score of 3.7 (SD = 0.2).
4.3.2.1 Behavioral Measures
The number of instances of each annotation was normalized by the total number of utterances, to correct for variation in the amount of speech between participants and for differences in utterance breaks between coders. Additionally, the number of words spoken by each participant was normalized by the length of the interaction to give a measure of speech quantity for each participant. The proportion of utterances with each label was calculated for each group, and a repeated-measures ANOVA was performed. Corrections for multiple comparisons were done with Holm's step-down procedure.

The measures of female and male pronouns were combined to measure how often the participants used human-like pronouns with the robot, measuring anthropomorphism. This metric builds on the more general measure of whether or not the robot is treated as having agency. A significant effect of the game was found on the rate of personal pronoun usage [F(3,15) = 3.74, p = 0.03] with a medium effect size of 0.365. However, corrected pairwise comparisons did not result in any significant differences. These results can be seen in Figure 4.15.

A significant effect of game on robot-targeted speech was found [F(3,15) = 7.45, p = 0.003], with a large effect size of 0.559. Multiple comparisons showed a significant difference between the choreography game (M = 0.0686, SD = 0.039) and the Go Fish game (M = 0.400, SD = 0.190) with p = 0.04. The results can be seen in Figure 4.16.

The annotations for which the participants' statements definitely ("Yes") or maybe ("Ambiguous") attributed affect (i.e., emotions) to the robot were combined, and a significant effect of the game type was found on this measure [F(3,15) = 3.45, p = 0.04], with a medium effect size of 0.330. There were no significant pairwise differences (Figure 4.18(a)). There were no significant effects of the game on perceived agency (Figure 4.19(a)), verbal and non-verbal sentiment, or robot noun choice in the utterances.
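Holm's step-down procedure, used above to correct the pairwise comparisons, is straightforward to implement. The sketch below (plain Python, with made-up p-values for illustration, not values from this study) computes Holm-adjusted p-values by sorting the raw values, multiplying the i-th smallest by (m - i), and enforcing monotonicity with a running maximum:

```python
def holm_adjust(pvals):
    """Holm step-down adjustment of a list of raw p-values.

    Sort ascending, multiply the i-th smallest p-value by (m - i),
    enforce monotonicity with a running maximum, and cap at 1.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * pvals[i])
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted

# Hypothetical raw p-values from four pairwise comparisons:
holm_adjust([0.01, 0.04, 0.03, 0.005])
# → [0.03, 0.06, 0.06, 0.02] (up to floating-point rounding)
```

An adjusted p-value is then compared directly against the nominal alpha, which is how the corrected comparisons above can fail to reach significance even when the omnibus test does.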
In addition to these between-game differences, we observed differences between groups in the proportion of utterances that indicated that participants perceived the robot as having affect (Figure 4.18(b)) and agency (Figure 4.19(b)), as well as in the amount of speech by each participant (Figure 4.17(b)). For the utterances per minute, marginally significant effects of the game on speech, with small effect sizes, were found for child speech [F(3,15) = 2.59, p = 0.09] (effect size 0.138) and older adult speech [F(3,15) = 2.38, p = 0.11] (effect size 0.0532). No significant effect of game on younger adult speech was found. These results are shown in Figure 4.17(a).

4.3.2.2 Interview Questions
The groups answered interview questions that addressed each of the games, in-home interaction and deployment, and privacy concerns.

Figure 4.15: Proportion of utterances referring to the robot by personal pronouns ("she", "he") vs. neutral pronouns ("it") in the inter-generational interaction study.

Figure 4.16: Proportion of utterances directed to the robot versus others in the inter-generational interaction study.

Games: The most widely appealing game for all participants was the Go Fish game, which was equally well liked by the younger adults, older adults, and children, although the child in group six did not enjoy the game because they thought the robot cheated (the robot's play was completely random). Groups one, two, three, five, and six described the game as "fun" or "enjoyable". Participants in groups one and four specifically mentioned liking the robot's victory dance (performed when it got a book of cards). There were a few problems with the specific implementation, with groups one and six saying that the timing of the game was awkward, and group six mentioning wanting the cards to be bigger.

Figure 4.17: Words per minute in the inter-generational interaction study: (a) by game; (b) by group.

The War and Choreography games were liked better by the children than the adults.
War had problems with timing, with groups three and six wanting the robot to put its card down at a different time, or to have all players put their cards down at the same time, and groups two and five saying the game was boring or too slow. The Choreography game was much more popular with the children than the older adults: while four out of the six children (in groups one, two, three, and five) liked the activity, three of the six older adults (in groups four, five, and six) specifically mentioned that they found the game "boring" or would get bored trying to play it alone. Notably, group three enjoyed the activity very much and danced with the robot, while group four did not like the activity at all, and thought that the researchers should "delete the game".

All of the groups liked the idea of the Scrapbooking game, but thought that the implementation needed more content in terms of decorations that could be added to the image, including more stickers or the ability to draw on the picture. Groups three and six wanted the individual robots in the picture on the tablet to be movable so that they could create a story with the pictures. All groups expressed enthusiasm for the idea of being able to upload their own family pictures into the game, including group four, which did not enjoy the scrapbooking activity but said that they might enjoy it more if there were more stickers and personal pictures.

Figure 4.18: Proportion of utterances in which the robot is treated as having affect by participants in the inter-generational interaction study: (a) by game; (b) by group.

In-home Interaction and Privacy Concerns: Most of the older adults said that they were not interested in playing with the robot alone, although some said that they might play one of the games with the robot.
Two of the older adults (in groups four and six) were interested in playing card games with the robot, although not necessarily the implemented games, and two older adults (in groups two and five) were interested in doing scrapbooking with personal pictures or pictures of famous people. Groups one and three mentioned that playing alone with the robot would only be interesting if no human interaction were available, and group one specifically mentioned not wanting the robot to replace human-human interaction. In general, participants did not think that the games or the robot's behavior were varied enough. Groups one and three mentioned that they wanted the games to be faster and more competitive, and the robot to have more "personality" or "sass".

Figure 4.19: Proportion of utterances in which the robot is treated as having agency by participants in the inter-generational interaction study: (a) by game; (b) by group.

All of the groups were able to think of somewhere they could put the robot and computer setup. Groups three, five, and six said that they would like the setup to be portable (on wheels) so they could move it around, to fit better into the environment or to be reachable by a family member with limited mobility. When asked about privacy concerns with the robot's cameras, groups one, two, three, and five said that they did not have any major concerns with the system. Group one mentioned the idea of putting stickers over the robot's cameras when not in use. Group four was not asked about privacy concerns. In the four groups without privacy concerns, the participants noted that it was important that the robot was stationary and would be placed out of the main areas of their homes and away from anywhere that sensitive information was handled (such as a home office).

4.3.3 Discussion
Intergenerational family groups present unique challenges, with a high degree of variation both between families and between generations within a single family.
In order to provide an interaction context for such groups, we implemented a variety of games for the participants to play with a socially assistive robot, in order to begin to understand how different activities and different roles for the robot might affect the interactions, and how different game types and generations should be considered as we move towards in-home deployments of the system.

Between-Group Differences: The groups participating in this study varied in personality, gender composition, and age, but all indicated a high degree of group cohesion after interacting with the robot. Groups also varied in how much they talked during the interaction, as well as in the degree to which they appeared to attribute agency and emotional capacity to the robot. There was also variation in which games users preferred, with some enjoying the card games and others enjoying the more open-ended games. These between-family differences support the design choice of offering a variety of games, and demonstrate the need for personalized robot behavior.

Intergenerational Differences: The younger adult participants in every group spoke more than either the child or older adult participants, and in each game, the younger adult participant spoke the most on average. This difference was largest in the Choreography and Go Fish games, and smallest in the Scrapbooking game. The older adults spoke most in the War card game, and least in the Go Fish and Choreography games, while the children spoke most in the Scrapbooking game, and least in the Go Fish game. This suggests that the best activities for supporting intergenerational interactions are those in which either each participant has a clear role, or that are sufficiently open-ended for participants to choose their own roles.
However, the War game led to more interactions with the robot than the Scrapbooking game, suggesting that in the latter type of game, a socially assistive robot might need to be more proactive about participating in the interaction. Additionally, the voice of the robot was more clearly heard by children than by older adults. One family in particular reflected this difference, as the older adult kept asking other family members to repeat what the robot had said throughout the game.

Inter-Game Differences: Participants referred to the robot with pronouns most often in the Choreography game, but in many of those cases they referred to the robot as "it", rather than "he" or "she". In the War game and the Go Fish game, the robot was referred to as "he" or "she" a larger proportion of the time than it was referred to as "it". In the Go Fish and War card games, the participants directed more of their speech to the robot, likely because it was actively competing in those games, rather than observing (as in the Scrapbooking game) or being directed by the participants (as in the Choreography game). As discussed above, there was a more equal distribution of speech among all three participants in the War and Scrapbooking games. The participants' speech indicated that they perceived the robot as having agency more often in the Scrapbooking and War games than in the other two games (although this effect was not significant). The same pattern was seen with respect to whether the robot was thought to have affect.

Towards Moderators in Intergenerational Groups: This pilot study demonstrated that an audience exists for SAR systems that serve as in-home companions and support intergenerational family interactions. Additionally, it provided insights into the imbalances that exist in family interactions, with the younger adult participants speaking the most of the three participants, and suggested that there are clear opportunities for a robot moderator to support mutuality in the interactions.
These insights were used to develop a cooperation-focused robot control algorithm, described in Section 7.1.

4.4 Understanding Group Social Interactions
In this section, we describe an analysis of the UTEP-ICT multi-cultural multi-party interaction corpus (Herrera et al., 2010), conducted in order to better understand how groups self-moderate in discussion-based interactions. This corpus includes multi-modal recordings of multi-cultural multi-party interactions, and was used to gain insights into group human-human interaction, including the behaviors that groups use to self-moderate in the absence of a designated moderator.

4.4.1 Methodology
This dataset includes twelve groups of four individuals, each engaging in five different tasks, drawn from three different language/cultural groups: native speakers of Mexican Spanish, native speakers of Arabic, and native speakers of American English. The five tasks, each lasting approximately 9 minutes, are as follows:

1. Discuss your pet peeves;
2. Figure out a movie that everyone has seen and discuss it;
3. Invent a name for a stuffed toy (given to the group immediately prior to the discussion);
4. Jointly develop a story about the stuffed toy;
5. Discuss one cross-cultural experience of each group member.

The participants were video recorded from 6 angles around the room, and audio recorded with lapel microphones. Additional survey data were collected, but the analysis of those data is beyond the scope of this work. A subset of the corpus was annotated to allow an examination of the management of both physical and social resources: the American English groups' toy-naming and story-telling interactions. The data were annotated by undergraduate coders for a number of features relevant to the interaction. In one of the groups, technical issues in audio recording rendered the data unusable for this purpose.
For the remaining groups, the following features were annotated for each participant:

- Speech
- Gaze (towards another participant or the toy)
- Toy-related gestures: holding the toy, bidding to hand the toy off, or bidding to receive the toy

Additionally, an informal annotation was performed of types of speech behaviors, such as questions, interruptions, and backchannels. The analysis of these features was used to develop preliminary moderator behaviors.

4.4.2 Results
In a pilot analysis of this dataset, we found the following patterns:

Interaction goals affect self-moderation: We find differences in participant behavior between the naming and story tasks. In the naming task, where the participants were only given the task goal of agreeing on a name, we observed shorter turns and more interruptions. In the story task, where participants were also instructed that they should tell the story jointly, we find longer turns and more even participation. As seen in Figure 4.21, the story task had more time with only one speaker, and less time with two or more simultaneous speakers.

Figure 4.20: Time spent speaking while holding the toy and while not holding the toy in the UTEP-ICT dataset.

Resource allocation affects interaction state: Additionally, we find that the distribution of resources affects the interaction state: as seen in Figure 4.20, participants spoke a greater proportion of the time when they held the toy. This effect was stronger in the story task, suggesting that participants may have been deliberately using the toy to moderate the interaction.

Moderation behaviors act on interaction resources: Qualitatively, we observed a number of moderation behaviors in the interactions, including open questions (not directed to a particular target), used to release the floor; directed questions, used to pass the floor to an individual; and the passing of the toy, which, as discussed above, may have been used to regulate the conversational floor.
4.4.3 Discussion
This analysis provides preliminary insights into how groups of adults self-moderate interactions, which were used to develop the moderator behaviors described in subsequent chapters. Additionally, this work provided labeled data that were used to develop the speech detection algorithm used in the moderation algorithm described in Section 5.2.

Figure 4.21: Proportion of time with [n] speakers in 100 ms intervals in the UTEP-ICT dataset.

4.5 Summary
This chapter presented four studies of human-robot interaction that inform the development of the robot controllers used in this dissertation. A study of nutrition learning for first graders informed the development of robot behaviors that can be used with children in cognitive learning tasks. A study of intergenerational interactions provided insights into the patterns of interaction in family groups. An analysis of children with autism interacting with a SAR system informed the development of the fourth validation study, in which families including children with autism interact with a moderator robot. Finally, the analysis of group human-human interaction allowed for the development of preliminary speech models, as well as providing insights into how groups self-moderate.

Chapter 5
Moderation Model and Core Algorithm

This chapter introduces the model of moderation and the core moderation algorithm, in which the moderator monitors the interaction and takes actions to affect the social dynamics. The model and core algorithm are then instantiated for open-ended discussion tasks and validated in an open-ended storytelling interaction with adults. We find that participants are accepting of the robot in the moderator role and that the robot moderator is able to change the interaction.

In the context of socially assistive robotics, moderation is the problem of monitoring a multi-party interaction, and choosing both when and how the robot can act in that interaction.
For this work, we primarily use time-based approaches to choosing when the robot should act, and focus on algorithms for selecting robot behaviors in order to achieve both task-related and social goals in the interaction. In this approach, the moderator chooses social behaviors that support either task-related goals or social factors in the interaction state, evaluating which aspects of the interaction to influence using domain-specific functions. This approach is summarized in Section 5.1.

This chapter introduces the core moderation algorithm, which establishes the basic control loop used in all four moderation algorithms: the robot monitors one or more aspects of the interaction, and takes social actions at specified intervals intended to change the interaction. This algorithm is then instantiated with question-asking behaviors and monitoring of the relative speaking time of the group members, directing questions to the least talkative group member at regular intervals. Because the results of the analysis of the UTEP-ICT dataset presented in Section 4.4 show that simultaneous speech is frequent in group discussions, complicating the problem of speech recognition, the robot moderates based only on who is speaking and when, without the use of speech understanding. The algorithm is validated in a storytelling interaction developed to be closely analogous to the task used in the UTEP-ICT dataset.

5.1 Modeling Moderation
In this work, moderation is modeled as a cyclic decision-making process, in which the moderator monitors the interaction state, evaluates the state relative to some number of task and social goals, and chooses social behaviors that change the interaction. When executed by a robot moderator, this serves as a multi-party extension of one-on-one socially assistive robotics: rather than a robot helping an individual achieve goals, the robot is helping a group achieve goals. The model is summarized in Figure 5.1.
We define the moderator as a member of a goal-directed multi-party interaction whose role is to regulate the behavior of the other participants. Although in general a moderator might take both task-related and social actions, we focus on the case where the moderator does not directly participate in the task, but only takes social actions to affect the interaction state. We also distinguish moderation from mediation: our model applies only to interactions where the goals of the group members are not in conflict.

Figure 5.1: The model of the moderation process.

In these interactions, the robot moderator must evaluate the interaction state according to a set of interaction goals, including both task goals and social goals. In most cases, the task goals are the primary purpose of the interaction from the perspective of the participants, such as building something, solving a problem, or winning a game. Task goals can be associated with specific participants or be general to the group. Social goals relate to how the interaction proceeds and to the social interaction between participants. These goals often relate to pairwise features of the interaction, such as speech or gaze, as described in greater detail in Section 7.1.1. Social goals might be decided in advance by participants, but in many cases could be assigned by a supervisor, teacher, or parent.

Finally, in order to achieve the social and task goals, the moderator autonomously takes social actions that change the interaction state. The specific interaction state features are highly task-dependent, but include such things as the amount of speech by each participant, how much participants help each other, the state of the task, and the performance of each individual group member. The social behaviors of the robot are similarly domain-specific, but in this work primarily consist of asking users questions or requesting that they take certain actions in the task.
Algorithm 5.1 Basic Social Moderation Algorithm

    while interacting do
        if time elapsed > set interval then
            Take moderator action
            Reset timer
        else
            Monitor interaction
        end if
    end while

5.2 Moderation by Timed Question-Asking
As described above, this algorithm establishes the basic control loop for moderator behavior: the robot tracks some component of the interaction, then at regular intervals takes actions to change that component of the interaction. This algorithm is instantiated with the social feature of speaking time and the moderator action of asking questions for open-ended discussion tasks.

5.2.1 General Moderation Algorithm
Algorithm 5.1 describes the basic control loop for moderator behavior. Throughout the interaction, the moderator monitors the state, taking actions intended to serve the goals of the interaction at regular intervals. The exact interval is task-dependent and empirically determined; in most of the application domains in this work, it is set to between 20 and 30 seconds. While not acting, the moderator monitors the relevant features of the interaction and engages in unobtrusive, socially appropriate behavior, such as looking at the speaker or watching the users' interactions.

5.2.2 Discussion Task Moderation Algorithm
This section describes an instantiation of this algorithm for discussion tasks in which the goal is to support equal participation between members of the group. In this case, the tracked interaction component is the amount of speech from each participant, and the robot action is a combination of gaze behavior and question-asking.

The behavior for the robot in this context was inspired by the behavior of meeting moderators in the AMI Meeting Corpus, a collection of annotated multi-modal recordings of product design meetings (Mccowan et al., 2005). All questions used the second person, disambiguated by the use of gaze towards the target of the question.
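The control loop of Algorithm 5.1 can be sketched in a few lines of Python. In the sketch below, `act` and `monitor` are hypothetical callbacks standing in for the task-specific moderator action and interaction monitoring, and the simulated clock replaces the ROS-driven timing of the actual system:

```python
class TimedModerator:
    """Minimal sketch of the Algorithm 5.1 control loop.

    `act` and `monitor` are hypothetical callbacks supplied by the
    task-specific instantiation; `interval` is the empirically set
    action period (20-30 s in this work).
    """

    def __init__(self, act, monitor, interval=30.0):
        self.act = act
        self.monitor = monitor
        self.interval = interval
        self.last_action = None  # time of the last moderator action

    def step(self, now):
        """Advance the loop at time `now` (seconds); returns True if acted."""
        if self.last_action is None:
            self.last_action = now
        if now - self.last_action >= self.interval:
            self.act()               # take moderator action
            self.last_action = now   # reset timer
            return True
        self.monitor()               # otherwise, monitor the interaction
        return False
```

Driving `step` with a simulated clock, a 30-second interval over a two-minute interaction yields exactly four moderator actions, which makes the timing behavior easy to verify offline before deploying on the robot.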
Based on a qualitative examination of the AMI corpus, we found that the moderator often used a pre-speech vocalization, such as "um" or "uh", in order to call attention to the subsequent speech. Therefore, each robot question was preceded by a short vocalization and a two-second pause. When not engaging in a moderation behavior, the robot was programmed to look at the speaker, and in the absence of speech, to look around randomly.

The goals in this context are to have all members of the interaction participate, and to ensure that no one person dominates the discussion. In order to achieve this, the moderation algorithm triggers a moderation behavior if some participant has not spoken in a 30-second window, or if any one participant has spoken for more than half of the time in that window. An additional condition, that the moderator not have spoken in the window, ensures that the moderator itself does not dominate the conversation. In practice, the moderator chooses a behavior every 30 seconds, as seen in Algorithm 5.2. The algorithm was validated in a storytelling interaction, described in the following section.

Algorithm 5.2 Instantiated Moderation Algorithm for Storytelling Task

    while time elapsed < activity duration do
        S(i) = speech duration of participant i in the last interval
        if elapsed ≥ [interval length] then
            Look at argmin S(i), for i not currently speaking
            Make nonverbal vocalization
            Wait 2 seconds
            Ask a question
        else
            Look at current speaker
        end if
    end while

5.3 Validation of the Core Moderation Algorithm
In this section, we present the validation of the core moderation algorithm, as described in Section 5.2. The robot monitors the speaking time of all participants and asks questions at regular intervals, directed to the participant who has spoken the least during the interaction. This study validates the instantiated algorithm, as well as the use of robot moderators in open-ended interactions.
We show that a SAR moderator is accepted into human-human interactions, and can improve group cohesion and increase participant speech in those interactions.

5.3.1 Methodology
In this validation study, we examine the effect of a robot moderator in a simple social interaction: a group of participants talking to each other. By isolating the social task and goals, we validate the use of a robot moderator as a tool for improving purely social group interactions, and provide a basis for the work in the following chapters that incorporates task-related goals. As discussed in Section 5.2, the algorithm focuses on the following:

- Moderator behaviors: Gaze and question-asking
- Interaction state: Participant speech
- Interaction goals: No participant dominates the floor and every participant speaks

Figure 5.2: Robot hardware used in validation of the basic moderation algorithm: (a) robot hardware; (b) robot with covering; (c) the CoRDial face.

This algorithm does not use speech recognition or dialogue management; our approach does not require that the moderator understand the conversation, and given the limited nature of the state of the art in dialogue management for unconstrained interaction, avoiding its use allows the robot to participate in more open-ended discussions and scenarios. Future work might incorporate some limited speech recognition, but for this work, we validate that a robot can moderate an interaction based on who is speaking, not on what is said. In order to test this algorithm, we developed a validation study examining three research questions:

1. Does the use of social moderation behaviors by a robot improve the interaction, as measured by behavioral differences and subjective evaluations?
2. What are the effects of social moderation behaviors on participant behavior?
3. Can a robot moderate an interaction without the use of speech recognition?
5.3.1.1 Robot and Behavior
In this work, we used the SPRITE robot, as described in Chapter 3, with the "Chili" skin (Figure 5.2(b)). The robot's behavior was developed using two multi-party interaction corpora. The first corpus, the UTEP-ICT corpus (Herrera et al., 2010), consists of a series of 4-person interactions without a designated moderator, described in Section 4.4. We developed a voice activity detection algorithm (Section 5.3.1.1) using the English-language subset of the corpus. The corpus also provided inspiration for the task used in the validation study. The second corpus, the AMI corpus (Mccowan et al., 2005), consists of a series of 4-person meetings with defined roles for each participant, including a "Project Manager" whose role closely followed that of a moderator, and provided insight into appropriate moderation behaviors for the robot.

Two types of question-asking behaviors were chosen for the robot, based on the labeling schema used in the AMI corpus: in the first type, the robot attempts to elicit a contribution from a participant by asking a general question such as "Do you have anything to add?"; in the second type, the robot institutes a topic change by asking a more specific question such as "Does it [the stuffed toy in the validation interaction] have any sisters or brothers?". This algorithm was implemented in a ROS node, using the results of a voice activity detection algorithm to calculate speaking time, and sending behaviors to CoRDial to control the robot.

Voice Activity Detection: The primary sensing modality of the SAR system was through headset microphones and the use of a voice activity detection (VAD) algorithm, based on the work of Moattar and Homayounpour (2009). In their algorithm, measurements of energy, fundamental frequency, and spectral flatness are compared to
In their algorithm, measurements of energy, fundamental frequency, and spectral atness are compared to 107 threshold values. If at least two features cross these thresholds, the frame is labeled as a \sound" frame. Otherwise, it is labeled as \silence." Five or more consecutive \sound" frames indicate voice activity, while ten or more consecutive \silence" frames designate a lack of voice activity. An important limitation of this algorithm is that, in multi-party interactions, even with directional microphones, it recognizes all speech, including the speech of the other participants in the interaction. To counteract this, we introduced a root mean square (RMS) calculation to measure the signal power contained in an audio frame. Each 10 ms frame of audio is only tested for speech-like features if it crosses the RMS threshold. This allows for the detection of the primary speaker even if another participant is talking in the background. To calculate the RMS threshold, we used the mean RMS measure in 10 ms inter- vals of each of the audio tracks from a subset of the UTEP-ICT corpus that had been hand-annotated for the microphone wearer's speech. Based on a comparison of frames containing the primary speaker's to other frames (which might contain silence, back- ground noise, or other speakers whose speech was picked up by the microphone), we determined that an RMS threshold of 400 provided eective performance. In order to test this approach, the algorithm was used to annotate speech from two English-speaking groups from the UTEP-ICT corpus, and compared with a human- annotated ground truth. This evaluation yielded a mean Cohen's kappa score of 0.869 (SD = 0:057), for eight participants in the two groups (Table 5.1). 5.3.1.2 Study Design This section describes the validation study that evaluated the robot moderator in a group interaction, and examined the results of the two research questions posed above. 
Group (Task)  Speaker A  Speaker B  Speaker C  Speaker D
1 (1)         0.79       0.85       0.84       0.91
1 (2)         0.88       0.95       0.94       0.93
2 (1)         0.85       0.86       0.78       0.92
2 (2)         *          0.80       0.82       0.92

Table 5.1: Kappa scores for the voice activity detection algorithm (*no audio due to microphone malfunction).

Figure 5.3: The multi-party interaction experiment setup: (a) multi-party interaction schematic; (b) multi-party interaction setup with participants.

To evaluate the implemented SAR social moderation system, we invited two groups of four participants (eight participants total) to interact with the robot in a within-groups study design. Each group participated in a seated storytelling interaction with the robot running the moderation algorithm, as well as in an unmoderated control interaction in which the robot looked at the speaker and produced a backchannel utterance ("huh", "okay", etc.) approximately every 30 seconds.

5.3.1.3 Hypotheses
In addition to a qualitative analysis of system performance, the study tested the following hypotheses:
The experimenter then left the interaction area for nine minutes. After nine minutes, the experimenter returned and guided participants back to separate locations, where they completed a second series of questionnaires. The participants were then brought back into the robot interaction area and again completed the same nine-minute storytelling interaction, but this time with a different stuffed toy. The same post-questionnaire protocol was followed afterwards. This completed the experiment.

The above task selected for guiding the interaction is identical to one of the tasks used in the UTEP-ICT corpus, potentially allowing for comparisons between the datasets, although such a comparison is outside the scope of this work. The use of a task that centers around an object also provides a dataset that includes a resource other than the conversational floor, making it more useful than the AMI corpus for the development and parameterization of algorithms in accordance with the moderation model.

5.3.1.5 Measures

Before the first interaction, participants were asked to complete a demographics questionnaire; the interaction and social role subscales of the Negative Attitudes towards Robots (NARS) questionnaire (developed by Nomura et al. (2006) and translated by Bartneck et al. (2005)); and a 10-question Big Five personality inventory (Rammstedt and John, 2007).

After each interaction with the robot, the participants completed a set of questions used in the UTEP-ICT dataset (Herrera et al., 2010), covering the behavior of the other participants in the interaction, additional demographic details, and the relationships between participants; they were also asked what their favorite part of the task was and were provided with space for additional comments. They then completed the Group Cohesiveness Scale (Wongpakaran et al., 2013) and finally repeated the NARS questionnaire.

Participants each wore a wireless headset microphone connected to a 6-channel USB receiver.
The headset microphone audio was passed to the VAD algorithm, which provided input to the system. Additionally, during the interaction, the output of the VAD algorithm was recorded and used as a (conservative) measure of participant speech. Total speech during the interaction was analyzed, as well as speech during the 5 seconds following the end of each robot moderation behavior. For video recording, a high-definition webcam mounted behind the robot was used.

5.3.1.6 Participants

Four groups of participants were recruited from the university and surrounding communities. The average age of the participants was 30.3 years (SD = 6.8); there were 6 male and 9 female participants. All participants had completed a bachelor's degree or higher. Groups 1 and 3 participated in the unmoderated interaction first, while Groups 2 and 4 saw the conditions in the reversed order. Groups 1 through 3 had four participants who were strangers to one another; Group 4 had three participants who knew each other. Because of this, we exclude Group 4 from further statistical analysis.

5.3.2 Results

This section presents the results of the validation study according to the measures described in Section 5.3.1.5.

5.3.2.1 System Performance

The robot was able to successfully participate in four nine-minute interactions with the participants. The VAD algorithm used in this work was found to be a conservative measure of speech; qualitatively, it was observed that although parts of most speech acts were detected, only louder portions of speech (e.g., vowels) were labeled as speech. This resulted in much lower than expected absolute values for total speech (as seen in Figure 5.5); however, relative speech was not impacted. In practice, the algorithm was primarily triggered due to a participant not speaking in an interval. Since the algorithm depends on relative speech to choose a target, this did not impact the choice of moderation targets.
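For concreteness, the RMS-gated frame-labeling rule described in Section 5.3.1.1 can be sketched as follows. This is a simplified illustration using the thresholds and frame counts stated above; the per-frame feature comparison (energy, fundamental frequency, spectral flatness) is abstracted into a count of features that crossed their thresholds, and the function names are hypothetical, not those of the deployed implementation:

```python
import math

RMS_THRESHOLD = 400   # empirically chosen from the annotated UTEP-ICT subset
SOUND_RUN = 5         # consecutive "sound" frames that indicate voice activity
SILENCE_RUN = 10      # consecutive "silence" frames that end voice activity

def frame_rms(samples):
    """Root mean square signal power of one 10 ms frame."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def label_frame(samples, features_crossed):
    """A frame is 'sound' only if it passes the RMS gate AND at least two
    of the energy/F0/spectral-flatness features cross their thresholds."""
    if frame_rms(samples) < RMS_THRESHOLD:
        return "silence"
    return "sound" if features_crossed >= 2 else "silence"

def detect_voice_activity(labels):
    """Convert a sequence of frame labels into per-frame activity flags."""
    active, sound_run, silence_run, flags = False, 0, 0, []
    for label in labels:
        if label == "sound":
            sound_run, silence_run = sound_run + 1, 0
        else:
            silence_run, sound_run = silence_run + 1, 0
        if sound_run >= SOUND_RUN:
            active = True
        if silence_run >= SILENCE_RUN:
            active = False
        flags.append(active)
    return flags
```

The RMS gate is what suppresses the cross-talk described above: frames dominated by a quieter background speaker fail the gate before the feature test is ever applied.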
In practice, the moderation algorithm resulted in the robot asking a question approximately every 30 to 45 seconds; the system never went 60 seconds without being triggered. Since some part of most speech acts was detected, the conservative nature of the VAD algorithm would not have resulted in the robot's behavior being triggered incorrectly. Because the robot only addressed the participants who spoke the least, the number of questions directed to each participant was not equal. As will be discussed in Section 5.3.2.2, those participants who were not addressed by the robot had smaller changes in their speech and perceptions of the group than those participants who were.

Figure 5.4: Participant ratings of group cohesion (4-point scale); dashed lines indicate participants who were not asked questions by the moderator robot.

5.3.2.2 Participant Outcomes

Participants had higher scores on the group cohesion questionnaire in the moderated condition (M = 3.29, SD = 0.387) than in the unmoderated condition (M = 3.03, SD = 0.349). A paired t-test on the group cohesion scores showed marginal significance (p = 0.06), supporting hypothesis H2 (Figure 5.4). Additionally, we found that participants spoke more in the moderated condition (M = 99.08, SD = 48.86) than in the unmoderated condition (M = 77.08, SD = 47.06). A paired t-test on the total participant speech showed a marginally significant difference (p = 0.06), supporting hypothesis H1 (Figure 5.5). As seen in Figure 5.5, speech increased for most participants, not only the participants who spoke the least. Several participants whose group cohesion scores did not increase in the moderated condition were not addressed by the robot (Figure 5.4), and two of the three participants not addressed by the robot also did not see an increase in their total speech (Figure 5.5).

Figure 5.5: Participant speech (seconds); dashed lines indicate participants who were not asked questions by the moderator robot.

No significant difference was found in the
NARS interaction subscale scores or participant speech after the two different types of question.

5.3.3 Discussion

We observed that overall participants had a positive interaction with the robot, finding it engaging and, in the words of one participant, "cute". Participants in the group that saw the moderator robot first found it strange when the robot only participated in the conversation as an observer, writing on the free response portion of the questionnaire, "[the robot] is weird this time, worse than last time". In the group that saw the spectator robot first, participants sat in near-silence for several minutes, despite being told to develop a story about the toy, suggesting that participants may have expected the robot to take on a moderator role, or at least to lead the conversation, and were waiting for it to do so.

In the group cohesion questionnaire scores and overall participant speech we found support for two of our hypotheses, suggesting that using moderation behaviors on a robot in a multi-party interaction can improve the human-human interactions. However, we did not find support for the hypothesis that the moderator robot would decrease participants' negative attitudes towards robots. This may have been because many of the participants had low scores on the NARS (especially the interaction subscale) initially, or may have been due to the fact that the interaction was centered on human-human interaction, not human-robot interaction. A larger cohort of participants might also allow the correlation of NARS scores with perceptions of the robot and reactions to the robot's moderation behaviors.

This work reports a small-n study used to validate the moderation model before developing more sophisticated moderation algorithms that operate on additional resources and robot behaviors.
The results serve to validate that the robot will be accepted by multi-party interaction participants and that the participants will attend to the robot's behaviors in similar settings. Finally, the study enabled us to collect training data from interactions that include a moderator (as in the AMI corpus) and a resource other than the conversational floor and participant attention (as in the UTEP-ICT corpus, which lacks a moderator).

The results of the validation study provide support for the value of robots as moderators, and inform the algorithms and studies described in the following chapters. Researchers frequently find that study participants are willing to do almost anything a robot says (e.g., Bainbridge et al., 2011); the described approach to moderation takes advantage of that impulse in order to improve not only human-robot but also human-human interaction. The presented results validate the use of a moderation algorithm in which the moderator monitors and acts in the interaction, and show that such a robot can improve the cohesion of the group and can encourage participation in a group discussion.

5.4 Summary

This chapter presented the core moderation model, in which the moderator takes social actions to affect the interaction state and improve performance relative to both task and social goals. Based on this model, the core moderation algorithm defines the control loop on which all the subsequent moderation algorithms are based: the moderator monitors the interaction and takes actions at regular intervals to affect the interaction. This core algorithm was instantiated for open-ended discussion tasks where the moderator is trying to encourage the least-talkative participant to speak more, and evaluated in a storytelling interaction. The moderator was shown to improve the interactions, including group cohesion and speaking time, except in one group where the participants already knew each other.
These results support the use of a robot as the moderator in a group interaction, and provide a basis for the development of further moderation algorithms.

Chapter 6

Task Goal Moderation

This chapter introduces the second moderation algorithm, in which the robot moderator takes actions based on the group members' performance on a shared task. This algorithm is instantiated for collaborative tasks with individual goals, where the moderator either equalizes or reinforces performance on the task. This instantiated algorithm is validated in peer group interactions, and it is shown that the moderator's actions change both group cohesion and helpful behavior, depending on both how the moderator chooses which participant to help and how often the moderator addresses each participant.

The results presented in Chapter 5 support the use of a socially assistive robot moderator in group interactions and indicate that such a moderator may be able to improve the interaction, even without the use of speech understanding. In this chapter, we build on those results to develop an algorithm for moderation that provides task-related support to participants in the service of social goals. The moderator observes users' performance on the joint task, selects subgoals of the task according to some objective function, then suggests task actions to support the achievement of that goal. This algorithm is instantiated for collaborative tasks with individual subgoals, and with objective functions that either reinforce or equalize performance of the participants. The algorithm is then validated in peer-group interactions in a collaborative assembly game, and it is found that the robot's task-related suggestions affect the social outcomes of the game.

6.1 Moderation by Task Assistance

The second moderation algorithm builds on the idea of monitoring interaction features by incorporating elements of planning in order to support the group in achieving task goals.
6.1.1 Task Goal Moderation Algorithm

In this algorithm we build on the control loop described in Section 5.2.1, incorporating basic planning in order to select moderator behaviors based on the current state of the task and a task-specific cost function that evaluates the relative utility of the goals. In Algorithm 6.1, the robot provides assistance to support a goal g_mod not yet achieved in the interaction, with cost U(g_mod), such that

    g_mod = argmin_g U(g)    (6.1)

where g is some goal not achieved in the current interaction state. As in Algorithm 5.1, these actions are taken at regular intervals. The goals and actions are both task-specific; an instantiation of this algorithm to reinforce or equalize performance in collaborative tasks is described in Section 6.1.2.

Algorithm 6.1 Goal-Oriented Moderation Algorithm
    while Interacting do
        if Time elapsed > activity interval then
            Choose goal minimizing U(g)
            Choose an appropriate action to support g
            Reset timer
        else
            Monitor interaction
        end if
    end while

6.1.2 Collaborative Task Moderation Algorithm

This algorithm is instantiated for collaborative tasks in which users have individual goals that contribute to the group's performance. We define two different objective functions by which the moderator chooses which goal to support, based on individual performance: one that increases with task performance and one that decreases with task performance. Specifically, in the performance-reinforcing version, the robot chooses a goal that is closest to completion, while in the performance-equalizing version, the robot chooses a goal from the user who has completed the fewest goals for the team so far (i.e., the poorest-performing user).
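As an illustrative sketch, the two selection rules just described can be written as follows; the goal and score representations here are hypothetical stand-ins (a goal is a pair of its owner and the number of actions remaining), not the thesis implementation:

```python
def u_reinforcing(remaining):
    """Cost is the number of actions left, so the goal closest
    to completion is chosen (performance-reinforcing)."""
    return remaining

def u_equalizing(score, remaining, n):
    """Cost grows with the owner's score, so goals belonging to the
    poorest-performing user are preferred; the completion term breaks
    ties toward nearly finished goals (performance-equalizing)."""
    return score + (1 - remaining / n)

def choose_goal(open_goals, scores, n, equalize):
    """Return the open goal minimizing the selected objective.
    open_goals: list of (user, remaining_actions) pairs;
    scores: goals achieved so far by each user;
    n: total actions needed to complete a goal."""
    def cost(goal):
        user, remaining = goal
        if equalize:
            return u_equalizing(scores[user], remaining, n)
        return u_reinforcing(remaining)
    return min(open_goals, key=cost)
```

For example, with n = 4, open goals [("A", 1), ("B", 3)], and scores {"A": 5, "B": 0}, the reinforcing moderator selects A's nearly finished goal, while the equalizing moderator selects B's.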
Thus, if s(p) is the number of goals achieved by user p, n is the total number of actions needed to complete the goal, and r(g_p) is the remaining distance to the goal g_p, measured by the number of actions needed to achieve the goal g_p:

    U_reinforcing(g_p) = r(g_p)    (6.2)

and

    U_equalizing(g_p) = s(p) + (1 - r(g_p)/n)    (6.3)

This algorithm is validated in a collaborative game context; the planning algorithm used to select user actions for the robot to request is described with the specific collaborative game used in the validation study in Section 6.2.

Algorithm 6.2 Instantiated Goal-Based Moderation Algorithm for Collaborative Tasks
    while Elapsed time < interaction length do
        if Time elapsed > 20 s then
            Choose goal g_target = argmin_g U(g)
            Ask a user to take an action that helps achieve g_target
            Reset timer
        else
            Look at players' screens
        end if
    end while

6.2 Validation of Task Goal Moderation

This section presents the results of a validation study of the moderation algorithm described in Section 6.1.1. Per that algorithm, the moderator evaluates the individual task goals according to the participants' performance, then chooses behaviors that either equalize or reinforce user performance on the task. The algorithm is validated in a collaborative assembly game, with 30 participants in 10 groups. We compare the effect of the two objective functions described in Section 6.1.2: performance-equalizing and performance-reinforcing. We show that changing the objective function by which goals are evaluated can change the interaction: the performance-equalizing function increases task performance and decreases group cohesion compared to the performance-reinforcing version. We find that this effect may have been caused by the frequency with which the robot addressed the participants.
These results demonstrate that a robot moderator using the task goal moderation algorithm can affect both social and task features of group interactions, and that a robot moderator can support improved group cohesion without the use of speech understanding.

6.2.1 Methodology

To evaluate the effects of the two different objective functions for moderation, we conducted a double-blind repeated-measures study with groups of three participants. Participants engaged in "training" sessions with a robot moderator and "testing" sessions without moderation, to examine whether the moderator has effects on group performance after moderation. Prior work found that the social and task goals of a group may be opposed to each other (Viller, 1991). In this work, we address the question of whether the performance-equalizing (PE) moderator, while making short-term sacrifices to task performance, might better prepare the group for long-term collaboration than the performance-reinforcing (PR) moderator. Our study addresses the following hypotheses related to this trade-off:

H1: Task performance (as measured by points scored) will be better with the PR moderator.

H2: Participants will report greater group cohesion with the PE moderator.

H3: Participants will speak more with the PE moderator.

H4: Participants will look around more (at the other participants) in sessions with the PE moderator.

We expect that taking social goals into account will maintain intra-group relationships and improve long-term performance. Thus we include the following additional hypotheses:

H5: Participants will speak more in the testing session following the PE moderator interaction.

H6: Participants will look around more in the testing session following the PE moderator interaction.

H7: Participants will have better performance in the testing session following the PE moderator interaction.
Additionally, to understand the participants' views of the specific robot, as well as robots in general, and whether those views change after interacting with a moderator robot, we measured self-reported attitudes towards robots in general, and technology acceptance of the specific robot system, to address the following hypotheses:

H8: After moderation, participants will report a more positive attitude towards robots than reported before the interaction.

H9: Participants will report greater acceptance of the robot as a technology after more sessions with the robot.

6.2.1.1 Task

To study this model of robot behavior in an interactive and enjoyable collaborative activity, we developed a cooperative game loosely inspired by SPACETEAM (http://spaceteam.ca/), a popular collaborative mobile game in which a steady stream of actions needing to be completed is provided to the team, with both information and abilities distributed among the players, so that players need to communicate in order to perform the actions. We use a tablet-based interface to minimize the effect of perception errors on moderation performance.

The goal of the game is for the team to score as many points as possible by creating sets of colored shapes from gray squares. Each user sits in front of an angled tablet that only they can see, on which is displayed the game interface, seen in Figure 6.1. The interface consists of four "machines" that can apply shapes or colors to parts, an "Incoming Parts" area to hold parts that have been sent from other users, a "Finished Parts" area for parts that have been completed according to the goals shown in the lower right-hand corner, and two areas that can be used for sending parts to the other users' screens. The team scores a point every time one of the participants places the two parts shown on the lower right into the "Finished Parts" area.
The participants start with the gray square parts (seen in the center of Figure 6.1) and place them in the machines to give them the correct color and shape. The collaborative challenge of the game stems from the fact that no participant can complete all of their goals without help from the other participants. Specifically, in the low-difficulty version of the game, used as a warm-up, each participant needs help with only one property in their goal (either one shape or one color), while in the high-difficulty version used in the rest of the sessions, each participant needs help with three out of the four properties; that is, the goal uses only one of the machines on their own screen. Once a goal is achieved, the team scores a point, and the participant is given a new randomly-generated goal that maintains the difficulty level.

In the game, s(p) is the number of points scored by the user, n is the total number of properties in the goal (in this game, two properties for each of two parts, for a total of four), and r(g_p) is the number of properties remaining to be applied to achieve the goal. To help the users, the robot chooses the appropriate request from Table 6.1, using Algorithm 6.3, providing information about capabilities or needs the first time it helps with a goal, then providing specific instructions on subsequent attempts.

6.2.1.2 Robot Behavior

In order to generate the robot's specific behavior for this task, Algorithm 6.3 is used. This algorithm chooses a goal based on one of the objective functions described in Section 6.1.2. Once a goal is chosen, the algorithm enables the moderator to select the

Figure 6.1: Screenshot of one user's game screen in the collaborative game.

1. Player <name> needs a <property> token.
2. Player <name> can make <property> tokens.
3. Player <name> should give player <name> a <property> token.
4. Player <name> should give player <name> a token to make [into a] <property>.
5. Player <name> should put the <properties> token in the goal area.
6. Player <name> should make a <property> token.

Table 6.1: Robot-participant requests in the collaborative game.

action that most quickly achieves a subgoal of the goal. The first time the moderator helps with a goal, the robot only provides information, while the second time, the moderator requests a specific behavior. The set of requests that the robot could choose from is found in Table 6.1. The interval for the moderator's behavior was set to 20 seconds, a duration validated in a pilot study with three groups, where it was found by all participants to be an acceptable moderator input rate.

Algorithm 6.3 Collaborative Game Task Request Generation Algorithm
    g: the goal; s_g ⊆ g: an open subgoal of g; T_u: the tokens held by user u; p(t): the properties already applied to token t
    if p(t) = s_g for some t ∈ T, t not in the goal area then
        Choose Request 5
    else
        if |T_u| = 0 for all users u that can help s_g then
            Choose a random s_g
            if It is the first time the moderator helped s_g then
                Choose Request 1
            else
                Choose Request 3
            end if
        else
            Let t be the token maximizing p(t) ∩ s_g
            if User can apply a property in s_g \ p(t) then
                Choose Request 6
            else
                if It is the first time the moderator helped s_g then
                    Choose Request 2
                else
                    Choose Request 4
                end if
            end if
        end if
    end if

6.2.1.3 Experimental Procedure

After completing informed consent, participants were given an initial set of questionnaires to complete (described in Section 6.2.1.4), then invited to sit down at the table, as seen in Figure 6.2(a). Participants were told that they would be playing five six-minute sessions of the game with the robot, and that the robot would provide initial instructions. The experimenter then left the room, and the robot gave a set of simple verbal instructions for the game. After six minutes from the start of the game, the robot said that time was up and the experimenter returned to the room.
This procedure was repeated four more times at the higher difficulty, alternating moderated and unmoderated sessions, and with the system randomizing the order of the two utility functions, so that the experimenter was blind to what type of moderation was taking place. After the third and final sessions, both unmoderated, the participants completed additional questionnaires, described in Section 6.2.1.4 below.

Figure 6.2: The robot and experimental area setup for the validation study for task goal moderation. (a) Experiment area setup; (b) robot used in the study.

The study used the SPRITE robot as described in Chapter 3, shown in Figure 6.2(b). The robot is approximately 30 cm tall (without fur), with a 10 cm range of motion, and uses a mobile phone to display simple facial expressions. In this study, the robot was covered by soft fur, as seen in Figure 6.2(b).

6.2.1.4 Measures

Before the first interaction, participants were asked to complete a demographics questionnaire, the interaction and social role subscales of the Negative Attitudes towards Robots (NARS) questionnaire (developed by Nomura et al. (2006) and translated by Bartneck et al. (2005)), and a 10-question Big Five personality inventory (Rammstedt
(2010); this questionnaire includes questions about the behavior of the other participants in the interaction, additional demographic details, and questions about the relationships between participants. Long-form questions were asked about perceptions of the task and space was provided for any additional comments. In addition to the self-report measures, a number of behavioral measures were col- lected. First, participants' actions in the game were recorded, along with the number of points scored by each participant. Three USB cameras recorded each participant's face, and the OpenFace library was used to annotate video from the cameras for head pose Baltrusaitis et al. (2013, 2016). Headset microphones were used to obtain audio recordings of each participant's voice, and automatic voice activity detection was used to annotate the data, using the method from Van Segbroeck, Tsiartas, and Narayanan Van Segbroeck et al. (2013). 6.2.2 Results Thirteen groups of three participants were recruited for this study from university fresh- man computer science classes and department mailing lists. Two groups were excluded from the analysis due to malfunctions in the robot's programming, and one group was run with only two participants due to a no-show, and is also excluded from the analysis. 127 Ordering 1 Ordering 2 Pre-experiment questionnaires Unmoderated Warm-Up Performance-equalizing Performance-reinforcing moderation moderation Unmoderated Post-round questionnaires Performance-reinforcing Performance-equalizing moderation moderation Unmoderated Post-round questionnaires Table 6.2: The two orderings of conditions in the validation study for task goal moder- ation. Of the ten remaining groups (30 participants), there were 19 male and 11 female par- ticipants. 17 identied as Asian (including South Asian), 6 as white, 2 Hispanic/Latin, and 4 mixed-race or other. Participants' average age was 18.97 years (SD = 1:067). 
Participants had limited prior experience with robots, rating their prior exposure to any kind of robot as 1.93 on a 4-point Likert scale (SD = 1.01). 27 of the participants had lived in the United States longer than any other country, for an average of 17.87 years (SD = 3.14); the remaining 3 participants had lived in East Asia the longest. All participants spoke fluent English. Six of the groups saw the performance-reinforcing moderator first, and four groups saw the performance-equalizing moderator first. Due to technical issues, one group did not have audio or video recording, and one group did not have audio recording; those data were not included in the video or audio analyses.

6.2.2.1 Performance and Group Cohesion

To test hypothesis H1, we compared participants' performance during the moderated conditions with a paired t-test, and found a significant difference in participants' scores in the two conditions [t(29) = -2.0997, p < 0.05]. However, the scores were higher in the performance-equalizing condition (M = 6.60, SD = 1.83) than in the performance-reinforcing condition (M = 6.03, SD = 1.81). Relative to hypothesis H7, the score for performance-equalizing in the post-moderation session (M = 7.133, SD = 1.81) was also higher than the score for performance-reinforcing in the post-moderation session (M = 6.83, SD = 1.98), as seen in Figure 6.3(a). Testing hypothesis H2 with a paired t-test, we found a significant difference in participants' group cohesion scores between conditions [t(29) = 2.3794, p < 0.05]. Group cohesion, however, was higher in the performance-reinforcing condition (M = 3.39, SD = 0.44) than in the performance-equalizing condition (M = 3.26, SD = 0.46). These results can be seen in Figure 6.3(b).

Figure 6.3: Score and group cohesion in the task goal moderation validation study. (a) Individual score; (b) group cohesion. ('*' p < 0.05)
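The repeated-measures comparisons throughout this section rely on paired t-tests over per-participant condition pairs. As a reminder of what that statistic computes, a minimal pure-Python version (a generic sketch, not the analysis code used in the study) is:

```python
import math

def paired_t(xs, ys):
    """Paired t statistic over per-participant condition pairs:
    t = mean(d) / (sd(d) / sqrt(n)), with n - 1 degrees of freedom,
    where d is the vector of within-participant differences."""
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n), n - 1
```

Pairing by participant removes stable individual differences (e.g., a generally talkative participant) from the comparison between conditions, which is why it is the appropriate test for this within-groups design.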
6.2.2.2 Speech and Gaze

To examine participant speech during the interaction, we first filtered the speech to times when a given participant was the only speaker, eliminating cross-microphone speech. To test H5, we compared speech during the two post-moderation sessions with a paired t-test and found no significant differences. There were also no significant differences in speech during the moderation sessions (H3). To determine how participants' head orientation changed during the sessions, we calculated the variance of head pitch and yaw for each session for each participant, then performed the statistical tests. This allowed us to normalize for the mean head position, since each participant sat at a slightly different angle to the camera. There were no significant differences between the post-moderation conditions (H6); however, during the moderation conditions (H4), there was significantly higher variance in participants' head pose in the yaw direction [t(26) = 2.5553, p < 0.05], and a trend towards significance in the pitch direction [t(26) = 1.9555, p = 0.06], with more variance in the performance-reinforcing condition (head pitch: M = 0.174, SD = 0.086; head yaw: M = 0.182, SD = 0.074) than in the performance-equalizing condition (head pitch: M = 0.153, SD = 0.072; head yaw: M = 0.165, SD = 0.060), as seen in Figure 6.4.

Figure 6.4: Average standard deviation of head pitch and yaw for participants in the task goal moderation validation study. ('*' p < 0.05, '.' p = 0.06)

6.2.2.3 Attitude Towards the System

Using an ANOVA to test H8, we found a significant effect of order on the NARS "Negative Attitudes toward Situations and Interactions with Robots" subscale [F(2, 58) = 14.70, p < .001], but no differences in the "Negative Attitudes toward Social Influence of Robots" and "Negative Attitudes toward Emotions in Interaction with Robots" subscales.
A post hoc test with Bonferroni correction showed a significant difference between the baseline (M = 1.41, SD = 0.44) and post-study (M = 1.72, SD = 0.55) scores on the "Attitudes toward Situations and Interactions with Robots" subscale (p_adj = 0.044), and a difference trending on significance between the baseline and mid-study (M = 1.70, SD = 0.46) scores (p_adj = 0.058). Analyzing the UTAUT questionnaire results to test H9, we observed that five of the subscales (Facilitating Conditions, Perceived Enjoyment, Perceived Sociability, Social Presence, and Trust) had modest increases from the mid-experiment to post-experiment questionnaires, and none of the subscales had any decreases from mid- to post-experiment. However, after correcting for the number of tests, there were no significant differences.

6.2.2.4 Effect of Moderator Speech

To better understand the differences between conditions, we performed a post hoc analysis examining the effects of the number of times the robot addressed a participant, calculated as the number of times the robot spoke that participant's name. Based on prior work suggesting that participants in a robot-moderated multi-party interaction who were not addressed by the robot had lower group cohesion scores (Short et al., 2016), we included two additional hypotheses:

H10: Robot speech towards participants will be correlated with greater group cohesion.

H11: Robot speech towards participants will be correlated with greater helpfulness towards other participants.

We examined correlations between this value and both self-reported group cohesion and participant helpfulness, as measured by the number of subgoals that a participant achieved for someone else. We found positive correlations between robot mentions of a participant and both group cohesion scores (r(52) = 0.28, p < 0.05) and helpfulness (r(52) = 0.29, p < 0.05).
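These analyses report Pearson's r between robot mentions and the outcome measures. For reference, the coefficient itself can be computed directly as follows (a generic sketch, not the study's analysis code; it assumes both samples have nonzero variance):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples:
    covariance of the samples divided by the product of their spreads."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

The coefficient ranges from -1 (perfect inverse linear relationship) to 1 (perfect direct linear relationship), with 0 indicating no linear relationship between mentions and the outcome.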
Furthermore, to control for the fact that participants' behavior affected how the robot chose whom to address, we examined the change in the cohesion and helpfulness scores as a function of the change in the number of times the robot addressed the participant between the two different moderation conditions. In that analysis, we found that the change in group cohesion scores was positively correlated with the change in the number of times the participant was addressed by the robot (r(25) = 0.44, p < 0.05), and that the change in participant helpfulness was positively correlated with the change in the number of times the participant was addressed by the robot (r(25) = 0.41, p < 0.05).

6.2.2.5 Attitudes Towards Moderated Activity

The game itself was entertaining to the participants, with 10 out of the 30 participants specifically stating that the game was fun when asked to write about their favorite part of the task or in the additional comments section. However, several participants felt that they did not pay attention to the robot, or that its advice was not as helpful as it could have been. This varied between conditions, with groups taking more of the advice given by the robot in the performance-reinforcing condition (M = 13.1, SD = 2.13) than in the performance-equalizing condition (M = 11.2, SD = 2.74), with the difference trending towards significance [t(9) = 2.0827, p = 0.067]. This result can be seen in Figure 6.6.

(a) Change in group cohesion score (b) Change in number of times participant helped others
Figure 6.5: Changes in participant behavior as a result of changes in the number of times participants were addressed by the robot in the task goal moderation validation study.
6.2.3 Discussion

We developed an algorithm that enables a socially assistive robot to moderate a group activity based on task features and evaluated the algorithm in an assembly game task, with two different utility functions (performance-reinforcing and performance-equalizing), predicting that improved task performance would come at the cost of decreased social cohesion, and vice versa. The results show that the predicted trade-off did occur, but evidence of improved social features, including increased group cohesion scores and more looking around (as measured by variance in head pose), was found in the performance-reinforcing rather than the performance-equalizing condition, in contrast to the predictions made in H1, H2, and H4. Further analysis found evidence that this result may have been caused by differences in how the robot addressed the participants: in the performance-equalizing condition the robot was more likely to repeatedly address the participant with the lowest performance.

Figure 6.6: Number of the robot's suggestions taken by the group in the task goal moderation study, out of 18. ('.' p = 0.067)

We found that the more the robot spoke to participants, the higher group cohesion they reported (supporting H10), and the more they helped the other participants in the group (supporting H11). However, the robot's assistance did come at a cost: the performance of the group was higher in the unmoderated testing sessions (H6 and H7 unsupported), likely due to the fact that the game required frequent verbal coordination between players, and any robot speech decreased the amount of time the players could discuss their strategy, although no significant differences in speech were found between sessions (H3 and H5 unsupported). Further research is needed to determine whether the long-term social benefits of a robot moderator offset short-term decreases in task performance.
Additionally, participants' attitudes towards robots on the "Attitudes toward Situations and Interactions with Robots" subscale of the NARS became more negative (H8 unsupported), and their scores on the UTAUT inventory did not change (H9 unsupported), raising the question of whether the robot increases group cohesion by increasing participants' identification with the other human users in contrast to the robot moderator. Finally, the game task we used was very successful at getting participants involved and excited about working together as a group. The robot's approach to moderating the interaction resulted in participants completing over half of the robot's suggestions, although there are opportunities for improving the timing and salience of the robot's suggestions.

6.3 Summary

This chapter presented the second moderation algorithm: a task-feature-based algorithm that uses an objective function to select task goals for the moderator to support. The algorithm was instantiated for collaborative tasks with individual subgoals and validated in a study of peer-group interactions. In this study, there was a negative relationship between task performance and positive social features, with the performance-reinforcing robot improving group cohesion and helpful behavior but reducing task performance, and the performance-equalizing robot improving task performance but decreasing group cohesion and helpful behavior. Further analysis suggests that this was partially caused by perceptions of group cohesion and willingness to help others increasing the more the robot spoke to a participant. These results demonstrate that task-related moderation can affect social factors, and show that the moderator can improve social features of the interaction even when participants report not paying close attention to the robot's behavior.
Chapter 7

Social Graph Moderation

This chapter presents the third moderation algorithm, which directly models the social relationships between users as a directed graph and takes actions to affect those relationships. This algorithm is instantiated in the same collaborative task context as the previous algorithm, and validated with both groups of adults and family groups including older adults, adults, and children playing a collaborative assembly game.

This chapter describes the third moderation algorithm, which focuses on directly affecting the social relationships between users in a group interaction. These social relationships are represented as a fully-connected directed graph, a natural representation for features such as speech, attention, or helping that have both a producer and a target. The algorithm is instantiated in the same collaborative tasks with individual subgoals as the previous algorithm. Based on the results of the validation study of the task-feature moderation presented in the previous chapter, the robot ensures that it addresses all participants, and either attempts to reinforce or equalize the existing interaction dynamics.

7.1 Moderation by Social Intervention

In this section, we enhance the algorithms described in the previous sections in order to address pairwise social dynamics and enable the moderator to change them. This enables the moderator to monitor and influence not only the task goals, but also the social dynamics of the interaction, many of which occur at the level of directed pairwise interactions between users.

7.1.1 Social Feature Moderation Algorithm

Algorithm 7.1 enables the moderator to address social goals on pairwise social features. Unlike task goals, which are achieved (or not) independently of the relationships between players, many social goals depend on ongoing relationships between users.
Thus we model social features as a fully-connected directed graph, and the moderator's behaviors as influencing the weights of the edges in the graph. Many social features have this property of directedness, including low-level features such as speech and gaze, and high-level features like feelings of friendship and social influence. We then define a selection function f(S) that maps from the social feature graph S to pairs of users:

f(S) → (i, j)   (7.1)

This algorithm uses the same interval-based timing, and updates the values of the edges s_ij connecting vertices representing users i and j.

7.1.2 Collaborative Interaction Moderation Algorithm

In the SAR context, where social interactions are goal-directed, we instantiate this algorithm to address the social feature of helpfulness, that is, how many times participant i helps participant j achieve a goal. Furthermore, we control the number of times that the robot addresses each participant, based on the results of the validation study described in Section 6.2, which show that the number of times the robot speaks to participants may affect their helpful behavior. This is achieved by ensuring that the robot always includes the least-frequently-addressed participant in its statement.

Algorithm 7.1 Pairwise Social Feature Moderation Algorithm
  while Interacting do
    if Time elapsed > set interval then
      Choose users i and j based on f(S)
      Choose an appropriate action to change s_ij
      Reset timer
    else
      Monitor interaction and update s_ij
    end if
  end while

Algorithm 7.2 Instantiated Social Feature Moderation Algorithm for Collaborative Tasks
  Addr_n ← the number of times the robot has spoken to player n
  while Elapsed time < interaction length do
    Find i, j according to Equation 7.3 or Equation 7.2, with i or j = argmin_n(Addr_n)
    Engage in an attention-acquisition behavior
    Ask user i to help user j
    Wait 20s
  end while
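The pair-selection step of Algorithm 7.2 can be sketched as follows. This is a minimal sketch: the dict-of-dicts graph representation, the function name, and the `equalize` flag are assumptions standing in for the implementation details.

```python
def choose_pair(helpfulness, addressed, equalize=True):
    """Pick users (i, j) for the robot's next request (Algorithm 7.2 sketch).

    helpfulness[i][j] is the edge weight s_ij (how often i has helped j);
    addressed[n] is how many times the robot has spoken to player n.
    The least-frequently-addressed player is always one of the pair; the
    equalizing variant then takes the weakest helpfulness edge touching
    that player, and the reinforcing variant the strongest.
    """
    target = min(addressed, key=addressed.get)  # least-addressed player
    # candidate directed edges that include the target player
    edges = [(i, j) for i in helpfulness for j in helpfulness[i]
             if i != j and target in (i, j)]
    pick = min if equalize else max
    return pick(edges, key=lambda e: helpfulness[e[0]][e[1]])
```

The returned pair (i, j) then parameterizes the robot's request that user i help user j.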
In this context, we again examine a moderator that reinforces or equalizes the target element of the interaction, in this case the helpfulness of the users. In order to do this, the moderator chooses users i and j according to either:

(i, j) = f(S) = argmax_(i,j)(s_ij)   (7.2)

or:

(i, j) = f(S) = argmin_(i,j)(s_ij)   (7.3)

Algorithm 7.2 controls the robot's behavior in collaborative contexts by choosing the user who has been addressed the fewest times by the robot, then choosing the minimum-weight edge in the helpfulness graph connected to that user. The robot engages in an attention-acquisition behavior, similar to the nonverbal utterances in the algorithm described in Section 5.2, then generates a suggestion of helpful behavior. These suggestions are generated in a task-specific way, and are described in Section 7.2.1.2. This algorithm is validated in a study that includes both intergenerational and peer-based groups, described in Section 7.2.

7.2 Validation of Social Feature Moderation

We developed a validation study in order to investigate the effects of the social feature moderation, where the edges in the social graph model helping behavior between the participants, and with cooperation-reinforcing and cooperation-equalizing mapping functions to select edges in the social graph, as described in Section 7.1.2. We evaluate the algorithm in the same collaborative game interaction as described in Section 6.2.1.1.

7.2.1 Methodology

This study used the same collaborative game as the validation study for the task-based moderation algorithm (Section 6.1.2), but was evaluated with both peer-based groups and family groups including older adults, adults, and children. As described in Section 7.1.2, the two variations of the algorithm are intended to either reinforce (RE condition) or equalize (EQ condition) helpful behavior between the participants. This study was conducted with a double-blind methodology for increased validity.
For both peer groups and family groups, we predicted that the robot's behavior would influence the group dynamics as follows:

Figure 7.1: Experimental room setup for social feature moderation validation study.

H1: Participants will help each other more equally in the EQ condition than in the RE condition.

H2: Participants' overall helpfulness will be increased more by the EQ moderator.

H3: Participants will report greater group cohesion with the EQ moderator.

Additionally, we predicted that reinforcing established social dynamics would result in less cognitive load, and therefore higher performance, than changing the dynamics, thus:

H4: Task performance will be higher in the RE condition than in the EQ condition.

Relative to the differences between family and peer groups, we predicted that due to having more established social dynamics, families would prove more difficult to influence than peer groups. Based on the results described in Section 4.3, we also make predictions about specific dynamics that will occur in families:

H6: Family groups will have smaller changes in their behavior due to the EQ behavior than peer groups.

H7: Adult family members will be more helpful than older adult or child family members.

H8: Child family members will receive the most help.

Figure 7.2: Screenshot of one user's game screen in the social feature moderation validation study.

7.2.1.1 Task

We study the interaction in the context of the collaborative game described in Section 6.1, in which participants score points for the team by making colored shapes. To increase the appeal of the interaction, the appearance of the game screen was updated, as seen in Figure 7.2. In this game, S_ij, the edge weights in the social feature graph, are updated each time player j scores a point for the team, by counting the number of subgoals that were accomplished with player i's machines. Thus S_ij is the total number of times that player i helped player j by running a piece through a machine.
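The S_ij bookkeeping described above might look like the following. This is a sketch: the class and method names, and the shape of the per-point helper tally, are assumptions rather than the game's actual implementation.

```python
from collections import defaultdict

class HelpfulnessGraph:
    """Edge weights S_ij for the collaborative game (sketch).

    S[i][j] counts how many times player i helped player j: each time
    player j scores a point, the count is incremented once per subgoal
    accomplished using one of player i's machines.
    """
    def __init__(self):
        self.S = defaultdict(lambda: defaultdict(int))

    def record_point(self, scorer, machines_used):
        # machines_used maps each player to the number of subgoals of
        # this point that were accomplished with that player's machines
        for helper, n_subgoals in machines_used.items():
            if helper != scorer:
                self.S[helper][scorer] += n_subgoals
```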
7.2.1.2 Robot Behavior

As described in Section 7.1.2, the robot selects behaviors that either reinforce or equalize the social dynamics of the interaction. The specific robot requests are generated by Algorithm 7.3, and can be seen in Table 7.1. This algorithm provides a specific suggestion to support the helpfulness relationship if possible, and if not, simply suggests that user i should help user j more.

Algorithm 7.3 Collaborative Game Assistance Request Generation Algorithm
  To encourage player i to help player j
  P(n) ← the properties that player n can apply
  G_all(n) ← all properties in player n's goal
  G_open(n) ← the properties still needed by n
  if P(i) ∩ ¬P(j) ∩ G_open(j) ≠ ∅ then
    Choose type 1 request with property in P(i) ∩ ¬P(j) ∩ G_open(j)
  else if P(i) ∩ G_open(j) ≠ ∅ then
    Choose type 1 request with property in P(i) ∩ G_open(j)
  else
    Choose type 2 request
  end if

Type 1: Player i should give player j a (property) token.
Type 1: Player i should help player j with a (property) token.
Type 1: Player j needs a (property) token from player i.
Type 1: Player j needs help with a (property) token from player i.
Type 2: Player i should help player j more.
Type 2: Player i should give player j more tokens.
Type 2: Player j needs to be helped by player i more.
Type 2: Player j needs more tokens from player i.
Table 7.1: Robot suggestions for helpful behavior in the social feature moderation validation study.

7.2.1.3 Experimental Procedure

After completing the informed consent procedure, participants were asked to complete the first set of questionnaires (described in detail in Section 7.2.1.4). Participants were then brought into the experimental area (Figure 7.1) and seated behind each of the tablets.
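The request-generation logic of Algorithm 7.3 can be sketched with set operations, assuming properties are represented as Python sets; the function name and the returned tuple convention are hypothetical.

```python
def choose_request(P_i, P_j, G_open_j):
    """Generate an assistance request for player i to help player j
    (Algorithm 7.3 sketch).

    P_i, P_j: sets of properties players i and j can apply;
    G_open_j: properties still needed for player j's current goal.
    Returns a request type and, for type 1, the property to mention.
    """
    # properties that only player i can provide and player j still needs
    best = (P_i - P_j) & G_open_j
    if best:
        return ("type 1", sorted(best)[0])
    # properties player i can provide and player j still needs
    usable = P_i & G_open_j
    if usable:
        return ("type 1", sorted(usable)[0])
    # no specific suggestion possible: generic "help more" request
    return ("type 2", None)
```

A type 1 result would be rendered with one of the specific templates in Table 7.1, and a type 2 result with one of the generic ones.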
In family interactions, the child was seated in the center seat with the adult and older adult on either side. The participants then played one warm-up round without moderation at an easier difficulty level, followed by alternating moderated and unmoderated rounds at the higher difficulty level. After the third and final rounds, participants were asked to complete additional questionnaires, as described in Section 7.2.1.4. The procedure is summarized in Table 7.2.

Ordering 1: Pre-experiment questionnaires; Unmoderated warm-up; Helpfulness-equalizing moderation; Unmoderated; Post-round questionnaires; Helpfulness-reinforcing moderation; Unmoderated; Post-round questionnaires.
Ordering 2: Pre-experiment questionnaires; Unmoderated warm-up; Helpfulness-reinforcing moderation; Unmoderated; Post-round questionnaires; Helpfulness-equalizing moderation; Unmoderated; Post-round questionnaires.
Table 7.2: The two orderings of conditions in the social feature moderation validation study.

7.2.1.4 Measures

Three cameras and three headset microphones were used to record head-on video and speech audio for each participant, enabling the use of automatic annotation approaches. Additionally, questionnaires were administered to participants at the beginning of the interaction and after the third and fifth rounds of the game. For the peer groups, these questionnaires consisted of a demographics questionnaire and short-form Big-5 personality questionnaire (Rammstedt and John, 2007) before the interaction; a set of questions based on those used in the UTEP-ICT dataset (Herrera et al., 2010) after the interaction; the interaction and social role subscales of the Negative Attitudes towards Robots (NARS) questionnaire (developed by Nomura et al. (2006) and translated by Bartneck et al. (2005)) in all three questionnaire sessions; and the Group Cohesiveness Scale (Wongpakaran et al., 2013) and the Unified Theory of Acceptance and Use of Technology (UTAUT) questionnaire, by Davis (1989) as modified by Heerink et al. (2010), after each of the unmoderated interactions.
The family groups completed the same set of questionnaires, but children were not administered the Big-5 inventory, and only children over 12 years old were asked to complete the other questionnaires. The interaction data were analyzed to determine how much each participant helped other participants, how much each participant was helped by other participants, how many points were scored by each participant and the team, and the overall variance in the helpfulness relations between participants.

7.2.2 Results

Eight groups of participants were recruited for the study: six groups of adults and two family groups. In the adult groups, there were 8 male and 8 female participants (2 participants declined to provide demographic information), and the average age of participants was 21.63 (SD = 8.07). The family groups consisted of one family with members ages 65 (male), 35 (female), and 7.5 (male), and one family with members ages 85 (female), 57 (female), and 12 (male). All adult participants who provided educational information had completed at least some college. One adult group experienced technical difficulties during the session due to a robot malfunction, and is excluded from further analysis.

We analyzed the standard deviation of the weights of the edges of the social graph, that is, the amount that each participant helped each other participant. As seen in Figure 7.3, these results were inconclusive, with the standard deviation varying widely across groups, including one group, represented by the red line, that was a substantial outlier. Thus support for H1 is inconclusive.

Figure 7.3: Standard deviation of users' helpful behavior across the collaboration study conditions. Dotted lines represent family groups.

Preliminary results on users' helpful behavior, as seen in Figure 7.4, however, show some support for H2, with participants engaging in more helpful behavior in the EQ condition.
Figure 7.4: Users' helpful behavior across the collaboration study conditions. Dotted lines represent family groups.

Furthermore, all groups experienced an increase in helpful behavior between the EQ condition and the following unmoderated condition, and also experienced a decrease in helpful behavior from the RE condition to the following unmoderated condition, suggesting that equalizing helpfulness across groups might improve longer-term collaboration. Finally, as seen in Figure 7.5, participants generally reported very high group cohesion scores, with mixed results across the conditions.

7.2.3 Discussion

We conducted a study with five groups of participants, including two inter-generational family groups, in order to validate the social feature moderation algorithm that models pairwise social features of the interaction and intervenes to change them. In this case, the moderator models helpful behavior and takes actions to support participants in helping each other with the task. We found preliminary evidence that the EQ moderator increased helpful behavior in the session following the moderated session, in which the robot only observed the interaction, while the RE moderator decreased helpful behavior in the following session, providing limited support for H2. However, due to a high degree of variation between groups, the results were inconclusive about the effect of the robot on the standard deviation of the edge weights in the social graph, and thus H1 was not supported. Additionally, since helpfulness is directly related to the user's score in the game (participants must help each other in order to score points), we did not find support for H4; in the cases where the participants engaged in more helpful behavior, they had higher scores. Finally, group cohesion scores did not show a strong difference between conditions; therefore we did not find support for H3.
These results suggest that the robot moderator is able to change the social features of the interaction when running the social feature moderation algorithm. Additionally, these changes may endure even when the robot has stopped moderating and is merely observing the interaction. As we continue to collect data as part of this study, we expect that some of these results may become significant.

Figure 7.5: Mean of groups' self-reported group cohesion across the collaboration study conditions. Dotted lines represent family groups.

This study provides further support for the notion that a robot can affect the social dynamics of interactions without the need for speech understanding. Although some participants reported not feeling like they paid attention to the robot, their behavior changed nonetheless as a result of the moderation algorithm being used.

7.3 Summary

This chapter presented a social-feature-based moderation algorithm that models the social features of the interaction as a fully-connected graph, and in which the socially assistive robot moderator takes actions to affect those social features. This algorithm was instantiated in the context of collaborative tasks with individual sub-goals, and validated in a study that included both family groups and groups of adults playing a collaborative game. This study showed that the robot moderator's behavior changed the team's performance and helpful behavior even in subsequent sessions where the moderator was present but not attempting to change the interaction. These results suggest that a socially assistive robot moderator can manage the social dynamics of an interaction and that the effects may endure even when the interaction is no longer moderated.
Chapter 8

Combined Task and Social Moderation

This chapter presents the final moderation algorithm, combining the social-feature and task-based moderation of the previous two chapters into a single algorithm that selects both a task goal and an edge in the social graph to support. This algorithm is instantiated in collaborative learning tasks, and validated with synthetic data and in group interactions with children with autism and their families.

This chapter describes the final moderation algorithm developed as part of this dissertation work. This algorithm brings together the key features of the task-based moderation and social-feature moderation to both support task goals and manage the social dynamics of the interaction. The algorithm is instantiated for collaborative learning tasks, where the task goal selection reduces to selecting the user who will complete the next exercise, and the social feature selection focuses on supporting collaboration between users while preventing a more skilled user from taking over all of the turns in the interaction.

This algorithm is evaluated first with synthetic data simulating high- and low-skill users collaborating on a task, then in a number concept learning game played by children with autism and a sibling in the presence of a parent. Leveraging the results of the nutrition learning study presented in Section 4.1, the robot uses a graded cueing approach to both encourage collaboration and provide feedback on the task. This work demonstrates that the algorithm is able to handle mixed-skill interactions and support collaboration in learning tasks.

8.1 Moderation for Social and Task Performance

The goal of the fourth moderation algorithm is to integrate the goal-based task feature moderation used in Section 6.1.1 and the social feature moderation described in Section 7.1.1.
We also introduce the use of task-based timing of moderation behaviors, for tasks with natural opportunities for moderation, such as turn-based activities. This enables the moderator to support task goals in the interaction, while still influencing the social dynamics in desirable ways.

8.1.1 Combined Task Goal and Social Feature Moderation Algorithm

Algorithm 8.1 combines the task-based moderation of Algorithm 6.1 with the social moderation of Algorithm 7.1. This algorithm includes both a set of goals G = {g_i} with cost function U(g_i), and a social feature graph S = {s_ij}. The algorithm first chooses a goal to address with U(g_i), then chooses an edge in the social graph S that includes the vertex for the user whose goal is being addressed. The moderator then chooses one or more actions in order to support both the task goal and the social feature. When not intervening, the moderator monitors the interaction. This final algorithm introduces the idea of moderation opportunities: task-dependent times when the moderator's input is appropriate, such as between turns or during task feedback. For tasks without such opportunities, the same interval-based timing as in the previous algorithms could be used.

Algorithm 8.1 Combined Task and Social Moderation Algorithm
  while Interacting do
    if Moderation opportunity available then
      Choose goal belonging to user i, minimizing U(g_i)
      Choose user j based on f(S, i)
      Choose one or more actions to support g_i and change s_ij
    else
      Monitor interaction and update social features s_ij
    end if
  end while

8.1.2 Cooperative Learning Task Moderation Algorithm

This algorithm is instantiated for cooperative learning tasks in which a group of users solves a series of exercises which can be grouped into types and for which there is a variety of difficulty levels. In this domain, the robot uses graded cueing to support the users in completing the interaction.
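One moderation opportunity under Algorithm 8.1 can be sketched as follows. This is a schematic only; the callable parameters are assumptions standing in for the task-specific cost function, selection function, and behavior execution.

```python
def moderation_step(goals, utility, social_graph, select_partner, act):
    """One moderation opportunity of the combined algorithm (Algorithm 8.1 sketch).

    goals: dict mapping each open goal g_i to its owning user i;
    utility: cost function U(g_i) to minimize over goals;
    select_partner: selection function f(S, i) over the social graph;
    act: executes behaviors supporting g_i while influencing s_ij.
    """
    g = min(goals, key=utility)      # goal minimizing U(g_i)
    i = goals[g]                     # user whose goal is addressed
    j = select_partner(social_graph, i)
    act(g, i, j)                     # support g_i and change s_ij
    return g, i, j
```

Between moderation opportunities, the moderator would simply observe and update the social feature graph.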
Graded cueing is an occupational therapy technique in which the therapist provides increasingly specific support for a user trying to solve a problem (Bottari et al., 2010). Thus we consider the following features:

1. Co-action, C_i,j: how much player i has intervened during turns assigned to player j.

2. Performance, P_i(e, d): how much of the graded cueing sequence is expected to remain when player i attempts to solve exercise e at difficulty level d. A value of 1 indicates that the user should be able to solve the exercise on the first try, while a value of 0 indicates that the user would not be able to solve the exercise.

"Co-action" is closely related to the idea of helpfulness in Section 7.1, but with the understanding that having one user intervene too much in another's turn is as undesirable, if not more so, than having too little interaction between the users. Based on these two interaction features, the robot provides two types of feedback: first, at the beginning of each exercise, the robot chooses a user to complete the next exercise, and second, the robot provides graded-cueing feedback to encourage the players to help each other in the exercise and provide hints. In this domain, the task goals are solving the exercises, and the robot selects the user who will be asked to complete the exercise. The type of the exercise is held to a specific sequence to ensure that the users see every type of exercise.
The user i is chosen to minimize the following utility function, with t_i the number of turns assigned to player i, and ties broken randomly:

U(i) = t_i − |{k : k ≠ i, c_k,i = 1}|   (8.1)

We select the users i and j from the social graph S, holding i to be the same user as calculated from Equation 8.1, above, and choosing behaviors to reinforce s_i,j = c_j,i:

j = f(S, i) = argmin_(j ≠ i)(c_j,i)   (8.2)

Skill is modeled as the likelihood that a user will provide a correct answer for an exercise, and is calculated as a weighted average of the user's performance on all the exercises. Taking into account the structure in the exercises, the weights are calculated based on a Gaussian distribution scaled to be 1.0 for the same difficulty of the same exercise, decreasing to 0.01 for maximally different exercises (predicting performance on a level 5 exercise based on a level 1 exercise of a different type). Additionally, the weight is held to the maximum value on the difficulty axis for correct answers on more difficult exercises and for incorrect answers on easier exercises, since we expect that being able to solve a more difficult exercise indicates that the user is likely to solve an easier exercise, while not being able to solve an exercise indicates that the user is unlikely to be able to solve a more difficult exercise.

Algorithm 8.2 Instantiated Social and Task Moderation Algorithm for Turn-Based Learning Games
  for Exercise in sequence do
    Engage in an attention-acquisition behavior
    Get next player i from Equation 8.1
    Get helper j from Equation 8.2
    Generate behavior, including selecting the difficulty d of the next exercise, to encourage user j to help user i with the next exercise
    Get annotation c_j,i
    if c_j,i < 1 then
      Update P_i(e, d)
    else
      Update P_j(e, d)
    end if
  end for

More precisely, the weights within a single exercise are calculated as follows, with d the difficulty difference:

w(d) = { k,               if d < 0 for correct answers, or d ≥ 0 for incorrect answers
       { k·e^(−d²/6.949), if d ≥ 0 for correct answers, or d < 0 for incorrect answers
(8.3)

The constant k is defined using the same scaled Gaussian to be 1.0 within the same exercise, and 0.1 for exercises of a different type:

k = { 1.0,    for different difficulties of the same exercise
    { 0.5624, for other exercises of the same type
    { 0.1,    for exercises of a different type
  (8.4)

To initialize the model, we use a prior of 0.5 with weight 0.1. Then for each exercise and difficulty, we can calculate P_i(e, d), the probability that user i can complete exercise e at difficulty d, using the weighted average performance on every exercise that the user has seen, where performance on a single exercise is measured as follows:

p_n = 1 − (number of graded cueing prompts used) / (total graded cueing prompts)   (8.5)

So if w_n is the weight calculated from Equation 8.3 for the first n exercises the user has completed:

P_i(e, d) = (0.5 · 0.1 + Σ_n w_n p_n) / (0.1 + Σ_n w_n)   (8.6)

To calculate C_i,j, the amount that user i has intervened in user j's turns, we take the mean of c_i,j for all turns in the session, where:

c_i,j = { 1.0, if player i solved the problem for player j
        { 0.5, if player i collaborated with player j
        { 0,   if no collaboration occurred between player i and player j
  (8.7)

That is, C_i,j is calculated as follows:

C_i,j = (Σ c_i,j) / n_turns   (8.8)

The values of c_i,j are obtained from live annotation by a human observer in order to minimize the effects of classification error on the results of the algorithm; these labels could be combined with the collected data to create an autonomous co-action detection system, but such a model is beyond the scope of this work.
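The quantities defined in Equations 8.1 through 8.8 can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, d is taken as the difficulty being predicted minus the difficulty observed, and per-exercise performance p_n comes from Equation 8.5.

```python
import math

def exercise_weight(d, correct, k):
    """Weight of one past exercise for the skill estimate (Equation 8.3).

    A correct answer on a harder exercise (d < 0) or an incorrect answer
    on an easier one (d >= 0) is held at the maximum weight k; otherwise
    the weight falls off as a Gaussian in the difficulty gap d.
    """
    if (correct and d < 0) or (not correct and d >= 0):
        return k
    return k * math.exp(-d * d / 6.949)

def predict_performance(history):
    """Weighted-average estimate P_i(e, d) with a 0.5 prior of weight 0.1
    (Equation 8.6). history is a list of (w_n, p_n) pairs."""
    num = 0.5 * 0.1 + sum(w * p for w, p in history)
    den = 0.1 + sum(w for w, _ in history)
    return num / den

def co_action_mean(c_values):
    """C_ij, the mean per-turn co-action annotation (Equation 8.8)."""
    return sum(c_values) / len(c_values)

def next_player(turns, c):
    """Choose the next player by minimizing
    U(i) = t_i - |{k != i : c_ki = 1}| (Equation 8.1).

    turns[i] is the number of turns assigned to player i; c[(k, i)]
    holds the co-action annotation for each ordered player pair.
    Ties are broken by dict order here rather than randomly.
    """
    def U(i):
        solved_for = sum(1 for (k, j), v in c.items()
                         if j == i and k != i and v == 1)
        return turns[i] - solved_for
    return min(turns, key=U)
```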
8.2 Validation of Task Goal and Social Feature Moderation

The combined social and task-based moderation is evaluated in a study of collaborative learning games, in which the robot monitors user performance in order to select task goals, and helping behavior in order to select edges in the social graph, and chooses behaviors to support collaboration and assign exercises to participants. This algorithm was validated with synthetic data, and the use of a robot in this context was validated in interactions with two siblings and one parent playing a number concept learning game with the robot.

8.2.1 Methodology

In this context we studied three variations of the social and task moderation algorithm controlling the robot's behavior: the full moderation (FM) condition, in which the robot engaged in both social and task moderation, using the skill model and the co-action annotations; the goal-only moderation (GM) condition, which modeled user skill to select exercises but did not encourage cooperation; and the social-only moderation (SM) condition, which used co-action annotations to provide input but did not include skill or turn-taking modeling to choose the goal.

8.2.1.1 Task

The task consists of numeracy and early mathematics exercises with a space theme, as seen in Figure 8.1. The exercises are implemented on a tablet interface to minimize the effect of perception errors on moderator behavior. To complete the exercises, shapes on the screen need to be selected or dragged between locations. The exercises address various areas of early numeracy, grouped into three broad categories: number concept learning, ordering and organization, and patterns and sequencing. Each exercise also has five difficulty levels, based in most cases on the magnitude of the quantities involved.

Figure 8.1: Screenshot of the game screen for number concepts games used in the combined moderation validation study.
8.2.1.2 Robot Behavior

As described in Section 8.1.2, the core algorithm involves making three decisions: which player to ask to do the next exercise, whether to encourage more collaboration in the graded cueing responses, and what the difficulty level of the next exercise should be. These values are calculated, then used to control the robot's behavior via Algorithm 8.3. Examples of the robot speech generated by these algorithms can be found in Tables 8.1 and 8.2.

The algorithm is evaluated with three variations: first, the full algorithm, which takes into account both the helping activity and participant skill level, and then two reduced variations, one that does not model skill level and one that does not model co-action. In the first variation, the difficulty of the exercise is chosen randomly; in the second, the robot strictly alternates which user is asked to complete the task, never uses collaboration-encouraging statements in the graded-cueing approach (Column 2 of Table 8.1), and always uses the Type 2 requests from Table 8.2.

Level | Encouraging Collaboration | Not Encouraging Collaboration
0 | That's not quite right! Maybe player (j) has an idea? | That's not quite right! Try again.
1 | (Level 1 hint) Could you get help from player (j)? | (Level 1 hint)
2 | (Level 2 hint) Player (j), do you know the answer? | (Level 2 hint)
3 | (Level 3 hint) Player (parent), can you help? | (Level 3 hint)
4 | Let's try a different problem. | Let's try a different problem.

Table 8.1: Graded-cueing statements to encourage collaboration in the combined moderation validation study.

Type | Robot Speech
1 | Player (i) can do the next one and player (j) can help.
2 | Player (i) should try the next exercise.

Table 8.2: Robot speech for choosing the participant to complete the next exercise in the combined moderation validation study.

8.2.1.3 Procedure

The interaction was conducted in a room with a table on which the robot and a tablet were placed.
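The three decisions described in the robot behavior section above (which player to ask, whether to encourage collaboration, and what difficulty to choose), together with the speech in Tables 8.1 and 8.2, might be sketched as follows. This is a simplified illustration of Algorithm 8.3 under assumed interfaces: the `skill` and `co_action` callables stand in for Equations 8.6 and 8.8, and the threshold `thresh` is an assumed parameter, not a value given in the text.

```python
D_MIN, D_MAX = 1, 5  # five difficulty levels

CUEING = {  # level -> (encouraging collaboration, not encouraging)
    0: ("That's not quite right! Maybe player (j) has an idea?",
        "That's not quite right! Try again."),
    1: ("(Level 1 hint) Could you get help from player (j)?", "(Level 1 hint)"),
    2: ("(Level 2 hint) Player (j), do you know the answer?", "(Level 2 hint)"),
    3: ("(Level 3 hint) Player (parent), can you help?", "(Level 3 hint)"),
    4: ("Let's try a different problem.", "Let's try a different problem."),
}

REQUESTS = {
    1: "Player (i) can do the next one and player (j) can help.",
    2: "Player (i) should try the next exercise.",
}

def ideal_difficulty(skill, player, exercise, thresh=0.5):
    """Most difficult level d with P(e, d) >= thresh, else the minimum."""
    for d in range(D_MAX, D_MIN, -1):
        if skill(player, exercise, d) >= thresh:
            return d
    return D_MIN

def plan_turn(skill, co_action, i, j, exercise):
    """Choose the collaboration flag, difficulty, and request line for
    player i's next turn, with helper j."""
    # Stop encouraging collaboration if the helper already intervenes a lot.
    encourage = co_action(j, i) <= 0.75
    d_i = ideal_difficulty(skill, i, exercise)
    d_j = ideal_difficulty(skill, j, exercise)
    # If i rarely helps j, pick a difficulty both players can manage.
    d = min(d_i, d_j) if co_action(i, j) < 0.25 else d_i
    req_type = 1 if (encourage and skill(i, exercise, d) > 0.8) else 2
    return encourage, d, REQUESTS[req_type]

def cueing_response(level, encourage):
    """Graded-cueing statement after an incorrect answer (Table 8.1)."""
    row = CUEING[min(level, 4)]
    return row[0] if encourage else row[1]
```

In the social-only and goal-only conditions, the corresponding branch would simply be replaced: a random difficulty instead of `ideal_difficulty`, or always column 2 of the cueing table and the Type 2 request.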
The participants were seated such that the children were in front of the robot and the parent was seated to one side. Cameras were set up to capture video of each participant's face, as well as to audio-record the interaction. A schematic of the setup can be seen in Figure 8.2.

Figure 8.2: Experimental room setup for combined moderation validation study.

Participants first completed the informed consent process, then the parent completed the Child Behavior and Temperament Questionnaire (Putnam and Rothbart, 2006) for each child. This was followed by a familiarization period in which the children played through a tutorial explaining the interface. Data were not collected during the familiarization period. Following the familiarization, the family group interacted in three sessions with the robot, each consisting of 15 exercises and lasting approximately 7-10 minutes. The robot engaged in moderation behavior in each of these sessions, providing graded cueing responses as well as suggestions for which participant should complete the next exercise. The co-action feature was obtained from live annotation by an annotator on a separate computer, observing the interaction through USB cameras and a two-way mirror. The order of the conditions was counter-balanced across sessions, and both the experimenter and participants were blind to the order of conditions.

8.2.1.4 Measures

Participants were audio- and video-recorded for automatic analysis. Additionally, all game information was recorded, as well as the live annotations. The Short Form Child Behavior and Temperament Questionnaire was collected for each child (Putnam and Rothbart, 2006).

Figure 8.3: Modeled skill and output difficulty for random-skill user. (a) Estimated skill at exercises; (b) difficulty of exercises chosen.
The video data were analyzed for head pose, and the game information was analyzed to determine how many turns each child took, how much co-action occurred, how much the parent intervened, and the performance of each child at the task.

8.2.2 Algorithm Validation on Synthetic Data

In order to validate the core elements of the algorithm, we tested the system with synthetic data generated based on a variety of conditions that might be found in sibling interactions. We examined the output of the algorithm in terms of the predicted skill of the participant, the behavior of the robot, and the difficulties of the exercises chosen. For the synthetic player behavior, we used five types of performance behavior and three types of collaborative behavior. The performance behavior patterns were as follows:

1. Both children have a high level of skill at the game, with performance increasing as difficulty decreases.
2. Both children have a low level of skill at the game, with performance increasing as difficulty decreases.
3. One child has a high skill level while the other has a low skill level.
4. Each player has a low level of skill on a single exercise type.
5. The players' skill at the game is generated randomly.

The collaborative behavior patterns were:

1. One player takes over the other player's turns occasionally.
2. The players only help each other if they have a high level of skill at the exercise.
3. The children's collaboration behavior is generated randomly.

Each combination of behaviors was run through a "session" of 300 randomly-generated exercises. This number of exercises was determined by the application domain: it is the number that children would experience by playing 15 exercises per day, five days per week, for four weeks.

8.2.3 Results

We found that with synthetic data, the algorithm was able to successfully estimate users' skill at the games. Figure 8.3(a) shows the algorithm's convergence given random input.
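The random-skill synthetic input mentioned above could be generated with a sketch like the following. The exercise-type names and the generator function are illustrative assumptions; the session length follows the 15-exercises-per-day, five-days-per-week, four-week schedule described above, and performance is drawn in increments of 0.2 (0-5 incorrect answers per exercise).

```python
import random

EXERCISE_TYPES = ["number-concepts", "ordering", "patterns"]  # assumed names
DIFFICULTIES = [1, 2, 3, 4, 5]
SESSION_LENGTH = 15 * 5 * 4  # 300 exercises, as described above

def generate_random_session(seed=0):
    """One synthetic session for the random-skill pattern: each exercise's
    performance lies in [0, 1] in increments of 0.2."""
    rng = random.Random(seed)
    session = []
    for _ in range(SESSION_LENGTH):
        exercise = rng.choice(EXERCISE_TYPES)
        difficulty = rng.choice(DIFFICULTIES)
        performance = rng.randint(0, 5) / 5.0
        session.append((exercise, difficulty, performance))
    return session
```

Averaged over a session, this input has an expected performance of 0.5, which is what the skill estimate converges to in the results below.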
Because the number of incorrect answers was randomly chosen from the interval [0, 5] (performance in the interval [0, 1] in increments of 0.2), the user's average performance over many instances of the tasks is 0.5, which the algorithm converges to. However, the difficulty levels of the exercises chosen vary widely, because the distribution of skill across exercises is flat, as seen in Figure 8.3(b).

Figure 8.4: Modeled skill and output difficulty for low-skill user. (a) Estimated skill at exercises; (b) difficulty of exercises chosen.

Figures 8.5(b) and 8.4(b) show the algorithm's output difficulty for the exercises with a high-skill and low-skill user, respectively. As expected, the algorithm quickly converges to a low difficulty for the low-skill user, and a high difficulty for the high-skill user. However, as seen in Figures 8.5(a) and 8.4(a), these difficulties result in similar skill estimates, ideally resulting in each participant being challenged in the activity.

In mixed-skill groups, the algorithm selects the difficulty of the exercise in order to encourage collaboration. Figure 8.6(a) shows the difficulty selection for the higher-skill user in a mixed-skill interaction. In that case, the higher difficulty level is chosen when the robot is not encouraging collaboration, and the lower difficulty level is chosen when collaboration is too low. In comparison, the task-goal-only moderation results in a lower degree of collaboration, as seen in Figure 8.6(b). Collaboration is slightly higher in the social-feature-only moderation, since the difficulty is chosen randomly, and the lower-skill user may be able to help with some of the less difficult, randomly-produced exercises.

Figure 8.5: Modeled skill and output difficulty for high-skill user. (a) Estimated skill at exercises; (b) difficulty of exercises chosen.

The skill model is also able to handle the case where a player has a low level of skill in only one game type.
As seen in Figure 8.7(a), the algorithm quickly converges to provide a higher level of difficulty for one game than the other. Additionally, as shown in Figure 8.7(b), the algorithm matches the difficulty level to the user's skill at each game in order to create more similar performance.

Finally, in the condition where one player takes over some exercises from the other, the task goal algorithm compensates, and assigns additional turns to the player who is being overshadowed. Additionally, the robot stops encouraging collaboration during that player's turn, while still encouraging collaboration during the turns of the player who is taking over. In the test dataset, Player 2 took over 110 of Player 1's turns, and so Player 1 was given 204 turns to Player 2's 95 turns. Additionally, collaboration was encouraged only for the first four of Player 1's turns, and for all of Player 2's turns. The task-goal-only moderation split the turns evenly, and never encouraged collaboration.

Figure 8.6: Modeled skill and output difficulty for high-skill user. (a) Output difficulty for higher-skill user in mixed-skill interaction; (b) co-action scores for algorithm variations (red: full algorithm; green: no task goal modeling; blue: no social goal modeling).

8.2.4 Pilot Validation with Families

Following the protocol described in Section 8.2, four families were recruited to play the number concepts game with the Kiwi robot. All families included one child 4-6 years old and one older sibling 4-13 years old. In three of the families, the younger child was a child with autism. One of the families terminated the study early after the younger child withdrew assent to participate due to becoming bored with the interaction.

Overall, we observed that the families were accepting of the robot in the moderator role, and tried to execute the exercises as requested by the robot. The siblings were able to collaborate on most of the exercises, although as discussed below, the collaboration was not always equal.

Figure 8.7: Modeled skill and output difficulty for mixed-skill user. (a) Output difficulty for two game types for a user with low skill at one game (high-skill game in green, low-skill game in blue); (b) estimated skill for two game types for a user with low skill at one game (high-skill game in purple, low-skill game in red).

We found that the behavior of the parent had a strong effect on the exercise performance and collaboration between siblings, with most parents providing scaffolding and graded-cueing style feedback before the submission of a response to the robot, so that participants got the exercises correct "on the first try" from the perspective of the system: group 1 averaged 1.3 cues per exercise, group 2 averaged 0.3 cues per exercise, and group 3 averaged 0.0 cues per exercise. Because of this, the graded cueing system was rarely triggered; however, the algorithm compensated by encouraging collaboration before the beginning of the exercise. The parent also mediated the interaction in collaboration with the robot, helping the children develop strategies for working together, such as taking turns moving tokens into the target area, or each solving part of the exercise.

Although we observed that the children were able to collaborate to some extent, in many cases one child dominated the interaction by telling the other sibling what to do, or doing more than half of the exercise during what was intended to be their sibling's turn. Although the parent sometimes attempted to reduce this behavior, the collaboration-encouraging statements of the robot in some cases also encouraged the more aggressive sibling to continue to dominate the interaction.

8.2.5 Discussion

We conducted a validation of the combined task goal and social feature moderation algorithm using synthetic data, in order to explore the algorithm's behavior under a number of conditions.
We found that the algorithm behaved as expected when estimating user performance, and that the full algorithm was better able than either the task-goal-only or social-feature-only variations to handle mixed-skill users who only collaborate when they know the answer to a problem, as well as the case where one player regularly takes over the other player's turn.

We also conducted a preliminary validation of the system with four families, and found that the robot was able to work with the parent to encourage siblings to collaborate on number concepts exercises. We found support for the design of the algorithm, especially encouraging collaboration both during graded cueing and before the start of the exercise. The adaptive difficulty modeling was able to suggest more difficult exercises that the children were able to complete with the parent's support. We also find promising directions for future work, including modeling dominance and suggesting strategies for collaboration, and having the robot more explicitly leverage the parent's presence in order to improve the interaction.

8.3 Summary

This chapter presented a moderation algorithm that combines task-based and social-feature moderation by first choosing a task goal associated with some user, then choosing an edge in the social graph including that user. This algorithm was instantiated for collaborative learning games, and evaluated with both synthetic data and in an interaction with children with autism and their families. The algorithm was shown to quickly adapt to different skill levels, and to support collaboration even in the case where the children had very high performance at the task. In the pilot study, the use of the combined algorithm for moderation of family interactions was accepted by participants, and was shown to be effective at encouraging collaboration, especially with the help of a co-present parent.
Algorithm 8.3 Turn-Taking Collaborative Task Behavior Generation Algorithm

Given next player i, helper j
C_{j,i} calculated from Equation 8.8
P_i(e, d) is calculated from Equation 8.6
if C_{j,i} > 0.75 then
    c = False, do not encourage collaboration
else
    c = True, encourage collaboration
end if
Calculate ideal difficulties d_i and d_j, where d_n is the most difficult exercise such that P_n(e, d_n) >= thresh, or d_n = d_min (the minimum difficulty) if no difficulty meets the threshold, for n = i and n = j
if C_{i,j} < 0.25 then
    d = min(d_i, d_j)
else
    d = d_i
end if
Engage in an attention-acquisition behavior
if c and P_i(e, d) > 0.8 then
    Say request for player p from Table 8.2, type 1.
else
    Say request for player p from Table 8.2, type 2.
end if
Set difficulty of the next exercise to d.
Let k = 0
for each incorrect answer do
    if c then
        Say graded cueing response from Table 8.1, column 1, at level k
    else
        Say graded cueing response from Table 8.1, column 2, at level k
    end if
    Increment k
end for

Chapter 9

Summary

This dissertation presented a model that treats moderation as a decision-making problem in which the moderator must choose social behaviors that support task goals and influence social dynamics in goal-oriented multi-party interactions. Using the model, four algorithms for moderation were developed to enable a socially assistive robot to support multi-party human-human interactions. Using the results of the studies of human-robot interaction presented in Chapter 4 to inform the behavior of the robot, four validation studies were performed, one for each of the moderation algorithms. These studies showed that participants were accepting of the robot in the moderator role, and that a robot moderator could support positive features of human-human interaction such as helpfulness and group cohesion.

Moderation was formalized as the process by which an autonomous agent regulates a multi-party interaction by monitoring the interaction state and taking actions to support both task and social goals.
Based on the model of moderation, four algorithms were developed. The first algorithm established the basic control loop for moderation: monitoring the state and taking actions at regular intervals to support users' interaction. The second algorithm built on this work to incorporate elements of planning for task goals: the SAR moderator chose a goal based on an objective function and took actions to support that goal. The third algorithm allowed the moderator to support goals related to the pairwise social features of the interaction. The final algorithm incorporated both the task goal moderation and the social feature algorithm to enable a moderator to address both social and task goals. Each of the four moderation algorithms described above was instantiated into a specific application domain, with domain-specific interaction features and objective functions, and then evaluated in that domain. The basic moderation algorithm was evaluated in a four-person storytelling task with peer groups of adults, the task goal moderation was evaluated in peer groups of adults in a collaborative game, the social feature moderation algorithm was evaluated in intergenerational family groups playing the same collaborative game, and the combined task and social moderation algorithm was evaluated in family groups including children with autism, a parent, and a sibling.

In addition to the moderation model, several secondary contributions enabled the primary work of this dissertation and contribute to our understanding of users in the application domains addressed by this work. The Stewart Platform Robot for Interactive Tabletop Engagement (SPRITE), a small, friendly tabletop robot, was designed to have six-degree-of-freedom expressive movement and a socially expressive face. The robot was used in all of the validation studies and one of the domain understanding studies.
The Co-Robot Dialogue System (CoRDial), an open-source dialogue system, was developed to enable socially assistive interactions. Including both robot control for the SPRITE robot and robot-independent code for enabling synchronized speech and behavior in socially assistive contexts, this robot control stack was released to the research community. Two pre-existing datasets were annotated for social features and analyzed as part of this work: a study of children with autism interacting with a socially assistive robot, and a multi-cultural multi-modal multi-party interaction study. Additionally, two novel studies were completed as part of this work and informed the development of robot behaviors for the validation studies: a long-term study of nutrition learning for first-grade children, and a study of multi-generational family interactions with a socially assistive robot.

A number of choices were made in scoping the work presented in this dissertation; each of these choices provides a starting point for future development of robot moderators. One possible direction for future work is personalized moderation. The algorithms presented as part of this dissertation use closed-loop control and change the robot's behavior depending on the dynamics of the specific group, but the specific behaviors used are the same across all participants. Especially as robots move into homes and long-term interactions, it may become necessary to personalize the robot's behavior to the preferences of the individual: one user might be more likely to take the robot's suggestion if asked nicely, while another might prefer the robot to be more firm. Incorporating online learning algorithms into the robot's behavior would enable this personalization over time, and could make the moderator more effective. Another area of future work is addressing the problem of mediation, or the case where the goals of the individual group members are in conflict.
In this case, the robot would need to more explicitly model the desires of each group member and choose which goals to address in the interaction, either by guiding the group to consensus, enforcing compromises, or using other social features such as seniority to choose a user whose goals take precedence. The problem of when to moderate is not directly addressed in this work. Future research might directly model the dynamics of the interaction in order to enable the moderator to take appropriate turns and make decisions about whether or not the moderator's intervention is needed. Such work might also need to incorporate personalization, not to the individual, but to a group's preferences: some groups might wish to have the moderator closely monitor and intervene as soon as a goal is not being optimally met, while others might prefer that the moderator monitor the interaction less closely and intervene only when the interaction is in a highly undesirable state. Finally, future work might address the case where the robot can take both task actions and social actions as both a participant in and moderator of a group interaction. In this case, the moderator might be able to implicitly moderate the interaction by choosing task actions that would lead to desirable inter-user interactions. Alternatively, the moderator might need to choose between taking actions that advance the task directly or engaging in a moderation behavior, trading off explicitly between task performance and moderation goals.

One consistent area of ethical concern in socially assistive robotics is that the robot interaction partner will replace human-human contact and further isolate vulnerable users. This work demonstrates another path for socially assistive robotics: a robot supporting and enabling human-human interaction. This approach allows the richness of the interaction to come from the relationships between human users, and supports people in turning towards each other rather than away.
Future robot moderators might unobtrusively smooth human-human interaction in a variety of contexts, combating social isolation and building human relationships. 171 Bibliography J. A. Adams. Multiple robot / single human interaction: eects on perceived workload. Behaviour & Information Technology, 28(2):183{198, Mar 2009. doi: 10.1080/01449290701288791. M. Adkins, M. Burgoon, and J. F. Nunamaker. Using group support systems for strate- gic planning with the United States Air Force. Decision Support Systems, 34:315{337, 2002. S. Al Moubayed, J. Edlund, and J. Gustafson. Analysis of gaze and speech patterns in three-party quiz game interaction. Interspeech, pages 1126{1130, 2013. American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disor- ders (DSM-5R ). American Psychiatric Pub, 2013. E. Anagnostou, L. Zwaigenbaum, P. Szatmari, E. Fombonne, B. A. Fernandez, M. Woodbury-Smith, J. Brian, S. Bryson, I. M. Smith, I. Drmic, and Others. Autism spectrum disorder: advances in evidence-based practice. Canadian Medical Associa- tion Journal, 186(7):509{519, 2014. S. R. Anderson and R. G. Romanczyk. Early Intervention for Young Children with Autism: Continuum-Based Behavioral Models. Research and Practice for Persons with Severe Disabilities, 24(3):162{173, sep 1999. ISSN 02749483. doi: 10.2511/rpsd.24.3.162. P. M. Aoki, M. H. Szymanski, L. Plurkowski, J. D. Thornton, A. Woodru, and W. Yi. Where's the "Party" in "Multi-Party"? Analyzing the Structure of Small-Group Sociable Talk. In Proceedings of the 2006 Conference on Computer Supported Co- operative Work (CSCW '06), page 10, New York, New York, USA, nov 2006. ACM Press. ISBN 1-59593-249-6. doi: 10.1145/1180875.1180934. E. Avrunin, J. Hart, A. Douglas, and B. Scassellati. Eects related to synchrony and repertoire in perceptions of robot dance. In 6th ACM/IEEE International Conference on Human-Robot Interaction (HRI), pages 93{100, Lausanne, Switzerland, 2011. 172 W. A. Bainbridge, J. Hart, E. S. 
Kim, and B. Scassellati. The eect of presence on human-robot interaction. In The 17th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN 2008), pages 701{706, New York, NY, USA, 2008. IEEE. W. A. Bainbridge, J. W. Hart, E. S. Kim, and B. Scassellati. The Benets of Interactions with Physically Present Robots over Video-Displayed Agents. International Journal of Social Robotics, 3(1):41{52, 2011. doi: 10.1007/s12369-010-0082-7. T. Baltrusaitis, P. Robinson, and L.-P. Morency. Constrained Local Neural Fields for Robust Facial Landmark Detection in the Wild. In 2013 IEEE International Conference on Computer Vision Workshops, pages 354{361. IEEE, dec 2013. ISBN 978-1-4799-3022-7. doi: 10.1109/ICCVW.2013.54. T. Baltrusaitis, P. Robinson, and L.-P. Morency. OpenFace: An open source facial behavior analysis toolkit. In 2016 IEEE Winter Conference on Applications of Com- puter Vision (WACV), pages 1{10. IEEE, mar 2016. ISBN 978-1-5090-0641-0. doi: 10.1109/WACV.2016.7477553. M. J. Barnes, J. Y. C. Chen, F. Jentsch, and E. S. Redden. Designing Eective Soldier- Robot Teams in Complex Environments: Training, Interfaces, and Individual Dif- ferences. In International Conference on Engineering Psychology and Cognitive Er- gonomics, pages 484{493, Orlando, FL, USA, 2011. Springer, Berlin, Heidelberg. S. Baron-Cohen. Autism and symbolic play. British Journal of Developmen- tal Psychology, 5(2):139{148, jun 1987. ISSN 0261510X. doi: 10.1111/j.2044- 835X.1987.tb01049.x. S. Baron-Cohen, A. M. Leslie, and U. Frith. Does the autistic child have a "theory of mind" ? Cognition, 21(1):37{46, 1985. ISSN 00100277. doi: 10.1016/0010- 0277(85)90022-8. I. Baroni, M. Nalin, M. C. Zelati, E. Oleari, and A. Sanna. Designing motivational robot: How robots might motivate children to eat fruits and vegetables. In 23rd IEEE International Conference on Robot and Human Interactive Communication (Ro-Man 2014), Edinburgh, Scotland, 2014. ISBN 9781479967643. C. Bartneck, T. 
Nomura, T. Kanda, T. Suzuki, and K. Kennsuke. A cross-cultural study on attitudes towards robots. In Proc. HCI International Conference, Las Vegas, NV, jan 2005. R. Beard, T. McLain, M. Goodrich, and E. Anderson. Coordinated target assignment and intercept for unmanned air vehicles. IEEE Transactions on Robotics and Au- tomation, 18(6):911{922, dec 2002. doi: 10.1109/TRA.2002.805653. 173 T. W. Bickmore, R. A. Silliman, K. Nelson, D. M. Cheng, M. Winter, L. Henault, and M. K. Paasche-Orlow. A randomized controlled trial of an automated exercise coach for older adults. Journal of the American Geriatrics Society, 61(10):1676{1683, 2013. D. Bohus and E. Horvitz. Models for Multiparty Engagement in Open-World Dialog. Computational Linguistics, pages 225{234, sep 2009. D. Bohus and E. Horvitz. Facilitating multiparty dialog with gaze, gesture, and speech. In International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction (ICMI-MLMI '10), page 1, New York, New York, USA, nov 2010. ACM Press. ISBN 9781450304146. doi: 10.1145/1891903.1891910. D. Bohus and E. Horvitz. Multiparty turn taking in situated dialog: Study, lessons, and directions. Proceedings of the SIGDIAL 2011 Conference, (1974):98{109, jun 2011. C. L. Bottari, C. Dassa, C. M. Rainville, and E. Dutil. The IADL prole: development, content validity, intra- and interrater agreement. Canadian Journal of Occupational Therapy. Revue Canadienne d'Ergotherapie, 77(2):90{100, apr 2010. ISSN 00084174. doi: 10.2182/cjot.2010.77.2.5. C. Breazeal. Toward sociable robots. Robotics and Autonomous Systems, 42(3-4):167{ 175, 2003. ISSN 09218890. doi: 10.1016/S0921-8890(02)00373-1. L. N. Brown and A. M. Howard. The positive eects of verbal encouragement in mathematics education using a social robot. In 2014 IEEE Integrated STEM Ed- ucation Conference, pages 1{5. IEEE, mar 2014. ISBN 978-1-4799-3229-0. doi: 10.1109/ISECon.2014.6891009. M. Br une and U. Br une-Cohrs. 
Theory of mind{evolution, ontogeny, brain mechanisms and psychopathology. Neuroscience & Biobehavioral Reviews, 30(4):437{455, 2006. ISSN 01497634. doi: 10.1016/j.neubiorev.2005.08.001. CDC, Centers for Disease Control and Prevention. Prevalence of Autism Spectrum Dis- order Among Children Aged 8 Years Autism and Developmental Disabilities Moni- toring Network, 11 Sites, United States, 2010, mar 2014. S. Chaudhuri, H. Thompson, and G. Demiris. Fall detection devices and their use with older adults: a systematic review. Journal of Geriatric Physical Therapy (2001), 37 (4):178, 2014. S. S. A. Chen and V. Bernard-Opitz. Comparison of personal and computer-assisted instruction for children with autism. Mental Retardation, 31(6):368, 1993. 174 T. L. Chen, M. Ciocarlie, S. Cousins, P. M. Grice, K. Hawkins, C. C. Kemp, D. A. Lazewatsky, A. E. Leeper, A. Paepcke, C. Pantofaru, W. D. Smart, and L. Takayama. Robots for humanity: using assistive robotics to empower people with disabilities. IEEE Robotics & Automation Magazine, 20(1):30{39, mar 2013. ISSN 1070-9932. doi: 10.1109/MRA.2012.2229950. C. Clabaugh, G. Ragusa, F. Sha, and M. Mataric. Designing a socially assistive robot for personalized number concepts learning in preschool children. In 2015 Joint IEEE In- ternational Conference on Development and Learning and Epigenetic Robotics (ICDL- EpiRob), pages 314{319. IEEE, aug 2015. ISBN 978-1-4673-9320-1. doi: 10.1109/DE- VLRN.2015.7346164. A. S. Clare, M. L. Cummings, J. P. How, A. K. Whitten, and O. Toupet. Operator Object Function Guidance for a Real-Time Unmanned Vehicle Scheduling Algorithm. Journal of Aerospace Computing, Information, and Communication, 9(4):161{173, dec 2012. doi: 10.2514/1.I010019. R. Clavel. Delta: A Fast Robot with Parallel Geometry. In Proc. Int. Symposium on Industrial Robotics, pages 91{100, Lausanne, Switzerland, 1988. K. M. Colby. The rationale for computer-based treatment of language diculties in nonspeaking autistic children. 
Journal of Autism and Childhood Schizophrenia, 3(3): 254{260, jul 1973. ISSN 0021-9185. doi: 10.1007/BF01538283. E. Y. Cornwell and L. J. Waite. Social disconnectedness, perceived isolation, and health among older adults. Journal of Health and Social Behavior, 50(1):31{48, 2009. F. D. Davis. Perceived Usefulness, Perceived Ease of Use, and User Acceptance of Information Technology. Management Information Systems Quarterly, 13(3):319, sep 1989. ISSN 02767783. doi: 10.2307/249008. G. Dawson, S. Rogers, J. Munson, M. Smith, J. Winter, J. Greenson, A. Donaldson, and J. Varley. Randomized, Controlled Trial of an Intervention for Toddlers With Autism: The Early Start Denver Model. Pediatrics, 125(1), 2010. I. de Kok and D. Heylen. Multimodal end-of-turn prediction in multi-party meet- ings. In International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction (ICMI-MLMI '09), page 91, New York, New York, USA, nov 2009. ACM Press. ISBN 9781605587721. doi: 10.1145/1647314.1647332. P. Dillenbourg. What do you mean by \collaborative learning" ? Collaborative Learning: Cognitive and Computational Approaches, 1(6):1{15, 1999. ISSN 08895406. doi: 10.1.1.167.4896. 175 J. Edlund, S. Alexandersson, J. Beskow, L. Gustavsson, M. Heldner, A. Hjalmarsson, P. Kallionen, and E. Marklund. 3rd Party Observer Gaze As a Continuous Measure of Dialogue Flow. In International Conference on Language Resources and Evaluation (LREC 2012), Istanbul, Turkey, jan 2012. ISBN 978-2-9517408-7-7. J. Edlund, M. Heldner, and M. W lodarczak. Catching wind of multiparty conversation. In Multimodal Corpora: Combining Applied and Basic Research Targets (MMC 2014), Reykjavik, Iceland, 2014. P. Ekman and W. V. Friesen. Manual for the Facial Action Coding System. Consulting Psychologists Press, 1978. J. Fasola and M. Matari c. A Socially Assistive Robot Exercise Coach for the Elderly. Journal of Human-Robot Interaction, 2(2):3{32, jan 2013. ISSN 2163-0364. 
doi: 10.5898/jhri.v2i2.32. J. Fasola and M. J. Matari c. Using socially assistive human-robot interaction to motivate physical exercise for older adults. Proceedings of the IEEE, Special Issue on Quality of Life Technology, 100(8):2512{2526, 2012. doi: 10.1109/JPROC.2012.2200539. D. Feil-Seifer and M. Matari c. Automated detection and classication of positive vs. negative robot interactions with children with autism using distance-based features. In Proceedings of the 6th international conference on Human-robot interaction - HRI '11, page 323, New York, New York, USA, mar 2011. ACM Press. ISBN 9781450305617. doi: 10.1145/1957656.1957785. D. Feil-Seifer and M. Matari c. Distance-Based Computational Models for Facilitating Robot Interaction with Children. Journal of Human-Robot Interaction, 1(1):55{77, jul 2012. ISSN 21630364. doi: 10.5898/JHRI.1.1.Feil-Seifer. D. J. Feil-Seifer and M. J. Matari c. Dening Socially Assistive Robotics. In Proceedings of the IEEE 9th International Conference on Rehabilitation Robotics, volume 2005, pages 465{468, Chicago, IL, jun 2005. ISBN 0780390032. doi: 10.1109/ICORR.2005.1501143. T. Fong, I. Nourbakhsh, and K. Dautenhahn. A Survey of Socially Interactive Robots. Robotics and Autonomous Systems, 42(3-4):143{166, 2003. J. Forlizzi and C. DiSalvo. Service robots in the domestic environment: A study of the Roomba vacuum in the home. In Proceeding of the 1st ACM SIGCHI/SIGART Conference on Human-Robot Interaction, pages 258{265, Salt Lake City, Utah, USA, mar 2006. ACM Press, New York, NY, USA. 176 M. Foster, A. Gaschler, and M. Giuliani. Two people walk into a bar: Dynamic multi- party social interaction with a robot agent. In Multimodal Interaction, pages 3{ 10, New York, New York, USA, oct 2012. ACM Press. ISBN 9781450314671. doi: 10.1145/2388676.2388680. R. M. Foxx. Applied Behavior Analysis Treatment of Autism: The State of the Art. Child and Adolescent Psychiatric Clinics of North America, 17(4):821{834, 2008. ISSN 10564993. 
doi: 10.1016/j.chc.2008.06.007.
D. S. Freedman, W. H. Dietz, S. R. Srinivasan, and G. S. Berenson. The relation of overweight to cardiovascular risk factors among children and adolescents: the Bogalusa Heart Study. Pediatrics, 103(6 Pt 1):1175–1182, jun 1999. ISSN 0031-4005. doi: 10.1542/peds.103.6.1175.
H. Furukawa, M. Nishida, K. Jokinen, and S. Yamamoto. A multimodal corpus for modeling turn management in multi-party conversations. In 2011 International Conference on Speech Database and Assessments, Oriental COCOSDA 2011 - Proceedings, pages 142–146. IEEE, oct 2011. ISBN 9781457709319. doi: 10.1109/ICSDA.2011.6085996.
D. Gatica-Perez. Analyzing group interactions in conversations: A review. In IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems, pages 41–46. IEEE, sep 2006. ISBN 1424405661. doi: 10.1109/MFI.2006.265658.
D. H. Geschwind, J. Sowinski, C. Lord, P. Iversen, J. Shestack, P. Jones, L. Ducat, and S. J. Spence. The autism genetic resource exchange: a resource for the study of autism and related neuropsychiatric conditions. American Journal of Human Genetics, 69(2):463–466, aug 2001. ISSN 00029297. doi: 10.1086/321292.
M. Ghosh and F. Tanaka. The impact of different competence levels of care-receiving robot on children. In IEEE International Conference on Intelligent Robots and Systems, pages 2409–2415. IEEE, sep 2011. ISBN 9781612844541. doi: 10.1109/IROS.2011.6048743.
E. Gilmartin, F. Bonin, C. Vogel, and N. Campbell. Laughter and Topic Transition in Multiparty Conversation. In Proceedings of the SIGDIAL 2013 Conference, pages 304–308, Metz, France, 2013.
M. Gombolay, A. Bair, C. Huang, and J. Shah. Computational Design of Mixed-Initiative Human-Robot Teaming that Considers Human Factors: Situational Awareness, Workload, and Workflow Preferences. International Journal of Robotics Research, 2017.
M. C. Gombolay, R. A. Gutierrez, S. G. Clarke, G. F. Sturla, and J. A. Shah.
Decision-making authority, team efficiency and human worker satisfaction in mixed human-robot teams. Autonomous Robots, 39(3):293–312, oct 2015. doi: 10.1007/s10514-015-9457-9.
M. A. Goodrich, B. S. Morse, C. Engh, J. L. Cooper, and J. A. Adams. Towards using Unmanned Aerial Vehicles (UAVs) in Wilderness Search and Rescue: Lessons from field trials. Interaction Studies, 10(3):453–478, dec 2009. doi: 10.1075/is.10.3.08goo.
T. L. Hayes, J. M. Hunt, A. Adami, and J. A. Kaye. An electronic pillbox for continuous monitoring of medication adherence. In Engineering in Medicine and Biology Society, 2006. EMBS'06. 28th Annual International Conference of the IEEE, pages 6400–6403. IEEE, 2006.
T. L. Hayes, K. Cobbinah, T. Dishongh, J. A. Kaye, J. Kimel, M. Labhard, T. Leen, J. Lundell, U. Ozertem, M. Pavel, and Others. A study of medication-taking and unobtrusive, intelligent reminding. Telemedicine and e-Health, 15(8):770–776, 2009.
M. Heerink, B. Kröse, V. Evers, and B. Wielinga. Assessing Acceptance of Assistive Social Agent Technology by Older Adults: the Almere Model. International Journal of Social Robotics, 2(4):361–375, sep 2010. ISSN 1875-4791. doi: 10.1007/s12369-010-0068-5.
Hello-robo. Maki: 3D Printable Humanoid Robot. URL http://www.hello-robo.com/maki.
L. V. Herlant, R. M. Holladay, and S. S. Srinivasa. Assistive teleoperation of robot arms via automatic time-optimal mode switching. In 2016 11th ACM/IEEE International Conference on Human-Robot Interaction (HRI), pages 35–42. IEEE, mar 2016. ISBN 978-1-4673-8370-7. doi: 10.1109/HRI.2016.7451731.
D. Herrera, D. Novick, D. Jan, and D. Traum. The UTEP-ICT cross-cultural multiparty multimodal dialog corpus. In Multimodal Corpora Workshop: Advances in Capturing, Coding and Analyzing Multimodality (MMC 2010), Valletta, Malta, 2010.
K. Hieftje, E. J. Edelman, D. R. Camenga, and L. E. Fiellin. Electronic media-based health interventions promoting behavior change in youth: a systematic review.
JAMA Pediatrics, 167(6):574–580, jun 2013. ISSN 2168-6211. doi: 10.1001/jamapediatrics.2013.1095.
G. W. Hill. Group versus individual performance: Are N + 1 heads better than one? Psychological Bulletin, 91(3):517–539, 1982.
M. D. Hingle, L. Macias-Navarro, A. Rezaimalek, and S. B. Going. The use of technology to promote nutrition and physical activity behavior change in youth: A review. The Research Dietetic Practice Group Digest, pages 1–10, 2013.
G. Hoffman. Dumb Robots, Smart Phones: a Case Study of Music Listening Companionship. In The 21st IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN 2012), pages 358–363. IEEE, sep 2012. ISBN 978-1-4673-4606-1. doi: 10.1109/ROMAN.2012.6343779.
R. A. Jablonski, D. Reed, and M. L. Maas. Care Intervention for Older Adults with Alzheimer's Disease and Related Dementias: Effect of Family Involvement on Cognitive and Functional Outcomes in Nursing Homes. Journal of Gerontological Nursing, 31(6):38–48, jun 2005. ISSN 0098-9134. doi: 10.3928/0098-9134-20050601-10.
C. Jarrold. A Review of Research into Pretend Play in Autism. Autism, 7(4):379–390, dec 2003. ISSN 13623613. doi: 10.1177/1362361303007004004.
C. Jarrold, J. Boucher, and P. K. Smith. Generativity deficits in pretend play in autism. British Journal of Developmental Psychology, 14(3):275–300, sep 1996. ISSN 0261510X. doi: 10.1111/j.2044-835X.1996.tb00706.x.
D. B. Jayagopi and J. M. Odobez. Given that, should I respond? Contextual addressee estimation in multi-party human-robot interactions. In 8th ACM/IEEE International Conference on Human-Robot Interaction (HRI 2013), pages 147–148, Tokyo, Japan, mar 2013. IEEE. ISBN 9781467330558. doi: 10.1109/HRI.2013.6483544.
J. F. Jent, L. N. Niec, and S. E. Baker. Play and interpersonal processes. In S. W. Russ and L. N. Niec, editors, Play in Clinical Practice: Evidence-Based Approaches, pages 23–47. Guilford Press, New York, NY, 2011. ISBN 978-1-60918-046-1.
Jibo.
Jibo - The World's First Social Robot, 2017. URL https://www.jibo.com/.
P. W. Jordan. An Introduction to Usability. CRC Press, 1998.
M. F. Jung, N. Martelaro, and P. J. Hinds. Using Robots to Moderate Team Conflict: The Case of Repairing Violations. In ACM/IEEE Conference on Human-Robot Interaction (HRI '15), pages 229–236, Portland, Oregon, USA, 2015.
Y. Jung and K. M. Lee. Effects of physical embodiment on social presence of social robots. In Proceedings of Presence 2004, pages 80–87, Valencia, Spain, 2004.
T. Kanda, T. Hirano, D. Eaton, and H. Ishiguro. Interactive Robots as Social Partners and Peer Tutors for Children: A Field Trial. Human-Computer Interaction, 19(1):61–84, jun 2004. ISSN 0737-0024.
T. Kanda, M. Shimada, and S. Koizumi. Children learning with a social robot. In Proceedings of the 7th Annual ACM/IEEE International Conference on Human-Robot Interaction - HRI '12, page 351, New York, New York, USA, mar 2012. ACM Press. ISBN 9781450310635. doi: 10.1145/2157689.2157809.
A. Kapusta, W. Yu, T. Bhattacharjee, C. K. Liu, G. Turk, and C. C. Kemp. Data-Driven Haptic Perception for Robot-Assisted Dressing. In Proceedings of the IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), 2016.
M. Katzenmaier and R. Stiefelhagen. Identifying the Addressee in Human-Human-Robot Interactions based on Head Pose and Speech. In Human Factors, pages 144–151, New York, New York, USA, oct 2004. ACM Press. ISBN 1581139543. doi: 10.1145/1027933.1027959.
C. Kidd and C. Breazeal. Effect of a robot on user perceptions. In 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), volume 4, pages 3559–3564. IEEE, 2004. ISBN 0-7803-8463-6. doi: 10.1109/IROS.2004.1389967.
C. D. Kidd and C. Breazeal. Robots at home: Understanding long-term human-robot interaction. In 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS, pages 3230–3235, 2008.
ISBN 9781424420582. doi: 10.1109/IROS.2008.4651113.
J. Kim, K. P. Truong, V. Charisi, C. Zaga, M. Lohse, D. Heylen, and V. Evers. Vocal turn-taking patterns in groups of children performing collaborative tasks: an exploratory study. In 16th Annual Conference of the International Speech Communication Association, Interspeech 2015, pages 1645–1649, Baixas, France, sep 2015. International Speech Communication Association (ISCA). URL http://doc.utwente.nl/99576/.
T. Kim, A. Chang, L. Holland, and A. S. Pentland. Meeting mediator. In Proceedings of the 26th CHI Conference Extended Abstracts on Human Factors in Computing Systems (CHI 2008), page 3183, New York, New York, USA, apr 2008. ACM Press. ISBN 978160558012X. doi: 10.1145/1358628.1358828.
J. Kitzerow, K. Teufel, C. Wilker, and C. M. Freitag. Using the brief observation of social communication change (BOSCC) to measure autism-specific development. Autism Research, 2015.
D. Klotz, J. Wienke, J. Peltason, B. Wrede, S. Wrede, V. Khalidov, and J.-M. Odobez. Engagement-based Multi-party Dialog with a Humanoid Robot. Proceedings of the SIGDIAL 2011 Conference, pages 341–343, jun 2011.
H. Kozima, C. Nakagawa, and Y. Yasuda. Children-robot interaction: a pilot study in autism therapy. Progress in Brain Research, 164:385–400, jan 2007. ISSN 00796123. doi: 10.1016/S0079-6123(07)64021-7.
K. Kreijns, P. A. Kirschner, and W. Jochems. Identifying the pitfalls for social interaction in computer-supported collaborative learning environments: A review of the research. Computers in Human Behavior, 19(3):335–353, may 2003. ISSN 07475632. doi: 10.1016/S0747-5632(02)00057-2.
P. K. Kuhl, F.-M. Tsao, and H.-M. Liu. Foreign-language experience in infancy: effects of short-term exposure and social interaction on phonetic learning. Proceedings of the National Academy of Sciences of the United States of America, 100(15):9096–9101, jul 2003. ISSN 0027-8424. doi: 10.1073/pnas.1532872100.
J. F. Lehman. Robo fashion world.
In Proceedings of the 2014 Workshop on Understanding and Modeling Multiparty, Multimodal Interactions - UM3I '14, pages 15–20, New York, New York, USA, nov 2014. ACM Press. ISBN 9781450306522. doi: 10.1145/2666242.2666248.
I. Leite, C. Martinho, A. Pereira, and A. Paiva. iCat: an affective game buddy based on anticipatory mechanisms. In Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems, pages 1229–1232. International Foundation for Autonomous Agents and Multiagent Systems, may 2008. ISBN 978-0-9817381-2-3.
I. Leite, R. Henriques, C. Martinho, and A. Paiva. Sensors in the wild: Exploring electrodermal activity in child-robot interaction. In ACM/IEEE International Conference on Human-Robot Interaction, pages 41–48. IEEE, mar 2013. ISBN 9781467330558. doi: 10.1109/HRI.2013.6483500.
I. Leite, M. McCoy, M. Lohani, D. Ullman, N. Salomons, C. Stokes, S. Rivers, and B. Scassellati. Emotional Storytelling in the Classroom. In Proceedings of the Tenth Annual ACM/IEEE International Conference on Human-Robot Interaction - HRI '15, pages 75–82, New York, New York, USA, mar 2015a. ACM Press. ISBN 9781450328838. doi: 10.1145/2696454.2696481.
I. Leite, M. McCoy, D. Ullman, N. Salomons, and B. Scassellati. Comparing Models of Disengagement in Individual and Group Interactions. In Proceedings of the Tenth Annual ACM/IEEE International Conference on Human-Robot Interaction - HRI '15, pages 99–105, New York, New York, USA, mar 2015b. ACM Press. ISBN 9781450328838. doi: 10.1145/2696454.2696466.
D. T. Levin, J. A. Adams, M. M. Saylor, and G. Biswas. A transition model for cognitions about agency. In 8th ACM/IEEE International Conference on Human-Robot Interaction (HRI), pages 373–380, Tokyo, mar 2013. IEEE. ISBN 978-1-4673-3101-2. doi: 10.1109/HRI.2013.6483612.
M. Lewis, H. Wang, S.-Y. Chien, P. Scerri, P. Velagapudi, K. Sycara, and B. Kane. Teams organization and performance in multi-human/multi-robot teams.
In IEEE International Conference on Systems, Man and Cybernetics, pages 1617–1623. IEEE, oct 2010. ISBN 978-1-4244-6586-6. doi: 10.1109/ICSMC.2010.5642379.
D. Leyzberg, S. Spaulding, M. Toneva, and B. Scassellati. The Physical Presence of a Robot Tutor Increases Cognitive Learning Gains. In 34th Annual Conference of the Cognitive Science Society, pages 1882–1887, Sapporo, Japan, 2012.
M. Lombard, T. B. Ditton, D. Crane, B. Davis, G. Gil-Egui, K. Horvath, and J. Rossman. Measuring presence: A literature-based approach to the development of a standardized paper-and-pencil instrument. In Presence 2000: The Third International Workshop on Presence, page 13, Delft, The Netherlands, 2000.
A. V. Lopez, Q. Booker, N. S. Shkarayeva, R. O. Briggs, and J. F. Nunamaker. Embedding Facilitation in Group Support Systems to Manage Distributed Group Behavior. In Proceedings of the 35th Annual Hawaii International Conference on System Sciences, pages 42–50, Big Island, Hawaii, 2002.
Y. Matsuyama, H. Taniyama, S. Fujie, and T. Kobayashi. Framework of Communication Activation Robot Participating in Multiparty Conversation. In Dialog with Robots: Papers from the AAAI Fall Symposium, pages 68–73, 2010. ISBN 9781577354871.
D. McColl and G. Nejat. Meal-time with a socially assistive robot and older adults at a long-term care facility. Journal of Human-Robot Interaction, 2(1):152–171, 2013.
I. McCowan, G. Lathoud, M. Lincoln, A. Lisowska, W. Post, D. Reidsma, and P. Wellner. The AMI Meeting Corpus. In Proceedings of Measuring Behavior, 5th International Conference on Methods and Techniques in Behavioral Research, Wageningen, The Netherlands, 2005.
M. McGregor and J. C. Tang. More to Meetings: Challenges in Using Speech-Based Technology to Support Meetings. In 20th ACM Conference on Computer-Supported Cooperative Work and Social Computing, pages 2208–2220, Portland, Oregon, 2017. doi: 10.1145/2998181.2998335.
M. H. Moattar and M. M. Homayounpour.
A Simple But Efficient Real-Time Voice Activity Detection Algorithm. In 17th European Signal Processing Conference (EUSIPCO), pages 2549–2553, Glasgow, U.K., 2009. ISBN 978-1-4419-1753-9. doi: 10.1007/978-1-4419-1754-6.
D. Moore, P. McGrath, and J. Thorpe. Computer-Aided Learning for People with Autism – a Framework for Research and Development. Innovations in Education & Training International, 37(3):218–228, jan 2000. ISSN 1355-8005. doi: 10.1080/13558000050138452.
L.-P. Morency. Co-occurrence Graphs: Contextual Representation for Head Gesture Recognition during Multi-Party Interactions. In Proceedings of the Workshop on Use of Context in Vision Processing - UCVP '09, pages 1–6, New York, New York, USA, nov 2009. ACM Press. ISBN 9781605586915. doi: 10.1145/1722156.1722160.
J. R. Movellan, M. Eckhardt, M. Virnes, and A. Rodriguez. Sociable robot improves toddler vocabulary skills. In 4th ACM/IEEE International Conference on Human-Robot Interaction (HRI), pages 307–308, NY, NY, USA, mar 2009. ACM Press. doi: 10.1145/1514095.1514189.
K. Muldner, V. Girotto, C. Lozano, W. Burleson, and E. Walker. The Impact of a Social Robot's Attributions for Success and Failure in a Teachable Agent Framework. In Proceedings of the International Conference of the Learning Sciences, pages 278–285, 2014.
R. Murphy. Human–Robot Interaction in Rescue Robotics. IEEE Transactions on Systems, Man and Cybernetics, Part C (Applications and Reviews), 34(2):138–153, may 2004. doi: 10.1109/TSMCC.2004.826267.
B. Mutlu, T. Shiwa, T. Kanda, H. Ishiguro, and N. Hagita. Footing in human-robot conversations. In Proceedings of the 4th ACM/IEEE International Conference on Human Robot Interaction (HRI 2009), page 61, New York, New York, USA, mar 2009. ACM Press. ISBN 9781605584041. doi: 10.1145/1514095.1514109.
E. D. Mynatt, I. Essa, and W. Rogers. Increasing the opportunities for aging in place.
In Proceedings of the 2000 Conference on Universal Usability (CUU '00), pages 65–71, Arlington, Virginia, USA, 2000. ACM Press. ISBN 1581133146. doi: 10.1145/355460.355475.
F. Niederman, C. M. Beise, and P. M. Beranek. Facilitation issues in distributed group support systems. In Proceedings of the 1993 Conference on Computer Personnel Research, pages 299–312, St. Louis, Missouri, USA, 1993. ISBN 0-89791-572-0. doi: 10.1145/158011.158239.
T. Nomura, T. Suzuki, T. Kanda, and K. Kato. Measurement of negative attitudes toward robots. Interaction Studies, 7(3):437–454, 2006. ISSN 1572-0373. doi: 10.1075/is.7.3.14nom.
D. G. Novick. Models of Gaze in Multi-party Discourse. In Proceedings of the Computer Human Interface (CHI) Workshop on the Virtuality Continuum Revisited, Portland, OR, 2005.
C. Oertel, F. Cummins, J. Edlund, P. Wagner, and N. Campbell. D64: A corpus of richly recorded conversational interaction. Journal on Multimodal User Interfaces, 7(1-2):19–28, sep 2013. ISSN 17837677. doi: 10.1007/s12193-012-0108-6.
C. L. Ogden. Prevalence of Obesity and Trends in Body Mass Index Among US Children and Adolescents, 1999-2010. JAMA: The Journal of the American Medical Association, 307(5):483, feb 2012. ISSN 0098-7484. doi: 10.1001/jama.2012.40.
J. Ormrod. Educational Psychology: Developing Learners. Merrill, Upper Saddle River, NJ, 5th edition, 2006.
K. Otsuka, Y. Takemae, and J. Yamato. A probabilistic inference of multiparty-conversation structure based on Markov-switching models of gaze patterns, head directions, and utterances. In Proceedings of the 7th International Conference on Multimodal Interfaces, pages 191–198, New York, New York, USA, oct 2005. ACM Press. ISBN 1-59593-028-0. doi: 10.1145/1088463.1088497.
H. H. Pai, D. A. Sears, and Y. Maeda. Effects of Small-Group Learning on Transfer: a Meta-Analysis. Educational Psychology Review, 27(1):79–102, feb 2014. ISSN 1040726X. doi: 10.1007/s10648-014-9260-8.
R. E. Parker.
Small-Group Cooperative Learning – Improving Academic, Social Gains in the Classroom. NASSP Bulletin, 69(479):48–57, 1984. ISSN 0192-6365. doi: 10.1177/019263658506947908.
E. J. Pedhazur and L. P. Schmelkin. Measurement, Design, and Analysis: An Integrated Approach. Lawrence Erlbaum Associates, 1991. ISBN 0805810633.
C. C. Peterson and M. Siegal. Insights into Theory of Mind from Deafness and Autism. Mind and Language, 15(1):123–145, mar 2000. ISSN 0268-1064. doi: 10.1111/1468-0017.00126.
A. Pickering. The mangle of practice: Agency and emergence in the sociology of science. American Journal of Sociology, 99(3):559–589, 1993.
K. Pierce and L. Schreibman. Increasing complex social behaviors in children with autism: effects of peer-implemented pivotal response training. Journal of Applied Behavior Analysis, 28(3):285–295, 1995. ISSN 0021-8855. doi: 10.1901/jaba.1995.28-285.
G. Pioggia, M. L. Sica, M. Ferro, R. Igliozzi, F. Muratori, A. Ahluwalia, and D. De Rossi. Human-Robot Interaction in Autism: FACE, an Android-based Social Therapy. In RO-MAN 2007 - The 16th IEEE International Symposium on Robot and Human Interactive Communication, pages 605–612. IEEE, 2007. ISBN 978-1-4244-1634-9. doi: 10.1109/ROMAN.2007.4415156.
M. E. Pollack. Intelligent technology for an aging population: The use of AI to assist elders with cognitive impairment. AI Magazine, 26(2):9, 2005.
R. H. Poresky, C. Hendrix, J. E. Hosier, and M. L. Samuelson. The Companion Animal Bonding Scale: Internal Reliability and Construct Validity. Psychological Reports, 60(3):743–746, jun 1987. ISSN 0033-2941. doi: 10.2466/pr0.1987.60.3.743.
B. M. Prizant, A. M. Wetherby, E. Rubin, and A. C. Laurent. The SCERTS Model: A Transactional, Family-Centered Approach to Enhancing Communication and Socioemotional Abilities of Children With Autism Spectrum Disorder. Infants and Young Children, 2003. ISSN 0896-3746. doi: 10.1097/00001163-200310000-00004.
S. P. Putnam and M. K. Rothbart.
Development of Short and Very Short Forms of the Children's Behavior Questionnaire. Journal of Personality Assessment, 87(1):102–112, 2006.
M. Quigley, B. Gerkey, K. Conley, J. Faust, T. Foote, J. Leibs, E. Berger, R. Wheeler, and A. Ng. ROS: an open-source Robot Operating System. In International Conference on Robotics and Automation, 2009.
R. Rajan, C. Chen, and T. Selker. Considerate Audio MEdiating Oracle (CAMEO). In Proceedings of the Conference on Designing Interactive Systems (DIS 2012), page 86, New York, New York, USA, jun 2012a. ACM Press. ISBN 9781450312103. doi: 10.1145/2317956.2317972.
R. Rajan, C. Chen, and T. Selker. Considerate supervisor: an audio-only facilitator for multiparty conference calls. In Proceedings of the 2012 ACM Annual Conference on Human Factors in Computing Systems Extended Abstracts (CHI EA '12), page 2609, New York, New York, USA, may 2012b. ACM Press. ISBN 9781450310161. doi: 10.1145/2212776.2223844.
B. Rammstedt and O. P. John. Measuring personality in one minute or less: A 10-item short version of the Big Five Inventory in English and German. Journal of Research in Personality, 41(1):203–212, feb 2007. ISSN 00926566. doi: 10.1016/j.jrp.2006.02.001.
M. K. Rothbart, S. A. Ahadi, K. L. Hershey, and P. Fisher. Investigations of temperament at three to seven years: the Children's Behavior Questionnaire. Child Development, 72(5):1394–1408, 2001. ISSN 0009-3920. doi: 10.1111/1467-8624.00355.
N. Roy, A. Misra, and D. Cook. Ambient and smartphone sensor assisted ADL recognition in multi-inhabitant smart environments. Journal of Ambient Intelligence and Humanized Computing, 7(1):1–19, 2016.
T. Salter, N. Davey, and F. Michaud. Designing and developing QueBall, a robotic device for autism therapy. In The 23rd IEEE International Symposium on Robot and Human Interactive Communication, pages 574–579. IEEE, aug 2014. ISBN 978-1-4799-6765-0. doi: 10.1109/ROMAN.2014.6926314.
P. Sanchez.
Book Review: Funds of Knowledge: Theorizing Practices in Households, Communities, and Classrooms, volume 6. Erlbaum, Mahwah, NJ, 2006. ISSN 14687984. doi: 10.1177/1468798406066447.
J. Sanghvi, G. Castellano, I. Leite, A. Pereira, P. W. McOwan, and A. Paiva. Automatic analysis of affective postures and body motion to detect engagement with a game companion. In Proceedings of the 6th International Conference on Human-Robot Interaction - HRI '11, pages 305–311, New York, New York, USA, mar 2011. ACM Press. ISBN 9781450305617. doi: 10.1145/1957656.1957781.
F. J. Sansosti and K. A. Powell-Smith. Using Computer-Presented Social Stories and Video Models to Increase the Social Communication Skills of Children With High-Functioning Autism Spectrum Disorders. Journal of Positive Behavior Interventions, 10(3):162–178, jul 2008. ISSN 1098-3007. doi: 10.1177/1098300708316259.
B. Scassellati, H. Admoni, and M. Matarić. Robots for Use in Autism Research. Annual Review of Biomedical Engineering, 14(1):275–294, aug 2012. ISSN 1523-9829. doi: 10.1146/annurev-bioeng-071811-150036.
G. Schiavo, A. Cappelletti, E. Mencarini, O. Stock, and M. Zancanaro. Overt or subtle? Supporting group conversations with automatically targeted directives. In Proceedings of the 19th International Conference on Intelligent User Interfaces (IUI '14), pages 225–234, New York, New York, USA, feb 2014. ACM Press. ISBN 9781450321846. doi: 10.1145/2557500.2557507.
J. Scholtz. Theory and evaluation of human robot interactions. In Proceedings of the 36th Annual International Conference on System Sciences, 10 pp. IEEE, 2003.
C. Schroeter, S. Mueller, M. Volkhardt, E. Einhorn, C. Huijnen, H. van den Heuvel, A. van Berlo, A. Bley, and H.-M. Gross. Realization and user evaluation of a companion robot for people with mild cognitive impairments. In IEEE International Conference on Robotics and Automation (ICRA), pages 1153–1159. IEEE, 2013.
T. Schultze, A. Mojzisch, and S. Schulz-Hardt.
Why groups perform better than individuals at quantitative judgment tasks: Group-to-individual transfer as an alternative to differential weighting. Organizational Behavior and Human Decision Processes, 118(1):24–36, may 2012. ISSN 07495978. doi: 10.1016/j.obhdp.2011.12.006.
T. E. Seeman. Health promoting effects of friends and family on health outcomes in older adults. American Journal of Health Promotion: AJHP, 14(6):362–370, 2000. ISSN 0890-1171. doi: 10.4278/0890-1171-14.6.362.
A. Setapen. Creating Robotic Characters for Long-Term Interaction. Master's thesis, Massachusetts Institute of Technology, 2012.
S. Shahid, E. Krahmer, and M. Swerts. Child–robot interaction across cultures: How does playing a game with a social robot compare to playing a game alone or with a friend? Computers in Human Behavior, 40:86–100, nov 2014. ISSN 07475632. doi: 10.1016/j.chb.2014.07.043.
A. Shankar, A. McMunn, J. Banks, and A. Steptoe. Loneliness, social isolation, and behavioral and biological health indicators in older adults. Health Psychology, 30(4):377, 2011.
E. Short, J. Hart, M. Vu, and B. Scassellati. No fair!! An interaction with a cheating robot. In Proceedings of the 5th ACM/IEEE International Conference on Human-Robot Interaction (HRI '10), page 219, New York, New York, USA, 2010. ACM Press. ISBN 9781424448937. doi: 10.1145/1734454.1734546.
E. Short, K. Swift-Spong, J. Greczek, A. Ramachandran, A. Litoiu, E. C. Grigore, D. Feil-Seifer, S. Shuster, J. J. Lee, S. Huang, S. Levonisova, S. Litz, J. Li, G. Ragusa, D. Spruijt-Metz, and M. Matarić. How to Train Your DragonBot: Socially Assistive Robots for Teaching Children About Nutrition Through Play. In 23rd IEEE Symposium on Robot and Human Interactive Communication (RO-MAN '14), pages 924–929. IEEE, aug 2014. ISBN 9781479967643. doi: 10.1109/ROMAN.2014.6926371.
E. Short, K. Sittig-Boyd, and M. J. Matarić. Modeling Moderation for Multi-Party Socially Assistive Robotics.
In IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN 2016), New York, NY, 2016. IEEE.
M. Sigman, E. Ruskin, S. Arbelle, R. Corona, C. Dissanayake, M. Espinosa, N. Kim, A. López, C. Zierhut, C. B. Mervis, and Others. Continuity and change in the social competence of children with autism, Down syndrome, and developmental delays. Monographs of the Society for Research in Child Development, pages i–139, 1999.
J. Silver. Facilitorials. In Proceedings of the 6th International Conference on Interaction Design and Children (IDC '07), page 179, New York, New York, USA, jun 2007. ACM Press. ISBN 9781595937476. doi: 10.1145/1297277.1297322.
J. L. Singer and M. A. Lythcott. Fostering school achievement and creativity through sociodramatic play in the classroom. In E. F. Zigler, D. G. Singer, and S. J. Bishop-Joseph, editors, Children's Play: The Roots of Reading, pages 77–93. Zero to Three Press, Washington, DC, 2004.
A. S. Singh, C. Mulder, J. W. R. Twisk, W. Van Mechelen, and M. J. M. Chinapaw. Tracking of childhood overweight into adulthood: A systematic review of the literature. Obesity Reviews, 9(5):474–488, sep 2008. ISSN 14677881. doi: 10.1111/j.1467-789X.2008.00475.x.
D. Spruijt-Metz. Etiology, treatment, and prevention of obesity in childhood and adolescence: A decade in review. Journal of Research on Adolescence, 21(1):129–152, mar 2011. ISSN 10508392. doi: 10.1111/j.1532-7795.2010.00719.x.
S. S. Srinivasa, D. Ferguson, C. J. Helfrich, D. Berenson, A. Collet, R. Diankov, G. Gallagher, G. Hollinger, J. Kuffner, and M. V. Weghe. HERB: a home exploring robotic butler. Autonomous Robots, 28(1):5–20, 2010.
V. Srinivasan and R. Murphy. A survey of social gaze. In 6th ACM/IEEE International Conference on Human-Robot Interaction (HRI), pages 253–254, Lausanne, 2011.
D. Stewart. A platform with six degrees of freedom. Proceedings of the Institution of Mechanical Engineers, 180(1965):371–386, jun 1965.
E. E.
Stone and M. Skubic. Fall detection in homes of older adults using the Microsoft Kinect. IEEE Journal of Biomedical and Health Informatics, 19(1):290–301, 2015.
J. Swettenham. Can Children with Autism be Taught to Understand False Belief Using Computers? Journal of Child Psychology and Psychiatry, 37(2):157–165, feb 1996. ISSN 0021-9630. doi: 10.1111/j.1469-7610.1996.tb01387.x.
D. S. Syrdal, T. Nomura, H. Hirai, and K. Dautenhahn. Examining the Frankenstein Syndrome. In ICSR'11 Proceedings of the Third International Conference on Social Robotics, volume 7072 of Lecture Notes in Computer Science, pages 142–152, Amsterdam, Netherlands, nov 2011. ISBN 978-3-642-25503-8. doi: 10.1007/978-3-642-25504-5.
P. Szatmari, L. Archer, S. Fisman, D. L. Streiner, and F. Wilson. Asperger's syndrome and autism: Differences in behavior, cognition, and adaptive functioning. Journal of the American Academy of Child & Adolescent Psychiatry, 34(12):1662–1671, 1995.
T. Takasaki and Y. Mori. A webcam platform for facilitating intercultural group activities. In Proceedings of the 2009 International Workshop on Intercultural Collaboration (IWIC '09), page 129, New York, New York, USA, feb 2009. ACM Press. ISBN 9781605585024. doi: 10.1145/1499224.1499245.
L. Takayama. Making sense of agentic objects and teleoperation: In-the-moment and reflective perspectives. In 4th ACM/IEEE International Conference on Human-Robot Interaction (HRI), pages 239–240, La Jolla, California, 2009.
F. Tanaka and S. Matsuzoe. Children Teach a Care-Receiving Robot to Promote Their Learning: Field Experiments in a Classroom for Vocabulary Learning. Journal of Human-Robot Interaction, 1(1):78–95, jan 2012. ISSN 21630364. doi: 10.5898/JHRI.1.1.Tanaka.
J. W. Tanaka, J. M. Wolf, C. Klaiman, K. Koenig, J. Cockburn, L. Herlihy, C. Brown, S. Stahl, M. D. Kaiser, and R. T. Schultz. Using computerized games to teach face recognition skills to children with autism spectrum disorder: the Let's Face It! program.
Journal of Child Psychology and Psychiatry, 51(8):944–952, mar 2010. ISSN 00219630. doi: 10.1111/j.1469-7610.2010.02258.x.
A. Tapus, C. Tapus, and M. J. Matarić. The use of socially assistive robots in the design of intelligent cognitive therapies for people with dementia. In International Conference on Rehabilitation Robotics, pages 924–929, Kyoto, Japan, 2009. IEEE. doi: 10.1109/ICORR.2009.5209501.
A. Tapus, A. Peca, A. Aly, C. Pop, L. Jisa, S. Pintea, A. S. Rusu, and D. O. David. Children with autism social engagement in interaction with Nao, an imitative robot: A series of single case experiments. Interaction Studies, 13(3):315–347, 2012.
E. B. Tate, D. Spruijt-Metz, G. O'Reilly, M. Jordan-Marsh, M. Gotsis, M. A. Pentz, and G. F. Dunton. mHealth approaches to child obesity prevention: Successes, unique challenges, and next directions. Translational Behavioral Medicine, 3(4):406–415, jul 2013. ISSN 18696716. doi: 10.1007/s13142-013-0222-3.
A. C. Thomason and K. M. La Paro. Measuring the Quality of Teacher–Child Interactions in Toddler Child Care. Early Education & Development, 20(2):285–304, apr 2009. ISSN 1040-9289. doi: 10.1080/10409280902773351.
K. R. Thórisson, O. Gislason, G. R. Jonsdottir, and H. T. Thorisson. A multiparty multimodal architecture for realtime turntaking. In J. Allbeck, N. Badler, T. Bickmore, C. Pelachaud, and A. Safonova, editors, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), volume 6356 LNAI of Lecture Notes in Computer Science, pages 350–356. Springer Berlin Heidelberg, 2010. ISBN 3642158919.
J. Tomaka, S. Thompson, and R. Palacios. The relation of social isolation, loneliness, and social support to disease outcomes among the elderly. Journal of Aging and Health, 18(3):359–384, 2006.
R. Toris, J. Kammerl, D. V. Lu, J. Lee, O. C. Jenkins, S. Osentoski, M. Wills, and S. Chernova. Robot Web Tools: Efficient Messaging for Cloud Robotics.
In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2015.
F. Toriumi, H. Yamamoto, and I. Okada. Effects of Controllable Facilitators on Social Media: Simulation Analysis Using Generalized Metanorms Games. In 2013 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT), pages 114–117. IEEE, nov 2013. ISBN 978-0-7695-5145-6. doi: 10.1109/WI-IAT.2013.162.
M. Van Segbroeck, A. Tsiartas, and S. Narayanan. A robust frontend for VAD: exploiting contextual, discriminative and spectral cues of human voice. In INTERSPEECH, pages 704–708, 2013.
K. van Turnhout, J. Terken, I. Bakx, and B. Eggen. Identifying the intended addressee in mixed human-human and human-computer interaction from non-verbal features. In Proceedings of the 7th International Conference on Multimodal Interfaces (ICMI '05), page 175, New York, New York, USA, oct 2005. ACM Press. ISBN 1595930280. doi: 10.1145/1088463.1088495.
S. Viller. The Group Facilitator: A CSCW Perspective. In L. Bannon, M. Robinson, and K. Schmidt, editors, Proceedings of the Second European Conference on Computer-Supported Cooperative Work ECSCW '91, pages 81–95, Amsterdam, The Netherlands, 1991.
A. L. Wainer and B. R. Ingersoll. The use of innovative computer technology for teaching social communication to individuals with autism spectrum disorders. Research in Autism Spectrum Disorders, 5(1):96–107, 2011. ISSN 17509467. doi: 10.1016/j.rasd.2010.08.002.
M. R. Walter, M. Antone, E. Chuangsuwanich, A. Correa, R. Davis, L. Fletcher, E. Frazzoli, Y. Friedman, J. Glass, J. P. How, J. H. Jeon, S. Karaman, B. Luders, N. Roy, S. Tellex, and S. Teller. A Situationally Aware Voice-commandable Robotic Forklift Working Alongside People in Unstructured Outdoor Environments. Journal of Field Robotics, 32(4):590–628, jun 2015. doi: 10.1002/rob.21539.
L. A. West, S. Cole, D. Goodkind, and W. He. 65+ in the United States: 2010.
Special Studies: Current Population Reports, 2014. URL https://www.census.gov/content/dam/Census/library/publications/ 2014/demo/p23-212.pdf. A. M. Wetherby, B. M. Prizant, and T. A. Hutchinson. Communicative, So- cial/Aective, and Symbolic Proles of Young Children with Autism and Pervasive Developmental Disorders. American Journal of Speech-Language Pathology, 7(2):79{ 91, 1998. ISSN 10580360. doi: 10.1044/1058-0360.0702.79. M. Wilson. Constructing Measures: an Item Response Modeling Approach. Lawrence Earlbaum, New Jersey, 2008. ISBN 1410611698 9781410611697. doi: 10.4324/9781410611697. T. Wongpakaran, N. Wongpakaran, R. Intachote-Sakamoto, and T. Boripuntakul. The Group Cohesiveness Scale (GCS) for psychiatric inpatients. Perspectives in Psychiatric Care, 49(1):58{64, jan 2013. ISSN 1744-6163. doi: 10.1111/j.1744- 6163.2012.00342.x. 190 Z. Xu and M. Cakmak. Enhanced robotic cleaning with a low-cost tool attachment. In 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 2595{2601. IEEE, 2014. S. Yamasaki, H. Furukawa, M. Nishida, K. Jokinen, and S. Yamamoto. Multimodal Corpus of Multi-Party Conversations in Second Language. In International Confer- ence on Language Resources and Evaluation (LREC 2012), pages 416{421, Istanbul, Turkey, 2012. ISBN 978-2-9517408-7-7. A. Yamazaki, K. Yamazaki, T. Ohyama, Y. Kobayashi, and Y. Kuno. A Techno- Sociological Solution for Designing a Museum Guide Robot : Regarding Choosing an Appropriate Visitor. In Proceedings of the 7th annual ACM/IEEE International Conference on Human-Robot Interaction (HRI 2012), pages 309{316, Boston, MA, mar 2012. ACM Press. ISBN 9781450310635. doi: 10.1145/2157689.2157800. N. Yirmiya, O. Erel, M. Shaked, and D. Solomonica-Levi. Meta-analyses comparing theory of mind abilities of individuals with autism, individuals with mental retarda- tion, and normally developing individuals - ProQuest. Psychological Bulletin, 124(3): 283{307, 1998. M. Zancanaro, L. Giusti, E. 
Gal, and P. T. Weiss. Three around a table: the facilitator role in a co-located interface for social competence training of children with autism spectrum disorder. In INTERACT'11 Proceedings of the 13th IFIP TC 13 Interna- tional Conference on Human-Computer Interaction, pages 123{140. Springer-Verlag, sep 2011. ISBN 978-3-642-23770-6. 191
Abstract
This dissertation presents a domain-independent computational model of moderation of multi-party human-machine interactions that enables a robot or virtual agent to act as a moderator in a group interaction. A moderator is defined in this work as an agent that regulates social and task outcomes in a goal-oriented social interaction. This model has multiple applications in human-machine interaction: groups of people often require some management or facilitation to ensure smooth and productive interaction, especially when the context is emotionally fraught or the participants do not know each other well. A particularly relevant application domain for moderation is Socially Assistive Robotics (SAR), where group interactions can benefit from a moderator's participation. The evaluation of the model focuses on intergenerational interactions, but the model is applicable to various other SAR domains as well, including group therapy, informal teaching between peers, and social skills therapy.

Moderation is formalized as a decision-making problem in which measures of task performance and positive social interaction in a group are maximized through the behavior of a social moderator. This framework provides a basis for the development of a series of control algorithms that enable robot moderators to assist groups of people in improving task performance and managing the social dynamics of interactions in diverse domains. Based on reliably sensed features of the interaction, such as task state and voice activity, the moderator takes social actions that can predictably alter task performance and the social dynamics of the interaction. The moderator is thus able to support human-human interaction in unpredictable, open-ended, real-world contexts.

The model is evaluated in intergenerational applications, in which the moderator supports interactions that include members of multiple generations within the same family. In interactions with older adults, the moderator can support positive family interactions that lead to strong social support networks and, ultimately, better outcomes for health and quality of life. In interactions with families and siblings of children with autism, the moderator can support socially appropriate interactions, as well as the social integration and learning that may be as important as more traditional cognitive milestones. Simpler algorithms are validated in-lab with a convenience population, while algorithms that more fully integrate task and social goals are evaluated in intergenerational interactions with older adults and in interactions between children with autism and their families.

The model of moderation provides a framework for developing algorithms that enable robots to moderate group interactions without the need for speech recognition
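The decision-making formulation described above can be illustrated with a minimal sketch. All names here are hypothetical: the feature set, action set, and the toy one-step utility model are illustrative assumptions, not the dissertation's actual implementation.

```python
# Hypothetical snapshot of reliably sensed interaction features
# (illustrative only; the actual feature set is described in the dissertation).
def sense_features():
    return {
        "task_progress": 0.4,          # fraction of the group task completed
        "speaking_time": [0.7, 0.3],   # per-participant share of voice activity
    }

# Assumed candidate social actions available to the moderator.
ACTIONS = ["encourage_quiet_participant", "give_task_hint", "do_nothing"]

def social_balance(speaking_time):
    # A simple balance measure: 1.0 when participation is perfectly even,
    # decreasing as speaking time concentrates on fewer participants.
    ideal = 1.0 / len(speaking_time)
    return 1.0 - sum(abs(s - ideal) for s in speaking_time) / 2.0

def predicted_utility(action, features, w_task=0.5, w_social=0.5):
    # Toy predictive model: each action nudges the task or social measure.
    task = features["task_progress"]
    social = social_balance(features["speaking_time"])
    if action == "give_task_hint":
        task = min(1.0, task + 0.1)
    elif action == "encourage_quiet_participant":
        social = min(1.0, social + 0.1)
    # Weighted combination of task performance and positive social interaction.
    return w_task * task + w_social * social

def choose_action(features):
    # Greedy one-step maximization of the combined objective.
    return max(ACTIONS, key=lambda a: predicted_utility(a, features))

if __name__ == "__main__":
    print(choose_action(sense_features()))
```

With these example numbers, participation is uneven, so encouraging the quieter participant scores at least as well as any other action; richer versions of this loop would learn or calibrate the predictive model from observed interactions.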
Asset Metadata
Creator: Short, Elaine Schaertl (author)
Core Title: Managing multi-party social dynamics for socially assistive robotics
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Computer Science
Publication Date: 06/20/2019
Defense Date: 05/03/2017
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: aging, autism, family interaction, human-robot interaction, moderation, multi-party human-robot interaction, multi-party interaction, OAI-PMH Harvest, robotics, socially assistive robotics
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Matarić, Maja (committee chair), Ragusa, Gisele (committee member), Sukhatme, Gaurav (committee member), Traum, David (committee member)
Creator Email: elaine.g.short@usc.edu, elaine.short@gmail.com
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c40-388844
Unique Identifier: UC11258940
Identifier: etd-ShortElain-5435.pdf (filename), usctheses-c40-388844 (legacy record id)
Legacy Identifier: etd-ShortElain-5435.pdf
Dmrecord: 388844
Document Type: Dissertation
Rights: Short, Elaine Schaertl
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA