PHYSICAL AND SOCIAL ADAPTATION
FOR ASSISTIVE ROBOT INTERACTIONS
by
Nathaniel Steele Dennler
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)
May 2025
Copyright 2025 Nathaniel Steele Dennler
Acknowledgements
No one accomplishes anything entirely on their own. This dissertation results from countless interactions
I have had with people throughout my life—both those I can remember and name, and those whose ideas
reached me through echoes. If you are not mentioned here, know that I am eternally grateful.
First and foremost, I thank my advisors Maja Matarić and Stefanos Nikolaidis. You have granted me
the invaluable and improbable opportunity to mature both academically and personally. I entered this
program with very little experience with the world, and your guidance and support throughout my Ph.D.
shaped my values and understanding of my identity as a researcher. Thank you for pushing me to work
on real robots that help real people address real problems.
I am grateful to the professors that have served as my committee members: Shri Narayanan, Gaurav
Sukhatme, Jesse Thomason, and Heather Culbertson. You have all brought new and helpful insights that
shaped my research to be useful beyond my own circles.
I extend my thanks to my peers, especially my fellow Interaction and ICAROS lab members for their
support and camaraderie throughout the years. I am forever indebted to Kate Swift-Spong, who recruited
me to the 2018 research experience for undergraduates that began my research journey, to Liz Cha, who
provided feedback on my graduate research fellowship application in the midst of a dental surgery, and
to Naomi Fitter, for showing me the joys of doing research studies and crafting research questions that
align with personal interests. You three are behind every academic opportunity that found me. I thank
Tom Groechel for accompanying me in the early hours of the workday and commiserating about the
construction of academic knowledge, Chris Birmingham for introducing me to the wide world of tools for
computational modeling, and Lauren Klein for matching my energy and making me feel welcome. Thank
you Matt Rueben for encouraging my creativity and wandering interests.
Thank you to Heramb Nemlekar for listening to me vent about motion planning libraries, Matt Fontaine
for expanding my understanding of optimization techniques and algorithms more generally, Hejia Zhang
for helping me tackle my first conference, and Aaquib Tabrez for creating a positive lab culture.
For those who officially began their PhD journey after me, thank you to Zhonghao Shi for being a
wonderful collaborator and friend, Amy O’Connell for sharing my love of textiles and figure skating, Mina
Kian for sharing my love of dancing in heels, Leticia Pinto Alva for sharing my love of finding free food
around campus, and A’di Dust for sharing my love of coffee. Thank you Kaleen Shrestha for teaching
me about the importance of a growth mindset and for asking insightful questions, Emily Weiss for also
sharing my love of figure skating (what are the odds?), and Kait Zareno for always picking up on my
niche references. Thank you Robby Costales for our exciting conversations on the importance of adaptive
agents, Bryon Tjanaka for teaching me about writing Python libraries and ballroom dance, Sophie Hsu for
discussions on the applications of balloon robots, Varun Bhatt for agreeing to test out my studies, Saeed
Hedayatian for always being curious and knowledgeable, Shihan Zhao for teaching me the ins and outs of
CMA-ES, and Yuming Jackson Gu for being a resource on both graphics and street dance.
Without the communities I belonged to outside of school, this dissertation would not be possible. I
want to thank my dance community, Whacking LA, for always giving me a space to freely be the punk I
am. I especially want to thank Tori Cristi, Kevin Cassasola, Roxi Smith, Sasha Suslina, Cara Hurley, Em
Smith, Ray Ra, Zoran Liu, and Percy Fong. Thank you to Viktor Manoel and the OG punks for originating
the dance that I have found a home in. I also extend my thanks to the USC Figure Skating team for being
the first community I had in LA. I especially thank Lily Zeng, Cal Chen, Colleen Feng, and Natalie McLain
for our countless shenanigans. Thank you to Queer in AI and Queer in Robotics for helping me realize I
am not alone.
I deeply appreciate my day-one friends that have maintained contact throughout our years of game
nights: Caroline Johnston, Loc Trinh, Xiao Fu, Alex Spangher, and Johnny Wei. I also thank Rin Yunis,
Yuanzhong Pan, Luciana Custer, and Raymond Yu for always being there and motivating me until the end.
Thank you Uksang Yoo for navigating my first in-person conference with me; you mean the world to me.
I give my final thanks to my family. I am grateful for my parents Peggy O’Neil and Ted Dennler. You
have tirelessly supported me in all my endeavors and instilled within me the great joys of life. You taught
me to never give up and to stand up for myself. I also want to thank my sister, Aimee Dennler, for being
the only person who fully understands me. I dedicate this dissertation to you.
Funding
This dissertation contains work supported by a National Science Foundation Graduate Research Fellowship, a University of Southern California Annenberg Fellowship, an Amazon Research Award on “Learning Situational Human Preferences of a Mobile Social Robot Through In-Situ Augmented Reality”, and a National Institute of Mental Health Smart Health and Biomedical Research in the Era of Artificial Intelligence and Advanced Data Science grant on “Personalized AI-Driven Models for Supporting User Engagement and Adherence in Health Interventions: Validation in Cognitive Behavioral Therapy for Anxiety”.
Table of Contents
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
Funding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xviii
Chapter 1: Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Robots as Physical and Social Agents . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.2 Robots as Adaptable Experiences . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.3 Robots as Assistive Devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Adaptation as Communication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.1 The Agents that Mutually Adapt . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.2 Communication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.4 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.4.1 Methodological Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4.2 Algorithmic Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4.3 Software and Systems Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.5 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Chapter 2: Background and Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.1 Assistive and Rehabilitative Robotics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.1.1 Physically Assistive Robotics for Users with Limited Mobility . . . . . . . . . . . . 12
2.1.2 Socially Assistive Robotics for Users with Limited Mobility . . . . . . . . . . . . . 14
2.2 User Motivation, Social Evaluation, and Technology Adoption . . . . . . . . . . . . . . . . 16
2.2.1 The Self-Determination Theory of Motivation . . . . . . . . . . . . . . . . . . . . . 17
2.2.2 Social Identity Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.2.3 The Technology Acceptance Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Chapter 3: Conceptualizing Adaptable Robots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.1.1 Dynamical Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.1.2 Markov Decision Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.1.3 Inverse Reinforcement Learning from Demonstrations . . . . . . . . . . . . . . . . 24
3.1.4 Learning a User’s Preference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2 Framework for Adaptation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2.1 Personalization and Customization within the Communication Framework . . . . 28
3.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Part I: Robot Embodiment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Chapter 4: User Expectations of Robot Embodiments . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.2 Inspiration: Design Metaphors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.3 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.3.1 Annotated MUFaSAA Data: Low-level Design Features . . . . . . . . . . . . . . . . 37
4.3.2 Crowd-Sourced MUFaSAA Data: High-level Expectations . . . . . . . . . . . . . . 42
4.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.4.1 Participants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.4.2 Social Expectations by Metaphor Type . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.4.3 Functional Expectation by Metaphor Type . . . . . . . . . . . . . . . . . . . . . . . 48
4.4.4 Social and Functional Expectations Along Semantic Axes . . . . . . . . . . . . . . 49
4.4.5 The Space of Robot Gender Expression . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.4.6 Expected Robot Use Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.4.7 Visually Exploring the Design Space . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
Chapter 5: Customizing the Appearance of Manufactured Robots . . . . . . . . . . . . . . . . . . . 57
5.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.2 Technical Implementation: PyLips, a Software Package for Screen-Based Faces . . . . . . . 61
5.2.1 Anatomy of the Face . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.2.2 Synchronizing Mouth and Speech . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.2.3 Server/Client Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.3 Inspiration: Roland Barthes, Bernard Rudofsky, Feminism, Queer Theory, and the
Elements of Fashion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.4 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
5.4.1 Experimental Validation: Voice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
5.4.2 Experimental Validation: Appearance . . . . . . . . . . . . . . . . . . . . . . . . . 80
5.4.3 Hypotheses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
5.4.4 Study Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
5.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.5.1 Participants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.5.2 Manipulation Check . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.5.3 Social Perception . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.5.4 Gendering Robots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
Part II: Adapting Physical Interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
Chapter 6: Personalizing Movement Models in Post-Stroke Participants . . . . . . . . . . . . . . . 98
6.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
6.2 Inspiration: Clinical Standards for Measuring Nonuse . . . . . . . . . . . . . . . . . . . . . 102
6.2.1 Actual Amount of Use Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
6.2.2 Motor Activity Log . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
6.2.3 Bimanual Arm Reaching Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
6.3 Technical Approach: Modeling User Behavioral Metrics . . . . . . . . . . . . . . . . . . . . 105
6.4 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
6.4.1 Technical System Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
6.4.2 Bilateral Arm Reaching Test with a Robot . . . . . . . . . . . . . . . . . . . . . . . 109
6.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
6.5.1 Neurotypical Participants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
6.5.2 Post-Stroke Participants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
6.5.3 Testing Validity, Reliability, and Simplicity . . . . . . . . . . . . . . . . . . . . . . . 111
6.5.4 Post-Stroke Participant Insights on Rehabilitation Systems . . . . . . . . . . . . . . 114
6.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
Chapter 7: Customizing Robot Haircare through Motion Planning . . . . . . . . . . . . . . . . . . 120
7.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
7.2 Inspiration: Hair Modeling Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
7.3 Technical Approach: Following Hair Orientation Fields . . . . . . . . . . . . . . . . . . . . 123
7.3.1 Coherence-Enhancing Shock Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
7.3.2 Calculating Orientation through Hair Images . . . . . . . . . . . . . . . . . . . . . 125
7.3.3 Generating a Combing Path from Orientation Fields . . . . . . . . . . . . . . . . . 125
7.4 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
7.4.1 Experimental Validation: Path Planning Algorithm . . . . . . . . . . . . . . . . . . 126
7.4.2 Hypotheses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
7.4.3 System Evaluation Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
7.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
7.5.1 Participants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
7.5.2 A Framework for Evaluating Hair Care Systems . . . . . . . . . . . . . . . . . . . . 131
7.5.3 Force Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
7.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
Part III: Adapting Social Interactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
Chapter 8: Personalizing Engagement Models in Rehabilitation Games for Users with Cerebral Palsy 135
8.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
8.2 Inspiration: User Engagement and Exercise Gamification . . . . . . . . . . . . . . . . . . . 137
8.3 Technical Approach: Modeling Engagement Dynamics . . . . . . . . . . . . . . . . . . . . 138
8.3.1 Learning Personalized Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
8.4 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
8.4.1 Study Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
8.4.2 Interaction Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
8.4.3 Study Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
8.4.4 Hypotheses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
8.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
8.5.1 Participants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
8.5.2 User Preference for Robot Embodiment . . . . . . . . . . . . . . . . . . . . . . . . 144
8.5.3 Clustering Participant Social Feedback Preferences . . . . . . . . . . . . . . . . . . 145
8.5.4 Personalizing Robot Actions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
8.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
Chapter 9: Customizing Robot Signaling Behaviors through Novel Interfaces . . . . . . . . . . . . 151
9.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
9.2 Inspiration: Exploratory Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
9.3 Technical Implementation: The Robot Signal Design Tool . . . . . . . . . . . . . . . . . . . 155
9.3.1 Understanding User Preferences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
9.3.2 Creating Features for Multimodal Data . . . . . . . . . . . . . . . . . . . . . . . . . 157
9.3.3 Generating Queries from User Data . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
9.4 Technical Approach: Contrastive Learning for Exploratory Actions . . . . . . . . . . . . . 158
9.4.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
9.4.2 Contrastive Learning from Exploratory Actions . . . . . . . . . . . . . . . . . . . . 159
9.4.3 Learning Preferences from Rankings . . . . . . . . . . . . . . . . . . . . . . . . . . 161
9.5 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
9.5.1 Experimental Validation: Usability of the RoSiD Tool . . . . . . . . . . . . . . . . . 162
9.5.2 Collecting User Preference Data on Robot Signals to Evaluate CLEA . . . . . . . . 167
9.6 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
9.6.1 Participants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
9.6.2 Hypotheses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
9.6.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
9.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
Chapter 10: Efficiently Customizing Learned Behaviors through Ranking . . . . . . . . . . . . . . . 174
10.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
10.2 Inspiration: Active Learning and Black-box Optimization . . . . . . . . . . . . . . . . . . . 176
10.3 Technical Approach: Covariance Matrix Adaptation Evolution Strategy with Information
Gain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
10.3.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
10.3.2 Bayesian Optimization of Preferences . . . . . . . . . . . . . . . . . . . . . . . . . 179
10.3.3 Covariance Matrix Adaptation Evolution Strategies (CMA-ES) . . . . . . . . . . . . 180
10.3.4 Combining Information Gain and CMA-ES . . . . . . . . . . . . . . . . . . . . . . . 181
10.4 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
10.4.1 Experimental Validation: Simulated User Rankings . . . . . . . . . . . . . . . . . . 182
10.4.2 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
10.4.3 User Study Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
10.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
10.5.1 Participants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
10.5.2 Ease of Use . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
10.5.3 Perceived Behavioral Adaptation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
10.5.4 Overall Ranking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
10.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
Chapter 11: Conclusion and Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
11.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
11.1.1 Adapting Robot Embodiment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
11.1.2 Adapting Physical Behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
11.1.3 Adapting Social Behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
11.2 Future Directions for Adapting Assistive Robots . . . . . . . . . . . . . . . . . . . . . . . . 194
11.3 Final Words . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
List of Tables
4.1 A table of the binary and ordinal robot descriptors developed through inspection of
the robot designs and user descriptions. For each feature, we provide a description of
what the feature means and the Cronbach’s alpha that we obtained between
two raters of the robotic systems. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.2 A table of the continuous feature descriptors taken from the robots’ websites. . . . . . . . 42
5.1 Clothing design results. We report the marginal means of femininity (µfeminine) and
masculinity (µmasculine) across the different task and clothing types. We also report the
number of participants that selected each clothing type as the most feminine (Nfeminine)
or the most masculine (Nmasculine) when presented with all three clothing options for
each task. Values following ± represent standard error. . . . . . . . . . . . . . . . . . . . . 82
6.1 Demographic Information of the Neurotypical Group . . . . . . . . . . . . . . . . . . . . . 110
6.2 Demographic Information of the Post-Stroke Group . . . . . . . . . . . . . . . . . . . . . . 111
7.1 Examples of positive and negative qualitative responses from participants for each theme. 130
8.1 Survey questions and associated factors of Companionship (C), Perceived Enjoyment (PE),
and Perceived Ease of Use (PEU). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
9.1 Simplicity results. For each modality, we found the area under the curve (AUC) of
the alignment metric over 100 pairwise comparisons across feature dimensionalities.
Asterisks indicate best-performing algorithm within each dimension (all p < .05). . . . . . 170
10.1 Quantitative Results. We report the area under the curve (AUC) for alignment of learned
reward and quality of query across d-dimensional feature spaces. . . . . . . . . . . . . . . 183
10.2 The Likert scale items that participants answered for our two metrics: perceived ease of
use and perceived behavioral adaptation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
List of Figures
1.1 The agents that mutually adapt. The user has a preference for the robot’s behavior
determined by their personal experiences and their expectations of what the robot can do.
The robot can perform certain trajectories determined by its embodiment. . . . . . . . . . 6
1.2 Communication framework. The user communicates their preference to the robot through
evaluations of the robot’s performed trajectories. The robot learns the user’s preference
by performing tasks. Each chapter in the dissertation models a component of the two-way
communication between the robot and the user. The chapter numbers are color-coded
according to the three domains of adaptation: robot embodiment (yellow), physical
interaction (blue), and social interaction (red). . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3 Overview of the dissertation. The regions represent the three domains of research: robot
embodiment, physical adaptation, and social adaptation. The vertical axis denotes the
method of adaptation; images on the top row explore adaptation through personalization,
whereas images on the bottom row explore adaptation through customization. . . . . . . 8
2.1 Example Physically Assistive Robots. (a) The HARMONY exoskeleton (Oliveira et al.
2019); (b) the Assistive Dexterous Arm (Nanavati, Alves-Oliveira, et al. 2023); (c) an
end-effector robot with a custom end-effector. . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2 Example Socially Assistive Robots. (a) The Kiwi robot shown tutoring children with
autism (Shi, Groechel, et al. 2022); (b) Pepper shown assisting a user in a cup-stacking
game (Feingold-Polak, Barzel, and Levy-Tzedek 2021); (c) the Blossom Robot (Suguitan
and Hoffman 2019). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.3 The self-determination theory of motivation (Deci and Ryan 2012). Internal motivation
ranges from amotivated to extrinsic to intrinsic. By satisfying a user’s needs for autonomy,
belonging, and competence, we can internalize motivation. . . . . . . . . . . . . . . . . . . 16
2.4 The Technology Acceptance Model (Davis 1989). Actual system use is affected by two
concepts that system creators can affect: perceived usefulness and perceived ease of use. . 20
3.1 An illustration of a robot’s state and action, and an example of sampled trajectories. . . . . 23
3.2 Pathway for communication, adapted from the Shannon-Weaver model (Weaver 2017;
Shannon 1948). A user has a preference that they use to send a message to the robot based
on an internal encoding of their preferences, ϕH, in addition to the other requirements of
the particular modality (e.g., the set of trajectories that are compared or the demonstration
provided by the user). The encoded message is communicated over a noisy channel to be
decoded into a robot-interpretable command via ϕR. . . . . . . . . . . . . . . . . . . . . . 26
4.1 Examples of robots’ physical designs measured by abstraction level along three different
design metaphors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.2 Example composite images from the MUFaSAA dataset. . . . . . . . . . . . . . . . . . . . . 36
4.3 Differences in means for anthropomorphic, zoomorphic, and mechanical embodiments
for social constructs of embodiment. All differences are significant with p < 0.001 unless
marked otherwise. Error bars represent 95% CI of means. . . . . . . . . . . . . . . . . . . . 47
4.4 Differences in means for anthropomorphic, zoomorphic, and mechanical embodiments
for functional constructs of embodiment. All differences are significant with p < 0.001,
unless marked otherwise. Error bars represent 95% CI of means. . . . . . . . . . . . . . . . 49
4.5 Perceived competence and perceived perceptual ability by metaphors for different
maturity levels. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.6 A visualization of the space of gender expression by robot metaphor type. . . . . . . . . . 51
4.7 A heat map of the distribution of the top two tasks for robots in our dataset separated by
their metaphor type. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.8 A t-SNE visualization of the design space of robot embodiments. Each point represents one robot in the dataset. Brown represents high values and teal represents
lower values of the measured ratings. Here we show only the front view of robots;
study participants viewed composite images that included scaling information, as
described in Section 4.3. The fully interactive version of this plot is located at
interaction-lab.github.io/robot-metaphors/, where researchers, designers, and
others interested in these findings may hover over points to view robots and click on a
specific robot to view its social and functional expectations. . . . . . . . . . . . . . . . . . 55
5.1 Control points of the animated face. Each control point represents attachments of human
muscles. The control points are used to map action units to the screen-based face. . . . . . 62
5.2 Example animation of the face saying "PyLips". We extract phonemes from the generated
audio file, then map those phonemes to visemes based on the International Phonetic
Alphabet and play the synchronized visemes and audio to achieve realistic movements
during speech. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.3 Example network configuration. One PyLips server can host several faces. The server and
clients can run on the same computer or different devices. The PyLips Python interface
can send commands to individual clients. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.4 Figures representing the bodies that clothes were designed for in four iconic fashion
periods. This figure is adapted from the exhibition “Are Clothes Modern?”, The Museum
of Modern Art, New York, November 28, 1944–March 4, 1945. The Museum of Modern
Art Archives, Photographic Archive. Photo: Soichi Sunami. . . . . . . . . . . . . . . . . . 66
5.5 Example variations on form. (left) Prototype with additional volume at top of robot body.
(right) Prototype with additional volume at bottom of robot body. . . . . . . . . . . . . . . 69
5.6 Example variations on style lines. (left) Prototype emphasizing horizontal lines. (right)
Prototype emphasizing vertical lines. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.7 Example variations on texture. (left) Prototype exhibiting a texture through a textured
material. (right) Prototype exhibiting a texture through layering materials. . . . . . . . . . 72
5.8 Example variations on color. (left) Prototype using a single color. (right) Prototype using
a patterned fabric. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.9 Perceived femininity of voices (-3 represents a masculine voice, 0 represents an
ambiguously gendered voice, and +3 represents a feminine voice) as a function of average
fundamental frequency (f0) of the utterance. Teal lines represent five-datapoint sliding
averages, the shaded region denotes ± one standard deviation, and the beige dots are
individual responses. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
5.10 Quori (unclothed) (Specian et al. 2021), the robot we selected to use for the clothing design
study and integrative video study. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
5.11 Appearance modifications of the Quori robot. The first two images represent the feminine
and masculine versions of the robot clothing designed for the hotel receptionist task.
The second two represent the feminine and masculine clothing designed for the medical
professional task. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.12 Sample frames from the two tasks we selected: medical professional (left) and hotel
receptionist (right). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
5.13 Summary of stimuli and results from the integration study. Participants saw videos of
the robot performing a task with a human. We found that the manipulation of social
role was effective: the expected social role differed significantly between the two
conditions (p < .001), with the receptionist task rated lower than the medical
professional task (a). We also found that voice and appearance affected the perception
of masculinity (b) and femininity (c) in both conditions. . . . . . . . . . . . . . . . . . . . 90
5.14 Summary of the effect of clothing the robot in the two tasks, for robots with androgynous
voices. We found that clothing had a significant effect on gendering the robot, depending
on the task. In tasks with higher social roles, clothing makes the robot more masculine
and less feminine, and in the lower social role task, clothing makes the robot less masculine. 94
6.1 Example reaching trial with the BARTR apparatus. The participant places hands on
the home position device. The socially assistive robot (SAR, on the left) describes the
mechanics of the BARTR, and the robot arm (on the right) moves the button to different
target locations in front of the participant. . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
6.2 Visualization of the participant’s workspace. Viewed above, the workspace tested extends
radially from the home position from a distance of 10cm to 30cm (A). Viewed from the
side, the workspace extends upward 40cm (B). . . . . . . . . . . . . . . . . . . . . . . . . . 105
6.3 Evaluations of the proposed metric. We demonstrate the Bimanual Arm Reaching Task
with a Robot (BARTR) metric’s validity through its correlation with clinical measurements
of nonuse through a non-parametric Spearman correlation, r(13) = .693, p = .016 (A).
We demonstrate reliability with the absolute agreement of BARTR scores across three
sessions through the intraclass correlation coefficient, ICC(1, k) = .908, p < .001 (B).
We demonstrate its ease of use through usability ratings of the system, showing that
the average rating is above 72.6 through a non-parametric Wilcoxon signed-rank test,
Z = 16.0, p = .040 (C). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
6.4 Qualitative responses from participants. We show overall perceptions of each of the
four factors of trust (Kellmeyer et al. 2018) that each participant mentioned. . . . . . . . . 115
7.1 Overview of path-planning module. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
7.2 Illustrative hairstyle where the image-based algorithm performs differently than the
mesh-based for corresponding starting points. . . . . . . . . . . . . . . . . . . . . . . . . . 127
7.3 Means of the three different algorithms. Error bars represent 95% Confidence Intervals of
mean ratings. All differences are significant. . . . . . . . . . . . . . . . . . . . . . . . . . . 128
7.4 Sample frames of the video shown to the participants and forces measured by the arm
for different strokes of each hairstyle. Orange lines represent mean force values across
25 strokes with the orange region illustrating the first and third quartiles. Blue lines
represent individual stroke force values. All force readings were measured at 10Hz and
were post-processed with a sliding average of 9 timesteps for visualization. . . . . . . . . . 129
8.1 Stages of the within-subject experiment design. . . . . . . . . . . . . . . . . . . . . . . . . 139
8.2 Participant responses to Likert-scale questions, grouped by measured construct. . . . . . . 143
8.3 Transition matrices of the two clusters found in the participant-based clustering method.
Each matrix specifies the probability of becoming engaged (E) or disengaged (D) at the
next time-step, given the current state. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
8.4 Transition matrices of different clusters found in the action-based clustering method.
Each matrix specifies the probability of becoming engaged (E) or disengaged (D) at the
next time-step, given the current state. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
8.5 Percentage of time that modeled users were engaged for different methods of robot
action selection. Selecting actions based on the correct user clusters (a) keeps users more
engaged; however, selecting actions based on incorrect user models (b) has an adverse effect.
Considering the users as one group (c) performs similarly to the random baseline. . . . . . 148
9.1 Example exploratory search process. Users engaging in exploratory search test out
different robot behaviors to learn what the robot is capable of and what they prefer the
robot to do. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
9.2 CLEA: Contrastive Learning from Exploratory Actions. Users engage in exploratory
search to select their preferred robot behaviors. We automatically generate data from their
exploratory actions to learn features that facilitate future interactive learning processes.
Our contributions are highlighted in pink, and the enabling work that CLEA supports is
highlighted in green. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
9.3 Interfaces for the RoSiD tool. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
9.4 Structure of the design session with approximate times for each section. . . . . . . . . . . 163
9.5 Box plots showing the times users spent designing signals. . . . . . . . . . . . . . . . . . . . . . 165
9.6 Box plots comparing the alignment of initial queries based on random suggestions and
the proposed clustered suggestions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
9.7 Completeness results. Across three modalities, feature spaces using CLEA are able to
accurately predict user preferences. Error bars show mean standard error across participants. 169
9.8 Minimality results. Alignment of a linear reward model across numbers of pairwise
comparisons for the smallest sized feature space. Shaded region indicates mean standard
error. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
9.9 Interpretability Results. CLEA most often generated the users’ top-rated signal. Dotted
line represents expected number of users if all algorithms were equally preferred. . . . . . 172
10.1 The two domains that users taught robots their preferences for the robot’s behaviors. In
the physical domain, users ranked a JACO arm’s movement trajectories to hand them a
marker, a cup, and a spoon. In the social domain, users ranked a Blossom robot’s gestures
to portray happiness, sadness, and anger. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
10.2 Example queries generated from an early step of each algorithm. The large circle
represents the space of all trajectories with lighter areas representing higher reward, light
blue arrows representing the user’s true preference, dark blue arrows representing the
current estimate of the user’s preference, orange circles representing sampled trajectories
to present to the user, and green dotted regions representing the sampling distribution
from the current step of the CMA-ES optimizer. Information gain results in easy-to-differentiate
queries, CMA-ES results in higher rewards on average, and CMA-ES-IG
results in higher rewards that are easy to differentiate. . . . . . . . . . . . . . . . . . . . . 182
10.3 Comparison of simulation results for learning user preferences. Shaded regions indicate
standard error. We found that all methods were able to learn user preferences across
varying dimensions. The quality of the trajectories in the query increases only for
CMA-ES and CMA-ES-IG, with CMA-ES-IG performing significantly better. . . . . . . . . 184
10.4 The framework for learning user preferences. We learned nonlinear features for sets of
robot trajectories. The query sampler produced sets of trajectories for the user to rank
and those rankings were used to update the estimate of the user’s preferences. . . . . . . . 185
10.5 User study setup. Users interacted with the robots through the ranking interface to specify
their preferences for how the Blossom robot used gestures to signal different affective
states and how the JACO robot arm handed them different items. . . . . . . . . . . . . . . 185
10.6 Ease of Use results. CMA-ES-IG was rated significantly easier to use than CMA-ES, and
empirically easier than IG. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
10.7 Behavioral Adaptation results. CMA-ES-IG was rated as changing the robot’s behavior
significantly more over time than both CMA-ES and IG. . . . . . . . . . . . . . . . . . . . . 189
10.8 Algorithmic ranking results. CMA-ES-IG was consistently ranked as the most preferred
algorithm for teaching robots preferences in our user study. . . . . . . . . . . . . . . . . . 190
Abstract
Robots are expected to be deployed in diverse environments and use cases to provide physical and social
assistance to end-users. A major barrier to the widespread deployment of robots is the large variance
in user preferences for how robots should perform tasks. The impact of user preferences is exacerbated
in robots compared to already ubiquitous computer systems because a robot’s embodiment allows it to
physically interact with the world and form social connections with users through its actions.
This dissertation explores how robots can adapt their mechanical design, physical behaviors, and social
behaviors to align with users’ preferences. Across these domains, we emphasize the importance of both
automatic adaptation through personalization, and user-driven adaptation through customization.
First, this work identifies how robot embodiment affects expectations for interaction. We introduce design metaphors as a tool for reasoning about these expectations, and clothing design as a method to modify the robot’s perceived embodiment. Second, we show how robots can learn users’ preferences through
physical interaction. We create an objective metric, based on interaction with a robot system, that assesses movement in post-stroke users, and we develop a novel hair-combing robotic interaction. Finally,
we show how robots can learn users’ preferences through social interactions. We introduce a process
to learn user engagement models based on robot social actions to facilitate exercise games for users with
cerebral palsy. We additionally create an interface that allows users to design non-verbal signals, and a
machine learning framework that develops representations of these signals to facilitate customization. We
conclude the thesis with an algorithm that allows users to quickly customize both the robot’s social
and physical behaviors. Together, this work enables the design and implementation of assistive robotic
systems that can aid a variety of users with diverse preferences.
Chapter 1
Introduction
This chapter provides an overview of how robots can assist users to motivate the dissertation. The goal
of the dissertation is to allow non-expert users to adapt assistive robotic systems to their specific use case.
This chapter concludes with a description of the contributions of this research and an outline for the rest
of the dissertation.
1.1 Overview
The dissertation contributes methodological techniques, algorithms, and system implementations that enable users to adapt assistive robots to their own needs and preferences. We investigate robot adaptation
in three domains: robot embodiment, physical behaviors, and social behaviors. This work emphasizes the
importance of making technical contributions to the field of robotics in domains that address real problems
faced by the intended users of robotic systems. A fundamental goal of human-robot interaction (HRI) is
to understand how robots can address these problems so we can facilitate the widespread adoption and
integration of robotic systems that operate around people.
The dissertation highlights that the adaptability of robots is an important area of research to enable
robotic systems to meaningfully assist users. Robots that are easy for users to adapt have the potential
to operate across environmental, physical, social, cultural, and geographic contexts. For robots to be useful in these varied contexts, we must understand the benefits of robots as opposed to other appropriate
technologies, such as computers. There are three differentiating views of robots that enable this research.
1.1.1 Robots as Physical and Social Agents
The key differentiation between robots and computers is that robots are physically embodied. This embodiment allows a robot to manipulate objects and otherwise interact with the physical world. However,
embodiment also imbues a robot with social attributes. Both computers and robots are social agents.
Researchers in human-computer interaction (HCI) have proposed the computers are social actors (CASA)
paradigm (Nass, Steuer, and Tauber 1994) to understand the automatic social attributions that people place
on computers. A robot’s physical embodiment extends these findings from CASA with a host of unique
modalities for social and physical interaction (Deng, Mutlu, and Mataric 2019). For example, a mobile
robot possesses both the ability to move between physical locations and the ability to encroach on a user’s
personal space.
Historically, roboticists have generally considered physical and social modalities independently. However, this
is an over-simplification in practice: physical and social modalities are intertwined. For
example, people mindlessly attribute social intention to the purely physical movement of shapes (Heider
and Simmel 1944). Conversely, people make choices concerning physical actions in service of their social
identity (Butler 2002). Studies in human-robot interaction (HRI) have also identified the interplay between
social and physical expectations of robots (Cha, Dragan, and Srinivasa 2015), finding that a robot that
speaks is expected to be more physically capable, even though speaking has no impact on a robot’s ability
to perform motion planning.
This overlap between the social world and the physical world is not unique to robotics. Several other
fields investigate similar phenomena. In philosophy, this manifests as the tension between mind-body
dualism (Robinson 2023) and monism (Schaffer 2018). From the mind-body dualism perspective, reality
is decomposed into physical experiences in the body and non-physical thought in the mind. Monists
believe that there is one unified reality that does not separate the physical from the non-physical;
for example, materialists are monists who believe that everything is grounded in physical experience, and
idealists are monists who believe that everything is grounded in the mind. In medicine, this idea manifests in
the tension between allopathic medicine and holistic medicine (Mehta 2011). Allopathic medicine prioritizes
targeting specific physical symptoms with specialized interventions (e.g., drugs, surgeries, or radiation),
whereas holistic medicine makes interventions along many social and lifestyle factors that contribute to
overall wellness through preventative care. In the social and physical sciences, this tension exists as the
difference between positivism and constructivism (Cupchik et al. 2001). Positivists believe in uncovering a
physical and objective truth of the world through experimentation, whereas constructivists believe that reality
is socially constructed through subjective experience and social interaction.
While these perspectives differ in their methods, beliefs, and values, they all point to the importance of
considering both the physical and social components of the human experience. Similarly, the dissertation
examines both physical and social aspects of robots, in terms of expectations and behaviors.
1.1.2 Robots as Adaptable Experiences
While computers and robots both have social implications, computers provide an interface to the digital
world, whereas robots provide an interface to the physical world. The physical world contains more noise
and uncertainty than the digital world, and agents that interact with the physical world often engage in
learning processes to adapt behaviors to their environment to achieve specific goals. In robots that adapt
to users, there are two kinds of adaptation: personalization and customization.
Personalization refers to a robot’s autonomous adaptation to a user through observed interaction metrics. This kind of adaptation is not explicitly directed by the user, and is instead based on implicit communication that the robot infers from the user. For example, a robot may use cues from the user such as
gaze (Admoni and Scassellati 2017; Huang et al. 2015), facial expression (Stiber, Taylor, and Huang 2023;
Cui et al. 2021), or exercise completion rates (Clabaugh et al. 2019; Metzger et al. 2014) to guess what the
user prefers. The benefit of personalization is that the data are easy to collect, but the drawback is that
automatic adaptation can make incorrect assumptions about the user’s intentions. It is important to note
that these incorrect assumptions more often affect users of marginalized identities (Dennler, Ovalle, et al.
2023; Buolamwini and Gebru 2018).
Customization refers to a user explicitly changing a robot’s behavior. For example, a user may indicate
their preferences among behaviors by choosing their favorite behavior from a set of candidate behaviors (Biyik, Palan, et al. 2020; Sadigh et al. 2017), by describing their preferences through words (Peng
et al. 2024), or by demonstrating what they want the robot to do (Ng and Russell 2000). The benefit of
customization is that the user intentionally controls the robot’s behavior, but the drawback is that this
requires conscious effort from the user.
Personalization and customization define two poles on the adaptation spectrum that the dissertation
explores in the context of assistive robotics. A robot should be easy for the user to adapt, and it should
adapt to the user’s actual needs. Robots can leverage both personalization and customization techniques
to maximize the benefits of adaptation while minimizing the drawbacks.
1.1.3 Robots as Assistive Devices
Assistive devices, such as wheelchairs, hearing aids, communication boards, or adaptive utensils, are often designed to support a specific type of user in performing a specific task. A key benefit of
robots is their capacity to provide integrated social and physical support across many tasks, adapting to
the spectrum of users’ diverse needs in mobility, cognition, or communication. For example, a robot arm
can assist a user in performing physical rehabilitation exercises (Takebayashi et al. 2022), eating (Candeias
et al. 2018), or hair combing (Hughes et al. 2021) by providing an adaptable physical interface. Similarly,
a Socially Assistive Robot (SAR) (Matarić and Scassellati 2016) can aid users in performing cognition exercises (Bouzida et al. 2024), remembering to take medications (Su et al. 2021), and socially connecting with
friends or family (Short, Swift-Spong, et al. 2017).
The dissertation focuses on systems designed to help users with limited mobility, for example in populations affected by stroke or cerebral palsy. By working with these populations to address real problems
faced by these groups, the dissertation generates transferable insights that apply to all users of assistive
robotic systems. This property is known in accessibility discourse as the curb-cut effect (Blackwell 2017).
The curb-cut effect describes how interventions designed for marginalized populations (e.g., sidewalk ramps
for wheelchair users) generate utility for heterogeneous user populations beyond the originally intended
population (e.g., cyclists, caregivers with strollers).
The dissertation investigates how adaptable robots can assist diverse users across diverse tasks. We
view the problem of adapting assistive robots as a form of communication between robots and users.
1.2 Adaptation as Communication
The dissertation frames the problem of robot adaptation as a form of communication between two agents:
the user and the robot. The user has their own unobservable preferences that are shaped through a variety
of factors. The robot has fully observable behaviors it can perform; however, these behaviors must be
generated by a system developer. These entities’ different points of view are illustrated in Figure 1.1.
Figure 1.1: The agents that mutually adapt. The user has a preference for the robot’s behavior determined
by their personal experiences and their expectations of what the robot can do. The robot can perform
certain trajectories determined by its embodiment.
1.2.1 The Agents that Mutually Adapt
User adapting to Robot. The user has a preference, denoted as ω, and an understanding of what the robot
can do. The user’s preference is shaped by their personal experience and interactions with the world. Their
understanding of what the robot can do is initialized from their past experiences, e.g., through media, past
robotic interactions, and communication with other users. The user’s understanding of the robot also
evolves through directly interacting with the robot. The role of psychologists and sociologists working in
robotics is to understand how a user’s past experiences shape future use.
Robot adapting to User. The robot has a set of trajectories that it can perform, denoted as Ξ, with
each individual trajectory being denoted as ξ. These trajectories are mathematically defined as a sequence
of states and actions. In the context of the dissertation, a trajectory can refer to any robot behavior, e.g.,
a series of robot joint states, a series of social actions, a series of facial expressions, etc. The performable
trajectories are a subset of the possible trajectories that the robot is physically capable of based on its
embodiment. The role of robotics research outside of human-robot interaction is to expand the set of
programmable trajectories that the robot can perform. This can be achieved through many techniques
Figure 1.2: Communication framework. The user communicates their preference to the robot through
evaluations of the robot’s performed trajectories. The robot learns the user’s preference by performing
tasks. Each chapter in the dissertation models a component of the two-way communication between the
robot and the user. The chapter numbers are color-coded according to the three domains of adaptation:
robot embodiment (yellow), physical interaction (blue), and social interaction (red).
such as planning, reinforcement learning, or optimal control. In the context of this work, the robot is
concerned with understanding the user’s goals, and it estimates those goals internally.
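The formalism above can be sketched in a few lines of code; this is a minimal illustration only, and the `Trajectory` type, the `performable` check, and the joint limits are all hypothetical choices, not part of any system described in the dissertation.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

State = Tuple[float, ...]   # e.g., joint angles; could equally encode a social or facial state
Action = Tuple[float, ...]  # e.g., joint commands or discrete social actions

@dataclass(frozen=True)
class Trajectory:
    """A trajectory xi: a sequence of (state, action) pairs."""
    steps: Sequence[Tuple[State, Action]]

def performable(traj: Trajectory, joint_limits=(-3.14, 3.14)) -> bool:
    """Membership test for the performable set Xi: here a trajectory is
    performable only if every state respects the robot's joint limits,
    standing in for the constraints imposed by the embodiment."""
    lo, hi = joint_limits
    return all(lo <= v <= hi for state, _ in traj.steps for v in state)
```

Under this sketch, a motion whose joint values stay within limits belongs to Ξ, while one that exceeds them does not.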
1.2.2 Communication
Both the robot and the user update their internal models of each other through communication. This
high-level process is depicted in Figure 1.2. The robot may perform actions or ask the user for input to
understand what they want. By doing this, the user implicitly learns more about what the robot is capable
of doing, and can update their estimate of the robot’s capabilities. The user can then answer the robot’s
request for feedback, and the robot may update its understanding of the user’s preference.
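As a deliberately simplified sketch of this loop (not one of the dissertation's actual algorithms), assume a linear reward model r(ξ) = ω · φ(ξ): the robot keeps a belief over a few candidate preferences ω and reweights it with a Bradley–Terry (logistic) likelihood each time the user picks one trajectory over another. The 1-D trajectories, the two-feature map φ, and the candidate set are all illustrative assumptions.

```python
import math

def feature(traj):
    # Hypothetical 2-D feature map phi(xi): average step size ("speed")
    # and average position ("height") of a 1-D trajectory.
    speeds = [abs(b - a) for a, b in zip(traj, traj[1:])]
    return (sum(speeds) / len(speeds), sum(traj) / len(traj))

def preference_update(candidates, weights, chosen, rejected, scale=5.0):
    """One round of communication: the user picked `chosen` over `rejected`,
    so reweight each candidate preference omega by a Bradley-Terry
    (logistic) likelihood of that choice under omega."""
    f_c, f_r = feature(chosen), feature(rejected)
    new_weights = []
    for omega, w in zip(candidates, weights):
        margin = sum(o * (a - b) for o, a, b in zip(omega, f_c, f_r))
        new_weights.append(w / (1.0 + math.exp(-scale * margin)))
    total = sum(new_weights)
    return [w / total for w in new_weights]

# A simulated user whose true preference values speed only: repeated
# comparisons concentrate the belief on the matching candidate.
candidates = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]
weights = [1 / 3] * 3
fast, slow = [0.0, 1.0, 0.0, 1.0], [0.5, 0.5, 0.5, 0.5]
for _ in range(5):
    weights = preference_update(candidates, weights, fast, slow)
```

Each round corresponds to one exchange in Figure 1.2: the robot presents a pair, the user answers, and the robot's estimate of ω sharpens.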
Figure 1.3: Overview of the dissertation. The regions represent the three domains of research: robot embodiment, physical adaptation, and social adaptation. The vertical axis denotes the method of adaptation;
images on the top row explore adaptation through personalization, whereas images on the bottom row
explore adaptation through customization.
1.3 Problem Statement
Given the benefits of adaptable robots for physical and social assistance, the dissertation synthesizes these
insights into three research domains that contribute theoretical and computational techniques to adapt
assistive robots:
1. How does the physical embodiment of a robot shape a user’s expectations for assistance?
2. How can a robot adapt its physical behaviors to assist a user?
3. How can a robot adapt its social behaviors to assist a user?
Across these three domains, the dissertation additionally explores the trade-offs between personalization and customization. The summary of the dissertation work is shown in Figure 1.3.
1.4 Contributions
The dissertation makes the following contributions across the domains of adapting robot embodiment,
adapting physical interaction, and adapting social interaction:
1.4.1 Methodological Contributions
• Design Metaphors as a Tool to Understand User Expectations: the Metaphors for Understanding Functional and Social Anticipated Affordances (MUFaSAA) dataset, showing the relationships
between design metaphors and users’ social and functional expectations.
• Robot Clothing Design to Establish Social Expectations: a robot clothing design methodology
to affect how a generic robot can be adapted to a specific use case.
1.4.2 Algorithmic Contributions
• Objective Metric to Assess Nonuse: a metric derived from a reaching task that can be used to
calculate arm nonuse in post-stroke users, demonstrating correlation with established clinical measurements.
• Motion Planning for Hair Combing: a computer vision and motion planning algorithm to automatically generate paths to comb a user’s hair from images.
• Learning User Engagement Dynamics: a learning algorithm to model changes in user engagement, and a demonstration of using this model to select robot actions that maintain engagement.
• Contrastive Learning from Exploratory Actions (CLEA): a contrastive learning framework that
leverages data generated from users’ exploration of robot behaviors. This algorithm learns representations that facilitate other forms of customization.
• Covariance Matrix Adaptation Evolution Strategy with Information Gain (CMA-ES-IG): an
algorithm to allow users to specify preferences for robot behaviors in learned behavior representation spaces. This algorithm efficiently learns user preferences, and users perceive the robot as
improving through their interaction.
1.4.3 Software and Systems Contributions
• PyLips: a Python package that enables users to design interactions with a screen-based face. This
face automatically lip-syncs to speech and can be visually customized.
• A Number-Guessing Game to Facilitate Orthosis Use: a robot interaction design where a robot
plays a number-guessing game with a user to encourage use of a physically assistive orthosis. The robot provides
social feedback throughout the exercise.
• Combing Assistance with a Robot Arm (CARA): a robot system to aid users with hair care. The
user can click on a picture of their hair, and the robot moves through the planned trajectory.
• The Robot Signal Design (RoSiD) Interface: an interface that allows users to design multimodal
signals for a robot. Users can select visual, auditory, and kinetic components for each signal, and the
interface allows users to explore behaviors while they specify their preferences.
1.5 Outline
This section outlines the rest of the dissertation. Adaptation of embodiment is discussed in Chapters 4 and
5. Adaptation of physical interaction is discussed in Chapters 6 and 7. Adaptation of social interaction is
discussed in Chapters 8–10. To facilitate digital readability, each chapter links back to the table of contents
and provides a link to the end summary of the chapter.
• Chapter 2 discusses the relevant background.
• Chapter 3 presents a framework for adaptive systems.
• Chapter 4 describes the expectations users have of robots based on their embodiment.
• Chapter 5 describes how clothing design can shape expectations of existing robots.
• Chapter 6 describes a robot system that adapts models of post-stroke users’ physical movements.
• Chapter 7 describes an algorithm to generate custom robot combing trajectories through hair.
• Chapter 8 describes how social feedback can affect engagement in physical therapy exercises for
users with cerebral palsy.
• Chapter 9 describes how users can customize robot social signals to help in an item-finding task.
• Chapter 10 describes a technique that quickly adapts both social and functional behaviors to user
preferences.
• Chapter 11 summarizes the dissertation.
Chapter 2
Background and Related Work
This chapter overviews the relevant work in assistive robotics and the psychological frameworks that
motivate why robots should adapt to their users. The purpose of this chapter is to situate the dissertation
in the broader research landscape. This chapter is summarized in Section 2.3. Return to the Table of
Contents to navigate other chapters.
2.1 Assistive and Rehabilitative Robotics
Assistive and rehabilitative robotics can be broken down into two related categories: robots that help to
directly improve users’ skills (e.g., rehabilitation devices), and robots that augment users’ actions (e.g.,
assistive devices). Both types of robots can assist users through physical or social means.
2.1.1 Physically Assistive Robotics for Users with Limited Mobility
Physically Assistive Robots (PARs) provide assistance to users through physical interactions with the world
and the user; see Nanavati, Ranganeni, and Cakmak 2023 for a review of PARs. These robots can assume
many morphologies, such as end-effector robots, exoskeletons, prostheses, orthoses, or mobile manipulators. While there are several kinds of robots for lower-limb and upper-limb rehabilitation and assistance
(Maciejasz et al. 2014; Shi, Zhang, et al. 2019), this background focuses on describing robots
Figure 2.1: Example Physically Assistive Robots. In (a), the HARMONY exoskeleton (Oliveira et al. 2019);
(b), the Assistive Dexterous Arm (Nanavati, Alves-Oliveira, et al. 2023); (c), an end-effector robot with a
custom end-effector.
for upper-limb assistance and rehabilitation, as the dissertation focuses on users with limited mobility in
their upper limbs. Example upper-limb PARs are shown in Figure 2.1.
Physically Assistive Robots for Improving Skills. Robots that help users to recover lost function
through exercise are generally either end-effector robots or exoskeleton robots. End-effector robots are
robots with a series of links and joints that are usually physically attached to the environment. These
robots can move in free space, and manipulate the environment using their end effectors, similarly to the
robots used in manufacturing or assembly tasks. To interact with users, the end effectors are designed to
allow users to physically connect to the robots by using a handle or splint (as in Figure 2.1c). Once the user
is coupled with the robot, the robot can exert forces of varying magnitudes on the user to support the relearning of motor function.
Examples of end-effector robots include the MIT-MANUS system (Hogan et al. 1992; Hesse et al. 2003),
the PUPArm (Catalan et al. 2018), and the EULRR robot (Zhang, Guo, and Sun 2020). These systems have
informed products such as the InMotion ARM/HAND (Bionik 2022) or REAplan (Axinexis 2023).
Exoskeleton robots differ from end-effector robots in that they are grounded to the user. These robots
have morphologies that mimic the user’s kinematics, so the robot’s range of motion and the user’s range
of motion have a one-to-one mapping. The user and robot make contact at several locations on the robot,
so exoskeleton robots must be adjusted to the specific user. Similar to end-effector robots, exoskeleton
robots exert forces on the user to assist with relearning motor function. Example exoskeleton robots are
CLEVERarm (Zeiaee et al. 2021), the DASA system (Shi, Song, et al. 2021), or the IntelliArm (Ren, Park,
and Zhang 2009).
Physically Assistive Robots for Augmenting Actions. Not all users with limited mobility may recover
motor function. An equally important area of research is to aid users in completing activities of daily living
(ADLs) and instrumental activities of daily living (IADLs). ADLs are activities that must be completed
multiple times throughout the day, such as eating, personal hygiene,
dressing, or moving around (Edemekong et al. 2019). IADLs are more complex activities that
are important for independent living, but occur less frequently. Doing housework, planning finances, or
shopping for groceries are examples of IADLs (Edemekong et al. 2019). PARs can reduce the effort users
with limited mobility exert on performing both ADLs and IADLs, and have the potential to help users
across several ADLs/IADLs.
To date, PARs have aided users through eating (Nanavati, Alves-Oliveira, et al. 2023), dressing (Kapusta
et al. 2019), shaving (Hawkins, Grice, et al. 2014), and grabbing objects (Nikolaidis, Hsu, and Srinivasa 2017).
Throughout these interactions, researchers are becoming increasingly aware of the trade-off between the autonomy of the robot and the user's control of the robot (Selvaggio et al. 2021). Some users prefer robots
that are more autonomous because they are easier to control, while other users prefer robots that are less
autonomous because they make fewer assumptions about the goals that the user is trying to achieve.
2.1.2 Socially Assistive Robotics for Users with Limited Mobility
Socially Assistive Robots (SARs) assist users through social support to achieve behavioral change (Feil-Seifer
and Mataric 2005; Matarić and Scassellati 2016). In the context of assisting users with limited mobility, they
are often used to promote exercise engagement, exercise adherence, and self-practice. SARs
may assume many morphologies that use social means to support users, such as facial features or
appendages for gesturing, rather than the powerful force-exerting physical interaction capabilities of PARs.
Example SARs are shown in Figure 2.2.
Figure 2.2: Example Socially Assistive Robots. (a) the Kiwi robot tutoring children with autism (Shi,
Groechel, et al. 2022); (b) Pepper assisting a user in a cup-stacking game (Feingold-Polak, Barzel, and
Levy-Tzedek 2021); (c) the Blossom robot (Suguitan and Hoffman 2019).
Socially Assistive Robots for Improving Skills. SARs can help users in practicing rehabilitation exercises by providing social encouragement to complete prescribed exercise regimens. SARs are designed to
use social rather than physical interaction with the user to provide engaging exercise interactions. These
interactions serve to increase a user’s motivation to perform exercises. SARs have been used to encourage
stroke patients performing exercises (Fasola and Matarić 2013; Feingold-Polak, Barzel, and Levy-Tzedek
2021), as robot tutors for children with autism (Shi, Groechel, et al. 2022), and for improving cognitive
skills in users with mild cognitive impairments (Bouzida et al. 2024).
Socially Assistive Robots for Augmenting Actions. While SARs are not often used to perform ADLs
that require physical contact, they can help reduce some of the cognitive load associated with IADLs.
By communicating with users through natural social interactions, SARs can augment users through non-physical
means, such as helping users remember plans or goals and providing information that helps them
make their own choices. Prior research has shown the potential for SARs to
help with food preparation tasks like making tea (Moro, Nejat, and Mihailidis 2018), reminding users of
scheduled exercises or commitments (Winkle, Caleb-Solly, et al. 2018), or encouraging sociability with
other people (McColl, Louie, and Nejat 2013).
Figure 2.3: The self-determination theory of motivation (Deci and Ryan 2012). Internal motivation ranges
from amotivated to extrinsic to intrinsic. By satisfying a user's needs for autonomy, belonging, and
competence, we can internalize motivation.
Beyond users with limited mobility, SARs are also increasingly used to aid users in other non-physical
tasks and skill acquisition. SARs can provide motivation for users
with attention deficit hyperactivity disorder (ADHD) to study by acting as a body-double (O’Connell et al.
2024), and SARs can aid students with anxiety symptoms by helping them adhere to cognitive behavioral
therapy (CBT) exercises (Kian et al. 2024).
2.2 User Motivation, Social Evaluation, and Technology Adoption
The dissertation is motivated by the lack of adoption of robots in settings where they directly interact with
humans. In order to understand how to tackle this problem, it is important to understand the psychological
drivers of technology use. We outline how motivation, social identity, and general technology acceptance
affect adoption. These frameworks are both descriptive and generative; they can be used to describe and
understand existing user behaviors and to generate research questions for roboticists to address through
research studies.
2.2.1 The Self-Determination Theory of Motivation
Self-determination theory (SDT) (Deci and Ryan 2012) is a psychological framework that describes the
factors of motivation in humans, illustrated in Figure 2.3. This framework has been successfully applied
in a broad variety of domains such as education (Reeve 2002), video game development (Ryan, Rigby,
and Przybylski 2006), mental health (Sheldon, Williams, and Joiner 2008; Ng, Ntoumanis, et al. 2012),
and physical activity (Standage and Ryan 2020). SDT describes motivation as a spectrum that ranges
from extrinsic motivation to intrinsic motivation. Extrinsic motivation to engage in activities refers to a
drive based on external reward, for example monetary rewards, material rewards, social status, or special
privileges. Intrinsic motivation to engage in activities refers to a drive based purely on the joy of performing
the activity. As motivation to perform activities becomes more intrinsic, people experience higher levels
of engagement, better task performance, fewer negative effects associated with these tasks,
and less reliance on incentive structures. The theory emphasizes that fostering intrinsic motivation requires
satisfying three basic psychological needs: autonomy, belonging, and competence.
Autonomy refers to the need to control one's own actions and decisions. By having a sense of
volition, users feel less restricted. When robots allow users to choose how they use them, users have a
sense of agency over how the robot is used. This choice reduces pressure to use a system in a way that is
unaligned with the user's goals, and promotes continued use of the system.
Belonging refers to the need to feel connected to others within a social context. This need emphasizes
the importance of feeling valued by others. Robots that help users connect with their community and other
social networks can promote users to collaborate with other community members to achieve common
goals. Achieving these goals can foster interpersonal connection and lead to internal satisfaction that
promotes continued use of the robotic system.
Competence refers to the need to feel capable and confident in one’s own ability to accomplish goals.
Robots that help users feel competent encourage users to challenge themselves to develop skills. As users
use the robot to accomplish new tasks, users develop an internal sense of achievement that encourages
continued use of the robotic system.
When these needs are met, people are more likely to pursue goals with an intrinsic motivation, rather
than relying on extrinsic rewards. By designing robot systems with these three needs in mind, users are
more likely to use the robot system. Throughout this dissertation, we use the principles from SDT to
develop systems that support the motivation of users to perform tasks. User autonomy is instrumental in
the trade-off between personalization and customization of robotic systems. User belonging is emphasized
in our choice to use social modalities in our interactions. User competence is important for developing
assistive interactions that help the user perform tasks independently.
2.2.2 Social Identity Theory
Autonomy, belonging, and competence are all affected by a user's social identity. Identity affects the
choices people can make, what social groups people belong to, and how well people can complete collaborative
tasks. The underlying process that drives changes in these metrics is the formation of different
social groups, as described by Social Identity Theory (SIT) (Stets and Burke 2000; Tajfel and Turner 2004).
Findings from social psychology, economics, operations research, and marketing research show that social
identity strongly influences both subjective and objective interaction metrics (Charness and Chen 2020;
Hennessy and West 1999; Tajfel and Turner 2004; White, Habib, and Hardisty 2019). SIT describes how
people in the same social identity groups attribute more positive characteristics and take more favorable actions toward other members of their social identity, regardless of whether the in-group member is
a human or a robot (Fraune 2020; Fraune, Šabanović, and Smith 2017; Häring, Kuchenbrandt, and André
2014; Kuchenbrandt, Eyssel, et al. 2013; Sebo et al. 2020).
Conversely, members of different social identity groups typically behave more negatively toward
each other (Davis, Love, and Fares 2019); this effect has been observed in both human-human and human-robot
interaction (Sebo et al. 2020; Chang, White, et al. 2012; Fraune, Sherrin, et al. 2019). For example,
Fraune et al. (Fraune, Šabanović, and Smith 2017) explored the effect of SIT in a study in which two mixed
teams, each of two humans and two robots, competed in a price-is-right game. They found that participants
subjectively rated the in-group robot as more cooperative than out-group humans, and participants
assigned more painful noise blasts to out-group humans than to in-group robots. While group structures
in game contexts are well-defined, groups formed in daily life are more contextual. However, similar effects are well established in real-world human groups based on identity traits such as gender (Charness
and Chen 2020), political affiliation (Hart and Nisbet 2012), and even dietary habits (Davis, Love, and Fares
2019).
The ideas introduced by SIT highlight the importance of adapting robot interactions to their users. The
interactions developed in the dissertation emphasize users' ability to adapt robots to become part of their
in-groups, providing individualized and relevant assistance.
2.2.3 The Technology Acceptance Model
While SDT and SIT describe the individual drives a user may have to adopt and actually use a robot, the
Technology Acceptance Model (TAM) (Davis 1989), shown in Figure 2.4, describes how a population may
elect to adopt and use any form of novel technology, including robots. TAM provides two key factors that
drive users’ decision to adopt technology: perceived usefulness and perceived ease of use. Perceived usefulness
refers to the user’s expectation that the technology will increase their productivity or task performance.
Perceived ease of use refers to the user’s expectation that using the technology is intuitive.
These two factors can be influenced by many external factors, including those defined in SDT
and SIT. For example, the user's sense of autonomy and competence is often closely related to perceived
usefulness and intention to use (Jung 2011). A sense of belonging is related to in-group evaluation, and
these factors both contribute to the perceived ease of use of a system that aligns with social usage and
communication norms (Roca and Gagné 2008). Ultimately, robot system designers cannot directly affect
actual system use; however, they can affect the factors that lead to actual system use by making systems
that are perceived as easy to use and useful.
Figure 2.4: The Technology Acceptance Model (Davis 1989). Actual system use is affected by two concepts
that system creators can influence: perceived usefulness and perceived ease of use.
The dissertation extensively uses the TAM as a way to evaluate the success of a robot system. A key
goal in HRI is actual system use, which we evaluate through scales that measure the dimensions of perceived
usefulness and perceived ease of use, such as the system usability scale (Brooke 1996), the perceived ease of
use scale (Venkatesh and Davis 2000), or the perceived behavioral adaptation scale (Lee, Park, and Song
2005).
2.3 Summary
This chapter discussed the relevant background for this work. We discussed physically and socially assistive robots and psychological factors that affect motivation and technology use. The next chapter presents
the computational framework for robot adaptation.
Chapter 3
Conceptualizing Adaptable Robots
This chapter proposes a computational framework for developing assistive robots that adapt to users. The
goal of this framework is to formally describe the problem of robots that adapt to users, and allow future
research to systematically address the problem of adapting robots to users. This framework defines a
problem that is agnostic to the specific robot, scenario, or computational method, allowing the framework
to be re-used to conceptualize and explore new research questions. The dissertation frames the adaptation
of robots to users as a bidirectional communication problem, where the user tries to communicate their
preference to the robot, and the robot communicates their capabilities to the user. The robot and user can
communicate through different channels that have varying levels of noise in transmitting these signals.
This chapter is summarized in Section 3.3. Return to the Table of Contents to navigate other chapters.
3.1 Preliminaries
The preliminaries section provides a brief explanation of the background and historical context for this
work. Mathematical notation may differ between communities, but the description of this framework is
internally consistent.
3.1.1 Dynamical Systems
There are three important concepts in controllable dynamical systems:
• State. The state refers to a vector that describes the important components of world at a particular
instant. The set of all possible states (configurations of the world) is often∗ denoted as S, and a
specific state is denoted as s ∈ S.
• Action. An action refers to a vector that describes the commands you can send to a robot at a
particular instant. This is often† denoted as a ∈ A.
• Transition. The transition function refers to the physical laws that the robot must follow. In
deterministic systems, this is often‡ represented as a function T(s_t, a_t) → s_{t+1}, where the t
subscript refers to the particular time step.
These three constructs are used to define a robot's trajectory. Trajectories can take many forms. In
motion trajectories, states may be a robot's joint positions, actions may be changes to these positions,
and transitions are the result of carrying out these actions in the physical world. Analogously, a robot
can perform a social trajectory, where states may be a user's task completion, actions may be statements
the robot says, and transitions are how task completion changes based on the robot's statements, as
defined by a user's internal social processes. The specific states and actions can be redefined according to
the specific application; Figure 3.1a illustrates how states and actions may be defined for a mobile robot.
The same modeling techniques for dynamical systems can be applied to any instance of these problems.
A robot’s trajectory is often defined over a certain period of time to make computation more tractable.
A robot starts moving at time zero, and the end of the time period is denoted with an arbitrary end time
∗ Other fields denote state as x ∈ X or q ∈ Q.
† Actions can also be denoted as u ∈ U or a ∈ Σ.
‡ The transition function can also be written as δ(q, a) or f(s, a), and, if it is non-deterministic, as P(s′ | s, a).
K, also called the finite time horizon. A trajectory ξ is defined as the sequence of states and
actions during this time horizon:

ξ = {s_0, a_0, s_1, a_1, . . . , s_K, a_K}        (3.1)

The space of all trajectories that are possible for a robot is denoted as Ξ and can be approximated by
randomly sampling start states and sequences of actions that are K actions long. This results in a very
large space of random possible trajectories. Example motion trajectories for a mobile robot are illustrated
in Figure 3.1b.
Figure 3.1: An illustration of a robot's state and action, and an example of sampled trajectories.
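The trajectory formalism above can be sketched directly in code. Below is a minimal illustration using a hypothetical one-dimensional point robot; the specific state, action, and transition definitions are stand-in assumptions for illustration, not a system from the dissertation:

```python
import random

# Toy deterministic dynamical system: the state is a 1-D position,
# and actions are bounded position changes.
def transition(state, action):
    """T(s_t, a_t) -> s_{t+1}: apply the action to the state."""
    return state + action

def sample_trajectory(horizon_K, action_space=(-1.0, 0.0, 1.0)):
    """Sample one trajectory xi = {s_0, a_0, ..., s_K, a_K}."""
    state = random.uniform(-5.0, 5.0)  # random start state s_0
    trajectory = []
    for _ in range(horizon_K + 1):
        action = random.choice(action_space)
        trajectory.append((state, action))
        state = transition(state, action)
    return trajectory

# Approximate the trajectory space Xi by random sampling, as described above.
Xi = [sample_trajectory(horizon_K=10) for _ in range(1000)]
```

Sampling many random trajectories this way yields only a coarse approximation of Ξ; the space grows exponentially with the horizon K.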
3.1.2 Markov Decision Processes
Markov Decision Processes (MDPs) extend dynamical systems by adding a new construct:
• Reward. The reward is a function that describes how favorable a particular state and action are with
a scalar value. It is often§ denoted as R(s, a) → ℝ.
In reinforcement learning, a roboticist designs this reward function. The goal of the reinforcement
learning (RL) algorithm is to learn a policy π(s) that intelligently creates trajectories that maximize the
§ Reward functions can also be denoted as just R(s) or as R(s, a, s′). Sometimes the negative reward, known as cost and denoted as C(x, u) or J(x, u), is used.
sum of the reward values for all state-action pairs in the trajectory, max Σ_{t=0}^{K} R(s_t, a_t). There are many
framings, techniques, and algorithms to learn π(s). For example, robots may algorithmically select actions
to maximize a reward by modeling T, referred to as model-based RL, which can employ techniques such
as Monte Carlo Tree Search (Browne et al. 2012), Model Predictive Control (Kouvaritakis and Cannon
2016), or World Models (Ha and Schmidhuber 2018). Robots can also try to directly learn the action for
each state that maximizes the overall reward using model-free algorithms such as temporal differencing
(Sutton 1988), Q-learning (Watkins and Dayan 1992) or Deep Q Networks (Fan et al. 2020), Proximal Policy
Optimization (Schulman, Wolski, et al. 2017), Trust Region Policy Optimization (Schulman, Levine, et al.
2015), or Asynchronous Advantage Actor-Critic (Mnih et al. 2016).
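As a concrete illustration of the model-free case, the following is a minimal tabular Q-learning sketch. The chain environment, horizon, and hyperparameters here are hypothetical choices for illustration only:

```python
import random

# Tabular Q-learning on a tiny 1-D chain: states 0..4, reward for reaching state 4.
N_STATES, ACTIONS = 5, (-1, +1)
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    """Transition T(s, a) and reward R(s, a) for the chain."""
    s_next = max(0, min(N_STATES - 1, s + a))
    reward = 1.0 if s_next == N_STATES - 1 else 0.0
    return s_next, reward

for _ in range(2000):  # episodes
    s = 0
    for _ in range(20):  # finite time horizon K
        # epsilon-greedy action selection
        if random.random() < EPSILON:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s_next, r = step(s, a)
        # temporal-difference update toward r + gamma * max_a' Q(s', a')
        best_next = max(Q[(s_next, act)] for act in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s_next

# The greedy policy pi(s) now moves right, toward the rewarding state.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)}
```

The same loop structure underlies the deep variants cited above; they replace the table Q with a function approximator.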
3.1.3 Inverse Reinforcement Learning from Demonstrations
In practice, designing a reward function mathematically is hard, especially for non-expert users (Hadfield-Menell et al. 2017). Where reinforcement learning assumes a given R(s, a) to learn a π(s), the problem
of inverse reinforcement learning (IRL) and learning from demonstration (LfD) assumes access to samples
from π(s) that can be collected from users to learn the users’ underlying R(s, a). This framing is useful
because it is sometimes easier to show a robot how to do something than it is to describe what a good
trajectory is.
The fundamental idea behind IRL and LfD is that the observed behavior, often demonstrated by an
expert, implicitly reveals the reward function that the expert is optimizing. In the dissertation, every user
is an expert on their own preferences. By analyzing these demonstrations, an IRL/LfD algorithm attempts
to reconstruct the reward function that would make the demonstrated policy π(s) optimal. Formally, given
a set of expert demonstrations, D = {ξ_1, ξ_2, . . . , ξ_n}, the goal of IRL/LfD is to find a reward function R(s, a)
such that the demonstrated trajectories achieve a higher total reward than other possible trajectories in Ξ.
3.1.4 Learning a User’s Preference
To learn a user’s preference, several works make the assumption that trajectories can be described by a set
of feature descriptors. The process of creating these features from a trajectory is described by a function,
ϕ:
ϕ(ξ) → ℝ^d        (3.2)
These trajectory features can be hand-crafted for a specific application or learned from data. Feature
learning can leverage autoencoders (Brown, Coleman, et al. 2020), or users choosing which two trajectories
from a set of three are the most similar (Bobu, Liu, et al. 2023).
Given these features, the problem of inverse reinforcement learning can be described as learning a
function that maps these features to a reward value. This function can be modeled as:
R_user(ξ) = f_ω(ϕ(ξ))        (3.3)
This function f_ω can take any functional form, for example a linear combination of features (Ng and
Russell 2000), Gaussian processes (Biyik, Huynh, et al. 2024), or neural networks (Bobu, Liu, et al. 2023).
This framing allows roboticists to decouple the understanding of user preferences from the algorithmic
approaches that model these functions from user data.
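As a sketch of the simplest case, where f_ω is a linear combination of features, the weights ω can be fit from pairwise trajectory comparisons under a Bradley-Terry-style choice model. The feature function, simulated user, and hyperparameters below are hypothetical illustrations, not components of the dissertation's systems:

```python
import math
import random

random.seed(0)
D = 3  # feature dimension d

def phi(xi):
    """Hypothetical feature function phi(xi) -> R^d; here trajectories
    are already represented as 3-D feature vectors."""
    return xi

def reward(omega, xi):
    """Linear user reward: R_user(xi) = omega . phi(xi)."""
    return sum(w * f for w, f in zip(omega, phi(xi)))

# Simulated user with hidden preference weights; in each pairwise query
# they prefer the trajectory with the higher hidden reward.
hidden = [1.0, -2.0, 0.5]
queries = [([random.gauss(0, 1) for _ in range(D)],
            [random.gauss(0, 1) for _ in range(D)]) for _ in range(500)]
labels = [1.0 if reward(hidden, a) > reward(hidden, b) else 0.0
          for a, b in queries]

# Fit omega by gradient ascent on the Bradley-Terry log-likelihood:
# P(a preferred over b) = sigmoid(R(a) - R(b)).
omega = [0.0] * D
for _ in range(200):
    for (a, b), y in zip(queries, labels):
        z = reward(omega, a) - reward(omega, b)
        z = max(-30.0, min(30.0, z))  # clamp logit for numerical stability
        p = 1.0 / (1.0 + math.exp(-z))
        grad = y - p
        for i in range(D):
            omega[i] += 0.05 * grad * (phi(a)[i] - phi(b)[i])

# omega now points in roughly the same direction as the hidden weights.
```

This decoupling is exactly the one described above: the choice data model (the comparisons) is separate from the functional form of f_ω, which could be swapped for a Gaussian process or neural network without changing the query procedure.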
3.2 Framework for Adaptation
The dissertation extends the works in IRL and LfD by conceptualizing how users can communicate preferences through multiple channels. These channels communicate between two internal spaces, a user’s
preference space, Ω, and a robot's trajectory space, Ξ. Communication is achieved by leveraging a hypothetical
shared space called interactus, denoted as I, that contains the necessary distinctions for understanding
behavioral preferences. This term comes from the Latin inter-, meaning “between”, and actus,
meaning “act”. It is inspired by a similar concept from linguistics called interlingua, which refers to a
theoretical space that all languages map to and enables translation between any two specific languages.
The idea of interactus extends the idea of interlingua to robots, which have ways of being and acting
unlike those of people or other organic life. Both robots and people map to the interactus space by leveraging
their mapping functions, ϕH and ϕR. This translation process is necessarily error-prone due to errors
in mapping and in using an interface for communication. These errors are modeled as the noisy channel
shown in Figure 3.2.
Figure 3.2: Pathway for communication, adapted from the Shannon-Weaver model (Weaver 2017; Shannon
1948). A user has a preference that they use to send a message to the robot based on an internal encoding
of their preferences, ϕH, in addition to the other requirements of the particular modality (e.g., the set
of trajectories that are compared or the demonstration provided by the user). The encoded message is
communicated over a noisy channel to be decoded into a robot-interpretable command via ϕR.
We can mathematically describe this communication by leveraging insights from Information Theory
(Shannon 1948). One important characteristic of a communication channel is the channel capacity, which
describes the theoretical upper bound on the rate of information transfer between source and destination.
The equation that describes this quantity is given by the Shannon-Hartley theorem:
C = B · log2(1 + SNR)        (3.4)
The channel capacity, C in bits per second, is described in terms of the bandwidth of the channel, B,
and the signal-to-noise ratio of the channel, SNR. A channel can communicate more information either
by increasing the bandwidth of the channel or reducing the noise in a channel.
In the context of adapting a robot, a user may have many communication channels they can use to
communicate their preferences to a robot (e.g., trajectory rankings, trajectory choices, or demonstrations).
We describe a particular modality as M. Additionally, a user has personal factors that affect how
well they can use a particular channel. For example, past work has shown that users who have experience
playing video games are more proficient at robot teleoperation (Nenna and Gamberini 2022). We encapsulate
user-specific factors that affect communication and channel use as u. Using this communication
channel over a particular period of time, a user communicates a particular amount of information to adapt
the robot, denoted as A and described by the following equation for a user u and modality M:
A = ∫_0^K B(M) · log2(1 + SNR(t | u, M)) · U(t | u, M) dt        (3.5)
where t represents time, K represents the time horizon over which communication occurs, and U
represents the utilization of the channel as a function of the user and the modality over time. SNR and
U are time-dependent because external contextual factors can affect their values, for example, a user may
not be able to communicate using language as efficiently when a room is crowded, and thus elect to use
other modalities for communication, resulting in decreases in both SNR and U. We can use this equation to maximize A for users to maximally communicate their preferences to robots. This can be achieved
by creating new modalities of communication with higher bandwidths, increasing the signal-to-noise ratio by algorithmically improving existing modalities by modeling users, or improving the utilization of a
particular modality by aligning the modality with the user’s goals for using the robotic system.
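Equation 3.5 can be evaluated numerically for a given modality. The sketch below uses hypothetical bandwidth, SNR, and utilization profiles for two illustrative modalities; none of these functions, constants, or modality names come from the dissertation:

```python
import math

K = 60.0  # time horizon in seconds (hypothetical)

def B(M):
    """Hypothetical channel bandwidth for modality M, in Hz."""
    return {"demonstration": 5.0, "ranking": 1.0}[M]

def SNR(t, u, M):
    """Hypothetical time-varying signal-to-noise ratio for user u."""
    base = 10.0 if u == "expert" else 3.0
    return base * (1.0 - 0.5 * math.sin(t / 10.0) ** 2)

def U(t, u, M):
    """Hypothetical channel utilization in [0, 1]."""
    return 0.8 if M == "demonstration" else 0.3

def adaptation_information(u, M, steps=1000):
    """Approximate A = integral_0^K B(M) log2(1 + SNR) U dt (Eq. 3.5)
    with a left Riemann sum."""
    dt = K / steps
    return sum(B(M) * math.log2(1.0 + SNR(t, u, M)) * U(t, u, M) * dt
               for t in (i * dt for i in range(steps)))

A_demo = adaptation_information("expert", "demonstration")
A_rank = adaptation_information("expert", "ranking")
```

Under these assumed profiles, the higher-bandwidth, better-utilized demonstration channel transfers more adaptation information over the same horizon, matching the three levers identified above: bandwidth, SNR, and utilization.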
3.2.1 Personalization and Customization within the Communication Framework
Using this communication framework, we can mathematically describe the difference between personalization
and customization for robot adaptation based on a communication channel's signal-to-noise ratio,
SNR, and its utilization, U. Personalization occurs through automatic user feedback. This typically results
in high channel utilization, because the user is unconsciously and consistently providing such automatic
feedback, but the signal-to-noise ratio is typically low because the user is unaware that communication is
occurring. In contrast, customization, which occurs through conscious feedback from the user, results in a
high signal-to-noise ratio because the user is intentionally communicating their preferences to the robot.
These channels are often orthogonal to the user's functional goals for the robot, however, and thus typically
experience lower utilization.
Thus, a personalization interface for a particular user and given time horizon is defined by:

∫_0^K U(t | u, M) dt > ∫_0^K SNR(t | u, M) dt        (3.6)
The definition of a customization interface is given by:

∫_0^K U(t | u, M) dt < ∫_0^K SNR(t | u, M) dt        (3.7)
This user-level distinction can describe how some users may have different experiences of the same
communication interface. For example, a facial expression communication modality may be perceived as
a personalization interface by a highly expressive user, whereas the same modality may be perceived as a
customization interface by a less expressive individual.
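This user-level distinction can be sketched as a simple check over sampled utilization and SNR profiles, following Equations 3.6 and 3.7. The two profiles below are hypothetical illustrations of the facial-expression example:

```python
def classify_interface(U_samples, SNR_samples, dt=1.0):
    """Compare the integral of U against the integral of SNR
    over the horizon (Eqs. 3.6 / 3.7), via Riemann sums."""
    total_U = sum(U_samples) * dt
    total_SNR = sum(SNR_samples) * dt
    if total_U > total_SNR:
        return "personalization"
    elif total_U < total_SNR:
        return "customization"
    return "boundary"

# Hypothetical facial-expression channel for two users over the same horizon:
# an expressive user utilizes the channel constantly but with low SNR...
expressive = classify_interface(U_samples=[0.9] * 10, SNR_samples=[0.2] * 10)
# ...while a less expressive user uses it rarely but deliberately.
reserved = classify_interface(U_samples=[0.1] * 10, SNR_samples=[0.8] * 10)
```

The same channel is thus classified differently per user, which is the point of the user-level definition.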
3.3 Summary
This chapter presented a framework that encapsulates adaptation across the three domains discussed in the
dissertation. The framework provides a computational interpretation of personalization and customization.
The dissertation leverages this framework to describe several new modalities for communicating user
preferences and robot capabilities: design metaphors, clothing design, physical reaching interactions, hair
combing path selection, facial affect, and exploratory search. Each of these communication channels aligns with
either personalization or customization. The next chapter describes how design metaphors can be used
to set users’ initial expectations of robots. Robots’ embodiment is the first channel that users and robots
communicate through when beginning a new interaction.
Part I: Robot Embodiment
Before robot adaptation occurs, a user has expectations for how a robot will behave based on its physical design.
When these expectations are not aligned with a robot’s true capabilities, the user may be disappointed with
the robot’s actual performance, limiting intention to use the robot. As a motivating example, a robot with a
screen face that features eyes may indicate to the user that the robot can “see”, but without a camera the robot
may be unable to process visual input. This misconception causes the user to be disappointed if the robot fails
to reason about visual stimuli. In addition to expected function, robot embodiment shapes the preferences that
a user has for the robot’s behavior. In chapters 4 and 5, we explore how robot embodiment shapes expectations,
and how embodiment can be modified to align with expectations.
Chapter 4
User Expectations of Robot Embodiments
This chapter establishes that the physical design of a robot affects a user’s expectation of what that robot
can do. These expectations are the initial condition from which the robot adaptation progress begins.
Roboticists need to understand what users expect the robot to do in order to develop assistive interactions
that align with these expectations. We apply the idea of design metaphors to robot embodiments, and show
that this tool can be used to measure differences in social and functional expectations across a variety of
socially interactive robots. This chapter is summarized in Section 4.5. Return to the Table of Contents to
navigate other chapters.
This chapter is adapted from the paper “Design metaphors for understanding user expectations of socially interactive robot embodiments” (Dennler, Ruan, et al. 2023), written in collaboration with Changxiao
Ruan, Jessica Hadiwijoyo, Brenna Chen, Stefanos Nikolaidis, and Maja Matarić.
Figure 4.1: Examples of robots’ physical designs measured by abstraction level along three different design
metaphors.
4.1 Motivation
Human-robot interaction (HRI) research aims to develop robotic systems that can aid humans in a variety of contexts. While advances in HRI have enabled robots to be more functionally performant
and socially competent than ever, few robots are present in everyday life. The lack of adoption is in part
due to concerns about user acceptance, which is linked to user expectations (Cha, Dragan, and Srinivasa
2015; Davis 1989; Kwon, Jung, and Knepper 2016). In socially interactive robots (Fong, Nourbakhsh, and
Dautenhahn 2003), these expectations are formed around two high-level concepts: the robot’s functional
capabilities (i.e., how well it can perform the task it is designed to do) and the robot’s social capabilities (i.e.,
how natural interactions with the agent are) (Cha, Dragan, and Srinivasa 2015; Deng, Mutlu, and Mataric
2019; Fong, Nourbakhsh, and Dautenhahn 2003; Honig and Oron-Gilad 2018). Setting expectations too low
results in robots that are expected to be useless, while setting expectations too high leads to disappointment when robots fail to meet those expectations (Paepcke and Takayama 2010). Both scenarios inhibit
acceptance and adoption. While software changes can rapidly change functional and social capabilities to
align with user expectations, physical designs of robots have much longer development cycles. In order to
inform robot embodiment design, it is crucial to understand the effect of robots’ physical appearances on
their social and functional expectations.
The problem of understanding the implications of design on system use has long been a topic of interest
in human-computer interaction (HCI) (Zimmerman, Forlizzi, and Evenson 2007; Myers 1994). A powerful
tool for addressing this problem is the concept of design metaphors (Cila 2013; Carroll, Mack, and Kellogg
1988), which link novel designs with extant and familiar concepts or interactions. For example, a computer
desktop shares many similarities with a physical desktop: a user can sort, organize, and label files by
placing them in physically/visually co-located folders. By using these design metaphors, HCI practitioners
are able to accurately set user expectations about actual system functionality. While previous work has
focused on functional system behavior, design metaphors have recently been applied to social chatbots,
indicating that selecting metaphors to align social expectations with true social capabilities drives user
acceptance of these systems (Khadpe et al. 2020).
However, robots are significantly different from computers because robots are physically embodied
(Deng, Mutlu, and Mataric 2019). While computers can also be treated as social actors (Nass, Steuer, and
Tauber 1994), physical embodiment increases the social presence of robots by affording several signaling
modalities that are not available to computers (e.g., gesture (Rifinski et al. 2020), gaze (Andrist, Mutlu, and
Tapus 2015), and proxemics (Mumm and Mutlu 2011)). Physical expressiveness introduces complexities
in user expectations, as specific aspects of a morphology designed for functional tasks may cause users
to also expect competence in social tasks: a robot with an arm that grasps objects may reasonably be expected to use that arm to gesture. This duality of use for both functional and social expectations has led
to the exploration of robot embodiments as a sum of their low-level design features to understand how
expectations are formed in anthropomorphic robots (Phillips et al. 2018), zoomorphic robots (Löffler, Dörrenbächer, and Hassenzahl 2020), and rendered robot faces (Kalegina et al. 2018). We argue that design
metaphors, as illustrated in Figure 4.1, offer a broader and more holistic view of robot embodiment that
can inform socially interactive robot design in complementary ways to component-based approaches. Particularly, robot designers and HRI practitioners can use design metaphors to quickly find similar robots to
understand how people expect to interact with novel robots. Comparing robots based on design metaphors
is much easier than comparing hundreds of low-level features.
In this chapter, we introduce design metaphors as a conceptual tool for addressing the problem of understanding user expectations of robots based on their physical designs. We evaluate this tool by collecting
a dataset of 165 extant robot designs and exploring four core research questions related to embodiment:
(RQ1.) How can we crowd-source design metaphors to describe how potential end-users
conceptualize socially interactive robots?
(RQ2.) To what extent does a robot’s embodiment establish social expectations in relation
to its identity and social characteristics (e.g., role, likeability, and social perceptions)
and how are these expectations moderated by design metaphors?
(RQ3.) To what extent does a robot’s embodiment establish functional expectations in relation to its capabilities and expected use cases and how are these expectations
moderated by design metaphors?
Because of the interplay of social and functional expectations identified in prior work (Cha, Dragan, and
Srinivasa 2015; Honig and Oron-Gilad 2018; Wang and Krumhuber 2018), we also aim to understand:
(RQ4.) How are social and functional expectations related in socially interactive robots and
what does this imply for the design of socially interactive robots?
In addressing these questions, this work contributes the Metaphors for Understanding Functional and
Social Anticipated Affordances (MUFaSAA) dataset, an open-source collection of 165 robot embodiments
and results of three crowd-sourced studies that provide insights toward the effect of robot design on user
expectations of robot capabilities. The collected dataset and interactive data visualizations to explore the
dataset are made publicly available at interaction-lab.github.io/robot-metaphors/.
4.2 Inspiration: Design Metaphors
Design metaphors concisely describe complex ideas by associating unfamiliar objects with familiar objects
that have similar characteristics. Design metaphors are extensively studied in human-computer interaction
(HCI) as a way to help users develop mental models of the system they are interacting with in order to
facilitate interaction (Voida, Mynatt, and Edwards 2008; Jung et al. 2017; Khadpe et al. 2020; Kim and Maher
2020). For example, HCI research shows that describing a chatbot with different design metaphors shaped
user perceptions of the chatbot’s warmth and competence, thereby affecting both the users’ pre-interaction
intention to use the system and their subsequent intention to adopt the system post-interaction (Khadpe
et al. 2020).
The notion of design metaphors has also been recently applied to formalizing general design processes
for socially interactive robots (Deng, Mutlu, and Matarić 2018). Deng et al. (Deng, Mutlu, and Mataric
2019) provide a comprehensive review of HRI studies through the lens of design metaphors of the robot
embodiments used and provide a design-metaphor based analysis of the relationships between different
user studies and their outcomes. We apply this framework to explore how design metaphors shape the
formulation of social and functional expectations of robots, aiming to enable HRI practitioners to contextualize their study findings relative to user expectations resulting from a robot’s design.
4.3 Data Collection
(a) Composite image for Aeolus. (b) Composite image for TJBot.
Figure 4.2: Example composite images from the MUFaSAA dataset.
Due to the immense cardinality of the design space of robots in widely varied contexts (e.g., drones, autonomous vehicles, industrial robots), we limited the scope of our dataset to robots that fit the definition of socially interactive robots as proposed by Fong et al. (Fong, Nourbakhsh, and Dautenhahn 2003).
Unlike Fong et al. (Fong, Nourbakhsh, and Dautenhahn 2003), we do not require high-level dialogue so
we can include non-humanoid embodiments. Thus, our dataset inclusion criteria were robots that had or
could be perceived as having all of the following capabilities:
1. The ability to perceive and express emotion.
2. The ability to learn or recognize other agents.
3. The ability to establish and maintain social relationships.
4. The ability to use natural cues for social interaction (e.g., gaze or gesture).
5. The ability to exhibit a distinctive personality or character.
6. The ability to learn social competencies.
Using those guidelines, we assembled a collection of 165 robots from the IEEE "ROBOTS: Your guide
to the world of robots" site (Spectrum 2018) and Google searches of "Social Robot", "Socially Interactive
Robot", "Socially Assistive Robot", "Robot Pet", and "Social Robot Animal". Google searches were performed
under several user profiles and in incognito modes to mitigate the effects of prior search histories and stored
user information. The data collection took place in June of 2020.
Each robot was represented with a composite image consisting of two high-resolution images, one of
a front view and one of a side view of the robot, to convey the 3D structure of the robots’ design. The
sense of scale was provided by including a common reference image: a 170 centimeter tall androgynous
silhouette for robots at or over 80 centimeters in height or a silhouette of an 18 centimeter tall human hand for
robots under 80 centimeters in height. The image backgrounds were solid white, to control for contextual
factors, cues, and influence. In addition, any objects that a robot was holding in the original image were
edited out. We prioritized the use of images of robots in neutral poses with neutral facial expressions (for
robots that had actuated faces). All composite images were created with identical aspect ratios and each
view of the robot took up 30-40% of the composite image by width. Two example composite images
from our dataset are shown in Figure 4.2.
For each of the 165 robots in the MUFaSAA dataset, we collected two forms of data: annotated data
and crowd-sourced data. These data are summarized in the following sections, and a more complete
description is available in the paper by Dennler et al. (Dennler, Ruan, et al. 2023).
4.3.1 Annotated MUFaSAA Data: Low-level Design Features
Similar to previous work in developing robot datasets (Kalegina et al. 2018; Phillips et al. 2018; Löffler, Dörrenbächer, and Hassenzahl 2020), we codified robot embodiments with a series of manually labeled features
derived from observed design patterns of the robots in the dataset and applicable features from previous
studies (Kalegina et al. 2018; Phillips et al. 2018; Löffler, Dörrenbächer, and Hassenzahl 2020; Trovato, Lucho, and Paredes 2018). In total, we labeled 43 binary or categorical variables related to present/absent
features, 4 ordinal variables related to feature counts, and 5 continuous variables. The continuous features
were directly reported by robot data sheets, design documentation, or through manufacturer websites (e.g.,
height, weight, etc.). The categorical and binary features were evaluated through images of the robot. To
address potential differences between observers of these manually defined features, two researchers independently coded all of the robots in the dataset. We calculated the interrater reliability of the attributed
low-level design features that were not directly reported. The full set of coded features, descriptions, and
interrater reliability are provided in Table 4.1 and Table 4.2.
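The interrater reliability computation described above can be sketched as follows. This is a minimal illustration rather than the dissertation's actual analysis code: it treats the two raters as "items" in the standard Cronbach's α formula, applied to one feature's codes across robots.

```python
import numpy as np

def cronbach_alpha(ratings):
    """Cronbach's alpha for a (n_robots, n_raters) array of codes.

    Each rater is treated as an "item":
    alpha = k/(k-1) * (1 - sum of per-rater variances
                           / variance of per-robot totals).
    """
    ratings = np.asarray(ratings, dtype=float)
    k = ratings.shape[1]                          # number of raters
    item_vars = ratings.var(axis=0, ddof=1)       # variance per rater
    total_var = ratings.sum(axis=1).var(ddof=1)   # variance of row totals
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical binary codes from two raters for six robots:
codes = np.array([[1, 1], [0, 0], [1, 1], [0, 1], [1, 1], [0, 0]])
print(round(cronbach_alpha(codes), 2))
```

Perfect agreement between the two raters yields α = 1.0; disagreements pull the value down, as in the hypothetical codes above.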
Table 4.1: A table of the binary and ordinal robot descriptors that were developed through inspection of
the robot designs and user descriptions. For each feature, we provide a description of what the feature
means and the measure of Cronbach's α that we obtained between two raters of the robotic systems.
Robot Feature Description Cronbach’s α
Anthropomorphic Embodiment?
Presence of human-like features (e.g., is bipedal, has two arms, two legs, or
hair on the head).
.87
Zoomorphic Embodiment? Presence of animal-like features (e.g., a tail, wings, animal-like ears) 1.00
Mechanical Embodiment? Presence of visible mechanical parts (e.g., exposed wires, wheels, or visible
motors).
.89
Dominant Classification One of {Anthropomorphic, Zoomorphic, Mechanical}, which describes the
overall form of embodiment.
.83
Number of Wheels The assumed number of wheels that the embodiment uses to move. .70
Number of Legs The number of appendages that can be used for locomotion. .88
Number of Arms The number of assumed appendages that could be used for gesturing and
grasping.
.95
Number of Eyes The number of round components that can be perceived as eyes. 1.00
Mobile? Can physically move between points in space. .89
Does it ride on something? Presence of a platform that the robot appears to rest on top of. .86
Drivetrain Skirt? Indicates that the wheels and motors were contained within a skirt-like shape
that smoothly connects with the rest of the embodiment.
.79
Treads? Presence of treads as a means of locomotion. 1.00
Spherical Head? Presence of a head that appears to be a near-perfect sphere. .92
Box Head? Indicates that the head is approximately box-shaped (but not just a standalone
screen).
.87
Tablet Head? Indicates that the head consists of a single screen (e.g., a phone, tablet, etc.) 1.00
Human Head? Indicates that the head is human-like in appearance and has a skin-like quality. 1.00
Wearing a Helmet? Indicates that the robot appears to be wearing a helmet or face shield. .61
Antennae? Presence of one or more antenna-like structures on the head 1.00
Hair Follicles? Presence of many separate hair-like protrusions from the head in a distinct
region that represents hair.
.87
Mechanical Hair? Presence of mechanical structure on the head that can be interpreted as a hair
style.
1.00
Ears? Presence of shapes or mechanisms that resemble ears. .81
Screen Face? Presence of a screen near the top of the robot that displays at least one facial
feature.
.94
Static Face? Presence of physical facial features that are not physically actuated. .78
Mechanical Face? Presence of physical facial features that contain components that are physically actuated.
.77
Mouth? Presence of a shape or mechanism that resembles a mouth. .89
Nose? Presence of a shape or mechanism that resembles a nose. .83
Eyebrows? Presence of shapes or mechanisms that resemble eyebrows. 1.00
Blush? Presence of a shape, mechanism, or coloring that resembles rosy cheeks. .72
Eyelids? Presence of a shape or mechanism that resembles eyelids .72
Pupils? Presence of a shape within a round shape perceived as eyes that represents a
pupil.
.92
Irises? Presence of a (colorful) shape within a round shape perceived as eyes that
represents an iris, which contains a pupil.
.78
Eyelashes? Presence of hair-like protrusions from the eye that represent eyelashes. .89
Lips? Presence of shapes or mechanisms that resemble lips. .82
Mechanical Lips? Presence of physical tube-like structures that represent lips. 1.00
Low Waist-to-Hip Ratio? Indicates that the perceived waist width of the robot is much smaller than
(< 0.8 times) the perceived hip width.
.80
High Shoulder-to-Waist Ratio? Indicates that the perceived shoulder width is much larger than (> 1.25 times)
the perceived waist width.
.93
High Shoulder-Hip Ratio? Indicates that the perceived shoulder width is much larger than (> 1.25 times)
the perceived hip width.
.62
Screen On Chest? Presence of a display interface at a medium height on the embodiment. 1.00
Furry? Indicates that the robot’s embodiment is covered in multiple hair-like protrusions.
1.00
Matte Body? Indicates that the external sheen of the embodiment is not highly reflective. .94
Hard Exterior? Indicates that the robot’s exterior is constructed from hard materials (e.g., plastic, metal, etc.).
1.00
Skin-like Material? Indicates the presence of a skin-like, flexible, and non-furry material covering
any part of the embodiment.
1.00
Exposed Wires? Presence of visible string-like structures that are needed for power requirements of the embodiment.
.80
Jointed Limbs? Indicates that the limbs of the robot contain visible joints (i.e., not hidden under
fabrics or outer casings).
.79
Industry? Indicates that the robot was released for purchase by end-users. .95
Curvy Embodiment? Indicates that the embodiment is designed with organic-looking curves and the
embodiment is not obviously partitioned into simple shapes (e.g., rectangular
prisms or cylinders).
.73
Symmetric Embodiment? Indicates that the embodiment exhibits reflective symmetry across its sagittal
plane.
.79
Table 4.2: A table of the continuous feature descriptors taken from the robots’ websites.
Robot Feature Description
Height The total height of the robot in centimeters.
Weight The total mass of the robot in kilograms, or "UNK" if this information was
not available.
Year The year in which the robot was created or first written about publicly.
Country of Origin The country in which the robot was developed.
Most Prominent Color The color that is used in most of the embodiment.
4.3.2 Crowd-Sourced MUFaSAA Data: High-level Expectations
We collected three types of crowd-sourced information for all robots in the MUFaSAA dataset: Design Metaphors,
Social Expectations, and Functional Expectations. Due to the large amount of data we planned to collect, we partitioned the dataset collection into three separate studies to reduce the load on participants.
All studies were conducted on Amazon Mechanical Turk.
Study 1: Attributing Design Metaphors to Embodiments
The first study we conducted addresses RQ1: How can we crowd-source design metaphors to describe
how potential end-users conceptualize socially interactive robots? This first study collected user-reported
design metaphors to describe robots. Participants were paid US$1.00 per robot for which they provided
2-5 design metaphors. Participants viewed up to five robots that were presented in a randomized and
counter-balanced manner. Each response took about 3 minutes, and the whole survey took around 15
minutes.
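The randomized, counter-balanced assignment used across these studies can be illustrated with the sketch below. This is our own simplification, not the study's actual assignment code (the function name and batching scheme are assumptions): robots are dealt from repeated shuffled passes over the full list, so every robot accumulates a similar number of ratings.

```python
import random
from collections import Counter

def counterbalanced_batches(robot_ids, per_participant, n_participants, seed=0):
    """Assign `per_participant` robots to each of `n_participants` raters.

    Robots are dealt from repeated shuffled passes over the full list,
    keeping exposure counts balanced across the whole dataset.
    """
    rng = random.Random(seed)
    pool, batches = [], []
    for _ in range(n_participants):
        while len(pool) < per_participant:   # top up with a fresh shuffled pass
            fresh = list(robot_ids)
            rng.shuffle(fresh)
            pool.extend(fresh)
        batches.append(pool[:per_participant])
        pool = pool[per_participant:]
    return batches

# 33 participants x 5 robots covers all 165 robots exactly once:
batches = counterbalanced_batches(range(165), 5, 33)
counts = Counter(r for b in batches for r in b)
print(min(counts.values()), max(counts.values()))  # → 1 1
```

With more participants than one full pass requires, the per-robot counts differ by at most one between consecutive passes, which is the balance property counterbalancing is after.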
To attribute design metaphors to each robot in the dataset, we developed three qualitative questions
to allow participants to freely associate familiar concepts with the designs of robots in our dataset. In
addition to specific metaphors, we asked users to explain their thought process by indicating what aspects
of the robot represented the metaphors they provided. We asked the following three questions:
1. Description of Robot: We provided an open-form response box with the prompt to describe the robot
to a friend using two to three sentences.
2. Related Design Metaphors: We provided an open-form response box to input at least two and up to
five specific persons, animals, plants, characters, or objects that the robot looks like.
3. Reasoning for Related Design Metaphors: We provided an open-form response box to describe why
the aforementioned design metaphors were chosen. This box was immediately to the right of the
previous response box.
Study 2: Social Expectations
The goal of the second study was to address RQ2: To what extent does a robot’s embodiment establish
social expectations of robots in relation to the robot’s identity and social characteristics and how are these
moderated by design metaphors? We measured the social attributes of robots that formed the expectation
of how a robot should socially behave.
The study followed a mixed design wherein each participant provided ratings for up to five robots in
the dataset. Participants that did not pass the attention checks ended the study early. The assignment of
robots was randomized and counter-balanced. Participants were paid US$0.20 per robot they rated, and
took a median of 1.5 minutes per robot for an expected maximum length of 7.5 minutes to complete the
survey.
To evaluate the social expectations of robot embodiment, we assembled a collection of questionnaires
from relevant areas of HRI to measure general social constructs that can be applied to a wide variety of
robots. We collected quantitative evaluations of the following constructs:
1. RoSAS Scale: We used a modified version of the validated RoSAS scale (Carpinella et al. 2017) to assess
the constructs originally defined in RoSAS that were confirmed to be reliable. All items followed the
prompt "Indicate how closely the following words are associated with the robot" and were rated on
a 7-point Likert scale of "strongly disagree" to "strongly agree". The scale measured the following
constructs:
• Warmth is related to the perception that another agent may want to help or harm us.
• Competence is related to the perception that another agent has the ability to help or harm us.
• Discomfort is related to the awkwardness of a robot.
2. Robot Gender Expression: While gender is a complex social phenomenon, we measured perceived
gender expression as proposed by the Bem Sex-Role Inventory Scale (Bem 1981), using two axes–
masculinity and femininity–as 7-point Likert scales. This approach allowed for perceptions of androgyny and non-gendered robots within the two axes.
3. Social Role: The social role is a measure of the interaction dynamics between the person and robot in
an interaction (Rae, Takayama, and Mutlu 2013; Deng, Mutlu, and Mataric 2019). We used a 9-point
differential scale from Deng et al. (Deng, Mutlu, and Mataric 2019), where 1 labeled the robot as "a
subordinate", 5 labeled the robot as "a peer", and 9 labeled the robot as "a superior".
4. Identity Closeness: Identity closeness measures the degree of in-group identification of the person
with the robot (Tajfel 1974). We used a 9-point differential scale where 1 corresponded to the rater
viewing the robot as "not at all like me", and 9 corresponded to the rater identifying the robot as
"exactly like me". This scale has been shown to achieve high validity and reliability in related contexts
(Reysen et al. 2013).
5. Likeability: Likeability measures the general attitude toward a robot, and has been used in other
robot assessment studies (Kalegina et al. 2018; Mathur and Reichling 2016). It was assessed using a
9-point differential scale, where 1 indicated the rater "strongly dislikes" the robot, and 9 indicated
that the rater "strongly likes" the robot, adapted from the Godspeed Scale (Bartneck et al. 2009).
In addition to quantitative measures, we also employed qualitative evaluations of social perception.
In particular, we were interested in open-ended responses to what participants liked about the robot, in
order to glean participants’ thought processes behind their quantitative ratings. We asked for qualitative
evaluation of the following:
1. Reasoning for Likeability Rating: In addition to the likeability rating, we collected an optional open-ended response about the reasons for liking or disliking a robot.
Study 3: Functional Expectations
The goal of the third study was to address RQ3: To what extent does a robot’s embodiment establish functional expectations in relation to its capabilities and expected use cases and how are these moderated by design
metaphors? The third study measured the robots’ expected functional affordances and the attribution of
expected tasks to the different robot embodiments.
The study followed a mixed design where each participant provided ratings for up to five robots. Each
rating was paid US$0.50 and took approximately 2 minutes to provide, and the whole survey had an expected length of 10 minutes. The robots each participant saw were randomized and counter-balanced to
mitigate ordering effects.
To evaluate the functional expectations of robot embodiment, we assembled a new collection of questionnaires from relevant areas of HRI to measure general functional constructs that can be applied to a
wide variety of robots. We collected quantitative evaluations of the following constructs:
1. EmCorp Measures: We used a modified version of the 7-point Likert EmCorp-Scale (Hoffmann, Bock,
and Rosenthal v.d. Pütten 2018) that has been validated in online survey contexts. We focused on the
constructs of Shared Perception and Interpretation, Tactile Interaction and Mobility, and Nonverbal
Expressiveness. The Corporeality construct was not studied because it represents how co-present
a robot is in the room with the observer, and the robots in the dataset are 2D images. All items
were rated on a scale from "strongly disagree" to "strongly agree". The scale measured the following
constructs.
• Shared Perception and Interpretation is a measure of a robot’s perceived perceptual capabilities,
such as vision and hearing.
• Tactile Interaction and Mobility is a measure of a robot’s perceived ability to move around and
manipulate objects in space.
• Non-verbal Expressiveness is a measure of a robot’s ability to use natural cues such as gestures
and facial expressions.
2. Design Ambiguity and Design Atypicality Measures: Design ambiguity and atypicality have been
linked to aversion toward different robot designs in prior work (Strait et al. 2017). In this work, we
defined ambiguity as the difficulty of placing a robot in a single category, and atypicality as a robot
having embodiment features not usually associated with the category it represents. We quantify
these measures with differential scales valued from 1 to 9.
3. Metaphor Abstraction Measures: The abstraction level of a metaphor provides a way to quantify how
abstractly or literally the robot embodiment follows the metaphor. We quantified these values as a 9-
point differential scale where 1 represented "highly abstract" interpretations of the design metaphor,
and 9 represented "highly literal" interpretations of the design metaphor.
In addition to the above quantitative measures, we collected the following qualitative measures:
Figure 4.3: Differences in means for anthropomorphic (Ath.), zoomorphic (Zoo.), and mechanical (Mec.) embodiments on the social constructs Perceived Warmth, Perceived Competence, and Perceived Discomfort (y-axis: Rating, from -1.5 to 1.5). All differences are significant with p < 0.001 unless marked otherwise (n.s.). Error bars represent 95% CI of means.
1. Task Descriptions: We required participants to report two to five kinds of tasks each robot would be
appropriate for, using open-ended responses.
4.4 Results
4.4.1 Participants
For all studies in this chapter, we found several answers where participants failed attention checks or
ended the survey early. We excluded these responses from analysis and found that this did not affect the
distribution of responses for any of the three studies.
A total of 382 participants took part in the metaphor attribution study (Study 1). A total of 803 participants took part in the social expectation study (Study 2). A total of 805 participants took part in the
functional expectation study (Study 3).
4.4.2 Social Expectations by Metaphor Type
For the robots in our dataset, the different categorizations of robot types had significant effects on participants' broad social expectations with respect to the RoSAS Scale, as shown in Figure 4.3. For warmth, the main effect of group type was significant, Welch's F(2, 1382.94) = 9.28, p < .001, ηp² = .005. Post hoc analysis revealed that the mean warmth of anthropomorphic embodiments (M = -.58) was significantly higher than the mean warmth of mechanical embodiments (M = -.83), p = .001, η² = .007, and the mean warmth of zoomorphic embodiments (M = -.65) was significantly higher than that of mechanical embodiments, p = .003, η² = .004.
The main effect of group type for competence was also significant, Welch's F(2, 1353.95) = 24.09, p < .001, ηp² = .016. The mean perceived competence of anthropomorphic embodiments (M = .86) was significantly higher than the mean competence of zoomorphic embodiments (M = .50), p = .001, η² = .022, and the mean competence of mechanical embodiments (M = .90) was significantly higher than the mean competence of zoomorphic embodiments, p = .001, η² = .026.
Significant differences in discomfort were also observed across robot types, Welch's F(2, 1399.77) = 16.13, p < .001, ηp² = .010. Zoomorphic embodiments (M = -.73) were rated significantly lower in discomfort than mechanical embodiments (M = -.57), p = .04, η² = .003, which were in turn rated significantly lower than anthropomorphic embodiments (M = -.31), p = .001, η² = .007; zoomorphic embodiments were also rated significantly lower in discomfort than anthropomorphic embodiments, p = .001, η² = .042.
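The Welch's ANOVA used throughout these comparisons accommodates unequal group variances and sizes. Below is a minimal NumPy/SciPy sketch of the statistic; this is our own illustration (variable names are ours), and a real analysis would typically use a vetted statistics package.

```python
import numpy as np
from scipy import stats

def welch_anova(groups):
    """Welch's one-way ANOVA for a list of 1-D samples.

    Returns (F, df1, df2, p). Each group is weighted by n_i / s_i^2,
    so unequal variances and sample sizes are accommodated.
    """
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    m = np.array([np.mean(g) for g in groups])
    v = np.array([np.var(g, ddof=1) for g in groups])
    w = n / v                               # per-group precision weights
    grand = np.sum(w * m) / np.sum(w)       # weighted grand mean
    lam = np.sum((1 - w / np.sum(w)) ** 2 / (n - 1))
    F = (np.sum(w * (m - grand) ** 2) / (k - 1)) / (
        1 + 2 * (k - 2) * lam / (k ** 2 - 1))
    df1, df2 = k - 1, (k ** 2 - 1) / (3 * lam)
    return F, df1, df2, stats.f.sf(F, df1, df2)
```

For two groups this reduces to Welch's t-test (F = t²), which offers a quick sanity check of the implementation.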
4.4.3 Functional Expectation by Metaphor Type
The different categorizations of metaphor type had clear effects on how participants expected a robot to perceive and interpret the world, Welch's F(2, 1297.52) = 94.36, p < .001, ηp² = .053. Post hoc analysis revealed significant differences between all pairwise comparisons with p < .001. Zoomorphic embodiments were perceived as having the lowest perceptual capabilities (M = -.14), followed by mechanical embodiments (M = .26), η² = .018, and then anthropomorphic embodiments (M = .95), η² = .046. The difference in perceived perceptual abilities between anthropomorphic and zoomorphic embodiments was much larger, η² = .118.
Tactile interaction and mobility also showed differences across metaphor types, Welch's F(2, 1209.25) = 27.81, p < .001, ηp² = .019. All pairwise comparisons were significant in the post hoc analysis, with
Figure 4.4: Differences in means for anthropomorphic (Ath.), zoomorphic (Zoo.), and mechanical (Mec.) embodiments on the functional constructs Perception and Interpretation, Tactile Interaction and Mobility, and Non-verbal Expressiveness (y-axis: Rating, from -1.5 to 1.5). All differences are significant with p < 0.001, unless marked otherwise (n.s.). Error bars represent 95% CI of means.
p < .001. Zoomorphic embodiments were perceived as having the lowest ability to manipulate objects in the world (M = -.84), followed by mechanical embodiments (M = -.58), η² = .007, and then anthropomorphic embodiments (M = -.22), η² = .013. Zoomorphic embodiments therefore had much lower perceived tactile abilities than anthropomorphic embodiments, η² = .037.
The different forms of embodiment showed different expectations for non-verbal communication, Welch's F(2, 1239.08) = 49.65, p < .001, ηp² = .032. Zoomorphic embodiments (M = -1.06) were perceived as less capable of communicating non-verbally than anthropomorphic embodiments (M = -.33), p = .001, η² = .055. Mechanical embodiments (M = -.91) were also viewed as having lower non-verbal communicative abilities than anthropomorphic embodiments, p = .001, η² = .033.
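The η² effect sizes reported for these post hoc comparisons are the ratio of between-group to total sum of squares. A small sketch for a pairwise comparison (our own illustration of the standard formula, not the dissertation's analysis code):

```python
import numpy as np

def eta_squared(a, b):
    """Eta-squared for a pairwise comparison: SS_between / SS_total."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    allv = np.concatenate([a, b])
    grand = allv.mean()
    ss_between = (len(a) * (a.mean() - grand) ** 2
                  + len(b) * (b.mean() - grand) ** 2)
    ss_total = ((allv - grand) ** 2).sum()
    return ss_between / ss_total

print(eta_squared([1, 2, 3], [4, 5, 6]))  # ≈ 0.771
```

Values near 0 indicate the group labels explain little of the rating variance; values near 1 indicate nearly all of it, so the small η² values above correspond to reliable but modest differences.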
4.4.4 Social and Functional Expectations Along Semantic Axes
We were interested in exploring whether design metaphors were semantically meaningful in terms of
user perceptions. While the semantic space of metaphors is difficult to describe, there are some locally
ordered areas. To examine the effects on social and functional perceptions, we selected three metaphors:
"a baby", "a toddler", and "a person". Because age is associated with competence (Khadpe et al. 2020) and
interpretation of the world, we expected that robots described with more mature metaphors would have
higher competence and perceptual capabilities.
Figure 4.5: Perceived competence and perceived perceptual ability (y-axis: Rating, from -1.5 to 1.5) by metaphors of different maturity levels ("a baby", "a toddler", "a person"). Differences are significant unless marked otherwise (n.s.).
As expected, we found a main effect of metaphor name on competence, Welch's F(2, 127.81) = 16.55, p < .001, ηp² = .057. Perceived competence was lowest for robots labeled with the metaphor "a baby" (M = .08), followed by robots described with the metaphor "a toddler" (M = .78), p = .001, η² = .09, and then by robots described with the metaphor "a person" (M = .96), p = .001, ηp² = .110.
There was an additional effect on the perceived perceptual abilities of robots, Welch's F(2, 96.60) = 30.81, p = .001, ηp² = .120. Robots described as babies were assumed to have lower expected perceptual capabilities (M = -.45) than robots described as toddlers (M = .67), p = .001, ηp² = .118. Robots associated with the toddler metaphor were, in turn, perceived as having lower perceptual abilities than robots described as persons (M = 1.23), p = .001, ηp² = .110. Additionally, robots associated with the baby metaphor had significantly lower perceived perceptual abilities than robots associated with the person metaphor, p = .001, ηp² = .212.
4.4.5 The Space of Robot Gender Expression
In addition to broad social characteristics, we were interested in identifying how robots may form a
social identity through their embodiment. We examined user-reported perceptions of gender as a form
of a robot’s identity. To examine the space of gender expression in robots (i.e., how masculinity and
[Figure 4.6 shows three 3×3 grids of robot counts, one per metaphor type. One axis indicates whether femininity was significantly not associated, had no association, or was significantly associated with the robot; the other axis indicates the same for masculinity. Counts in reading order:
(a) Anthropomorphic embodiments: 1, 11, 7; 15, 4, 0; 9, 1, 0.
(b) Zoomorphic embodiments: 7, 10, 0; 7, 3, 0; 0, 0, 0.
(c) Mechanical embodiments: 16, 17, 6; 35, 11, 0; 6, 0, 0.]
Figure 4.6: A visualization of the space of gender expression by robot metaphor type.
femininity are embodied (Anderson 2020)), we constructed the space according to the results of a two-tailed Wilcoxon signed-rank test. For each robot, we independently determined whether the robot's average ratings for femininity and masculinity were significantly above zero, significantly below zero, or whether the null hypothesis that the value is zero could not be rejected. A value of zero corresponded to masculinity or femininity being neither associated with the robot nor not associated with it. The cutoffs for the robots fell around average values of ±1, corresponding to "slightly agree" and "slightly disagree". The results of this analysis are shown in Figure 4.6.
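The per-robot classification described above can be sketched as follows. This is an illustrative pure-Python implementation using the normal approximation to the two-tailed one-sample Wilcoxon signed-rank test; an actual analysis would typically use scipy.stats.wilcoxon, and the ratings shown are invented.

```python
import math

def wilcoxon_p(ratings):
    """Two-tailed one-sample Wilcoxon signed-rank test against zero,
    using the normal approximation (reasonable for n of roughly 20+)."""
    diffs = [r for r in ratings if r != 0]  # zero differences are dropped
    n = len(diffs)
    ranked = sorted(diffs, key=abs)
    # Average ranks over ties in |difference|, keyed by |difference|.
    ranks = {}
    i = 0
    while i < n:
        j = i
        while j < n and abs(ranked[j]) == abs(ranked[i]):
            j += 1
        ranks[abs(ranked[i])] = (i + 1 + j) / 2  # mean of ranks i+1..j
        i = j
    w_plus = sum(ranks[abs(d)] for d in diffs if d > 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mu) / sigma
    # Two-tailed p-value from the standard normal distribution.
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def classify_association(ratings, alpha=0.05):
    """Label one robot's ratings on one gendered-expression scale."""
    mean = sum(ratings) / len(ratings)
    if wilcoxon_p(ratings) < alpha:
        return "associated" if mean > 0 else "not associated"
    return "no association"

# Invented ratings on a -3..3 agreement scale for one robot.
print(classify_association([2, 2, 1] * 10))  # clearly positive ratings
print(classify_association([1, -1] * 15))    # balanced ratings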
By separating across design metaphor classifications, we observed patterns in how robot gender expression was perceived. Anthropomorphic embodiments were more likely to be perceived as having a
significant association with either femininity or masculinity. Zoomorphic robots were unlikely to be associated with either masculinity or femininity. Mechanical embodiments were more likely to have no gender
expression association, but in some cases they were associated with either masculinity or femininity.
                  Comp.  Cust. Serv.  Educ.  Entert.  Home Asst.  Inform.  Manuf.  Surveill.
Anthropomorphic     4        22        13      14        19         11        9        4
Zoomorphic         16         1         3      21         5          2        1        5
Mechanical          4        29         3      31        39         42       15       19
Figure 4.7: A heat map of the distribution of the top two tasks for robots in our dataset separated by their
metaphor type.
4.4.6 Expected Robot Use Cases
To evaluate the task expectations of the different robot embodiments in the dataset, we developed a coding
scheme based on an iterative axial coding approach (Strauss and Corbin 1998) and applied it to the participants’ free-response answers to the question regarding what task the robot appeared to be useful for. We
observed eight task-related codes and three intended population codes from the participants.
The companion context was characterized by tasks involving the robot acting socially to improve mood
or mental health over long periods. Examples of common tasks participants provided for this context were
robots that "provide warmth and comfort", "are an interactive friend for my child", and "being a conversation partner". Most commonly, zoomorphic robots were described as being appropriate for this task,
with task descriptions of 16 robots aligning with this category. This finding aligns with the zoomorphic
robots’ tendency to be perceived as comforting and warm, a key component of these tasks where functional
expectations are less important.
Robots ascribed to customer service contexts were defined as directly interacting with people in public
places such as stores, restaurants, or hotels. Example tasks were robots that function as a "greeter or
a receptionist", "a waiter" and "a museum guide". Both anthropomorphic embodiments (22 robots) and
mechanical embodiments (29 robots) were described as being useful for customer service-type tasks. This
aligns with the high expected functionalities of these embodiments to perform the services those tasks
require.
Educator tasks were defined to involve knowledge transfer from or through the robot to a person
interacting with the robot. Tasks fitting this category involved robots that could be used "in language
education", "to interact with students in class", and to provide "light educational lessons like spelling or
math". Interestingly, a robot’s embodiment was often related to the topic that the robot was meant to teach.
For example, the baby-like robot Babyloid was described as "a training baby for expecting mothers", and the cat-like robot MarsCat was suggested "to help educate about cats". Most commonly, anthropomorphic
embodiments (13 robots) were assigned to tasks relating to the educator category. This is consistent with
the high perceived competence and functionality of anthropomorphic embodiments.
For robots that played the role of entertainers, expected tasks aligned with short-term entertainment
purposes. For example, robots in this category were expected to "play music", "be used like a toy", and
"tell jokes". This category was common across all types of metaphors; however, each metaphor was described as entertaining in a specific way. Anthropomorphic metaphors were described as being used as
"a game-playing partner", zoomorphic metaphors were most often seen as functioning like "a pet that
doesn’t require attention when not in use", and mechanical metaphors fulfilled roles that are common in
other forms of technology such as "playing music".
Home assistant robots were described as being able to work within the household, performing chores
and other daily tasks, including "cleaning up after kids", "making coffee", and "carrying groceries". These
tasks are similar to the customer service task, but are distinct in that they occur in the home and consist of
repeated interaction with a few people. Similar to customer service tasks, both mechanical embodiments
(39 robots) and anthropomorphic embodiments (19 robots) were found to be well-suited for the home
assistant task.
Robots that act as informants were described with tasks that answer questions or otherwise provide
information. Common tasks in this category were robots that "verbally answer questions", "tell time", or
"report daily events like news or weather". Mechanical embodiments (42 robots) were most frequently
described as being useful for these impersonal and intellectual tasks, consistent with their perceived high
competence.
Manufacturer robots were described in contexts where they build or move objects, typically without
constant direct human interaction. These robots were expected to "carry heavy objects", "be a factory
worker", and "pack in a warehouse". Fifteen mechanical embodiments and nine anthropomorphic embodiments
were selected for tasks like these, primarily for their functional capabilities, as these tasks were perceived
to not require social interaction.
Robots that fell in the surveillant category were those that monitor behavior, and were typically expected to provide security in some way. These robots were expected to be similar to "security alarms",
"spy cameras", or "a sentry". Mechanical embodiments (19 robots) were most frequently attributed to this
task. Similar to informants, these types of tasks are impersonal but require high levels of competence and
perceptual capabilities, qualities that were attributed to mechanical embodiments.
4.4.7 Visually Exploring the Design Space
To enable other researchers and robot designers to readily benefit from our findings, we developed an intuitive, open-source visualization. Specifically, we used the hand-crafted features developed in Section 4.3.1 as descriptions of the physical attributes of the robots in our dataset. To learn an unsupervised mapping from that high-dimensional feature space to 2D, we used t-distributed Stochastic Neighbor Embedding (t-SNE) (Maaten and Hinton 2008), which preserves local neighborhood structure when projecting points from the high-dimensional space to 2D.
Figure 4.8 demonstrates that robots mapped near each other share similar characteristics. We show evaluations of different robots as different color values in 2D space. Higher values are concentrated in different parts of the space, indicating differences in social and functional expectations of the robot embodiments. This visualization technique can be used as a design tool to rapidly explore different robot embodiments for a desired set of expectations related to specific tasks.

Figure 4.8: A t-SNE visualization of the design space of robot embodiments: (a) expected social role; (b) tactile interaction and mobility. Each point represents one robot in the dataset. Brown represents high values and teal represents lower values of the measured ratings. Here we show only the front view of robots; study participants viewed composite images that included scaling information, as described in Section 4.3. The fully interactive version of this plot is located at interaction-lab.github.io/robot-metaphors/, where researchers, designers, and others interested in these findings may hover over points to view robots and click on a specific robot to view its social and functional expectations.
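As a sketch of how such a projection can be produced, the snippet below maps a hypothetical hand-crafted feature matrix to 2D with scikit-learn's t-SNE implementation; the feature dimensions and values are random placeholders, not our dataset.

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical hand-crafted embodiment features: one row per robot
# (e.g., height, limb count, screen area, ...). Values are random
# placeholders, not the actual dataset.
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 12))  # 50 robots, 12 features

# Project the 12-D feature space to 2-D while preserving local
# neighborhood structure.
coords = TSNE(n_components=2, perplexity=10,
              random_state=0).fit_transform(features)
print(coords.shape)
```

Each row of `coords` can then be plotted and colored by a rating of interest, as in Figure 4.8.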
4.5 Summary
This chapter presented a methodology for understanding the expectations suggested by robot embodiments through the idea of design metaphors, establishing the initial expectations that users have about an assistive robot's possible actions. These expectations can be used by robot designers to personalize robot designs to particular users or tasks. We showed the efficacy of using design metaphors holistically to
identify overall trends in anthropomorphic, zoomorphic, and mechanical robots. We also showed fine-grained relationships of expectations in metaphors along the semantic spectrum of age. We released the
dataset collected in this work to the public, and described techniques that both robot designers and user
study practitioners can use to interact with this dataset. The next chapter explores techniques to allow
non-expert users to actively adapt generic robot embodiments to specific occupational contexts through
clothing design.
Chapter 5
Customizing the Appearance of Manufactured Robots
This chapter identifies techniques that robot designers can use to make modifications to a robot’s embodiment, namely through screen-based faces and clothing design. By modifying a robot’s appearance through
face and clothing design, a generically designed robot can be adapted to fit different tasks and environments. When a robot's identity is established through its appearance, users can infer its intended use case. Skilled users may utilize their Python programming and sewing experience to make a generic
robot reflect their own values and aesthetic sensibilities. This chapter is summarized in Section 5.6. Return
to the Table of Contents to navigate other chapters.
This chapter is adapted from “Designing Robot Identity: The Role of Voice, Clothing, and Task on
Robot Gender Perception" (Dennler, Kian, et al. 2025), written in collaboration with Mina Kian, Stefanos
Nikolaidis and Maja Matarić and “PyLips: an Open-Source Python Package to Expand Participation in Embodied Interaction" (Dennler, Torrence, et al. 2024), written in collaboration with Evan Torrence, Uksang
Yoo, Stefanos Nikolaidis, and Maja Matarić.
5.1 Motivation
Robots are increasingly moving from being purely functional devices that work in isolated contexts to
taking on social roles that engage with, support, and interact with humans (Pandey and Gelin 2018; Specian
et al. 2021; Suguitan and Hoffman 2019). This shift of contexts introduces new design considerations about
how a robot’s social identity affects its use, in addition to its functional role (Deng, Mutlu, and Matarić
2018).
Several theoretical underpinnings support the idea that technology has unavoidable and salient social
implications. The paradigm of computers as social actors (Nass, Steuer, and Tauber 1994) has accumulated
a long-standing and in-depth body of work showing that computers that are deployed in social contexts
are viewed as social agents, regardless of whether or not they have social agency (Nass and Moon 2000). In
the case of robots, this effect is further reinforced by the robot’s physical embodiment (Wainer et al. 2006;
Deng, Mutlu, and Mataric 2019). Embodiment provides additional interaction modalities–such as gesture,
gaze, and movement–that reinforce social identity. Understanding how robots may use these additional
modalities to establish identity is crucial for robot deployments in human-facing tasks because ecological
use is affected by both user and robot identity (Esmaeilzadeh 2021; DeVito, Walker, and Birnholtz 2018;
Tapus, Ţăpuş, and Matarić 2008).
Beyond use considerations, identity affects how well people can complete collaborative tasks. Findings
from social psychology, economics, operations research, and marketing research show that social identity
strongly influences both subjective and objective interaction metrics (Tajfel and Turner 2004; White, Habib,
and Hardisty 2019; Hennessy and West 1999; Charness and Chen 2020). The underlying process that drives
changes in these metrics is the formation of different social groups, as described by Social Identity Theory
(Tajfel and Turner 2004; Stets and Burke 2000). Social Identity Theory describes how people in the same
social identity groups attribute more positive characteristics and take more favorable actions toward
other members of their social identity, regardless of whether the in-group member is a human or a robot
(Fraune 2020; Fraune, Šabanović, and Smith 2017; Häring, Kuchenbrandt, and André 2014; Kuchenbrandt,
Eyssel, et al. 2013; Sebo et al. 2020). Conversely, members of different social identity groups typically
behave more negatively toward each other (Davis, Love, and Fares 2019); this effect also applies to both
human-human and human-robot interaction (Sebo et al. 2020; Chang, White, et al. 2012; Fraune, Sherrin,
et al. 2019). For example, Fraune et al. (Fraune, Šabanović, and Smith 2017) explored the effect of Social Identity Theory in a study in which two mixed teams, each comprising two humans and two robots, competed in a price-is-right game. They found that participants subjectively rated the in-group robot as more cooperative
than out-group humans, and participants assigned more painful noise blasts to out-group humans than
to in-group robots. While group structures in game contexts are well-defined, groups formed in daily life
are more contextual. However, similar effects are well established in real-world human groups based on
identity traits such as gender (Charness and Chen 2020), political affiliation (Hart and Nisbet 2012), and
even dietary habits (Davis, Love, and Fares 2019). To understand how a robot’s identity affects use and
adoption, designers must accurately understand how the robot’s identity is perceived.
In this work, we focus on a particular aspect of a robot's identity: gender. Gender is one of the
most widespread modalities of social identity, and has been linked to several interaction differences in
computer interfaces (Bardzell 2010; Stumpf et al. 2020) as well as in robots (Kuchenbrandt, Häring, et al.
2014; Sandygulova and O’Hare 2018). Gender is a highly contextual and complicated form of identity,
however, previous works on gender perception in robotics have adopted simplistic frameworks to investigate gender that only consider a single sensory modality to modify the robot’s perceived gender (Steinhaeusser et al. 2021; Law, Malle, and Scheutz 2021; Crowelly et al. 2009; Raghunath, Sanchez, and Fitter
2022; Raghunath, Myers, et al. 2021). Those works assumed that single-modality effects can be linearly
combined to understand the impact that a robot's gender has on interaction. In this work, we show that this is
not the case. While some past research has considered multiple modalities to gender a robot, it generally
explored normative views of gender and designed the robot to be wholly masculine or wholly feminine
(Chita-Tegmark, Lohani, and Scheutz 2019; Kuchenbrandt, Häring, et al. 2014; Powers et al. 2005; Bryant,
Borenstein, and Howard 2020; Tay, Jung, and Park 2014; Eyssel and Hegel 2012). In contrast, research
in philosophy, feminism, queer theory, and human-computer interaction (HCI) has found that gender is
highly nuanced and formed through a complex interaction of several modalities and social interactions
(Butler 2002). The current simplifications present in the robotics formulations of how robots are gendered
may lead to the inconsistencies present in the literature, such as finding the influence of robot gender
on a user’s interaction with the robot as significant in some cases (Chita-Tegmark, Lohani, and Scheutz
2019; Crowelly et al. 2009; Eyssel and Hegel 2012; Tay, Jung, and Park 2014; Kuchenbrandt, Häring, et al.
2014; Powers et al. 2005; Raghunath, Sanchez, and Fitter 2022), and having no effect in others (Law, Malle,
and Scheutz 2021; Steinhaeusser et al. 2021; Bryant, Borenstein, and Howard 2020; Robben, Fukuda, and
De Haas 2023; Raghunath, Myers, et al. 2021; Paetzel et al. 2016). By understanding how different sensory
modalities interact, we can more effectively interpret those inconsistencies and further the field's understanding of those complex influences at a time when robots are actively being developed for use in humans' daily lives.
To expand the understanding of how gender is attributed to robots in human-robot interaction (HRI)
studies from a queer and feminist perspective, we contribute the following: (1) a framework to evaluate
the perceived gender characteristics of a robot’s voice, (2) a design methodology to develop clothing for
robots, and (3) a quantitative evaluation showing that gender perception is not a linear combination of its
constituent parts.
We posit that the construction of a robot’s gender is an important part of the design process that
needs to be considered for each context. To address this challenge, we outline a design process and design
principles for eliciting a particular perception of the robot’s gender in robots through voice and appearance.
Based on those principles, we designed and conducted three user studies that explored users’ perceptions
of a robot’s voice and appearance, separately and then together, in two tasks with different social roles,
reinforcing some expected stereotypes and uncovering some novel insights. We found that the perception
of a robot’s gender can be modulated through the careful design and evaluation of both modalities and
that the perceived gender of a robot is influenced by the robot’s task. We present new results about the
construction of robot gender, and the relative influences of voice, appearance, and task.
5.2 Technical Implementation: PyLips, a Software Package for Screen-Based Faces
Based on insights from linguistics, insights from human anatomy, and prior work that describes important
facial features for a wide range of screen-based faces (Kalegina et al. 2018), we developed an abstractly
anthropomorphic face that is grounded in the literature on human facial affect and robot design.
5.2.1 Anatomy of the Face
The PyLips face is based on the anatomical structure of human faces, allowing us to apply action units (AUs) to generate
interpretable and principled expressions. Human faces have nearly 30 striated skeletal muscles that are
separated into two groups: muscles primarily for facial expression, and muscles primarily for mastication
(i.e., chewing) (Westbrook et al. 2024). Muscles for facial expression originate at different points on the
skull and insert into the skin, whereas muscles for mastication originate on the skull and attach to the
mandible. Both kinds of muscles have corresponding AUs and can be used to generate facial expressions
(Ekman and Friesen 1978).
Based on insights from human anatomy, we identified three key muscle attachment points for each
eyebrow, and six key muscle attachment points in the mouth. To create a face that moves in similar ways
to the human face, we represent these points as the control points for two Bézier curves, a common choice
for modeling mouths (Song et al. 2021); one curve for the top lip and one curve for the bottom lip. We
visualize the anatomically-inspired control points on the face in Figure 5.1.
Figure 5.1: Control points of the animated face. Each control point represents attachments of human
muscles. The control points are used to map action units to the screen-based face.
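As an illustration of the curve machinery, the sketch below evaluates a Bézier curve from its control points with de Casteljau's algorithm; the control-point coordinates are made up for demonstration and are not PyLips's actual values.

```python
def bezier_point(control_points, t):
    """Evaluate a Bézier curve of arbitrary degree at t in [0, 1]
    using de Casteljau's algorithm; control points are (x, y) pairs."""
    pts = list(control_points)
    while len(pts) > 1:
        # Repeatedly interpolate between consecutive points.
        pts = [((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
               for (x0, y0), (x1, y1) in zip(pts, pts[1:])]
    return pts[0]

# Three illustrative control points for a top-lip curve.
top_lip = [(0.0, 0.0), (0.5, -0.2), (1.0, 0.0)]
# Sample the curve at 11 evenly spaced parameter values for rendering.
samples = [bezier_point(top_lip, i / 10) for i in range(11)]
```

Sampling each lip curve densely in this way yields the polyline that is drawn to the screen.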
Each control point represents the muscle insertion site for multiple muscles, and therefore it can be
influenced by multiple AUs acting simultaneously. Because each muscle can only act in one dimension,
the movement at a given control point is modeled as the linear combination of the displacement vectors
for each AU associated with the muscles at that point. Each AU’s effect is represented by a displacement
vector and a magnitude that scales this vector.
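This linear-combination model can be sketched as follows; the AU names, displacement vectors, and magnitudes here are illustrative placeholders rather than PyLips's actual parameters.

```python
# Illustrative AU displacement directions for one mouth-corner control
# point, in face-relative units (placeholder values, not PyLips's).
AU_DISPLACEMENTS = {
    "AU12_lip_corner_puller": (0.3, 0.2),       # smile: outward and up
    "AU15_lip_corner_depressor": (0.0, -0.25),  # frown: straight down
}

def control_point_position(rest, au_magnitudes):
    """Displace a control point by the linear combination of the
    displacement vectors of all AUs acting on it, each scaled by
    its activation magnitude."""
    x, y = rest
    for au, magnitude in au_magnitudes.items():
        dx, dy = AU_DISPLACEMENTS[au]
        x += magnitude * dx
        y += magnitude * dy
    return (x, y)

# A half-strength smile combined with a slight frown.
pos = control_point_position((0.0, 0.0),
                             {"AU12_lip_corner_puller": 0.5,
                              "AU15_lip_corner_depressor": 0.2})
```

Because each AU contributes independently, expressions compose additively at every control point, mirroring how multiple muscles pull on one insertion site.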
The displacement vectors are based on the relative proportions of facial components, enabling this
approach to generalize across different face shapes and sizes. This anatomically-inspired framework encourages naturalistic facial movements. The interaction of multiple AUs at each control point mimics the
complex, coordinated actions of facial muscles in human expressions.
5.2.2 Synchronizing Mouth and Speech
In order to synchronize lip movements with audio, we created preset mouth shapes, called visemes, based on the International Phonetic Alphabet (IPA) classifications. To achieve synchronized mouth movements and speech, we allow the user to pass
a series of visemes and their initiation times in the audio file. The face then plays the sequence of visemes
at the corresponding times while playing speech.
There are several approaches for extracting phonemes and visemes from audio (Li et al. 2020; Huggins-Daines et al. 2006; Xu, Baevski, and Auli 2021), so we designed the interface for synchronizing mouth movements and audio to be modular. Currently, we have implemented two methods for generating the timed visemes and audio: (1) via Amazon Polly (Gay, Pepusch, and Nicholson 2024) for generating both audio and visemes, and (2) via the user's operating system text-to-speech through pyTTSx3 (Rao 2023) for generating audio and Allosaurus (Li et al. 2020) for generating timed viseme information.

Figure 5.2: Example animation of the face saying "PyLips". We extract phonemes from the generated audio file, then map those phonemes to visemes based on the International Phonetic Alphabet and play the synchronized visemes and audio to achieve realistic movements during speech.
Inspired by the IPA and current animation practices–such as Rhubarb Lip Sync’s six to nine viseme
model (Wolf 2022)–we generated a sequence of six visemes for consonants and five visemes for vowels.
The six categories we defined for consonants are based on the location of articulation, from forward to
backward: bilabial, labio-dental, interdental, dental-alveolar, postalveolar, and velar-glottal. We used a
higher concentration of different visemes for sounds articulated at the front of the mouth as these are
more visibly distinct, whereas articulations further back in the mouth are visually more similar. We use
the following categories for vowel sounds: close-front, open-front, mid-central, close-back, and open-back.
We visualize how the pipeline aligns timed viseme information with facial expression in Figure 5.2. While
we manually selected these categories based on prior research, we allow the mapping for speech to visemes
to be flexible in the PyLips package. This allows new open-source techniques for generating timed viseme
information to be incorporated into the PyLips package as they are developed.
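The mapping from phonemes to the eleven viseme categories can be sketched as a simple lookup table. The IPA symbols grouped under each category below are illustrative examples, not PyLips's official mapping.

```python
# Illustrative phoneme-to-viseme lookup using the eleven categories
# described above: six consonant categories by place of articulation
# (front to back) and five vowel categories.
_CATEGORIES = {
    "bilabial": "p b m",
    "labio-dental": "f v",
    "interdental": "θ ð",
    "dental-alveolar": "t d n s z l",
    "postalveolar": "ʃ ʒ tʃ dʒ r",
    "velar-glottal": "k g ŋ h ʔ",
    "close-front": "i ɪ e",
    "open-front": "æ ɛ",
    "mid-central": "ə ʌ ɜ",
    "close-back": "u ʊ o",
    "open-back": "ɑ ɔ",
}
VISEME_OF = {ph: viseme
             for viseme, phones in _CATEGORIES.items()
             for ph in phones.split()}

def to_visemes(phonemes):
    """Map a phoneme sequence to viseme categories; unknown phonemes
    fall back to a neutral mid-central mouth shape."""
    return [VISEME_OF.get(ph, "mid-central") for ph in phonemes]

# A simplified phoneme sequence (diphthongs reduced for illustration).
print(to_visemes(["p", "ɪ", "l", "ɪ", "p", "s"]))
```

Swapping in a different `_CATEGORIES` table is all that is needed to change the mapping, which is what makes the pipeline modular.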
To interpolate between mouth shapes over time, we implemented a queue that stores AU goals and the
associated times they should be played at. As time progresses, the AU goals are popped from the queue,
and we use a slow-in slow-out interpolation technique to specify how the PyLips face should move to
reach these goals. Because there are limited control points and we render the face with SVG objects, these
interpolations are calculated and executed in real time.
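The goal queue and slow-in slow-out interpolation can be sketched as follows, using the common smoothstep easing function as a stand-in for PyLips's actual interpolation; the class and AU names are illustrative.

```python
import heapq

def ease(u):
    """Slow-in slow-out (smoothstep) easing for u in [0, 1]."""
    return u * u * (3 - 2 * u)

class AUGoalQueue:
    """Queue of timed AU goals; values between the last reached goal and
    the next pending goal are interpolated with slow-in slow-out easing."""

    def __init__(self, start_values):
        self.prev_time, self.prev = 0.0, dict(start_values)
        self._heap, self._n = [], 0  # min-heap of (time, tiebreak, goal)

    def add_goal(self, t, au_values):
        heapq.heappush(self._heap, (t, self._n, dict(au_values)))
        self._n += 1

    def values_at(self, t):
        # Retire goals whose scheduled time has already passed.
        while self._heap and self._heap[0][0] <= t:
            self.prev_time, _, self.prev = heapq.heappop(self._heap)
        if not self._heap:
            return dict(self.prev)
        goal_t, _, goal = self._heap[0]
        u = ease((t - self.prev_time) / (goal_t - self.prev_time))
        return {au: (1 - u) * self.prev[au] + u * goal[au]
                for au in self.prev}

queue = AUGoalQueue({"AU12": 0.0})
queue.add_goal(1.0, {"AU12": 1.0})  # reach a full smile at t = 1 s
halfway = queue.values_at(0.5)      # eased value partway to the goal
```

Calling `values_at` on each render tick yields smoothly accelerating and decelerating motion toward each queued goal.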
5.2.3 Server/Client Architecture
Inspired by the robotics community’s widespread adoption of distributed computing frameworks such as
ROS (Quigley et al. 2009) and ROS-2 (Macenski et al. 2022), we also designed PyLips to be usable on physically distributed devices. To achieve this, we developed a server/client architecture. One computer hosts a
web server using Flask (Grinberg 2018) that serves the face HTML and Javascript. Any number of clients
can connect over the network to the server to display faces. We allow each client to use a unique namespace, so commands sent from the PyLips interface control specific face instances. By allowing different
namespaces and devices to connect over a network, PyLips can be used to design and control multiple
faces. We present an example network configuration describing how multiple faces can be controlled in
Figure 5.3. Because PyLips is web-based, each face can be displayed either on the same device as the PyLips
server using local connections, or on a different device on the PyLips server network using the server’s IP
address.
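The namespace-based routing can be illustrated with a toy in-memory stand-in for the server; the real system serves faces over HTTP with Flask and network sockets, which this sketch deliberately omits.

```python
from collections import defaultdict, deque

class FaceServer:
    """A toy stand-in for the PyLips server: commands sent to a
    namespace are queued for whichever client displays that face.
    The real system routes these over a Flask web server; this
    sketch only models the namespace-based dispatch."""

    def __init__(self):
        self.queues = defaultdict(deque)

    def send(self, namespace, command):
        """Called by the Python interface to command one face."""
        self.queues[namespace].append(command)

    def poll(self, namespace):
        """Called by a client to fetch its next pending command."""
        q = self.queues[namespace]
        return q.popleft() if q else None

server = FaceServer()
# Because faces are addressed by namespace, one server can drive
# several displays independently (names here are hypothetical).
server.send("kitchen_face", {"say": "Hello!"})
server.send("lab_face", {"express": "smile"})
```

Each display only ever sees the commands addressed to its own namespace, which is what lets multiple faces share one server.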
5.3 Inspiration: Roland Barthes, Bernard Rudofsky, Feminism, Queer
Theory, and the Elements of Fashion
Roland Barthes’s The Fashion System.
Roland Barthes's The Fashion System (Barthes 1990) draws parallels between the fashion people wear and the languages people speak: both construct meaning through a system of signs and symbols. Barthes argues that clothing serves a purpose beyond merely covering wearers and sheltering
them from the environment; it is also an expression of the wearer’s identity and the social groups that
they belong to. The Fashion System breaks down fashion into three concepts: the functional garment, how
it is portrayed in media, and how it is discussed. Each of these concepts contributes to the overall cultural message of clothing.

Figure 5.3: Example network configuration. One PyLips server can host several faces. The server and clients can run on the same computer or different devices. The PyLips Python interface can send commands to individual clients.
This framework posits that clothing serves to communicate ideas as well as perform a function. Just as
a specific garment forms an identity based on societal conventions and expectations in people, robots have
the potential to use the same conventions and expectations to create their own identity. This perspective is
crucial in recognizing how identity, including gender, is constructed in robots through a series of repeated
actions such as clothing choice.
Bernard Rudofsky’s Are Clothes Modern?.
Where Barthes explores the social impact of clothing, Bernard Rudofsky critiques the practical design
of clothing through art. In Are Clothes Modern? (Rudofsky 1947), Rudofsky questions the practicality,
functionality, and cultural significance of contemporary clothing. He argued that the focus on aesthetic trends negatively impacts comfort, and explained how non-Western and historical clothing instead prioritized it. In this critique, he created a set of sculptures that re-imagine the human body that the silhouettes of various clothing eras were designed for; this work is depicted in Figure 5.4.
Figure 5.4: Figures representing the bodies that clothes were designed for in four iconic fashion periods.
This figure is adapted from the exhibition “Are Clothes Modern?", The Museum of Modern Art, November
28, 1944–March 4, 1945. New York. The Museum of Modern Art Archives, Photographic Archive. Photo:
Soichi Sunami
In this work, we are inspired by the ability of clothing to change the underlying perception of a robot’s
embodiment. These clothing designs and choices can modify the expectations that users place on a robot.
This has the potential to extend the work from Chapter 4 to the setting where a user study designer already
has an existing robot, but the robot’s expectations are not perfectly aligned with the task it is meant to
engage in.
Feminist and Queer Conceptualizations of Gender. Historically, researchers across many fields have
conflated two fundamentally distinct constructs: sex (e.g., male or female), a person’s biological category
related to their genetic makeup, and gender (e.g., man or woman), a person’s social category related to
their behaviors in society (West and Zimmerman 1987). Previous longstanding and problematic conceptualizations of gender described gender as immutable, binary, and physiological (Keyes 2018), where binary
refers to the presence of only two labels (man and woman), immutable refers to the inability to change
the label “man" or “woman" once it has been established, and physiological refers to gender being assigned
based on physically expressed characteristics of a person.
Modern feminist and queer researchers have identified that none of these previously adopted properties
of gender describe how societies actually perceive gender (Messerschmidt 2009). For example, trans people
are people whose gender does not align with their sex assigned at birth (Scheurman et al. 2020). The
existence of trans people directly conflicts with the idea of immutable gender. Non-binary people do not fit
neatly into the labels “man" or “woman" (McNabb 2017). The existence of non-binary people conflicts with
the idea of a binary gender. Intersex people are born with physiological characteristics that do not directly
match the criteria for either sex (Preves 2003), and ethnographic researchers have found that people make
determinations of gender with non-physiological cues, such as posture, dress, or vocal cues (Kessler and
McKenna 1985). Both of these ideas conflict with the notion of physiological gender. Previously held beliefs
of gender have excluded queer and trans people from research (Namaste 2000; Queerinai et al. 2023), and
have influenced the way that gender is perceived in robotics research. We highlight two important ideas
that were created by queer and feminist scholars to examine how these misconceptions have impacted
robotics research: feminist standpoint theory, and gender performativity. Precise language is critical for
discussing nuanced topics; thus we adopt the recommended terminology from the HCI gender guidelines
(Scheurman et al. 2020): “male" and “female" are only used to refer to sex, “men" and “women" are used to
refer to people, and “masculine" and “feminine" are used to refer to items that may be associated with a
gender but do not intrinsically have a gender.
Feminist Standpoint Theory. Feminist standpoint theory is an epistemology developed by feminists to
describe how the construction of knowledge is affected by power structures. Feminist standpoint theory has been used to develop several research agendas across HCI (Bardzell 2010), HRI (Winkle, McMillan, et al. 2023), and the social sciences (Rayaprol 2016). This theory provides four guiding theses, described in detail by Gurung (Gurung 2020) and summarized here: (1) strong objectivity requires marginalized perspectives–what is described as a fact must be agreed upon by people of multiple perspectives to be
true, (2) social context (i.e., a person’s standpoint) shapes and limits what can be learned–members of a
socially advantaged group may not be aware of the experiences of other marginalized groups and cannot
perform research that generates the knowledge held by these marginalized groups, (3) marginalized people
are acutely aware of their experience–many rules and regulations are created by the socially advantaged
members of society, so marginalized populations are more likely to be aware of how these rules and regulations oppose their own experience, and (4) power dynamics distort evidence–researchers that are not
part of a marginalized group may not be able to collect holistic data from a marginalized group due to
historical mistreatment of the group.
In this chapter, we investigate how a robot’s gender is constructed from the perspective of feminist
standpoint theory. First, we present the creation of a robot’s identity as a design problem. This allows
multiple perspectives to be included in the construction of a gendered robot, thereby providing an avenue
for marginalized identities to incorporate their knowledge into the development of robots. Second, we
incorporate queer and non-binary perspectives into that design process. Queer and non-binary people are
marginalized groups that possess knowledge on the construction of gender, yet are often underrepresented
in robotics and AI research (Korpan et al. 2024; Queerinai et al. 2023).
Gender Performativity. The third-wave feminist movement emphasized the idea of gender as being
socially constructed, rather than an objective fact. In particular, Butler reconceptualized gender as being
performative–i.e., defined by a sequence of acts that reinforce a particular identity (Salih 2007; Butler 2002;
West and Zimmerman 1987). This means that a person’s gender is created by repeatedly performing actions
that align with society’s perception of a gendered role. Concretely, a person is a woman because they
repeatedly perform feminine actions expected of a woman, not because they are born a woman. This
modern view of gender more closely aligns with the way people interpret gender and refutes the previous
perspective of a binary, immutable, and physiological gender.
Core to the conceptualization of a performative gender is the idea of choice. In robotics, designers
make the choices about a robot’s appearance and behaviors, yet users typically perceive a robot as being
Figure 5.5: Example variations on form. (left) Prototype with additional volume at top of robot body. (right)
Prototype with additional volume at bottom of robot body.
agentic (Jackson and Williams 2021). Therefore, choices made by designers may instead be attributed to
the robots themselves, establishing a robot’s identity beyond what the designer originally intended. In this work,
we investigate how design choices made by robot designers can be interpreted as intentional choices by
the robot to communicate a particular gender. This stands in contrast to the prior assumption that robots
are designed to be a particular gender and people assign robots genders from that design (Nomura 2017).
The Four Elements of Fashion.
The four elements of fashion design encapsulate the ways that designers construct and adhere to different aesthetics. By considering how these elements work together in a garment, designers can make
decisions about how the qualities of each element add to or detract from their design goals.
Form. The form, or silhouette, is one of the first things that is apparent about a garment. The form of
a garment is described by the overall outline of the garment–similar to the shadow that would be cast by
the garment if a spotlight was directed at it. Two example forms are shown in Figure 5.5.
Character design also heavily uses form as a way to suggest social qualities of a character (Ekström
2013; Tillman 2012). For example, character designs utilizing circular shapes may evoke feelings of warmth
and friendliness. Designs utilizing square shapes evoke sturdiness and the impression of strength. Triangle
shapes are often used to represent sharpness–both physical sharpness and mental sharpness. Although
these shapes do not inherently equate to these qualities, these techniques are commonly used in media,
and so they shape the way that people expect to interact with objects in the real world as described by
Cultivation Theory (Shrum 2017).
Robots can leverage these patterns in form-identity association to match the expectations of the task
they are deployed in. For example, robots taking on customer service tasks may be expected to be more
friendly, and thus use more circular shapes in their form. Robots that are expected to be experts in a
particular area and provide advice or information may instead benefit from using more triangular shapes
in their design. Other works in HRI have identified that shape is an important factor in understanding
how gender is ascribed to robots (Bernotat, Eyssel, and Sachse 2021), which is another important factor in
understanding how identity is perceived through design.
Style Lines. Style lines refer to how the construction or pattern of the garment may emphasize the
directional attributes of a garment. Two examples of pattern suggesting horizontal and vertical style lines
are shown in Figure 5.6.
There are two important attributes of style lines: their quality and direction. The three qualities of
style lines are straight, curved, or jagged. Their direction is typically expressed as vertical, horizontal, or
diagonal. The combinations of quality and direction have different effects on the aesthetics of the garment.
Lines with a straight quality are typically used to imply rigidity and order. Curved lines, in contrast,
imply softness and fluidity. Jagged lines, like zig-zags, are often used to imply playfulness. Robots may
utilize these different qualities to reinforce personality traits that are aligned with their task. For example,
Figure 5.6: Example variations on style lines. (left) Prototype emphasizing horizontal lines. (right) Prototype emphasizing vertical lines.
a robot using curved lines may appear as more agreeable, and a robot with zigzag lines may be perceived
as more extroverted.
Lines in vertical directions elongate the design and communicate a sense of formality. Conversely,
horizontal lines are seen as more relaxed and instead emphasize width. Diagonal lines imply movement
and excitement. Robots may utilize these directions to change their perceived height, which is an attribute
that largely affects the social perception of socially interactive robots (Dennler, Ruan, et al. 2023). In
addition, modifying the lines present in a robot’s design can allow a robot to operate in contexts that have
different levels of formality.
Texture. Texture refers to the tactile qualities of a garment as well as how these tactile qualities are
reflected visually. We illustrate two different textures on the Blossom robot in Figure 5.7.
Texture is largely determined by the material properties of the fabric that is used to construct the
garment. In particular, a fabric’s texture is a function of its fiber content, yarn structure, fabric structure,
and finish of the fabric (Dasgupta n.d.). In addition, the same fabric can also provide different textures by
Figure 5.7: Example variations on texture. (left) Prototype exhibiting a texture through a textured material.
(right) Prototype exhibiting a texture through layering materials.
using fabric manipulation techniques such as gathering, pleating, embroidery, or shibori (Gong and Shin
2013).
Textures can take on several qualities related to their material and preparation. Smooth fabrics such
as silk or satin are associated with elegance, refinement, and comfort. Rough fabrics like denim, tweed, or
burlap may instead evoke a more rugged aesthetic. Fuzzy materials such as faux furs, angora, or fleece are
pleasant to touch and portray warmth.
Robots can utilize these different fabrics to elicit different expectations for interaction. For example,
they may employ spiky textures to warn users not to directly touch them, which can reduce the likelihood
of injury to the user or the robot. Robots that are meant to be companions, on the other hand, can be made
from a fuzzy material to encourage contact. Because robots do not have the same comfort requirements
as humans, they may also use materials for their clothes that are not traditionally used in fashion design,
such as silicone, plastic, or cardboard.
Figure 5.8: Example variations on color. (left) Prototype using a single color. (right) Prototype using a
patterned fabric.
Color. Color is also a quality that is closely tied to the material properties of the fabric. Color in fashion
design is typically viewed in terms of its hue, value, and intensity. Selecting colors that are harmonious in
the same outfit can be very complex and highly personal (Shamoi, Inoue, and Kawanaka 2020). We show
two examples of how color can be incorporated into a robot design by interacting with a sheer texture and
how color can be incorporated as part of a pattern in Figure 5.8.
The effect of color on the social expectations of a robot can be highly variable, especially depending
on the particular experiences of the person interacting with the robot. In general, colors are an effective
form of branding (Baxter, Ilicic, and Kulczynski 2018) by associating colors with particular designers–
e.g., Valentino Red (Porter 2022), Tiffany Blue (Taylor 2022), or Bottega Green (Solá-Santiago 2024). By
associating brands with particular identities, the social attributes of the brand as a whole are reflected in
the garment.
Similarly, robots can utilize color to signal group membership, similar to what has been employed
in HRI studies investigating how group membership impacts interaction (Fraune, Šabanović, and Smith
2017). Perceiving a group membership can more strongly set expectations for previously unseen robots if
users have interacted with other robots from the same group. This can reduce the time it takes for a user
to develop a mental model of the interaction that they will have with the robot, and it informs how the
robot’s behaviors should be designed to reflect the behaviors of other robots in the group.
5.4 Data Collection
We were interested in understanding how clothing affects perceptions of a robot, especially in the presence
of other modalities that establish identity, such as voice. To evaluate this, we designed a large-scale online
study. The online study aimed to follow a 3 × 3 × 2 design: voice gender perception varied at three
levels (androgynous, feminine, and masculine); clothing perception varied at three levels (androgynous,
feminine, and masculine); and task social role varied at two levels (high and low). To create the levels to
vary in the final experiment, we performed two careful stimulus validation studies.
5.4.1 Experimental Validation: Voice
We performed an initial design study of the perceived genders of various voices. We created a set of
guiding design principles, created a set of six voices, and evaluated these voices in terms of our original
design principles.
Design Principles Based on the results of previous studies in voice perception, we developed the
following design principles (DPs) for the robot’s voice:
• DP1: Pitch Modulates Gender Perception The perceived gender of the voice should change as the
fundamental frequency of the voice changes. While several studies have linked voice pitch and perceived gender (Pernet and Belin 2012; Puts, Gaulin, and Verdolini 2006), there are other factors that
contribute to the assignment of gender in voices (Cartei and Reby 2013). To evaluate the interaction
of physical appearance and gender, the voice should be able to alter its perceived gender.
• DP2: Clarity and Realism are Consistent The voice should be similarly understandable and
realistic for all perceived genders and pitch modulations. Clarity and Realism have been linked to
user acceptance in digital assistants (Cambre, Colnago, et al. 2020), and should be held constant
across different perceived genders.
• DP3: Identity Follows Function The voice should be perceived as professional across different
pitches, as we intend to use this voice in professional contexts in the integrative video study. While
perceived gender is one salient feature of a voice, other important aspects of identity are also expressed through voice, such as age, personality, and geographic region (Cambre and Kulkarni 2019).
Design Choices
For our voice design study, we selected the state-of-the-art text-to-speech service from Amazon Web
Services called Polly (Gay, Pepusch, and Nicholson 2024) to generate synthetic robot voices. Polly has
three key benefits. First, it is highly intelligible to users, ensuring that social perceptions of the robot
are not tied to a user’s difficulty in comprehending the generated speech of the robot. Second, Polly
provides viseme information to automatically synchronize robot mouth movement to spoken words. This
establishes the embodiment that produces the speech, which is required for our interaction design. Finally,
Polly provides a variety of voice identities, enabling evaluation of multiple options for generating robot
speech. At the time of this study, Polly had six adult voices that spoke US English: Joanna, Joey, Kendra,
Kimberly, Matthew, and Salli. There were also two child voices, Justin and Ivy. Given that the robot is
meant to operate in professional settings, we deemed child-like voices to be inappropriate and selected the
six adult voices for evaluation.
To evaluate DP1, we modified the six existing adult voices to generate additional ones, by changing the
pitch by a random value between -3 and +3 semitones using the Python package PyRubberBand (McFee
2018). A semitone refers to the amount by which a sound is shifted in the pitch domain. For example, if our audio
were an instrument playing the note C, shifting by one semitone would result in a C#, and shifting by two semitones
would result in a D. The range of -3 to +3 semitones ensures that the audio remains natural-sounding. We
additionally calculated the average fundamental frequency (f0) for all of the modified voices using the
librosa Python package (McFee, Raffel, et al. 2015) to obtain a meaningful quantity to compare sounds.
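The semitone arithmetic above can be sketched directly. This is a minimal illustration of the pitch-shift factor under 12-tone equal temperament, not the PyRubberBand or librosa APIs themselves:

```python
def semitone_shift(f0_hz: float, semitones: float) -> float:
    """Shift a frequency by a number of semitones.

    In 12-tone equal temperament, each semitone multiplies
    frequency by 2**(1/12), so n semitones scale f0 by 2**(n/12).
    """
    return f0_hz * 2 ** (semitones / 12.0)

# Middle C is ~261.63 Hz; +1 semitone gives C# (~277.18 Hz),
# +2 semitones gives D (~293.66 Hz).
c4 = 261.63
c_sharp = semitone_shift(c4, 1)
d4 = semitone_shift(c4, 2)

# The -3..+3 semitone range used in the study corresponds to scaling
# the average f0 by roughly 0.84x to 1.19x the original.
low_ratio = 2 ** (-3 / 12)
high_ratio = 2 ** (3 / 12)
```

Because the scaling is multiplicative, the same semitone shift moves a low-pitched voice by fewer Hz than a high-pitched one, which is why we report the resulting average f0 rather than the shift amount.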
Study Description
We performed a within-subjects study of the perceived gender of the generated voices. Our goal was
to evaluate each voice in isolation to better understand how it interacts with other features of the robot. To
achieve this goal, each participant in this study evaluated six voices. The voices were presented as
an audio file with no contextual clues that may bias participants’ perceptions of gender, i.e., voice names,
fundamental frequencies, and intended tasks were all unknown to the participants. We presented the voices
in a randomized and counter-balanced order, and each had a random pitch modification within the range of
-3 and +3 semitones. Following standard practices, we evaluated each voice by generating sentences that
had a neutral sentiment (Torre et al. 2023). We used sentences that described text-to-speech and lasted 12-18 seconds, because previous works that evaluated perception of TTS voices recommend at least 10 seconds
of audio (Cambre, Colnago, et al. 2020). To evaluate the voices, participants rated the following 7-point
Likert items (based on Cambre, Colnago, et al. 2020), ranging from “Strongly Disagree” to
“Strongly Agree”:
1. The voice sounds feminine (feminine)
2. The voice sounds masculine (masculine)
3. The voice sounds like a real person (realism)
4. The voice is easy to understand (clarity)
Participants also provided answers to an open-ended question detailing their personal perception of each
voice by responding to the prompt “describe how the voice sounds to you in one to two short sentences".
We administered this study, approved under USC IRB #UP-18-00510, using Amazon Mechanical Turk.
The participants first filled out an informed consent form and entered their demographic information.
They were required to play the entire audio file for a given voice and answer all questions about it before
being able to move on to the next voice; moving backwards in the survey was not allowed. After
participants heard each audio file, they rated the Likert items described above and answered the free-response question. Participants then repeated the process for the remaining voices. After all voices were
rated and described, the study session ended. The study was deployed in August 2021. The survey took
approximately 5 minutes to complete and participants were compensated US $1.25. We used the following
inclusion criteria for participants recruited from MTurk: they were located in the United States, had an
approval rate of 99% or higher, and had performed at least 1,000 tasks previously.
Participants We recruited 65 participants through Amazon Mechanical Turk. All passed our inclusion
criteria of fully answering qualitative questions, and thus no responses were excluded from analysis. We
requested gender information through open-ended responses as recommended by HCI guidelines for collecting gender data. Open-ended responses reduce the negative experiences of participants who may not
align with the check boxes provided (Scheuerman et al. 2020), and the coding process was tractable for this
number of participants. The open responses were manually coded. Participants’ ages ranged from 22 to
60 years, with a median age of 32. Participants self-identified as men (40), women (24), and non-binary (1).
Their reported ethnicities were Asian (4), Black or African American (2), Biracial (1), Hispanic (2), Native
American (1), and White (53). Six participants identified as part of the LGBTQ+ community.
Evaluation of Design Principles
We used a combination of quantitative and qualitative methods to analyze the design principles described in Section 5.4.1.
DP1: Pitch Modulates Gender Perception
To evaluate gender perception of voice, we analyzed participants’ responses of perceived masculinity and femininity.

Figure 5.9 (six panels, one per voice: Kendra, Joey, Matthew, Joanna, Kimberly, Salli; region boundaries annotated at f0 = 128Hz and f0 = 143Hz): Perceived femininity of voices (-3 represents a masculine voice, 0 represents an ambiguously gendered voice, and +3 represents a feminine voice) as a function of average fundamental frequency (f0) of the utterance. Teal lines represent five-datapoint sliding averages, the shaded region denotes ± one standard deviation, and the beige dots are individual responses.

The responses showed extremely high inter-item reliability when the scores of the masculinity scale were reverse-coded (Cronbach’s α = .97). While it is possible that non-human vocalizations
can be perceived as not being gendered (e.g., chirps, beeps, or buzzes (Cha, Kim, et al. 2018)), we observed
that all voices in our study were perceived as gendered along an axis from masculinity to femininity. We
posit that this is because all voices spoke in a human language, and were thus highly anthropomorphic.
For this reason, we averaged the two items of perceived masculinity and perceived femininity into a single
construct, arbitrarily choosing positive values to be feminine and negative values to be masculine. In the
context of the voice study, we refer to this averaged construct as femininity.
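The reverse-coding, reliability check, and averaging steps above can be sketched as follows. The rating values are hypothetical, and the study used standard statistical tooling rather than this hand-rolled version:

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a set of scale items.

    items: one list of scores per item (equal lengths).
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    return k / (k - 1) * (1 - sum(variance(i) for i in items) / variance(totals))

# Hypothetical -3..+3 ratings of one voice from six participants.
masculine = [3, 2, -1, -3, 0, 2]
feminine = [-3, -2, 1, 3, 1, -2]

# Reverse-code masculinity by negation so both items point the same way.
masculine_rev = [-m for m in masculine]
alpha = cronbach_alpha([masculine_rev, feminine])  # near 1 for consistent raters

# Average the two aligned items into the single "femininity" construct.
femininity = [(m + f) / 2 for m, f in zip(masculine_rev, feminine)]
```

With highly consistent items (α near 1), averaging them into one construct loses little information, which is what justifies collapsing the two Likert items here.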
Using this combined metric, we visualized the effect of fundamental frequency on the perceived gender
of the voices, as shown in Figure 5.9. Only one voice, Kendra, exhibited significantly different gender
perceptions at different frequencies. Through a visual data analysis process (Szafir and Szafir 2021), we
identified three regions (shown in Figure 5.9) that were relatively consistent in femininity ratings and
grouped the data into those ranges for analysis.
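The five-datapoint sliding averages shown in Figure 5.9 can be sketched as a centered moving average over responses sorted by f0. This is an illustrative reimplementation, not the plotting code used in the study:

```python
def sliding_average(values, window=5):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half):i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Hypothetical femininity ratings, already sorted by the utterance's average f0.
ratings = [-3, -2, -2, -1, 0, 1, 2, 2, 3, 3]
trend = sliding_average(ratings)
```

Plotting `trend` against the sorted f0 values produces the kind of smoothed line used to identify the plateau regions by visual inspection.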
We performed a Welch ANOVA to observe the effect of frequency group on perceived gender.
We found a significant main effect of frequency range on gender, F(2, 31.42) = 37.64, p < .001,
partial η² = .46. Pairwise Games-Howell post-hoc tests revealed that the frequency range of
116Hz-128Hz (Mfemininity = -.97) was perceived as having significantly lower femininity than
the frequency range 128Hz-143Hz (Mfemininity = .75), p = .003, η² = .25, which was, in turn, lower in
perceived femininity than the frequency range 143Hz-164Hz (Mfemininity = 2.27), p < .001, η² = .65. These
results support DP1 and narrowed our design space to one voice. We then verified that the Kendra voice
fit the subsequent design principles for use in the integrative video study.
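A minimal, self-contained version of the Welch ANOVA statistic can be sketched as below. The study used standard statistical software; this sketch omits the p-value step, which requires the F distribution, and the group data shown are hypothetical:

```python
from statistics import mean, variance

def welch_anova(groups):
    """Welch's one-way ANOVA for groups with unequal variances.

    groups: list of lists of observations. Returns (F, df1, df2).
    """
    k = len(groups)
    ns = [len(g) for g in groups]
    ms = [mean(g) for g in groups]
    vs = [variance(g) for g in groups]
    w = [n / v for n, v in zip(ns, vs)]          # precision weights
    w_sum = sum(w)
    grand = sum(wi * mi for wi, mi in zip(w, ms)) / w_sum
    num = sum(wi * (mi - grand) ** 2 for wi, mi in zip(w, ms)) / (k - 1)
    tail = sum((1 - wi / w_sum) ** 2 / (ni - 1) for wi, ni in zip(w, ns))
    den = 1 + 2 * (k - 2) / (k ** 2 - 1) * tail
    return num / den, k - 1, (k ** 2 - 1) / (3 * tail)

# Hypothetical femininity ratings for three f0 ranges.
low = [-2, -1, -1, 0, -2, -1]
mid = [0, 1, 1, 2, 0, 1]
high = [2, 3, 2, 3, 2, 3]
f_stat, df1, df2 = welch_anova([low, mid, high])  # large F: groups differ
```

Unlike the classical ANOVA, the weights n/v mean that groups with larger variance contribute less to the grand mean, which is why this test is appropriate when homogeneity of variances fails.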
DP2: Clarity is Maintained Across Genders
To verify that the Kendra voice is both clear and realistic across the different perceived genders, we examined the realism and clarity items of our survey. For both items, we conducted Welch ANOVA analyses.
With realism as a dependent variable, we observed no significant change between fundamental frequency
ranges F(2, 33.579) = 0.117, p = .890. Similarly, with clarity as a dependent variable, we observed
no main effect across fundamental frequency ranges F(2, 32.326) = 2.41, p = .106. From these two
tests and our sample size, we find that there is no large disparity in clarity or realism between the
frequency ranges, making the Kendra voice a good choice for the integrative study.
DP3: Identity Follows Function
We also qualitatively evaluated the aspects of the Kendra voice to see if it would be useful for the context
of our integrative study. We planned on selecting a voice that sounds professional, because the robot was
being presented as a medical professional or a hotel receptionist. We found that participants described the
voice as sounding educated and professional at different fundamental frequencies. For example:
Figure 5.10: Quori (unclothed) (Specian et al. 2021), the robot we selected to use for the clothing design
study and integrative video study.
“This sounds like a robot who was programmed to sound like an average American educated male." –P24,
f0 = 131Hz
“It sounds very formal and more business like but also still robotic." –P16, f0 = 140Hz
In addition to these responses, 25 of the 63 other participants similarly noted that the voice sounded
“robotic" when asked to describe the voice, despite the lack of indication that the voice would be used on
a robot. This provides further evidence to support the fit of the selected Kendra voice for the context for
which the voice would be used in the integrative study, satisfying our third design principle.
5.4.2 Experimental Validation: Appearance
We performed an experimental validation of the effect of appearance on a person’s perception of a robot’s
gender. To make the design of appearance more accessible, we were inspired by the work of Friedman
et al. (Friedman et al. 2021) that discussed how clothing can modify a robot’s appearance to establish a
robot’s identity. Clothing is affordable, can be created with low-cost supplies, and users can learn to sew
from tools and tutorials (Leake et al. 2023). To achieve this, we introduce a design methodology based on
the four elements of fashion design (FIT 2015), applied to robots. We selected the humanoid robot Quori
(Specian et al. 2021) (shown in Figure 5.10), because it was specifically designed to have a gender-neutral
embodiment. We then designed clothing for the two task contexts planned for the integrative video study:
a medical professional and a hotel receptionist, because prior work has shown that the appearance of a
robot should align with its intended task (Goetz, Kiesler, and Powers 2003).
Design Principles We set the following design principles (DPs) to craft the robot’s appearance for the
clothing design study:
• DP 1: Appearance Modulates Gender We aimed to design clothes that modulate the perceived
gender of the robot. While clothing can be worn by any gender, the social construction of gender
implies that the design of clothing is typically aligned with particular genders (Crane 2012). For
example, fashion designers tend to use curved lines in women’s clothing and sharp lines in men’s
clothing (Palumbo, Ruta, and Bertamini 2015).
• DP2: Quality and Cost are Consistent Associations of different monetary values of clothes on the
robot could lead to different social perceptions of that robot (Kervenoael et al. 2020). The perceived
quality and value of the robot’s clothes should not make the robot appear more or less “premium",
nor should the clothes “cheapen" the robot.
• DP3: Clothing Suggests Function The clothes that a robot wears should resemble the clothes
worn by people in similar occupations/contexts.
Design Choices To adhere to our design principles, we made the following design choices in constructing
the robot’s clothes. We aimed to evaluate perceptions of robot genders in the United States, and thus our
design process reflects these Western and American patterns and gendered interpretations of clothing
Figure 5.11: Appearance modifications of the Quori robot. The first two images represent the feminine and
masculine versions of the robot clothing designed for the hotel receptionist task. The second two represent
the feminine and masculine clothing designed for the medical professional task.
Table 5.1: Clothing design results. We report the marginal means of femininity (µfeminine) and masculinity
(µmasculine) across the different task and clothing types. We also report the number of participants that
selected each clothing type as the most feminine (Nfeminine) or the most masculine (Nmasculine) when
presented with all three clothing options for each task. Values following ± represent standard error.
Task          | Hotel Receptionist                  | Medical Professional
Clothing Type | Feminine    None        Masculine   | Feminine    None        Masculine
µfeminine     | .93 ± .30   .74 ± .28   -.78 ± .23  | .50 ± .26   .86 ± .34   -.81 ± .23
Nfeminine     | 40          51          2           | 50          41          2
µmasculine    | -.89 ± .30  -1.33 ± .24 .70 ± .17   | -1.04 ± .26 -1.79 ± .27 .64 ± .19
Nmasculine    | 4           3           86          | 3           4           86
design. We designed a dress shirt, a lab coat, and a vest, each in a current American masculine and a
current American feminine style, as shown in Figure 5.11.
Silhouette. To align with DP1, we modulated the silhouette of the clothing to suggest different genders
by using quilt batting, inspired by drag queens on RuPaul’s Drag Race who use padding to alter their bodies
to appear more feminine (Darnell and Tabatabai 2017). Quilt batting offers a light-weight and cost-efficient
method to modulate the underlying embodiment of the robot since careful placement can have the effect
of augmenting the robot’s embodiment without impacting function. Batting has the additional property
of being highly compliant, contributing to safety in physical interaction scenarios.
For the masculine silhouette, we added several layers of batting to the shoulders and arms to give the
robot a larger look representative of findings of perceived robot masculinity in prior work (Trovato, Lucho,
and Paredes 2018). We added darts to the feminine style to reduce the width of the shoulders to give the
garment a less “boxy" shape, reflective of typical Western women’s clothes (McCunn 2016).
Lines. We modulated the perceived gender of the robot by changing the style lines of the clothes, with
feminine clothes containing more curved lines and masculine clothes containing more straight and angled
lines. We incorporated two main manipulations: style of accessory and shaping of the garment’s lapels
and collar.
For the accessory, we used a tie for the masculine robot, which established straight and angular lines
in the garment. For the feminine robot, we used a lavallière as the accessory. Because the lavallière droops
due to gravity, the lines it establishes are naturally curved. These accessories were chosen because they
are typically used in formal Western contexts and are not associated with specific eras in Western fashion.
Furthermore, we altered the collar of the dress shirt and lapels of both the lab coats and vests. For the
masculine appearance, the collars and lapels were sewn into a sharp point. For the feminine appearance,
the collar and lapels were instead sewn with a curved line. For the vest, we dropped the opening of the
feminine version to create a more curved style line.
Color and Texture. To follow DP2, we used the same fabrics for both garments of each type, since
fabric contributes to a garment’s perceived price and quality (Swinker and Hines 2006). For the dress shirt,
we chose a grey taffeta fabric to reflect the natural sheen of the unclothed robot surfaces and a gender-neutral color and texture to be worn underneath other clothes. We chose black silk for the neck accessories,
because it is traditionally used for Western neck ties and bows. We used a white cotton blend fabric for the
lab coats, consistent with common Western medical lab coats. The vest was made from a black stretch denim
fabric, giving a stiff appearance and structure. To control for quality, a single person (the first author)
constructed all the garments using the same methods and materials.
Face Design. We used PyLips to make modifications to the robot’s face. Drawing on studies in human facial perception, we created a feminine face by using large eyes, thin eyebrows, and red-tinted lips
(Mogilski and Welling 2018). In contrast, the masculine face had thicker eyebrows and smaller eyes with
grey-tinted lips. The neutral unclothed robot had a face that was perceptually midway between the masculine and feminine faces in terms of feature size and lip color.
Study Description
To evaluate the perceived gender of our clothing design, we performed an online user study via Amazon
Mechanical Turk that followed a mixed design. Each participant was presented with static images of the
robot to rate, similar to the images in Figure 5.11. As in the voice study, these images were presented
without context, i.e., the robot did not have a name or voice, and the intended context was not described
to the participants. The within-subjects factor of this study was the task that the robot was designed for:
the hotel receptionist and medical professional tasks, and the between-subjects factor was the gendered
quality of the clothing: masculine, feminine, or androgynous (the unclothed robot, as in Figure 5.10).
These conditions were randomized and counter-balanced. Participants in the clothing study filled out an
online questionnaire also administered through Amazon Mechanical Turk.
For each stimulus, participants responded to the following seven-point Likert items ranging from
“Strongly Disagree" to “Strongly Agree":
1. This robot seems masculine (masculine)
2. This robot seems feminine (feminine)
3. This robot seems expensive (cost)
4. This robot seems high-quality (quality)
After responding to these questions, participants were asked to respond to the open-response prompt
“What are one to two tasks you think that the robot would be capable of doing?". After these responses
were collected for the first task, participants were presented with all three clothing options for the first
task and asked to choose the most masculine option and the most feminine option. We also collected
open-ended responses to the prompt: “Briefly describe your reasoning for selecting that robot" after each
selection of the most masculine and feminine option for that task. After that, the participant followed the
same process for the other task and then ended the study session. The study was deployed in September
2021. The survey took approximately 5 minutes to complete and participants were compensated with US
$1.25. We used the following inclusion criteria for participants recruited from Amazon Mechanical Turk:
they were located in the United States, had an approval rate of 99% or higher, and had performed at least
1,000 tasks previously.
Participants
We recruited 100 participants for our clothing study, approved under USC IRB #UP-18-00510, through
Amazon Mechanical Turk. We eliminated the data of participants who failed our attention check, which
resulted in 93 valid data points. A chi-square test revealed that this did not affect the underlying distribution
of conditions, χ²(2, n = 93) = 2.77, p = .250. We used the same coding process for gender and ethnicity
as in the voice design study. The final set of participants consisted of 47 men, 45 women, and 1 non-binary
person. The ethnicities of the participants were: Asian (4), Black or African American (5), Hispanic (4),
Native American (2), Multiracial (4), and White (71). Eight participants reported that they identified as
part of the LGBTQ+ community.
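The chi-square check on the per-condition counts can be sketched as follows. The counts below are hypothetical; only the total of 93 and the resulting statistic are reported in the text.

```python
from scipy import stats

# Hypothetical counts of valid participants per clothing condition
# (masculine, feminine, androgynous); only n = 93 overall is reported.
observed = [34, 31, 28]

# Under random assignment, equal counts are expected in each condition.
chi2, p = stats.chisquare(observed)

# A non-significant result (p > .05) indicates that removing failed
# attention checks did not skew the distribution of conditions.
```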
Study Results
We used a mixed-methods approach to evaluate the design principles described in Section 5.4.2.
DP1: Appearance Modulates Gender
To evaluate the perceived gender of the robots, we investigated the effects of visual stimuli on the perceived masculinity and femininity. The correlation between masculinity and femininity was much lower
in this study than in the voice study (Cronbach’s α = .73), thus we analyzed the two items independently. We also considered the two tasks (medical professional and receptionist) independently. We found
a main effect of clothing condition on perceived femininity, F(2, 51.67) = 14.76, p < .001 for the medical
professional task as well as the receptionist task F(2, 58.09) = 13.62, p < .001, using Welch’s ANOVA
due to heteroscedasticity of variances. The effect of the clothing condition on masculinity was also significant for both the medical professional task F(2, 56.15) = 26.97, p < .001 and the receptionist task
F(2, 55.06) = 28.12, p < .001. The marginal means for femininity and masculinity ratings are shown
in Table 5.1. Post-hoc analysis using Tukey’s test revealed that the feminine medical professional design
was perceived as both more feminine, p = .003, and less masculine, p = .001, than the masculine medical professional design. Similarly, the feminine receptionist design was perceived as both more feminine,
p = .001, and less masculine, p = .001, than the masculine receptionist design.
Interestingly, there were no significant differences in femininity or masculinity of the unclothed “neutral" robot compared to the femininely-dressed robots in either the hotel receptionist task, p = .886, or
the medical professional task, p = .603. This was additionally reflected in the choice condition, where
participants were equally likely to choose the unclothed robot or the femininely dressed robot as being the
most feminine. When asked for their reasoning, many participants cited both the silhouette and style lines
of the robot. For example, participants indicated that the unclothed robot was the most feminine because
it was “slimmer" than the other robots, while other participants described the feminine clothes as “...the
least bulky". With respect to lines, participants noted that the unclothed robot “has curves compared to the
other two" whereas other participants mentioned that the feminine clothes were more feminine because
the robot had “...a rounded collar and a flouncy bow". This shows that DP1 was correctly implemented
with respect to the clothing designs, but calls into question whether the goal of an androgynous form was
achieved in the original Quori design.
DP2: Quality and Cost are Consistent
We performed Welch ANOVAs independently for the two different tasks with the clothing conditions as a
between-subjects variable. For the medical professional task, we observed no significant effect of clothing
on perceived quality, F(2, 59.14) = 0.68, p = .370, or perceived cost, F(2, 58.28) = 1.09, p = .344. Similarly,
for the receptionist task, we found no significant effect of clothing on perceived quality, F(2, 56.47) = 0.46,
p = .634, or perceived cost, F(2, 54.56) = 1.01, p = .371. Given the size of this study, it is unlikely that
there is a large difference in the perceived quality and cost, supporting that design principle DP2 was
successfully implemented.
DP3: Clothing Suggests Function
We qualitatively evaluated that the clothes matched the function of the task by performing a deductive
thematic analysis based on the generated categories of robot jobs from previous works (Dennler, Ruan,
et al. 2023; Kalegina et al. 2018). Specifically, we were interested in evaluating the “medical" and “customer
service" categories, to reflect the two tasks we planned for the integrative study. For the medical professional task, we counted the responses that mentioned medical domains. Four of the 28 responses for the
feminine clothing condition and 8 of the 36 responses for the masculine clothing condition mentioned that
the robot could work in the medical field. For the unclothed condition, however, only 1 participant out
of 29 believed that the robot was capable of performing medical tasks. The hotel receptionist task was
characterized by working in a customer service domain; 13 of the 27 responses for the masculine clothing
and 16 of the 27 responses for the feminine clothing described the robot as doing customer service tasks.
Only 8 of the 39 responses for the unclothed condition described the same context. The higher proportion
of responses specific to the actual task indicates that DP3 was achieved.
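The dissertation reports these proportions descriptively. A contingency-table test over the medical-task counts could be sketched as below; note that with expected cell counts this small, an exact test such as Fisher's would be preferable.

```python
from scipy import stats

# Medical-task mentions from the text: feminine 4/28, masculine 8/36,
# unclothed 1/29. Rows are conditions; columns are (mentioned, did not).
table = [[4, 24],
         [8, 28],
         [1, 28]]

chi2, p, dof, expected = stats.chi2_contingency(table)
```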
Building on the findings from the two studies described so far, we next performed an online video-based integrative study to explore how task, voice, and appearance interact to form gender perceptions
and other social perceptions of a robot. To evaluate the impact of the task, we selected tasks of two varying
social roles because previous research and several ethical design frameworks highlight the importance of
power dynamics between the user and the robot (Winkle, McMillan, et al. 2023; Zhu, Wen, and Williams
2024). We selected the hotel receptionist task as a task where the robot has a lower social role than the
user, and we selected the medical professional task as a task where the robot has a higher social role than
Figure 5.12: Sample frames from the two tasks we selected: medical professional (left) and hotel receptionist
(right).
the user. These tasks employ approximately equal numbers of men and women (Campos-Soria,
Marchante-Mera, and Ropero-García 2011; Pelley and Carnes 2020); however, people also hold gendered
stereotypes about these tasks, such as that receptionists are more feminine and medical professionals are
more masculine (Bryant, Borenstein, and Howard 2020). Participants viewed a video of the robot as a
medical professional that provided instructions to a “patient" (an actor), and as a hotel receptionist that
received instructions from a “patron" (an actor).
5.4.3 Hypotheses
From prior research in human and robot gender perception, as well as our two studies described above,
we developed the following hypotheses regarding social perceptions of the robot as related to task and
gender:
• H1: Participants will ascribe different social attributes to the robot depending on the task the robot
performed in the video (medical professional vs. receptionist).
• H2: Participants will ascribe different social attributes to the robot depending on the perceived gender
of the robot that is performing a specific task.
• H3: Participants will ascribe different social attributes to the robot when the robot’s gendered cues are
aligned compared to when the robot’s gendered cues are different.
Additionally, we developed two more hypotheses regarding differences in the perceived gender of the
robot, one regarding the robot’s perceived gender as it relates to its appearance and voice, and the other
regarding the effect of task-aligned clothing on the robot’s perceived gender in the absence of strong gender
cues (Friedman et al. 2021):
• H4: Participants will assign different genders to the robot depending on the voice (H4a), appearance
(H4b), and the interaction of voice and appearance (H4c).
• H5: Participants will assign different genders to the robot depending on the appearance (H5a), task
(H5b), and their interaction (H5c) when the robot has an ambiguously-gendered voice.
5.4.4 Study Description
Participants filled out an online questionnaire deployed on Amazon Mechanical Turk. They began by
reporting their demographic information and then filled out the Negative Attitude toward Robots Scale
(NARS) (Nomura et al. 2006). The participants were then shown a 90-second video (Figure 5.12) of a
robot performing one of the two tasks with a randomly assigned voice and appearance from the validated
options described above. The choice of voice and appearance was randomized and counter-balanced in
a 3x3 matrix. In the medical professional task, the video showed the robot performing routine medical
tests on the actor: asking the actor to show their arm to collect pulse information and make different
faces to test cranial nerve function. In the hotel receptionist task, the robot was directed by the actor to
make changes to a reservation. Each video was approximately 90 seconds long. After the participants
viewed a video, they rated the perceived social role of the robot in the video on a Likert scale from 1 to
10. The participants then rated the robot on the Robotic Social Attributes Scale (RoSAS) (Carpinella et al.
2017). We then asked participants to use a short open-response dialogue box to describe what the robot
was doing, what they would name the robot, and what other tasks they expected the robot to do. The
participants were then shown a 90-second video of the robot performing the other task and answered the
[Figure 5.13 panels: (a) perceived social role of the two tasks (hotel receptionist vs. medical assistant); (b) masculinity ratings by clothing condition (feminine, unclothed, masculine) and voice (feminine, androgynous, masculine) for the medical professional and hotel receptionist tasks; (c) femininity ratings for the same conditions.]
Figure 5.13: Summary of stimuli and results from the integration study. We found that the manipulation of
the social role was significantly different between the two conditions (p < .001), with the expected social
role of the receptionist task being lower than the social role of the medical professional task (a). Participants
saw videos of the robot performing a task with a human. We also found that voice and appearance affected
the perception of masculinity (b) and femininity (c) in both conditions.
same questions. The study was deployed in September 2021. The survey took approximately 10 minutes
to complete and participants were compensated with US $2.50. We used the following inclusion criteria
for participants recruited from Amazon Mechanical Turk: they were located in the United States, had an
approval rate of 99% or higher, and had performed at least 1,000 tasks previously.
5.5 Results
5.5.1 Participants
We recruited 360 participants via Amazon Mechanical Turk. After removing responses that failed the
attention check, we were left with 273 valid responses. A chi-square test revealed that this exclusion did
not significantly alter our random assignment of voice and appearance, χ²(4, n = 273) = 2.01, p = .733.
The final study population consisted of 149 men, 118 women, and 3 non-binary people. The ethnicities of the
participants were: Asian (20), Black or African American (16), Hispanic (8), Multiracial (7), and White
(217). In addition, 38 participants reported that they identified as part of the LGBTQ+ community.
We observed acceptable internal consistencies for all constructs measured by the NARS (Future Influence α = .87, Relational Attitudes α = .80, Actual Interaction α = .78) and RoSAS (Warmth α = .79, Competence α = .82, Discomfort α = .78) scales.
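Cronbach's α for a set of scale items reduces to a short computation over the item-score matrix; a sketch with hypothetical responses:

```python
import numpy as np

def cronbach_alpha(item_scores):
    """Cronbach's alpha: rows are respondents, columns are scale items."""
    X = np.asarray(item_scores, dtype=float)
    k = X.shape[1]                              # number of items
    item_var_sum = X.var(axis=0, ddof=1).sum()  # sum of per-item variances
    total_var = X.sum(axis=1).var(ddof=1)       # variance of summed scores
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Hypothetical responses to a three-item subscale from four respondents.
alpha = cronbach_alpha([[4, 5, 4], [2, 2, 3], [5, 5, 5], [3, 4, 3]])
```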
5.5.2 Manipulation Check
To evaluate that our selection of tasks was effective, we performed a manipulation check of the reported
social roles of the participants for each task. We applied a non-parametric Wilcoxon Signed-Rank test and
found that the medical professional task had a significantly higher social role than the hotel receptionist
task, W=2282.5, p < .001, as shown in Figure 5.13a. This indicates that our manipulation of the tasks’
social role was successful.
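The paired comparison of social-role ratings can be sketched with scipy's signed-rank test; the ratings below are hypothetical.

```python
from scipy import stats

# Hypothetical paired social-role ratings (1-10) from the same ten raters.
medical = [9, 8, 10, 7, 9, 8, 9, 10, 8, 9]
reception = [4, 5, 3, 6, 4, 5, 4, 3, 5, 4]

# Non-parametric Wilcoxon signed-rank test on the paired differences.
w_stat, p = stats.wilcoxon(medical, reception)
```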
We evaluated our hypotheses related to both social perception of the robot and the perceived gender
of the robot.
5.5.3 Social Perception
H1: Tasks Affect Social Perception: Our first hypothesis postulated that different tasks affect the social
perception of the robot. We evaluated this hypothesis using a non-parametric Wilcoxon signed-rank test
between task conditions. We found that the medical professional was rated as significantly less warm,
W=7250, p < .001, less competent, W=3701, p < .001, and more discomforting, W=5949.5, p < .001 than
the hotel receptionist robot. Essentially, the robot with the higher social role was rated as less socially
favorable. Therefore, H1 is supported.
H2: Gender Perception Affects Social Perception: The second hypothesis postulated differences in
the RoSAS constructs of Warmth, Competence, and Discomfort. Naturally, these social evaluations are
also dependent on participants’ overall preconceptions of robots, as measured by the NARS constructs
of Future Influence, Relational Attitudes, and Actual Interaction. Thus we analyzed the effect of robot
gender on Warmth, Competence, and Discomfort using a two-way ANCOVA for each task with voice and
appearance as between-subject variables and the subscales of Future Influence, Relational Attitudes, and
Actual Interaction as covariates.
In the medical professional task, there were no main or interaction effects of voice and appearance
on the RoSAS constructs of Warmth, Competence, or Discomfort. The NARS subscale Future Influence
was significant for all three RoSAS constructs: Warmth (F(1, 261) = 26.55, p < .001), Competence
(F(1, 261) = 11.32, p = .001), and Discomfort (F(1,261)=16.11, p < .001). Additionally, the Actual Interaction subscale of NARS was a significant predictor of all three RoSAS constructs: Warmth (F(1, 261) =
88.92, p < .001), Competence (F(1, 261) = 10.61, p = .001), and Discomfort (F(1, 261) = 7.25,
p = .008). The Relational Attitudes were not significant for any RoSAS constructs.
Within the receptionist task, we found no significant main or interaction effects of voice and appearance on Warmth, Competence, or Discomfort. We did find, however, that the covariates were significant
predictors of the RoSAS constructs. Future Influence was a significant predictor of Warmth (F(1, 261) =
16.88, p < .001), Competence (F(1, 261) = 28.97, p < .001), and Discomfort (F(1,261)=48.06, p < .001).
Additionally, the Actual Interaction subscale of NARS was a significant predictor of Warmth (F(1, 261) =
72.663, p < .001) and Competence (F(1, 261) = 6.61, p = .011), but not Discomfort. As in the medical
professional task, Relational Attitudes was not a significant predictor for any RoSAS construct. Therefore,
H2 was not supported.
H3: Alignment of Cues Affects Social Perception: Based on previous work that found that unaligned
aesthetic cues lead to ambiguity (Paetzel et al. 2016), we examined how the alignment of gendered cues
may affect user social perceptions. We considered masculine voice with masculine appearance and feminine voice with feminine appearance as aligned cues, and masculine voice with feminine appearance and
feminine voice with masculine appearance as unaligned, as in previous work (Mitchell et al. 2011), noting
that while this reflects normative views of gender, such normative views are instrumental in understanding
how stereotypes may affect the construction of gender in robots. We did not consider androgynous cues
as they are aligned with both masculine and feminine cues. We analyzed the effect of robot gender on the
dependent variables Warmth, Competence, and Discomfort using a one-way ANCOVA for each task. The
between-subjects variable was the alignment of gendered cues and the covariate was the NARS subscales.
In the medical professional task, there were no main or interaction effects of alignment on the RoSAS
constructs of Warmth, Competence, or Discomfort. The NARS subscale Future Influence was significant
for all three RoSAS constructs: Warmth (F(1, 107) = 8.65, p = .004), Competence (F(1, 107) = 7.60,
p = .007), and Discomfort (F(1, 107) = 11.23, p = .001). Additionally, the Actual Interaction subscale of
NARS was significant for only Warmth (F(1, 261) = 88.92, p < .001). The Relational Attitudes were not
significant for any RoSAS constructs.
In the receptionist task, there were again no main or interaction effects of gender cue alignment
on the RoSAS constructs of Warmth, Competence, or Discomfort. The NARS subscale Future Influence
was significant for all three RoSAS constructs: Warmth (F(1, 122) = 14.11, p < .001), Competence
(F(1, 122) = 8.70, p = .004), and Discomfort (F(1, 122) = 11.25, p = .001). Additionally, the Actual
Interaction subscale of NARS was significant for only Warmth (F(1, 122) = 32.13, p < .001). The Relational Attitudes
were not significant for any RoSAS constructs. Therefore, H3 was not supported.
(a) Femininity ratings by robot clothing for the
androgynous voice robot.
(b) Masculinity ratings by robot clothing for
the androgynous voice robot.
Figure 5.14: Summary of the effect of clothing the robot in the two tasks, for robots with androgynous
voices. We found that clothing had a significant effect on gendering the robot, depending on the task. In
tasks with higher social roles, clothing makes the robot more masculine and less feminine, and in the lower
social role task, clothing makes the robot less masculine.
5.5.4 Gendering Robots
H4: Voice and Appearance Affect Perceived Gender: Our fourth hypothesis posited that voice and
appearance affect the robot’s perceived gender. We analyzed this independently for each task with a two-way ANOVA. We performed this separately for both ratings of femininity and masculinity. However,
inter-item reliability for these two items was low (Cronbach’s α = .53), indicating that these scales are not
likely to measure the same construct.
For the medical professional task, we observed a significant main effect of voice on femininity F(2, 264) =
43.71, p < .001, with a large effect size (η²p = .249). We also observed a main effect of appearance,
F(2, 264) = 5.61, p = .004; however, the interaction was not significant. For masculinity, there was a
main effect of voice, F(2, 264) = 21.60, p < .001, with a modest effect size (η²p = .141). The main effect of
appearance and the interaction of appearance and voice were not significant for ratings of masculinity in
the medical professional condition.
In the hotel receptionist task, we also observed a significant main effect of voice on femininity, F(2, 264) =
71.85, p < .001, with a very large effect size (η²p = .352); however, the main effect of appearance and the
interaction effect of voice and appearance were not significant. For the perception of masculinity, voice
had a significant effect, F(2, 264) = 45.34, p < .001, again with a large effect size (η²p = .256). The
main effect of appearance was not significant, but the interaction of voice and appearance was significant,
F(4, 264) = 3.02, p = .018. These results support H4a, do not support H4b, and partially support H4c.
H5: Identity Established through Clothing Affects Perceived Gender for Robots with Androgynous Voices: We examined how reinforcing the task context with clothing as opposed to a generic robot
body affects gender perceptions of the robots. Due to the larger effect of voice in the process of gendering
robots that we saw in our previous studies, we investigated how clothing affects the perceived gender
of robots with androgynous voices. To accomplish this goal, we performed a two-way ANOVA with the
between-subjects factors being the role of the task and whether or not the robot was wearing clothes, and with
masculinity and femininity ratings as the dependent variables.
We found that clothing did affect the perceived gender of the robot. Both the social role of the task
(F(1, 173) = 8.49, p = .004) and whether or not the robot was wearing clothes (F(1, 173) = 7.67,
p = .006) affected ratings of femininity with a small to medium effect (η²p = .047 for role, and η²p = .042
for clothing). The interaction of role and clothing was significant for ratings of masculinity with a small to
medium effect size (F(1, 173) = 8.48, p = .004, η²p = .047). Therefore, H5a, H5b, and H5c are supported.
5.6 Summary
This chapter presented voice and clothing design as a methodology to modify user expectations of a robot.
Voice and clothing design allows users to customize the embodiment and expectations of the robot to
better reflect its intended use. We presented a tool for designing screen-based faces that can be used
in social interactions. We found that these design processes and tools can be used to reliably change a
user’s social expectation of the same robot embodiment. We believe that these techniques can expand the
applicability of robots to a variety of occupations and environments. This chapter concludes the section
of the dissertation that focuses on robot embodiment. The next chapter begins the next section of the
dissertation that focuses on adapting physical interactions over time.
Part II: Adapting Physical Interactions
For users with limited mobility, robots have the potential to serve as an interface to the physical world. How
users interact with and benefit from these robots is highly personal. For example, some post-stroke users may
want a robot to challenge them during an exercise by providing resistance, whereas other post-stroke users
may want the robot to provide more assistance to reach their goal. In Chapters 6 and 7, we explore how robots
can adapt their physical interactions with users to achieve physical goals.
Chapter 6
Personalizing Movement Models in Post-Stroke Participants
The focus of this chapter is to develop an assistive robotic system that can promote stroke exercise at
home and facilitate communication between post-stroke users and their neurorehabilitation specialists. Robots can collect objective metrics of users’ interactions with the system, and can compile
these data into useful visualizations for neurorehabilitation specialists. This allows post-stroke participants and neurorehabilitation specialists to reduce the time they spend performing physical assessments,
and can facilitate data-driven decision-making around rehabilitation plans. This chapter is summarized in
Section 6.6. Return to the Table of Contents to navigate other chapters.
This chapter is adapted from the paper “A metric for characterizing the arm nonuse workspace in
poststroke individuals using a robot arm" (Dennler, Cain, et al. 2023), written in collaboration with Amelia
Cain, Erica De Guzman, Claudia Chiu, Carolee J. Winstein, Stefanos Nikolaidis, and Maja Matarić.
6.1 Motivation
Stroke is a leading cause of serious long-term disability in the United States (Tsao et al. 2022). Without
sufficient rehabilitation efforts, functional decline will ensue, leading to increased difficulty in completing
activities of daily living (ADLs), which contributes to decreased quality of life (Mayo et al. 2002; Winstein et
al. 2019). The goal of post-stroke neurorehabilitation is to restore functionality to the affected limb and enable stroke survivors to improve their quality of life. Several post-stroke rehabilitative interventions, such
as task-oriented training (Rensink et al. 2009), biofeedback (Stanton et al. 2017), and constraint-induced
movement therapy (Wolf, Winstein, et al. 2006), have demonstrated substantial improvements along levels of the International Classification of Functioning, Disability and Health (World Health Organization 2001), including
domains of body structure/function, activity limitations, and participation.
Despite these functional improvements, a subset of stroke survivors may still experience a discrepancy
between what they are able to do in tests where they are constrained to using their stroke-affected arm
and what they spontaneously do in real world ADLs. This is of particular concern for individuals with
hemiparetic stroke and other unilateral motor disorders, because the less-affected side can be used to
compensate for movements of the impaired side and such compensation interferes with the “use it or
lose it” foundational principle of neurorehabilitation. The nonuse phenomenon, the discrepancy between
capacity and actual use (Taub, Crago, and Uswatte 1998), was first characterized in an article titled “Stroke
recovery: he can but does he?” (Andrews and Stewart 1979). Nonuse has been shown to have a learned
component (Buxbaum et al. 2020), and can thus be reduced through practice. This makes nonuse a key
behavioral phenomenon to assess when evaluating patient recovery, one with high clinical and scientific
significance.
In neurological rehabilitation contexts, outcome metrics must meet three criteria to be considered
useful for evaluation: validity, reliability, and ease of use (Wade 1992). However, the two currently widely accepted instruments that provide metrics for nonuse–the Motor Activity Log (MAL) (Uswatte et al. 2006)
Figure 6.1: Example reaching trial with the BARTR apparatus. The participant places hands on the home
position device. The socially assistive robot (SAR, on the left) describes the mechanics of the BARTR, and
the robot arm (on the right) moves the button to different target locations in front of the participant.
A reaching trial begins when the button lights up, and the SAR cues the participant to move.
and the Actual Amount of Use Test (AAUT) (Sterr, Freivogel, and Schmalohr 2002)–do not satisfy all three
of those criteria. Although both tests have been found to be valid (Chen, Wolf, et al. 2012), they lack
the other two desired qualities of neurorehabilitation assessment metrics: reliability and ease of use. The
MAL relies on a structured interview for user-reported arm use over the course of a specified duration;
for example, one week or three days. Due to the difficulty associated with remembering and accurately
describing one’s arm use over the period of a week, this test is not simple for the participants to complete.
The AAUT is a covert assessment that is valid only if the participant is unaware that the test is being
conducted. Once the test is revealed, it becomes invalid for repeated use, making the scale unreliable.
Inspired by the current state of the field, this work introduces a metric for nonuse that meets all three
criteria.
Previous work demonstrated that the Bilateral Arm Reaching Test (BART) can be used to reliably
quantify nonuse (Han et al. 2013). BART randomly lights up one of 100 equally-spaced points between
10cm and 30cm in front of the user, and the user is required to reach to the lit-up point within a time
limit. In the first condition, the user is instructed to choose either hand to reach the point as quickly and
as accurately as possible. Due to the imposed time limit, the user must make a fast and spontaneous hand
choice, even if they know they are being tested. In the second condition, the user is constrained to only
use their stroke-affected arm to reach for the point. The spontaneous performance in the first condition
is compared to the functional performance in the second condition to assess the level of nonuse. This
approach has been shown to be both reliable and valid; however, it only assesses patients on a single plane
of motion. Reaching tasks required to accomplish ADLs involve three dimensional movements. In this
study, we introduce a robot arm that enables a reaching task to quantify arm nonuse in three dimensions,
allowing clinicians to tailor the rehabilitation process to specific patterns of nonuse as they occur in the
user’s real-world environment.
We describe the modified Bimanual Arm Reaching Test with a Robot (BARTR), depicted in Figure
6.1. The testing apparatus consisted of a general-purpose robotic arm that queried points in front of the
user, and a socially assistive robot (SAR) that supported the testing procedure by providing instruction and
motivation. In a session of BARTR, the user completed two phases: a spontaneous phase and a constrained
phase. Each phase can be completed in approximately 20 minutes. We used identical instructions to the
original validated BART (Han et al. 2013). In the spontaneous phase, the user was instructed to use the hand
that could reach the button as quickly and accurately as possible. These instructions ensured that participants
acted spontaneously while being aware that they were being tested. In the constrained phase, the user
reached for the button with their stroke-affected hand. The nonuse metric, nuBARTR, was quantified
from the reaching data collected from each session and repeated sessions that occurred at least four days
apart, as in previous work (Han et al. 2013).
To validate nuBARTR as a useful clinical metric, we developed the three following hypotheses based
on the criteria for useful metrics in neurorehabilitation:
• H1: nuBARTR is a valid metric, showing high correlation with the established metric for assessing
nonuse, the AOU subscale of the AAUT.
• H2: nuBARTR is a reliable metric, possessing high test-retest reliability as evidenced by high absolute
agreement across repeated sessions taken at least four days apart.
• H3: nuBARTR is a simple-to-use metric, achieving a score of 72.6 out of 100 or greater on the System
Usability Scale, indicating above-average user experience (Lewis 2018).
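The System Usability Scale score referenced in H3 is computed from ten Likert responses (1–5): odd-numbered (positively worded) items contribute (response − 1), even-numbered items contribute (5 − response), and the sum is scaled by 2.5 to a 0–100 range. A sketch:

```python
def sus_score(responses):
    """System Usability Scale score from ten Likert responses (1-5)."""
    assert len(responses) == 10
    total = 0
    for i, r in enumerate(responses):
        # Items 1, 3, 5, 7, 9 (even index) are positively worded.
        total += (r - 1) if i % 2 == 0 else (5 - r)
    return total * 2.5

# All-neutral responses yield the scale midpoint of 50.
midpoint = sus_score([3] * 10)
```

Scores above 72.6 then correspond to the above-average usability threshold from Lewis (2018) used in H3.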
We found that nuBARTR satisfies these three criteria for a useful neurorehabilitation metric: it had high
validity, high test-retest reliability, and study participants found it easy to use. The system can be used to
aid clinicians in the quantification and tracking of stroke survivor arm nonuse.
6.2 Inspiration: Clinical Standards for Measuring Nonuse
6.2.1 Actual Amount of Use Test
The Actual Amount of Use Test (AAUT) (Taub, Crago, and Uswatte 1998) is a clinical standard to quantify
arm nonuse in post-stroke participants. The AAUT is a covert assessment of spontaneous arm use for
14 tasks that regularly occur in daily life, such as pulling out a chair from a table prior to sitting in it and
flipping through the pages of a book. First, the tasks were completed covertly (spontaneous AAUTs), so the
participant did not know that they were being video recorded and tested. Then, the experimenter revealed
that arm use was being observed, and participants completed the 14 tasks again while being encouraged
to use their stroke-affected arm as much as possible (constrained AAUTc).
From recorded videos of the participant, a physical therapist rates both the AAUT-Amount (binary
yes/no), which measures if the participant attempted to use their stroke-affected arm (AAUT AOU score)
for that task, and the AAUT-Quality of Movement or QOM (on an ordinal scale of 0 to 5), which measures
how much they used the paretic arm in the task. The benefit of the AAUT is that it is known to be a
valid measure of the participant’s arm nonuse. However, it lacks test-retest reliability because it relies on
participants being unaware that a test is being conducted. After the first time the test is conducted, the
user is primed to know that they are being tested the second time. This requires long wash-out periods
between successive tests, motivating the need for a repeatable measure of arm nonuse for more frequent
longitudinal assessment. The Bimanual Arm Reaching Test with a Robot that we developed aims to create
an interaction that retains validity when the user is aware they are being tested by calculating a metric
from choice in arm use.
6.2.2 Motor Activity Log
A second clinical technique for assessing arm nonuse is the motor activity log (MAL) (Taub, McCulloch,
et al. 2011). This assessment also measures arm use on two dimensions: the Amount Scale (AS) and the
How Well Scale (HW). These scales are analogous to the AAUT's AOU and QOM. The test is conducted as
a semi-structured interview, in which a trained therapist asks a participant to report their AS and HW scores
for activities performed over a particular timeframe, e.g., the past week. These activities come from a set
of 30 activities of daily living, such as eating finger foods, buttoning a shirt, or combing their hair. For each
task, the participant reflects on how much they used their more affected side on a scale of 0–5, and how
helpful their more affected arm was in performing that task on a scale of 0–5. The therapist explains the
criteria for each rating value to the participant, and is responsible for verifying with the participant
that they believe their rating to be accurate.
A benefit of the MAL is that it can be used to assess participants repeatedly. However, the test relies on
a post-stroke participant's memory of their arm use rather than on observed arm use. This may introduce
bias if the participant cannot remember their arm use over the given time frame, or if they want to
signal to the therapist that their mobility and arm use are improving. An improved test can both rely
on observed information about arm use and reduce the required training to use the rating scales to more
accurately assess a user's arm nonuse. The Bimanual Arm Reaching Test with a Robot aims to produce
its metric from observed arm use, without relying on subjective ratings of arm behavior.
6.2.3 Bimanual Arm Reaching Test
The Bimanual Arm Reaching Test (BART) developed by Han et al. 2013 aimed to address these problems
using a specialized testing apparatus. This apparatus consisted of an array of LED lights that lit up under
a surface and two magnetic sensors that the participant wore. The BART consisted of two phases: a
spontaneous phase and a constrained phase, inspired by the AAUT. In the spontaneous phase, a light on
the board illuminated, and the participant attempted to reach to it as quickly and as accurately as
possible with whichever hand they preferred. Each illumination and the participant's corresponding reach
constituted one trial. The participant conducted 100 trials in the spontaneous phase. In
the constrained phase, the participant was instructed to only use their more affected side for all reaches.
The participant again performed 100 trials in the constrained phase. This test then calculated a metric that
was found to be correlated with the AAUT nonuse metric.
The benefit of this test is that it relied on objective use data, and did not require extensive training
to use. However, the apparatus itself may be difficult to construct, and it requires users to wear
magnetic sensors, which may affect reaching. The BART interaction also takes place on a single plane,
whereas a user moves in all three spatial dimensions during activities of daily living. The Bimanual Arm
Reaching Test with a Robot aims to extend this test to three dimensions and to remove the requirement of
worn sensors. The BARTR also has the benefit of using a general-purpose assistive robot, meaning that it
can be readily used for other assistive tasks with the user.
Figure 6.2: Visualization of the participant's workspace. Viewed from above, the tested workspace extends
radially from the home position at distances of 10 cm to 30 cm (A). Viewed from the side, the workspace
extends upward 40 cm (B).
6.3 Technical Approach: Modeling User Behavioral Metrics
Calculation of the BARTR Metric
We used the data collected through the BARTR interaction to estimate each user's workspace. Following
previous work, nonuse was modeled as the difference of two components: the constrained component
and the spontaneous component (Han et al. 2013). The constrained component of the workspace $W$ is
defined for every point $x \in W$ for a particular participant $p$ as:

\[ \text{cBARTR}_p(x) = p_p(\text{success} \mid X = x, S = s_p) \tag{6.1} \]

where $p_p(\cdot)$ denotes the probability that the post-stroke participant $p$ successfully reaches the point $x$
with the arm on their stroke-affected side, denoted $s_p \in \{\text{left}, \text{right}\}$, within the time limit of
3.1 seconds (based on the times from the neurotypical group). This quantity represents the total area of the
workspace that the participant is expected to be able to reach within the time limit.
The spontaneous component of the workspace is defined over all points $x \in W$ as

\[ \text{sBARTR}_p(x) = p_p(S = s_p \mid X = x) \cdot p_n(S = s_p \mid X = x) \cdot \mathbb{E}\!\left[t_n^{s_p}(x) - t_p^{s_p}(x) \mid X = x\right] \tag{6.2} \]

where $p_p(\cdot)$ denotes the probability of the post-stroke participant selecting either side in the spontaneous
condition and $p_n(\cdot)$ denotes the corresponding probability for the neurotypical group. $t_p^{s_p}(x)$ and
$t_n^{s_p}(x)$ represent the movement times for the post-stroke and neurotypical participants, respectively, to
reach the point $x$ in the workspace with the arm on the participant's more affected side, $s_p$. This quantity
represents how close the participant's spontaneous arm use is to spontaneous neurotypical use: higher usage
of the participant's more-affected arm and faster movements result in higher spontaneous scores.
The final nonuse score is calculated as the difference of these functions summed over the points in the
workspace:

\[ \text{nuBARTR} = \sum_{x \in W} \left[ \text{cBARTR}_p(x) - \text{sBARTR}_p(x) \right] \tag{6.3} \]

To obtain these values, we modeled the interaction metrics (time to reach each point and arm choice) as
Gaussian processes for the normative participants and for each post-stroke participant. We summed over 10,000
samples drawn from a uniform distribution over the workspace to accurately estimate the difference of these two
functions.
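The Monte Carlo step above can be sketched in a few lines. This is an illustrative sketch, not the dissertation's implementation: it assumes the fitted Gaussian process models are already available as plain callables (`cbartr`, `sbartr`), along with a uniform workspace sampler (`sample_point`), all hypothetical names.

```python
import random

def estimate_nonuse(cbartr, sbartr, sample_point, n_samples=10_000, seed=0):
    """Monte Carlo estimate of the nuBARTR score (Eq. 6.3).

    cbartr and sbartr map a workspace point to the constrained and
    spontaneous components (e.g., posterior means of fitted Gaussian
    processes); sample_point(rng) draws a uniform point from the workspace W.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = sample_point(rng)
        total += cbartr(x) - sbartr(x)
    # Average the sampled differences; the sum in Eq. 6.3 equals this
    # average up to the constant factor n_samples.
    return total / n_samples
```

With constant components (e.g., cbartr ≡ 1 and sbartr ≡ 0.25) the estimate is exactly their difference, which gives a quick sanity check of the routine.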
6.4 Data Collection
6.4.1 Technical System Description
The BARTR apparatus, designed to test arm nonuse, consists of a robot arm and a socially assistive robot
(SAR). The robot arm was the Kinova JACO2 assistive arm (Kinova n.d.), selected because it has already
been used in assistive domains, is lightweight, and safely interacts with and around people. The arm
has the same affordances as end-effector robots typically used for other rehabilitative interactions that
have been shown to be effective in the rehabilitation context (Lee, Park, Cho, et al. 2020). The SAR was
the Lux AI QTRobot (QTrobot: Humanoid social robot for research and teaching 2020) that consists of a
screen face on a 2 degree-of-freedom head and two 3 degree-of-freedom arms that can gesture. This SAR
platform has already been validated in our past work with children with arm weakness due to cerebral
palsy (Dennler, Yunis, et al. 2021), as well as in other human-robot interaction contexts (Spitale et al.
2022). The SAR provided the participant with verbal instructions at the start of the BARTR session, and
with positive feedback on a random schedule, similarly to previous SAR use in other rehabilitation contexts
(Swift-Spong et al. 2015; Dennler, Yunis, et al. 2021; Feingold-Polak, Barzel, and Levy-Tzedek 2021).
In addition to the two robots, we developed two low-cost devices for the BARTR apparatus: the target object and the home position. Both devices are 3D-printed, have self-contained power supplies and
processors, and communicate wirelessly with the BARTR apparatus using low-level UDP protocols.
The target device, held by the robot arm, consisted of a 3D-printed housing with a single button. It
received commands to turn on a light and start a timer to begin each reaching trial, and logged the time
taken by the participant to reach for and press the button to turn off the light.
The home position was the location that participants returned to between reaching trials, implemented
as a 3D-printed block with two shallow holes, 2 cm in diameter and 5 cm apart, with capacitive touch sensors
inside. Participants placed their left pointer finger in the left hole, and their right pointer finger in the right
hole. The device communicated at 20 Hz, reporting the locations that were being actively touched by the
participant.
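For illustration, the status packets such a device emits can be as simple as a JSON datagram. The field names, the JSON encoding, and the address below are assumptions for this sketch, not the device's actual wire format:

```python
import json
import socket

DEVICE_ADDR = ("127.0.0.1", 9000)  # illustrative host/port, not the real values

def encode_home_status(left_touched, right_touched, timestamp_ms):
    # One status packet, sent at 20 Hz: which home-position holes are touched.
    return json.dumps({"t": timestamp_ms,
                       "left": bool(left_touched),
                       "right": bool(right_touched)}).encode()

def decode_home_status(packet):
    msg = json.loads(packet.decode())
    return msg["left"], msg["right"], msg["t"]

def send_home_status(sock, left_touched, right_touched, timestamp_ms):
    # Fire-and-forget UDP datagram, matching the low-level protocol described.
    sock.sendto(encode_home_status(left_touched, right_touched, timestamp_ms),
                DEVICE_ADDR)
```

A UDP datagram per reading keeps the device logic stateless; a dropped packet simply means one missed 50 ms sample rather than a stalled connection.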
6.4.2 Bimanual Arm Reaching Test with a Robot
Participants were seated at a table with the home position aligned with the center of their chest, as shown
in Figure 6.2. They were instructed to maintain approximately 90-degree elbow angles when
their index fingers were resting at the home position. To limit upper-body compensatory movement,
participants wore a shoulder harness attached to the chair (Cai et al. 2019). Participants were instructed to
verbally cue the experimenter when they were ready to begin each section of BARTR. Following previous
work, the two experiment phases were the spontaneous BARTR phase (sBARTR), where the participants
were instructed to use either arm to reach the target, and the constrained BARTR phase (cBARTR), where
the participants were instructed to use their more-affected arm to reach the target (Han et al. 2013).
For both phases of BARTR, the robot arm placed the reaching target at a different location in 3D space
in front of the participant. The participant was instructed to reach the target as quickly and accurately
as possible when prompted by the SAR. Each reaching trial began with the robot arm moving to one of
the randomly sampled locations. When the robot arm arrived at the location and the participant was in
the home position, the light on the target device turned on and the SAR cued the participant to reach to
the target after a random interval between 0 and 2 seconds, to prevent the participant from anticipating
movement to the target. After the audiovisual cue, the participant was given 3.1 seconds to reach to the
target. When the participant pressed the button, the light turned off. If the participant did not reach the
target in 3.1 seconds, the light turned off after the 3.1 seconds had elapsed. This time period was selected
to make the maximum time of each experiment phase approximately 20 minutes in duration, given the
variability in travel time between points for the robot arm. This period was sufficient for all neurotypical
participants to reach all of the target placements.
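The trial timing described above (a random 0–2 s pre-cue delay, then a 3.1 s reaching window) can be sketched as follows. This is a simplified simulation of the logic, not the deployed controller; the function and constant names are illustrative.

```python
import random

CUE_DELAY_RANGE = (0.0, 2.0)  # random pre-cue interval, in seconds
REACH_TIMEOUT = 3.1           # seconds allowed to press the target button

def run_trial(press_time_after_cue, rng=None):
    """Timing logic for one reaching trial.

    press_time_after_cue: seconds from the cue to the button press, or None
    if the participant never presses. Returns (success, light_on_seconds,
    cue_delay) for logging.
    """
    rng = rng or random.Random()
    cue_delay = rng.uniform(*CUE_DELAY_RANGE)  # prevents anticipating the cue
    if press_time_after_cue is not None and press_time_after_cue <= REACH_TIMEOUT:
        return True, press_time_after_cue, cue_delay   # press turns light off
    return False, REACH_TIMEOUT, cue_delay             # light off at timeout
```

Randomizing the cue delay on every trial is what keeps reaction times meaningful: without it, participants would begin moving as soon as the robot arm settles.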
Table 6.1: Demographic Information of the Neurotypical Group
Median Minimum Maximum
Age (years) 69.5 45 82
Gender 5 Men, 5 Women
Ethnicity 2 Asian, 2 Black, 6 White
In total, 100 locations were tested for each of sBARTR and cBARTR. The locations were evenly spaced
in the 3D workspace volume in front of the participant defined by the region that was 10cm to 30cm from
the center of the home position, forming a semi-circle that extended in front of the participant in their
transverse plane, and heights that ranged from 0cm to 40cm above the table, as shown in Figure 6.2. These
points were selected randomly without replacement; that is, each point was selected exactly once, and
participants reached for all 100 targets once per session. Participants attempted up to 100 reaching
trials for each section of BARTR, for a total of 200 reaches.
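One way to realize this placement scheme is to lay an even grid over the semicircular annulus and shuffle it. The 5 × 5 × 4 grid resolution below is an illustrative assumption chosen to yield the 100 locations per phase; the dissertation does not specify the exact grid.

```python
import math
import random

def generate_targets(n_r=5, n_theta=5, n_z=4, seed=0):
    """Evenly spaced reaching targets in the workspace of Figure 6.2.

    Radii span 10-30 cm from the home position, angles span the frontal
    semicircle, and heights span 0-40 cm above the table. Returns the
    points (x, y, z) in a random order with no repeats, i.e., sampled
    without replacement.
    """
    points = []
    for i in range(n_r):
        r = 10.0 + 20.0 * i / (n_r - 1)
        for j in range(n_theta):
            theta = math.pi * j / (n_theta - 1)  # 0..pi covers the semicircle
            for k in range(n_z):
                z = 40.0 * k / (n_z - 1)
                points.append((r * math.cos(theta), r * math.sin(theta), z))
    random.Random(seed).shuffle(points)  # fixed seed gives a repeatable order
    return points
```

Shuffling a fixed grid, rather than drawing fresh random points, is what guarantees every participant covers the same workspace in every session.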
6.5 Results
6.5.1 Neurotypical Participants
We recruited 10 neurotypical adults to establish a normative value for performance. All neurotypical
participants were right-hand dominant, and their demographic information is summarized in Table 6.1.
The average age of neurotypical participants was 67 ± 10 years.
6.5.2 Post-Stroke Participants
Participants with chronic stroke were recruited from the Los Angeles, California, USA area to take part
in this study. Participants were recruited through the IRB-approved Registry for Aging and Rehabilitation Evaluation database of the Motor Behavior and Neurorehabilitation Laboratory at the University of
Southern California (USC). All participants were right-hand dominant prior to their stroke. In total, 17
post-stroke participants were recruited. Two participants did not meet the study criteria after screening
Table 6.2: Demographic Information of the Post-Stroke Group
Median Minimum Maximum
FM-UE Motor Score (66 maximum) 59.5 42 64
AAUT AOU Score (1 maximum) .29 .00 .85
Age (years) 55 32 85
Time between sessions (days) 6.5 4 19
Gender 8 Men, 6 Women
Affected Side 5 Left, 9 Right
Ethnicity 4 Asian, 2 Black, 4 Hispanic,
3 White, 1 Mixed-race
Abbreviations: FM-UE, Fugl-Meyer Upper Extremity; AAUT AOU, Actual
Amount of Use Test – Amount of Use
and one participant was excluded from analysis due to difficulties in completing the task. Of the fourteen
eligible participants, twelve completed all three sessions of the BARTR, and two were only able to complete
two sessions due to scheduling constraints. One eligible participant did not receive AAUT scores due to
technical problems in recording the exam. The average age of post-stroke participants was 57 ± 11 years.
Age and other participant demographic information is summarized in Table 6.2.
6.5.3 Testing Validity, Reliability, and Simplicity
To evaluate BARTR as a metric for neurorehabilitation, we evaluated the three criteria of effective metrics:
validity, test-retest reliability, and ease of use.
Validity We evaluated the validity of BARTR by comparing the quantification of nonuse produced by
the system with the values of nonuse collected from post-stroke participants using the AAUT, the clinical
standard for assessing nonuse. Participants had a wide range of nonuse, with AAUT AOU values ranging
from .00 to .85 and nuBARTR scores ranging from .849 to 1.71. We determined the validity of nuBARTR
with the non-parametric Spearman correlation between AAUT AOU and the averaged value of nuBARTR
across the three sessions. Figure 6.3 shows that the calculated nonuse from BARTR is correlated with the
clinical AAUT AOU metric of nonuse (r(13)=.693, p = .016).
Figure 6.3: Evaluations of the proposed metric. We demonstrate the Bimanual Arm Reaching Test with
a Robot (BARTR) metric's validity through its correlation with clinical measurements of nonuse via
a non-parametric Spearman correlation, r(13) = .693, p = .016 (A). We demonstrate reliability through the
absolute agreement of BARTR scores across three sessions, measured by the intraclass correlation coefficient,
ICC(1, k) = .908, p < .001 (B). We demonstrate its ease of use through usability ratings of the system,
showing that the average rating is above 72.6 via a non-parametric Wilcoxon signed-rank test, Z =
16.0, p = .040 (C).
We also examined the correlations of the BARTR components with the individual subscales of the AAUT
using the non-parametric Spearman correlation. The cBARTR shows a high correlation with the cAAUT
(r(13)=.773, p = .002) and the sBARTR shows a correlation with the sAAUT (r(13)=.769, p = .002).
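For reference, Spearman's rho is simply the Pearson correlation computed on ranks. A minimal pure-Python version (average ranks for ties, no p-value) looks like this; in practice a statistics library would be used, so this sketch is for exposition only:

```python
def _ranks(values):
    # 1-based average ranks, with tied values sharing their mean rank.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

Because it operates on ranks, the statistic only assumes a monotone relationship between the two nonuse measures, which is why it is appropriate for comparing an ordinal clinical score against the continuous nuBARTR values.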
Test-Retest Reliability We examined the absolute agreement (intraclass correlation coefficient, ICC) of the three BARTR sessions to assess
test-retest reliability. Absolute agreement of the BARTR metric is the recommended test of reliability in
the medical field (Koo and Li 2016). We found that between sessions there was very high reliability of
nuBARTR scores, ICC(1,k)=.908, p < .001. A visualization of nuBARTR scores by participant is shown in
Figure 6.3.
We also examined correlations between all pairs of sessions via Pearson correlations. The first and second
session are significantly correlated (r(14)=.662, p = .010), the second and third sessions are significantly
correlated (r(12)=.948, p < .001), and the first and third sessions are correlated (r(12)=.686, p = .012). We
examined scores across all three sessions, and note that the BARTR interaction showed increased reliability
after the first session, supporting repeated evaluations using this method to evaluate participants’ nonuse
over time.
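The ICC(1,k) statistic used here is the one-way random-effects, average-measures form. A minimal computation from the participant-by-session score table (omitting the p-value, which would come from an F-test) can be sketched as:

```python
def icc_1k(scores):
    """ICC(1,k): one-way random effects, average-measures agreement.

    scores: one list per participant, each with the same number k of
    session scores. ICC(1,k) = (MS_between - MS_within) / MS_between.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    # Between-participants and within-participant sums of squares.
    ss_between = k * sum((m - grand) ** 2 for m in row_means)
    ss_within = sum((v - m) ** 2
                    for row, m in zip(scores, row_means) for v in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / ms_between
```

When every participant's three sessions agree exactly, MS_within is zero and the ICC is 1; session-to-session noise inflates MS_within and pulls the coefficient down.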
Ease of Use To evaluate ease of use, we applied the standard, commonly used System Usability Scale (SUS)
(Brooke 1996; Bangor, Kortum, and Miller 2008; Lewis 2018). The SUS is scored out of 100 and calculated
from 10 items. SUS meta-analyses provide full distributions of SUS scores across 446 extant systems, and
recommend evaluating systems based on percentiles of systems examined in the meta-analysis (Lewis
2018). For example, a mean SUS score of 72.6 places a system at the 65th percentile of all systems
evaluated in the meta-analysis, and the meta-analysis provides a rating system for interpreting these
percentiles. A score of 78.9 or higher is in the ‘A’ range, a score of 72.6 to 78.8 is in the ‘B’ range, a score
of 62.7 to 72.5 is in the ‘C’ range, and a score of 51.7 to 62.6 is in the ‘D’ range. The middle values of these
ranges are denoted by the dashed lines in Figure 6.3.
We administered the SUS to all participants that enrolled in the study. For determining usability, we
examined the SUS scores of only the post-stroke group. The average rating of scores was 8.93 ± 11.67,
placing the mean usability of the BARTR apparatus in the 80th percentile of systems included in the SUS
meta-analysis. Because of the high variance in participants' scores, we tested whether the mean score is significantly
greater than 72.6, which corresponds to an above-average user experience (Lewis 2018). A
non-parametric Wilcoxon signed-rank test showed that participants rated our system significantly above the 72.6
threshold (Z=16.0, p=.040). Based on this result, the system is easy to use and readily satisfies the ease of
use criterion. The distribution of SUS scores across all participants is shown in Figure 6.3.
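SUS scoring itself follows a fixed recipe (Brooke 1996): odd-numbered items are positively worded, even-numbered items are negatively worded, and the adjusted sum is scaled to 0–100. A minimal scorer:

```python
def sus_score(responses):
    """Score one 10-item System Usability Scale questionnaire (0-100).

    responses: ten Likert ratings, each 1-5, in questionnaire order.
    Odd-numbered items (index 0, 2, ...) contribute (rating - 1); even-numbered
    items contribute (5 - rating); the total is scaled by 2.5.
    """
    assert len(responses) == 10
    total = 0
    for i, rating in enumerate(responses):
        total += (rating - 1) if i % 2 == 0 else (5 - rating)
    return total * 2.5
```

The threshold comparison reported above then amounts to a one-sample signed-rank test on the per-participant differences (score − 72.6).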
6.5.4 Post-Stroke Participant Insights on Rehabilitation Systems
We performed a qualitative analysis on the semi-structured interviews from 12 post-stroke participants
who completed all three study sessions. The interviews were conducted following the third session of
the BARTR, and lasted for an average of 14 minutes (minimum: 4 minutes, maximum: 44 minutes). The
questions were structured around the four themes that prior work identified as important for interaction
with rehabilitation systems (Kellmeyer et al. 2018): safety throughout interaction, ease of interpretation,
predictability of actions, and adaptation of behaviors to task. We show an overview of the participants’
responses to these four themes in Figure 6.4. Positive responses described the system as unequivocally helpful within the theme, mixed-positive responses described the system as helpful but noted
room for improvement, and mixed responses were unsure whether the system was helpful with respect
to the theme. No participants found the system unhelpful. We also report the participants’ suggestions for
improvement and future tasks.
Safety throughout Interaction
All participants (n = 12) found the interaction to be safe. In addition to the safety precaution we took of
moving the arm slowly, participants also reported feeling safe because they “figured [the experimenter]
knew what [they] were doing” (P29) and that “it felt pretty safe because I had this shoulder harness on”
(P27).
Some participants (n = 3) identified that they worried about the robot arm when it came close to the
home position, but reported that this did not influence how safe they felt throughout the interaction. One
participant viewed the perceived risk as beneficial to them because “it was good to have my brain react to
having it come close” (P36).
Figure 6.4: Qualitative responses from participants. We show overall perceptions of each of the four
factors of trust (Kellmeyer et al. 2018) that each participant mentioned.
Ease of Interpretation
All participants (n = 12) also found the robot easy to use. Most participants (n = 9) specified that they
felt this way because the interaction itself was easy to learn. Participants found that they “got used to the
robot after the first command it gave” (P37) and that the interaction was “a normal everyday task, so it
wasn’t hard to learn” (P27). Participants also found the task easy to learn because “there wasn’t anything
... that you have to put on” (P23). Two participants (P29, P36) mentioned that they had done several other
studies using other devices, and that this interaction was easy because “it was all right in front of me and
the instructions were clear” (P29).
Eight participants also directly described the socially assistive robot's voice as easily understandable.
One participant (P29) noted that they “liked the mouth moving, it helped to understand the speech” of the
SAR providing instructions. In addition to understanding the words, another participant (P27) also found
“the voice was comforting and the instructions were very clear”. Participants found the instruction from
the SAR valuable toward understanding the task as well as socially motivating.
Predictability of Actions
Seven participants directly commented on the predictability of the interaction. The comments addressed
both the physical predictability of the task, and the social predictability of the SAR. Participants found
the task predictable because it was repetitive and simple. A participant (P37) found this to be particularly
important because “with stroke you’re also going through a psychological situation, and with [this task],
you don’t have to grapple with anything. This way is straight-forward”.
Participants had a variety of interpretations of the social component of the interaction. Because we
used randomness in the SAR’s movements and feedback in order to make it appear more natural and lifelike (as is standard in human-robot interaction work (Dennler, Yunis, et al. 2021; Abubshait and Wykowska
2020; Graaf, Ben Allouch, and Van Dijk 2015)), some participants viewed the unpredictability as a benefit. One participant (P27) referred to the unpredictable social behaviors as “natural” and thought the SAR
“doesn’t feel like technology”; another participant (P25) became engaged in “trying to find a pattern in
the robot’s eyes”. Another participant (P21) had a more neutral reaction to the randomness, and said “the
fluctuations in cuing, I don’t know if that was a hindrance or a help”. One participant (P37) greatly appreciated that the exercise was led by a robot, because the overall social interaction was predictable
and the robot did not get tired; they stated “with the SAR it is like no judgement...there is no feeling of
changing in the delivery...if a person had to repeat ‘go, go, go’, sometimes they might get tired, and when
you’re doing the exercise you can see that”.
Adaptation of Behaviors to Task
Eight participants commented on how the system could adapt to them specifically throughout the task.
Participants' comments addressed either the physical task or its social component. For the task,
six participants identified that the robot could adapt more to different levels of task difficulty. With the goal
of developing a standardized test, the robot sampled points randomly in the interaction, but participants
asked if the arm “could go all the way up or all the way back...it would be nice if I could extend my whole
arm” (P25), while at the same time recognizing that “if you had more damage in your arm it would be harder
to do” (P28). Four participants also commented on the timing of the robot’s target placements. Three of them found
the speed appropriate; one of these participants (P37) described it as “when the arm was moving it was
moving at the right speed”. One participant (P21) thought that the arm “could move faster or something...it
was very methodical where it went”.
With regard to the SAR’s verbal communication, four participants described the feedback that the
SAR gave as evidence of it adapting to their good performance. P26 also specified that the SAR’s progress
updates were helpful because they “gave you an idea of where you stand at the time”. However, one
participant (P30) wished that the SAR would be “more responsive” to the specifics of their performance,
for example through commenting on how fast their reach was.
Suggestions for Improvements and Future Tasks
Participants also provided feedback on how the system could be improved or adapted to other forms of
exercises for evaluating arm nonuse. The suggestions for improvement largely addressed how the system
could be more personalized to individual tastes. Participants discussed how visual components could be
adapted, for example how the SAR’s exterior could “change to USC colors, which would work better... I
have some stickers I could put on the robot” (P37), or how the button could “turn green when you press
it” (P23). Other participants described how the SAR’s audio could be personalized by “choosing music to
play, just to make it more pleasant” (P25). Participants also suggested gestures for the robot to perform,
such as “when you make a mistake, you could have the robot hold its arms up and point to the button”
(P37).
Despite these suggestions, participants (n=10) described the system as being effective and helpful.
Several (n=4) explicitly stated that they thought about the interaction outside of the experiment. One
participant reported that when they were “trying to open a cabinet, I had a flashback to this button pressing
when I was thinking about how to orient my hand to open the cabinet” (P37). Participants found the “fact
that it is 3D is effective” (P37), and suggested several other three-dimensional interactions that would be
useful.
The most popular task that the participants described as being useful was “3D tasks that involved more
finger dexterity” (n=7). Participants described how the interaction could “integrate a little ball... because
once you put it in your hand your fingers start working” (P36), or how the robot arm could “hold a pocket
or something and have people put pennies over here or over there” (P37). One participant also described
how they would like to control the robot to practice finger dexterity by using “a glove or something to
control the robots, so you simulate grabbing and the robot moves with the glove” (P23).
The second type of task that multiple participants suggested was gross motor tasks (n=4). For example,
two participants suggested using the robot arm to passively move their more stroke-affected side by “grabbing what the robot is holding and have it drag my arm around” (P25). Two other participants suggested
actively pushing against the arm as a form of strength training. One participant suggested “you could add
on pressure sensing...I am interested in seeing the pressure and strength of both sides” (P27).
6.6 Summary
This chapter presented the Bimanual Arm Reaching Test with a Robot (BARTR), a robotic system to assess post-stroke arm use by developing a personalized model of participants’ physical interactions. BARTR
allows robots to engage post-stroke users in an exercise activity while simultaneously collecting meaningful data that physical therapists can use to make decisions about the participant’s rehabilitation regimen. We
used data from this interaction to create personalized movement models, and defined a metric calculated
from these personalized movement models to measure the clinically significant nonuse metric. We performed semi-structured interviews with post-stroke participants to find additional ways that robots can
help with rehabilitation. The next chapter expands the use cases of the general purpose robotic arm we
used in the BARTR system to the assistive task of hair combing.
Chapter 7
Customizing Robot Haircare through Motion Planning
While assistive robots can help to improve and measure motor function, they can additionally augment
the autonomy of users who have limited function. This chapter proposes an algorithm to generate motion
plans that comb a user’s hair from a single click by the user. This algorithm is implemented in a system
that allows robots to perform a novel assistive hair combing task. This chapter is summarized in Section 7.6.
This work is adapted from “Design and evaluation of a hair combing system using a general-purpose
robotic arm" (Dennler, Shin, et al. 2021), in collaboration with Eura Shin, Maja Matarić, and Stefanos
Nikolaidis.
7.1 Motivation
In 2019, there were 40.7 million people in the United States living with some form of reduced mobility (NHIS
2019) resulting from a variety of factors (e.g., traumatic injury, stroke, or genetic causes), according
to the National Health Interview Survey. Most items used in activities of daily living (ADLs) are not
explicitly designed for people living with reduced mobility and may be completely unusable. This requires
people living with limited mobility to either purchase specialized items designed for mobility limitations or
acquire assistance from trained professionals. However, for many people, neither option may be accessible
or affordable.
The high cost of formal caregiving has driven its replacement by family members who give
care informally. In 2015, 43.5 million people in the US reported providing informal care in the previous 12
months. Informal caregiving requires providing companionship, personal care, scheduling health services,
providing transportation, and more, while at the same time managing one’s own job, family finances,
household tasks, and other daily activities (Feinberg et al. 2011). Due to the increased caregiver load,
informal caregiving can be a source of stress and can lead to health problems (Pinquart and Sörensen
2003). However, assisting people with limited mobility with ADLs is necessary to decrease their risk of
depression and, in some forms of limited mobility, can aid in their rehabilitation processes (Lin and Wu
2011).
Assistive robots could provide a solution to this growing issue by enabling people with limited mobility
to independently perform necessary daily tasks such as eating (Bhattacharjee et al. 2020), drinking (Goldau
et al. 2019), and grabbing objects (Jain and Kemp 2010). Areas of self-care such as shaving (Hawkins, Grice,
et al. 2014), cleaning (King et al. 2010), and dressing (Erickson et al. 2018; Kapusta et al. 2019) have similarly
shown potential for assistive robotics for non-critical tasks. However, fully autonomous systems are not
always desirable, as they limit the user’s choice and lead to lower acceptance of the system in the event of
errors (Bhattacharjee et al. 2020). A key design goal is thus to include user input to the system in a way
that reduces the effort of the user yet simultaneously maintains the user’s own autonomy.
We propose to expand the abilities of general-purpose assistive robotic arms to the domain of hair combing, an important and under-explored self-care ADL. In this work, we provide a minimal system to
comb different kinds of hairstyles. The system consists of three modules: an image segmentation module,
a path-planning module, and a trajectory generation module. The main contributions of this paper are:
1) the formulation of a path planning method for hair combing, and 2) insights from naïve users on the
efficacy of that approach to hair combing. Together, the system provides users or remote caretakers a
way to automatically generate paths through a user’s hair and have a robot comb along the generated
paths. The system provides functionality using a robotic arm and an RGB-D camera. This work presents
a physical implementation of the system on a lightweight Kinova Gen2 robot arm and an evaluation of its
performance. Our results demonstrate that the system successfully combs a variety of hairstyles.
7.2 Inspiration: Hair Modeling Techniques
Current works in hair modeling focus on recreating and rendering entire hair structures in 3D from 2D
images (Ward et al. 2007). In general, hair can be modeled as a collection of curves in 3D space that follow
an underlying orientation field that can be inferred from the surface of the hair (Saito et al. 2018; Zhang and
Zheng 2019). Some approaches utilize a database of 3D models, which can be combined to form new styles
that are found in single-view images (Hu et al. 2015). Other approaches use a pipeline for editing hairstyles
in portraits by finding orientation fields from images to generate additional strands (Chai et al. 2012).
Our system takes inspiration from these works and models hair as an orientation field.
We emphasize being able to generate paths in real time on hair that becomes neater as the system combs
it.
7.3 Technical Approach: Following Hair Orientation Fields
We developed a path planning module that takes the current hair state from the segmentation module and
a user-specified point as input and produces a path through the hair that aligns with the natural curvature
of the hair. For this work, we use a mouse to make the selection, but this pipeline is not dependent on the
input device. Following works from Computer Graphics on hair rendering, we model the flow of hair as a
differential field (Chai et al. 2012). To obtain the differential field, we find the orientation of the hair in the
image. We consider a path that follows the flow of the hair as a solution to the differential field, with initial
conditions defined as the point a user selects as the path’s starting point. For the following sections, we
leverage the common assumption that an image is a smooth continuous function that maps XY coordinates
to intensity values. The function is discretized into pixels and the intensity value at each pixel is noisily
sampled from the underlying intensity function. This assumption allows us to take derivatives over the
image to construct the orientation field.
7.3.1 Coherence-Enhancing Shock Filters
Algorithm 1 Coherence Filter
1: Input: image $I$; kernel size $K_\delta$ for approximating derivatives; kernel size $K_e$ for calculating eigenvectors; kernel size $K_m$ for calculating max and min values; constant $C_{blend}$, the blending rate for each iteration; and $T$, the total number of iterations.
2: for $t = 1, 2, \ldots, T$ do
3:   Calculate the first (normalized) eigenvector $[e_x, e_y]$ for each sub-image of size $K_e \times K_e$ in $I_t$
4:   Approximate $\frac{\partial^2 I}{\partial x^2}$, $\frac{\partial^2 I}{\partial x \partial y}$, and $\frac{\partial^2 I}{\partial y^2}$ with a Sobel filter of size $K_\delta$
5:   Compute $I_{vv} \leftarrow e_x^2 \frac{\partial^2 I}{\partial x^2} + 2 e_x e_y \frac{\partial^2 I}{\partial x \partial y} + e_y^2 \frac{\partial^2 I}{\partial y^2}$
6:   Create a new image $I'_t$, where for each sub-image of size $K_m \times K_m$ in $I$ we take the max value of the sub-image if $I_{vv} > 0$ and the min if $I_{vv} < 0$
7:   $I_{t+1} \leftarrow I_t \cdot C_{blend} + I'_t \cdot (1 - C_{blend})$
8: return image
Note: our implementation uses $K_\delta = 7$, $K_e = 11$, $K_m = 3$, $C_{blend} = 0.9$, and $T = 3$.
Figure 7.1: Overview of path-planning module.
We first use the method described by Weickert (Weickert 2003) to make the natural flows of the
hair more coherent, reducing the effect of stray strands of hair on the calculation of orientation. To do
this, we iteratively refine an image by increasing contrast in the direction of the greatest change of the
intensity gradient (directions perpendicular to the flow of hair), and decreasing contrast in the direction
of the lowest change of the intensity gradient.
First, we calculate the dominant eigenvector $e$ for each pixel’s local neighborhood, which provides the
direction of greatest change in intensity. The convexity in the direction of $e$ is determined by $e^T H e$, where
$H$ is the Hessian matrix in each pixel’s neighborhood. If the convexity is positive, the algorithm intensifies
the pixel to match its surroundings. If the convexity is negative, the algorithm reduces the pixel’s intensity.
The net effect of this process is to increase the contrast in the direction of the greatest change in intensity.
The process is shown in Algorithm 1, and the effect on images is seen in Fig. 7.1.
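The filter can be sketched in a few lines of NumPy/SciPy. This is a simplified stand-in for our implementation: SciPy's Sobel operator is fixed at 3×3 (so $K_\delta$ is not exposed), and the dominant eigenvector is obtained in closed form from a locally averaged structure tensor rather than an explicit per-window eigendecomposition.

```python
import numpy as np
from scipy import ndimage

def coherence_filter(img, k_e=11, k_m=3, c_blend=0.9, n_iter=3):
    """Simplified sketch of Algorithm 1 (coherence-enhancing shock filter)."""
    img = img.astype(float)
    for _ in range(n_iter):
        # Dominant eigenvector per pixel from a locally averaged structure
        # tensor (closed-form orientation instead of eigendecomposition).
        ix = ndimage.sobel(img, axis=1)
        iy = ndimage.sobel(img, axis=0)
        jxx = ndimage.uniform_filter(ix * ix, k_e)
        jxy = ndimage.uniform_filter(ix * iy, k_e)
        jyy = ndimage.uniform_filter(iy * iy, k_e)
        theta = 0.5 * np.arctan2(2 * jxy, jxx - jyy)
        ex, ey = np.cos(theta), np.sin(theta)
        # Second derivatives and the directional second derivative I_vv.
        ixx = ndimage.sobel(ix, axis=1)
        ixy = ndimage.sobel(ix, axis=0)
        iyy = ndimage.sobel(iy, axis=0)
        ivv = ex**2 * ixx + 2 * ex * ey * ixy + ey**2 * iyy
        # Dilate where convex (I_vv > 0), erode where concave, then blend.
        i_max = ndimage.maximum_filter(img, k_m)
        i_min = ndimage.minimum_filter(img, k_m)
        shock = np.where(ivv > 0, i_max, i_min)
        img = c_blend * img + (1 - c_blend) * shock
    return img
```

Because each iteration blends the image with a local max/min of itself, the output stays within the intensity range of the input.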
Algorithm 2 Find Orientations
1: Input: image $I$; kernel size $K_\delta$ for approximating derivatives; kernel size $K_E$ for approximating the expectation
2: Approximate $\frac{\partial I}{\partial x}$ and $\frac{\partial I}{\partial y}$ with a Sobel filter of size $K_\delta$
3: Construct $S_0 = \nabla I^T \nabla I = \begin{pmatrix} I_x I_x & I_x I_y \\ I_y I_x & I_y I_y \end{pmatrix}$
4: Locally average each element of $S_0$ with kernel size $K_E$ to calculate the structure tensor $S_{K_E} = \begin{pmatrix} J_{11} & J_{12} \\ J_{21} & J_{22} \end{pmatrix}$
5: Calculate the local orientation: $\Theta = \frac{1}{2} \mathrm{atan2}(J_{12} + J_{21},\, J_{22} - J_{11}) + \frac{\pi}{2}$
6: return $\Theta$
Note: our implementation uses $K_\delta = 3$ and $K_E = 5$.
7.3.2 Calculating Orientation through Hair Images
Once the image has been filtered to enhance the coherence of the hair flows, the system calculates the
direction of the hair flows in the image. The gradient structure tensor provides this information, which is
defined continuously for a point p as:
$$S_{K_E} = \begin{pmatrix} \int_r K_E(r)\,\big(I_x(p-r)\big)^2\,dr & \int_r K_E(r)\, I_x(p-r)\, I_y(p-r)\,dr \\ \int_r K_E(r)\, I_y(p-r)\, I_x(p-r)\,dr & \int_r K_E(r)\,\big(I_y(p-r)\big)^2\,dr \end{pmatrix} \tag{7.1}$$
where $K_E$ represents a window of the part of the image to consider. The integral corresponds to
the convolution operation over a continuous image. The pseudo-code for computing $S_{K_E}$ discretely for
all pixels is detailed in Algorithm 2 on lines 2-4. Orientations are computed from this structure tensor following the
equation on line 5, as detailed by Yang et al. (Yang, Burger, et al. 1996).
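A discrete version of Algorithm 2 can be sketched with SciPy. This is a simplified stand-in: SciPy's Sobel operator is fixed at 3×3, and a uniform averaging filter approximates the local expectation over the $K_E \times K_E$ window.

```python
import numpy as np
from scipy import ndimage

def find_orientations(img, k_E=5):
    """Sketch of Algorithm 2: per-pixel orientation from the locally
    averaged gradient structure tensor."""
    img = img.astype(float)
    ix = ndimage.sobel(img, axis=1)  # dI/dx (3x3 Sobel)
    iy = ndimage.sobel(img, axis=0)  # dI/dy
    # Lines 3-4: outer-product entries, locally averaged over K_E x K_E.
    j11 = ndimage.uniform_filter(ix * ix, k_E)
    j12 = ndimage.uniform_filter(ix * iy, k_E)  # equals J21 by symmetry
    j22 = ndimage.uniform_filter(iy * iy, k_E)
    # Line 5: local orientation from the tensor entries.
    return 0.5 * np.arctan2(2 * j12, j22 - j11) + np.pi / 2
```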
7.3.3 Generating a Combing Path from Orientation Fields
Once the orientation field is constructed, the user selects a point on the image for the starting point for
the comb. Once the point is selected, the path is iteratively generated by adding new points that follow
the direction of the orientation field, as described in Algorithm 3.
Algorithm 3 Calculate Path
1: Input: orientation image $\Theta$; selected pixel for the start of the path $p_0 = [p_x, p_y]$; step size $k$ (in pixels)
2: initialize the path with $p_0$
3: while $p_t$ is within the bounds of the hair do
4:   $\theta_{p_t} = \Theta[p_t]$
5:   $p_{t+1} \leftarrow p_t + k \cdot [\cos(\theta_{p_t}), \sin(\theta_{p_t})]$
6:   append $p_{t+1}$ to the path
7: return path
Note: our implementation uses $k = 6$.
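Algorithm 3 reduces to a short loop. The sketch below assumes the hair region is given as a boolean mask; the `max_steps` cap is a defensive addition and is not part of the original algorithm.

```python
import numpy as np

def calculate_path(theta, p0, hair_mask, k=6, max_steps=500):
    """Sketch of Algorithm 3: step along the orientation field from a
    user-selected start pixel until the path leaves the hair region."""
    path = [np.asarray(p0, dtype=float)]
    for _ in range(max_steps):
        p = path[-1]
        r, c = int(round(p[1])), int(round(p[0]))  # p = [x, y]
        if not (0 <= r < hair_mask.shape[0] and 0 <= c < hair_mask.shape[1]):
            break  # left the image
        if not hair_mask[r, c]:
            break  # left the hair region
        angle = theta[r, c]
        path.append(p + k * np.array([np.cos(angle), np.sin(angle)]))
    return path
```

For example, on a uniform field pointing in the $+x$ direction, the path marches horizontally in steps of $k$ pixels until it exits the mask.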
7.4 Data Collection
7.4.1 Experimental Validation: Path Planning Algorithm
To evaluate the path planning module’s design choice of following curves of the hair as an appropriate
combing strategy, we propose a baseline alternative, called the mesh-based method. In this approach, the
strategy for combing is instead based on selecting paths that follow the shortest path to the bottom of the
hair from an initial starting position.
Mesh-Based Baseline Method To create paths that move directly downward, we plan directly on a mesh
created from the point cloud of the hair, taken by the Kinect camera. Next, we segment the image based on
color and depth to find candidate point clouds of the head. A human operator selects the subset of point
clouds that correspond to the user’s hair. The union of these point clouds represents the entirety of the hair
structure to be brushed. The resulting point cloud is then converted to a mesh by using a greedy surface
triangulation algorithm (Marton, Rusu, and Beetz 2009) from the PointCloud Library (Rusu and Cousins
2011). Once the mesh is created, paths are formed by sampling mesh vertices of the hair mesh model as
starting points. The mesh is a graph with nodes corresponding to mesh vertices and edges corresponding
to the edges of the triangles that define the mesh. We define the goal as any point in the bottom portion
of the hair mesh. We then formulate path planning as finding a low-cost path from the start vertex to one
of the vertices in the goal set. We find these paths using the A* algorithm (Russell and Norvig 2002) with
Euclidean distance as the distance metric and the straight-line distance to the bottom of the mesh as the
heuristic.

Figure 7.2: Illustrative hairstyle where the image-based algorithm performs differently than the mesh-based algorithm for corresponding starting points. (a) Mesh-based algorithm. (b) Image-based algorithm.
To generate a variety of paths, the algorithm is run several times using a variety of sampled start states.
Examples of paths generated by this algorithm and how they compare to the image-based algorithm are shown
in Figure 7.2. The key difference between these algorithms is that the mesh-based algorithm combs in a
downward manner, whereas the image-based algorithm is able to additionally brush sideways paths.
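The mesh-based baseline's search can be sketched as standard A* over the mesh graph. The mesh construction itself (segmentation and greedy triangulation) is omitted here, and the heuristic is simplified to the straight-line distance to the nearest goal vertex; the data structures (`vertices`, `edges`) are illustrative assumptions.

```python
import heapq
import math

def astar_comb_path(vertices, edges, start, goal_set):
    """A* over mesh vertices with Euclidean edge costs.

    vertices: {id: (x, y, z)}; edges: {id: [neighbor ids]};
    goal_set: set of vertex ids at the bottom of the hair mesh."""
    def dist(a, b):
        return math.dist(vertices[a], vertices[b])

    def h(v):
        # Admissible heuristic: straight-line distance to the nearest goal.
        return min(dist(v, g) for g in goal_set)

    frontier = [(h(start), 0.0, start, [start])]
    visited = set()
    while frontier:
        f, g_cost, v, path = heapq.heappop(frontier)
        if v in goal_set:
            return path
        if v in visited:
            continue
        visited.add(v)
        for n in edges[v]:
            if n not in visited:
                g_new = g_cost + dist(v, n)
                heapq.heappush(frontier, (g_new + h(n), g_new, n, path + [n]))
    return None  # goal unreachable
```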
Online User Validation Study Setup To evaluate the strengths and weaknesses of these approaches,
we measured user perception of the methods through an online study using Amazon Mechanical Turk.
A variety of hairstyles were selected from the USC hair salon database (Hu et al. 2015). Ten short, ten
medium, and ten long hairstyles were selected at random, and rendered in one of five hair colors (black,
brown, auburn, blond(e), and grey) from one of three camera angles.
Each MTurk participant saw 10 of the 30 hairstyles at random, and was shown three types of paths: the
mesh-based method, the image-based method, and a set of human-drawn paths as a high-quality reference
point. Human-drawn paths were generated by an independent researcher not affiliated with the project.
For each algorithm, participants were shown three paths randomly selected from a set of solutions at
different starting points to provide a holistic view of the types of paths the algorithms generated. For each
algorithm, participants rated the generated paths on: 1) whether the paths are able to brush hair (referred
to as completeness), and 2) whether the paths effectively brush the hair (referred to as effectiveness).

Figure 7.3: Means of the three different algorithms. Error bars represent 95% confidence intervals of mean ratings. All differences are significant.
7.4.2 Hypotheses
• H1: Participants will prefer paths that vary in direction, and thus rate the image-based method as
generating paths that are more complete than the mesh-based method.
• H2: Participants will prefer paths that follow the flow of the hair, and thus rate the image-based method
as generating paths that are more effective than the mesh-based method.
Validation Study Results The study received 48 respondents (27 women, 21 men; median age range
25-34). Responses were measured on a scale of -3 to 3, with 0 indicating a neutral response. A repeated-measures
ANOVA test found significant differences for the main effect of algorithm on the reported completeness of the
planned paths, with F(2,94) = 24.12, p < .001. Pairwise t-tests were used for post-hoc analysis, revealing
that all pairwise comparisons were significant after Bonferroni correction. The mesh-based method
of planning paths (M=.99) received significantly lower ratings than the image-based method (M=1.53),
p < .001, $\eta^2$ = .09, which was in turn rated as less complete than the human-drawn paths (M=1.70),
p = .01, $\eta^2$ = .01. The difference between the mesh-based method and the human-drawn paths was also
significant, p < .001, $\eta^2$ = .17. This supports H1.

Figure 7.4: Sample frames of the video shown to the participants and forces measured by the arm for
different strokes of each hairstyle. Orange lines represent mean force values across 25 strokes with the
orange region illustrating the first and third quartiles. Blue lines represent individual stroke force values.
All force readings were measured at 10Hz and were post-processed with a sliding average of 9 timesteps
for visualization.
We performed the same analysis for the effectiveness ratings. A repeated-measures ANOVA test shows
significant differences by algorithm type, with F(2,94) = 21.69, p < .001. Pairwise t-tests show significance
between all comparisons with p < .001, where the mesh-based method (M=.32) was rated lower than the
image-based method (M=.89), $\eta^2$ = .08, and the human-drawn paths (M=1.22) were rated the highest, $\eta^2$ = .04.
This supports H2.
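The post-hoc procedure can be sketched with SciPy. The ratings below are synthetic stand-ins for the study's data, used only to illustrate paired t-tests with a Bonferroni-corrected threshold.

```python
from itertools import combinations

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic per-respondent mean ratings for the three path types
# (illustrative values only, not the study's data).
ratings = {
    "mesh": rng.normal(0.99, 0.5, 48),
    "image": rng.normal(1.53, 0.5, 48),
    "human": rng.normal(1.70, 0.5, 48),
}

pairs = list(combinations(ratings, 2))
alpha = 0.05 / len(pairs)  # Bonferroni-corrected significance threshold
for a, b in pairs:
    t, p = stats.ttest_rel(ratings[a], ratings[b])  # paired t-test
    print(f"{a} vs {b}: t = {t:.2f}, p = {p:.2g}, significant: {p < alpha}")
```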
Qualitative notes from participants in the study revealed that they rated the human-drawn paths the
most highly because those paths tended to “follow the hair flow” but were also “long and straight”. The main
concern with the image-based method was that it followed the path of the hair too closely. This is evident
for the hairstyles that have waves, where combing in a wavy path may be inefficient. Because the hair
can deform as the comb passes through it, moving along the waves is less direct than moving straight.
However, moving along the waves is still preferable to moving directly downward without following the
general direction of the hair, as in the mesh-based method.
Theme          Positive Example                                                 Negative Example
Effectiveness  “...the hair [that] the robot worked on looked much better...”
Force          “It did seem gentle, so that is good.”
Movement       “This robot appeared to have finesse and sensitive movements.”   “...there is no grace or nuance to the motion.”
Table 7.1: Examples of positive and negative qualitative responses from participants for each theme.
7.4.3 System Evaluation Study
To evaluate the performance of the physical robot implementation of the system, we recorded three videos
of the robot combing three different wigs. The wigs varied in color and hairstyle and were representative
of the hairstyles in the first study, consisting of a short brown wig, a medium-length pink wig, and a long
blonde wig. The wigs were teased so that the hair was in a state that would warrant using a hair-combing
device. Each video showed the wig and robot arm from the side so that study participants could see both
the end effector and wig at all times, shown in Fig. 7.4. We used a plain white background for all videos. A
human operator selected the start points for the robot system off-screen. The participants watched the wig
being combed by the robot, where the first and final strokes were filmed in real time and the middle strokes
were filmed at 20x speed to show the progress that the robot makes over the course of the hair-combing
interaction to reduce fatigue effects. Participants were then asked to provide qualitative feedback on the
effectiveness of the combing as well as their willingness to use the system.
7.5 Results
7.5.1 Participants
The study received 30 responses (17 men, 13 women; median age range 25-34). We conducted a qualitative
analysis of the open-ended survey responses utilizing a grounded-theory approach (Heath and Cowley
2004) where we categorized responses by theme in a data-driven manner.
7.5.2 A Framework for Evaluating Hair Care Systems
The qualitative analysis revealed three main themes in participants’ evaluations of the system. The first
theme was related to the effectiveness of the task of hair combing. The second theme was the perception
of the amount of force that was used throughout the interaction. The third theme was related to how the
robot moved while combing hair. Representative positive and negative examples of each theme are shown
in Table 7.1.
Effectiveness The theme of effectiveness had several subcomponents. The top three were: the final
alignment of the hair, the depth of combing, and the time it takes the robot to comb.
The final alignment of the hair was mentioned in 31 of the 90 responses; 25 of these responses indicated
that respondents thought that the hair ended up “...straighter than before it started combing” or some
variant highlighting that the hair looked neater after the combing interaction. This subcomponent received
the highest number of mentions across all categories, indicating that the robot is functionally capable of
combing hair.
Related to depth of combing, the majority of negative responses came from the short brown hairstyle
due to the fact that the hair lies very close to the scalp. 19 responses mentioned that the robot appears to
comb the hair on the outermost layer, but 4 of the responses indicate that the robot also penetrates the
deeper layers of the hair. All 4 positive responses were for the medium and long hairstyles.
The subcomponent of timing was noted in 24 responses. 19 of these responses indicated that the robot
took a long time to move when combing, indicating that they were willing to have a robot that has higher
velocity limits for this task. Interestingly, 4 responses positively described the slow movement of the robot
as cautious and gentle.
Use of Force The theme of force was concerned with the amount of force the robot used during the
combing interaction. The most common comment was that the robot appeared to use too much force
when combing the wig, present in 15 of the responses. This was either stated generally or quantified as
the amount of movement that the mannequin head made during the combing execution. This brought up
concerns for a human user’s safety when using such a system in 6 of the participants’ responses. Conversely,
the amount of force used was perceived as gentle and appropriate in 6 of the responses, which
mentioned that the robot appeared to move carefully. Participants also responded that the robot did not appear
to have an understanding of the sensation of touch, which worried them because the robot may not realize
how hard it is pulling on knots or tangles.
Characteristics of Movement The responses that regarded the characteristics of movement mentioned
both auditory and visual components. Participants mentioned the lack of sound during the movement as
uncomfortable. Visually, some participants described the static orientation of the hand during the approach
as lacking nuance, whereas others described the robot’s following of the curves and shape of the hair as
having finesse and being smooth. This indicates a need to adapt motion to the specific user of the system.
7.5.3 Force Evaluation
Physically, the maximum force exerted by the system for any hairstyle did not exceed 15N, with the majority of strokes exerting peaks of less than 5N for each hair style as measured by joint efforts of the arm
(see Fig. 7.4). These values are similar in magnitude to sensitive physical interaction tasks such as shaving
(7N peak magnitude) and wiping cheeks (14N peak magnitude) (Hawkins, King, et al. 2012).
7.6 Summary
This chapter presented an algorithm to plan hair-combing trajectories for robots to assist users with
personal care tasks, allowing users to customize how a robot combs their hair. These paths follow the flow
of the user’s hair and can be calculated with little effort from the user, requiring only a single click.
Through an online user study, we found that this technique creates realistic hair-combing trajectories. We then
developed a framework to evaluate future hair-combing systems based on a second user study that
examined this system performing a combing task. This chapter concludes the section on adapting physical
interactions. The next chapter begins the section on adapting social interactions over time by learning
dynamics models of user engagement in a rehabilitation game for youths with cerebral palsy.
Part III: Adapting Social Interactions
In addition to physical support, robots provide social support by motivating users throughout assistive interactions. Users have different backgrounds, expectations, and norms that affect how they prefer to communicate
with a robot. By adapting a robot’s social behavior, users can become more motivated to perform an assistive task and experience greater benefits from completing these tasks. For example, a user may be
more motivated when a robot provides a certain kind of feedback, which results in a higher volume of exercise
practice, which in turn leads to better therapeutic outcomes. In Chapters 8 and 9 we explore how robots can
adapt social interaction to users. In Chapter 10, we propose an algorithm that allows users to quickly adapt
both social and physical robot tasks.
Chapter 8
Personalizing Engagement Models in Rehabilitation Games for Users
with Cerebral Palsy
This chapter explores how robots can provide personalized social assistance to users with cerebral palsy
practicing physical exercise games. We found that physically embodied robots provide a higher sense
of enjoyment and companionship in practicing repetitive exercises through interactive games. We also
found individual differences in how participants responded to social feedback from the robot. We created
a technique to automatically learn user “types" from interaction data. Through simulation, we identified
potential pitfalls when making decisions based on these types when they are incorrectly inferred, pointing
to the potential disadvantages of personalization without customization. This chapter is summarized in
Section 8.6. Return to the Table of Contents to navigate other chapters.
This chapter is adapted from “Personalizing User Engagement Dynamics in a Non-Verbal Communication Game for Cerebral Palsy” (Dennler, Yunis, et al. 2021), written in collaboration with Catherine Yunis,
Jonathan Realmuto, Terence Sanger, Stefanos Nikolaidis, and Maja Matarić.
8.1 Motivation
In 2013, cerebral palsy (CP) was one of the most prevalent motor disorders in children (Oskoui et al.
2013), affecting around 0.2%-0.3% of all live births in the United States (Winter et al. 2002). The main
symptom of CP is involuntary muscle contractions that lead to repetitive movements (Sanger 2004) which
can greatly affect a child’s ability to communicate with caregivers and peers (Hidecker et al. 2011). This
symptom necessitates the use of active orthoses to facilitate proactive communication and to aid in motor
rehabilitation (Realmuto and Sanger 2019).
Retraining motor skills, however, requires repetitive and lengthy sessions to be effective (Buitrago, Bolaños, and Caicedo Bravo 2020). In children especially, this can lead to disengagement with the therapeutic
activity, negatively affecting functional outcomes. Thus, we aim to facilitate engaging therapeutic activities through the development of an engaging game between a participant and a socially assistive robot
(SAR) (Feil-Seifer and Mataric 2005; Matarić and Scassellati 2016). The robot encourages the participant to
perform (and therefore practice) non-verbal communicative gestures while providing social reinforcement
as the participant makes progress in the game.
This work explores the effect of both physical and social factors of the interaction design on the ability
to effectively engage participants. Physically, we investigate the embodiment of the agent that delivers the
game. Several recent works have described the positive effect of strongly embodied agents, such as robots,
on the engagement of participants over weakly embodied agents, such as computers (see review by Deng
et al. (Deng, Mutlu, and Mataric 2019)). Socially, we investigate how the feedback provided by the agent
throughout the game affects participant engagement. Understanding how robots can affect engagement
dynamics is an under-explored area of human-robot interaction (HRI) (Oertel et al. 2020).
Both physical and social factors are investigated through a user study of participants with CP. We
found that participants preferred interacting with the SAR compared to a screen-based agent but did not
observe any significant differences in engagement levels between the two conditions, which we attribute
to individual differences in how participants responded to the robot’s actions. To explore user engagement further, we then developed a probabilistic model for personalizing the robot’s actions based on an
individual participant’s responses to the robot, and show in simulation that this improves the users’ engagement levels compared to models that are not personalized. Together, the results of this work indicate
the promise of personalized SAR for helping individuals with cerebral palsy to practice non-verbal communication movements.
8.2 Inspiration: User Engagement and Exercise Gamification
Engagement is a key factor in measuring the quality of HRI scenarios (Oertel et al. 2020), and in particular
has been studied extensively as a means of maintaining user interest (Celiktutan, Sariyanidi, and Gunes
2018). User-specific perceptual models to identify user engagement have been explored in the context of
autism spectrum disorder (Jain, Thiagarajan, et al. 2020; Rudovic, Lee, Mascarell-Maricic, et al. 2017;
Rudovic, Park, et al. 2019), where user behavior varies significantly due to personal differences. These
personal differences are also present in CP populations, where there is a great variance between individuals
in how motor function is impacted.
Consequently, there has been an emphasis on developing personalized user models to facilitate interactions (see reviews by Clabaugh et al. (Clabaugh et al. 2019) and Rossi et al. (Rossi, Ferland, and Tapus 2017)).
Personalized interactions have shown promising results in various domains, ranging from rehabilitation
(Tapus, Ţăpuş, and Matarić 2008) to robot tutoring systems (Leyzberg, Spaulding, and Scassellati 2014),
by implementing robot action selection based on personalized user models; however, few have studied
engagement dynamics.
A review of several studies (Malik, Hanapiah, et al. 2016) concludes that SARs have been effective for
clinical populations diagnosed with CP, demonstrating that physical robots can elicit positive responses
from users with CP who are performing repetitive physical exercise tasks. Robots as partners in game-like therapeutic physical activities have been shown to create engaging experiences for users and lead
to increased motivation (Brisben et al. 2005). The importance of engagement is emphasized in studies
involving movement exercises for CP, and quantitative measures of engagement are well-established for
this context (Malik, Yussof, et al. 2014). Given the success of using SARs with this population, we aim to
understand how SARs can shape user engagement in practicing repetitive exercises.
8.3 Technical Approach: Modeling Engagement Dynamics
We modeled the evolution of engagement as a Markov chain, where engagement is a binary state variable
that changes stochastically in discrete time-steps, after each of the robot’s actions.
We define a transition matrix T that specifies how engagement s ∈ S changes over time, T : S × A → Π(S).
Since the change depends on the robot’s action (clarify, encourage, or reward), we parameterized the
transition matrix by the robot’s action a ∈ A.
8.3.1 Learning Personalized Models
To learn personalized engagement models, the first step is to represent how likely a participant is to become
engaged or disengaged given a robot’s action. We captured this with the transition matrix of the Markov
chain. For each participant in the user study, we computed a transition matrix using maximum likelihood
estimation from the sequence of the annotated engagement values.
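The maximum-likelihood estimate reduces to counting and normalizing observed transitions. Below is a minimal sketch, assuming integer-coded actions and binary engagement annotations; the uniform fallback for unobserved rows is an assumption, not part of the original procedure.

```python
import numpy as np

def estimate_transitions(actions, engagement, n_states=2, n_actions=3):
    """Maximum-likelihood per-action engagement transition matrices
    T[a][s, s'] from an annotated interaction sequence.

    actions[t] is the robot action taken at step t; engagement[t] and
    engagement[t + 1] are the engagement states before and after it."""
    counts = np.zeros((n_actions, n_states, n_states))
    for t, a in enumerate(actions):
        counts[a, engagement[t], engagement[t + 1]] += 1
    # Normalize each row into a distribution; uniform if a row was never
    # observed (an assumption to keep the matrices well-formed).
    totals = counts.sum(axis=2, keepdims=True)
    return np.divide(counts, totals,
                     out=np.full_like(counts, 1.0 / n_states),
                     where=totals > 0)
```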
We next explored whether participants cluster in terms of similar reactions to the robot’s actions.
Previous work (Nikolaidis, Ramakrishnan, et al. 2015) has shown that users can be grouped based on their
preferences for how to perform a collaborative task with a robot. We used a similar approach in the context
of social interaction: we clustered participants from the study based on how their engagement changed in
response to the robot’s actions.
We converted the transition matrices to vectors, then computed the distance between vectors using
cosine similarity. We then performed hierarchical clustering (Müllner 2011), by iteratively merging the
two most similar vectors into a cluster. The merged vector was formed by averaging the values of the two
138
Robot Agent, Orthosis Off
(Max 10 min)
Robot Agent, Orthosis On
(Max 10 min)
Screen Agent, Orthosis On
(Max 10 min)
Participant Choice, Orthosis On
(Unlimited time)
Robot
Rating
Screen
Rating
1+ min
Break
Figure 8.1: Stages of the within-subject experiment design.
vectors. We selected the final number of clusters, so that each cluster contained at least two individuals. We
transformed the vector of each cluster back to a transition matrix that specified how engagement changed
for participants of that cluster.
We clustered participants at two different resolutions: 1) based on the user as a whole, and 2) based
on the user’s responses to each of the robot’s three possible actions. The first clustering, based on each
participant’s holistic response to robot actions, resulted in matching each participant to one cluster.
We call this participant-level clustering. The second clustering, based on each participant’s responses to
each of the robot’s actions separately, required three different clustering iterations, one for each action, and
resulted in each participant being matched to three clusters, one for each action. We call this action-level
clustering.
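The clustering step can be sketched with SciPy's hierarchical clustering. Note two assumptions: SciPy's "average" linkage merges clusters by mean pairwise distance, a close stand-in for the vector-averaging merge described above, and the vectors here are random placeholders for the participants' learned transition matrices.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Placeholder vectors standing in for the flattened transition matrices
# (8 participants, 3 actions x 2x2 transition matrix = 12 values each).
rng = np.random.default_rng(1)
vectors = rng.random((8, 12))

# Agglomerative clustering with cosine distance and average linkage.
Z = linkage(vectors, method="average", metric="cosine")

# Cut the dendrogram into a chosen number of clusters (labels 1..k).
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)
```

In practice, the cut level `t` would be chosen so that every resulting cluster contains at least two participants, as described above.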
8.4 Data Collection
8.4.1 Study Setup
The study setup consisted of the participant sitting at a table and facing a computer screen or a robot,
both at eye-level, as shown in Figure 9.3c. The experimenter was present in the room for safety and to
collect verbal questionnaire data. Finally, the participant’s parent was located in the hallway outside,
seated separately and not interacting with the study.
We used the tabletop LuxAI QT robot (QTrobot: Humanoid social robot for research and teaching 2020),
shown in Figure 1-a; the robot is 25 inches tall, with three-DOF arms and a two-DOF head
with a screen face. The robot was modified to work with the CoRDial dialogue manager (Short, Short,
et al. 2017) that synchronizes facial expressions with text-to-speech.
The screen condition used a standard 19" monitor that displayed the same simple animated face as the
robot’s at the same size, as shown in Figure 1-c, and used the same dialogue manager. The robot and screen
agent used the same voice, the same facial features, and the same facial expressions, as well as the same
machine vision algorithm. The strongly-embodied robot moved and gestured in the shared desk space
with the participant, while the weakly-embodied screen was stationary, as shown in Figure 1-b.
The participants wore an orthosis shown in Figure 1-d
that used fabric-based helical actuators to support the participant’s thumbs-up and thumbs-down gestures.
The orthosis was controlled by a Beaglebone microprocessor (Long and Kridner 2019), actuated with a
compressed air tank, and was attached to the participant with Velcro strips for facile donning and doffing
(Realmuto and Sanger 2019). The orthosis was worn throughout the session and was not a manipulated
variable in the experiment.
The participant’s thumb angle was measured by an RGBD camera and transmitted through a ROS
network (Stanford Artificial Intelligence Laboratory et al. 2018). A webcam placed on the table in front of
the participant captured and recorded the participant’s facial expressions. An emergency stop button was
provided to the participant for terminating the interaction at any point.
8.4.2 Interaction Design
At the start of the session, each participant demonstrated a thumbs-up and thumbs-down gesture to generate a baseline for their individual range of motion. Next, the robot explained the number-guessing game,
telling the participant to think of a number between 1 and 50, and to communicate the number secretly
to the experimenter, by whispering or typing the number on an iPad. At each turn of the game, the robot
guessed the number and asked the participant if the guess was correct. The participant answered yes or
no by making a thumbs-up or thumbs-down gesture, respectively, using the arm with the orthosis. If the
robot guessed incorrectly, it asked if the number was higher than the guess. The participant then answered
thumbs-up if the number was higher, and a thumbs-down if the number was lower. The robot ensured
that the number of thumbs-up and thumbs-down gestures were approximately equal by tracking the number of each and guessing higher or lower than the target number to keep the counts balanced. The robot
continued to guess numbers randomly from a range of decreasing size as it narrowed in on the correct
answer. Once the robot correctly guessed the number and the participant signalled with a thumbs-up, the
robot asked to play the game again. The participant responded with another thumbs-up or thumbs-down
gesture.
Every time the participant used a thumbs-up or thumbs-down gesture to respond to the robot, the
robot responded with feedback that combined verbal, physical, and facial action, based on the quality
of the gesture and the history of the participant’s gestures. Specifically, the feedback was a clarifying
utterance, an encouraging utterance, or a rewarding utterance, accompanied with a corresponding physical
gesture and facial expression. Clarifying actions were given when the participant’s response was not
legible. Encouraging actions were given when the angle the participant’s thumb made was near their
personal baseline value. Rewarding actions were given when the participant’s thumb angle exceeded their
personal baseline value. All verbal, physical, and facial components of these feedback actions were selected
randomly from a set of appropriate components for each action, to avoid repetition.
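The feedback rule described above can be sketched as a small policy function; the thresholding on the personal baseline follows the text, while the utterance pools and the exact notion of "near the baseline" are illustrative assumptions:

```python
import random

# Placeholder pools of verbal components for each feedback type; the
# study's actual utterances, gestures, and facial expressions are not
# reproduced here.
UTTERANCES = {
    "clarify": ["Can you show me that again?", "I didn't catch that."],
    "encourage": ["Nice try, keep going!", "You're getting close!"],
    "reward": ["Great job!", "That was a big thumbs-up!"],
}

def choose_feedback(thumb_angle, baseline, legible, rng=random):
    """Select a feedback type per the rules in the text: clarify when
    the gesture is not legible, reward when the thumb angle exceeds the
    personal baseline, and encourage when it is near (here, at or
    below) the baseline. A random component is drawn from the matching
    pool to avoid repetition."""
    if not legible:
        kind = "clarify"
    elif thumb_angle > baseline:
        kind = "reward"
    else:
        kind = "encourage"
    return kind, rng.choice(UTTERANCES[kind])
```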
8.4.3 Study Design
The study used a within-subjects design shown in the block diagram in Figure 8.1; the participants interacted with the robot in a single session that lasted approximately one hour from the participant entering
the room to their departure. The session was divided into four blocks, with periods of rest in between.
The first three blocks lasted up to 10 minutes each and had the participant play as many games with the
robot/screen as desired, while the final block was open-ended, with no fixed duration. Between blocks, the
participant rested for at least one minute or until they were ready to move to the next block to mitigate
effects of muscle fatigue. The first block served as a practice block to familiarize the participant with the
interaction. In that block, the participant interacted with the robot while the orthosis was not powered
and thus not assisting their movement. In the second block, the participant interacted with the robot with
the orthosis powered on. After the second block, the experimenter verbally administered a questionnaire
about the participant’s experience with the robot. In the third block, the participant interacted with a computer screen with the orthosis powered on. After the third block, the experimenter verbally administered a
questionnaire on the participant’s experience with the screen-based agent. The fourth block was optional,
and the participant was given a choice of playing with the robot, playing with the screen, or ending the
session.
8.4.4 Hypotheses
Since strongly-embodied physical agents have been shown to increase engagement and positive outcomes
in therapeutic tasks (Deng, Mutlu, and Mataric 2019; Malik, Hanapiah, et al. 2016), the following hypotheses were tested:
• H1: Users with CP will prefer the robot over the screen.
• H2: Users with CP will be more engaged when interacting with the robot than the screen.
Measures The participant preference of the embodiment (robot vs. screen) was measured using a three-factor five-point Likert scale, with questions from scales validated in previous works (Heerink et al. 2009;
Figure 8.2: Participant responses to Likert-scale questions, grouped by measured construct.
Table 8.1: Survey questions and associated factors of Companionship (C), Perceived Enjoyment (PE), and
Perceived Ease of Use (PEU).
Survey Question | Factor
How much do you like playing with the {robot, screen}? | C (Lee, Park, and Song 2005)
How much do you want to play again with the {robot, screen}? | C (Lee, Park, and Song 2005)
How friendly is the {robot, screen}? | C (Lee, Park, and Song 2005)
Is the {robot, screen} exciting? | PE (Moon and Kim 2001)
Is the {robot, screen} fun? | PE (Moon and Kim 2001)
Does the {robot, screen} keep you happy during the game? | PE (Moon and Kim 2001)
Is the {robot, screen} boring? (inverted) | PE (Heerink et al. 2009)
Is playing with the {robot, screen} easy? | PEU (Venkatesh 2000)
Is communicating with the {robot, screen} easy? | PEU (Venkatesh 2000)
Is the {robot, screen} useful when playing the game? | PEU (Venkatesh 2000)
Is the {robot, screen} helpful when playing the game? | PEU (Venkatesh 2000)
Is playing with the {robot, screen} hard? (inverted) | PEU (Venkatesh 2000)
Lee, Park, and Song 2005; Moon and Kim 2001; Venkatesh 2000). The three factors were: perceived enjoyment, companionship, and perceived ease of use.
The participants' engagement was quantified using the same criteria as in Clabaugh et al. (2019), which also measured engagement in a game-based interaction. The participant was labelled as
engaged if they responded to the robot’s question, thought about the correct answer to the question, had
a positive facial expression, and was looking at the robot as seen in the auditory and visual data captured
by the camera. We represented the level of engagement as a binary variable (engaged/not engaged), as
measured by a trained annotator. To ensure consistency, a secondary annotator independently annotated
10% of the videos selected at random. We measured inter-rater reliability with Cohen’s Kappa, and achieved
substantial agreement of κ = .73, corresponding to agreement on 86% of videos, similar to other works on engagement (Rudovic, Lee, Dai, et al. 2018; Clabaugh et al. 2019).
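Cohen's kappa as used here can be computed directly from the two annotators' binary labels; a minimal sketch:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two annotators' label sequences:
    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement rate and p_e is the chance agreement implied by each
    rater's label frequencies."""
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    ca, cb = Counter(rater_a), Counter(rater_b)
    p_e = sum((ca[lab] / n) * (cb[lab] / n) for lab in set(ca) | set(cb))
    return (p_o - p_e) / (1 - p_e)
```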
8.5 Results
8.5.1 Participants
We recruited 10 participants (3 female, 7 male) diagnosed with CP and having symptoms of dystonia in
at least one upper limb. The age range of the participants was 9-22 years, with a median age of 15 years.
The gender imbalance is representative of the higher prevalence of males in CP populations (Johnston and
Hagberg 2007), and the large age range reflects the challenges of recruitment of this population. Half of
the participants wore the orthosis on their left hand, and the other half wore the orthosis on their right
hand. All participants successfully completed the study and were provided with compensation for their
time. This study was approved by the University of Southern California Institutional Review Board under
protocol #UP-19-00185.
8.5.2 User Preference for Robot Embodiment
Embodiment preference was determined by the difference in ratings between corresponding questions for
the robot and screen conditions. The specific questions are shown in Table 8.1. The combined responses for
all factors are shown in Figure 8.2. We found high internal consistency for all factors: Perceived Enjoyment
(α = .94), Companionship (α = .91), and Ease of Use (α = .89). We evaluated significance with a
Wilcoxon Signed-Rank Test and found a significant preference for the robot over the screen in factors
measuring Companionship (Z = 9.0, p = .026) and Perceived Enjoyment (Z = 23.5, p = .018). We
found no significant differences in Ease of Use (Z = 52.0, p = .399) and attribute this to the fact that
both embodiments used the same vision system, which suffered from perceptual errors (such as failing to
detect the participant’s off-camera thumb angle) about 20% of the time. We additionally note that many of
Figure 8.3: Transition matrices of the two clusters found in the participant-based clustering method. Each
matrix specifies the probability of becoming engaged (E) or disengaged (D) at the next time-step, given the
current state.
the responses showed no preference for either the robot or screen due to the tendency of the participants
to respond with similar values for all questions. The results therefore partially support H1, indicating that
participants somewhat preferred to interact with the robot over the screen.
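The internal-consistency values (α) reported above are presumably Cronbach's alpha computed over each factor's Likert items; a minimal sketch on a participants-by-items response matrix:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a (participants x items) response matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of the
    per-participant total scores), using sample variances."""
    X = np.asarray(items, dtype=float)
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1).sum()
    total_var = X.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)
```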
8.5.3 Clustering Participant Social Feedback Preferences
Using the participant-level method, we found two main clusters, shown in Figure 8.3. In the first cluster,
the encourage action had a greater likelihood of causing the participant’s next state to be Engaged (E) than
the reward action. In the second cluster we observed the opposite effect: the probability of changing from
Disengaged (D) to Engaged (E) is lower for the encourage action than for the reward action. We observe that
the clarify action has a small effect on changing a participant’s engagement state in both clusters. Seven
participants belonged to the first cluster, and three participants belonged to the second cluster. There
were no clear factors in the background information collected in the study that explained the makeup of the clusters.
The action-level clustering method generated separate clusters for each action independently (Figure 8.4). Thus, a single participant is described as being a part of three action-level clusters. We observed
three different types of matrices across the different actions:
• Type I indicates a high probability of the participant becoming Engaged, regardless of their previous
state.
• Type II has an approximately equal probability of becoming Engaged or remaining Disengaged, if
the participant was previously Disengaged.
• Type III features participants who are most likely to remain in the same state.
We found that the clarify action generated only Type II and III clusters, since most participants’ engagement did not change based on that action, with three participants belonging to the Type II cluster and
seven participants belonging to the Type III cluster when conditioning on the clarify action. The reward
action generated Type I and Type II clusters, since most participants became Engaged after a reward action. Three participants belonged to the Type I cluster and seven participants belonged to the Type II cluster.
This finding supports previous work (Fogg and Nass 1997) that showed positive reinforcement improving
participants’ engagement in computer-based animal guessing games.
Only the encourage action generated clusters of all three types. The encourage action had three participants that responded in alignment with the Type I cluster, five participants that aligned with the Type II
cluster, and two participants that aligned with the Type III cluster.
We investigated whether the composition of the participants in each cluster was related to the demographic information we collected. Specifically, we analyzed cluster composition with a multinomial
logistic regression of age, gender, and handedness onto cluster type and found no significant differences
in composition between any clusters at either participant or action levels. Qualitatively, age was a minor
component in the Encourage clusters; older ages appeared to be more associated with the Type II cluster
Figure 8.4: Transition matrices of different clusters found in the action-based clustering method. Each
matrix specifies the probability of becoming engaged (E) or disengaged (D) at the next time-step, given the
current state.
(median age 16), whereas younger participants either fell into Type I (median age 14) or Type III (median
age 10.5) clusters.
8.5.4 Personalizing Robot Actions
If a robot knows the participant’s cluster and adapts its actions to maximize engagement, to what extent does
this improve the participant’s engagement? Following prior work in simulating users based on personas
(Andriella, Torras, and Alenyà 2019), we show the benefit of using a personalized engagement model by
modeling users based on the data from the user study. We focus on the action-level clustering method,
since it generates different clusters for each robot action, resulting in a higher resolution model than the
participant-level clusters using the same amount of data.
We modeled users based on each participant’s transition matrices described in Section 8.3.1. At each
timestep, we have the user’s current engagement state and the set of the ground-truth transition matrices
for each action from the study. When the robot takes an action, the user’s next engagement state is sampled
(a) Correct User Inference vs. Random
Selection
(b) Incorrect User Inference vs. Random Selection (c) Comparison of all strategies
Figure 8.5: Percentage of time that modeled users were engaged for different methods of robot action
selection. Selecting actions based on the correct user clusters (a) keeps users more engaged; however,
selecting actions on incorrect user models (b) has an adverse effect. Considering the users as one group
(c) performs similarly to the random baseline.
from the probability distribution of the corresponding action and the current engagement state. This
process is repeated for 100 timesteps, as determined by the number of turns in the in-person study. We
additionally average our results over 100 runs for each participant to mitigate the random effects of the
simulation and converge to the true mean of the engagement level over the course of the simulation.
Our strategy for selecting actions in the simulation is to maximize the likelihood of the user becoming
engaged based on the estimated user clusters. For instance, if a user is said to be Type II for clarify, Type
I for encourage and Type II for reward, and the participant is currently disengaged (D), then the robot
would choose the encourage action, since the participant would become engaged (E) with probability 0.87
(compared to 0.48 for reward and 0.59 for clarify). These estimated clusters, however, are distinct from the
true clusters used to simulate the users: for example, a user may truly be associated with the Type II cluster for encourage actions, but we may erroneously select actions as if that user were associated with a Type I cluster for the encourage action. We additionally imposed a 0.2 probability of the robot taking a clarify
action regardless of the participant's state, to account for incorrect or illegible gestures, similar to
the rate that was observed in the in-person study.
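The action-selection and rollout procedure above can be sketched as follows; the Disengaged-row probabilities follow the worked example in the text (0.87 / 0.48 / 0.59), while the Engaged rows and all other values are illustrative placeholders:

```python
import random

# Transition matrices P[action][current][next] with states
# 0 = Disengaged (D), 1 = Engaged (E). The D-row E-probabilities match
# the example in the text; the E rows are placeholders.
P_EST = {
    "clarify":   [[0.41, 0.59], [0.30, 0.70]],
    "encourage": [[0.13, 0.87], [0.10, 0.90]],
    "reward":    [[0.52, 0.48], [0.20, 0.80]],
}

def greedy_action(P, state):
    """Pick the action maximizing P(next = Engaged | state, action)."""
    return max(P, key=lambda a: P[a][state][1])

def simulate(P_true, P_est, steps=100, p_clarify=0.2, rng=random):
    """Roll out one modeled user: at each step the robot is forced to
    clarify with probability p_clarify (mimicking illegible gestures)
    or otherwise acts greedily under its estimated matrices P_est; the
    next state is sampled from the user's true matrices P_true.
    Returns the fraction of steps spent Engaged."""
    state, engaged = 0, 0
    for _ in range(steps):
        a = "clarify" if rng.random() < p_clarify else greedy_action(P_est, state)
        state = 1 if rng.random() < P_true[a][state][1] else 0
        engaged += state
    return engaged / steps
```

A mismatch between `P_true` and `P_est` reproduces the "incorrect user inference" condition; setting `P_est` to a population-level matrix gives the impersonal baseline.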
We computed two baselines: 1) the robot selects actions randomly, choosing encourage or reward action
with equal probability; and 2) the robot selects actions to maximize engagement based on the transition
matrix computed from the maximum likelihood estimate of all the participants. The second baseline, which
we call the “impersonal strategy”, is equivalent to having one cluster for every action; it does not account
for individual differences.
Fig. 8.5 shows the average time the modeled users spent being Engaged in the activity for each condition. A two-tailed paired t-test showed that modeled users spent significantly more time being Engaged
when the robot selected actions given their cluster compared to taking random actions (t(11) = 9.370,
p < .001). However, if the robot had a flawed model of the user, the modeled users were significantly
less engaged over the course of the interaction compared to randomly selecting actions (t(131) = −4.692,
p < .001). This result highlights the trade-offs that personalization may bring, especially in low-data
scenarios.
The second baseline treated users as coming from one cluster. The personalized strategy significantly
outperformed the impersonal strategy (t(11) = 4.984, p < .001). Furthermore, the impersonal strategy did not perform significantly differently from randomly selecting actions (t(11) = −.656, p = .525).
This shows the importance of algorithmic design in interaction, and how incorrect assumptions can lead
to data-driven models that are ineffective.
8.6 Summary
This chapter demonstrated a robotic interaction that assists users with cerebral palsy in practicing orthosis use to perform gestures, supported by personalized models of engagement dynamics. We found that a robot elicits more positive social evaluations than a computer screen; specifically, the robot was rated higher in companionship and enjoyment. We additionally showed that users exhibited different levels of engagement depending on the robot's social feedback. Using these personalized models of user engagement dynamics, robots can select actions more effectively than with generalized models. We also showed through simulation that incorrect user assumptions can lead to lower
engagement than random social feedback, informing the development of adaptable robotic systems. The
next chapter moves from passive adaptation through ratings of engagement to enabling users to actively
adapt robot signals through exploratory search.
Chapter 9
Customizing Robot Signaling Behaviors through Novel Interfaces
The purpose of this chapter is to create interactions that facilitate social customization of robots through
robot signal design. We created an interface for robot signal design that prioritizes exploratory searching interactions, resulting in an experience that users were intrinsically motivated to use. We found that
users freely generated data using this interface, which we used to define a training objective to learn neural
representations of robot behaviors. We found that using these representations additionally facilitates personalization. This chapter is summarized in Section 9.7. Return to the Table of Contents to navigate other
chapters.
This chapter is adapted from “The RoSiD Tool: Empowering Users to Design Multimodal Signals for
Human-Robot Collaboration" (Dennler, Delgado, et al. 2023) and “Contrastive Learning from Exploratory
Actions: Leveraging Natural Interactions for Preference Elicitation" (Dennler, Nikolaidis, and Mataric
2025), written in collaboration with David Delgado, Daniel Zeng, Stefanos Nikolaidis, and Maja Matarić.
Figure 9.1: Example exploratory search process. Users engaging in exploratory search test out different
robot behaviors to learn what the robot is capable of and what they prefer the robot to do.
9.1 Motivation
Users have a variety of preferences for how robots should behave based on many contextual factors, but
those contextual factors are often unknown to the designers of robotic systems before a robot is deployed.
Consider a wheeled robot that helps users find misplaced items in their home. One user may be a longtime dog owner and thus interpret this interaction as similar to playing fetch. That user might expect the
behavioral aspects of the robot to be dog-like. For example, the robot may move erratically as if following
a scent, bark when it has found an item, and emote to portray happiness to have completed its command.
Another user, in contrast, may have experience using smart devices and expect the interaction to be purely
functional. That user might instead expect the robot to move and scan the room methodically, chime when
it finds an item, and immediately return the item to the requester.
Deploying a robot that behaves in only one way cannot satisfy both of these users. Thus, users must
be able to customize robot behaviors to align with their preferences. Several works view the problem of
aligning the robot with the user’s preferences as modeling a user’s internal reward function, which can be
Figure 9.2: CLEA: Contrastive Learning from Exploratory Actions. Users engage in exploratory
search to select their preferred robot behaviors. We automatically generate data from their exploratory
actions to learn features that facilitate future interactive learning processes. Our contributions are highlighted in pink, and the enabling work that CLEA supports is highlighted in green.
addressed with inverse reinforcement learning (Ng and Russell 2000; Abbeel and Ng 2004). In this context,
the reward function takes in numerical “features" of the robot’s behavior, e.g., a score of how dog-like or
machine-like the behavior is, and output a single value that corresponds to how good that behavior is for
the user. How these features are defined heavily influences how effectively a robot can adapt to a specific
user. Features can be learned directly from the robot behaviors through self-supervised techniques like
autoencoders (AEs) and variational autoencoders (VAEs). While these techniques result in features that
are representative of the robot’s behaviors, they may not align with the features people actually care about.
The most effective way to learn user-aligned features is by leveraging user-generated data (Bobu, Peng,
et al. 2024). However, collecting such data typically involves a data-labelling process known as a proxy task
before users can engage in the actual task of customizing the robot (Bobu, Liu, et al. 2023; Lee, Smith, and
Abbeel 2021; Yang and Nachum 2021).
Our goal in this work is to learn features for robot behaviors that are aligned with user preferences,
but do not require users to engage in unrelated proxy tasks. To accomplish this, we identified a new form of user-generated data that arises naturally during the robot customization process. Specifically, we allowed users to customize behaviors for a Mayfield Kuri robot that helped them locate items around a room.
Participants used the RoSiD interface to design state-expressive signals (Dennler, Delgado, et al. 2023),
which allowed them to search through thousands of example robot behaviors. Participants automatically
performed exploratory actions by selecting behaviors that appeared appealing to them while ignoring those
they thought were irrelevant, as illustrated in Figure 9.1.
Our key insight is that we can use these user exploratory actions to learn features for robot behaviors that
are both aligned with users and do not require users to complete proxy tasks before customizing the robot.
We view users performing exploratory actions as engaging in an intuitive reasoning process, and modeled
this using a contrastive loss to train feature-generating networks. We named this framework contrastive
learning from exploratory actions (CLEA) and provide an overview in Figure 9.2.
We showed that CLEA learns features that are more effective for eliciting a user’s preferences than
the state-of-the-art self-supervised learning techniques, offering a scalable and user-friendly approach to
personalizing robot behaviors.
9.2 Inspiration: Exploratory Search
Work in human-computer interaction (HCI) distinguishes two interactions with databases: information
retrieval and exploratory search (Marchionini 2006). Information retrieval (Sanderson and Croft 2012; Singhal 2001) refers to an interaction with a data system wherein the user knows exactly what they need to
find—the user’s reward is known. In exploratory search, the exact goal is unknown ahead of time because
the user is unfamiliar with the search topic and how the goal can be achieved (Marchionini 2006). While
many previous works implicitly assume that users know what the robot is capable of doing and present
preference learning as an information retrieval problem (Ng and Russell 2000; Singhal 2001), recent work
has identified that reformulating robot learning as an exploratory search interaction is useful for novice
robot users (Dennler, Delgado, et al. 2023).
(a) Query-based interface for choosing
among three signals per modality.
(b) Search-based interface for browsing all
options for each modality.
(c) Participant using the
RoSiD tool in our study.
Figure 9.3: Interfaces for the RoSiD tool.
Exploratory search interfaces encourage users to generate more search data by allowing them to inspect, save, and filter items in large databases (Allen et al. 2021; Chang, Hahn, et al. 2019; Rahdari et al.
2020). By aggregating and scaling these search data across millions of users, HCI researchers can learn
fine-grained profiles of user behaviors (He et al. 2016; Yang and Zhai 2022).
In this work, we examine the effectiveness of exploratory actions for learning features of behaviors that
users care about. We frame an exploratory action as an intuitive reasoning process where a user quickly
evaluates if a behavior is somewhat aligned with their preferences. If it is, they select that behavior to perform a more in-depth evaluation on the physical robot. These perceptual processes are often modeled with
triplet losses both to capture how people make intuitive decisions and to aggregate individual differences
across user populations (Bobu, Liu, et al. 2023; Demiralp, Bernstein, and Heer 2014; Hadsell, Chopra, and
LeCun 2006; Hoffer and Ailon 2015; Radlinski and Joachims 2005). We use this insight from prior work to
learn features of robot behaviors that people care about from the novel data source of exploratory actions.
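A standard triplet margin loss of the kind referenced above can be sketched on plain feature vectors; how triplets are actually constructed from exploratory actions is part of the CLEA formulation in Section 9.4:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss on feature vectors: the positive
    (e.g., a selected behavior) should be closer to the anchor than the
    negative (e.g., an ignored behavior) by at least `margin` in
    Euclidean distance; zero loss once that margin is satisfied."""
    d_pos = np.linalg.norm(np.asarray(anchor, float) - np.asarray(positive, float))
    d_neg = np.linalg.norm(np.asarray(anchor, float) - np.asarray(negative, float))
    return max(0.0, d_pos - d_neg + margin)
```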
9.3 Technical Implementation: The Robot Signal Design Tool
Following previous work, we considered robot signals as multimodal behaviors that consist of visual, auditory, and kinetic components. For each type of stimulus, we collected a large dataset of viable options
from public websites. Specifically, we used 5,912 animated videos that represented the visual components,
867 sound clips that represented auditory components, and 2,125 head motions that represented the kinetic components.∗ Based on the literature in preference learning and exploratory search (Allen et al.
2021; Brown, Coleman, et al. 2020; Chang, Hahn, et al. 2019; Sadigh et al. 2017), we employed two main
interactions to select from these options: query-based and search-based interactions.
Query-based interactions are often used to learn user preferences in the field of human-robot interaction
(Sadigh et al. 2017). In these interactions, users review a small number of robot behaviors and specify the
behavior they think is best-suited for a given task. The behavior that the user selects from the small set
of behaviors provides information about what they would like the robot to do in its particular context.
The formulation of how preferences are modeled is provided in Section 9.3.1. Our query-based interface
is shown in Figure 9.3a. In this work, a single query, Q, consists of three specific videos played on Kuri’s
screen, sound clips played through Kuri’s speakers, or motions played on Kuri’s head. We include an
option for the user to specify that none of the three items in the query are what they are looking for.
Search-based interactions are used in exploratory search contexts in human-computer interaction (Chang,
Hahn, et al. 2019). In these interactions, users are presented with a large number of possible options that
can be filtered with keywords. The order in which the options are presented is important (Allen et al. 2021). We
use the preference data from the query-based interaction to inform the order of the search-based results.
Our search-based interface is shown in Figure 9.3b.
9.3.1 Understanding User Preferences
We adopt the formulation of preference learning, where preferences are represented as a linear combination
of a set of features that describe a time series, as described by Sadigh et al. (2017). Our goal is
to learn the parameterization of the user’s preference, ω. We evaluate how well a particular query aligns
∗All files are publicly available on GitHub.
with a user’s preferences (to assess H3) using the following alignment metric inspired by (Sadigh et al.
2017):
alignment = E[ max_{q ∈ Q} (ϕ_q · ϕ_selected) / (|ϕ_q| · |ϕ_selected|) ]   (9.1)
where q represents an element of the query Q (consisting of 3 items per modality in our system), and
ϕ denotes the features of the particular stimulus, with ϕselected representing the features of the stimulus
the participant selected at the end of the experiment. The maximum alignment score is 1 and the minimum
alignment score is -1.
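The inner term of Equation 9.1 for a single query can be computed as follows; averaging this value over queries gives the expectation:

```python
import numpy as np

def alignment(query_feats, selected_feats):
    """Maximum cosine similarity between the feature vectors of the
    items shown in one query and the features of the finally selected
    item (the inner term of Eq. 9.1). Values range from -1 to 1."""
    s = np.asarray(selected_feats, dtype=float)
    sims = [np.dot(q, s) / (np.linalg.norm(q) * np.linalg.norm(s))
            for q in np.asarray(query_feats, dtype=float)]
    return max(sims)
```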
9.3.2 Creating Features for Multimodal Data
Our assessment relies on the stimuli used in our experiments being represented by a vector that encapsulates the characteristics of the stimulus (i.e., ϕ in Equation 9.1). We chose to use a learned encoding from
pretrained models, as non-linear features have been shown to be effective for preference learning (Brown,
Coleman, et al. 2020). All embeddings were reduced to 32 dimensions using PCA because dimension largely
affects speed in preference learning, and the system was designed to run in real time.
Visual: To create embeddings for the visual features, we used embeddings from a pretrained CLIP
model available from the transformers library (Wolf, Debut, et al. 2020). Each video had a representative
frame selected as the image component, and a short description used as a language component.
Auditory: Embeddings for the auditory features were generated by encoding our audio files with the
pretrained VGGish model (Hershey et al. 2017).
Kinetic: Embeddings for kinetic features came from a GRU model trained through a Seq2Seq task (Dai
and Le 2015) on our movement data, where the series of states of the robot’s head (pan, tilt, eyes) were
encoded through a recurrent network, and a second recurrent network was initialized with the embedding
to reproduce the original sequence.
9.3.3 Generating Queries from User Data
To address H3, we propose a method to generate queries Q for signal design that contain items that are
more aligned with what the users ultimately choose. For each signal, we have a dataset for each modality
D that contains the final items selected by the users.
We base this approach on the insight that user preferences occupy a small subset of all possible items in our datasets of signal components. We attempt to find clusters of preferences in the signals that users designed with RoSiD. To do this, we partition D into k groups based on the features of the signal components, ϕ. We then randomly select an item from each of these groups to create more meaningful suggestions. This process is outlined in Algorithm 4.
Algorithm 4 Generating queries from user data
1: Input: D, a dataset of designed signals; k, the number of items in the resultant query; cluster(D, k), a partitioning method that returns f : D → {1, 2, ..., k}
2: Output: Q, a set of k options for the user to select from when designing signals
3: Q ← ∅; f ← cluster(D, k)
4: for i ∈ {1, 2, ..., k} do
5:   qi ∈R {d ∈ D | f(d) = i}    ▷ sample uniformly at random from cluster i
6:   Q ← Q ∪ {qi}
7: return Q
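Algorithm 4 can be sketched in a few lines (an illustrative Python version; any cluster(D, k) partitioning method can be plugged in, and the toy clustering function below is a stand-in, not the method used in the study):

```python
import random

def generate_queries(dataset, k, cluster):
    """Algorithm 4: pick one representative item from each cluster of
    previously designed signals.

    dataset : list of signal-component items (the signals users selected)
    k       : number of items in the resulting query
    cluster : callable mapping (dataset, k) -> list of labels in {0..k-1}
    """
    labels = cluster(dataset, k)
    query = []
    for i in range(k):
        members = [d for d, lab in zip(dataset, labels) if lab == i]
        query.append(random.choice(members))  # q_i sampled uniformly from cluster i
    return query

# toy clustering: bucket scalar items by value
toy_cluster = lambda data, k: [0 if d < 0.5 else 1 for d in data]
q = generate_queries([0.1, 0.2, 0.8, 0.9], k=2, cluster=toy_cluster)
```

Because each query item comes from a different cluster, the resulting suggestions span the distinct preference groups found in the prior users’ signals.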
9.4 Technical Approach: Contrastive Learning for Exploratory Actions
In this section, we formulate our approach to leveraging exploratory actions from our data collection study
as a source for learning features of robot behaviors.
9.4.1 Preliminaries
We consider robot behaviors as trajectories in a fully-observed deterministic dynamical system. We denote
a behavior as ξ ∈ Ξ, which represents a series of states and actions: ξ = (s0, a0, s1, a1, ..., sT, aT). These
states and actions are abstractly defined; they can be videos (behaviors in image-space), audio (behaviors in
frequency-space), or movements (behaviors in joint-space). We assume that all behaviors ξ ∈ Ξ accomplish
the task without errors, allowing users to choose among behaviors based on their preferences rather than on
a behavior’s ability to achieve a goal (Nikolaidis, Ramakrishnan, et al. 2015; Nemlekar et al. 2023; Bobu,
Liu, et al. 2023). While generating Ξ is not the focus of this work, it can be completed through several
techniques, such as collecting demonstrations (Nikolaidis, Ramakrishnan, et al. 2015), performing quality
diversity optimization (Tjanaka et al. 2022), and diversely combining motion primitives (Wang, Garrett,
et al. 2018).
We model a user’s preference as a reward function over robot behaviors that maps the space of behaviors to a real value: RH : Ξ → R. The user’s reward function is not directly observable, but can be inferred
through interaction. Our goal is to learn a reward function, RH, from user interactions that maximizes the
likelihood of the user performing the observed interactions. A higher value of RH for a particular behavior
implies that the behavior is more preferred by the user.
Because the state space of robot behaviors can be very large (Robinson et al. 2023; Arulkumaran et al.
2017), directly learning RH from state-action sequences is intractable. To make reward learning tractable,
several works (Bobu, Peng, et al. 2024; Ng and Russell 2000; Abbeel and Ng 2004) assume that there exists
a function Φ that maps from the state-action space to a lower-dimensional feature space, a real vector of
dimension d: Φ : Ξ → R^d. This simplifying assumption allows us to learn RH(Φ(ξ)) from fewer user
interactions.
9.4.2 Contrastive Learning from Exploratory Actions
To learn a Φ, we leverage interaction data that we collected through the robot customization process. Users
naturally engaged in exploratory search when they were presented with many robot behaviors they could
choose from to customize the robot.
We formalize exploratory search as presenting a dataset of behaviors to the user: Di = {ξ0, ξ1, ..., ξN},
where each ξi is sampled from the full database of behaviors Ξ. In our case, ξi is a video, a sound, or a head
movement, but this definition extends to other behaviors such as robot gaits or robot arm movements.
This dataset can be generated using various methods, including keyword search (Chapman et al. 2020),
collaborative filtering (Koren, Rendle, and Bell 2021), and faceted search (Yee et al. 2003). Users can view
brief summaries of each behavior in the dataset to determine if the behavior is relevant.
We mathematically model the user’s internal reasoning process when making an exploratory action
with the function ψ : D → {0, 1}. If the user performs an exploratory action on a behavior ξj from the
dataset Di, then ψ(ξj|Di) = 1. If the user does not perform an exploratory action on a behavior ξk from
the dataset Di, then ψ(ξk|Di) = 0. We use this definition of an exploratory action to partition Di into two
sets:

Di^ex. := {ξ ∈ Di | ψ(ξ|Di) = 1};  Di^ig. := {ξ ∈ Di | ψ(ξ|Di) = 0}    (9.2)
For example, if a user is initially presented with D0 = {ξA, ξB, ξC, ξD}, and they choose ξB and ξD to
execute on the robot, the explored dataset is D0^ex. = {ξB, ξD} and the ignored dataset is D0^ig. = {ξA, ξC}.
In our data collection study, |Di| ≈ 100 to allow users to meaningfully search through behaviors.
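In code, the partition in Equation 9.2 is a simple filter over the presented dataset (an illustrative sketch; the set of explored ids stands in for the unobservable reasoning process ψ):

```python
def partition(presented, explored_ids):
    """Split a presented dataset D_i into explored (psi = 1) and ignored
    (psi = 0) subsets, given the behaviors the user chose to execute."""
    explored = [b for b in presented if b in explored_ids]
    ignored = [b for b in presented if b not in explored_ids]
    return explored, ignored

# the example from the text: the user explores B and D out of {A, B, C, D}
ex, ig = partition(["A", "B", "C", "D"], explored_ids={"B", "D"})
```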
A common way to model and aggregate diverse internal reasoning processes, such as ψ, across a
population of users is to use a triplet loss (Bobu, Liu, et al. 2023; Demiralp, Bernstein, and Heer 2014;
Hadsell, Chopra, and LeCun 2006; Hoffer and Ailon 2015; Radlinski and Joachims 2005). We adopt this
loss function and generate triplets of behaviors from the explored and ignored subsets. The triplets are
formed by sampling two behaviors at random from one subset and one behavior from the other subset:
(ξ1^ex., ξ2^ex., ξ1^ig.) or, conversely, (ξ1^ig., ξ2^ig., ξ1^ex.), where the superscript denotes the subset of Di
from which each behavior is drawn. The triplet loss encourages features from the same subset to be more
similar to each other than to features from the opposite subset, according to a metric function. We use the
squared Euclidean distance between features, d(ξi, ξj) = ||Φ(ξi) − Φ(ξj)||2^2, as in other works
that found Euclidean distances an appropriate metric for modeling perceptual processes (Bobu, Liu, et al.
2023; Demiralp, Bernstein, and Heer 2014):
Ltrip.(ξA, ξP , ξN ) = max [d(ξA, ξP ) − d(ξA, ξN ) + α, 0] (9.3)
We denote ξA as the anchor example, ξP as the positive example, ξN as the negative example, and
α ≥ 0 as the margin of separation between positive and negative examples. In our case, the anchor and
positive example are interchangeable as they are both from the same unordered set, so we formulate the
triplet loss to be symmetric:
Lsym.(Φ) = Ltrip.(ξA, ξP , ξN ) + Ltrip.(ξP , ξA, ξN ) (9.4)
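The triplet and symmetric losses in Equations 9.3 and 9.4 translate directly into code (a small numpy sketch over precomputed feature vectors; the margin value and toy vectors are illustrative):

```python
import numpy as np

def d(phi_i, phi_j):
    """Squared Euclidean distance between feature vectors."""
    return float(np.sum((phi_i - phi_j) ** 2))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Eq. 9.3: pull the anchor toward the positive, push it from the negative."""
    return max(d(anchor, positive) - d(anchor, negative) + margin, 0.0)

def symmetric_triplet_loss(a, p, n, margin=1.0):
    """Eq. 9.4: anchor and positive are interchangeable, so sum both orderings."""
    return triplet_loss(a, p, n, margin) + triplet_loss(p, a, n, margin)

# with a distant negative, the loss saturates at zero
a, p, n = np.array([0.0, 0.0]), np.array([0.1, 0.0]), np.array([5.0, 0.0])
loss = symmetric_triplet_loss(a, p, n)
```

When the negative example lies close to the anchor-positive pair, the loss becomes positive, which is what drives the learned features of explored and ignored behaviors apart.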
We formulate the CLEA loss as the sum of this symmetric triplet loss across all of the datasets presented
to all the users in the signal design study:

LCLEA(Φ) = Σ_{i=0}^{|Dpop.|} Σ_{(ξA, ξP, ξN) ∼ Di} Lsym.(ξA, ξP, ξN)    (9.5)

where Dpop. represents the set of all datasets presented to the population of users that performed exploratory actions. We learn features that minimize this loss to create a feature space for robot behaviors
that is consistent with the variations in the population’s preferences.
9.4.3 Learning Preferences from Rankings
We evaluated CLEA through behavior rankings, as in previous works (Myers et al. 2022). We presented
each user with a set of behaviors to rank, referred to as a query, Q = {ξ0, ξ1, ..., ξN}. The user then ordered
these options from their least favorite behavior to their most favorite behavior by creating a mapping
σ : {0, 1, ..., N} → {0, 1, ..., N} such that σ(Q) := ξσ(0) ≺ ξσ(1) ≺ ... ≺ ξσ(N). The notation ξi ≺ ξj
denotes that behavior ξj is preferred over ξi.
We interpreted this ranking as a collection of pairwise comparisons, as in previous works (Brown, Goo,
Nagarajan, et al. 2019). We adopted the Bradley-Terry preference model (Bradley and Terry 1952) to model
the probability that the user chooses behavior ξj from the pair of behaviors (ξi, ξj) based on the feature
space mapping Φ (i.e., learned with CLEA or other self-supervised objectives):

P(ξi ≺ ξj | RH) = e^{RH(Φ(ξj))} / (e^{RH(Φ(ξi))} + e^{RH(Φ(ξj))})    (9.6)
To learn the user’s reward function, RH, we maximize the probability of all pairwise comparisons
induced by the rankings the user performed. We construct a dataset containing all of the user’s rankings,
Dpref. = {(Q0, σ0), (Q1, σ1), ..., (QK, σK)}. We minimize the total cross-entropy loss summed over all
pairwise comparisons across all rankings:

L(RH) = Σ_{(Q,σ) ∈ Dpref.} Σ_{i=0}^{|Q|−1} Σ_{k=i+1}^{|Q|} −log P(ξσ(i) ≺ ξσ(k) | RH)    (9.7)

RH can be any computational model that can update its parameters to minimize a loss function. In this
work, we used both neural networks and linear models to approximate RH to compare with prior works;
however, other techniques such as Gaussian processes (Biyik, Huynh, et al. 2024) are also possible.
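The combination of Equations 9.6 and 9.7 can be sketched by expanding a single ranking into its pairwise comparisons (a numpy illustration over scalar reward values; the function names are mine, not the released code’s):

```python
import numpy as np

def p_choose(r_i, r_j):
    """Bradley-Terry probability that behavior j is chosen over i (Eq. 9.6)."""
    return np.exp(r_j) / (np.exp(r_i) + np.exp(r_j))

def ranking_loss(rewards_in_rank_order):
    """Cross-entropy loss for one ranking (inner sums of Eq. 9.7): rewards are
    ordered least- to most-preferred, so each later item should beat each
    earlier one."""
    r = rewards_in_rank_order
    loss = 0.0
    for i in range(len(r) - 1):
        for k in range(i + 1, len(r)):
            loss += -np.log(p_choose(r[i], r[k]))
    return loss

# a reward model consistent with the ranking incurs a lower loss
good = ranking_loss([0.0, 1.0, 2.0])
bad = ranking_loss([2.0, 1.0, 0.0])
```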
9.5 Data Collection
9.5.1 Experimental Validation: Usability of the RoSiD Tool
We explored three user study hypotheses related to system use characteristics to evaluate RoSiD:
• H1: Participants will rate the system as usable according to the System Usability Scale (Brooke 1996).
• H2: Participants will spend the most time designing the first signal to learn how to use the system.
• H3: Participants will benefit more from having suggested signals based on the signals other participants
designed than from random signals.

Figure 9.4: Structure of the design session with approximate times for each section.
Signal Design Session. In this section, we describe the details of the interactions users had with the robot
while using the RoSiD tool to design the robot’s signals and evaluate those signals. Our protocols were
approved by the university’s Institutional Review Board under #UP-23-00408.
Participants engaged in a one-hour design session. Upon entering the experiment space, they were
told that they would be designing four signals for a robot that would assist them with finding items around
the experiment space. The signals consisted of three components: visual, auditory, and kinetic.
Participants designed signals for a modified Mayfield Kuri robot (shown in Figure 9.3c). Since the robot
does not have a screen or affordances for carrying items, we added an external screen to provide a salient
visual component to the signaling and a backpack to hold the Raspberry Pi and power supply, with a
pouch for holding objects being transported. The four signals participants designed were:
1. Idle: played every 10 seconds while the robot waits for commands, indicating that the robot is ready
to accept a command.
2. Searching: played every 10 seconds while the robot searches for objects, indicating that the robot is
actively searching for an item.
3. Has Item: played once, when the robot has an item in its pouch and is ready for the participant to
remove the item.
4. Has Information: played once, when the robot has found an object, but the object is inaccessible. The
participant can follow the robot to the location of the object to retrieve it.
Each participant was then introduced to the RoSiD interface as described in Section 9.3 and designed
the four signals in a randomized order to mitigate any ordering effects. The participant was free to use the
interface however they liked, for as long as they liked. Participants tended to favor either the query-based
or the search-based interactions in their design process, but this preference varied by individual. After
finishing designing all four signals, the participant filled out the System Usability Scale (Brooke 1996).
The participant next engaged in an interaction with the robot, where the robot was piloted by an
experimenter. To simulate being occupied while the robot roamed around the environment, the participant
was engaged in a word search task. To complete the word search, the participant had to ask the robot
to help them find items around the room, each of which had a word from the word search printed on it.
For example, participants were tasked to ask Kuri to find a stapler, and the stapler had the word "haptic"
printed on it. The participant then located "haptic" in the word search. The time limit for this section
was 10 minutes. Following the interaction, participants engaged in a semi-structured interview and were
compensated with a 20 USD Amazon gift card. The entire study design is illustrated in Figure 9.4.
Participants
Participants were recruited from the USC student population through email, flyers, and word-of-mouth.
A total of 25 participants took part in the study, with ages ranging from 19 to 43 (median
25); participants self-declared as men (13), women (10), and genderqueer, nonbinary, or declined to state
(3, aggregated for privacy; some participants belonged to multiple groups); 13 participants identified as
LGBTQ+. All participants were able to create signals they liked for all four categories, and all successfully
interacted with the robot to collect all the items in the word search task.
H1: System Usability Scores. We examined the participants’ SUS scores based on recommendations from
a meta-analysis of several extant systems (Lewis 2018). The participants rated the system with a median
score of 75 out of 100 on the SUS scale, suggesting that the system falls between "good" and "excellent" on an
adjective rating scale and earns a letter grade of 'B', demonstrating an above-average user experience. Using a
Mann-Whitney U-test, we found that the ratings were significantly higher than 65 out of 100 on the SUS scale
(U = 10.0, p = .015), indicating that our system is above average in its ease of use, supporting H1.

Figure 9.5: Box plots showing the times users spent designing signals. (a) Time to design by order. (b) Time to design by signal.
H2: Time Spent Designing Signals. We examined how long it took users in our study to design the
signals. An ANOVA revealed that the time to design signals depended on the order that they were designed
in (F(3, 96) = 26.549, p < .001), as illustrated in Figure 9.5a. Post hoc analysis showed that the only
significant pairwise differences were between the first signal designed and the rest. This indicates that our
system is easy to learn to use, because the time to design signals stabilized after the first designed signal,
supporting H2. We also found no significant differences between the kind of signal and the time to design
the signal, illustrated in Figure 9.5b, indicating that the signals were similarly easy to design. This implies
that the particular signals we selected were easily understandable for the participants. The type of signal
had little effect on the results of our analysis.
H3: Using Clusters to Initialize Queries. We examined how we could use prior information based on
the signals collected from other users to generate queries that are more aligned with what participants
ultimately chose when designing their own signals.

Figure 9.6: Box plots comparing the alignment of initial queries based on random suggestions and the
proposed clustered suggestions. (a) Visual components. (b) Auditory components. (c) Kinetic components.

We used a leave-one-out cross-validation setting for
each participant and formed clusters from all but one participant following the process in Section 9.3.3.
For our clustering method we used agglomerative clustering as implemented in scikit-learn (Pedregosa
et al. 2011). We calculated the alignment score as described in Section 9.3.1 for the clustering method
as compared to randomly selecting queries for each of the three modalities. We performed an ANOVA
analysis for each of the modalities to study the effect of including other users’ information on the maximum
query alignment for new users.
We found that for the visual modality there is a significant main effect across query method (F(1, 3) =
44.106, p < .001), with an average increase in initial alignment of .117 across all signals when using the
clustering method over randomly selecting stimuli. For the auditory modality there was a significant main
effect of query method (F(1, 3) = 19.544, p < .001), with an average increase in initial alignment of
.141 across all signals. For the kinetic modality there was also a significant effect of query method (F(1, 3) =
49.393, p < .001), with an average increase in initial alignment of .132 across all signal types.
9.5.2 Collecting User Preference Data on Robot Signals to Evaluate CLEA
To evaluate the efficacy of learning feature spaces for robot behaviors, we conducted an experiment with
a new set of participants. The participants ranked behaviors to generate individual datasets that we could
use to quantitatively test different feature-learning algorithms.
Manipulated Variables. To evaluate the effectiveness of learned features using automatically collected
data, we evaluated six algorithms for learning feature spaces. The first baseline was (1) Random,
a randomly initialized neural network that projects each behavior to a vector. Random networks can be
effective feature learners, as they cannot overfit to data or learn spurious correlations. We also evaluated
two self-supervised baselines: (2) AE, an autoencoder that uses a self-supervised loss to learn features that
reconstruct the behavior, and (3) VAE, a variational autoencoder that uses a self-supervised loss to both reconstruct the behavior and standardize the distribution that the features come from. The AE and VAE methods use the latent
space of these models as features. All of these self-supervised losses can also be combined with CLEA, so
we evaluate the following as our proposed algorithms: (4) CLEA, (5) CLEA+AE, and (6) CLEA+VAE. For
all algorithms, we learned separate feature spaces for each of the three signal modalities: visual, auditory,
and kinetic. Each feature space was a 128-dimensional vector, which was sufficient to capture
diverse preferences for complex behaviors (Chen, Yin, et al. 2021).
Procedure. The five behaviors we presented to the user for each ranking were selected based on the final
customized signals in the customization session described in Section 4.3, because previous work has shown
that using other users’ preferences is a good initialization for new users (Nikolaidis, Ramakrishnan, et al.
2015; Nemlekar et al. 2023; Dennler, Delgado, et al. 2023; Dennler, Yunis, et al. 2021). To generate each
of these five behaviors, we first randomly sampled a customized behavior from the customization session
and then sampled one of the six feature-learning algorithms. Then, we selected the behavior from the full
database of behaviors that minimized the feature distance to the sampled custom behavior, according
to the sampled feature space.
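This selection step amounts to a nearest-neighbor lookup in the sampled feature space (an illustrative numpy sketch with hypothetical 2-D features standing in for the learned 128-dimensional ones):

```python
import numpy as np

def closest_behavior(custom_feature, database_features):
    """Return the index of the database behavior that minimizes Euclidean
    feature distance to a sampled customized behavior."""
    dists = np.linalg.norm(database_features - custom_feature, axis=1)
    return int(np.argmin(dists))

# toy database of three behaviors in a 2-D feature space
db = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
idx = closest_behavior(np.array([0.9, 1.2]), db)
```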
In order to fully evaluate the proposed algorithms, we must know the user’s overall favorite behavior
to use as a ground-truth preference. To achieve this, the fifth and tenth rankings used the top-ranked
signals from the previous ranking trials to create a "super ranking". The highest-ranked behavior in the
final ranking trial represents the participant’s overall favorite behavior.
9.6 Results
9.6.1 Participants
To evaluate the generalizability of the features we learned, we collected ranking data from a separate set of
42 new participants (19 women, 4 non-binary, 19 men). Each participant completed ten behavior ranking
trials for a particular modality and signal, with each ranking consisting of five robot behaviors.
9.6.2 Hypotheses
A survey by Bobu et al. (Bobu, Peng, et al. 2024) identified four criteria that constitute good representations
for downstream preference-learning tasks: (1) Completeness, the ability of a representation to capture a
user’s true preferences, (2) Simplicity, the ability to recover user preferences from linear transformations
of the representations, (3) Minimality, the ability of a representation to exist in low-dimensional spaces,
and (4) Interpretability, the ability of users to understand properties of the representation. We adopt this
framework for our analysis.
We evaluate these four criteria with the data collected from the 42 participants in Section 9.5.2. We
split the data from each participant into 70% for training our reward models, and 30% for evaluating our
reward models.
Based on this framework, we tested four hypotheses comparing CLEA features to self-supervised features:
• H1: Exploratory actions reflect user preferences, so the most complete features will leverage CLEA features.
• H2: Exploratory actions align with preference teaching tasks, so the most simple features will leverage
CLEA features.
• H3: Exploratory actions efficiently express user preferences, so the most minimal features will leverage
CLEA features.
• H4: Exploratory actions are semantically meaningful, so the most interpretable features will leverage
CLEA features.

Figure 9.7: Completeness results. Across three modalities, feature spaces using CLEA are able to accurately predict user preferences. Error bars show mean standard error across participants.
9.6.3 Results
Evaluating Completeness. Completeness refers to the feature’s ability to capture all relevant information to understand how a user ranks robot behaviors. To evaluate completeness, we aim to learn a reward
model that can accurately predict the choices participants made during the ranking experiment, as in previous work (Bobu, Liu, et al. 2023).

Table 9.1: Simplicity results. For each modality, we report the area under the curve (AUC) of the
alignment metric over 100 pairwise comparisons across feature dimensionalities. Asterisks indicate the
best-performing algorithm within each dimension (all p < .05).

            Visual                              Auditory                            Kinetic
Dimension   8      16     32     64     128     8      16     32     64     128     8      16     32     64     128
Random      .005   .024   .018   .013   .004    -.001  .134   -.001  .000   .000    .003   .179   .091   .187   .272
AE          .024   .014   .011   .014   .007    .038   .065   .042   .080   .015    .014   .227   .330*  .321   .154
VAE         .269   .335*  .247   .180   .033    .234   .174   .117   .083   .077    .207   .251   .192   .203   .346
CLEA        .012   .261   .245   .090   .044    .058   .002   .142   .113   .046    .284*  .224   .255   .345   .217
CLEA+AE     .315*  .219   .330   .163   .275*   .260   .023   .141   .015   .140    .079   .208   .192   .207   .147
CLEA+VAE    .196   .295   .376*  .293*  .147    .438*  .343*  .236*  .198*  .175*   .009   .260*  .165   .373*  .377*

Figure 9.8: Minimality results. Alignment of a linear reward model across numbers of pairwise comparisons for the smallest feature space. Shaded regions indicate mean standard error.

To predict user choices, we used the 128-dimensional feature spaces
for the six algorithms as input to a neural network that estimated the participant’s internal reward. This
reward network consisted of two fully connected layers with hidden dimensions of 256 units to output
a single value. The training objective maximized the probabilities of the selected behaviors in the training
set using Equation 9.6. We quantitatively evaluated the Completeness of our features with the accuracy
of correctly predicting the user’s observed choice in the test set.
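A forward pass of such a reward network can be sketched as follows (a numpy illustration; the ReLU activation and the random weight initialization are my assumptions, as the chapter does not specify them):

```python
import numpy as np

rng = np.random.default_rng(0)

class RewardNet:
    """Two fully connected layers (128 -> 256 -> 1), as described above."""
    def __init__(self, in_dim=128, hidden=256):
        self.w1 = rng.standard_normal((in_dim, hidden)) * 0.05
        self.b1 = np.zeros(hidden)
        self.w2 = rng.standard_normal((hidden, 1)) * 0.05
        self.b2 = np.zeros(1)

    def __call__(self, phi):
        h = np.maximum(phi @ self.w1 + self.b1, 0.0)  # hidden layer (assumed ReLU)
        return (h @ self.w2 + self.b2).item()         # scalar reward estimate

net = RewardNet()
r = net(rng.standard_normal(128))  # reward for one 128-dim feature vector
```

In practice the weights would be trained with the Bradley-Terry objective of Equation 9.6 rather than left at their random initialization.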
The accuracy for all three modalities is shown in Figure 9.7. CLEA+AE was the most
accurate in modeling participants’ choices in the Visual and Auditory modalities, as determined by t-tests (all
p < .05). In the Kinetic modality, CLEA+AE, CLEA, and Random were tied for the most accurate, and were
more accurate than the other methods (p < .05). CLEA features thus contain complete information to model
user preferences, supporting H1.
Evaluating Simplicity and Minimality. A feature space is simple if it can model a user’s preference
with a linear reward model, and it is minimal if the dimensionality of the feature space is small (Bobu,
Peng, et al. 2024). To evaluate simplicity and minimality, we evaluated feature spaces across five dimensionalities: 8, 16, 32, 64, and 128. We adopted a simple linear model to estimate a user’s reward, RH(ξ) = ω · Φ(ξ).
We estimated ω using Bayesian inverse reward learning, as in previous works (Biyik, Palan, et al. 2020;
Sadigh et al. 2017; Brown, Goo, Nagarajan, et al. 2019). Linear reward models are practically useful because they make the user’s preference easier to store, interpret, and compare than complex
neural network reward models (Reed et al. 2022).
To evaluate simplicity and minimality, we sequentially updated the estimate of the user’s preference, ωest., after observing each ranking action the user made. After each ranking action, we calculated the
alignment of the user’s true preference, ωtrue, and the estimated preference, ωest., following prior work (Biyik, Palan, et al. 2020; Sadigh et al. 2017):

alignment = (1/M) Σ_{ωest. ∼ Ω} (ωtrue · ωest.) / (||ωtrue||2 · ||ωest.||2)

We set the user’s true preference, ωtrue, as the vector corresponding to the user’s top-ranked signal. We used the area
under the curve (AUC) of alignment over the number of queries as the metric to assess simplicity and
minimality (Myers et al. 2022). A higher AUC value indicates that we learned the user’s preference more
accurately and with fewer queries. We show the alignment curves in Figure 9.8.
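The alignment metric amounts to an average cosine similarity between the true preference vector and M posterior samples of the estimate (a small numpy sketch of the computation; the toy vectors are illustrative):

```python
import numpy as np

def alignment(omega_true, omega_samples):
    """Mean cosine similarity between the true preference vector and M
    sampled estimates (higher is better; 1.0 means perfect alignment)."""
    sims = []
    for w in omega_samples:
        sims.append(np.dot(omega_true, w)
                    / (np.linalg.norm(omega_true) * np.linalg.norm(w)))
    return float(np.mean(sims))

true_w = np.array([1.0, 0.0])
samples = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]  # one aligned, one orthogonal
score = alignment(true_w, samples)
```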
To evaluate simplicity, we compared the AUC of the linear model across all five feature space dimensionalities; the results are shown
in Table 9.1. We observed that across all modalities, a CLEA-based feature space had the highest AUC in 13
of the 15 experiments, with CLEA+VAE being the best in 10 of these experiments. Overall, training with
the CLEA objective resulted in simple representations that were useful across different dimensionalities,
supporting H2.
To evaluate minimality, we compared the AUC for only the 8-dimensional feature space to determine if CLEA can model user preferences for low-dimensional feature spaces. The results are shown in
Figure 9.8. We found that CLEA+AE performs the best in the Visual modality, CLEA+VAE performs the
best in the Auditory modality, and CLEA performs the best in the Kinetic modality. We conclude that using
a variant of the CLEA objective results in features that minimally elicit user preferences, supporting H3.

Figure 9.9: Interpretability results. CLEA most often generated the users’ top-rated signal. The dotted line
represents the expected number of users if all algorithms were equally preferred.
Evaluating Interpretability. To evaluate interpretability, we examined the top-ranked signal for each
participant in this study. Feature spaces that have meaningful distances are necessary for interpreting and
explaining representations to humans (Sitikhu et al. 2019; McFee, Lanckriet, and Jebara 2011). Based on
this insight, we looked at how many times each participant chose a robot behavior identified from each
feature learning algorithm as their most-preferred behavior. The results are presented in Figure 9.9.
Because behaviors identified with CLEA features were most often chosen as participants’ top-rated signals,
we conclude that CLEA results in the most interpretable feature spaces, supporting H4.
9.7 Summary
This chapter presented a novel interface, the Robot Signal Design (RoSiD) tool, that allows users to customize robot signaling behaviors. This tool allows users to engage in exploratory search,
a form of interaction in which users are intrinsically motivated to generate preference data. We then proposed Contrastive Learning from Exploratory Actions (CLEA), a novel framework for learning robot behavior
representations from the exploratory search data collected with the RoSiD tool. We found that these representations facilitate the preference learning process, enabling robot customization through other forms
of user input such as rankings. This results in a system that can iteratively improve over time as more
user data is collected. The next chapter explores how users can efficiently navigate learned representation
spaces by introducing a novel algorithm to learn preferences from rankings of robot behaviors.
Chapter 10
Efficiently Customizing Learned Behaviors through Ranking
This chapter proposes a novel approach to efficiently adapting a robot’s behavior to a user’s preferences
through rankings of behaviors. We consider a user’s desire to see the robot improve over time, and
propose an algorithm that generates behaviors for the user to rank that are both informative and increase
in overall quality over time. We found that this technique outperforms state-of-the-art approaches based
on information gain, both in simulation and in a user study with real users customizing robot behaviors.
This chapter is summarized in Section 10.6. Return to the Table of Contents to navigate other chapters.
This chapter is adapted from “Improving User Experience in Preference-Based Optimization of Reward
Functions for Assistive Robots” (Dennler, Shi, et al. 2024), written in collaboration with Zhonghao Shi,
Stefanos Nikolaidis, and Maja Matarić.
10.1 Motivation
Numerous technical advances in robotics have allowed robots to perform increasingly complex physical
and social tasks. As robots move from laboratories to in-home settings, they will be confronted with users
and contexts that were not previously seen or tested by the robot’s developers. In order to be useful in these
new contexts, robots must adapt their behaviors, e.g., movement trajectories, affective gestures, and voice,
to align with preferences and expectations of specific users (Dennler, Ruan, et al. 2023; Rossi, Ferland, and
Tapus 2017). One user may prefer that a robot hands them an item as quickly as possible, whereas another
user may want the robot to stay as far away from a priceless family heirloom as possible. Users of these
systems will likely not have experience directly programming robots, and thus robots must be teachable
in more intuitive ways.
Previous work has identified that ranking robot trajectories can be a simple and effective method
for non-expert users to teach robots their preferences for how the robot should move (Brown, Goo, and
Niekum 2020; Keselman et al. 2023). Two main approaches use ranking information to learn preferences:
(1) using rankings to learn an explicit model of a user’s reward function (Brown, Goo, and Niekum 2020),
or (2) using rankings to implicitly infer a user’s reward function through black-box optimization techniques (Keselman et al. 2023; Lu et al. 2022). These approaches have typically been evaluated in isolation
and focused on the end-behavior that a robot has learned after the preference learning process has been
completed (Biyik, Palan, et al. 2020; Sadigh et al. 2017). However, the process of teaching robots is a major
factor that drives the perception and adoption of robotic systems (Adamson et al. 2021).
In this work, we propose an algorithm that combines the explicit and implicit approaches for learning
user preferences, called Covariance Matrix Adaptation Evolution Strategy with Information Gain (CMA-ES-IG). We show through simulation that CMA-ES-IG generates candidate robot trajectories that better reflect user preferences compared to state-of-the-art approaches. We then evaluate these algorithms through
real-world experiments with users performing both physical and social robot tasks, as shown in Fig. 10.1.
Figure 10.1: The two domains that users taught robots their preferences for the robot’s behaviors. In the
physical domain, users ranked a JACO arm’s movement trajectories to hand them a marker, a cup, and a
spoon. In the social domain, users ranked a Blossom robot’s gestures to portray happiness, sadness, and
anger.
Physically, users specified their preferences for how a JACO2 robot arm hands them various items. Socially, users specified their preferences for how a Blossom robot (Suguitan and Hoffman 2019; O’Connell
et al. 2024; Shi, O’Connell, et al. 2024) gestures to communicate different emotional states. We show that
our algorithm is able to effectively learn user preferences while also increasing the quality of the robot’s
trajectories over time. Overall, we highlight the importance of user experience in algorithmic design to
create interactions that effectively learn user preferences.
10.2 Inspiration: Active Learning and Black-box Optimization
There are many ways to learn preferences from users in order to support users in accomplishing tasks
(Fitzgerald et al. 2023). Robots can learn a user’s internal reward function through inverse reinforcement
learning, which learns a function that maps a low-dimensional representation of each robot behavior to a scalar value.
Various techniques can be used to learn this mapping from trajectory to reward, including demonstrations
(Abbeel and Ng 2004), physical corrections (Bajcsy et al. 2017), language (Sharma et al. 2022), trajectory
rankings (Brown, Goo, and Niekum 2020), and trajectory comparisons (Sadigh et al. 2017). Users may need
varying levels of expertise with robot systems to effectively use those techniques (Fitzgerald et al.
2023). Our work focuses on trajectory rankings, a technique that is accessible to users of all levels of
expertise.
There are two popular approaches to learning user preferences for robot trajectories from rankings. One approach is to explicitly model the user's reward function by estimating
the probability distribution over parameters of a reward function (Sadigh et al. 2017; Biyik and Sadigh
2018). This type of approach decomposes a trajectory ranking into a series of trajectory selections and
uses probabilistic models of human choice to update the distribution over reward weights using Bayes’
rule. The Bayesian approach considers the whole space of trajectories at each iteration of querying the
user. By explicitly modeling users’ reward functions, robots have been able to maneuver cars in simulation
(Sadigh et al. 2017) and assist with assembly tasks (Nemlekar et al. 2023).
The other approach of interest is to implicitly model the user’s reward function using rankings as
inputs to black-box optimization algorithms. This type of approach directly finds the robot trajectories
that the user will rank highly and dynamically updates the space of trajectories that are sampled to be
ranked by the user. CMA-ES (Hansen, Müller, and Koumoutsakos 2003) is one technique that demonstrates
efficiency and tolerance to noise, and has been applied to learning human user preferences in robotics domains. For example, CMA-ES has been used to identify personalized control laws for exoskeleton
assistance that minimizes human energy cost during walking (Zhang, Fiers, et al. 2017). Recent work also
applied CMA-ES to learn user preferences and goals in social robot navigation through pairwise selections (Keselman et al. 2023).
Both types of approaches show promise in eliciting user preferences, but many prior works have focused on using these methods in isolation. Additionally, these approaches have focused on the outcome of
the robot behavior, rather than on how users perceive the robot to be considering their input. A comprehensive work by Habibian et al. (Habibian, Jonnavittula, and Losey 2022) showed that incorporating users'
perception is crucial for robots learning preferences over interpretable hand-crafted features. In this work,
we combine explicit and implicit user reward models to learn preferences for features we learned from
data, and we evaluate users’ perceptions of the teaching process.
10.3 Technical Approach: Covariance Matrix Adaptation Evolution Strategy
with Information Gain
10.3.1 Preliminaries
We describe robot behaviors as output trajectories from a dynamical system. A trajectory ξ ∈ Ξ is defined as a sequence of states s ∈ S and actions a ∈ A that follow the system dynamics, i.e., ξ = (s_0, a_0, s_1, a_1, ..., s_T, a_T) for a finite horizon of T time steps. These states are abstractly defined and can be anything from joint angles to end-effector positions to images, and actions simply convert one state to another state. Following common practices in inverse reinforcement learning (Abbeel and Ng 2004), we assume that there exists a feature function ϕ : S → R^d that represents aspects of the state that the user may have preferences over. A trajectory can then be represented as a low-dimensional vector in R^d via Φ(ξ) = Σ_{i=0}^{T} ϕ(s_i).
A user's preference for robot behaviors is a function of these trajectory features, R(ξ) = f(Φ(ξ)). In this work, we make the assumption that a user's preference is a linear combination over features, R(ξ) = ω^T · Φ(ξ), where ω ∈ R^d, as in several previous works (Sadigh et al. 2017; Biyik, Palan, et al. 2020).
When a user is asked to rank trajectories, they are presented with a set of N trajectories referred to as a trajectory query, Q = {ξ_0, ξ_1, ..., ξ_N}. A user then ranks these trajectories according to their internal reward function, producing a ranking R = (ξ′_0, ξ′_1, ..., ξ′_N) such that ξ′_0 ≺ ξ′_1 ≺ ... ≺ ξ′_N.
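To make these definitions concrete, the following sketch computes Φ(ξ) by summing per-state features and evaluates the linear reward ω^T · Φ(ξ). The feature function, toy trajectory, and weight vector here are hypothetical stand-ins; in this chapter, features are learned from data rather than hand-crafted.

```python
# Minimal sketch of trajectory features and linear rewards.
# The feature function phi is hypothetical; in this chapter, features
# are learned from data with an autoencoder.

def phi(state):
    """Per-state features: the 2-D state itself plus its squared norm (d = 3)."""
    x, y = state
    return [x, y, x * x + y * y]

def trajectory_features(states):
    """Phi(xi) = sum over time steps of phi(s_i)."""
    feats = [0.0] * 3
    for s in states:
        for j, v in enumerate(phi(s)):
            feats[j] += v
    return feats

def reward(omega, states):
    """Linear preference model: R(xi) = omega . Phi(xi)."""
    return sum(w * f for w, f in zip(omega, trajectory_features(states)))

xi = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]   # a toy trajectory of 2-D states
omega = [1.0, -0.5, 0.1]                     # hypothetical preference weights
print(reward(omega, xi))
```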
10.3.2 Bayesian Optimization of Preferences
Approaches that explicitly model the user’s reward function maintain a probability distribution over ω
and update this distribution based on the user’s ranking. A common framing is to view a ranking as an
iterative process of selecting the best trajectories from the query without replacement (Myers et al. 2022),
and use the widely-adopted Bradley-Terry model of rational choice (Bradley and Terry 1952) to estimate
the probability that a user will select a given trajectory from a set of trajectories at each iteration, subject
to a rationality parameter β:
p(ξ | Q) = exp(β · ω^T Φ(ξ)) / Σ_{ξ′ ∈ Q} exp(β · ω^T Φ(ξ′))    (10.1)
Using this model of user preferences and assuming that these selections are conditionally independent,
the distribution over ω can be calculated using Bayes’ rule:
p(ω | R) ∝ p(ω) Π_{i=0}^{N} p(ξ_i | Q_i)    (10.2)

where ξ_i represents the trajectory selected from the set at iteration i and Q_i represents the set of trajectories left at iteration i. Equivalently, Q_i = Q_{i+1} ∪ {ξ_i}.
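The selection model in Equation 10.1 and the update in Equation 10.2 can be sketched with a particle approximation of the belief over ω. The 2-D features, particle count, and rationality parameter below are illustrative choices, not values used in our experiments.

```python
import math
import random

def p_select(beta, omega, feats, query):
    """Bradley-Terry probability of selecting the trajectory with features
    `feats` from a query (Eq. 10.1), with rationality parameter beta."""
    def r(f):
        return sum(w * x for w, x in zip(omega, f))
    z = sum(math.exp(beta * r(f)) for f in query)
    return math.exp(beta * r(feats)) / z

def update_belief(particles, weights, ranking, beta=5.0):
    """Bayes update (Eq. 10.2): a ranking is decomposed into a sequence of
    best-item selections without replacement, assumed conditionally
    independent given omega."""
    new_w = []
    for omega, w in zip(particles, weights):
        like = 1.0
        remaining = list(ranking)          # ranking is ordered worst to best
        while len(remaining) > 1:
            best = remaining[-1]           # highest-ranked item left
            like *= p_select(beta, omega, best, remaining)
            remaining.pop()
        new_w.append(w * like)
    total = sum(new_w)
    return [w / total for w in new_w]

random.seed(0)
# Hypothetical belief: particles over omega in R^2, initially uniform.
particles = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(200)]
weights = [1.0 / len(particles)] * len(particles)

# A ranking of three 2-D trajectory feature vectors, worst to best.
ranking = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
weights = update_belief(particles, weights, ranking)

# The posterior mean shifts toward omegas that prefer the top-ranked item.
omega_est = [sum(p[j] * w for p, w in zip(particles, weights)) for j in range(2)]
print(omega_est)
```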
The state-of-the-art technique for generating the set of trajectories Q that a user ranks is to maximize
an information gain objective as described by Bıyık et al. (Biyik, Palan, et al. 2020):
Q = argmax_{Q = {ξ_0, ξ_1, ..., ξ_N}} [ H(ω | Q) − E_{ξ∈Q} H(ω | ξ, Q) ]    (10.3)
where H is the Shannon Information Entropy. Maximizing this objective to form Q results in a set of
trajectories that maximally update the distribution over ω when receiving the user’s feedback. In addition,
these trajectories are distinct from each other, enabling the user to easily differentiate among them. For
linear reward functions, this implies that trajectories have large distances from each other in feature space.
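The objective in Equation 10.3 is a mutual information between the user's choice and ω, and it can be estimated by Monte Carlo over samples from the current belief, in the spirit of Bıyık et al. The toy queries and rationality parameter below are illustrative; they show the intuition that a spread-out query is scored as more informative than a near-duplicate one.

```python
import math
import random

def choice_probs(beta, omega, query):
    """Bradley-Terry selection probabilities over a query (Eq. 10.1)."""
    scores = [math.exp(beta * sum(w * x for w, x in zip(omega, f)))
              for f in query]
    z = sum(scores)
    return [s / z for s in scores]

def info_gain(query, omega_samples, beta=5.0):
    """Monte Carlo estimate of the mutual information between the user's
    choice and omega (the objective in Eq. 10.3), using samples from the
    current belief over omega."""
    M = len(omega_samples)
    probs = [choice_probs(beta, om, query) for om in omega_samples]
    ig = 0.0
    for i in range(len(query)):
        marginal = sum(p[i] for p in probs) / M   # p(choice = i) under belief
        for p in probs:
            ig += (p[i] / M) * math.log2(p[i] / marginal)
    return ig

random.seed(1)
omegas = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(100)]

# A spread-out query is more informative than a near-duplicate one.
spread = [(1.0, 0.0), (0.0, 1.0), (-1.0, -1.0)]
similar = [(1.0, 0.0), (0.99, 0.01), (0.98, 0.02)]
print(info_gain(spread, omegas), info_gain(similar, omegas))
```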
10.3.3 Covariance Matrix Adaptation Evolution Strategies (CMA-ES)
Evolution strategies (ES) are a large family of algorithms that focus on solving continuous, black-box, mainly
experimental optimization problems. ES algorithms sample a population of solutions for each generation,
and move this sampled population toward solutions with more optimal fitness values generation by generation (Back, Hammel, and Schwefel 1997). Covariance Matrix Adaptation Evolution Strategies (CMA-ES)
was proposed to reduce the number of generations needed to converge to optimal solutions and to improve
the noise tolerance of the optimization process (Hansen, Müller, and Koumoutsakos 2003). Compared to
other ES methods, CMA-ES has been evaluated as one of the most competitive derivative-free optimization algorithms for continuous spaces (Hansen, Auger, et al. 2010). More specifically, CMA-ES samples its population from an underlying multivariate normal distribution N(m, C), where m ∈ R^d is the distribution mean and C ∈ R^{d×d} is the symmetric, positive-definite covariance matrix of the distribution (Hansen 2016). At each step of CMA-ES, we sample trajectories from this distribution and return the ranked values to the CMA-ES optimizer. The optimizer updates m and C using the CMA-ES update algorithm to move the distribution toward higher-valued areas of the trajectory feature space.
In practice, CMA-ES samples narrower and higher performing regions of the trajectory feature space
after each iteration. This results in an increasing average reward for the trajectories that the user is asked
to rank, increasing the user’s perception that the system is actually learning their preferences. However,
because it samples a normal distribution, it often presents users with trajectories that are too similar to
each other for a user to easily differentiate between them.
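The sample-rank-update loop can be sketched as follows. This is a deliberately simplified evolution strategy, not full CMA-ES: it adapts only a per-dimension step size (with a small floor to keep exploration alive), whereas CMA-ES adapts a full covariance matrix with evolution paths. The "user reward" is a hypothetical quadratic preference.

```python
import math
import random

random.seed(2)

def es_step(mean, sigma, fitness, pop_size=20, elite_frac=0.25):
    """One generation of a simplified evolution strategy in the spirit of
    CMA-ES: sample a population from a normal distribution, rank it, and
    move the distribution toward the top-ranked samples."""
    d = len(mean)
    pop = [[random.gauss(mean[j], sigma[j]) for j in range(d)]
           for _ in range(pop_size)]
    pop.sort(key=fitness, reverse=True)       # ranking stands in for user feedback
    elite = pop[: max(1, int(pop_size * elite_frac))]
    new_mean = [sum(x[j] for x in elite) / len(elite) for j in range(d)]
    # Per-dimension step size from the elite spread; a small floor keeps
    # exploration alive (full CMA-ES instead adapts a full covariance).
    new_sigma = [
        max(0.02, math.sqrt(sum((x[j] - new_mean[j]) ** 2 for x in elite) / len(elite)))
        for j in range(d)
    ]
    return new_mean, new_sigma

# Hypothetical "user reward": prefer trajectory features near (1, -1).
target = (1.0, -1.0)
def user_reward(f):
    return -sum((a - b) ** 2 for a, b in zip(f, target))

mean, sigma = [0.0, 0.0], [1.0, 1.0]
for _ in range(40):
    mean, sigma = es_step(mean, sigma, user_reward)
print(mean)   # the sampling distribution concentrates on the preferred region
```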
10.3.4 Combining Information Gain and CMA-ES
We propose a new algorithm, Covariance Matrix Adaptation Evolution Strategies with Information Gain
(CMA-ES-IG), for efficiently learning user preferences. CMA-ES-IG leverages the benefits of both of the
previous strategies: it uses the information gain objective to generate sets of trajectories that are easy to
rank, and it uses the adaptive sampling mechanism from CMA-ES to increase the average user reward
of the proposed trajectories over time. We summarize CMA-ES-IG in Algorithm 5, and provide a visual
intuition of these three algorithms in Fig. 10.2.
First, we initialize CMA-ES-IG identically to CMA-ES, and we initialize our belief over user preferences to a uniform distribution. We then sample trajectory features from the CMA-ES-IG mean and covariance to create a candidate set of trajectory features, D. Next, we find the |Q| samples from D that maximize the expected information gain, as described in Equation 10.3. While finding the exact solution to this optimization problem
is exponential in |Q|, an efficient approximation is to find |Q| medoids in the set of samples (Biyik and
Sadigh 2018). We adopt this approximation to allow CMA-ES-IG to be computationally tractable.
Algorithm 5 CMA-ES-IG
1: Input: dataset of robot trajectories D, trajectory feature function Φ, number of items to ask the user |Q|, prior belief over user preferences b_0
2: Initialize the CMA-ES optimizer with mean µ and covariance C
3: while user not done do
4:   S ← samples from N(µ, C)
5:   Ω ← samples from b_t
6:   Q ← argmax_{Q = {q_1, ..., q_|Q|} ⊆ S} H(ω | Q) − E_{ξ∈Q} H(ω | ξ, Q), estimated with the belief samples Ω
7:   R ← user's ranked responses
8:   b_{t+1} ∝ b_t Π_{i=0}^{N} p(ξ_i | Q_i)
9:   Update µ, C according to the CMA-ES update (Hansen, Müller, and Koumoutsakos 2003)
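The medoid approximation can be sketched as a small k-medoids routine over features sampled from the current CMA-ES distribution. The sampling distribution, dimensionality, and cluster settings below are illustrative.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_medoids(points, k, iters=10, seed=0):
    """Simple k-medoids: pick k representative, mutually distant samples.
    Used as a tractable stand-in for exact information-gain maximization
    over all size-|Q| subsets."""
    rng = random.Random(seed)
    medoids = rng.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest medoid.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: dist2(p, medoids[j]))
            clusters[i].append(p)
        # Re-pick each medoid as the cluster point minimizing total distance.
        for i, cl in enumerate(clusters):
            if cl:
                medoids[i] = min(cl, key=lambda c: sum(dist2(c, p) for p in cl))
    return medoids

rng = random.Random(3)
# Hypothetical trajectory features sampled from the CMA-ES distribution.
samples = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
query = k_medoids(samples, k=3)
print(query)   # three spread-out candidates to present to the user
```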
Figure 10.2: Example queries generated from an early step of each algorithm. The large circle represents the
space of all trajectories with lighter areas representing higher reward, light blue arrows representing the
user’s true preference, dark blue arrows representing the current estimate of the user’s preference, orange
circles representing sampled trajectories to present to the user, and green dotted regions representing the
sampling distribution from the current step of the CMA-ES optimizer. Information gain results in easy
to differentiate queries, CMA-ES results in higher rewards on average, and CMA-ES-IG results in higher
rewards that are easy to differentiate.
10.4 Data Collection
10.4.1 Experimental Validation: Simulated User Rankings
To validate our algorithm before presenting it to users, we performed an algorithmic analysis by simulating
user preferences as in previous work (Sadigh et al. 2017; Biyik, Palan, et al. 2020). We used the parameter
estimation task as described by Fitzgerald et al. (Fitzgerald et al. 2023), which samples a ground truth
weight vector ω* from a d-dimensional space. To ensure that each ω* is comparable, we projected each to the unit ball (Sadigh et al. 2017). We then generated queries of four trajectories using our three algorithms (Infogain, CMA-ES, and CMA-ES-IG) for a simulated user to rank. We used the distribution described by the Bradley-Terry preference model to perform these rankings, given the ground truth vector of the simulated user. We updated the distribution over the weight vector using Equation 10.2. We simulated 100 users
performing 30 rankings for each of the three algorithms.
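The simulated-user protocol above can be sketched as follows: sample and project a ground-truth ω*, then rank each query by repeated Bradley-Terry selections without replacement. The dimension and rationality parameter below are illustrative choices.

```python
import math
import random

def unit_ball_projection(omega):
    """Project a sampled weight vector into the unit ball so that
    ground-truth preferences are comparable across simulated users."""
    norm = math.sqrt(sum(w * w for w in omega))
    return [w / norm for w in omega] if norm > 1 else list(omega)

def simulated_ranking(rng, omega_true, query, beta=10.0):
    """Rank a query by repeatedly sampling Bradley-Terry selections
    without replacement; returns trajectory indices, worst to best."""
    remaining = list(range(len(query)))
    best_to_worst = []
    while remaining:
        scores = [math.exp(beta * sum(w * x for w, x in zip(omega_true, query[i])))
                  for i in remaining]
        z = sum(scores)
        pick = rng.choices(remaining, weights=[s / z for s in scores])[0]
        best_to_worst.append(pick)
        remaining.remove(pick)
    return best_to_worst[::-1]

rng = random.Random(4)
omega_true = unit_ball_projection([rng.gauss(0, 1) for _ in range(4)])
query = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(4)]
print(simulated_ranking(rng, omega_true, query))
```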
We are interested in two evaluation metrics: alignment, which measures how well the estimated preference matches the true preference, and quality, which measures the overall reward of the trajectories
Table 10.1: Quantitative Results. We report the area under the curve (AUC) for alignment of the learned reward and quality of the query across d-dimensional feature spaces.

            |     Alignment (↑)     |      Regret (↓)       |      Quality (↑)
            | d = 8  d = 16  d = 32 | d = 8  d = 16  d = 32 | d = 8  d = 16  d = 32
IG          | .848   .606    .374   | .331   1.243   2.115  | -.001  -.003   .003
CMA-ES      | .834   .691    .488   | .479   .995    1.876  | .688   .601    .450
CMA-ES-IG   | .828   .717    .517   | .393   .759    1.453  | .746   .673    .527
presented to the user. We define alignment as the cosine similarity between the estimated ω_est and the ground truth ω*, as in previous works (Fitzgerald et al. 2023; Sadigh et al. 2017). We define quality as the average reward of the trajectories in the query, (1/|Q|) Σ_{i=0}^{|Q|} ω* · Φ(ξ_i). We measure these values after each simulated query to generate curves that show how each metric increases with repeated querying, as shown in Fig. 10.3. To compare between curves, we used the area under the curve (AUC) metric, which provides values between -1 and 1, with 1 being the best. We show results for alignment and quality across feature spaces of d ∈ {8, 16, 32} in Table 10.1. For completeness, we also report regret as a secondary metric for alignment in Table 10.1, where regret = ω* · Φ(ξ*) − ω* · Φ(ξ′); ξ* denotes the trajectory with the highest reward in Ξ under ω*, and ξ′ denotes the trajectory with the highest reward in Ξ under ω_est.
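The three metrics can be sketched directly from their definitions; the weight vectors and feature sets below are toy values for illustration.

```python
import math

def alignment(omega_est, omega_true):
    """Cosine similarity between estimated and true preference weights."""
    dot = sum(a * b for a, b in zip(omega_est, omega_true))
    na = math.sqrt(sum(a * a for a in omega_est))
    nb = math.sqrt(sum(b * b for b in omega_true))
    return dot / (na * nb)

def quality(omega_true, query_features):
    """Average true reward of the trajectory features shown in a query."""
    return sum(sum(w * f for w, f in zip(omega_true, feats))
               for feats in query_features) / len(query_features)

def regret(omega_true, omega_est, all_features):
    """Gap between the best trajectory under the true reward and the
    trajectory the current estimate would pick."""
    def r(om, f):
        return sum(w * x for w, x in zip(om, f))
    best_true = max(all_features, key=lambda f: r(omega_true, f))
    best_est = max(all_features, key=lambda f: r(omega_est, f))
    return r(omega_true, best_true) - r(omega_true, best_est)

omega_true = [1.0, 0.0]
omega_est = [0.8, 0.6]
feats = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(alignment(omega_est, omega_true))
print(quality(omega_true, feats))
print(regret(omega_true, omega_est, feats))
```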
We compared the three algorithms: Information Gain (IG), CMA-ES, and CMA-ES-IG. For the alignment metric, we saw that all three methods performed similarly after thirty queries, except for high-dimensional feature spaces. However, for the quality metric, we found that CMA-ES-IG outperformed
CMA-ES and both evolutionary strategies vastly outperformed the IG algorithm. This indicates that all
algorithms can learn user preferences, but only CMA-ES-IG and CMA-ES are effective at presenting robot
behaviors that appeal to the user through interaction.
10.4.2 Experimental Setup
To evaluate user perception of the different methods for learning preferences, we conducted a within-subjects user study comparing participant perceptions of each method. In the physical domain, participants specified preferences for how a JACO robot arm hands them
Figure 10.3: Comparison of simulation results for learning user preferences. Shaded regions indicate standard error. We found that all methods were able to learn user preferences across varying dimensions. The
quality of the trajectories in the query increases only for CMA-ES and CMA-ES-IG, with CMA-ES-IG performing significantly better.
a marker, cup, or spoon. In the social domain, participants specified preferences for how a Blossom robot
performed affective gestures to communicate happiness, sadness, and anger.
We developed a framework, shown in Fig. 10.4, to evaluate the different techniques for learning participant preferences. It consists of three learned components: trajectory representations, query sampling,
and preference learning models.
To generate trajectory representations, we used a dataset of 1000 robot handover trajectories and 1500
Blossom gesture trajectories. Trajectories consisted of a sequence 50 joint states that were equally spaced
in time. The Blossom robot has joint states in R
4
, and trajectories were played by sending servo commands
to each of the four servos that control the Blossom at a rate of 10Hz. The JACO arm has joint states in
R
6
so trajectories were played by fitting a b-spline to the those joint states and sending the goal joint
states to an impedance controller at a rate of approximately 50Hz. The JACO arm trajectories were scaled
to 9 seconds in duration. We generated trajectories for Blossom and the JACO arm by sampling from
Figure 10.4: The framework for learning user preferences. We learned nonlinear features for sets of robot
trajectories. The query sampler produced sets of trajectories for the user to rank and those rankings were
used to update the estimate of the user’s preferences.
Figure 10.5: User study setup. Users interacted with the robots through the ranking interface to specify
their preferences for how the Blossom robot used gestures to signal different affective states and how the
JACO robot arm handed them different items.
demonstrations; however, other techniques can be used to create the datasets, such as quality diversity,
reinforcement learning, or planning approaches.
From the generated trajectories, we used an autoencoder (AE) to generate nonlinear features for the trajectories in the social and physical domains of our study, an approach shown to be effective in prior work (Brown, Coleman, et al. 2020). The AEs consisted of three convolutional layers and two fully connected layers with leaky ReLU activations after each layer. Hyperparameters were tuned to minimize the reconstruction loss over all trajectories in the dataset. The feature space of Blossom's trajectories was six-dimensional, and the feature space of the JACO arm trajectories was four-dimensional.
The query sampling component generated the set of trajectories to show to the participant. At that
point we performed our experimental intervention. Participants were randomly presented with either the
IG, CMA-ES, or CMA-ES-IG methods of generating sets of three robot behaviors to rank. We then identified
the closest trajectories in the dataset to the sampled trajectory features to present to the participant. All
methods took less than one second to calculate the set of presented behaviors. The participant ranked
these three trajectories using the interface shown in Fig. 10.5.
The interface presented trajectories to the participant, and the participant clicked on boxes to view
each trajectory suggested by the algorithms. The participant dragged the boxes to the ranking area, placing
the lowest-ranked trajectory in the leftmost box and the highest-ranked trajectory in the rightmost box.
If a participant found a trajectory particularly aligned with their preferences, they could place it in the
“Favorite” trajectory box. The trajectory in this box was saved for the participant to refer to for the entire
interaction and could be replaced if they found a better trajectory. This box was informed by a pilot study
where participants requested a way to refer to trajectories they saw previously. Once the participant
ranked the three options, they selected the submit button to update the model of their preferences. At any
time, they could also press the “View Predicted Best” button to view the trajectory that maximized their
estimated reward.
Before each task, the preference learning algorithm was initialized to a uniform distribution. The
distribution was updated according to Equation 10.2 after each ranking. We used the Bradley-Terry model
to represent the way the user ranked these trajectories; however, other models can be used depending on
the inputs available to the participant. For example, previous work has proposed choice models that allow
users to rank options as “approximately equal” (Biyik, Palan, et al. 2020). After the participant provides
their feedback and the model is updated, the query sampling process begins again with a new estimate of
the participant’s preferences, and the cycle repeats until the participant is satisfied.
Table 10.2: The Likert scale items that participants answered for our two metrics: perceived ease of use and perceived behavioral adaptation.

Perceived Ease of Use (Venkatesh and Davis 2000):
1. Teaching the robot is clear and understandable.
2. Teaching the robot does not require a lot of mental effort.
3. I find the robot easy to teach.
4. I find it easy to get the robot to do what I want it to do.

Perceived Behavioral Adaptation (Lee, Park, and Song 2005):
1. The robot has developed its skills over time because of my interaction with it.
2. The robot’s behavior has changed over time because of my interaction with it.
3. The robot has become more competent over time because of my interaction with it.
4. The robot’s intelligence has developed over time because of my interaction with it.
10.4.3 User Study Details
We performed a within-subjects user study to identify the differences in user experience between the proposed algorithms across two domains. In particular, we were interested in two key factors that determine
the actual use of systems: ease of use (Venkatesh and Davis 2000) (EOU), and perceived behavioral adaptation (Lee, Park, and Song 2005) (BA). Perceived ease of use measures how easily participants are able
to get the robot to do what they want, and behavioral adaptation measures how much the users perceive
the robot as changing in response to their inputs. Specific Likert scale items in our study are listed in Table 10.2. Users rated these metrics on a 9-point Likert scale with 0 corresponding to strongly disagree and
8 corresponding to strongly agree. We averaged across the questions for each of these factors to calculate
our evaluation metric.
Our study procedure was approved by the University of Southern California IRB under #UP-24-00600
and proceeded as follows: first, the participant was greeted by the experimenter and randomly assigned to
specify their preferences for either the physical or social robot interaction. Next, the participant specified
their preferences using a randomized and counter-balanced algorithm for the first task for their assigned
Figure 10.6: Ease of Use results. CMA-ES-IG was rated significantly easier to use than CMA-ES, and empirically easier than IG.
domain–a marker handover in the physical domain, or a happy gesture in the social domain. After five
minutes, the participant rated the algorithm’s EOU and BA. Next, the participant was presented with the
next randomized algorithm and completed the second task for their assigned domain–a cup handover in
the physical domain, or sad gesture for the social domain. Participants then rated the EOU and BA of the
algorithm. The participant then interacted with the final randomized algorithm in their assigned domain–
a spoon handover in the physical domain, or an angry gesture for the social domain. The user rated the
EOU and BA of the final algorithm. The user then ranked the three algorithms against each other to specify
their overall preferences. The participant then completed the same process for the other domain.
10.5 Results
10.5.1 Participants
All participants were compensated with a 10 USD Amazon giftcard sent to their email. We recruited 14
participants; they were aged 19-32 (Median = 24, SD = 4.5) and comprised 6 women, 5 men, and 3 nonbinary
individuals. There were 7 Asian, 1 Black, 3 Hispanic, and 5 White participants (some participants were
more than one ethnicity).
Figure 10.7: Behavioral Adaptation results. CMA-ES-IG was rated as changing the robot’s behavior significantly more over time than both CMA-ES and IG.
10.5.2 Ease of Use
We evaluated the average scores for EOU from a four-item Likert scale; a Cronbach’s alpha of α = .89 indicated good internal consistency. To evaluate
significance we used pairwise non-parametric repeated-measures tests. As shown in Fig. 10.6, we found
that CMA-ES-IG received the highest EOU ratings (M = 5.50), followed by Information Gain (M = 5.13),
and CMA-ES received the lowest rating for EOU (M = 4.87). The difference between ratings for CMA-ES-IG and CMA-ES was significant (W = 5.5, p = .016) with a medium effect size (Hedges’ g = .558).
This indicates that including the information gain objective in CMA-ES-IG indeed makes it easier to use.
10.5.3 Perceived Behavioral Adaptation
We evaluated the average scores for BA from a four-item Likert scale; a Cronbach’s alpha of α = .97 indicated excellent internal consistency. We used
non-parametric repeated-measures pairwise comparisons to assess significance. As shown in Fig. 10.7, we
found that CMA-ES-IG received the highest BA ratings (M = 5.18), followed by CMA-ES (M = 4.69), and
Information Gain received the lowest rating for BA (M = 4.48). CMA-ES-IG was rated as significantly
Figure 10.8: Algorithmic ranking results. CMA-ES-IG was consistently ranked as the most preferred algorithm for teaching robots preferences in our user study.
higher than both CMA-ES (W = 15, p = .033) with a small to medium effect (Hedges’ g = .377), and
IG (W = 5.5, p = .009) with a medium effect size (Hedges’ g = .414). This indicates that participants
observed the largest behavioral change during the teaching process when using CMA-ES-IG to teach the
robots.
10.5.4 Overall Ranking
Finally, we evaluated the ranking of each algorithm based on how the participants ranked the algorithms
against one another. A ranking of 0 indicates that the participant believed that algorithm to be the worst
overall, and a ranking of 2 indicates that the participant believed that algorithm to be the best overall. As
shown in Fig. 10.8, we found that users ranked CMA-ES-IG as the best algorithm on average (M = 1.48),
followed by CMA-ES (M = .89), and IG as the least preferred (M = .63). Using pairwise non-parametric
repeated-measures tests, we found that CMA-ES-IG was consistently ranked significantly higher than
CMA-ES (W = 3.0, p = .022) with a large effect size (Hedges’ g = 1.26). CMA-ES-IG was also ranked
higher than IG (W = 0.0, p = .009) with a large effect size (Hedges’ g = 1.75). This indicates that, overall,
participants preferred to use CMA-ES-IG to teach robots through trajectory rankings.
10.6 Summary
This chapter introduced Covariance Matrix Adaptation Evolution Strategy with Information Gain (CMA-ES-IG) as a way to search through behaviors in learned feature spaces, allowing users to more efficiently customize robots’ physical and social behaviors. This algorithm provides a technique to generate a set of robot behaviors that users can rank. The algorithm allows designers to flexibly set the number of behaviors
for the user to rank, and works well for robot behavior representations that are learned from data. The key
benefits of this approach are (1) that this technique generates sets of behaviors that both help the robot
computationally model a user’s preferences and (2) that users interacting with this technique perceive the
robot as better understanding their preferences over time. This chapter concludes the section on social
adaptation, and the next chapter concludes the dissertation.
Chapter 11
Conclusion and Future Directions
This chapter summarizes the contributions of the dissertation. The work presented here describes how
assistive robots can adapt mechanical design, social behaviors, and physical behaviors to align with users’
preferences. We presented user studies and system designs that address these areas in a wider variety of
robots and target populations. We presented possible directions for the future of adaptable and assistive
robotic systems. The dissertation is concluded in Section 11.3. Return to the Table of Contents to navigate
other chapters.
11.1 Contributions
The main contribution of the dissertation is to identify how assistive robots can adapt to non-expert
users. The dissertation presented a framework to understand how robots can adapt their behaviors to
different users through a variety of physical and social interactions, and how the robot’s appearance affects
how this adaptation process occurs. We observed that adaptation can occur without the user’s effort when
the robot learns more about them, as in personalization, or through conscious effort directed by the user,
as in customization. This research has made the following contributions to the understanding of adapting
robot embodiment, physical behaviors, and social behaviors.
11.1.1 Adapting Robot Embodiment
In Chapter 4, we described how robot embodiment affects user expectations. We introduced the idea of design metaphors as a tool to understand user expectations. We created the Metaphors
for Understanding Functional and Social Anticipated Affordances to allow researchers to make sense of
the space of robot designs, and developed robots that are personalized to their intended applications. In
Chapter 5, we described how users can engage in clothing design techniques to customize a robot’s perceived embodiment. We showed through a series of user studies that clothing design choices modify user
expectations of a robot.
11.1.2 Adapting Physical Behavior
In Chapter 6, we proposed the Bimanual Arm Reaching Test with a Robot (BARTR) to assess post-stroke
users’ arm movement. We developed a novel metric from the personalized movement models this system
learns that helps physical therapists assess a post-stroke user’s recovery process. In Chapter 7, we extended
the functionality of a robot arm to hair combing, an important activity of daily living. We created a motion
planning algorithm to generate hair combing paths that allow users to customize robot hair care to their
own preferences.
11.1.3 Adapting Social Behavior
In Chapter 8, we created a system to encourage exercise for youths with cerebral palsy. We described an
algorithm to identify personal preferences in social feedback, and showed that personalized user models
enable robots to take social actions that encourage higher engagement compared to one-size-fits-all approaches. In Chapter 9, we created a tool to allow users to author custom social signals. We observed
common patterns in how users explored interesting behaviors, and formulated a learning objective to leverage the data collected from this interaction. In Chapter 10, we created an algorithm that allows users to
efficiently explore robot behaviors to quickly customize robots.
11.2 Future Directions for Adapting Assistive Robots
The dissertation presented several findings that enable robots to adapt to their users; however, this line
of inquiry is nascent. Robots are still not often used in and around humans. Future work is necessary to
understand how robots can help diverse users in diverse tasks.
Directions for Embodiment Research. The work presented in the dissertation discussed embodiment as it pertains to robots that are created by industry and research groups. These robot development
processes are often long, and require the collaboration of many people of a variety of backgrounds. As such,
the space of robots that we can sample is much smaller than the space of all possible robots. Future work
can address the problem of simplifying robot design processes. For example, 3D printing has become much
more accessible to users, and can facilitate the creation of personal robot morphologies (Shi, O’Connell,
et al. 2024). By democratizing robot design, the space of robots can greatly expand. This additionally supports the goals of the field of robot learning, where significant effort is spent on learning cross-embodiment policies (Zakka et al. 2022). These cross-embodiment policies aim to allow robots with different morphologies to have a shared understanding of the physical world. By expanding the types of robots that can be
used within the physical world, these machine learning models can have a more complete understanding
of the physical world.
Directions for Robot Interface Research. The work presented in the dissertation found that meaningful user data can be collected across a variety of interactions. Each interaction that a roboticist develops
allows a new type of data to be collected. Future research can explore how novel robot interactions can
generate data that can be used as a communication channel to adapt robots. In particular, leveraging interactions that users naturally engage in based on a robot’s design can provide easy-to-collect and informative
signals to adapt robots.
Directions for Computational Research. The work presented in the dissertation was an initial
exploration into combining social and physical assistance. While these works involved social and physical
interactions, they were largely adapted separately. Future work can explore how physical and social robot
behaviors can be jointly adapted to users. For example, users performing rehabilitation exercises that are
becoming fatigued may require different social support than users that are just starting. Additionally,
there are many opportunities for research in long-term adaptation; for example, the physical and social
assistance that the robot provides should evolve over time as users make progress in their rehabilitation
program. Future work can conduct these long-term interactions to understand how robots may adapt to
changing user models.
11.3 Final Words
To meaningfully assist users, robots must be aware of a user’s individual differences. Historically, science
treated individual differences as noise. Now, we encourage roboticists to see these differences not as
noise that obfuscates understanding, but rather as an opportunity to holistically understand the world and
the many people living in it.
Bibliography
Abbeel, Pieter and Andrew Y Ng (2004). “Apprenticeship learning via inverse reinforcement learning”. In:
21st International Conference on Machine Learning. ICML ’04. Banff, Canada: Association for
Computing Machinery. doi: https://doi.org/10.1145/1015330.1015430.
Abubshait, Abdulaziz and Agnieszka Wykowska (2020). “Repetitive robot behavior impacts perception of
intentionality and gaze-related attentional orienting”. In: Frontiers in Robotics and AI 7. doi:
https://doi.org/10.3389/frobt.2020.565825.
Adamson, Timothy, Debasmita Ghose, Shannon C Yasuda, Lucas Jehu Silva Shepard,
Michal A Lewkowicz, Joyce Duan, and Brian Scassellati (2021). “Why we should build robots that
both teach and learn”. In: 2021 ACM/IEEE International Conference on Human-Robot Interaction
(Mar. 8–11, 2021). HRI ’21. Boulder, CO, USA: Association for Computing Machinery, pp. 187–196.
doi: https://doi.org/10.1145/3434073.3444647.
Admoni, Henny and Brian Scassellati (2017). “Social eye gaze in human-robot interaction: a review”. In:
Journal of Human-Robot Interaction 6.1, pp. 25–63. doi: https://doi.org/10.5898/JHRI.6.1.Admoni.
Allen, Garrett, Benjamin L Peterson, Dhanush kumar Ratakonda, Mostofa Najmus Sakib, Jerry Alan Fails,
Casey Kennington, Katherine Landau Wright, and Maria Soledad Pera (2021). “Engage!: Co-designing
Search Engine Result Pages to Foster Interactions”. In: Proceedings of the 20th Annual ACM Interaction
Design and Children Conference. IDC ’21. Athens, Greece: Association for Computing Machinery,
pp. 583–587. doi: https://doi.org/10.1145/3459990.3465183.
Anderson, Steph M. (2020). “Gender Matters: The Perceived Role of Gender Expression in Discrimination
Against Cisgender and Transgender LGBQ Individuals”. In: Psychology of Women Quarterly 44.3,
pp. 323–341. doi: https://doi.org/10.1177/0361684320929354.
Andrews, Keith and Jean Stewart (1979). “Stroke recovery: He can but does he?” In: Rheumatology 18.1,
pp. 43–48. doi: https://doi.org/10.1093/rheumatology/18.1.43.
Andriella, A., C. Torras, and G. Alenyà (2019). “Learning Robot Policies Using a High-Level Abstraction
Persona-Behaviour Simulator”. In: 28th International Conference on Robot and Human Interactive
Communication. RO-MAN ’19. New Delhi, India: IEEE, pp. 1–8. doi:
10.1109/RO-MAN46459.2019.8956357.
Andrist, Sean, Bilge Mutlu, and Adriana Tapus (2015). “Look like me: matching robot personality via gaze
to increase motivation”. In: Proceedings of the 33rd Conference on Human Factors in Computing
Systems. CHI ’15. ACM, pp. 3603–3612. doi: https://doi.org/10.1145/2702123.270259.
Arulkumaran, Kai, Marc Peter Deisenroth, Miles Brundage, and Anil Anthony Bharath (2017). “Deep
reinforcement learning: A brief survey”. In: IEEE Signal Processing Magazine 34.6, pp. 26–38. doi:
https://doi.org/10.1109/MSP.2017.2743240.
Axinesis (July 2023). url: https://www.axinesis.com/en/our-solutions/reaplan/.
Back, Thomas, Ulrich Hammel, and H-P Schwefel (1997). “Evolutionary computation: Comments on the
history and current state”. In: IEEE transactions on Evolutionary Computation 1.1, pp. 3–17. doi:
https://doi.org/10.1109/4235.585888.
Bajcsy, Andrea, Dylan P. Losey, Marcia K. O’Malley, and Anca D. Dragan (2017). “Learning Robot
Objectives from Physical Human Interaction”. In: Proceedings of the 1st Annual Conference on Robot
Learning. Vol. 78. CoRL ’17. Mountain View, CA, USA: PMLR, pp. 217–226. url:
https://proceedings.mlr.press/v78/bajcsy17a.html.
Bangor, Aaron, Philip T Kortum, and James T Miller (2008). “An empirical evaluation of the system
usability scale”. In: Intl. Journal of Human–Computer Interaction 24.6, pp. 574–594. doi:
https://doi.org/10.1080/10447310802205776.
Bardzell, Shaowen (2010). “Feminist HCI: taking stock and outlining an agenda for design”. In:
Proceedings of the Conference on Human Factors in Computing Systems. CHI ’10. ACM SIGCHI,
pp. 1301–1310. doi: https://doi.org/10.1145/1753326.1753521.
Barthes, Roland (1990). The fashion system. University of California Press. isbn: 0520071778.
Bartneck, Christoph, Dana Kulić, Elizabeth Croft, and Susana Zoghbi (2009). “Measurement instruments
for the anthropomorphism, animacy, likeability, perceived intelligence, and perceived safety of
robots”. In: International journal of social robotics 1.1, pp. 71–81. doi:
https://doi.org/10.1007/s12369-008-0001-3.
Baxter, Stacey M, Jasmina Ilicic, and Alicia Kulczynski (2018). “Roses are red, violets are blue,
sophisticated brands have a Tiffany Hue: The effect of iconic brand color priming on brand
personality judgments”. In: Journal of Brand Management 25, pp. 384–394. doi:
https://doi.org/10.1057/s41262-017-0086-9.
Bem, Sandra L (1981). “Bem sex role inventory”. In: Journal of Personality and Social Psychology. doi:
https://doi.org/10.1037/t00748-000.
Bernotat, Jasmin, Friederike Eyssel, and Janik Sachse (2021). “The (fe) male robot: how robot body shape
impacts first impressions and trust towards robots”. In: International Journal of Social Robotics 13,
pp. 477–489. doi: https://doi.org/10.1007/s12369-019-00562-7.
Bhattacharjee, Tapomayukh, Ethan K Gordon, Rosario Scalise, Maria E Cabrera, Anat Caspi,
Maya Cakmak, and Siddhartha S Srinivasa (2020). “Is More Autonomy Always Better? Exploring
Preferences of Users with Mobility Impairments in Robot-assisted Feeding”. In: Proceedings of the
2020 ACM/IEEE International Conference on Human-Robot Interaction. HRI ’20. Cambridge, United
Kingdom, pp. 181–190. doi: https://doi.org/10.1145/3319502.3374818.
Bionik (Sept. 2022). InMotion® arm/hand. url: https://bioniklabs.com/inmotion-arm-hand/.
Biyik, Erdem, Nicolas Huynh, Mykel J Kochenderfer, and Dorsa Sadigh (2024). “Active preference-based
Gaussian process regression for reward learning and optimization”. In: The International Journal of
Robotics Research 43.5, pp. 665–684. doi: https://doi.org/10.1177/027836492312087.
Biyik, Erdem, Malayandi Palan, Nicholas C. Landolfi, Dylan P. Losey, and Dorsa Sadigh (2020). “Asking
Easy Questions: A User-Friendly Approach to Active Reward Learning”. In: Proceedings of the
Conference on Robot Learning. Vol. 100. CoRL ’20. Virtual Conference: PMLR, pp. 1177–1190. url:
https://proceedings.mlr.press/v100/b-iy-ik20a.html.
Biyik, Erdem and Dorsa Sadigh (2018). “Batch active preference-based learning of reward functions”. In:
Proceedings of The 2nd Conference on Robot Learning. CoRL ’18. PMLR, pp. 519–528. url:
https://proceedings.mlr.press/v87/biyik18a.html.
Blackwell, Angela Glover (2017). “The curb-cut effect”. In: Stanford Social Innovation Review 15.1,
pp. 28–33. url: https://ssir.org/articles/entry/the_curb_cut_effect.
Bobu, Andreea, Yi Liu, Rohin Shah, Daniel S. Brown, and Anca D. Dragan (2023). “SIRL: Similarity-based
Implicit Representation Learning”. In: Proceedings of the International Conference on Human-Robot
Interaction. HRI ’23. Stockholm, Sweden: ACM, pp. 565–574. doi:
https://doi.org/10.1145/3568162.3576989.
Bobu, Andreea, Andi Peng, Pulkit Agrawal, Julie A Shah, and Anca D. Dragan (2024). “Aligning Human
and Robot Representations”. In: Proceedings of the 2024 ACM/IEEE International Conference on
Human-Robot Interaction. HRI ’24. Boulder, CO, USA: ACM, pp. 42–54. doi:
https://doi.org/10.1145/3610977.3634987.
Bouzida, Anya, Alyssa Kubota, Dagoberto Cruz-Sandoval, Elizabeth W. Twamley, and Laurel D. Riek
(2024). “CARMEN: A Cognitively Assistive Robot for Personalized Neurorehabilitation at Home”. In:
Proceedings of the 2024 ACM/IEEE International Conference on Human-Robot Interaction. HRI ’24.
Boulder, CO, USA: ACM, pp. 55–64. doi: https://doi.org/10.1145/3610977.3634971.
Bradley, Ralph Allan and Milton E Terry (1952). “Rank analysis of incomplete block designs: I. The
method of paired comparisons”. In: Biometrika 39.3/4, pp. 324–345. doi:
https://doi.org/10.2307/2334029.
Brisben, A, C Safos, A Lockerd, J Vice, and C Lathan (2005). “The cosmobot system: Evaluating its
usability in therapy sessions with children diagnosed with cerebral palsy”. In: Proceedings of the
International Symposium on Robot and Human Interactive Communication. RO-MAN ’05. IEEE. url:
https://web.mit.edu/zoz/Public/AnthroTronix-ROMAN2005.pdf.
Brooke, John (1996). “SUS: A ‘quick and dirty’ usability scale”. In: Usability evaluation in industry 189.3,
pp. 189–194. doi: https://doi.org/10.1201/9781498710411-35.
Brown, Daniel, Russell Coleman, Ravi Srinivasan, and Scott Niekum (2020). “Safe Imitation Learning via
Fast Bayesian Reward Inference from Preferences”. In: Proceedings of the 37th International Conference
on Machine Learning. Vol. 119. ICML ’20. PMLR, pp. 1165–1177. url:
https://proceedings.mlr.press/v119/brown20a.html.
Brown, Daniel, Wonjoon Goo, Prabhat Nagarajan, and Scott Niekum (2019). “Extrapolating Beyond
Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations”. In: Proceedings
of the 36th International Conference on Machine Learning. Vol. 97. ICML ’19. PMLR, pp. 783–792. url:
https://proceedings.mlr.press/v97/brown19a.html.
Brown, Daniel S., Wonjoon Goo, and Scott Niekum (2020). “Better-than-Demonstrator Imitation Learning
via Automatically-Ranked Demonstrations”. In: Proceedings of the Conference on Robot Learning.
Vol. 100. CoRL ’20. PMLR, pp. 330–359. url: https://proceedings.mlr.press/v100/brown20a.html.
Browne, Cameron B, Edward Powley, Daniel Whitehouse, Simon M Lucas, Peter I Cowling,
Philipp Rohlfshagen, Stephen Tavener, Diego Perez, Spyridon Samothrakis, and Simon Colton (2012).
“A survey of monte carlo tree search methods”. In: IEEE Transactions on Computational Intelligence
and AI in games 4.1, pp. 1–43. doi: https://doi.org/10.1109/TCIAIG.2012.2186810.
Bryant, De’Aira, Jason Borenstein, and Ayanna Howard (2020). “Why Should We Gender? The Effect of
Robot Gendering and Occupational Stereotypes on Human Trust and Perceived Competency”. In:
Proceedings of the International Conference on Human-Robot Interaction. HRI ’20. Cambridge, United
Kingdom: ACM/IEEE, pp. 13–21. isbn: 9781450367462. doi: https://doi.org/10.1145/3319502.3374778.
Buitrago, Jaime Alberto, Ana Marcela Bolaños, and Eduardo Caicedo Bravo (2020). “A motor learning
therapeutic intervention for a child with cerebral palsy through a social assistive robot”. In: Disability
and Rehabilitation: Assistive Technology 15.3, pp. 357–362. doi:
https://doi.org/10.1080/17483107.2019.1578999.
Buolamwini, Joy and Timnit Gebru (2018). “Gender Shades: Intersectional Accuracy Disparities in
Commercial Gender Classification”. In: Proceedings of the 1st Conference on Fairness, Accountability
and Transparency. Vol. 81. FAccT ’18. PMLR, pp. 77–91. url:
https://proceedings.mlr.press/v81/buolamwini18a.html.
Butler, Judith (2002). Gender trouble. Routledge. doi: https://doi.org/10.4324/9780203902752.
Buxbaum, Laurel J, Rini Varghese, Harrison Stoll, and Carolee J Winstein (2020). “Predictors of arm
nonuse in chronic stroke: a preliminary investigation”. In: Neurorehabilitation and Neural Repair 34.6,
pp. 512–522. doi: https://doi.org/10.1177/1545968320913554.
Cai, Siqi, Guofeng Li, Xiaoya Zhang, Shuangyuan Huang, Haiqing Zheng, Ke Ma, and Longhan Xie
(2019). “Detecting compensatory movements of stroke survivors using pressure distribution data and
machine learning algorithms”. In: Journal of Neuroengineering and Rehabilitation 16.1, pp. 1–11. doi:
https://doi.org/10.1186/s12984-019-0609-6.
Cambre, Julia, Jessica Colnago, Jim Maddock, Janice Tsai, and Jofish Kaye (2020). “Choice of Voices: A
Large-Scale Evaluation of Text-to-Speech Voice Quality for Long-Form Content”. In: Proceedings of
the Conference on Human Factors in Computing Systems. CHI ’20. Honolulu, HI, USA: ACM, pp. 1–13.
doi: https://doi.org/10.1145/3313831.3376789.
Cambre, Julia and Chinmay Kulkarni (Nov. 2019). “One Voice Fits All? Social Implications and Research
Challenges of Designing Voices for Smart Devices”. In: Proc. ACM Hum.-Comput. Interact. 3.CSCW.
doi: https://doi.org/10.1145/3359325.
Campos-Soria, Juan Antonio, Andrés Marchante-Mera, and Miguel Angel Ropero-García (2011).
“Patterns of occupational segregation by gender in the hospitality industry”. In: International Journal
of Hospitality Management 30.1, pp. 91–102. doi: https://doi.org/10.1016/j.ijhm.2010.07.001.
Candeias, Alexandre, Travers Rhodes, Manuel Marques, Manuela Veloso, et al. (2018). “Vision augmented
robot feeding”. In: Proceedings of the European Conference on Computer Vision (ECCV) Workshops.
ECCV ’18. CVF. url: https://www.cs.cmu.edu/afs/cs/user/mmv/www/papers/18feedbot1.pdf.
Carpinella, Colleen M., Alisa B. Wyman, Michael A. Perez, and Steven J. Stroessner (2017). “The Robotic
Social Attributes Scale (RoSAS): Development and Validation”. In: Proceedings of the 2017 ACM/IEEE
International Conference on Human-Robot Interaction. HRI ’17. Vienna, Austria: Association for
Computing Machinery, pp. 254–262. doi: https://doi.org/10.1145/2909824.3020208.
Carroll, John M, Robert L Mack, and Wendy A Kellogg (1988). “Interface metaphors and user interface
design”. In: Handbook of human-computer interaction. Elsevier, pp. 67–85. doi:
https://doi.org/10.1016/B978-0-444-70536-5.50008-7.
Cartei, Valentina and David Reby (2013). “Effect of formant frequency spacing on perceived gender in
pre-pubertal children’s voices”. In: PLoS One 8.12, e81022. doi:
https://doi.org/10.1371/journal.pone.0081022.
Catalan, J.M, J.V. Garcia, D. Lopez, J. Diez, A. Blanco, L.D. Lledo, F. J. Badesa, A. Ugartemendia, I. Diaz,
R. Neco, and N. Garcia-Aracil (2018). “Patient Evaluation of an Upper-Limb Rehabilitation Robotic
Device for Home Use”. In: 7th International Conference on Biomedical Robotics and Biomechatronics.
Biorob ’18. IEEE, pp. 450–455. doi: https://doi.org/10.1109/BIOROB.2018.8487201.
Celiktutan, Oya, Evangelos Sariyanidi, and Hatice Gunes (2018). “Computational Analysis of Affect,
Personality, and Engagement in Human–Robot Interactions”. In: Computer Vision for Assistive
Healthcare. Elsevier, pp. 283–318. doi: https://doi.org/10.1016/B978-0-12-813445-0.00010-1.
Cha, Elizabeth, Anca D Dragan, and Siddhartha S Srinivasa (2015). “Perceived robot capability”. In: 24th
IEEE International Symposium on Robot and Human Interactive Communication. RO-MAN ’15. Kobe,
Japan: IEEE, pp. 541–548. doi: https://doi.org/10.1109/ROMAN.2015.7333656.
Cha, Elizabeth, Yunkyung Kim, Terrence Fong, Maja J Mataric, et al. (2018). “A survey of nonverbal
signaling methods for non-humanoid robots”. In: Foundations and Trends® in Robotics 6.4,
pp. 211–323. doi: http://dx.doi.org/10.1561/2300000057.
Chai, Menglei, Lvdi Wang, Yanlin Weng, Yizhou Yu, Baining Guo, and Kun Zhou (2012). “Single-view hair
modeling for portrait manipulation”. In: Transactions on Graphics (TOG) 31.4, pp. 1–8. doi:
https://doi.org/10.1145/2185520.2185612.
Chang, Joseph Chee, Nathan Hahn, Adam Perer, and Aniket Kittur (2019). “SearchLens: Composing and
capturing complex user interests for exploratory search”. In: Proceedings of the 24th International
Conference on Intelligent User Interfaces. IUI ’19, pp. 498–509. doi:
https://doi.org/10.1145/3301275.330232.
Chang, Wan-Ling, Jeremy P White, Joohyun Park, Anna Holm, and Selma Šabanović (2012). “The effect
of group size on people’s attitudes and cooperative behaviors toward robots in interactive gameplay”.
In: The 21st IEEE International Symposium on Robot and Human Interactive Communication. RO-MAN
’12. IEEE, pp. 845–850. doi: https://doi.org/10.1109/ROMAN.2012.6343857.
Chapman, Adriane, Elena Simperl, Laura Koesten, George Konstantinidis, Luis-Daniel Ibáñez,
Emilia Kacprzak, and Paul Groth (2020). “Dataset search: a survey”. In: The VLDB Journal 29.1,
pp. 251–272. doi: https://doi.org/10.1007/s00778-019-00564-x.
Charness, Gary and Yan Chen (2020). “Social identity, group behavior, and teams”. In: Annual Review of
Economics 12, pp. 691–713. doi: https://dx.doi.org/10.1146/annurev-economics-091619-032800.
Chen, Shuya, Steven L Wolf, Qin Zhang, Paul A Thompson, and Carolee J Winstein (2012). “Minimal
detectable change of the actual amount of use test and the motor activity log: The EXCITE trial”. In:
Neurorehabilitation and Neural Repair 26.5, pp. 507–514. doi: https://doi.org/10.1177/15459683114250.
Chen, Tong, Hongzhi Yin, Yujia Zheng, Zi Huang, Yang Wang, and Meng Wang (2021). “Learning Elastic
Embeddings for Customizing On-Device Recommenders”. In: Proceedings of the 27th Conference on
Knowledge Discovery & Data Mining. KDD ’21. Virtual Event, Singapore: ACM SIGKDD, pp. 138–147.
doi: https://doi.org/10.1145/3447548.3467220.
Chita-Tegmark, Meia, Monika Lohani, and Matthias Scheutz (2019). “Gender effects in perceptions of
robots and humans with varying emotional intelligence”. In: 2019 14th ACM/IEEE International
Conference on Human-Robot Interaction (HRI). IEEE, pp. 230–238.
Cila, Nazli (2013). “Metaphors we design by: The use of metaphors in product design”. In: url:
https://resolver.tudelft.nl/uuid:b7484b0f-9596-4856-ae9d-97c696f9de79.
Clabaugh, Caitlyn Elise, Kartik Mahajan, Shomik Jain, Roxanna Pakkar, David Becerra, Zhonghao Shi,
Eric Deng, Rhianna Lee, Gisele Ragusa, and Maja Mataric (2019). “Long-term personalization of an
in-home socially assistive robot for children with autism spectrum disorders”. In: Frontiers in Robotics
and AI 6, p. 110. doi: https://doi.org/10.3389/frobt.2019.00110.
Crane, Diana (2012). Fashion and its social agendas: Class, gender, and identity in clothing. University of
Chicago Press. isbn: 0226117995.
Crowell, Charles R, Michael Villano, Matthias Scheutz, and Paul Schermerhorn (2009). “Gendered
voice and robot entities: perceptions and reactions of male and female subjects”. In: 2009 International
Conference on Intelligent Robots and Systems. IROS ’09. IEEE/RSJ, pp. 3735–3741. doi:
https://doi.org/10.1109/IROS.2009.5354204.
Cui, Yuchen, Qiping Zhang, Brad Knox, Alessandro Allievi, Peter Stone, and Scott Niekum (2021). “The
EMPATHIC Framework for Task Learning from Implicit Human Feedback”. In: Proceedings of the 2021
Conference on Robot Learning. Vol. 155. CoRL ’21. PMLR, pp. 604–626. url:
https://proceedings.mlr.press/v155/cui21a.html.
Cupchik, Gerald et al. (2001). “Constructivist realism: An ontology that encompasses positivist and
constructivist approaches to the social sciences”. In: Forum Qualitative Sozialforschung/Forum:
Qualitative Social Research. Vol. 2. 1. doi: https://doi.org/10.17169/fqs-2.1.968.
Dai, Andrew M and Quoc V Le (2015). “Semi-supervised Sequence Learning”. In: Advances in Neural
Information Processing Systems. Ed. by C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett.
Vol. 28. NeurIPS ’15. Curran Associates, Inc. url: https:
//proceedings.neurips.cc/paper_files/paper/2015/file/7137debd45ae4d0ab9aa953017286b20-Paper.pdf.
Darnell, Amy L and Ahoo Tabatabai (2017). “The werk that remains: Drag and the mining of the idealized
female form”. In: RuPaul’s Drag Race and the Shifting Visibility of Drag Culture. Springer, pp. 91–101.
doi: https://doi.org/10.1007/978-3-319-50618-0_7.
Dasgupta, Saurav (n.d.). Texture in fashion design: A guide to enhancing visual and tactile aesthetics. url:
https://www.dishafashioninstitute.com/texture-in-fashion-design.
Davis, Fred D (1989). “Perceived usefulness, perceived ease of use, and user acceptance of information
technology”. In: MIS quarterly, pp. 319–340. doi: https://doi.org/10.2307/249008.
Davis, Jenny L, Tony P Love, and Phoenicia Fares (2019). “Collective social identity: Synthesizing identity
theory and social identity theory using digital data”. In: Social Psychology Quarterly 82.3, pp. 254–273.
doi: https://doi.org/10.1177/0190272519851.
Deci, Edward L and Richard M Ryan (2012). “Self-determination theory”. In: Handbook of theories of social
psychology 1.20, pp. 416–436. doi: https://doi.org/10.4135/9781446249215.n21.
Demiralp, Çağatay, Michael S Bernstein, and Jeffrey Heer (2014). “Learning perceptual kernels for
visualization design”. In: IEEE transactions on visualization and computer graphics 20.12,
pp. 1933–1942. doi: https://doi.org/10.1109/TVCG.2014.2346978.
Deng, Eric, Bilge Mutlu, and Maja J Mataric (2019). “Embodiment in Socially Interactive Robots”. In:
Foundations and Trends® in Robotics 7.4, pp. 251–356. doi: http://dx.doi.org/10.1561/2300000056.
Deng, Eric C, Bilge Mutlu, and Maja J Matarić (2018). “Formalizing the design space and product
development cycle for socially interactive robots”. In: Workshop on Social Robots in the Wild at the
2018 ACM Conference on Human-Robot Interaction (HRI). Chicago, IL, USA: IEEE/ACM. url:
https://robotics.usc.edu/publications/media/uploads/pubs/pubdb_1004_
da1b28f2a0604cb89451ebbff399e0fd.pdf.
Dennler, Nathan, Anaelia Ovalle, Ashwin Singh, Luca Soldaini, Arjun Subramonian, Huy Tu,
William Agnew, Avijit Ghosh, Kyra Yee, Irene Font Peradejordi, Zeerak Talat, Mayra Russo, and
Jess De Jesus De Pinho Pinhal (2023). “Bound by the Bounty: Collaboratively Shaping Evaluation
Processes for Queer AI Harms”. In: Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and
Society. AIES ’23. Montréal, QC, Canada: Association for Computing Machinery, pp. 375–386. doi:
https://doi.org/10.1145/3600211.3604682.
Dennler, Nathaniel, Amelia Cain, Erica De Guzman, Claudia Chiu, Carolee J Winstein,
Stefanos Nikolaidis, and Maja J Matarić (2023). “A metric for characterizing the arm nonuse
workspace in poststroke individuals using a robot arm”. In: Science Robotics 8.84, eadf7723. doi:
https://doi.org/10.1126/scirobotics.adf7723.
Dennler, Nathaniel, David Delgado, Daniel Zeng, Stefanos Nikolaidis, and Maja Matarić (2023). “The
rosid tool: Empowering users to design multimodal signals for human-robot collaboration”. In:
International Symposium on Experimental Robotics. Springer, pp. 3–10. doi:
https://doi.org/10.1007/978-3-031-63596-0_1.
Dennler, Nathaniel, Changxiao Ruan, Jessica Hadiwijoyo, Brenna Chen, Stefanos Nikolaidis, and
Maja Matarić (Apr. 2023). “Design Metaphors for Understanding User Expectations of Socially
Interactive Robot Embodiments”. In: Transactions on Human-Robot Interaction 12.2. doi:
https://doi.org/10.1145/3550489.
Dennler, Nathaniel, Zhonghao Shi, Stefanos Nikolaidis, and Maja Matarić (2024). “Improving User
Experience in Preference-Based Optimization of Reward Functions for Assistive Robots”. In:
International Symposium on Robotics Research. Springer. url: https://arxiv.org/abs/2411.11182.
Dennler, Nathaniel, Eura Shin, Maja Matarić, and Stefanos Nikolaidis (2021). “Design and Evaluation of a
Hair Combing System Using a General-Purpose Robotic Arm”. In: 2021 IEEE/RSJ International
Conference on Intelligent Robots and Systems (IROS). IEEE, pp. 3739–3746. doi:
https://doi.org/10.1109/IROS51168.2021.9636768.
Dennler, Nathaniel, Catherine Yunis, Jonathan Realmuto, Terence Sanger, Stefanos Nikolaidis, and
Maja Matarić (2021). “Personalizing user engagement dynamics in a non-verbal communication game
for cerebral palsy”. In: 30th IEEE International Conference on Robot & Human Interactive
Communication. RO-MAN ’21. IEEE, pp. 873–879. doi:
https://doi.org/10.1109/RO-MAN50785.2021.9515466.
Dennler, Nathaniel S, Mina Kian, Stefanos Nikolaidis, and Maja Matarić (2025). “Designing Robot
Identity: The Role of Voice, Clothing, and Task on Robot Gender Perception”. In: International Journal
of Social Robotics. In Press. url: https://arxiv.org/abs/2404.00494.
Dennler, Nathaniel Steele, Stefanos Nikolaidis, and Maja Mataric (2025). “Using Exploratory Search to
Learn Representations for Human Preferences”. In: Proceedings of the 2025 International Conference on
Human-Robot Interaction. HRI ’25. ACM/IEEE, pp. 392–396. url: https://arxiv.org/abs/2501.01367.
Dennler, Nathaniel Steele, Evan Torrence, Uksang Yoo, Stefanos Nikolaidis, and Maja Mataric (2024).
“PyLips: an Open-Source Python Package to Expand Participation in Embodied Interaction”. In:
Adjunct Proceedings of the 37th Annual Symposium on User Interface Software and Technology. UIST
’24. Pittsburgh, PA, USA: ACM, pp. 1–4. doi: https://doi.org/10.1145/3672539.368674.
DeVito, Michael A, Ashley Marie Walker, and Jeremy Birnholtz (2018). “‘Too Gay for Facebook’:
Presenting LGBTQ+ Identity Throughout the Personal Social Media Ecosystem”. In: Proceedings of the
ACM on Human-Computer Interaction 2.CSCW, pp. 1–23. doi: https://doi.org/10.1145/3274313.
Edemekong, Peter F, Deb Bomgaars, Sukesh Sukumaran, and Shoshana B Levy (2019). Activities of daily
living. StatPearls Publishing LLC. url: https://pubmed.ncbi.nlm.nih.gov/29261878/.
Ekman, Paul and Wallace V Friesen (1978). “Facial action coding system”. In: Environmental Psychology &
Nonverbal Behavior. doi: https://doi.org/10.1037/t27734-000.
Ekström, Hanna (2013). How can a character’s personality be conveyed visually, through shape. url:
https://www.diva-portal.org/smash/record.jsf?pid=diva2%3A637902.
Erickson, Zackory, Henry M Clever, Greg Turk, C Karen Liu, and Charles C Kemp (2018). “Deep haptic
model predictive control for robot-assisted dressing”. In: International Conference on Robotics and
Automation. ICRA ’18. IEEE, pp. 1–8. doi: https://doi.org/10.1109/ICRA.2018.8460656.
Esmaeilzadeh, Pouyan (2021). “How does IT identity affect individuals’ use behaviors associated with
personal health devices (PHDs)? An empirical study”. In: Information & Management 58.1, p. 103313.
doi: https://doi.org/10.1016/j.im.2020.103313.
Eyssel, Friederike and Frank Hegel (2012). “(S)he’s got the look: Gender stereotyping of robots”. In:
Journal of Applied Social Psychology 42.9, pp. 2213–2230. doi:
https://doi.org/10.1111/j.1559-1816.2012.00937.x.
Fan, Jianqing, Zhaoran Wang, Yuchen Xie, and Zhuoran Yang (2020). “A Theoretical Analysis of Deep
Q-Learning”. In: Proceedings of the 2nd Conference on Learning for Dynamics and Control. Vol. 120.
L4DC ’20. PMLR, pp. 486–489. url: https://proceedings.mlr.press/v120/yang20a.html.
Fasola, Juan and Maja J Matarić (2013). “A socially assistive robot exercise coach for the elderly”. In:
Journal of Human-Robot Interaction 2.2, pp. 3–32. doi: https://doi.org/10.5898/JHRI.2.2.Fasola.
Feil-Seifer, David and Maja J Mataric (2005). “Defining socially assistive robotics”. In: 9th International
Conference on Rehabilitation Robotics. ICORR ’05. IEEE, pp. 465–468. doi:
https://doi.org/10.1109/ICORR.2005.1501143.
Feinberg, Lynn, Susan C Reinhard, Ari Houser, Rita Choula, et al. (2011). “Valuing the invaluable: 2011
update, the growing contributions and costs of family caregiving”. In: Washington, DC: AARP Public
Policy Institute 32, p. 2011. url:
https://www.beliveaulaw.net/wp-content/uploads/2011/08/AARPs-Valuing-the-Invaluable-2011-
Update-The-Growing-Contributions-and-Costs-of-Family-Caregiving.pdf.
Feingold-Polak, Ronit, Oren Barzel, and Shelly Levy-Tzedek (2021). “A robot goes to rehab: a novel
gamified system for long-term stroke rehabilitation using a socially assistive robot—methodology and
usability testing”. In: Journal of NeuroEngineering and Rehabilitation 18.1, pp. 1–18. doi:
https://doi.org/10.1186/s12984-021-00915-2.
FIT, The Museum at (2015). Elements and Principles of Fashion Design.
https://www.fitnyc.edu/museum/documents/elements-and-principles-of-fashion-design.pdf.
[Accessed 11-09-2024].
Fitzgerald, Tesca, Pallavi Koppol, Patrick Callaghan, Russell Quinlan Jun Hei Wong, Reid Simmons,
Oliver Kroemer, and Henny Admoni (2023). “INQUIRE: INteractive Querying for User-aware
Informative REasoning”. In: Proceedings of The 6th Conference on Robot Learning. Vol. 205. CoRL ’23.
PMLR, pp. 2241–2250. url: https://proceedings.mlr.press/v205/fitzgerald23a.html.
Fogg, Brian J and Clifford Nass (1997). “Silicon sycophants: the effects of computers that flatter”. In:
International journal of human-computer studies 46.5, pp. 551–561. doi:
https://doi.org/10.1006/ijhc.1996.0104.
Fong, Terrence, Illah Nourbakhsh, and Kerstin Dautenhahn (2003). “A survey of socially interactive
robots”. In: Robotics and autonomous systems 42.3-4, pp. 143–166. doi:
https://doi.org/10.1016/S0921-8890(02)00372-X.
Fraune, Marlena R (2020). “Our robots, our team: Robot anthropomorphism moderates group effects in
human–robot teams”. In: Frontiers in psychology 11, p. 1275. doi:
https://psycnet.apa.org/doi/10.3389/fpsyg.2020.01275.
Fraune, Marlena R, Selma Šabanović, and Eliot R Smith (2017). “Teammates first: Favoring ingroup robots
over outgroup humans”. In: 26th international symposium on robot and human interactive
communication. RO-MAN ’17. IEEE, pp. 1432–1437. doi:
https://doi.org/10.1109/ROMAN.2017.8172492.
Fraune, Marlena R, Steven Sherrin, Selma Šabanović, and Eliot R Smith (2019). “Is human-robot
interaction more competitive between groups than between individuals?” In: 14th international
conference on human-robot interaction. HRI ’19. IEEE/ACM, pp. 104–113. doi:
https://doi.org/10.1109/HRI.2019.8673241.
Friedman, Natalie, Kari Love, RAY LC, Jenny E Sabin, Guy Hoffman, and Wendy Ju (2021). “What Robots
Need From Clothing”. In: Proceedings of the 2021 ACM Designing Interactive Systems Conference. DIS
’21. Virtual Event, USA: Association for Computing Machinery, pp. 1345–1355. url:
https://doi.org/10.1145/3461778.3462045.
Amazon Web Services (2024). Amazon Polly. url: https://aws.amazon.com/polly/.
Goetz, Jennifer, Sara Kiesler, and Aaron Powers (2003). “Matching robot appearance and behavior to
tasks to improve human-robot cooperation”. In: The 12th IEEE International Conference on Robot and
Human Interactive Communication. RO-MAN ’03. IEEE, pp. 55–60. doi:
https://doi.org/10.1109/ROMAN.2003.1251796.
Goldau, Felix Ferdinand, Tejas Kumar Shastha, Maria Kyrarini, and Axel Gräser (2019). “Autonomous
multi-sensory robotic assistant for a drinking task”. In: 16th International Conference on Rehabilitation
Robotics. ICORR ’19. IEEE, pp. 210–216. doi: https://doi.org/10.1109/ICORR.2019.8779521.
Gong, Lin and Jooyoung Shin (2013). “The innovative application of surface texture in fashion and textile
design”. In: Fashion & Textile Research Journal 15.3, pp. 336–346. doi:
http://dx.doi.org/10.5805/SFTI.2013.15.3.336.
Graaf, Maartje MA de, Soumaya Ben Allouch, and Jan AGM Van Dijk (2015). “What makes robots social?:
A user’s perspective on characteristics for social human-robot interaction”. In: 7th International
Conference on Social Robotics. ICSR ’15. Springer. Paris, France, pp. 184–193. doi:
https://doi.org/10.1007/978-3-319-25554-5_19.
Grinberg, Miguel (2018). Flask web development. O’Reilly Media, Inc. isbn: 1491991690.
Gurung, Lina (2020). “Feminist Standpoint Theory: Conceptualization and Utility.” In: Dhaulagiri: Journal
of Sociology & Anthropology 14. doi: https://doi.org/10.3126/dsaj.v14i0.27357.
Ha, David and Jürgen Schmidhuber (2018). “Recurrent World Models Facilitate Policy Evolution”. In:
NeurIPS ’18. url: https:
//proceedings.neurips.cc/paper_files/paper/2018/file/2de5d16682c3c35007e4e92982f1a2ba-Paper.pdf.
Habibian, Soheil, Ananth Jonnavittula, and Dylan P. Losey (Sept. 2022). “Here’s What I’ve Learned:
Asking Questions that Reveal Reward Learning”. In: J. Hum.-Robot Interact. 11.4. doi: 10.1145/3526107.
Hadfield-Menell, Dylan, Smitha Milli, Pieter Abbeel, Stuart J Russell, and Anca Dragan (2017). “Inverse
reward design”. In: Advances in neural information processing systems. NeurIPS ’17. url: https:
//proceedings.neurips.cc/paper_files/paper/2017/file/32fdab6559cdfa4f167f8c31b9199643-Paper.pdf.
Hadsell, Raia, Sumit Chopra, and Yann LeCun (2006). “Dimensionality reduction by learning an invariant
mapping”. In: Conference on Computer Vision and Pattern Recognition. Vol. 2. CVPR ’06. IEEE,
pp. 1735–1742. doi: https://doi.org/10.1109/CVPR.2006.100.
Han, Cheol E, Sujin Kim, Shuya Chen, Yi-Hsuan Lai, Jeong-Yoon Lee, Rieko Osu, Carolee J Winstein, and
Nicolas Schweighofer (2013). “Quantifying arm nonuse in individuals poststroke”. In:
Neurorehabilitation and Neural Repair 27.5, pp. 439–447. doi:
https://doi.org/10.1177/1545968312471904.
Hansen, Nikolaus (2016). “The CMA evolution strategy: A tutorial”. In: arXiv preprint arXiv:1604.00772.
doi: https://doi.org/10.48550/arXiv.1604.00772.
Hansen, Nikolaus, Anne Auger, Raymond Ros, Steffen Finck, and Petr Pošík (2010). “Comparing results of
31 algorithms from the black-box optimization benchmarking BBOB-2009”. In: Proceedings of the 12th
annual conference companion on Genetic and evolutionary computation. GECCO ’10, pp. 1689–1696.
doi: https://doi.org/10.1145/1830761.1830790.
Hansen, Nikolaus, Sibylle D Müller, and Petros Koumoutsakos (2003). “Reducing the time complexity of
the derandomized evolution strategy with covariance matrix adaptation (CMA-ES)”. In: Evolutionary
computation 11.1, pp. 1–18. doi: https://doi.org/10.1162/106365603321828970.
Häring, Markus, Dieta Kuchenbrandt, and Elisabeth André (2014). “Would you like to play with me? how
robots’ group membership and task features influence human-robot interaction”. In: Proceedings of
the 2014 ACM/IEEE International Conference on Human-Robot Interaction. HRI ’14. Bielefeld, Germany:
Association for Computing Machinery, pp. 9–16. doi: https://doi.org/10.1145/2559636.2559673.
Hart, P Sol and Erik C Nisbet (2012). “Boomerang effects in science communication: How motivated
reasoning and identity cues amplify opinion polarization about climate mitigation policies”. In:
Communication research 39.6, pp. 701–723. doi: https://doi.org/10.1177/0093650211416646.
Hawkins, Kelsey P, Phillip M Grice, Tiffany L Chen, Chih-Hung King, and Charles C Kemp (2014).
“Assistive mobile manipulation for self-care tasks around the head”. In: Symposium on Computational
Intelligence in Robotic Rehabilitation and Assistive Technologies. CIR2AT ’14. IEEE, pp. 16–25. doi:
https://doi.org/10.1109/CIRAT.2014.7009736.
Hawkins, Kelsey P, Chih-Hung King, Tiffany L Chen, and Charles C Kemp (2012). “Informing assistive
robots with models of contact forces from able-bodied face wiping and shaving”. In: The 21st IEEE
International Symposium on Robot and Human Interactive Communication. RO-MAN ’12. IEEE,
pp. 251–258. doi: https://doi.org/10.1109/ROMAN.2012.6343762.
He, Jiyin, Pernilla Qvarfordt, Martin Halvey, and Gene Golovchinsky (2016). “Beyond actions: Exploring
the discovery of tactics from user logs”. In: Information Processing & Management 52.6, pp. 1200–1226.
doi: https://doi.org/10.1016/j.ipm.2016.05.007.
Heath, Helen and Sarah Cowley (2004). “Developing a grounded theory approach: a comparison of Glaser
and Strauss”. In: International journal of nursing studies 41.2, pp. 141–150. doi:
https://doi.org/10.1016/S0020-7489(03)00113-5.
Heerink, M., B. Kröse, V. Evers, and B. Wielinga (2009). “Measuring acceptance of an assistive social
robot: a suggested toolkit”. In: The 18th International Symposium on Robot and Human Interactive
Communication. RO-MAN ’09. doi: https://doi.org/10.1109/ROMAN.2009.5326320.
Heider, Fritz and Marianne Simmel (1944). “An experimental study of apparent behavior”. In: The
American journal of psychology 57.2, pp. 243–259. doi: https://doi.org/10.2307/1416950.
Hennessy, Josephine and Michael A West (1999). “Intergroup behavior in organizations: A field test of
social identity theory”. In: Small group research 30.3, pp. 361–382. doi:
https://doi.org/10.1177/104649649903000305.
Hershey, Shawn, Sourish Chaudhuri, Daniel P. W. Ellis, Jort F. Gemmeke, Aren Jansen,
R. Channing Moore, Manoj Plakal, Devin Platt, Rif A. Saurous, Bryan Seybold, Malcolm Slaney,
Ron J. Weiss, and Kevin Wilson (2017). “CNN architectures for large-scale audio classification”. In:
International Conference on Acoustics, Speech and Signal Processing. ICASSP ’17. IEEE, pp. 131–135.
doi: https://doi.org/10.1109/ICASSP.2017.7952132.
Hesse, Stefan, Henning Schmidt, Cordula Werner, and Anita Bardeleben (2003). “Upper and lower
extremity robotic devices for rehabilitation and for studying motor control”. In: Current opinion in
neurology 16.6, pp. 705–710. doi: https://doi.org/10.1097/01.wco.0000102630.16692.38.
Hidecker, Mary Jo Cooley, Nigel Paneth, Peter L Rosenbaum, Raymond D Kent, Janet Lillie,
John B Eulenberg, Ken Chester Jr, Brenda Johnson, Lauren Michalsen, Morgan Evatt, et al. (2011).
“Developing and validating the Communication Function Classification System for individuals with
cerebral palsy”. In: Developmental Medicine & Child Neurology 53.8, pp. 704–710. doi:
https://doi.org/10.1111/j.1469-8749.2011.03996.x.
Hoffer, Elad and Nir Ailon (2015). “Deep metric learning using triplet network”. In: Similarity-based
pattern recognition: third international workshop, SIMBAD 2015, Copenhagen, Denmark, October 12-14,
2015. Proceedings 3. Springer, pp. 84–92. doi: https://doi.org/10.1007/978-3-319-24261-3_7.
Hoffmann, Laura, Nikolai Bock, and Astrid M. Rosenthal-von der Pütten (2018). “The Peculiarities of Robot
Embodiment (EmCorp-Scale): Development, Validation and Initial Test of the Embodiment and
Corporeality of Artificial Agents Scale”. In: Proceedings of the International Conference on
Human-Robot Interaction. HRI ’18. Chicago, IL, USA: Association for Computing Machinery,
pp. 370–378. doi: https://doi.org/10.1145/3171221.3171242.
Hogan, Neville, Hermano Igo Krebs, Jain Charnnarong, Padmanabhan Srikrishna, and Andre Sharon
(1992). “MIT-MANUS: a workstation for manual therapy and training”. In: Proceedings IEEE
International Workshop on Robot and Human Communication. RO-MAN ’92. IEEE, pp. 161–165. doi:
https://doi.org/10.1109/ROMAN.1992.253895.
Honig, Shanee and Tal Oron-Gilad (2018). “Understanding and resolving failures in human-robot
interaction: Literature review and model development”. In: Frontiers in psychology 9, p. 861. doi:
https://doi.org/10.3389/fpsyg.2018.00861.
Hu, Liwen, Chongyang Ma, Linjie Luo, and Hao Li (2015). “Single-view hair modeling using a hairstyle
database”. In: ACM Transactions on Graphics (ToG) 34.4, pp. 1–9. doi:
https://doi.org/10.1145/2766931.
Huang, Chien-Ming, Sean Andrist, Allison Sauppé, and Bilge Mutlu (2015). “Using gaze patterns to
predict task intent in collaboration”. In: Frontiers in psychology 6, p. 1049. doi:
https://doi.org/10.3389/fpsyg.2015.01049.
Huggins-Daines, David, Mohit Kumar, Arthur Chan, Alan W Black, Mosur Ravishankar, and
Alexander I Rudnicky (2006). “Pocketsphinx: A free, real-time continuous speech recognition system
for hand-held devices”. In: International Conference on Acoustics Speech and Signal Processing. Vol. 1.
ICASSP ’06. IEEE, pp. I–I. doi: https://doi.org/10.1109/ICASSP.2006.1659988.
Hughes, Josie, Thomas Plumb-Reyes, Nicholas Charles, L Mahadevan, and Daniela Rus (2021).
“Detangling hair using feedback-driven robotic brushing”. In: 4th International Conference on Soft
Robotics. RoboSoft ’21. IEEE, pp. 487–494. doi: https://doi.org/10.1109/RoboSoft51838.2021.9479221.
Jackson, Ryan Blake and Tom Williams (2021). “A theory of social agency for human-robot interaction”.
In: Frontiers in Robotics and AI 8, p. 687726. doi: https://doi.org/10.3389/frobt.2021.687726.
Jain, Advait and Charles C Kemp (2010). “EL-E: an assistive mobile manipulator that autonomously
fetches objects from flat surfaces”. In: Autonomous Robots 28.1, pp. 45–64. doi:
https://doi.org/10.1007/s10514-009-9148-5.
Jain, Shomik, Balasubramanian Thiagarajan, Zhonghao Shi, Caitlyn Clabaugh, and Maja J Matarić (2020).
“Modeling engagement in long-term, in-home socially assistive robot interventions for children with
autism spectrum disorders”. In: Science Robotics 5.39. doi:
https://doi.org/10.1126/scirobotics.aaz3791.
Johnston, Michael V and Henrik Hagberg (2007). “Sex and the pathogenesis of cerebral palsy”. In:
Developmental Medicine & Child Neurology 49.1, pp. 74–78. doi:
https://doi.org/10.1017/S0012162207000199.x.
Jung, Heekyoung, Heather Wiltse, Mikael Wiberg, and Erik Stolterman (2017). “Metaphors, materialities,
and affordances: Hybrid morphologies in the design of interactive artifacts”. In: Design Studies 53,
pp. 24–46. doi: https://doi.org/10.1016/j.destud.2017.06.004.
Jung, Yoonhyuk (2011). “Understanding the role of sense of presence and perceived autonomy in users’
continued use of social virtual worlds”. In: Journal of Computer-Mediated Communication 16.4,
pp. 492–510. doi: https://doi.org/10.1111/j.1083-6101.2011.01540.x.
Kalegina, Alisa, Grace Schroeder, Aidan Allchin, Keara Berlin, and Maya Cakmak (2018). “Characterizing
the Design Space of Rendered Robot Faces”. In: Proceedings of the International Conference on
Human-Robot Interaction. HRI ’18. Chicago, IL, USA: Association for Computing Machinery,
pp. 96–104. doi: https://doi.org/10.1145/3171221.3171286.
Kapusta, Ariel, Zackory Erickson, Henry M Clever, Wenhao Yu, C Karen Liu, Greg Turk, and
Charles C Kemp (2019). “Personalized collaborative plans for robot-assisted dressing via optimization
and simulation”. In: Autonomous Robots 43, pp. 2183–2207. doi:
https://doi.org/10.1007/s10514-019-09865-0.
Kellmeyer, Philipp, Oliver Mueller, Ronit Feingold-Polak, and Shelly Levy-Tzedek (2018). “Social robots in
rehabilitation: A question of trust”. In: Science Robotics 3.21, eaat1587. doi:
https://doi.org/10.1126/scirobotics.aat1587.
Kervenoael, Ronan de, Rajibul Hasan, Alexandre Schwob, and Edwin Goh (2020). “Leveraging
human-robot interaction in hospitality services: Incorporating the role of perceived value, empathy,
and information sharing into visitors’ intentions to use social robots”. In: Tourism Management 78.
doi: https://doi.org/10.1016/j.tourman.2019.104042.
Keselman, Leonid, Katherine Shih, Martial Hebert, and Aaron Steinfeld (2023). “Optimizing Algorithms
From Pairwise User Preferences”. In: International Conference on Intelligent Robots and Systems (IROS).
IROS ’23. IEEE/RSJ, pp. 4161–4167. doi: https://doi.org/10.1109/IROS55552.2023.10342081.
Kessler, Suzanne J and Wendy McKenna (1985). Gender: An ethnomethodological approach. University of
Chicago Press. isbn: 0226432068.
Keyes, Os (2018). “The Misgendering Machines: Trans/HCI Implications of Automatic Gender
Recognition”. In: Proc. ACM Hum.-Comput. Interact. 2.CSCW. doi: https://doi.org/10.1145/3274357.
Khadpe, Pranav, Ranjay Krishna, Li Fei-Fei, Jeffrey T. Hancock, and Michael S. Bernstein (2020).
“Conceptual Metaphors Impact Perceptions of Human-AI Collaboration”. In: Proc. ACM
Hum.-Comput. Interact. 4.CSCW2. doi: https://doi.org/10.1145/3415234.
Kian, Mina J, Mingyu Zong, Katrin Fischer, Abhyuday Singh, Anna-Maria Velentza, Pau Sang,
Shriya Upadhyay, Anika Gupta, Misha A Faruki, Wallace Browning, et al. (2024). “Can an
LLM-powered socially assistive robot effectively and safely deliver cognitive behavioral therapy? A
study with university students”. In: arXiv preprint arXiv:2402.17937. doi:
https://doi.org/10.48550/arXiv.2402.17937.
Kim, Jingoog and Mary Lou Maher (2020). “Conceptual Metaphors for Designing Smart Environments:
Device, Robot, and Friend”. In: Frontiers in Psychology 11, p. 198. doi:
https://doi.org/10.3389/fpsyg.2020.00198.
King, Chih-Hung, Tiffany L Chen, Advait Jain, and Charles C Kemp (2010). “Towards an assistive robot
that autonomously performs bed baths for patient hygiene”. In: International Conference on Intelligent
Robots and Systems. IROS ’10. IEEE, pp. 319–324. doi: https://doi.org/10.1109/IROS.2010.5649101.
Kinova (n.d.). Assistive Solutions. url:
https://www.kinovarobotics.com/en/solutions/medical-and-assistive/assistive-solutions.
Koo, Terry K and Mae Y Li (2016). “A guideline of selecting and reporting intraclass correlation
coefficients for reliability research”. In: Journal of Chiropractic Medicine 15.2, pp. 155–163. doi:
https://doi.org/10.1016/j.jcm.2016.02.012.
Koren, Yehuda, Steffen Rendle, and Robert Bell (2021). “Advances in collaborative filtering”. In:
Recommender systems handbook, pp. 91–142. doi: https://doi.org/10.1007/978-1-0716-2197-4_3.
Korpan, Raj, Ruchira Ray, Andrea Sipos, Nathan Dennler, Max Parks, Maria E Cabrera, and
Roberto Martín-Martín (2024). “Launching Queer in Robotics [Women in Engineering]”. In: IEEE
Robotics & Automation Magazine 31.2, pp. 144–146. doi: https://doi.org/10.1109/MRA.2024.3388277.
Kouvaritakis, Basil and Mark Cannon (2016). Model Predictive Control: Classical, Robust and Stochastic.
Cham, Switzerland: Springer International Publishing. doi: https://doi.org/10.1007/978-3-319-24853-0.
Kuchenbrandt, Dieta, Friederike Eyssel, Simon Bobinger, and Maria Neufeld (2013). “When a robot’s
group membership matters: Anthropomorphization of Robots as a Function of Social Categorization”.
In: International Journal of Social Robotics 5.3, pp. 409–417. doi:
https://doi.org/10.1007/s12369-013-0197-8.
Kuchenbrandt, Dieta, Markus Häring, Jessica Eichberg, Friederike Eyssel, and Elisabeth André (2014).
“Keep an eye on the task! How gender typicality of tasks influence human–robot interactions”. In:
International Journal of Social Robotics 6.3, pp. 417–427. doi:
https://doi.org/10.1007/s12369-014-0244-0.
Kwon, Minae, Malte F Jung, and Ross A Knepper (2016). “Human expectations of social robots”. In: 11th
International Conference on Human-Robot Interaction. HRI ’16. ACM/IEEE, pp. 463–464. doi:
https://doi.org/10.1109/HRI.2016.7451807.
Law, Theresa, Bertram F Malle, and Matthias Scheutz (2021). “A touching connection: how observing
robotic touch can affect human trust in a robot”. In: International Journal of Social Robotics, pp. 1–17.
doi: https://doi.org/10.1007/s12369-020-00729-7.
Leake, Mackenzie, Kathryn Jin, Abe Davis, and Stefanie Mueller (2023). “InStitches: Augmenting Sewing
Patterns with Personalized Material-Efficient Practice”. In: Proceedings of the Conference on Human
Factors in Computing Systems. CHI ’23. Hamburg, Germany: Association for Computing Machinery.
doi: https://doi.org/10.1145/3544548.3581499.
Lee, Kimin, Laura Smith, and Pieter Abbeel (2021). “Pebble: Feedback-efficient interactive reinforcement
learning via relabeling experience and unsupervised pre-training”. In: arXiv preprint arXiv:2106.05091.
doi: https://doi.org/10.48550/arXiv.2106.05091.
Lee, Kwan Min, Namkee Park, and Hayeon Song (2005). “Can a Robot Be Perceived as a Developing
Creature? Effects of a Robot’s Long-Term Cognitive Developments on Its Social Presence and
People’s Social Responses Toward It”. In: Human communication research 31.4, pp. 538–563. doi:
https://doi.org/10.1111/j.1468-2958.2005.tb00882.x.
Lee, Stephanie Hyeyoung, Gyulee Park, Duk Youn Cho, Ha Yeon Kim, Ji-Yeong Lee, Suyoung Kim,
Si-Bog Park, and Joon-Ho Shin (2020). “Comparisons between end-effector and exoskeleton
rehabilitation robots regarding upper extremity function among chronic stroke patients with
moderate-to-severe upper limb impairment”. In: Scientific Reports 10.1, pp. 1–8. doi:
https://doi.org/10.1038/s41598-020-58630-2.
Lewis, James R (2018). “The system usability scale: past, present, and future”. In: International Journal of
Human–Computer Interaction 34.7, pp. 577–590. doi: https://doi.org/10.1080/10447318.2018.1455307.
Leyzberg, Daniel, Samuel Spaulding, and Brian Scassellati (2014). “Personalizing robot tutors to
individuals’ learning differences”. In: Proceedings of the International Conference on Human-Robot
Interaction. HRI ’14. Bielefeld, Germany: Association for Computing Machinery, pp. 423–430. doi:
https://doi.org/10.1145/2559636.2559671.
Li, Xinjian, Siddharth Dalmia, Juncheng Li, Matthew Lee, Patrick Littell, Jiali Yao,
Antonios Anastasopoulos, David R Mortensen, Graham Neubig, Alan W Black, et al. (2020).
“Universal phone recognition with a multilingual allophone system”. In: IEEE International
Conference on Acoustics, Speech and Signal Processing. ICASSP ’20. IEEE, pp. 8249–8253. doi:
https://doi.org/10.1109/ICASSP40776.2020.9054362.
Lin, I-Fen and Hsueh-Sheng Wu (2011). “Does informal care attenuate the cycle of ADL/IADL disability
and depressive symptoms in late life?” In: Journals of Gerontology Series B: Psychological Sciences and
Social Sciences 66.5, pp. 585–594. doi: https://doi.org/10.1093/geronb/gbr060.
Löffler, Diana, Judith Dörrenbächer, and Marc Hassenzahl (2020). “The Uncanny Valley Effect in
Zoomorphic Robots: The U-Shaped Relation Between Animal Likeness and Likeability”. In: Proceedings
of the International Conference on Human-Robot Interaction. HRI ’20. Cambridge, United Kingdom:
Association for Computing Machinery, pp. 261–270. doi:
https://doi.org/10.1145/3319502.3374788.
Long, Christine and Jason Kridner (2019). Meet Beagle: Open Source Computing. url:
https://beagleboard.org/.
Lu, Shihan, Mianlun Zheng, Matthew C Fontaine, Stefanos Nikolaidis, and Heather Culbertson (2022).
“Preference-driven texture modeling through interactive generation and search”. In: IEEE transactions
on haptics 15.3, pp. 508–520. doi: https://doi.org/10.1109/TOH.2022.3173935.
Maaten, Laurens van der and Geoffrey Hinton (2008). “Visualizing data using t-SNE”. In: Journal of
machine learning research 9.Nov, pp. 2579–2605. url:
http://jmlr.org/papers/v9/vandermaaten08a.html.
Macenski, Steven, Tully Foote, Brian Gerkey, Chris Lalancette, and William Woodall (2022). “Robot
Operating System 2: Design, architecture, and uses in the wild”. In: Science robotics 7.66, eabm6074.
doi: https://doi.org/10.1126/scirobotics.abm6074.
Maciejasz, Paweł, Jörg Eschweiler, Kurt Gerlach-Hahn, Arne Jansen-Troy, and Steffen Leonhardt (2014).
“A survey on robotic devices for upper limb rehabilitation”. In: Journal of neuroengineering and
rehabilitation 11, pp. 1–29. doi: https://doi.org/10.1186/1743-0003-11-3.
Malik, Norjasween Abdul, Fazah Akhtar Hanapiah, Rabiatul Adawiah Abdul Rahman, and
Hanafiah Yussof (2016). “Emergence of socially assistive robotics in rehabilitation for children with
cerebral palsy: A review”. In: International Journal of Advanced Robotic Systems 13.3, p. 135. doi:
https://doi.org/10.5772/64163.
Malik, Norjasween Abdul, Hanafiah Yussof, Fazah Akhtar Hanapiah, and Saw Jo Anne (2014). “Human
Robot Interaction (HRI) between a humanoid robot and children with cerebral palsy: experimental
framework and measure of engagement”. In: Conference on Biomedical Engineering and Sciences.
IECBES ’14. IEEE, pp. 430–435. doi: https://doi.org/10.1109/IECBES.2014.7047536.
Marchionini, Gary (2006). “Exploratory search: from finding to understanding”. In: Communications of
the ACM 49.4, pp. 41–46. doi: https://doi.org/10.1145/1121949.1121979.
Marton, Zoltan Csaba, Radu Bogdan Rusu, and Michael Beetz (2009). “On Fast Surface Reconstruction
Methods for Large and Noisy Datasets”. In: Proceedings of the International Conference on Robotics and
Automation. ICRA ’09. Kobe, Japan. doi: https://doi.org/10.1109/ROBOT.2009.5152628.
Matarić, Maja J and Brian Scassellati (2016). “Socially assistive robotics”. In: Springer handbook of robotics.
Springer, pp. 1973–1994. doi: https://doi.org/10.1007/978-3-319-32552-1_73.
Mathur, Maya B and David B Reichling (2016). “Navigating a social world with robot partners: A
quantitative cartography of the Uncanny Valley”. In: Cognition 146, pp. 22–32. doi:
https://doi.org/10.1016/j.cognition.2015.09.008.
Mayo, Nancy E, Sharon Wood-Dauphinee, Robert Côté, Liam Durcan, and Joseph Carlton (2002).
“Activity, participation, and quality of life 6 months poststroke”. In: Archives of Physical Medicine and
Rehabilitation 83.8, pp. 1035–1042. doi: https://doi.org/10.1053/apmr.2002.33984.
McColl, Derek, Wing-Yue Geoffrey Louie, and Goldie Nejat (2013). “Brian 2.1: A socially assistive robot
for the elderly and cognitively impaired”. In: IEEE Robotics & Automation Magazine 20.1, pp. 74–83.
doi: https://doi.org/10.1109/MRA.2012.2229939.
McCunn, Don (Feb. 2016). How to make sewing patterns, second edition. Design Enterprises of San
Francisco. isbn: 0932538215.
McFee, Brian (2018). pyrubberband: a python wrapper for rubberband.
https://github.com/bmcfee/pyrubberband.
McFee, Brian, Gert Lanckriet, and Tony Jebara (2011). “Learning Multi-modal Similarity.” In: Journal of
machine learning research 12.2. url: https://www.jmlr.org/papers/volume12/mcfee11a/mcfee11a.pdf.
McFee, Brian, Colin Raffel, Dawen Liang, Daniel PW Ellis, Matt McVicar, Eric Battenberg, and
Oriol Nieto (2015). “librosa: Audio and music signal analysis in python.” In: SciPy, pp. 18–24. doi:
https://doi.org/10.25080/majora-7b98e3ed-003.
McNabb, Charlie (2017). Nonbinary gender identities: History, culture, resources. Rowman & Littlefield.
isbn: 1442275529.
Mehta, Neeta (2011). “Mind-body dualism: A critique from a health perspective”. In: Mens sana
monographs 9.1, p. 202. doi: https://doi.org/10.4103/0973-1229.77436.
Messerschmidt, James W (2009). ““Doing gender”: The impact and future of a salient sociological
concept”. In: Gender & Society 23.1, pp. 85–88. doi: https://doi.org/10.1177/0891243208326253.
Metzger, Jean-Claude, Olivier Lambercy, Antonella Califfi, Daria Dinacci, Claudio Petrillo, Paolo Rossi,
Fabio M Conti, and Roger Gassert (2014). “Assessment-driven selection and adaptation of exercise
difficulty in robot-assisted therapy: a pilot study with a hand rehabilitation robot”. In: Journal of
neuroengineering and rehabilitation 11, pp. 1–14. doi: https://doi.org/10.1186/1743-0003-11-154.
Mitchell, Wade J, Kevin A Szerszen Sr, Amy Shirong Lu, Paul W Schermerhorn, Matthias Scheutz, and
Karl F MacDorman (2011). “A mismatch in the human realism of face and voice produces an uncanny
valley”. In: i-Perception 2.1, pp. 10–12. doi: https://doi.org/10.1068/i041.
Mnih, Volodymyr, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Tim Harley,
Timothy P. Lillicrap, David Silver, and Koray Kavukcuoglu (2016). “Asynchronous methods for deep
reinforcement learning”. In: Proceedings of the 33rd International Conference on International
Conference on Machine Learning. ICML ’16. New York, NY, USA: JMLR.org, pp. 1928–1937. url:
https://dl.acm.org/doi/abs/10.5555/3045390.3045594.
Mogilski, Justin K and Lisa LM Welling (2018). “The relative contribution of jawbone and cheekbone
prominence, eyebrow thickness, eye size, and face length to evaluations of facial masculinity and
attractiveness: A conjoint data-driven approach”. In: Frontiers in psychology 9, p. 2428. doi:
https://doi.org/10.3389/fpsyg.2018.02428.
Moon, Ji-Won and Young-Gul Kim (2001). “Extending the TAM for a World-Wide-Web context”. In:
Information & management 38.4, pp. 217–230. doi: https://doi.org/10.1016/S0378-7206(00)00061-6.
Moro, Christina, Goldie Nejat, and Alex Mihailidis (2018). “Learning and Personalizing Socially Assistive
Robot Behaviors to Aid with Activities of Daily Living”. In: J. Hum.-Robot Interact. 7.2. doi:
https://doi.org/10.1145/3277903.
Müllner, Daniel (2011). “Modern hierarchical, agglomerative clustering algorithms”. In: arXiv preprint
arXiv:1109.2378. doi: https://doi.org/10.48550/arXiv.1109.2378.
Mumm, Jonathan and Bilge Mutlu (2011). “Human-robot proxemics: physical and psychological
distancing in human-robot interaction”. In: Proceedings of the 6th International Conference on
Human-Robot Interaction. HRI ’11. Lausanne, Switzerland: Association for Computing Machinery,
pp. 331–338. doi: https://doi.org/10.1145/1957656.1957786.
Myers, Brad (1994). “Challenges of HCI design and implementation”. In: Interactions 1.1, pp. 73–83. doi:
https://doi.org/10.1145/174800.174808.
Myers, Vivek, Erdem Biyik, Nima Anari, and Dorsa Sadigh (2022). “Learning Multimodal Rewards from
Rankings”. In: Proceedings of the 5th Conference on Robot Learning. Vol. 164. Proceedings of Machine
Learning Research. PMLR, pp. 342–352. url: https://proceedings.mlr.press/v164/myers22a.html.
Namaste, Viviane (2000). Invisible lives: The erasure of transsexual and transgendered people. University of
Chicago Press. isbn: 0226568105.
Nanavati, Amal, Patricia Alves-Oliveira, Tyler Schrenk, Ethan K. Gordon, Maya Cakmak, and
Siddhartha S. Srinivasa (2023). “Design Principles for Robot-Assisted Feeding in Social Contexts”. In:
Proceedings of the International Conference on Human-Robot Interaction. HRI ’23. Stockholm, Sweden:
Association for Computing Machinery, pp. 24–33. doi: https://doi.org/10.1145/3568162.3576988.
Nanavati, Amal, Vinitha Ranganeni, and Maya Cakmak (2023). “Physically assistive robots: A systematic
review of mobile and manipulator robots that physically assist people with disabilities”. In: Annual
Review of Control, Robotics, and Autonomous Systems 7. doi:
https://doi.org/10.1146/annurev-control-062823-024352.
Nass, Clifford and Youngme Moon (2000). “Machines and mindlessness: Social responses to computers”.
In: Journal of social issues 56.1, pp. 81–103. doi: https://doi.org/10.1111/0022-4537.00153.
Nass, Clifford, Jonathan Steuer, and Ellen R Tauber (1994). “Computers are social actors”. In: Proceedings
of the SIGCHI conference on Human factors in computing systems, pp. 72–78. url:
https://dl.acm.org/doi/pdf/10.1145/191666.191703.
Nemlekar, Heramb, Neel Dhanaraj, Angelos Guan, Satyandra K. Gupta, and Stefanos Nikolaidis (2023).
“Transfer Learning of Human Preferences for Proactive Robot Assistance in Assembly Tasks”. In:
Proceedings of the International Conference on Human-Robot Interaction. HRI ’23. Stockholm, Sweden:
Association for Computing Machinery, pp. 575–583. doi: https://doi.org/10.1145/3568162.3576965.
Nenna, Federica and Luciano Gamberini (2022). “The influence of gaming experience, gender and other
individual factors on robot teleoperations in vr”. In: 17th International Conference on Human-Robot
Interaction. HRI ’22. IEEE, pp. 945–949. doi: https://doi.org/10.1109/HRI53351.2022.9889669.
Ng, Andrew Y and Stuart Russell (2000). “Algorithms for inverse reinforcement learning.” In: Proceedings
of 17th International Conference on Machine Learning. Vol. 1. ICML ’00, p. 2.
Ng, Johan Y.Y., Nikos Ntoumanis, Cecilie Thøgersen-Ntoumani, Edward L. Deci, Richard M. Ryan,
Joan L. Duda, and Geoffrey C. Williams (2012). “Self-determination theory applied to health contexts:
A meta-analysis”. In: Perspectives on psychological science 7.4, pp. 325–340. doi:
https://doi.org/10.1177/1745691612447309.
NHIS (Nov. 2019). Table A-10. Difficulties in physical functioning among adults aged 18 and over, by selected
characteristics: United States, 2018. url: https://www.cdc.gov/nchs/nhis/shs/tables.htm.
Nikolaidis, Stefanos, David Hsu, and Siddhartha Srinivasa (2017). “Human-robot mutual adaptation in
collaborative tasks: Models and experiments”. In: The International Journal of Robotics Research 36.5-7,
pp. 618–634. doi: https://doi.org/10.1177/0278364917690593.
Nikolaidis, Stefanos, Ramya Ramakrishnan, Keren Gu, and Julie Shah (2015). “Efficient Model Learning
from Joint-Action Demonstrations for Human-Robot Collaborative Tasks”. In: Proceedings of the
Tenth Annual International Conference on Human-Robot Interaction. HRI ’15. Portland, Oregon, USA:
Association for Computing Machinery, pp. 189–196. doi: https://doi.org/10.1145/2696454.2696455.
Nomura, Tatsuya (2017). “Robots and gender”. In: Gender and the Genome 1.1, pp. 18–26. doi:
https://doi.org/10.1089/gg.2016.29002.nom.
Nomura, Tatsuya, Tomohiro Suzuki, Takayuki Kanda, and Kensuke Kato (2006). “Measurement of
negative attitudes toward robots”. In: Interaction Studies 7.3, pp. 437–454. doi:
https://doi.org/10.1075/is.7.3.14nom.
O’Connell, Amy, Ashveen Banga, Jennifer Ayissi, Nikki Yaminrafie, Ellen Ko, Andrew Le,
Bailey Cislowski, and Maja Matarić (2024). “Design and Evaluation of a Socially Assistive Robot
Schoolwork Companion for College Students with ADHD”. In: Proceedings of the 2024 ACM/IEEE
International Conference on Human-Robot Interaction. HRI ’24. Boulder, CO, USA: Association for
Computing Machinery, pp. 533–541. doi: https://doi.org/10.1145/3610977.3634929.
Oertel, Catharine, Ginevra Castellano, Mohamed Chetouani, Jauwairia Nasir, Mohammad Obaid,
Catherine Pelachaud, and Christopher Peters (2020). “Engagement in Human-Agent Interaction: An
Overview”. In: Frontiers in Robotics and AI 7, p. 92. issn: 2296-9144. doi: https://doi.org/10.3389/frobt.2020.00092.
Oliveira, Ana C de, Chad G Rose, Kevin Warburton, Evan M Ogden, Bob Whitford, Robert K Lee, and
Ashish D Deshpande (2019). “Exploring the capabilities of harmony for upper-limb stroke therapy”.
In: 16th International Conference on Rehabilitation Robotics. ICORR ’19. IEEE, pp. 637–643. doi:
https://doi.org/10.1109/ICORR.2019.8779558.
Organization, World Health (2001). ICF: International Classification of Functioning, Disability and Health.
isbn: 9241545429.
Oskoui, Maryam, Franzina Coutinho, Jonathan Dykeman, Nathalie Jette, and Tamara Pringsheim (2013).
“An update on the prevalence of cerebral palsy: a systematic review and meta-analysis”. In:
Developmental Medicine & Child Neurology 55.6, pp. 509–519. doi:
https://doi.org/10.1111/dmcn.12080.
Paepcke, Steffi and Leila Takayama (2010). “Judging a bot by its cover: An experiment on expectation
setting for personal robots”. In: 2010 5th ACM/IEEE International Conference on Human-Robot
Interaction (HRI). IEEE, pp. 45–52. doi: https://doi.org/10.1109/HRI.2010.5453268.
Paetzel, Maike, Christopher Peters, Ingela Nyström, and Ginevra Castellano (2016). “Effects of
multimodal cues on children’s perception of uncanniness in a social robot”. In: Proceedings of the 18th
ACM International Conference on Multimodal Interaction. ICMI ’16. Tokyo, Japan: Association for
Computing Machinery, pp. 297–301. doi: https://doi.org/10.1145/2993148.2993157.
Palumbo, Letizia, Nicole Ruta, and Marco Bertamini (2015). “Comparing angular and curved shapes in
terms of implicit associations and approach/avoidance responses”. In: PloS one 10.10, e0140043. doi:
https://doi.org/10.1371/journal.pone.0140043.
Pandey, Amit Kumar and Rodolphe Gelin (2018). “A mass-produced sociable humanoid robot: Pepper:
The first machine of its kind”. In: IEEE Robotics & Automation Magazine 25.3, pp. 40–48. doi:
https://doi.org/10.1109/MRA.2018.2833157.
Pedregosa, F., G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer,
R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and
E. Duchesnay (2011). “Scikit-learn: Machine Learning in Python”. In: Journal of Machine Learning
Research 12, pp. 2825–2830. url:
https://www.jmlr.org/papers/volume12/pedregosa11a/pedregosa11a.pdf.
Pelley, Elaine and Molly Carnes (2020). “When a specialty becomes “women’s work”: trends in and
implications of specialty gender segregation in medicine”. In: Academic Medicine 95.10,
pp. 1499–1506. doi: https://doi.org/10.1097/ACM.0000000000003555.
Peng, Andi, Andreea Bobu, Belinda Z. Li, Theodore R. Sumers, Ilia Sucholutsky, Nishanth Kumar,
Thomas L. Griffiths, and Julie A. Shah (2024). “Preference-Conditioned Language-Guided
Abstraction”. In: Proceedings of the International Conference on Human-Robot Interaction. HRI ’24.
Boulder, CO, USA: Association for Computing Machinery, pp. 572–581. doi:
https://doi.org/10.1145/3610977.3634930.
Pernet, Cyril R and Pascal Belin (2012). “The role of pitch and timbre in voice gender categorization”. In:
Frontiers in psychology 3, p. 23. doi: https://doi.org/10.3389/fpsyg.2012.00023.
Phillips, Elizabeth, Xuan Zhao, Daniel Ullman, and Bertram F. Malle (2018). “What is Human-like?
Decomposing Robots’ Human-like Appearance Using the Anthropomorphic roBOT (ABOT)
Database”. In: Proceedings of the International Conference on Human-Robot Interaction. HRI ’18.
Chicago, IL, USA: Association for Computing Machinery, pp. 105–113. doi:
https://doi.org/10.1145/3171221.3171268.
Pinquart, Martin and Silvia Sörensen (2003). “Differences between caregivers and noncaregivers in
psychological health and physical health: a meta-analysis.” In: Psychology and aging 18.2, p. 250. doi:
https://doi.org/10.1037/0882-7974.18.2.250.
Porter, Charlie (2022). Valentino Rosso. isbn: 9781649801807.
Powers, Aaron, Adam DI Kramer, Shirlene Lim, Jean Kuo, Sau-lai Lee, and Sara Kiesler (2005). “Eliciting
information from people with a gendered humanoid robot”. In: International Conference on Robot and
Human Interactive Communication. RO-MAN ’05. IEEE, pp. 158–163. doi:
https://doi.org/10.1109/ROMAN.2005.1513773.
Preves, Sharon E (2003). Intersex and identity: The contested self. Rutgers University Press. isbn:
0813532299.
Puts, David Andrew, Steven JC Gaulin, and Katherine Verdolini (2006). “Dominance and the evolution of
sexual dimorphism in human voice pitch”. In: Evolution and human behavior 27.4, pp. 283–296. doi:
https://doi.org/10.1016/j.evolhumbehav.2005.11.003.
QTrobot: Humanoid social robot for research and teaching (2020). url:
http://luxai.com/qtrobot-for-research/.
Queerinai, Organizers Of, Anaelia Ovalle, Arjun Subramonian, Ashwin Singh, Claas Voelcker,
Danica J. Sutherland, Davide Locatelli, Eva Breznik, Filip Klubicka, Hang Yuan, Hetvi J, Huan Zhang,
Jaidev Shriram, Kruno Lehman, Luca Soldaini, Maarten Sap, Marc Peter Deisenroth,
Maria Leonor Pacheco, Maria Ryskina, Martin Mundt, Milind Agarwal, Nyx Mclean, Pan Xu,
A Pranav, Raj Korpan, Ruchira Ray, Sarah Mathew, Sarthak Arora, St John, Tanvi Anand,
Vishakha Agrawal, William Agnew, Yanan Long, Zijie J. Wang, Zeerak Talat, Avijit Ghosh,
Nathaniel Dennler, Michael Noseworthy, Sharvani Jha, Emi Baylor, Aditya Joshi, Natalia Y. Bilenko,
Andrew Mcnamara, Raphael Gontijo-Lopes, Alex Markham, Evyn Dong, Jackie Kay, Manu Saraswat,
Nikhil Vytla, and Luke Stark (2023). “Queer In AI: A Case Study in Community-Led Participatory AI”.
In: Proceedings of the Conference on Fairness, Accountability, and Transparency. FAccT ’23. Chicago, IL,
USA: Association for Computing Machinery, pp. 1882–1895. doi:
https://doi.org/10.1145/3593013.3594134.
Quigley, Morgan, Ken Conley, Brian Gerkey, Josh Faust, Tully Foote, Jeremy Leibs, Rob Wheeler,
Andrew Y Ng, et al. (2009). “ROS: an open-source Robot Operating System”. In: ICRA workshop on
open source software. Vol. 3. 3.2. Kobe, Japan, p. 5. url:
https://ai.stanford.edu/~mquigley/papers/icra2009-ros.pdf.
Radlinski, Filip and Thorsten Joachims (2005). “Query chains: learning to rank from implicit feedback”.
In: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data
Mining. KDD ’05. Chicago, Illinois, USA: Association for Computing Machinery, pp. 239–248. doi:
https://doi.org/10.1145/1081870.1081899.
Rae, Irene, Leila Takayama, and Bilge Mutlu (2013). “The influence of height in robot-mediated
communication”. In: 8th ACM/IEEE International Conference on Human-Robot Interaction. HRI ’13.
IEEE, pp. 1–8. doi: https://doi.org/10.1109/HRI.2013.6483495.
Raghunath, Nisha, Paris Myers, Christopher A Sanchez, and Naomi T Fitter (2021). “Women Are Funny:
Influence of Apparent Gender and Embodiment in Robot Comedy”. In: International Conference on
Social Robotics. Springer, pp. 3–13. doi: https://doi.org/10.1007/978-3-030-90525-5_1.
Raghunath, Nisha, Christopher A Sanchez, and Naomi T Fitter (2022). “Robot Comedy (is) Special: A
Surprising Lack of Bias for Gendered Robotic Comedians”. In: International Conference on Social
Robotics. Springer, pp. 663–673. doi: https://doi.org/10.1007/978-3-030-90525-5_1.
Rahdari, Behnam, Peter Brusilovsky, Dmitriy Babichenko, Eliza Beth Littleton, Ravi Patel, Jaime Fawcett,
and Zara Blum (2020). “Grapevine: A profile-based exploratory search and recommendation system
for finding research advisors”. In: Proceedings of the Association for Information Science and
Technology 57.1, e271. doi: https://doi.org/10.1002/pra2.271.
Rao, Vignesh (2023). Offline Text To Speech (TTS) converter for Python.
https://github.com/thevickypedia/py3-tts.
Rayaprol, Aparna (2016). “Feminist research: Redefining methodology in the social sciences”. In:
Contributions to Indian Sociology 50.3, pp. 368–388. doi: https://doi.org/10.1177/0069966716657460.
Realmuto, Jonathan and Terence Sanger (2019). “A robotic forearm orthosis using soft fabric-based
helical actuators”. In: 2nd International Conference on Soft Robotics. RoboSoft ’19. IEEE, pp. 591–596.
doi: https://doi.org/10.1109/ROBOSOFT.2019.8722759.
Reed, Colorado J, Xiangyu Yue, Ani Nrusimha, Sayna Ebrahimi, Vivek Vijaykumar, Richard Mao, Bo Li,
Shanghang Zhang, Devin Guillory, Sean Metzger, Kurt Keutzer, and Trevor Darrell (2022).
“Self-Supervised Pretraining Improves Self-Supervised Pretraining”. In: Proceedings of the Winter
Conference on Applications of Computer Vision. WACV ’22. doi:
https://doi.org/10.48550/arXiv.2103.12718.
Reeve, Johnmarshall (2002). Self-determination theory applied to educational settings. University of
Rochester Press. isbn: 1580461565.
Ren, Yupeng, Hyung-Soon Park, and Li-Qun Zhang (2009). “Developing a whole-arm exoskeleton robot
with hand opening and closing mechanism for upper limb stroke rehabilitation”. In: International
Conference on Rehabilitation Robotics. ICORR ’09. IEEE, pp. 761–765. doi:
https://doi.org/10.1109/ICORR.2009.5209482.
Rensink, Marijke, Marieke Schuurmans, Eline Lindeman, and Thora Hafsteinsdottir (2009).
“Task-oriented training in rehabilitation after stroke: systematic review”. In: Journal of Advanced
Nursing 65.4, pp. 737–754. doi: https://doi.org/10.1111/j.1365-2648.2008.04925.x.
Reysen, Stephen, Iva Katzarska-Miller, Sundé M Nesbit, and Lindsey Pierce (2013). “Further validation of
a single-item measure of social identification”. In: European Journal of Social Psychology 43.6,
pp. 463–470. doi: https://doi.org/10.1002/ejsp.1973.
Rifinski, Danielle, Hadas Erel, Adi Feiner, Guy Hoffman, and Oren Zuckerman (2020).
“Human-human-robot interaction: robotic object’s responsive gestures improve interpersonal
evaluation in human interaction”. In: Human–Computer Interaction, pp. 1–27. doi:
https://doi.org/10.1080/07370024.2020.1719839.
Robben, Daan, Eriko Fukuda, and Mirjam De Haas (2023). “The Effect of Gender on Perceived
Anthropomorphism and Intentional Acceptance of a Storytelling Robot”. In: Companion of the 2023
ACM/IEEE International Conference on Human-Robot Interaction. HRI ’23. Stockholm, Sweden:
Association for Computing Machinery, pp. 495–499. doi: https://doi.org/10.1145/3568294.3580134.
Robinson, Howard (2023). “Dualism”. In: The Stanford Encyclopedia of Philosophy. Ed. by Edward N. Zalta
and Uri Nodelman. Spring 2023. Metaphysics Research Lab, Stanford University.
Robinson, Nicole, Brendan Tidd, Dylan Campbell, Dana Kulić, and Peter Corke (2023). “Robotic Vision
for Human-Robot Interaction and Collaboration: A Survey and Systematic Review”. In: J. Hum.-Robot
Interact. 12.1. doi: https://doi.org/10.1145/3570731.
Roca, Juan Carlos and Marylène Gagné (2008). “Understanding e-learning continuance intention in the
workplace: A self-determination theory perspective”. In: Computers in human behavior 24.4,
pp. 1585–1604. doi: https://doi.org/10.1016/j.chb.2007.06.001.
Rossi, Silvia, Francois Ferland, and Adriana Tapus (2017). “User profiling and behavioral adaptation for
HRI: A survey”. In: Pattern Recognition Letters 99, pp. 3–12. doi:
https://doi.org/10.1016/j.patrec.2017.06.002.
Rudofsky, Bernard (1947). Are clothes modern? An essay on contemporary apparel. P. Theobold. url:
https://www.moma.org/calendar/exhibitions/3159.
Rudovic, Ognjen, Jaeryoung Lee, Miles Dai, Björn Schuller, and Rosalind W Picard (2018). “Personalized
machine learning for robot perception of affect and engagement in autism therapy”. In: Science
Robotics 3.19. doi: https://doi.org/10.1126/scirobotics.aao6760.
Rudovic, Ognjen, Jaeryoung Lee, Lea Mascarell-Maricic, Björn W Schuller, and Rosalind W Picard (2017).
“Measuring engagement in robot-assisted autism therapy: A cross-cultural study”. In: Frontiers in
Robotics and AI 4, p. 36. doi: https://doi.org/10.3389/frobt.2017.00036.
Rudovic, Ognjen, Hae Won Park, John Busche, Björn Schuller, Cynthia Breazeal, and Rosalind W. Picard
(2019). “Personalized Estimation of Engagement From Videos Using Active Learning With Deep
Reinforcement Learning”. In: Conference on Computer Vision and Pattern Recognition Workshops.
CVPRW ’19. IEEE, pp. 217–226. doi: https://doi.org/10.1109/CVPRW.2019.00031.
Russell, Stuart and Peter Norvig (2002). Artificial intelligence: a modern approach. Prentice Hall. isbn:
9781292153964.
Rusu, Radu Bogdan and Steve Cousins (May 2011). “3D is here: Point Cloud Library (PCL)”. In: IEEE
International Conference on Robotics and Automation (ICRA). Shanghai, China. doi:
https://doi.org/10.1109/ICRA.2011.5980567.
Ryan, Richard M, C Scott Rigby, and Andrew Przybylski (2006). “The motivational pull of video games: A
self-determination theory approach”. In: Motivation and emotion 30, pp. 344–360. url:
https://www.scirp.org/journal/paperinformation?paperid=70850.
Sadigh, Dorsa, Anca Dragan, Shankar Sastry, and Sanjit Seshia (2017). Active preference-based learning of
reward functions. doi: https://doi.org/10.15607/rss.2017.xiii.053.
Saito, Shunsuke, Liwen Hu, Chongyang Ma, Hikaru Ibayashi, Linjie Luo, and Hao Li (2018). “3D hair
synthesis using volumetric variational autoencoders”. In: ACM Trans. Graph. 37.6. issn: 0730-0301.
doi: https://doi.org/10.1145/3272127.3275019.
Salih, Sara (2007). “On Judith butler and performativity”. In: Sexualities and communication in everyday
life: A reader, pp. 55–68. url: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=
a3e78728e9292baa6289c258a8667e0452c82f68.
Sanderson, Mark and W Bruce Croft (2012). “The history of information retrieval research”. In:
Proceedings of the IEEE 100.Special Centennial Issue, pp. 1444–1451. doi:
https://doi.org/10.1109/JPROC.2012.2189916.
Sandygulova, Anara and Gregory MP O’Hare (2018). “Age-and gender-based differences in children’s
interactions with a gender-matching robot”. In: International Journal of Social Robotics 10.5,
pp. 687–700. doi: https://doi.org/10.1007/s12369-018-0472-9.
Sanger, Terence D (2004). “Toward a definition of childhood dystonia”. In: Current opinion in pediatrics
16.6, pp. 623–627. doi: https://doi.org/10.1097/01.mop.0000142487.90041.a2.
Schaffer, Jonathan (2018). “Monism”. In: The Stanford Encyclopedia of Philosophy. Ed. by Edward N. Zalta.
Winter 2018. Metaphysics Research Lab, Stanford University. url:
https://plato.stanford.edu/archives/win2018/entries/monism/.
Scheuerman, Morgan Klaus, Katta Spiel, Oliver L. Haimson, Foad Hamidi, and Stacy M. Branham (May
2020). HCI Gender Guidelines. url: https://www.morgan-klaus.com/gender-guidelines.html.
Schulman, John, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz (2015). “Trust Region
Policy Optimization”. In: Proceedings of Machine Learning Research 37, pp. 1889–1897. url:
https://proceedings.mlr.press/v37/schulman15.html.
Schulman, John, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov (2017). “Proximal policy
optimization algorithms”. In: arXiv preprint arXiv:1707.06347. doi:
https://doi.org/10.48550/arXiv.1707.06347.
Sebo, Sarah, Brett Stoll, Brian Scassellati, and Malte F. Jung (2020). “Robots in Groups and Teams: A
Literature Review”. In: Proc. ACM Hum.-Comput. Interact. 4.CSCW2. doi:
https://doi.org/10.1145/3415247.
Selvaggio, Mario, Marco Cognetti, Stefanos Nikolaidis, Serena Ivaldi, and Bruno Siciliano (2021).
“Autonomy in physical human-robot interaction: A brief survey”. In: IEEE Robotics and Automation
Letters 6.4, pp. 7989–7996. doi: https://doi.org/10.1109/LRA.2021.3100603.
Shamoi, Pakizar, Atsushi Inoue, and Hiroharu Kawanaka (2020). “Modeling aesthetic preferences: Color
coordination and fuzzy sets”. In: Fuzzy Sets and Systems 395, pp. 217–234. doi:
https://doi.org/10.1016/j.fss.2019.02.014.
Shannon, Claude Elwood (1948). “A mathematical theory of communication”. In: The Bell system technical
journal 27.3, pp. 379–423. doi: https://doi.org/10.1002/j.1538-7305.1948.tb01338.x.
Sharma, Pratyusha, Balakumar Sundaralingam, Valts Blukis, Chris Paxton, Tucker Hermans,
Antonio Torralba, Jacob Andreas, and Dieter Fox (2022). “Correcting robot plans with natural
language feedback”. In: arXiv preprint arXiv:2204.05186. doi:
https://doi.org/10.48550/arXiv.2204.05186.
Sheldon, Kennon M, Geoffrey Williams, and Thomas Joiner (2008). Self-determination theory in the clinic:
Motivating physical and mental health. Yale University Press. isbn: 0300128665.
Shi, Di, Wuxiang Zhang, Wei Zhang, and Xilun Ding (2019). “A review on lower limb rehabilitation
exoskeleton robots”. In: Chinese Journal of Mechanical Engineering 32.1, pp. 1–11. doi:
https://doi.org/10.1186/s10033-019-0389-8.
Shi, Ke, Aiguo Song, Ye Li, Huijun Li, Dapeng Chen, and Lifeng Zhu (2021). “A cable-driven three-DOF
wrist rehabilitation exoskeleton with improved performance”. In: Frontiers in Neurorobotics 15,
p. 664062. doi: https://doi.org/10.3389/fnbot.2021.664062.
Shi, Zhonghao, Thomas R. Groechel, Shomik Jain, Kourtney Chima, Ognjen (Oggi) Rudovic, and
Maja J. Matarić (2022). “Toward Personalized Affect-Aware Socially Assistive Robot Tutors for
Long-Term Interventions with Children with Autism”. In: J. Hum.-Robot Interact. 11.4. doi:
https://doi.org/10.1145/3526111.
Shi, Zhonghao, Amy O’Connell, Zongjian Li, Siqi Liu, Jennifer Ayissi, Guy Hoffman,
Mohammad Soleymani, and Maja J Matarić (2024). “Build Your Own Robot Friend: An Open-Source
Learning Module for Accessible and Engaging AI Education”. In: Proceedings of the Conference on
Artificial Intelligence. Vol. 38. AAAI ’24 21. AAAI, pp. 23137–23145. doi:
https://doi.org/10.1609/aaai.v38i21.30359.
Short, Elaine, Dale Short, Yifeng Fu, and Maja J Mataric (2017). “Sprite: Stewart platform robot for
interactive tabletop engagement”. In: Tech Report, Department of Computer Science, University of
Southern California. doi: https://doi.org/10.48550/arXiv.2011.05786.
Short, Elaine Schaertl, Katelyn Swift-Spong, Hyunju Shim, Kristi M Wisniewski, Deanah Kim Zak,
Shinyi Wu, Elizabeth Zelinski, and Maja J Matarić (2017). “Understanding social interactions with
socially assistive robotics in intergenerational family groups”. In: 26th International Symposium on
Robot and Human Interactive Communication. RO-MAN ’17. IEEE, pp. 236–241. doi:
https://doi.org/10.1109/ROMAN.2017.8172308.
Shrum, Larry J (2017). “Cultivation theory: Effects and underlying processes”. In: The international
encyclopedia of media effects, pp. 1–12. doi: https://doi.org/10.1002/9781118783764.wbieme0040.
Singhal, Amit (2001). “Modern information retrieval: A brief overview”. In: IEEE Data Eng. Bull. 24.4,
pp. 35–43. url: https://masters.donntu.ru/2009/fvti/bezuglyi/library/ieee2001.pdf.
Sitikhu, Pinky, Kritish Pahi, Pujan Thapa, and Subarna Shakya (2019). “A comparison of semantic
similarity methods for maximum human interpretability”. In: Artificial intelligence for transforming
business and society. Vol. 1. AITB ’19. IEEE, pp. 1–4. doi:
https://doi.org/10.1109/AITB48515.2019.8947433.
Solá-Santiago, Frances (2024). Little Book of Bottega Veneta: The story of the iconic fashion house. en.
London, England: Welbeck Publishing Group. isbn: 9781802796421.
Song, Linsen, Wayne Wu, Chaoyou Fu, Chen Qian, Chen Change Loy, and Ran He (2021). “Pareidolia
Face Reenactment”. In: Proceedings of the Conference on Computer Vision and Pattern Recognition.
CVPR ’21. IEEE/CVF, pp. 2236–2245. doi: https://doi.org/10.48550/arXiv.2104.03061.
Specian, Andrew, Ross Mead, Simon Kim, Maja Matarić, and Mark Yim (2021). “Quori: A
community-informed design of a socially interactive humanoid robot”. In: IEEE Transactions on
Robotics 38.3, pp. 1755–1772. doi: https://doi.org/10.1109/TRO.2021.3111718.
Spectrum, IEEE (Sept. 2018). All Robots. url: https://robots.ieee.org/robots/.
Spitale, Micol, Sarah Okamoto, Mahima Gupta, Hao Xi, and Maja J. Matarić (Sept. 2022). “Socially
Assistive Robots as Storytellers that Elicit Empathy”. In: J. Hum.-Robot Interact. 11.4. doi:
https://doi.org/10.1145/3538409.
Standage, Martyn and Richard M Ryan (2020). “Self-determination theory in sport and exercise”. In:
Handbook of sport psychology, pp. 37–56. doi: https://doi.org/10.1002/9781119568124.ch3.
Stanford Artificial Intelligence Laboratory et al. (May 23, 2018). Robotic Operating System. Version ROS
Melodic Morenia. url: https://www.ros.org.
Stanton, Rosalyn, Louise Ada, Catherine M Dean, and Elisabeth Preston (2017). “Biofeedback improves
performance in lower limb activities more than usual therapy in people following stroke: a
systematic review”. In: Journal of Physiotherapy 63.1, pp. 11–16. doi:
https://doi.org/10.1016/j.jphys.2016.11.006.
Steinhaeusser, Sophia C., Philipp Schaper, Ohenewa Bediako Akuffo, Paula Friedrich, Jülide Ön, and
Birgit Lugrin (2021). “Anthropomorphize me! Effects of Robot Gender on Listeners’ Perception of the
Social Robot NAO in a Storytelling Use Case”. In: Companion of the 2021 ACM/IEEE International
Conference on Human-Robot Interaction. HRI ’21 Companion. Boulder, CO, USA: Association for
Computing Machinery, pp. 529–534. doi: https://doi.org/10.1145/3434074.3447228.
Sterr, Annette, Susanna Freivogel, and Dieter Schmalohr (2002). “Neurobehavioral aspects of recovery:
assessment of the learned nonuse phenomenon in hemiparetic adolescents”. In: Archives of Physical
Medicine and Rehabilitation 83.12, pp. 1726–1731. doi: https://doi.org/10.1053/apmr.2002.35660.
Stets, Jan E and Peter J Burke (2000). “Identity theory and social identity theory”. In: Social psychology
quarterly, pp. 224–237. doi: https://doi.org/10.2307/2695870.
Stiber, Maia, Russell H. Taylor, and Chien-Ming Huang (2023). “On Using Social Signals to Enable
Flexible Error-Aware HRI”. In: Proceedings of the 2023 ACM/IEEE International Conference on
Human-Robot Interaction. HRI ’23. Stockholm, Sweden: Association for Computing Machinery,
pp. 222–230. doi: https://doi.org/10.1145/3568162.3576990.
Strait, Megan K, Victoria A Floerke, Wendy Ju, Keith Maddox, Jessica D Remedios, Malte F Jung, and
Heather L Urry (2017). “Understanding the uncanny: both atypical features and category ambiguity
provoke aversion toward humanlike robots”. In: Frontiers in psychology 8, p. 1366. doi:
https://doi.org/10.3389/fpsyg.2017.01366.
Strauss, Anselm and Juliet Corbin (1998). Basics of qualitative research techniques. Thousand oaks, CA:
Sage publications. isbn: 0803959397.
Stumpf, Simone, Anicia Peters, Shaowen Bardzell, Margaret Burnett, Daniela Busse, Jessica Cauchard,
Elizabeth Churchill, et al. (2020). “Gender-inclusive HCI research and design: A conceptual review”.
In: Foundations and Trends® in Human–Computer Interaction 13.1, pp. 1–69. doi:
http://dx.doi.org/10.1561/1100000056.
Su, Zhidong, Fei Liang, Ha Manh Do, Alex Bishop, Barbara Carlson, and Weihua Sheng (2021).
“Conversation-based medication management system for older adults using a companion robot and
cloud”. In: IEEE Robotics and Automation Letters 6.2, pp. 2698–2705. doi:
https://doi.org/10.1109/LRA.2021.3061996.
Suguitan, Michael and Guy Hoffman (2019). “Blossom: A Handcrafted Open-Source Robot”. In: J.
Hum.-Robot Interact. 8.1. doi: https://doi.org/10.1145/3310356.
Sutton, Richard S (1988). “Learning to predict by the methods of temporal differences”. In: Machine
learning 3, pp. 9–44. doi: https://doi.org/10.1007/BF00115009.
Swift-Spong, Katelyn, Elaine Short, Eric Wade, and Maja J Matarić (2015). “Effects of comparative
feedback from a socially assistive robot on self-efficacy in post-stroke rehabilitation”. In: International
Conference on Rehabilitation Robotics. ICORR ’15. IEEE, pp. 764–769. doi:
https://doi.org/10.1109/ICORR.2015.7281294.
Swinker, Mary E and Jean D Hines (2006). “Understanding consumers’ perception of clothing quality: A
multidimensional approach”. In: International journal of consumer studies 30.2, pp. 218–223. doi:
https://doi.org/10.1111/j.1470-6431.2005.00478.x.
Szafir, Daniel and Danielle Albers Szafir (2021). “Connecting Human-Robot Interaction and Data
Visualization”. In: Proceedings of the 2021 ACM/IEEE International Conference on Human-Robot
Interaction. HRI ’21. Boulder, CO, USA: Association for Computing Machinery, pp. 281–292. doi:
https://doi.org/10.1145/3434073.3444683.
Tajfel, Henri (1974). “Social identity and intergroup behaviour”. In: Social science information 13.2,
pp. 65–93. doi: https://doi.org/10.1177/053901847401300204.
Tajfel, Henri and John C Turner (2004). “The social identity theory of intergroup behavior”. In: Political
psychology. Psychology Press, pp. 276–293.
Takebayashi, Takashi, Kayoko Takahashi, Yuho Okita, Hironobu Kubo, Kenji Hachisuka, and
Kazuhisa Domen (2022). “Impact of the robotic-assistance level on upper extremity function in stroke
patients receiving adjunct robotic rehabilitation: sub-analysis of a randomized clinical trial”. In:
Journal of NeuroEngineering and Rehabilitation 19.1, p. 25. doi:
https://doi.org/10.1186/s12984-022-00986-9.
Tapus, Adriana, Cristian Ţăpuş, and Maja J Matarić (2008). “User—robot personality matching and
assistive robot behavior adaptation for post-stroke rehabilitation therapy”. In: Intelligent Service
Robotics 1.2, pp. 169–183. doi: https://doi.org/10.1007/s11370-008-0017-4.
Taub, Edward, Jean E Crago, and Gitendra Uswatte (1998). “Constraint-induced movement therapy: A
new approach to treatment in physical rehabilitation.” In: Rehabilitation Psychology 43.2, p. 152. doi:
https://doi.org/10.1037/0090-5550.43.2.152.
Taub, Edward, Karen McCulloch, Gitendra Uswatte, David M Morris, Mary Bowman, and Jean Crago
(2011). “Motor activity log (mal) manual”. In: UAB training for CI therapy 1, p. 18. url:
https://www.ccts.cme.uab.edu/citherapy/images/pdf_files/CIT_Training_MAL_manual.pdf.
Tay, Benedict, Younbo Jung, and Taezoon Park (2014). “When stereotypes meet robots: the double-edge
sword of robot gender and personality in human–robot interaction”. In: Computers in Human
Behavior 38, pp. 75–84. doi: https://doi.org/10.1016/j.chb.2014.05.014.
Taylor, Rachael (2022). Tiffany & co.: The story behind the style. en. Dorking, England: Studio Press. isbn:
9781800783416.
Tillman, Bryan (2012). Creative character design. Crc Press. doi: https://doi.org/10.1201/9781351261685.
Tjanaka, Bryon, Matthew C. Fontaine, Julian Togelius, and Stefanos Nikolaidis (2022). “Approximating
gradients for differentiable quality diversity in reinforcement learning”. In: Proceedings of the Genetic
and Evolutionary Computation Conference. GECCO ’22. Boston, Massachusetts: Association for
Computing Machinery, pp. 1102–1111. doi: https://doi.org/10.1145/3512290.3528705.
Torre, Ilaria, Erik Lagerstedt, Nathaniel Dennler, Katie Seaborn, Iolanda Leite, and Éva Székely (2023).
“Can a gender-ambiguous voice reduce gender stereotypes in human-robot interactions?” In: 32nd
International Conference on Robot and Human Interactive Communication. RO-MAN ’23. IEEE,
pp. 106–112. doi: https://doi.org/10.1109/RO-MAN57019.2023.10309500.
Trovato, Gabriele, Cesar Lucho, and Renato Paredes (2018). “She’s electric—the influence of body
proportions on perceived gender of robots across cultures”. In: Robotics 7.3, p. 50. doi:
https://doi.org/10.3390/robotics7030050.
Tsao, Connie W, Aaron W Aday, Zaid I Almarzooq, Alvaro Alonso, Andrea Z Beaton,
Marcio S Bittencourt, Amelia K Boehme, Alfred E Buxton, April P Carson,
Yvonne Commodore-Mensah, et al. (2022). “Heart disease and stroke statistics—2022 update: a report
from the American Heart Association”. In: Circulation 145.8, e153–e639. doi:
https://doi.org/10.1161/CIR.0000000000001052.
Uswatte, Gitendra, Edward Taub, DPPT Morris, KPPT Light, and PA Thompson (2006). “The Motor
Activity Log-28: assessing daily use of the hemiparetic arm after stroke”. In: Neurology 67.7,
pp. 1189–1194. doi: https://doi.org/10.1212/01.wnl.0000238164.90657.c2.
Venkatesh, Viswanath (2000). “Determinants of perceived ease of use: Integrating control, intrinsic
motivation, and emotion into the technology acceptance model”. In: Information systems research
11.4, pp. 342–365. doi: https://doi.org/10.1287/isre.11.4.342.11872.
Venkatesh, Viswanath and Fred D Davis (2000). “A theoretical extension of the technology acceptance
model: Four longitudinal field studies”. In: Management science 46.2, pp. 186–204. doi:
https://doi.org/10.1287/mnsc.46.2.186.11926.
Voida, Stephen, Elizabeth D. Mynatt, and W. Keith Edwards (2008). “Re-framing the desktop interface
around the activities of knowledge work”. In: Proceedings of the 21st Annual ACM Symposium on User
Interface Software and Technology. UIST ’08. Monterey, CA, USA: Association for Computing
Machinery, pp. 211–220. doi: https://doi.org/10.1145/1449715.1449751.
Wade, Derick T (1992). “Measurement in neurological rehabilitation.” In: Current Opinion in Neurology
and Neurosurgery 5.5, pp. 682–686. url: https://journals.lww.com/co-neurology/toc/1992/10000.
Wainer, Joshua, David J Feil-Seifer, Dylan A Shell, and Maja J Mataric (2006). “The role of physical
embodiment in human-robot interaction”. In: The 15th International Symposium on Robot and Human
Interactive Communication. RO-MAN ’06. IEEE, pp. 117–122. doi:
https://doi.org/10.1109/ROMAN.2006.314404.
Wang, Xijing and Eva G Krumhuber (2018). “Mind perception of robots varies with their economic versus
social function”. In: Frontiers in psychology 9, p. 1230. doi: https://doi.org/10.3389/fpsyg.2018.01230.
Wang, Zi, Caelan Reed Garrett, Leslie Pack Kaelbling, and Tomás Lozano-Pérez (2018). “Active model
learning and diverse action sampling for task and motion planning”. In: International Conference on
Intelligent Robots and Systems. IROS ’18. IEEE/RSJ, pp. 4107–4114. doi:
https://doi.org/10.1109/IROS.2018.8594027.
Ward, Kelly, Florence Bertails, Tae-Yong Kim, Stephen R Marschner, Marie-Paule Cani, and Ming C Lin
(2007). “A survey on hair modeling: Styling, simulation, and rendering”. In: IEEE transactions on
visualization and computer graphics 13.2, pp. 213–234. doi: https://doi.org/10.1109/TVCG.2007.30.
Watkins, Christopher JCH and Peter Dayan (1992). “Q-learning”. In: Machine learning 8, pp. 279–292. doi:
https://doi.org/10.1007/BF00992698.
Weaver, Warren (2017). “The mathematics of communication”. In: Communication theory. Routledge,
pp. 27–38. isbn: 9781315080918.
Weickert, Joachim (2003). “Coherence-enhancing shock filters”. In: Joint Pattern Recognition Symposium.
Springer, pp. 1–8. doi: https://doi.org/10.1007/978-3-540-45243-0_1.
West, Candace and Don H Zimmerman (1987). “Doing gender”. In: Gender & society 1.2, pp. 125–151. doi:
https://doi.org/10.1177/0891243287001002002.
Westbrook, Katherine E., Trevor A. Nessel, Marc H. Hohman, and Matthew Varacallo (2024). Anatomy,
Head and Neck: Facial Muscles. Treasure Island (FL): StatPearls Publishing.
White, Katherine, Rishad Habib, and David J Hardisty (2019). “How to SHIFT consumer behaviors to be
more sustainable: A literature review and guiding framework”. In: Journal of Marketing 83.3,
pp. 22–49. doi: https://doi.org/10.1177/0022242919825649.
Winkle, Katie, Praminda Caleb-Solly, Ailie Turton, and Paul Bremner (2018). “Social Robots for
Engagement in Rehabilitative Therapies: Design Implications from a Study with Therapists”. In:
Proceedings of the International Conference on Human-Robot Interaction. HRI ’18. Chicago, IL, USA:
Association for Computing Machinery, pp. 289–297. doi: https://doi.org/10.1145/3171221.3171273.
Winkle, Katie, Donald McMillan, Maria Arnelid, Katherine Harrison, Madeline Balaam, Ericka Johnson,
and Iolanda Leite (2023). “Feminist Human-Robot Interaction: Disentangling Power, Principles and
Practice for Better, More Ethical HRI”. In: Proceedings of the 2023 ACM/IEEE International Conference
on Human-Robot Interaction. HRI ’23. Stockholm, Sweden: Association for Computing Machinery,
pp. 72–82. doi: https://doi.org/10.1145/3568162.3576973.
Winstein, Carolee, Bokkyu Kim, Sujin Kim, Clarisa Martinez, and Nicolas Schweighofer (2019). “Dosage
matters: a phase IIb randomized controlled trial of motor therapy in the chronic phase after stroke”.
In: Stroke 50.7, pp. 1831–1837. doi: https://doi.org/10.1161/STROKEAHA.118.023603.
Winter, Sarah, Andrew Autry, Coleen Boyle, and Marshalyn Yeargin-Allsopp (2002). “Trends in the
prevalence of cerebral palsy in a population-based study”. In: Pediatrics 110.6, pp. 1220–1225. doi:
https://doi.org/10.1542/peds.110.6.1220.
Wolf, Daniel (2022). Rhubarb Lip Sync. https://github.com/DanielSWolf/rhubarb-lip-sync.
Wolf, Steven L, Carolee J Winstein, J Philip Miller, Edward Taub, Gitendra Uswatte, David Morris,
Carol Giuliani, Kathye E Light, Deborah Nichols-Larsen, for the EXCITE Investigators, et al. (2006).
“Effect of constraint-induced movement therapy on upper extremity function 3 to 9 months after
stroke: the EXCITE randomized clinical trial”. In: JAMA 296.17, pp. 2095–2104. doi:
https://doi.org/10.1001/jama.296.17.2095.
Wolf, Thomas, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi,
Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer,
Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger,
Mariama Drame, Quentin Lhoest, and Alexander Rush (2020). “Transformers: State-of-the-Art
Natural Language Processing”. In: Proceedings of the Conference on Empirical Methods in Natural
Language Processing: System Demonstrations. EMNLP ’20. Ed. by Qun Liu and David Schlangen,
pp. 38–45. doi: https://doi.org/10.18653/v1/2020.emnlp-demos.6.
Xu, Qiantong, Alexei Baevski, and Michael Auli (2021). “Simple and effective zero-shot cross-lingual
phoneme recognition”. In: arXiv preprint arXiv:2109.11680. doi:
https://doi.org/10.48550/arXiv.2109.11680.
Yang, Guang-Zhong, Peter Burger, David N Firmin, and SR Underwood (1996). “Structure adaptive
anisotropic image filtering”. In: Image and Vision Computing 14.2, pp. 135–145. doi:
https://doi.org/10.1016/0262-8856(95)01047-5.
Yang, Mengjiao and Ofir Nachum (18–24 Jul 2021). “Representation Matters: Offline Pretraining for
Sequential Decision Making”. In: Proceedings of the 38th International Conference on Machine Learning.
Vol. 139. ICML ’21. PMLR, pp. 11784–11794. url: https://proceedings.mlr.press/v139/yang21h.html.
Yang, Yanwu and Panyu Zhai (2022). “Click-through rate prediction in online advertising: A literature
review”. In: Information Processing & Management 59.2, p. 102853. doi:
https://doi.org/10.1016/j.ipm.2021.102853.
Yee, Ka-Ping, Kirsten Swearingen, Kevin Li, and Marti Hearst (2003). “Faceted metadata for image search
and browsing”. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. CHI
’03. Ft. Lauderdale, Florida, USA: Association for Computing Machinery, pp. 401–408. doi:
https://doi.org/10.1145/642611.642681.
Zakka, Kevin, Andy Zeng, Pete Florence, Jonathan Tompson, Jeannette Bohg, and Debidatta Dwibedi
(Aug. 2022). “XIRL: Cross-embodiment Inverse Reinforcement Learning”. In: Proceedings of the 5th
Conference on Robot Learning. Vol. 164. CoRL ’22. PMLR, pp. 537–546. url:
https://proceedings.mlr.press/v164/zakka22a.html.
Zeiaee, Amin, Rana Soltani Zarrin, Andrew Eib, Reza Langari, and Reza Tafreshi (2021). “CLEVERarm: A
lightweight and compact exoskeleton for upper-limb rehabilitation”. In: IEEE Robotics and
Automation Letters 7.2, pp. 1880–1887. doi: https://doi.org/10.1109/LRA.2021.3138326.
Zhang, Juanjuan, Pieter Fiers, Kirby A Witte, Rachel W Jackson, Katherine L Poggensee,
Christopher G Atkeson, and Steven H Collins (2017). “Human-in-the-loop optimization of
exoskeleton assistance during walking”. In: Science 356.6344, pp. 1280–1284. doi:
https://doi.org/10.1126/science.aal5054.
Zhang, Leigang, Shuai Guo, and Qing Sun (2020). “Development and assist-as-needed control of an
end-effector upper limb rehabilitation robot”. In: Applied Sciences 10.19, p. 6684. doi:
https://doi.org/10.3390/app10196684.
Zhang, Meng and Youyi Zheng (2019). “Hair-GAN: Recovering 3D hair structure from a single image
using generative adversarial networks”. In: Visual Informatics 3.2, pp. 102–112. doi:
https://doi.org/10.1016/j.visinf.2019.06.001.
Zhu, Yifei, Ruchen Wen, and Tom Williams (2024). “Robots for Social Justice (R4SJ): Toward a More
Equitable Practice of Human-Robot Interaction”. In: Proceedings of the 2024 ACM/IEEE International
Conference on Human-Robot Interaction. HRI ’24. Boulder, CO, USA: Association for Computing
Machinery, pp. 850–859. doi: https://doi.org/10.1145/3610977.3634944.
Zimmerman, John, Jodi Forlizzi, and Shelley Evenson (2007). “Research through Design as a Method for
Interaction Design Research in HCI”. In: Proceedings of the SIGCHI Conference on Human Factors in
Computing Systems. CHI ’07. San Jose, California, USA: Association for Computing Machinery,
pp. 493–502. doi: https://doi.org/10.1145/1240624.1240704.
Abstract
Robots are expected to be deployed in diverse environments and use cases to provide physical and social assistance to end-users. A major barrier to the widespread deployment of robots is the large variance in user preferences for how robots should perform tasks. The impact of user preferences is greater for robots than for already-ubiquitous computer systems because a robot's embodiment allows it to physically interact with the world and form social connections with users through its actions.
This dissertation explores how robots can adapt their mechanical design, physical behaviors, and social behaviors to align with users' preferences. Across these domains, we emphasize the importance of both automatic adaptation through personalization and user-driven adaptation through customization.
First, this work identifies how robot embodiment affects expectations for interaction. We introduce design metaphors as a tool for reasoning about these expectations, and clothing design as a method for modifying the robot's perceived embodiment. Second, we show how robots can learn users' preferences through physical interaction. We create an objective metric by modeling interaction with a robot system that assesses movement in post-stroke users, and we develop a novel robotic hair-combing interaction. Finally, we show how robots can learn users' preferences through social interaction. We introduce a process for learning user engagement models based on robot social actions to facilitate exercise games for users with cerebral palsy. We additionally create an interface that allows users to design non-verbal signals, along with a machine learning framework for learning representations of these signals that facilitate customization. We conclude the thesis with an algorithm that allows users to quickly customize both social and physical robot behaviors. Together, this work enables the design and implementation of assistive robotic systems that can aid a variety of users with diverse preferences.
Conceptually similar
Towards socially assistive robot support methods for physical activity behavior change
On virtual, augmented, and mixed reality for socially assistive robotics
Efficiently learning human preferences for proactive robot assistance in assembly tasks
Modeling dyadic synchrony with heterogeneous data: validation in infant-mother and infant-robot interactions
Managing multi-party social dynamics for socially assistive robotics
Situated proxemics and multimodal communication: space, speech, and gesture in human-robot interaction
Multiparty human-robot interaction: methods for facilitating social support
Socially assistive and service robotics for older adults: methodologies for motivating exercise and following spatial language instructions in discourse
Coordinating social communication in human-robot task collaborations
Nonverbal communication for non-humanoid robots
Algorithms and systems for continual robot learning
Quality diversity scenario generation for human robot interaction
Leveraging prior experience for scalable transfer in robot learning
Planning and learning for long-horizon collaborative manipulation tasks
Quickly solving new tasks, with meta-learning and without
High-throughput methods for simulation and deep reinforcement learning
Modeling and regulating human interaction with control affine dynamical systems
Decision support systems for adaptive experimental design of autonomous, off-road ground vehicles
Characterizing and improving robot learning: a control-theoretic perspective
Program-guided framework for interpreting and acquiring complex skills with learning robots
Asset Metadata
Creator
Dennler, Nathaniel Steele (author)
Core Title
Physical and social adaptation for assistive robot interactions
School
Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Computer Science
Degree Conferral Date
2025-05
Publication Date
02/06/2025
Defense Date
12/18/2024
Publisher
Los Angeles, California (original); University of Southern California (original); University of Southern California. Libraries (digital)
Tag
assistive robotics,customization,personalization,physically assistive robotics,preference learning,robot embodiment,socially assistive robotics
Format
theses (aat)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Matarić, Maja (committee chair); Narayanan, Shri (committee member); Nikolaidis, Stefanos (committee member)
Creator Email
dennler@usc.edu,nathan@dennlers.net
Unique identifier
UC11399GTHV
Identifier
etd-DennlerNat-13820.pdf (filename)
Legacy Identifier
etd-DennlerNat-13820
Document Type
Dissertation
Rights
Dennler, Nathaniel Steele
Internet Media Type
application/pdf
Type
texts
Source
20250211-usctheses-batch-1241 (batch); University of Southern California (contributing entity); University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright.
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email
cisadmin@lib.usc.edu