Nonverbal Communication for
Non-Humanoid Robots
Elizabeth Cha
Computer Science
University of Southern California
Los Angeles, CA 90039
December 2018
Thesis Committee:
Maja Matarić, USC Computer Science (Chair)
Gaurav Sukhatme, USC Computer Science
Panayiotis Georgiou, USC Electrical Engineering
Acknowledgements
First and foremost, I would like to thank my advisors, Maja Matarić and Terry Fong,
for everything they have done for me. Their invaluable guidance has shaped me into
the person I am today. They have supported me in all my endeavors over the years and
always provided me with the help and resources I needed to succeed. They constantly
challenged me to reach higher and think outside of the box, making me a better researcher
and engineer. I would also like to thank the other members of my committee, Gaurav
Sukhatme and Panos Georgiou, for bringing their unique, outside perspectives to my
work.
I have also been fortunate to work with many other mentors and collaborators over
the years: Aaron Steinfeld, Leila Takayama, Marynel Vazquez, Katherine Kuchenbecker,
Russ Taylor, Ed Colgate, Dan Szafir, Stefanos Nikolaidis, and Naomi Fitter. I am especially
grateful to Sidd Srinivasa and Jodi Forlizzi for advising me during my Masters, and to
Anca Dragan for teaching me how to do research. I have also been lucky to mentor
some amazing students who applied their distinctive perspectives and ideas to our work:
Christian Wagner, Lancelot Watthieu, and Emily Meschke.
Thank you to all of my friends who have kept me sane over the years and supported
me. Anum and Heba, thanks for giving me a second home in the bay anytime I needed it
and taking care of me. Mike, Anca, Chris, and Jen, thanks for all of those fun adventures
in and out of Pittsburgh. Connie and Miko, thanks for always traveling to see me and
helping me to take my mind off research.
I would like to give a special thanks to Alex Heimbach. Your friendship, especially
these last four years, has meant so much to me. You have been with me for many of my
most important milestones and done anything you can to support me, including moving
to LA to live with me for no reason other than I asked.
My final thanks go to my family. I am grateful to my parents, Sung and Jae Cha, who
have sacrificed and always put me first in every way. Your support made it possible for
me to pursue any opportunity, including my PhD. I also want to thank Chris Cha, my
brother, for encouraging me and always supporting my decisions. Connor, you are the
best husband anyone could ask for. I am so lucky to have a partner who always tries to
make me happy and will do anything to help me succeed. I dedicate this thesis to you.
Funding
This thesis contains work supported by a NASA Space Technology Research Fellowship
Program (NSTRF # NNX15AQ36H) and the National Science Foundation (Graduate Re-
search Fellowship and grant IIS-1528121).
Table of Contents

Acknowledgements
Funding
List of Figures
Abstract

1 Introduction
  1.1 Non-Humanoid Robots
  1.2 Communication Modalities
  1.3 Planning Communication
  1.4 Contributions
  1.5 Outline

2 Background and Related Work
  2.1 Communication Theory
  2.2 Perception of Nonverbal Signals
  2.3 Nonverbal Signal Design
  2.4 Algorithms for Intelligent Interaction
  2.5 Summary

3 Framework for Robot Communication
  3.1 Robot Communication Model
  3.2 Model Parameters
  3.3 Optimization Criteria
  3.4 Generating a Policy
  3.5 Summary

4 Design of Robot Light Signals
  4.1 Introduction
  4.2 Related Work
  4.3 Experimental Design [S1]
  4.4 Analysis
  4.5 Light Signal Design Decision Tree
  4.6 Discussion
  4.7 Summary

5 Application of Robot Light Signals
  5.1 Introduction
  5.2 Experimental Design [E1]
  5.3 Analysis
  5.4 Discussion
  5.5 Summary

6 Design of Robot Auditory Signals
  6.1 Introduction
  6.2 Related Work
  6.3 Sound Design
  6.4 Experimental Design [S2]
  6.5 Analysis
  6.6 Discussion
  6.7 Summary

7 Application of Robot Auditory Signals
  7.1 Introduction
  7.2 Model and Implementation
  7.3 Experimental Design [E2]
  7.4 Analysis
  7.5 Discussion
  7.6 Summary

8 Design of Robot Multimodal Signals
  8.1 Introduction
  8.2 Related Work
  8.3 Experimental Design [S3]
  8.4 Analysis
  8.5 Discussion
  8.6 Summary

9 Application of the Communication Framework
  9.1 Introduction
  9.2 Related Work
  9.3 Model and Implementation
  9.4 Experimental Design [E3]
  9.5 Analysis
  9.6 Discussion
  9.7 Summary

10 Supporting Nonverbal Signaling through Hardware Solutions
  10.1 Introduction
  10.2 Related Work
  10.3 Design Guidelines
  10.4 System Overview
  10.5 Discussion
  10.6 Summary

11 Summary and Conclusions
  11.1 Contributions
  11.2 Future Work
  11.3 Final Words

Bibliography
List of Figures

1.1 Non-humanoid robots, such as the Amazon warehouse robot, Waymo autonomous car, Savioke Relay, and NASA Astrobee (clockwise), must use nonverbal signals to interact with humans.
1.2 The contributions of this thesis seek to address the three challenges of nonverbal signaling for non-humanoid robots: robot design, signal design, and signal usage.
2.1 The communication model utilized in this work (adapted from Shannon-Weaver, 1998).
2.2 Examples of commercial robots that utilize integrated visual displays for interaction with humans.
2.3 Examples of fictional and commercial robots that use lights to provide state information.
3.1 The framework formulates signaling as a decision making problem. Models of the human, robot, and environment are included in the robot's communication model. At each step, the robot executes a communication action and receives a reward depending on its interaction with a human.
3.2 The Markov Decision Process (MDP) used to model the problem of nonverbal signaling to human interactors.
3.3 The signal category is defined by the desired response from a human observer to a robot's signal and the urgency of the human's action.
3.4 Interactions between the human and robot are treated as episodes that start when the robot has information necessary to communicate and end after the communication action and subsequent response (if any) occurs.
4.1 The Parrot AR drone with an LED array used in this study.
4.2 The resulting conditions from the 5 manipulated variables in this study.
4.3 The four light pattern variations used in this study.
4.4 Analysis (1) results: checks indicate a significant main effect for the manipulated variable (column) and dependent measure (row).
4.5 The average Likert ratings (including 95% confidence intervals) for the urgency, difficult to ignore, and error from Analysis (1).
4.6 A decision tree for light signal generation using the signal parameters investigated in the exploratory user study presented in Section 4.3.
5.1 The simulated Astrobee free-flying robot in the ISS.
5.2 The Fetch mobile robot base is designed to transport loads across warehouse and industrial environments.
5.3 The Ohmni telepresence robot is designed to enable mobile remote presence in a variety of human-oriented environments, such as the home.
5.4 An autonomous car model employed in the study.
5.5 Results of the signal type forced-choice measures.
5.6 Results of the signal urgency forced-choice measures.
5.7 Results of the 5-point Likert asking participants to rate the helpfulness of the robot's light signal.
6.1 The tonal and broadband signals generated from the Turtlebot's motor sounds and used in the study.
6.2 An overhead view of the experimental setup.
6.3 A participant prepares orders, locates the robot from behind the visual barrier, and drops off orders.
6.4 Objective collaboration metrics: inference accuracy, inference time, and cumulative percent of correct predictions by time.
6.5 Subjective metrics: 5-point Likert ratings for noticeability, localizability, and annoyance.
7.1 The auditory signal action space used in the first application of the signaling framework. The minimum and maximum sound intensity levels depended on the ambient environment.
7.2 The reward function is composed of a reward term (task outcome) and a cost term (interaction outcome).
7.3 The diagram on the left shows the data collection setup, and the photograph on the right shows the actual space.
7.4 The iRobot AVA mobile base used in the experimental data collection.
7.5 The auditory signal policies learned from the data collection for each world state.
8.1 A diagram of the experimental setup (left) and a participant interacting with the AVA mobile base.
8.2 Participant ratings of urgency and annoyance for auditory only signals (top) and visual only signals (bottom) in the pilot study.
8.3 Objective Collaboration Metrics: means (top) and ANOVA results (bottom) of reaction time, response time, and gaze duration by availability and urgency.
8.4 Subjective Collaboration Metrics: means (top) and ANOVA results (bottom) of participants' ratings of attention and urgency.
8.5 Subjective Collaboration Metrics: means (top) and ANOVA results (bottom) of participants' ratings of annoyance and interaction.
9.1 Interactions between the human and robot are treated as episodes as proposed in Chapter 3.
9.2 We employed a Monte Carlo control method switching off between exploiting the maximum value action and exploring other sub-optimal actions.
9.3 The grid world environment used for the simulated human-robot collaborative task.
9.4 The results of the pilot experiment. As the exploration rate e decreases (left), the average return (right) increases.
9.5 The results of running the Monte Carlo algorithm in the simulated human-robot collaborative task. As the exploration rate e decreases (left), the average return (right) increases until convergence after approximately 500 interactions.
9.6 Participants were also asked to rate the robot's thoughtfulness in communicating on a 5-point Likert scale.
10.1 Non-humanoid robot use cases, applications, and signaling requirements.
10.2 The ModLight System: an external power pack, an Arduino microcontroller for driving the system, and 3D printed modules containing an LED and light diffusing acrylic.
10.3 Several designs for the ModLight module.
10.4 ModLight Software: C++ library that generates light behaviors and the Visual Programming Interface.
Abstract
As robots increasingly perform tasks in a diverse set of real-world environments, they are
expected to not only operate in close proximity to humans but interact with them as well.
This has led to great interest in the communication challenges associated with the varying
degrees of coordination and collaboration required between humans and robots for these
tasks. Non-humanoid robots can benefit from the use of nonverbal signals as they often
lack the communication modalities that humans intrinsically rely on to obtain important
state information.
The goal of this thesis is to enable non-humanoid robots to intelligently utilize nonver-
bal signals to communicate information about their internal state. As interaction is a complex
process, we propose a computational framework that formalizes the robot’s communica-
tion behavior as a decision making problem under uncertainty. Building on prior work in
notification systems, this framework takes into account information about the human and
robot and attempts to balance their individual objectives to create more acceptable robot
behavior.
To inform the framework’s Markov Decision Process model, we explored the design
space of light, sound, and motion for nonverbal signaling during a human-robot collab-
oration task. We present three user studies that identify underlying signal design princi-
ples based on human perceptions. We applied the findings of these studies to interaction
scenarios in three different experiments. To increase the generalizability of this research,
we employed several types of non-humanoid robot platforms that vary in appearance and
capabilities.
Finally, we applied the communication framework to a simulated human-robot collab-
oration task. A policy for the robot’s nonverbal signaling behavior was generated using
model-free reinforcement learning. This experiment evaluated the impact of the robot’s
actions on participants’ perceptions of the robot as a teammate. Results showed that
the use of this framework enables the robot to not only improve its own task-oriented
outcomes but to act as a more thoughtful and considerate agent during interaction with
humans.
This research contributes to both the design and planning of nonverbal communica-
tion for non-humanoid robot platforms with both theoretically and empirically driven
methodologies. Although the number of non-humanoid robots deployed in the world
is growing, this field of research is still maturing. This work provides a foundation for
future human-robot interaction research in these areas while promoting generalizability
and standardization of robot behaviors across the diverse set of existing non-humanoid
robots.
1. Introduction
This chapter provides an overview of our approach to planning nonverbal communica-
tion and motivates the use of this framework for non-humanoid robots. In addition to
the formalization of the robot communication problem, we also outline three studies on
nonverbal signal design, three applications of their findings to human-robot interaction,
and an application of the proposed framework to a simulated human-robot collaboration
task. The chapter concludes with a description of the major contributions of this research
and an outline of the remainder of the thesis.
1.1 Non-Humanoid Robots
One of the primary goals of robotics is enabling robots to work alongside and with hu-
mans. This requires robots to be capable of not only navigating and manipulating in
human environments but coordinating and collaborating with humans as well (Khatib
et al., 1999). As robots are increasingly required to operate in concert with humans, they
must convey important information about their knowledge, condition, and actions. This
information enables humans to predict a robot’s future behavior and to plan their own
actions accordingly.
In recent years, much of human-robot interaction (HRI) research has employed hu-
manoid robots that possess highly anthropomorphic forms and features (Goodrich and
Schultz, 2007). However, many robots operating in the real world are non-humanoid.
Non-humanoid robots typically have simpler embodiments that are targeted towards spe-
cific tasks or domains (Coeckelbergh, 2011, Terada et al., 2007). As a result, these robots
(e.g., Figure 1.1) are utilized in a wide variety of settings and applications, including
healthcare, agriculture, space, industry, automotive, and service (Billingsley et al., 2008,
Bualat et al., 2015, Fong et al., 2013, Forlizzi and DiSalvo, 2006, Kittmann et al., 2015,
Urmson et al., 2008, Wurman et al., 2008).
Figure 1.1: Non-humanoid robots, such as the Amazon warehouse robot, Waymo autonomous car, Savioke
Relay, and NASA Astrobee (clockwise), must use nonverbal signals to interact with humans.
Since many commercial non-humanoid robots have historically been employed in in-
dustrial applications, past HRI research regarding these platforms largely focused on
issues relating to safety, operation, and control in the context of highly regulated envi-
ronments (Goodrich and Schultz, 2007, Thomaz et al., 2016). However, as non-humanoid
robots are deployed to more human-oriented environments, their behavior must evolve
to account for people’s needs and expectations (Lee et al., 2010, Paepcke and Takayama,
2010).
Since non-humanoid robots are often more machine-like in appearance, people are
likely to have different expectations of them than of humanoid robots (Goetz et al., 2003). Unlike
other machines or products, however, robots can also operate as autonomous, intelligent
agents that act in surprising and unpredictable ways. Therefore, it is vital that robots are
capable of accurately conveying their knowledge and capabilities in ways that humans
can easily interpret and respond to.
A key challenge for generating intuitive and expressive nonverbal signals is that most
non-humanoid robots are constrained in appearance, possessing communication modal-
ities that are significantly more limited than those employed by humans. This makes it
challenging to generate a wide range of unique and recognizable signals (Cha et al., 2017).
As a result, non-humanoid robots must also carefully consider how to optimally utilize
the limited number of signals they have available.
To address these communication challenges, this thesis focuses on non-humanoid
robots that have few anthropomorphic features and perform primarily functional (i.e.,
non-social) tasks. As this encompasses a wide range of platforms utilized in research and
commercial applications, we employ several types of non-humanoid robots to increase
the generalizability of this research.
1.2 Communication Modalities
The work presented in this thesis primarily employs light and nonverbal sound as chan-
nels of robot communication. These modalities are often utilized by machines and house-
hold products, enabling researchers to draw from these common user experiences when
designing signals for robots. Their low cost and high salience have also made these
modalities popular in HRI and human-computer interaction (HCI) research.
Structured approaches to generating communication signals using these modalities,
however, are understudied. This results in the creation of a unique set of signals for
each robot platform, an approach that requires significant time and effort and inhibits
the standardization of robot communication behaviors. General methods for planning
a robot’s communication behavior require moving beyond this individualized approach
and finding more systematic methods for generating nonverbal signals while maintaining
usability.
In this research, nonverbal signals are typically employed in the context of base mo-
tion, one of the only modalities that humans also possess. While communication through
motion can be highly expressive and natural, it can also interfere with a robot’s functional
task. Since the robots targeted in this research must communicate while simultaneously
completing tasks, it is important to employ signals in combination with functional, task-
related motion to mimic realistic robot scenarios outside of the laboratory.
To address the research challenges relating to nonverbal signal design, we present
three user studies (S1-S3). The goal of these studies is to inform our signaling framework
of how the robot should employ signals in the context of different world states. The
studies employ a mixture of auditory and visual cues, with motion acting as a source
of nonverbal leakage rather than a true signaling modality (Mutlu et al., 2009b). The
findings of these studies are a major contribution of this thesis and address one of the key
challenges of nonverbal signaling, signal design (Figure 1.2). To validate these findings
in more realistic settings, we also present three applications (E1-E3) of nonverbal robot
communication during interaction scenarios inspired by non-humanoid robots deployed
in the world.
1.3 Planning Communication
As the environments and operations of non-humanoid robots become increasingly com-
plex, there is a need for more intelligent, adaptive robot behavior. Robots will need to
model the world around them and reason about how to communicate in order to be ef-
fective and efficient interactors. This requires the development of models and algorithms
which can use information about the world to compute optimal communication policies.
Going beyond sensing and avoiding objects and/or humans in the robot’s workspace
is challenging. A primary motivation for this increase in capability is human safety.
Humans are dynamic and can quickly alter the state of the environment. Anticipating
their future actions prevents collisions, improves coordination, and is a vital component
of HRI (Jarrassé et al., 2008).
Modeling a human’s state also helps the robot to plan its own actions while taking into
account how the human will perceive the robot’s behavior. During this process, informa-
tion about perceptual conditions, such as obstructions in the field of view, poor lighting
conditions, ambient noise, and other sources of interference, can increase the effective-
ness of a robot’s communication (Martinson and Brock, 2007, Takayama and Harris, 2013,
Trafton et al., 2005).
A key aspect of our research is consideration of both the human’s and robot’s state
during communication planning. Humans often have an underlying reward function
(or goal) which differs from a robot’s (Dragan, 2017). Robots that consider only their
own goals are less likely to be accepted by humans. When planning the robot’s signal-
ing behaviors, our framework considers the human’s preferences for interaction in order
to balance the human’s and robot’s objectives.
The design of nonverbal signals can also be used to compensate for environmental
factors that impact signal perception (e.g., increasing the robot’s volume in a noisy envi-
ronment). Effective modulation of nonverbal signals requires an understanding of how
specific manipulations of a signal parameter (e.g., light color) affect humans’ percep-
tions. This knowledge enables generation of a larger vocabulary of nonverbal signals and
is critical for choosing communication actions that match the robot’s goals.
Despite the availability of information about co-located humans, current approaches
in HRI tend to have static, predetermined policies for communicating the robot’s state.
These approaches lack the robustness needed to enable robots to operate more au-
tonomously. They also assume that communication is always beneficial, which can nega-
tively impact humans who are already engaged in a task.
To address these issues, we present a framework that models robot communication
as a decision making problem. The robot employs information about the world to learn
policies for its communication behavior. The proposed framework and its application
are major contributions of this thesis and address one of the key challenges of nonverbal
signaling, signal usage (Figure 1.2).
1.4 Contributions
This thesis addresses three key research problems (Figure 1.2) relating to nonverbal com-
munication for non-humanoid robots and makes the following contributions:
Framework for Robot Communication: We introduce a formalism for planning robot
signaling behaviors that aims to balance a robot’s task-based priorities with a human’s
interaction preferences. The proposed model takes the form of a Markov Decision Pro-
cess in which the robot reasons about the world state when choosing a communication
signal. This formalism is designed to account for the natural uncertainty that occurs in
the world, especially when interacting with humans, and is well suited for signaling as
perception of nonverbal signals depends on a number of factors outside of the robot’s
control. Finally, we propose methods for defining both the robot’s and human’s success
and discuss key assumptions that enable us to design a tractable evaluation.
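To make this formalism concrete, the listing below sketches how such a signaling MDP might be encoded. The state features, action names, and reward values are illustrative assumptions rather than the parameterization developed in Chapter 3; the reward simply balances a task-outcome term against an interruption cost imposed on the human.

    # Minimal sketch of the signaling problem posed as an MDP (illustrative only).
    import random
    from dataclasses import dataclass
    from typing import Dict, List, Tuple

    State = Tuple[str, str]   # e.g., (human_availability, information_urgency)
    Action = str              # e.g., "no_signal", "soft_beep", "beep_and_light"

    @dataclass
    class SignalingMDP:
        states: List[State]
        actions: List[Action]
        # transition[state][action] -> list of (next_state, probability)
        transition: Dict[State, Dict[Action, List[Tuple[State, float]]]]

        def reward(self, state: State, action: Action, human_responded: bool) -> float:
            # Balance the robot's task outcome against the cost imposed on the human.
            task_gain = 1.0 if human_responded else 0.0
            interruption_cost = {"no_signal": 0.0, "soft_beep": 0.1, "beep_and_light": 0.4}
            return task_gain - interruption_cost[action]

        def step(self, state: State, action: Action) -> State:
            # Sample the next world state from the transition distribution.
            outcomes = self.transition[state][action]
            draw, cumulative = random.random(), 0.0
            for next_state, probability in outcomes:
                cumulative += probability
                if draw <= cumulative:
                    return next_state
            return outcomes[-1][0]

A communication policy then maps each world state to a signaling action; it can be computed with standard MDP solvers or, as in the later chapters, learned from sampled interactions.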
Figure 1.2: The contributions of this thesis seek to address the three challenges of nonverbal signaling for
non-humanoid robots: robot design, signal design, and signal usage.
Design of Nonverbal Signals: Planning robot communication requires a vocabulary of
nonverbal signals which are effective and efficient at conveying information, altering hu-
man mental models, and initiating human action. To enable the validation of our frame-
work for robot signaling, we investigated the design space of auditory and visual signals
for robot communication through three user studies (S1-S3). Unlike past approaches,
which are often tailored to the robot’s platform, these studies seek to uncover underlying
signal design principles which can aid roboticists in creating intuitive signals and sig-
naling standards across non-humanoid robot platforms. We also validated the results of
these studies in three applications (E1-E3) of HRI using a range of robot platforms and
interaction scenarios. An overview of the studies and experiments is provided in the table below.
Chapter                                               Light   Sound   Communication Framework
S1   4. Design of Robot Light Signals                   X
E1   5. Application of Robot Light Signals              X
S2   6. Design of Robot Auditory Signals                        X
E2   7. Application of Robot Auditory Signals                   X              X
S3   8. Design of Robot Multimodal Signals              X       X
E3   9. Application of the Communication Framework      X       X              X
Design of Robot Hardware for Signaling: While the majority of this research is con-
cerned with problems relating to the design and planning of robot communication, we
also explore the design of hardware solutions to support signaling for non-humanoid
robots. We propose ModLight, a modular research tool consisting of a set of low cost
light blocks that can be easily reconfigured to fit a myriad of robots and applications.
The design of the system enables it to support signals across different types of
non-humanoid platforms with a range of embodiments.
Applications of Communication Framework: As a first application, we applied a reduced
form of the communication framework and the results of our study in auditory signal
design (S2) to an experimental data collection in a physical human-robot collaborative
task (E2). Then, we applied the communication framework to a simulated interaction
scenario. The use of a simulated environment enables a larger and more diverse set of
interactions between the human and robot than is feasible in real-world studies. We learned
optimal signaling policies across participants using a model-free reinforcement learning
algorithm that averages sampled returns. The results of this work show that
incorporating human preferences enables the robot to be a better interactor and collabo-
rator.
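The listing below sketches the kind of learner this describes: epsilon-greedy, first-visit Monte Carlo control that averages sampled returns. The run_episode callback standing in for a single human-robot interaction is a placeholder assumption; the actual state, action, and reward definitions are those of Chapters 3 and 9.

    # Sketch of epsilon-greedy, first-visit Monte Carlo control (illustrative only).
    # run_episode(policy) is a placeholder for one interaction episode and is assumed
    # to return a list of (state, action, reward) tuples.
    import random
    from collections import defaultdict

    def mc_control(run_episode, actions, n_episodes=500, epsilon=0.1, gamma=1.0):
        Q = defaultdict(float)        # action-value estimates Q[(state, action)]
        n_returns = defaultdict(int)  # number of sampled returns per (state, action)

        def policy(state):
            if random.random() < epsilon:                     # explore
                return random.choice(actions)
            return max(actions, key=lambda a: Q[(state, a)])  # exploit current estimate

        for _ in range(n_episodes):
            episode = run_episode(policy)
            G, first_return = 0.0, {}
            # Walk the episode backwards, accumulating the discounted return; the
            # final write for each (state, action) pair is its first-visit return.
            for state, action, reward in reversed(episode):
                G = gamma * G + reward
                first_return[(state, action)] = G
            for pair, sampled_return in first_return.items():
                n_returns[pair] += 1
                Q[pair] += (sampled_return - Q[pair]) / n_returns[pair]  # running average
        return Q

In practice, the exploration rate can be decayed as interactions accumulate, consistent with the convergence behavior reported in Chapter 9.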
1.5 Outline
Chapter 2 provides background from related fields, particularly in the cognitive processes
that occur in humans during communication. The purpose of this chapter is to highlight
past research that serves as the foundation of nonverbal communication in HRI. Although
there are many competing methodologies, we primarily discuss concepts that this thesis
builds on.
Chapter 3 presents our first contribution, a framework for robot communication. This
chapter provides a theoretical overview of the communication framework, including its
Markov Decision Process model. A higher level discussion of how the model can be
applied to different HRI scenarios is also included. Finally, the optimization criteria and
methods for solving the model are proposed.
Chapter 4 presents our first user study on nonverbal signal design using light (S1).
This exploratory, video-based study investigates the design space of light signals on a
free-flying robot with the aim of identifying underlying human perceptions of common
light signal parameters.
Chapter 5 presents an application of the findings of our study of robot light signal
design. We present a video-based experiment that employs signals for notification,
warning, and error across different robot platforms and interaction scenarios (E1).
Chapter 6 presents our second user study on nonverbal signal design using sound
(S2). This study investigates the use of auditory icons for enabling localization of a small
mobile robot in the context of a human-robot collaboration task. We also compared two
types of auditory signal, tonal and broadband, with the aim of increasing localizability
while minimizing human annoyance.
Chapter 7 presents an application of the findings of our study on robot auditory signal
design in another human-robot collaborative task (E2). We also employ an earlier version
of the communication framework which reduces the MDP model to a one-state MDP.
We use offline bandit algorithms to learn the robot’s communication policies for
localization and describe the results of this experiment.
Chapter 8 presents our final user study on nonverbal signal design using sound and
light (S3). This study investigates how light and sound can be employed to create mul-
timodal nonverbal signals for requesting help during a human-robot collaboration task.
We employ these signals in communication scenarios with varying urgency to inform the
state space of our MDP model.
Chapter 9 discusses the application, implementation, and evaluation of our computa-
tional framework. We apply the communication framework to a simulated human-robot
collaboration task and employ a Monte Carlo control algorithm to learn a more optimal
communication policy online via averaged sampled returns.
Chapter 10 presents our contribution towards the design of robot hardware for sup-
porting nonverbal signaling on non-humanoid robots. We detail the design and creation
of ModLight, an open-source, modular light signaling platform for non-humanoid robots.
Software tools that enable researchers of varying levels of technical capability to program
the platform are also presented.
Chapter 11 summarizes the research performed and its contributions. We also discuss
the impact of this thesis and its findings on HRI and other related fields of work.
2. Background and Related Work
This chapter reviews relevant prior work in nonverbal communication for human-robot
interaction (HRI) and presents background on key concepts from human communication,
psychology, and cognitive science utilized in our research. The purpose of this chapter is
to provide a foundation for this thesis and its contributions to human-robot communica-
tion.
First, we provide general background on communication theory and how nonverbal
communication is perceived by humans. Then, we review prior work relating to the
design of nonverbal communication. This overview draws from several diverse fields
of research but focuses on the signaling modalities employed in this work. Finally, we
review methods for planning intelligent robot behaviors for interaction.
2.1 Communication Theory
Communication is the process of transmitting information from one agent to another.
The sender encodes information into another form, such as a signal, that is transmitted to
another agent, the receiver, across a channel or medium (Shannon and Weaver, 1998). The
channel is subject to noise or environmental disturbances (e.g., ambient sounds) which can
interfere with the receiver’s ability to decode the message and obtain the contained infor-
mation. Often the receiver reacts or responds to the message, providing feedback
to the sender (Figure 2.1).
In this work, the sender is a robot who converts information into a nonverbal signal
and transmits the signal using a communication modality to the human receiver. The
human must then attempt to decode the signal to obtain the robot’s encoded information.
Since these signals are often abstract, the human may misinterpret or decode the signal
incorrectly, especially if they are unfamiliar with the robot’s signaling conventions or
there are external disturbances (e.g., environmental noise, conflicting cues).
Figure 2.1: The communication model utilized in this work (adapted from Shannon-Weaver, 1998).
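The listing below makes the sender, channel, and receiver roles of Figure 2.1 concrete for robot signaling; the two-entry signal vocabulary and the single noise probability are toy assumptions used only to separate encoding, noise, and decoding.

    # Toy illustration of the sender-channel-receiver model applied to robot signaling.
    import random

    ENCODE = {"low_battery": "slow_red_blink", "needs_help": "fast_yellow_blink"}
    DECODE = {signal: state for state, signal in ENCODE.items()}

    def channel(signal, noise_prob=0.2):
        # With some probability, noise (glare, occlusion, ambient sound) masks the signal.
        return None if random.random() < noise_prob else signal

    def human_receiver(observed):
        # The receiver decodes the signal; missing or unfamiliar cues carry no meaning.
        return DECODE.get(observed, "unknown")

    decoded = human_receiver(channel(ENCODE["needs_help"]))
    print(decoded)  # "needs_help" if the signal survives the channel, else "unknown"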
Nonverbal communication is a powerful tool that has been studied both in the context of
human-human and human-robot interactions (Breazeal et al., 2005). In this work, we de-
fine nonverbal cues as anything the receiver perceives to provide information. This includes
the implicit, unconscious behaviors that humans typically derive meaning from (i.e., non-
verbal leakage) as well as intentional nonverbal signals generated by the sender (Breazeal
et al., 2005). Due to their goal-oriented nature and complexity of control, most robots do
not naturally emit the state-expressive cues that humans do (Mutlu et al., 2009b). Instead,
the majority of their communication must take the form of carefully planned nonverbal
signals.
Although speech is an important tool for interaction, much of human communica-
tion occurs through nonverbal channels, such as facial expression, body language, and
gaze (Knapp et al., 2013). Due to the prominence of nonverbal cues in human communi-
cation, humans are quite adept at consciously and subconsciously noticing, interpreting,
and reacting to them (Knapp et al., 2013). Therefore, a major goal of this research is
to facilitate the development and usage of intuitive nonverbal signals for robots.
2.2 Perception of Nonverbal Signals
Human-robot communication researchers must also consider factors that affect humans’
perception of nonverbal signals. Understanding humans’ mental perceptions of robot
behavior is a key challenge and topic of HRI research. In this thesis, we are
primarily concerned with how humans attribute mental states to other agents and use
this information to shape their own actions. We also discuss the effects of culture, media,
and past experience on the perception of communication, as this is a key challenge for
creating standard signaling conventions.
2.2.1 Attribution of Mental States
Several works in psychology, cognitive science, and communication support the notion
that humans naturally attempt to ascribe mental states, such as beliefs and intents, to
others (Frith and Frith, 2005, Scassellati, 2002). This cognitive ability is called theory of
mind and allows us to understand that other humans have knowledge, desires, and per-
spectives different from our own (Frith and Frith, 2005, Scassellati, 2002). Without this
metarepresentational ability, it can be difficult to understand other agents’ behavior, an
important component of effective communication and collaboration (Bauer et al., 2008,
Scassellati, 2002).
By attributing cognitive states to other agents, we are able to develop mental models
or abstract representations of the surrounding world, including other humans (Gentner
and Stevens, 2014, Johnson-Laird, 1983). These models enable humans to more accurately
estimate others’ knowledge and capabilities and to predict the effects of their actions on
the world (Buckner and Carroll, 2007, Kiesler, 2005). During communication, humans use
these models to establish common ground, a set of mutual knowledge, beliefs, and assump-
tions that enables efficient communication and minimizes misunderstandings (Kiesler,
2005, Stalnaker, 2002).
Since robots are complex and unfamiliar to most humans, one of the primary
challenges of HRI research centers on understanding humans’ mental model of the
robot (Kiesler, 2005, Kiesler and Goetz, 2002, Phillips et al., 2011). This has led to sig-
nificant work in exploring how factors, such as robot appearance, capability, and roles,
affect the underlying processes for mental model development (Beer et al., 2011, Kiesler
and Goetz, 2002, Phillips et al., 2011, Powers and Kiesler, 2006, Walters et al., 2008).
The results of these works often vary due to the wide range of appearances, capa-
bilities, and applications of robots, especially non-humanoid platforms (Phillips et al.,
2011, Powers and Kiesler, 2006, Walters et al., 2008). Some have suggested that in the
absence of real-life experience, we default to models of human behavior (Beer et al., 2011,
Kiesler, 2005). Others suggest a strong influence from culture and media portrayals of
robots (Bartneck, 2004, Bartneck et al., 2007, Haring et al., 2014, MacDorman et al., 2009).
These mixed results show that it is essential for researchers to better understand how
robot characteristics and actions affect human perceptions in order to appropriately de-
sign behaviors for interaction (Kiesler, 2005).
2.2.2 Reasoning About the World
Mental models also enable humans to reason about the world, including the robot’s in-
ternal state and future actions. This capability is essential for collaboration. There are
several processes which humans and robots can utilize to understand the other’s point
of view. One such process, perspective taking, involves an agent attempting to imagine an-
other’s point of view by using knowledge of the robot and environment to reason about
cognitive states. This process can also be utilized by the robot to incorporate human
preferences into signaling algorithms.
A more active process, sensemaking, occurs when humans are unfamiliar with a sit-
uation or receive discrepant cues from the world (Siino and Hinds, 2005). Unlike per-
spective taking, sensemaking does not focus on a particular agent but instead looks at
situations and events. In unfamiliar situations with robots, humans utilize cues from the
robot and environment to try to comprehend the robot’s behavior. This can be difficult
when the robot’s internal state is complex or there are inconsistencies between the robot’s
behaviors, embodiment, or both. Hence, the nonverbal signals robots utilize should be
consistent with the robot’s attributes (e.g., appearance) and other behaviors.
Robots must also engage in processes to reason about the world. Situation awareness
(SA) is a common framework for grounding the robot’s knowledge of the world and
consists of three levels: perception of environmental elements and events, comprehension of
their meaning, and projection of their status (Endsley, 1995). Levels of SA are also a useful
framework for describing the capabilities needed to perform functional robot tasks (Drury
et al., 2003, Steinfeld et al., 2006).
Higher levels of SA, including the ability to comprehend environmental elements and
events, also lend themselves to approaches in which the robot utilizes a model of the
world when determining its actions. As a major goal of this research is to enable robots to
interact intelligently, we propose a similar formalism for our communication framework.
2.2.3 Effects of Culture, Media, and Experience
Humans’ past experiences often play a large part in how they perceive a robot’s behavior.
Since most humans have limited experience interacting with robots deployed in the real
world, they tend to rely on knowledge gained through other sources, such as exposure to
robots in the media (Bartneck, 2004, Bartneck et al., 2007, Haring et al., 2014, MacDorman
et al., 2009). Fictional robots, such as Star Wars’ R2D2 and BB-8 and Disney’s WALL-E,
have long influenced humans’ ideas and expectations of robots in the real world.
Studies have also found that culture strongly affects expectations, as well as percep-
tions of robots (Bartneck, 2004). For instance, in the US, a common concern with robot
adoption is replacement of human jobs, while in Japan, concerns often center on the emo-
tional impacts of robots on society (Bartneck, 2004, Bartneck et al., 2007). Culture can also
affect the perception of communication behaviors as specific cues (e.g., thumbs up) have
different meanings across societies.
Another key factor that affects human behavior towards robots is the degree to which
we anthropomorphize them (Beer et al., 2011, Duffy, 2003, Fussell et al., 2008, Phillips
et al., 2011, Salem et al., 2014). Anthropomorphization is the process of ascribing human
characteristics to a nonhuman agent and has been of particular interest for HRI due to
the wide range of human-like features robots can possess (Duffy, 2003, Epley et al., 2007).
Past research has explored how appearance, vocal signals, and other behavioral cues affect
anthropomorphization as well as its effects on empathy, expectations, and other attitudes
towards robots (Eyssel et al., 2012, Fussell et al., 2008, Riek et al., 2009, Sundar et al., 2016).
Imbuing robots with human-like characteristics or behaviors, such as speech, also
creates certain expectations of the robot’s capability (Cha et al., 2015). However, when
the robot fails to meet these expectations, acceptance of the robot can be negatively im-
pacted (Cha et al., 2015, Goetz et al., 2003). Underestimation of the robot, however, can
lead to inefficiency and underutilization. Matching user expectations to the robot’s ac-
tual capabilities is challenging as there are many factors that affect human perceptions
of robots, including characteristics of the human (e.g., age, gender) (Oestreicher and Ek-
lundh, 2006).
Hence, a major goal of this work is to design and employ communication behav-
iors that create consistent and accurate expectations of robots. Since these concepts all
contribute to the way humans perceive and behave towards robots, we consider each of
these factors when designing signals and algorithms for nonverbal communication. More
broadly, these processes also impact the acceptance and adoption of robots, especially as
the presence of robots in everyday life increases, making them a vital consideration for
the field of HRI (Beer et al., 2011).
2.3 Nonverbal Signal Design
In this section, we review past work on nonverbal signal design in HRI, human-computer
interaction (HCI), and industrial design. Since many non-humanoid robots resemble ma-
chines, appliances, or computers, we draw inspiration from these fields along with human
communication for designing and employing nonverbal signals. We divide literature by
signaling modality, discuss existing signal usage, and consider how to employ similar
mechanisms for designing signals using our target communication modalities.
2.3.1 Implicit Motion
A more recent area of HRI research explores algorithms for generating implicit motion
cues for robots (Bodden et al., 2016, Dragan, 2015). Motion can be a highly expressive
form of communication, and studies have shown that even babies attribute complex prop-
erties such as intentionality to motion (Buren et al., 2016). In this section, we focus on
implicit motion cues contained in the robot’s goal-directed motion, as explicit motion
cues often take the form of gaze or gesture.
Since traditional path planning algorithms primarily focus on quickly generating a
collision-free trajectory from the start to the goal, the resulting motion can be unpre-
dictable, complicated, and unnatural (Dragan, 2015, Dragan et al., 2015). Several re-
cent works in HRI have explored methods for making robot motion more human-like,
predictable, intent-expressive, or legible for use in human-oriented applications (Dragan
et al., 2013, Kruse et al., 2012, Szafir et al., 2014). Past studies have found that this more
expressive motion promotes safety, enhances trust, and enables coordination, making it a
valuable tool for HRI (Dragan et al., 2015, Lichtenthäler and Kirsch, 2016).
A common approach is to optimize the robot’s trajectory with regard to desired mo-
tion properties while preserving the start and end positions and/or configurations. The
quality of the motion is highly dependent on the optimization criteria (i.e., cost or reward
function), which can be generated in several ways. One method is to use human move-
ment data to extract a cost function directly using a technique such as inverse optimal
control (Mombaur et al., 2010). This also enables the robot to incorporate different levels
of social convention in its behavior (Kruse et al., 2012).
However, human-inspired motion does not always contain the desired properties. For
instance, human arm trajectories for goal-directed motion, such as grasping, often prior-
itize efficiency over intent expression. In these cases, it may be better to directly create
motion that emphasizes the target characteristic rather than attempting to replicate hu-
man motion behavior. In the previous example, legible or intent-expressive motion can
be generated by optimizing the probability of a motion-goal based on its trajectory from
an observer’s view (Dragan and Srinivasa, 2013, Dragan et al., 2013).
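As a simplified sketch of this idea (not the exact formulation from the cited work), the snippet below scores a waypoint by the probability an observer who expects roughly efficient motion would assign to the true goal; the straight-line cost model and goal positions are illustrative assumptions.

    # Simplified legibility scoring: a trajectory prefix is legible if an observer who
    # assumes efficient motion assigns high probability to the true goal.
    import math

    def cost(a, b):
        return math.dist(a, b)  # Euclidean distance as a stand-in for trajectory cost

    def goal_probability(start, current, goals, true_goal):
        # A goal is a likely explanation if passing through `current` adds little
        # detour relative to heading straight for that goal.
        scores = {g: math.exp(-(cost(start, current) + cost(current, g) - cost(start, g)))
                  for g in goals}
        return scores[true_goal] / sum(scores.values())

    start, goals = (0.0, 0.0), [(10.0, 0.0), (10.0, 4.0)]
    ambiguous = (5.0, 0.0)   # efficient for both goals, so it reveals little
    legible = (5.0, -1.5)    # exaggerates away from the competing goal
    for waypoint in (ambiguous, legible):
        print(waypoint, round(goal_probability(start, waypoint, goals, goals[0]), 3))

Waypoints that raise this probability early in the trajectory trade some efficiency for intent expression, the same trade-off noted above for goal-directed human motion.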
Other methods rely on a greater amount of human input, such as choreographing the
robot’s motions or directly controlling the robot’s trajectories (Knight and Simmons, 2014).
Although there are numerous benefits to having a human-in-the-loop, many applications
do not have an available human. Hence, recent work has emphasized the need for more
generalized methods that achieve the desired interaction outcomes while moving away
from the need for human input.
There are also additional challenges associated with using motion for communication
on non-humanoid robots. Many of the non-humanoid robots employed in this thesis pos-
sess only a few degrees-of-freedom and therefore lack the capability to generate highly
expressive motion. Moreover, altering the robot’s goal-directed motion may negatively
affect its task performance (e.g., increased completion time). Hence, utilizing these tech-
niques to mimic motion through lights or a visual display may offer more flexibility and
alleviate some of these concerns. As there has been little work in this area, more research
is necessary to understand how other communication modalities can achieve similar ef-
fects.
Figure 2.2: Examples of commercial robots that utilize integrated visual displays for interaction with humans.
2.3.2 Visual Displays
Visual displays have become increasingly popular for many robotic applications with the
rise of touchscreen platforms, such as smartphones and tablets. Visual displays are able
to utilize a wide range of signals with varying representational fidelities (Pousman and
Stasko, 2006). This can be extremely beneficial when expressing large amounts of infor-
mation, as these displays can utilize text and high resolution images. Signals generated on
simpler modalities, such as point light sources, can also be reproduced on visual displays,
making them extremely flexible.
However, these replicated signals are often less salient and require the human to be
in close proximity. More complex signals also require significant time and effort to inter-
pret, lacking the efficiency of human nonverbal behaviors. Displays are also costly, have
significant power and weight requirements, and are more fragile, making them poorly
suited to certain non-humanoid robot use cases.
Visual displays range in the amount of attention and interaction they are intended to
have with humans. Displays such as personal computer screens are designed such that
humans will directly interact with and focus the majority of their attention on the display.
Some devices utilize their display at different attention levels; many smartphones have
ambient notification systems which provide alerts when the user is not directly interacting
with the system.
Robots utilizing visual displays, such as the Savioke Relay or LoweBot, can enable
humans to interact and provide commands to the robot via the display, reducing the
chance of error from using speech or gesture. Since many non-humanoid robots are
designed to perform functional tasks, the inclusion of a visual display for providing and
receiving task-relevant information can be advantageous. However, such displays are
not as well suited for providing information to bystanders or users while the robot is in
operation or mobile.
Ambient displays sit on the periphery of a human’s attention and do not involve di-
rect interaction between the human and system. Instead, they aim to present information
without distracting the person from their primary task. Since a significant advantage of
nonverbal communication is the efficiency with which it conveys information, prior work
in ambient displays can offer many relevant design insights for robot signal generation.
By utilizing images, color, shape, and other characteristics, ambient displays have been
able to convey a wide range of information, including weather forecasts, physical activ-
ity, and proximity (Fortmann et al., 2013, Hegarty et al., 2010, Ishii and Ullmer, 1997,
Vogel and Balakrishnan, 2004). Research on visual displays has also explored the effects
of particular signal characteristics on the interaction outcomes of human users and the
system.
Visual displays can be combined with other communication modalities to provide
greater detail or flexibility in certain scenarios. For instance, if a complicated error occurs,
a robot can utilize sound or motion to draw humans’ attention to its display. The display
can then provide information that is too detailed or difficult to express using more limited
nonverbal modalities. This approach enables the robot to utilize a more diverse set of
communication behaviors, while still providing higher levels of detail when necessary.
2.3.3 Lights
Lights are a promising visual mechanism that have been utilized on a wide range of me-
chanical devices, such as appliances, computers, cars, airplanes, and mobile phones (Har-
rison et al., 2012). On these devices, lights are primarily used as simple indicators. Mobile
phones indicate new notifications through the use of a single LED, while aircraft control
boards signify the state of different controls and equipment through color and whether
the LED is turned on.
Figure 2.3: Examples of fictional and commercial robots that use lights to provide state information.
Although light is simpler and carries less information content than other visual in-
dicators, such as text, it is easily discernible and can vary in salience (Hansson and
Ljungstrand, 2000, Matviienko et al., 2016). These properties enable light to be used
as a signaling mechanism across different notification and criticality levels. Several lights
can also be combined to increase information capacity and create different configurations
to fit a range of robot forms and applications.
Prior work in visual perception and psychophysics has shown that dynamic visual
cues can convey complex properties even in simple animations or light motions (Dittrich
and Lea, 1994). Moreover, humans attribute not only animacy, but intent to these simple
signals. Other work in lighting displays also shows that light can evoke complex social
responses, including emotion (Dittrich et al., 1996). Light colors already have strong
connotations and are utilized in everyday applications, such as traffic control.
Since non-humanoid robots are often more machine-like in appearance, LEDs are a
natural choice of signaling mechanism. Current non-humanoid robots, such as the Sphero
or iRobot Ava (Figure 2.3), often utilize LEDs of varying colors to indicate when the robot
is turned on or in different states (e.g., error). Autonomous vehicles are already equipped
with lights (i.e., brake lights, head lights, tail lights), which act as indicators of the robot’s
state.
Light signal design in the context of non-humanoid robots also requires consideration
of their diversity in form, animacy, and application. Roboticists have explored utilizing
light to convey motion intent, task state, and affect (Baraka et al., 2015, Rea et al., 2012,
Szafir et al., 2015). Many of these works draw design inspiration from common user ex-
periences with light signals (e.g., lighthouse, jet engine flames) (Baraka and Veloso, 2017,
Szafir et al., 2015). Other works have utilized light to reinforce another communication
mechanism, such as speech and facial expression (Funakoshi et al., 2008, Kobayashi et al.,
2011).
Another source to draw inspiration from is fictional robots. Some common examples
of robots in popular media that utilize light signals include Eve from Disney’s WALL-
E, Kitt from the television show Knight Rider, the Zeriods from Terrahawks, and the
mechanical Cylons from Battlestar Galactica. Some of these robots utilize continuous
variations in lights to give the appearance of animacy or life; the Cylons have a red light
that pulses in a cyclic pattern to mimic movement from one end of the robot’s head to the
other while the Zeriods have “eyes” with 3x3 grids of LEDs that turn on and off.
Despite the large number of commercial and research robots that employ LED lights,
few works have actually researched structured methods for generating light signals. The
approach of designing signals for individual products limits generalizability and lowers
usability of the platform. This problem can be exacerbated for non-humanoid robots as
their usage and embodiment vary significantly compared to other products (e.g., smart-
phones). The size of the design space of light also limits testing to a small set of signals
in constrained scenarios.
These issues indicate the need for greater research into general design principles that
can be applied across robot platforms and scenarios to enable generation of light signals
with little to no human input. Towards this goal, this thesis presents an exploratory user
study that investigates underlying human biases towards certain light signal parameter
manipulations. In addition, this study directly informs the communication framework,
as it enables us to construct a set of light signal policies with varying properties (e.g.,
urgency).
2.3.4 Sound
There are many ways to leverage sound as a mechanism for nonverbal communication.
Many robots generate noise that expresses information about their internal state while
operating (e.g. motor sounds when moving). Robots can also use explicit auditory cues,
ranging from simplistic nonverbal utterances (e.g., beep or chirp) to speech (Cha and
Matari´ c, 2016, Fischer et al., 2014, Martelaro et al., 2016, Read and Belpaeme, 2014b).
Although speech is a natural mechanism for human communication, it can be inefficient
for information easily conveyed by concise nonverbal signals. The usage of speech may
also raise expectations of the robot’s capabilities (Cha et al., 2015), and hearing a robot
continuously vocalize its state information may become tiresome or disruptive to nearby
humans (Kiesler, 2005).
Humans typically generate two types of auditory cues, differentiated by their source
and method of production. Internal cues, such as snoring, sighs, or breathing, are gener-
ated entirely by the human body. While certain sounds are involuntary and correspond
to a biological function in the body, others are a more conscious form of communica-
tion (e.g., groan). External auditory cues are produced by the physical interaction of a
human with elements in the environment, such as footsteps on the floor. These noises often provide functional, task-oriented information, as they result from goal-driven physical actions. Internally produced sounds, on the other hand, tend to convey information about the human's internal state, such as their mood.
A subset of these internal sounds is referred to as nonverbal utterances, or sounds produced by the vocal tract that do not form words or language. Some examples include
moans, sighs, grunts, and laughs. These sounds occur naturally in human communica-
tion and often rely on variations in human vocal dynamics. For functional non-humanoid
robots, especially those with more mechanical appearances, these cues may appear incon-
sistent with their embodiment and behavior.
Instead, non-humanoid robots can utilize non-linguistic utterances (NLUs), or synthetic
versions of these sounds, such as beeps, whistles, or chirps (Read and Belpaeme, 2014b).
These types of sounds are often utilized by fictional non-humanoid robots, such as Star
Wars’ R2D2 or WALL-E’s Mo, instead of speech. Typically, NLUs are individually gener-
ated by a human designer, making it difficult to create an extensive vocabulary of signals
to fit a larger range of scenarios.
Recent work has attempted to overcome this challenge by exploring how specific
acoustic properties of NLUs affect humans' attribution of meaning to these cues (Read
and Belpaeme, 2012, 2014a). Preliminary results seem to indicate that interpretations
tend to be inconsistent and coarse in the absence of situational context (Read and Bel-
paeme, 2012, 2014a). This is unsurprising as these auditory cues are abstract and mostly
utilized in visual media, which provides context and reinforcement through other char-
acters’ reactions, music, and visual effects.
Many deployed non-humanoid robots instead utilize auditory signals similar to those
employed by machinery, appliances, and computing devices. These sounds often take
the form of alerts or alarms and usually signal predefined events. This approach works
well for robots that operate in a limited number of scenarios and only require a few
auditory cues. However, for more complex use cases, the number of abstract sounds
needed becomes greater than humans are capable of easily learning and distinguishing.
Additionally, many robots are envisioned to interact with naive humans, who require intuitive signals in order to react and respond appropriately.
An alternative approach is to create signals that mimic natural sounds produced by
the robot’s operations (i.e., auditory icons). Many sounds produced by humans, such as
breathing or walking, are easily mapped to information about their internal state and are hence intuitive to listeners. Depending on the specific platform and environment, the
robot may already produce some of these cues. Since these unintentional cues can also
be inconsistent, even from the same robot, the regular use of auditory icons for signaling
can produce a more reliable experience.
Consistency in the use of auditory signals enables other cognitive processes such as
auditory localization. These signals can also be modulated to create a larger set of signals
to match the world state (i.e., state of the robot, human, and environment). For example,
the volume or intensity of a cue can be altered depending on the robot’s proximity to
humans. This lowers the amount of unnecessary noise, minimizes disruption, and creates
more intelligent, adaptive behaviors for the robot.
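As a concrete illustration of this kind of proximity-based modulation, the sketch below scales an auditory cue's playback volume with the estimated distance to the nearest human. It is a minimal Python sketch; the attenuation curve and the specific constants are illustrative assumptions rather than values prescribed by this thesis.

def modulated_volume(distance_m, base_volume=0.8, min_volume=0.1, ref_distance_m=1.0):
    # Attenuate the cue's volume with distance to the nearest human, using a
    # simple inverse-distance rule; all constants are illustrative assumptions.
    if distance_m <= ref_distance_m:
        return base_volume
    return max(min_volume, base_volume * (ref_distance_m / distance_m))

# Example: the same cue is played more quietly when the human is farther away.
for d in (0.5, 1.0, 2.0, 4.0, 8.0):
    print(f"{d:4.1f} m -> volume {modulated_volume(d):.2f}")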
Other areas of work that provide insight into auditory signal design include warning
design, human factors, and psychophysics. Combining these signals with other cues such
as light has shown positive results in overcoming the abstract nature of sound to create
more understandable and readable signals (Cha and Matari´ c, 2016).
As prior work in auditory cues for robots has primarily targeted speech, more research
is needed in generating nonverbal auditory signals that support interaction for a larger
range of platforms and scenarios. Towards this goal, this thesis presents a user study
exploring the use of auditory icons and different sound variations for signaling during a
human-robot collaborative task. Results of this study are directly utilized to inform the
communication framework and its first application.
2.4 Algorithms for Intelligent Interaction
In this section, we review literature on computational methods for planning intelligent
behavior during interaction with humans. Since the field of HRI is still young, we also
draw from HCI and artificial intelligence (AI) research to inform our communication
framework. Past work in nonverbal communication for robots has primarily focused on
challenges relating to the design of communication signals.
However, planning a robot’s high level actions to intelligently employ these signals is
another important area of HRI research. Although the body of literature exploring this
research problem is still limited, the number of relevant works will continue to grow in
the coming years, especially as technology advances into human environments.
2.4.1 Modeling Human State
An important aspect of enabling robots to act intelligently is inferring a human interac-
tor’s state. Without this knowledge, the robot may cause annoyance, inefficiency, or other
negative outcomes for humans. Much of the prior work in this area has focused on mod-
eling a human’s attentional focus in order to establish joint attention between the human
and robot, a necessary component of human-robot collaboration.
One approach for doing so relies on the robot imagining the perspective of a human
collaborator. For instance, Breazeal et al. propose an embodied cognition architecture inspired by human theory of mind (ToM) models to simulate the goals of a human collaborator (Breazeal
et al., 2009). Alternatively, a simpler approach is to learn models of the world, such as
activity models or object affordances, and to only simulate the relevant portion of the
human’s mental state (Kelley et al., 2008, 2012).
Other important features of human state which may affect signaling algorithms in-
clude the human’s availability and current activity. If the human is busy, then the robot
should adjust its signal and timing accordingly. Activity recognition models typically
rely heavily on computer vision and attempt to infer simple human actions or higher
level activities. An overview of these methods can be found in (Aggarwal and Ryoo,
2011).
Information obtained from activity recognition models can be integrated with other
methods to reason about information other than the human’s physical actions. For in-
stance, humans can have different styles while performing certain tasks, such as driving
or cooking, which can help robots to predict future actions. Projection of a human’s future
actions can aid robots in deciding when a signal should occur and in predicting the human's
response.
An important component of a human’s state is their mental model of the robot. Under-
standing how humans perceive the robot enables it to communicate more effectively, an
essential process called grounding (Clark and Brennan, 1991). Since a significant amount
of observation and/or interaction is often necessary to accurately infer a human’s mental
state, recent research has looked into more active methods for speeding up this pro-
cess (Sadigh et al., 2016).
Once the human’s mental model of the robot is known, the robot can take actions to
influence this model. For instance, if the human has no knowledge of the robot’s capabil-
ities and limitations in relation to a certain type of task, the robot can provide knowledge
directly by communication or indirectly by demonstrating the task. Identifying the best
method for altering the human’s mental model can help form the robot’s communication
needs.
2.4.2 Mediating Communication
Past HRI research on nonverbal communication often focused on learning how commu-
nication behaviors affect specific interaction outcomes. Many of these works employed
user studies to learn the effects of specific nonverbal cues, such as gesture and gaze, on
objective measures, such as information recall, engagement, and persuasiveness, and on
subjective measures, such as perceptions of the robot (Admoni, 2016, Andrist et al., 2012,
Huang and Mutlu, 2013, Mutlu et al., 2009a, Sauppé and Mutlu, 2014). While these stud-
ies help inform robot signal design, researchers may find it challenging to directly employ
their findings in computational methods for planning robot communication.
These studies are conducted in highly controlled conditions that enable researchers to
isolate the effects of their manipulations, but they are unrepresentative of the dynamic environments robots will be deployed in. For robots to act intelligently, they must take into account information about the world and reason under uncertainty, especially in human interactors' actions. Methods that enable robots to handle non-deterministic scenarios have been utilized for planning robot motion and actions, inferring human internal state, and collaborating with human users. These approaches often utilize statistical
models which assume stochasticity and support the optimization of certain criteria, such
as task efficiency.
Another field that has heavily explored methods for regulating the flow of a sys-
tem’s communication is notification systems in HCI. One approach has been to create
systems that are able to present information without significant attention from the user
(i.e., ambient systems) (McCrickard et al., 2003b). Other works have focused on analyzing
and evaluating existing systems and devices in order to model critical parameters and
techniques for user notification. A popular framework for specifying the criticality of a
communication is “notification level” (Pousman and Stasko, 2006). These levels corre-
spond to the degree of attention required by the user and hence, can be utilized to help
determine the mode of communication.
A more computational approach, attention-sensitive alerting, aims to take into account
information about the user’s state when deciding how and when to present information.
The system infers their attentional focus and attempts to present information to the hu-
man in a way that provides the greatest utility (Horvitz et al., 1999). This requires the
system to consider both the cost of transmitting an alert (e.g., drawing attention away
from current task) as well as the utility of the information. This approach also enables
the system to defer communication to a point in time that is better for the user.
Prior works in this area typically assume simplistic cost functions or utilize methods
for estimating the cost that do not extend to more complex scenarios. Hence, a large focus
of our research is understanding how to formulate these functions in order to capture
the complex dynamics underlying communication and interaction with humans. Thus,
this work also draws inspiration from recent research that explores the design of cost
or reward functions for more intelligent robot behavior (Dragan, 2017, Hadfield-Menell et al., 2017, Keizer et al., 2013, Nikolaidis and Shah, 2013, Rosenthal and Veloso, 2011).
Our work in this thesis seeks to combine these approaches to form a computational
framework for communication that models components of the world state (e.g., humans)
and optimizes for both the robot’s and human’s objectives. We propose treating com-
munication as a decision-making problem that assumes uncertainty in both the human interactors' state and their potential responses to a robot's signal. This approach aims
to optimize both traditional task-oriented criteria and those identified through past user
studies in HRI, such as the human’s receptiveness to interaction.
2.5 Summary
In this chapter, we reviewed background and relevant prior work relating to nonverbal
signal design and algorithms for planning robot behavior. We discussed fundamental
concepts from communication, psychology, and cognitive science that form the foun-
dation of this thesis. Next, we reviewed literature relating to nonverbal signal design
and discussed how their findings can be applied to generating light and sound signals
for non-humanoid robots. Finally, we discussed methods for planning intelligent robot
behavior and how such techniques can be incorporated into our framework for robot
communication. In the next chapter, we propose our computational framework for robot
communication.
3. Framework for Robot Communication
This chapter proposes a computational framework for planning a robot’s communication
behavior during human-robot interaction. A key challenge is making the robot’s behavior
intelligent and considerate of the human interactor. This goal requires the robot to go
beyond static or reflexive responses and utilize information it gathers about the world
when planning its actions. This process is especially critical for nonverbal signaling as
signal perception is affected by environmental conditions.
The goal of this framework is to formalize the problem of human-robot communi-
cation and enable systematic approaches to planning a robot’s communication policies.
The framework is flexible to different robot embodiments, applications, and communica-
tion scenarios and can be adapted to handle different levels of available information and
uncertainty.
We formulate human-robot communication as a decision-making problem. This for-
malism results in the foundation of the framework, a communication model that takes on
the form of a Markov Decision Process (MDP). The underlying assumption of this model
is the Markov property: the current state of the model summarizes all essential information (i.e., the process is memoryless).
We also discuss optimization criteria for the model which define a robot’s desired
behavior. As designing reward functions for complex agents is challenging, our proposed
reward function is another component of the communication framework. Finally, we
discuss methods for solving the communication model to learn optimal signaling policies
for a robot.
Figure 3.1: The framework formulates signaling as a decision-making problem. Models of the human, robot, and environment are included in the robot's communication model. At each step, the robot (agent) executes a communication action in the world and receives a reward and an updated state depending on its interaction with a human.
3.1 Robot Communication Model
The primary component of the proposed communication framework is the formulation of the human-robot communication problem as a Markov Decision Process (MDP). The MDP can be described by a tuple {S, A, T, R}, where:
S is a finite set of world states modeling different configurations of the human, robot, and environment. Since future states only depend on the current state (and not past states), information about previous interactions is included in the model's state space.
A is a finite set of actions that the robot can take. In this model, the action space consists of communication actions that are available to the robot. In particular, we focus on nonverbal signaling behaviors, such as light and auditory signals, that appearance-constrained, non-humanoid robots can employ.
T : S × A → P(S) is the state-transition function that gives a probability distribution over world states for each state and action. It is typically represented as P_a(s, s'), the probability that a communication action a in state s will lead to a new state s'. The transition function models the variability in human response to the nonverbal signal.
R : S × A → ℝ is the reward function. It is represented by R(s, a), the expected immediate reward for taking a communication action a in state s. In this work, the reward includes both task and interaction outcomes to balance the robot's and human's preferences.
The robot’s policy p is the assignment of a communication or signaling action p(s) to
every state s. The optimal policy p
(s) is the mapping of communication actions that
maximizes the expected reward. A discount factor g can also be used to balance im-
mediate and long term rewards. To obtain an optimal policy from this model, we must
solve the MDP . If the dynamics of the system are known, this problem can be solved with
typical planning methods such as dynamic programming. Therefore, we also discuss
approaches to solving the MDP in a following section.
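To make this formalism concrete, the following Python sketch enumerates a small, hypothetical instance of the model. The particular state features, feature values, and action names are placeholders chosen for illustration; they are not the exact sets used later in this thesis.

import random
from itertools import product

# Hypothetical, coarsely discretized state features for the robot, human, and environment.
ROBOT_STATES = ["notification", "warning", "error"]
HUMAN_AVAILABILITY = ["available", "busy"]
VISIBILITY = ["visible", "occluded"]

# World states S: every combination of the features above.
STATES = list(product(ROBOT_STATES, HUMAN_AVAILABILITY, VISIBILITY))

# Communication actions A: nonverbal signals plus the option to stay silent.
ACTIONS = ["light_signal", "sound_signal", "light_and_sound", "no_signal"]

def random_policy(states, actions):
    # A (deterministic) policy maps every world state to one communication action.
    return {s: random.choice(actions) for s in states}

policy = random_policy(STATES, ACTIONS)
print(len(STATES), "states,", len(ACTIONS), "actions")
print("pi(error, busy, visible) =", policy[("error", "busy", "visible")])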
3.2 Model Parameters
The communication model requires information about the world state in order to compute a signaling policy. This information can be directly encoded into the state space of the MDP. However, this approach can present several issues, as the information may be too extensive or complex to employ directly. Moreover, a larger state space can make solving for the optimal policy infeasible due to the time and overhead involved with having a robot and human interact.
We propose a more concise representation of the state space to keep the model tractable (Figure 3.1). We also discuss the other parameters of the MDP and how they can be altered to fit other HRI scenarios.
3.2.1 States
A primary goal of this communication model is to employ the world state, including
information regarding the robot, human, and environment, when planning the robot’s
communication actions.
Figure 3.2: The Markov Decision Process (MDP) used to model the problem of nonverbal signaling to human interactors. The state S = {s_R, s_E, s_H} captures information about the robot, environment, and human; the action space A = {a_L, a_S} consists of light and sound signals; and the reward R = {r_R, r_H} combines a robot term and a human term.
In this work, we assume a dyadic interaction with one robot and
one human interactor. However, the framework also supports multiparty interactions
with multiple human interactors which can enable the robot to specifically target certain
humans for communication, as shown in Rosenthal and Veloso (2011).
3.2.1.1 Robot State Variables
The goal of a robot’s communication is to convey information about its operations. Conse-
quently, determining the best signaling action requires consideration of the robot’s current
state and interaction needs. There are several approaches for modeling this information,
with varying levels of abstraction.
Potential variables relating to communication (Cha et al., 2018) can be imprecise,
with their values subject to the robot’s designer (e.g., urgency, risk). One approach is
to enumerate a finite set of informational states to be signaled by the robot when their
corresponding scenarios occur (e.g., low battery, turning on) (Baraka and Veloso, 2017,
Szafir et al., 2017). However, this method lacks flexibility as the signaling states are
usually quite specific and must be determined in advance for each platform.
Researchers have also found that nonverbal, non-anthropomorphic signals are not
particularly iconic. Humans may require training or prior exposure to correctly decode
them (Read and Belpaeme, 2014a). The abstractness of machine-oriented signaling modal-
ities suggests that higher level encodings are better suited for non-humanoid signaling.
Throughout our work, we explored various methods for representing this information.
Certain approaches with lower levels of abstraction were better suited to the controlled
settings of some of the applications or studies found in this thesis. For this model, how-
ever, we propose several broader approaches for describing a robot’s state.
The first approach employs broader categories of robot state (e.g., error). We
suggested one potential grouping of commonly communicated state features in Sec-
tion 10.3.2. Loose categorizations are unlikely to be exhaustive of all potential robot
states and may not be well suited for applications that require more precise information
to be transmitted. However, such a categorization is still flexible enough to fit many of the communication needs that arise for several types of robots (e.g., motion intent).
Alternatively, we suggest employing features which are commonly used to define
communication signals, such as notifications and alerts for machines, computers, and
other products (McCrickard and Chewar, 2003, Pousman and Stasko, 2006). Such signal
characteristics are proposed in Cha et al. (2017) and include the desired effect of a signal
on a human interactor (e.g., Figure 3.3). Using these characteristics requires a method
that takes a larger space of features about the robot and reduces it to just a few variables
describing the robot’s communication needs.
In this thesis, we typically employ three signaling categories to describe the robot’s
signaling state: (1) notification, (2) warning, and (3) error. These categories are commonly
used on household devices, software applications, and other machinery. We define the
progression of these signaling states with regard to the robot's need for interaction from the human, similar to the categories in Figure 3.3. The main difference between these two categorizations is that the warning state allows for more uncertainty, since some situations may resolve without the need for human intervention.
Figure 3.3: The signal category is defined by the desired response from a human observer to a robot's signal (informative: no action needed; beneficial: action benefits the robot) and by the urgency of the human's action (necessary: requires action in the future; urgent: requires action immediately).
These categories also indicate there are escalating levels of a robot’s need to signal to
co-located humans. This is a natural property of communication, as it is used to convey a
wide range of information. Our framework takes this information into account as it can
be a useful variable when deciding if and when a communication action is necessary.
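A minimal encoding of these escalating signaling states, in line with the desired-response categories of Figure 3.3, might look like the following Python sketch; the ordering and the action-required rule are illustrative assumptions rather than fixed definitions.

from enum import IntEnum

class SignalState(IntEnum):
    # Escalating need for interaction with a co-located human.
    NOTIFICATION = 1  # informative: no action needed
    WARNING = 2       # action may benefit the robot, but the situation may resolve itself
    ERROR = 3         # action is required for the robot to proceed

def requires_human_action(state: SignalState) -> bool:
    # Illustrative rule: only warnings and errors ask anything of the human.
    return state >= SignalState.WARNING

print(requires_human_action(SignalState.NOTIFICATION))  # False
print(requires_human_action(SignalState.ERROR))         # True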
3.2.1.2 Human
The goal of the robot’s communication is to affect the human’s state in some way. At
the simplest level, the robot provides information to the human. However, in scenarios
where the robot can benefit from human intervention, its communication is designed to
elicit specific responses. Consideration of the human’s current state is essential to the
success of these interactions.
For instance, the human may be occupied (e.g., sleeping, holding delicate items) in
a way that does not allow them to provide the desired response. If the model only
considers the robot’s desires and continues to signal expecting the human to take action,
this not only causes inefficiency for the robot but annoyance to the human. A major
goal of this thesis is for the robot to balance its needs with the human's when planning its
communication.
Research has found that the effectiveness of alerts and information transmission is greatly affected by a human's cognitive load, as well as their present activity (Boehm-Davis and Remington, 2009).
Accurately estimating cognitive load is often beyond a robot’s capability as it requires in-
formation which is not easily observable. Instead, we infer the human’s availability, or
whether they are engaged in another task.
Availability (also called interruptibility) has been widely explored in the context of
HCI (Adamczyk and Bailey, 2004, Borst et al., 2015, Horvitz and Apacible, 2003, Rosenthal
et al., 2011, Turner et al., 2015). There is little research in which robots take this state
information into account, but recent work has shown that doing so can make robots more
effective in their operations (Rosenthal and Veloso, 2011). The human’s availability also
affects their receptiveness to the robot’s communication.
Receptiveness is affected by both the human’s current availability as well as past inter-
actions. Constant interruptions by the robot are likely to lower the human’s willingness
to interact with the robot in the future. As time passes, the effects of these interruptions
are lessened. In our application of the communication framework (Chapter 9), we define
this relationship mathematically.
Personal capabilities of the human interactor can also be integrated into the model to
increase the robot’s effectiveness. For instance, a human who has an impaired sense, such
as vision, should not be signaled with a low salience light. In this research, we employ
signals with different perceptual channels which can be used to overcome these percep-
tual deficits. Other factors, such as each human's preferences, can also be incorporated
into the human state to enable personalization of the robot’s behavior.
3.2.1.3 Environment
Since the perception of signals is greatly affected by environmental factors, such as am-
bient noise levels and distance, information about the environment is an important com-
ponent of the communication model's state space. Factors that typically impact the psychophysics of nonverbal signals include the following (a simple encoding of these factors is sketched after the list):
Ambient Sound Levels- High noise levels in the ambient environment can mask au-
ditory signals, while low noise levels can increase the salience of a signal. Although
the content and regularity of the ambient sounds can also cause interference, real-
time auditory scene analysis is a complex process that is outside the scope of this
thesis. Humans naturally use this information when communicating to avoid vi-
olating social protocols (e.g., yelling in a library) or to increase their effectiveness
(e.g., waving to get attention in a loud environment).
Visibility- Representing the visual noise in the environment is a challenging task, as there are many contributing factors and uncertainty in determining where a human is looking. Humans also have trouble processing images that contain significant visual distractors or detail. Instead, we reduce the problem to the visibility of the robot, which, in its most basic form, is a measure of whether the human can see visual signals displayed by the robot. More complex applications can also take into account other variables that affect the salience of a visual signal.
Distance- Psychophysics research shows that the perception of both auditory and
visual stimuli is affected by the distance between the human and signal source.
Therefore, if the robot is out of range of the human interactor, it should expect no
response or that it will need to modulate its signal appropriately.
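The sketch below discretizes these environmental variables, together with the human's availability from the previous section, into a compact state representation of the kind the model could use. The bin boundaries (60 dB, 3 m) are placeholder assumptions rather than calibrated psychophysical thresholds.

from dataclasses import dataclass

@dataclass(frozen=True)
class WorldState:
    robot_state: str       # e.g., "notification", "warning", "error"
    human_available: bool  # inferred availability of the human interactor
    ambient_db: float      # ambient sound level in decibels
    robot_visible: bool    # whether the human can currently see the robot's lights
    distance_m: float      # human-robot distance in meters

    def discretize(self):
        # Map raw measurements onto coarse bins suitable for the MDP state space.
        noise = "loud" if self.ambient_db > 60.0 else "quiet"
        proximity = "near" if self.distance_m < 3.0 else "far"
        return (self.robot_state,
                "available" if self.human_available else "busy",
                noise,
                "visible" if self.robot_visible else "occluded",
                proximity)

s = WorldState("warning", True, 72.0, True, 1.5)
print(s.discretize())  # ('warning', 'available', 'loud', 'visible', 'near')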
3.2.2 Actions
The action space of this model represents the communication actions available to the
robot. In each world state, the robot aims to choose a signaling action which optimizes
its reward. While there is an almost infinite set of nonverbal signals that can be generated
using the modalities employed in this thesis (Harrison et al., 2012), the communication
model relies on a finite set of actions.
In addition to regular signaling actions, the robot can also choose to not communicate
with the human at all. If the human is not receptive to communication by the robot
and/or the robot’s signaling state is low in priority, it may be more worthwhile to delay
the communication or disregard it completely (Horvitz et al., 1999).
Constructing the robot’s action space requires insight into nonverbal signal design
as the space of potential signals is infinite. Moreover, the inclusion of a large number
of signals without insight into the effects of these signals makes evaluation challenging
and perhaps impossible. Therefore, a major contribution of this thesis is the design and
validation of nonverbal signals which also informs the communication framework. This
is an essential component for applying the communication model in the real world.
3.2.3 Transition Function
Due to the numerous factors affecting signal perception and human response, this model
assumes an unknown state-transition function. There are two approaches to solving this
problem. The first is to learn a probability distribution over world states for each state
and action pair. The MDP can then be solved with model-based reinforcement learning
techniques. However, the amount of data required to learn a transition function may be infeasible
for many applications of HRI.
Another approach is to use model-free reinforcement learning techniques to directly
learn a policy without first learning a model of the environment. Model-free methods,
such as Q-learning, only look at which actions maximize the reward function, making
them potentially more tractable for HRI.
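As one example of the model-free route, the sketch below implements a tabular Q-learning update over discretized world states and signaling actions. The learning rate, discount factor, and exploration rate are illustrative defaults, and the state and action names are placeholders.

import random
from collections import defaultdict

class TabularQLearner:
    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.2):
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)  # (state, action) -> estimated value

    def choose(self, state):
        # Epsilon-greedy selection over the available signaling actions.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard Q-learning backup; no model of the transition function is needed.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])

learner = TabularQLearner(actions=["light", "sound", "no_signal"])
a = learner.choose(("error", "busy"))
learner.update(("error", "busy"), a, reward=-0.2, next_state=("error", "available"))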
3.2.4 Reward Function
A primary goal of this work is to enable the robot to balance both its own and a human’s
preferences and needs during communication. In a perfectly collaborative environment,
the human and robot have a shared goal and optimize the same reward function (Dragan,
2017). With current robot applications, this is often not the case as humans and robots
have differing priorities. Even amongst humans with the same goal, there is variance in
the reward functions they optimize (e.g., driving).
When planning its actions, the robot should take into account the human interactor’s
goals and preferences. This is an important goal of human-human interaction that en-
ables cooperation, collaboration, and teamwork. We propose a reward function with two
components: a term relating to the human's preferences, r_H, and a term relating to the robot's goals, r_R.
In this thesis work, we assume that these components are typically contrasting and
can result in conflicting policies. The robot’s term strives to employ communication that
most benefits the robot, such as having the human always respond to requests for help.
However, the human’s term may penalize these actions if they have to turn their attention
and time away from a task they are already engaged in or if their receptiveness to the
robot’s interruption is low. We will discuss potential optimization criteria and propose a
reward function for use in this work.
3.3 Optimization Criteria
The goals of the framework are encoded in the reward function that the robot attempts
to optimize when planning its communication actions. Previously, we proposed two
components of the reward function in the MDP model. For the studies performed in this
thesis, we focus on loosely collaborative scenarios in which the human and robot share a
higher level goal but have individual tasks and metrics for success.
This results in our assumption that the human interactor is more willing to help the
robot when there is no negative impact on their own task metrics. For the communication
model to be employed, there must be an observable event that can be used to calculate the
reward given to the robot at the end of each episode. Using these assumptions, we can
formulate the optimization criteria of this model as the sum of the robot’s and human’s
reward terms.
r = r_R + r_H    (3.1)
3.3.1 Robot Term Formalism
Since this work focuses on functional task-oriented robots, this optimization term is de-
fined relative to the robot’s success in communicating. From our definition of the robot’s
state, we assume that the robot has differing goals in terms of the human’s response
which can be used to define whether a communication action is successful. Such task-
oriented metrics are regularly used to define the performance of planning algorithms for
a robot’s functional behavior, and hence, are natural criteria for use in communication
planning.
A robot’s success when planning is often defined by two metrics, accuracy and time.
Applying a similar reward function during communication planning presents a challenge,
however, as it assumes that the robot’s communication directly affects completion of its
task (in a manner that can be observed). Instead, we propose employing metrics from
notification systems literature and defining the robot’s reward in terms of the expected
value of transmitting the communication alert (EVTA) (Horvitz et al., 1999).
r_R = EVTA_R = E[r_R^task + IG_R] = r_R^task + E[IG_R]    (3.2)
EVTA_R can be thought of as the expected utility of the communication and includes two components. The first is a task-based term, r_R^task, that encompasses the previously mentioned planning metrics. The definition of this term varies based on the robot's application and how the robot's success is defined. We assume that r_R^task can also be negative, since there is a loss associated with the communication action being delayed, not happening, or failing (i.e., the expected criticality). We can use the actual value of r_R^task if all effects on the robot's task are easily observed.
The second term, IG_R, relates to the information gain that results from the robot's communication action, which may not have an observable effect on the robot's success. For communication actions with longer-term effects or no observable output, the use of information gain as a reward term enables us to assign a value to these interactions. We use the expected value of IG_R since the true effect is unknown.
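A direct transcription of Equation 3.2 into code is given below. How r_R^task and the expected information gain are measured is application-specific, so the arguments are placeholders supplied by the surrounding system rather than quantities defined here.

def robot_reward(task_reward: float, expected_info_gain: float) -> float:
    # r_R = EVTA_R = r_R^task + E[IG_R] (Equation 3.2). task_reward may be
    # negative when the communication is delayed, skipped, or fails; the
    # expected information gain values effects that are not directly observable.
    return task_reward + expected_info_gain

# Example: a successful help request (task gain 1.0) plus a small expected
# information gain from the human's response.
print(robot_reward(task_reward=1.0, expected_info_gain=0.2))  # 1.2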
3.3.2 Human Term Formalism
The human’s optimization term should take into account the task the human is currently
working on as well as their preferences for interaction. Since the human and robot of-
ten have competing interests, we propose the human’s term to be the expected value of
transmitting the alert on the human. This enables us to incorporate the loss the human
experiences relative to their own task (e.g., time) into the overall reward function. Simi-
lar assumptions have been used in both notification systems and HRI literature (Admoni
et al., 2016, Horvitz et al., 1999).
r_H = EVTA_H = E[r_H^task + IG_H] = E[r_H^task] + E[IG_H]    (3.3)
Previously, we assumed that when the robot takes a communication action, r_R^task is typically positive because the robot benefits from the human's response. For r_H^task, we assume the opposite, in that the human's task suffers from their switch in focus to the robot. Prior work has defined this cost to be a sum of the time it takes for the human to attend to the communication action and the time for the human's response.
However, interacting with a robot often requires greater physical and cognitive effort than responding to a computer alert. Higher-effort tasks can require increased time and energy to resume. Interruption in the middle of a task (versus at a breakpoint) also makes it more challenging to resume the task (Borst et al., 2015). To address these issues, we propose employing time, effort, and the moment of interruption when calculating the human's loss, r_H^task.
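One way to operationalize this loss is sketched below: the cost grows with the time the response takes, the effort of the interrupted task, and a penalty for interrupting mid-task rather than at a breakpoint. The linear form and the weights are illustrative assumptions, not values fixed by this thesis.

def human_task_loss(response_time_s: float,
                    task_effort: float,
                    at_breakpoint: bool,
                    w_time: float = 0.01,
                    w_effort: float = 0.5,
                    interruption_penalty: float = 0.3) -> float:
    # Illustrative r_H^task: more negative when the interruption is long, the
    # interrupted task is effortful, and the human is caught mid-task.
    loss = w_time * response_time_s + w_effort * task_effort
    if not at_breakpoint:
        loss += interruption_penalty
    return -loss

print(human_task_loss(response_time_s=30.0, task_effort=0.8, at_breakpoint=False))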
We also need to take into account humans' preferences for a robot's communication, i.e., their receptiveness. Receptiveness is a complex factor to model and includes information about the human's current availability (including the moment of interruption) as well as past interactions. We propose formulating receptiveness h_r as a summation of the human's state at each of the robot's previous communications. We weight each term by how far in the past the communication occurred, since the human's recent state has more of an effect on their current willingness to interact.
h_r = Σ_{i=1}^{n} e^{-(t_n - t_i)/5} a_i    (3.4)
In scenarios where the human and robot do not repeatedly interact, receptiveness
reduces to their current availability.
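The sketch below computes receptiveness as in Equation 3.4, discounting the human's availability a_i at each past communication by how long ago it occurred. The exponential form and the decay constant follow the reconstructed equation and should be treated as tunable assumptions.

import math

def receptiveness(history, t_now, decay=5.0):
    # h_r = sum_i exp(-(t_now - t_i) / decay) * a_i (Equation 3.4), where
    # history holds (t_i, a_i) pairs: the time of the robot's i-th previous
    # communication and the human's availability (0..1) at that time.
    return sum(math.exp(-(t_now - t_i) / decay) * a_i for t_i, a_i in history)

# Example: two recent communications made while the human was busy (a_i = 0)
# contribute nothing, while an older one made while they were free still counts.
print(receptiveness([(0.0, 1.0), (8.0, 0.0), (9.5, 0.0)], t_now=10.0))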
3.4 Generating a Policy
Two primary components make up the proposed communication framework. The first
component is a formalism of the communication problem as a Markov Decision Process
as described above. The second component is the generation of a communication policy
using the MDP model. In the previous section, we discussed methods for solving for a
policy which are characterized by whether they employ a model of the system dynamics
(i.e., model-based) or attempt to directly learn the action that yields the best value (i.e.,
model-free).
While there are some applications of HRI in which it might be possible to learn a
dynamics model, we assume that often this would require too much real interaction data
to be feasible. Consequently, we employ only model-free methods in this work. Popu-
lar model-free reinforcement learning (RL) methods include Bandit algorithms, temporal
difference (TD) learning (Q-learning, SARSA), and Monte Carlo methods.
Figure 3.4: Interactions between the human and robot are treated as episodes that start when the robot has information necessary to communicate and end after the communication action and subsequent response (if any) occur.
In this research, we treat the interaction between the human and robot as an episode.
This assumption enables us to employ Bandit algorithms and Monte Carlo RL methods
which use actual rewards when updating the robot’s policy. However, these methods
often require a large number of samples to converge on the optimal policy. We address
this issue when describing the final application of our framework (Chapter 9).
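For this episodic setting, a simple instantiation is a per-state bandit that updates a running-average value estimate once per episode using the actual observed reward. The sketch below is a minimal illustration of that idea, not the exact learner used in Chapter 9.

import random
from collections import defaultdict

class EpisodicBandit:
    # Per-(state, action) running-average reward estimates, updated once per episode.
    def __init__(self, actions, epsilon=0.1):
        self.actions = actions
        self.epsilon = epsilon
        self.value = defaultdict(float)
        self.count = defaultdict(int)

    def choose(self, state):
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.value[(state, a)])

    def end_episode(self, state, action, reward):
        # Monte Carlo style update with the reward observed for the whole episode.
        key = (state, action)
        self.count[key] += 1
        self.value[key] += (reward - self.value[key]) / self.count[key]

bandit = EpisodicBandit(actions=["light", "sound", "no_signal"])
a = bandit.choose(("warning", "available"))
bandit.end_episode(("warning", "available"), a, reward=0.7)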
3.5 Summary
In this chapter, we presented a computational framework for planning a robot’s commu-
nication behaviors. First, a broad overview of the framework and its primary component,
an MDP model, was presented. We described the model parameters, including the state
space and optimization criteria in greater detail, dividing components of the model by
their relation to the human, robot, and environment. Lastly, we discussed methods for
solving the MDP model to generate policies for the robot’s communication. The frame-
work was described with a high level of abstraction as the purpose of this chapter is to
propose a generalizable method for planning robot communication. Detailed applications
of the framework are described in Chapter 7 and Chapter 9.
4. Design of Robot Light Signals
The focus of this chapter is the design of communicative light signals for robots to facili-
tate human-robot interaction (HRI). Towards this goal, we conducted an exploratory user
study investigating the effects of light signal design parameters on a robot’s expressive-
ness. From the findings of this study, we propose a light signal design tool in the form of
a decision tree as a first step towards systematic signal design.
4.1 Introduction
Recent work has suggested the use of light as an alternative communication modality
for non-humanoid robots, as LEDs are low in cost, highly visible, and versatile (Baraka
et al., 2015, Szafir, 2015). Since robots are complex machines, they require a high degree
of expressivity. Modulating basic parameters, such as color, LED position, duration,
and frequency, provides a large vocabulary of light signal behaviors that can be used
to communicate a wide range of information (Harrison et al., 2012).
Although light signals have been employed by many devices, including appliances,
automobiles, aircraft, traffic signals, and smartphones, there are few guidelines or prin-
ciples to drive their design. As a result, many signals are simplistic and not particu-
larly iconic, requiring people to learn unique meanings for each light behavior and de-
vice (Harrison et al., 2012). As robots evolve, taking on different forms and purposes,
their signaling needs will continue to grow. Consequently, learning new signals will be-
come increasingly difficult, especially as humans reach the limitations of their perceptual
capability.
In this exploratory user study, we investigated the effects of manipulating light signal
parameters on a robot’s expressiveness. We assumed a robot possesses multiple LED
lights and considered color, light pattern, and pattern frequency. Since robots typically have
primary goals involving motion, the robot’s light signals were coupled with base motion
to mimic the realistic task-oriented settings robots will operate under. As implicit motion
parameters have been explored in past works, this study can also determine whether the
addition of a light signal affects previous HRI findings.
We focused primarily on perceptions of functional properties, rather than social or
affective ones, as non-humanoid robots are often employed for task-oriented applications.
It is important to note that these properties are general attributes (e.g., confidence), rather
than concrete meanings (e.g., low power, planning to move). Prior work has shown light
signals may be too abstract to iconically represent precise information without additional
training or context (Harrison et al., 2012), so the scope of this study was to determine the
effects of light signal design characteristics on these attributes.
Greater understanding of preexisting perceptions not only facilitates intuitive signal
design but can prevent design inconsistencies in which the robot inadvertently expresses
information that conflicts with its communication objectives. Understanding of these
biases also enables more structured signal generation methods that enhance consistency
in the robot’s signaling vocabulary. Towards these goals, we performed a video-based
user study, analyzed the effects of light signal and motion parameters, and present a first
step towards a systematic approach for generating robot light signals.
In this study, we employed a drone, or free-flying robot. Drones are a popular type
of non-humanoid robot and are increasingly used for both personal and commercial
applications. Many of the scenarios that drones are being developed for will require
communication between the human and robot (Cauchard et al., 2015). As the field of
human-drone interaction is still young, understanding how humans perceive, react, and
respond to these robots is a major research challenge. This study also aimed to add to the
growing body of work in these areas.
The next section provides background on relevant past works in human-drone inter-
action and light signal design. We then present the details of our video-based user study
and its results. From these results, we propose a decision tree for light signal generation.
Finally, we discuss our findings and their implications for future HRI research.
4.2 Related Work
This work is informed by research in human-computer interaction (HCI) in which light
signals are used to convey a wide range of information with varying levels of detail and
salience (Hansson and Ljungstrand, 2000, Harrison et al., 2012, Matviienko et al., 2016).
This versatility enables light to be used as a signaling mechanism across a diverse set of
environments and scenarios, an important feature for non-humanoid robots. LEDs are
also easily accessible, can be utilized in higher numbers for increased information capac-
ity, and can be configured to fit different devices, including a range of non-humanoid
robot embodiments.
As such, light is a promising signaling modality that has been utilized on many types
of devices (e.g., smartphones, computers, aircraft, robots) to signal a wide variety of in-
formation (e.g., power levels, notifications, faults) (Harrison et al., 2012). Many of these
products have employed light as simplistic indicators, directly mapping states to a small
set of predefined light signals (e.g., a color and a binary on/off cue). However, these
signals are often not intuitive or iconic and require humans to learn each by rote. Since
robots are complex agents with a large set of features in their internal space, they may re-
quire more state-expressive signals than humans can easily learn using this method (Cha
et al., 2018). Moreover, while some applications can train human interactors to recognize
signals (e.g., industrial), it is not feasible in many other environments (e.g., homes, hotels,
hospitals).
One approach for overcoming these issues is to create a set of basic but recognizable
light signals that can be modulated to create a larger signaling vocabulary. Designing
a set of fundamental light signals requires both knowledge of psychophysics and psy-
chology to understand the resulting mental and physical perceptions. Although simple,
humans have been shown to attribute complex properties, such as emotion and gender, to
light (Clarke et al., 2005, Dittrich and Lea, 1994, Dittrich et al., 1996). The common usage
of light in everyday signaling has also resulted in certain connotations which are likely to
affect human responses to robot light signals.
Despite the prevalence of lights for communication on non-humanoid robots, there is
limited research exploring the design space for robot light signals. In HRI, these works
have primarily looked at how to map certain internal state features, such as motion in-
tent, task state, and affect, to specific light signals (Baraka and Veloso, 2017, Szafir et al.,
2015). Several of these works have employed light signals that draw on common signaling
experiences, such as a car blinker (Baraka et al., 2015, 2016, Kim et al., 2008, Szafir et al.,
2015). While this approach offers promise for designing a fundamental set of light signals
that humans can easily recognize, more research is needed to understand existing human
biases towards certain light signaling parameters.
Towards these goals, we present a user study that investigates light signal design
parameters to support intuitive signal design and systematic modulation of signal pa-
rameters. This knowledge can also facilitate the creation of common light signaling con-
ventions across different non-humanoid robots and applications.
4.3 Experimental Design [S1]
To better identify how light signal design affects human attribution of desired properties
to a robot, we conducted an exploratory video-based user study varying parameters of
a robot’s light signal and motion (S1). We analyzed the effects of these variations on
participants’ perceptions of certain properties of the signal and robot. Both the light signal
parameters and the measured properties are derived from past work to better frame our
contribution.
The goal of this user study is to understand existing perceptions or biases to variations
in certain light signal parameters. Due to the limited research in this area, we chose to
explore several signal parameters and measured properties. Moreover, since the field is young, it is challenging to formulate well-informed hypotheses. This work hopes to contribute towards a foundation that will allow for hypothesis testing in the future and serves as a first step towards generating light signals in a more systematic way.
4.3.1 Robot
In this study, we utilized a Parrot AR drone. The robot is equipped with an array of
80 individually programmable Adafruit Neopixel RGBW LEDs embedded in a channel
carved into the outer foam hull. White filter sheets typically utilized in photography
applications were attached to the outer edge of the channel using black binding to match
the hull. This layer diffused the light from the LEDs, providing a wider viewing angle
and reducing the brightness when gazing directly at the light source.
Figure 4.1: The Parrot AR drone with an LED array used in this study.
4.3.2 Manipulated Variables
We manipulated five variables: three variables relating to the robot’s light signal (color,
pattern, and pattern frequency) and two variables relating to the robot’s motion (path and
speed). Both sets of manipulated variables were identified from prior work in HCI (Har-
rison et al., 2012, Pousman and Stasko, 2006) and HRI (Baraka et al., 2015, Dragan et al.,
2013, Szafir et al., 2015, Zhou et al., 2017).
Figure 4.2 includes a table of the resulting conditions. Motion parameters are shown
on the left side of the table with speed partially nested under motion path. Light signal
parameters are shown on the top and include color, pattern, and pattern frequency par-
tially nested under pattern. The cell coloring designates four separate analyses performed
on the resulting data.
4.3.2.1 Light Signal Parameters
Although there are many parameters that can be varied to create an infinite number of
light signals, we chose three common and fundamental characteristics, color, pattern, and
pattern frequency, to better understand how signals already in use affect human percep-
tions (Baraka et al., 2015, Harrison et al., 2012, Szafir et al., 2015). To better isolate their
effects, we keep these parameters consistent throughout the duration of the video (e.g.,
no color change or pattern acceleration).
Figure 4.2: The resulting conditions from the five manipulated variables in this study: motion path (none, approach, recede), motion speed (slow, fast), light color (red, green, blue, white, orange), light pattern (steady, blink, beacon, wipe), and pattern frequency (slow, fast); the cells are grouped into four separate analyses.
Color
Since color is a commonly employed design element when creating light signals, stan-
dards for its usage have emerged in certain fields, such as transportation. Traffic lights
across the world use a standard color code with green meaning go, yellow warning the
signal will change to red, and red indicating stop. Similarly, railroads and air traffic
control also utilize green and red lights to indicate go or stop.
For everyday products, these colors have been used to express slightly different mean-
ings (Harrison et al., 2012). In a smartphone, for instance, green indicates operating cor-
rectly or fully charged, red represents experiencing a problem or error, and yellow means
warning or low battery. In addition to these colors, we identified blue and white as com-
mon LED signal colors used for providing notifications and status related information.
Thus, we manipulated the light signal color to be: green, red, orange, blue, or white.
Orange was chosen over yellow as it was more easily distinguished from the other colors
on video. Green is shown in Figure 4.1 (right), while the remaining colors are shown
in Figure 4.3 (ordered top to bottom).
Pattern
Several past works have explored the design of a light signal pattern on both point light
sources and light arrays (Baraka et al., 2015, Harrison et al., 2012). The positioning, num-
ber of lights turned on and off, brightness, and timing can be varied to create different
patterns. Some patterns correspond to iconic symbols (e.g., heartbeat, progress bars),
while others are more abstract (e.g., pulsing). Findings from these past works primarily consisted of mapping specific meanings to certain light patterns.
Figure 4.3: The four light pattern variations used in this study (beacon, wipe, steady, and blink).
As the number of light patterns that can be generated is infinite, we chose to investigate four of the more basic and frequently utilized patterns: steady, blink, beacon, and wipe, as
shown in Figure 4.3.
The first pattern, steady, is a constant fixed light signal in which all lights in the array
are turned on for the duration of the signal. Since many indicator lights utilize a simplistic
binary on cue, we chose to include this pattern to act as a baseline.
Blink consists of the entire LED array turning on and off at a constant frequency.
Blinking lights have already been employed for signaling device state, regulating speech,
indicating errors, and increasing salience (Funakoshi et al., 2008, Harrison et al., 2012).
Beacon consists of a set of 3 LEDs “moving” across the LED array (Harrison et al.,
2012, Szafir et al., 2015). At each time step, the last light at the end of the 3 LED set is
turned off and the next LED is turned on. This creates the illusion of the light moving
around the perimeter of the robot, similar to a lighthouse beacon, the lights used on the
head of fictional Cylon robots, and the throbbers used by computer programs to indicate
an action is in progress.
The last pattern, wipe, is visually similar to progress or loading bars used by computer
programs to indicate the degree of completion of a process. At each time step, the next
LED in the array is turned on until the entire strip is filled. The strip then turns off, and
this process repeats.
Pattern Frequency
Since the steady pattern is not dynamic, it does not have a pattern frequency. For the
other three patterns, we varied the frequency or the speed of the pattern repeating itself
to be either slow or fast. The fast condition was four times faster than the slow condition
for the blink, beacon, and wipe patterns. The time step between turning on successive LEDs in the beacon and wipe patterns was 20 ms for the fast condition and 80 ms for the
slow condition.
We set the blink cycle time closer to the overall time required to complete an entire cycle of the beacon and wipe patterns, so participants would see approximately the same number of
cycles for each pattern. The fast blink condition waited 200 ms before switching between
on and off, while the slow blink condition waited 1200 ms.
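To make the pattern and frequency parameters concrete, the sketch below generates on/off frames for the blink, beacon, and wipe patterns on an 80-LED array using the timings reported above. The frame representation and preview loop are illustrative; driving the actual Neopixel hardware would require a separate LED driver library.

NUM_LEDS = 80

def blink_frames(on_ms):
    # Alternate all-on / all-off frames; on_ms = 200 (fast) or 1200 (slow).
    while True:
        yield [True] * NUM_LEDS, on_ms
        yield [False] * NUM_LEDS, on_ms

def beacon_frames(step_ms, width=3):
    # A block of `width` lit LEDs "moving" around the array; step_ms = 20 or 80.
    pos = 0
    while True:
        frame = [False] * NUM_LEDS
        for k in range(width):
            frame[(pos + k) % NUM_LEDS] = True
        yield frame, step_ms
        pos = (pos + 1) % NUM_LEDS

def wipe_frames(step_ms):
    # Progressively fill the array, then clear it and repeat; step_ms = 20 or 80.
    while True:
        for n in range(1, NUM_LEDS + 1):
            yield [i < n for i in range(NUM_LEDS)], step_ms
        yield [False] * NUM_LEDS, step_ms

# Example: preview the first three frames of the fast beacon pattern as text.
frames = beacon_frames(step_ms=20)
for _ in range(3):
    frame, dwell_ms = next(frames)
    print("".join("#" if led else "." for led in frame[:12]), f"({dwell_ms} ms)")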
4.3.2.2 Motion Parameters
We also manipulated two motion parameters: motion path and motion speed. These char-
acteristics were inspired by past work in HRI, as well as animation and virtual agents.
While these parameters have been examined both on a robot, as well as more abstractly,
few works have explored their effects in combination with light signals.
Motion Path
Motion path is defined as the direction of the robot’s base motion and has three levels, no
path, approaching, and receding. In the no path condition, the robot hovers at a fixed
altitude. Similar to the steady light pattern, this acts as a baseline to understand how the
addition of motion affects human perceptions.
In the approaching condition, the robot starts at the back of the room and moves in a
straight line path towards the video camera, stopping approximately 2 feet away. In the
receding condition, the robot mirrors this trajectory, starting approximately 2 feet away
from the camera and moving in a straight line path to the back of the room.
Approaching motions have already been shown to elicit strong directional biases and
trigger reflexive responses (Lewis and McBeath, 2004). Although the robot’s path is un-
likely to be significantly altered (e.g., different start or end points) for signaling purposes,
it is still important to understand how the path affects the perception of a signal. For
that reason, we made the robot’s motion egocentric, or defined relative to the viewer, by
placing the video camera at the start or end point of the robot’s trajectory (Szafir et al.,
2015).
Motion Speed
We varied the robot’s motion speed to be either slow or fast. The robot reaches a maximum
velocity of 0.5
m
sec
in the fast condition and a maximum velocity of 0.15
m
sec
in the slow. The
fast velocity was chosen such that a cycle of each light pattern could be completed during
the robot’s trajectory. The slow velocity was chosen to be approximately four times slower
than the fast velocity in order to match variations in pattern frequency.
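As a rough illustration of this speed choice, the sketch below checks whether one cycle of each light pattern fits within the robot's trajectory. The path length and number of LEDs are hypothetical values chosen for illustration; the velocities and pattern timings are those reported above.

```python
# Rough check (with a hypothetical path length) that one light-pattern cycle
# fits within the robot's trajectory at each motion speed.

PATH_LENGTH_M = 6.0                      # assumed traversal distance of the room
SPEEDS = {"fast": 0.5, "slow": 0.15}     # m/s, from the study

NUM_LEDS = 22                            # assumed array length
CYCLE_S = {
    # beacon/wipe cycle: one step per LED at 20 ms (fast) or 80 ms (slow)
    ("beacon or wipe", "fast"): NUM_LEDS * 0.020,
    ("beacon or wipe", "slow"): NUM_LEDS * 0.080,
    # blink cycle: one on period plus one off period
    ("blink", "fast"): 2 * 0.200,
    ("blink", "slow"): 2 * 1.200,
}

for speed_name, v in SPEEDS.items():
    duration = PATH_LENGTH_M / v
    for (pattern, freq), cycle in CYCLE_S.items():
        fits = "fits" if cycle <= duration else "does not fit"
        print(f"{speed_name} motion ({duration:.1f} s): "
              f"{pattern}/{freq} cycle of {cycle:.2f} s {fits}")
```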
4.3.3 Participant Allocation
From the five manipulated variables, we generated a total of 175 video conditions, as
shown in Figure 4.2. Certain factors had levels that could not be crossed with all other
factors: the baseline conditions of no motion had neither a path nor a speed, and the
steady light pattern did not have a pattern frequency.
We wanted to utilize a within-subjects design to enable participants to see multiple
light signals. However, with the high number of conditions, it was infeasible for every
participant to rate each video. Instead, we randomly assigned each participant 7 video
conditions, with a total of 18 participants per video condition (Zhou et al., 2017).
We recruited participants using the Amazon Mechanical Turk platform. Participants
were located in the United States, at least 18 years of age, and had an approval rating
of at least 95%. A total of 450 participants (48% male and 52% female, median age=36)
participated in the study.
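As a quick consistency check on this allocation, 450 participants rating 7 videos each produces the same total number of ratings as 175 conditions rated by 18 participants each; a minimal sketch of that arithmetic is shown below.

```python
# Consistency check for the video allocation described above.
n_conditions = 175
ratings_per_condition = 18
n_participants = 450
videos_per_participant = 7

total_ratings = n_conditions * ratings_per_condition
assert total_ratings == n_participants * videos_per_participant  # 3150 ratings
print(total_ratings)  # 3150
```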
4.3.4 Dependent Measures
To assess how the light signal and motion parameters affect human observer attribu-
tion of desired task-related properties, we identified 8 properties as dependent measures:
urgency, error, and difficulty to ignore, which relate to the notification itself, and intentionality, safety, naturalness, confidence, and competence, which relate to the robot.
4.3.4.1 Notification Attributes
The notification attributes as well as their measurement scales were identified from no-
tification systems literature (Pousman and Stasko, 2006). As we wanted to focus on the
absence or presence of these specific properties, we used a unipolar, 5-point Likert scale
ranging from “Not at all” to “Extremely”.
Urgency
Urgency is defined as the need for immediate attention. Notification and alarm systems
often explore how to convey varying levels of urgency in order to elicit different levels of
attention and/or action from the user.
Error
Since robots are embodied agents that can physically interact with the world, we chose to
also utilize the attribute error, or the degree to which the robot is operating or functioning
correctly. Errors can impact human safety and typically correspond to the most serious
alert levels in notification or alarm systems.
Difficulty to Ignore
The last attribute we identified from notification systems literature is difficulty to ignore.
This corresponds to the salience or how “attention-grabbing” a signal is. We chose to
utilize the term as it also describes a human’s ability to divide or shift their attention to
other items in the environment, a key design criterion for notification and ambient systems.
A short pre-study pilot also showed that potential respondents may be more familiar with the term "difficult to ignore" than with "salience."
4.3.4.2 Robot Attributes
Unlike the previous attributes which primarily concern perceptions of the signal, these
attributes relate to perceptions of the robot. HRI research has explored the effects of
robot appearance, speech, motion, and behavior (e.g., apologies, mistakes) on these at-
tributes (Cha et al., 2015, Goetz et al., 2003, Lee et al., 2010, Zhou et al., 2017).
These properties were measured using a bipolar, 7-point Likert scale, pairing each
attribute with a contrasting quality. For example, one scale ranged from “Extremely
unintentional” to “Extremely intentional” with a middle point of “Neither intentional
nor unintentional". This enabled us to measure perceptions of both qualities and contrast
our results with previous findings.
Intentionality
"Intentional" or "Unintentional" indicates the degree to which users perceive the robot to be in
control of its actions. This relates to safety, trust, and the robot’s capability. Intentionality
can also be manipulated to draw a person’s attention to and away from certain objects in
the world (Hoffman et al., 2006, Terada and Ito, 2010).
Safety
Safety is a primary concern for robots, as their ability to physically interact with the
environment can create dangerous situations (Lasota et al., 2017). The attribute “Safe”
was contrasted with “Dangerous” to indicate whether participants perceived that the
robot could cause harm to others.
In robotics, safety has been explored mostly in relation to robot motion (Lasota et al.,
2017). In the past, robots often utilized functional motion that achieved a task goal but
was ambiguous to users; more recent research has focused on expressive robot behaviors
that improve perceived safety. Hence, reinforcing safety through expressive signaling is
an important goal for HRI (Dragan et al., 2013, Zhou et al., 2017).
Naturalness
Designing robots that can interact with humans in natural and fluid ways is especially
difficult for robots that are non-humanoid, as they typically have more machine-like ap-
pearances. While “Natural” or “Unnatural” can apply to many aspects of the robot (e.g.,
appearance, behavior, movement), all are considered to impact a human's ability to fluidly
interact with the robot (Dragan et al., 2013, Zhou et al., 2017).
Confidence
Confidence, or the degree of assurance in one’s own abilities, is known to impact trust.
“Confident” was contrasted with “Unsure” to measure participants’ perceptions of the
robot’s belief in its own capability. Understanding the robot’s confidence also enables
co-located humans to make informed decisions about their own actions concerning the
robot (Zhou et al., 2017).
Competence
Competence indicates the observer's belief in the robot's abilities. Similar to confidence,
perceiving the robot to be "Competent" or "Incompetent" will affect the observer's trust.
A human who doubts the robot’s capabilities may have more apprehension towards in-
teracting or even sharing space with the robot (Cha et al., 2015).
4.4 Analysis
We performed a multivariate analysis on the data and found that the items used as depen-
dent measures were not highly correlated, except for confidence and competence (Cron-
bach's α = 0.5952). We will discuss the results of these two measures together and the
remaining items separately.
Due to the unbalanced nature of the study, we performed multiple sets of analyses look-
ing at subsets of our nested data. First, we analyzed the results of our fully factorial
manipulations, as shown by green boxes in Figure 4.2 (120 conditions): (1) light signal
color (green, red, orange, blue, white), light signal pattern (blink, beacon, wipe), light sig-
nal pattern frequency (slow, fast), motion path (approaching, receding), and motion speed
(slow, fast).
We then analyzed the remaining data looking at the conditions with the steady light
signal pattern or no motion path, which resulted in an additional three analyses: (2)
steady light signal (20 conditions, pink boxes in Figure 4.2), (3) no motion path (30 con-
ditions, orange boxes in Figure 4.2), and (4) steady light signal and no motion path (5
conditions, blue boxes in Figure 4.2).
Figure 4.4: Analysis (1) results: checks indicate a significant mean effect for the manipulated variable (col-
umn) and dependent measure (row).
For each subset of data, we utilized a factorial repeated measures analysis of variance
(ANOVA). We present a subset of these analyses, highlighting results that showed statis-
tical significance, were surprising, or provided other interesting insights. For readability,
we will refer to each of these subsets by the bolded numbers used above.
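As an illustration of this kind of analysis, the sketch below fits a factorial model for one dependent measure using statsmodels. It assumes a hypothetical long-format DataFrame (one row per rating, with columns color, pattern, frequency, path, speed, and urgency) and, unlike the repeated measures ANOVA used here, treats ratings as independent observations, so it is only an approximation of the analysis in this section.

```python
# Minimal sketch of a factorial ANOVA on the fully crossed subset (analysis 1).
# `df` is a hypothetical long-format DataFrame with one row per rating.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

def factorial_anova(df: pd.DataFrame, measure: str = "urgency"):
    # Main effects for the five manipulated variables plus the
    # pattern-by-frequency interaction reported for urgency.
    model = ols(
        f"{measure} ~ C(color) + C(pattern) + C(frequency) + C(path) + C(speed)"
        " + C(pattern):C(frequency)",
        data=df,
    ).fit()
    return sm.stats.anova_lm(model, typ=2)   # Type II sums of squares
```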
4.4.1 Urgency
In analysis (1), we found a significant main effect for light signal color (F(3,2159) =
5.24, p< 0.001), pattern (F(2,2159)= 18.1, p< 0.001), and pattern frequency (F(1,2159)=
71.87, p< 0.001), as well as motion direction (F(1,2159) = 9.84, p = 0.002) and speed
(F(1,2159)= 68.87, p< 0.001). We also found an interaction effect between the light signal
pattern and its frequency (F(2,2159)= 6.64, p= 0.001).
Perceptions of urgency have been well explored for alarm and alert signal de-
sign (Guillaume et al., 2002). Much of this past work has focused on acoustical prop-
erties, and several works have shown that the frequency of auditory signals (e.g., beep-
ing) strongly affects perceptions of urgency (Arrabito et al., 2004). Hence, we expected
that increasing the frequency of the light signal pattern or the speed of the robot’s mo-
tion would have similar effects on urgency, which was confirmed. We did find, how-
ever, that while light pattern frequency was also found to have a significant effect in (3)
(F(1,539)= 19.25, p< 0.001), motion speed did not in (2).
There are few works exploring the effects of color on the perception of urgency (Lewis
and Baldwin, 2012), with most utilizing color in combination with another communica-
tion modality (e.g., text, sound). We expected that colors typically utilized in warning
signals (e.g., red and orange) would be perceived as more urgent. A post hoc analysis
using Tukey HSD revealed, however, that only the red and green colors were significantly
Figure 4.5: The average Likert ratings (including 95% confidence intervals) for urgency, difficulty to ignore, and error from Analysis (1): (a) color, (b) pattern, (c) pattern frequency.
different, with red being perceived as the most urgent and green being perceived as the
least. We also found similar results from analyses (2) and (3), suggesting that color alone
does not convey urgency.
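A post hoc comparison of this kind can be run with statsmodels' Tukey HSD implementation; the sketch below assumes the same hypothetical long-format DataFrame `df` used in the ANOVA sketch above.

```python
# Post hoc pairwise comparison of colors on urgency ratings (Tukey HSD).
from statsmodels.stats.multicomp import pairwise_tukeyhsd

def tukey_color_urgency(df):
    result = pairwise_tukeyhsd(endog=df["urgency"], groups=df["color"], alpha=0.05)
    return result.summary()   # table of pairwise mean differences and rejections
```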
We also expected that patterns with greater variations at each time step would convey
greater urgency. Auditory signals show similar effects, as sounds with large changes
in intensity or pitch were found to be more suitable for high intensity alerts (Arrabito
et al., 2004). This was confirmed in the post hoc analysis for (1); we found that all three
patterns, blink, beacon, and wipe, were rated as significantly different. Blink was rated
most urgent, while wipe was rated the least.
As the robot's overall goal is ambiguous in the video, we did not expect that its
motion path would affect perceptions of urgency. However, approaching motions towards
the camera were rated as significantly more urgent than receding motion. This could be
due to participants framing the term urgency in regards to themselves. The effect size of
the motion path was the smallest (partial η² = 0.005) of our manipulated variables.
For the interaction effect found between the light signal pattern and frequency in (1),
the post hoc analysis showed that the conditions with blink and beacon patterns at the
faster frequency had significantly higher ratings than the other conditions. Conditions
with the fast pattern frequency were rated on average higher than those with the slow;
within these groups, the blink pattern was rated the most urgent and wipe the least. This
suggests that the pattern frequency (partial η² = 0.018) has a stronger effect than the pattern (partial η² = 0.035), which was confirmed by calculating the effect sizes.
4.4.2 Error
As high urgency scenarios are often associated with off-nominal conditions, such as er-
rors, we expected similar findings for error and urgency. However, we found significant
effects for only a subset of the dependent measures. In analysis (1), a significant main
effect was found for light signal color (F(3,2159) = 5.85, p< 0.001) and motion speed
(F(1,2159)= 29.85, p< 0.001).
We expected that participants would draw on their experiences with colors used to
signify error (e.g., red stop signs, emergency stop buttons, alarm lights) and colors used
to signify normal operation or success. A post hoc analysis showed significant differences in the
conditions with red and orange lights compared to conditions with green and blue lights.
Surprisingly, white light was not found to be significantly different than any other color.
As white is often a default light color, these results suggest that participants may have a
more neutral or diverse perception towards its usage.
We also found that, in the absence of motion (analyses (3) and (4)), no significant effect
was found for color. This was surprising, as static red or orange lights, symbols, and signs
are often found in everyday life. As many of these instances combine color with a sym-
bol, sound, or text, people may expect the signal to be less ambiguous when conveying
important information such as errors (e.g., stop sign).
Surprisingly, we did not find any effect for signal pattern or frequency. We expected
that light patterns with some variation, such as those resembling a police beacon or blink-
ing fire alarm, would have a significant effect. The mean results showed that beacon was
the highest rated pattern for error and wipe was the lowest. The fast pattern frequency
was also rated higher than the slow.
While the motion path did not have any significant effect, we found that receding mo-
tion was rated higher than the approaching motion, in contrast to the ratings of urgency.
The faster motion speed was rated significantly higher than slow, which is consistent with
the results for urgency and the findings from past work.
4.4.3 Difficulty to Ignore
For the third measure, we expected that signals and motions with larger variations in
their parameters over time would be more salient and thus more difficult to ignore. This
was confirmed in analysis (1), as we found a significant main effect for light signal color
(F(3,2159)= 3.11, p= 0.015), pattern (F(2,2159)= 9.8, p< 0.001), and pattern frequency
(F(1,2159) = 11.26, p = 0.001), as well as motion direction (F(1,2159) = 6.99, p = 0.008)
and speed (F(1,2159)= 10.52, p= 0.001). We also found interaction effects between the
light signal color and pattern (F(8,2159)= 2.42, p= 0.013), and motion path and speed
(F(2,2159)= 5.15, p= 0.023).
Color was not shown to be significant in analyses (2), (3), and (4), indicating the
weakness of color alone without motion or a light signal pattern for salience. The post
hoc analysis revealed that red and green were rated significantly differently than the
other colors. Since past work has not found that color alone has a significant effect on
salience (Wool et al., 2015), this could suggest that participants rated light colors com-
monly utilized by other signals (e.g., traffic lights) higher. This could be because they are
more trained to notice these signals or it could be an effect of utilizing videos rather than
in-person interactions.
We hypothesized that a higher number of LEDs changing over a set period of time
would cause more apparent motion and increase saliency (Cohn, 1998). Hence, we ex-
pected that the beacon pattern would be only slightly more salient than the wipe pattern
but that both patterns would be less difficult to ignore than the blink pattern. This was
confirmed by the post hoc analysis for (1), which showed that the blink pattern was
rated significantly higher than the beacon and wipe patterns. Surprisingly, a similar re-
sult was not seen in (3), as pattern was not found to have a significant effect. A significant
mean effect was also found for pattern frequency, with the faster frequency rated more
difficult to ignore.
Both motion parameters, path and speed, were found to have significant effects. We
expected that participants would have more difficulty ignoring the approaching motion
as the robot moves in the direction of the camera and ends within close proximity. This
was confirmed, suggesting that motion biases found in prior works also apply to robot
motion (Dautenhahn et al., 2006, Lewis and McBeath, 2004), but not in the absence
of a dynamic light signal pattern as shown by (2). We also found that the faster motion
was rated significantly more difficult to ignore.
We also found an interaction effect between the motion path and speed, with condi-
tions including fast motion rated higher. This was contrary to our expectations that the
approaching motion would be rated as more difficult to ignore regardless of the robot’s
speed. However, this is promising for robot designers as robot path is often fixed due to
its functional goals, but speed can be more easily altered.
4.4.4 Intentionality
Intentionality is a property often desired for robot motion and has been a common goal
of past work. From these works, we expected that both motion parameters would have
a significant mean effect on intentionality but that the light signal parameters would not.
We found in analysis (1) that motion path (F(1,2159)= 13.40, p< 0.001), motion speed
(F(1,2159)= 8.08, p= 0.005), and the light signal pattern (F(2,2159)= 4.72, p= 0.009) had
significant effects.
Although we did not expect light pattern to have a significant effect, the results
showed that the blink pattern was rated significantly more intentional than the beacon
pattern. One explanation is that humans associate the increased salience and effort of
the blinking with greater intentionality. However, this is contradicted by participants rat-
ing the wipe pattern more highly than the beacon. Unfortunately, there are few works
that have explored this concept so more research is needed to better understand these
findings.
Most prior works that utilize robot motion to convey intentionality alter the robot’s
trajectory but keep the start and goal points constant. However, we anticipated that
participants would perceive the approaching motion as more intentional since the robot
ends its motion in front of the camera at the distance at which a person interacting with the viewer
would stand; this was confirmed in (1).
We also found that participants rated the slower moving robot as significantly more
intentional than the faster robot. This effect is theorized to be due to the perception that
the slower moving robot has more time to think and take action and is, therefore, more
deliberate (Caruso et al., 2016). Surprisingly, this effect was not seen without a dynamic
light signal (2).
4.4.5 Safety
Due to the strong usage of the color red to indicate danger, we anticipated that color
would affect perceptions of safety; this was confirmed by a significant effect (F(1,2159)= 11.26, p< 0.001) on safety ratings. The post
hoc analysis showed that green and blue were rated significantly safer than white, orange,
and red. While the existence of two significant groups is unsurprising, we expected
ratings for the white color would be closer to those of green and blue. Instead, we found
that participants rated the white color only slightly safer than orange.
Safety was the only dependent measure which showed a significant effect for color
in (4). This suggests that the use of colors for any signal should be highly controlled
when considering perceived safety. While the light signal pattern and frequency were
not found to have a significant effect, an interaction effect was found between the color,
pattern, and frequency of the light signal (F(8,2159)= 2.43, p= 0.013).
Past works found that higher motion speeds result in greater feelings of anxiety and
discomfort (Bartneck et al., 2009, Kulić and Croft, 2007). Our findings confirmed these
results as participants rated the faster moving robot significantly less safe (F(1,2159)=
23.74, p< 0.001) in (1). Due to the potential for collision in the approaching motion, we
also expected a significant effect for motion path. Although the receding motion was
rated on average safer, a significant effect was not found.
4.4.6 Naturalness
We did not find a significant effect for any of the light signal characteristics in (1) on
naturalness. To better understand human perceptions, we analyzed the mean ratings. We
found that white was rated as the most natural color, followed by green, blue, orange, and
red. While the dichotomy between red/orange and green/blue was seen in the previous
measures, we were surprised to find white was rated the most natural since, for other
measures, it was often rated in the middle.
We were also surprised that the blink pattern was rated as more natural than the wipe
and beacon. We expected that greater apparent motion would cause participants to rate
the wipe as the most natural and the blink as the least. The slower pattern frequency was
rated as more natural, which is the opposite of the results for motion speed. This could be due
to the increased salience of the machine-like light signal. Since few works have explored
naturalness with light signaling, more work is needed to understand these results.
Naturalness is a common concern with generated robot motion so many works have
attempted to create motion which is more natural and human-like (Dragan et al., 2013,
Matsui et al., 2005, Zhou et al., 2017). In line with the results shown in past works, (1)
found that the robot's motion speed had a marginally significant effect on perceptions
of naturalness (F(1,2159)= 3.85, p= 0.05). We anticipated that, due to the robot's slowness
compared to humans, the faster speed would be perceived as more natural. Although
motion path has been shown to affect human perception of the robot’s naturalness and
predictability, we did not expect nor see a significant effect for motion path, as the robot
utilized straight-line trajectories (Dragan et al., 2013).
4.4.7 Confidence and Competence
Since confidence and competence were found to be correlated (r = 0.731), a combined
analysis is presented. Results showed significant effects for all three light signal param-
eters for both measures. Since past works exploring the effects of color on psychological
properties show mixed results, we were surprised to find significant effects for confidence
(F(4,2159)= 2.94, p= 0.019) and competence (F(4,2159)= 4.9, p= 0.001). The post hoc
analyses added further confusion, as orange was rated the least confident and com-
petent (significantly less so than blue). One possible explanation is that orange is often
associated with warnings, which may have greater uncertainty. This is seen in traffic
signals as many people have trouble deciding whether to go during an orange/yellow
signal.
In past works, higher motion speed was shown to increase perceptions of confidence
and competence of the robot (Zhou et al., 2017). These findings were consistent with our
results, as the faster robot motion was rated as significantly more confident (F(1,2159)=
7.31, p= 0.007) and competent (F(1,2159)= 13.57, p< 0.001) than the slower motion in (1).
A significant effect was also found for the motion direction, with the approaching
motion rated as significantly more confident (F(1,2159)= 40.99, p< 0.001) and compe-
tent (F(1,2159)= 12.6, p< 0.001). The approaching motion may suggest that the robot
is coming to interact with the viewer, which may be perceived as more confident than
receding or “running away.”
4.5 Light Signal Design Decision Tree
As a first step towards a more systematic approach for utilizing these results, we propose
a decision tree for signal generation based on common parameters found in notification
systems literature (McCrickard et al., 2003b, Pousman and Stasko, 2006). We present this
decision tree below (Figure 4.6).
Signal Type      Urgency       Light Signal
Notification     Not Urgent    Steady
Notification     Urgent        Wipe, Slow
Warning          Not Urgent    Beacon, Slow
Warning          Urgent        Blink, Slow
Error            Not Urgent    Beacon, Fast
Error            Urgent        Blink, Fast
Figure 4.6: A decision tree for light signal generation using the signal parameters investigated in the exploratory user study presented in Section 4.3.
Since the focus of this chapter is light signal design and motion acts as an implicit
cue, only light signal parameters are included in the decision tree. We employed the
ID3 decision tree algorithm for each of the three notification attributes: urgency, error,
and difficulty to ignore (i.e., salience). The robot attributes were not employed in the
decision tree as their results were mixed and not as easily related with the signal design
parameters used to split the tree.
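The core of ID3 is selecting, at each node, the parameter with the highest information gain on the discretized attribute ratings. The sketch below illustrates that splitting criterion on a hypothetical list of labeled examples; it is a minimal illustration of the criterion, not the exact implementation used here.

```python
# Sketch of the ID3 splitting criterion: pick the signal parameter with the
# highest information gain for a discretized attribute (e.g., urgency level).
# `examples` is a hypothetical list of dicts such as
# {"color": "red", "pattern": "blink", "frequency": "fast", "label": "urgent"}.
from collections import Counter
from math import log2

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(examples, parameter, label_key="label"):
    base = entropy([e[label_key] for e in examples])
    remainder = 0.0
    for value in {e[parameter] for e in examples}:
        subset = [e[label_key] for e in examples if e[parameter] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return base - remainder

def best_split(examples, parameters=("color", "pattern", "frequency")):
    # The parameter chosen here becomes the split at the current tree node.
    return max(parameters, key=lambda p: information_gain(examples, p))
```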
4.5.1 Signal Design Parameters
The decision tree includes two signal design parameters: information type and urgency.
Our results from the previous study in light signal design showed that these signals are
better suited for conveying broader types of information rather than precise meanings.
Therefore, our aim in the creation of this decision tree was to create a general set of signals
well suited for many HRI scenarios.
The signal type is defined as a notification, warning, or error. Notification is defined
as information from the robot that may be beneficial for the human. Warnings and errors
indicate off-nominal states with errors preventing successful operation of the robot. Ur-
gency only had two levels, with the assumption that signals that are not urgent require
less attention from a human observer.
4.5.2 Notification Signal Design
In the statistical results of the exploratory study, we tended to observe two distinct color
groupings: 1) red, orange and 2) green, blue, and white. Green is often used to indicate
that a system is functioning correctly, while red and orange are used to indicate potential
issues (i.e., faults). Since notifications are a neutral signal type for relaying information,
orange or red were ruled out for these signals.
Our results showed that the green color was actually rated as more salient than blue or
white. Comments from participants also indicated that green might inadvertently express
a positive state (e.g., operating correctly). Therefore, we chose to employ blue which was
rated lower in salience and higher in error.
Non-urgent notification signals are meant for information that is not critical to the
human or constrained by time. Therefore, it requires a low level of awareness or attention
from the human interactor and can in many cases be ignored, similar to the lowest notifi-
cation levels employed by notification systems research for HCI (McCrickard et al., 2003b,
Pousman and Stasko, 2006). The steady light signal was rated the lowest in salience,
urgency, and error, making it the best suited for this signal type and urgency.
Since the blink and beacon patterns were rated higher for salience and error than the
wipe pattern, the wipe pattern was chosen to represent urgent notifications. Although
the pattern frequency was shown to provide the highest information gain for urgency
when employing the ID3 algorithm, we chose the slower frequency wipe pattern because
notifications are low priority information, and the faster frequency would draw too much
attention to the signal.
4.5.3 Warning Signal Design
Orange and yellow colors are often utilized for warnings (e.g., low battery for smart-
phones, road signage) to signal a possible or impending error or unwanted condition
(e.g., smartphone turns off). Due to its common usage and strong connotations, the or-
ange color is utilized for all warning signals. We assume, based on prior work in the literature,
that yellow can be used in place of orange to create greater distinction between the three
colors in the decision tree.
We assume in this work that all warnings require some level of human attention,
ranging from watching the robot to see if the warning resolves on its own to taking
actions to prevent a further error. This suggests that higher saliency signals should be
used for warnings. The beacon and blink signals were rated the highest in saliency and
error, making them well suited for warnings.
Since the pattern frequency had the largest effect on the perceptions of urgency, we
chose to employ only the slower frequency for warning signals which are by definition
less critical than error signals. We also distinguish warnings from errors by color, as the
ID3 algorithm found color to provide the highest information gain for ratings of error
and salience.
4.5.4 Error Signal Design
As red signals were rated significantly higher for urgency, error, and salience, red was
employed for both types of error signals. Similar to the warning signal design, beacon
and blink were chosen for their higher ratings of error and saliency. We assumed that
all errors require higher levels of human awareness and consequently employed only the
faster pattern frequency for error signals.
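Taken together, the proposed signals can be expressed as a simple lookup over signal type and urgency. The sketch below encodes the mapping described in this section and in Figure 4.6; it is an illustrative encoding rather than production signal-generation code.

```python
# Lookup implementing the proposed decision tree: (signal type, urgency) ->
# light signal parameters (color, pattern, pattern frequency), per Section 4.5.
SIGNAL_TREE = {
    ("notification", False): ("blue",   "steady", None),
    ("notification", True):  ("blue",   "wipe",   "slow"),
    ("warning",      False): ("orange", "beacon", "slow"),
    ("warning",      True):  ("orange", "blink",  "slow"),
    ("error",        False): ("red",    "beacon", "fast"),
    ("error",        True):  ("red",    "blink",  "fast"),
}

def light_signal(signal_type: str, urgent: bool):
    """Return (color, pattern, frequency) for the requested signal."""
    return SIGNAL_TREE[(signal_type, urgent)]

# Example: an urgent error maps to a fast red blink.
assert light_signal("error", True) == ("red", "blink", "fast")
```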
4.6 Discussion
Past work has shown that humans have existing perceptions of both light signal design
parameters as well as robot motion characteristics. As these perceptions are not well
understood, especially in the context of HRI, the goal of this chapter was to explore the
design space of robot light signals through an exploratory user study.
In this study, situated within the context of human-drone interaction, we explored
several light signal parameters (color, pattern, pattern frequency). We combined light
signals displayed on the LED array of a drone with basic robot motions (approaching,
retreating). We also varied motion speed to compare the results of this research against
existing work in HRI.
Using these parameterizations, 175 videos of a drone employing each light signal and
motion condition were created. Participants, recruited via Amazon Mechanical Turk, were shown a subset of the total videos and asked to rate
both notification attributes (urgency, difficulty to ignore, error) and robot attributes (in-
tentionality, safety, naturalness, confidence, competence). These attributes were identified
from prior work in notification systems and HRI.
We expected to see significant effects (i.e., biases) for both types of attributes. For
notification attributes, we found that all light signal parameter variations significantly
affected urgency and difficulty to ignore. Surprisingly, error was only affected by the color
of the signal and speed of the robot's motion. One reason for this may be the
“smoothness” of the patterns and frequencies utilized; we suspect that irregularity may
be needed to indicate error, or something out of the ordinary. Since red is commonly used
to denote error, color is a clear indicator for errors and warnings.
For the robot attributes from HRI, the results were more mixed. In the multivariate
analysis, a high correlation was found between two of the dependent measures, compe-
tence and confidence. These two attributes were also found to be significantly affected
by all of the varied parameters. However, the remaining attributes were only affected by
a subset of the parameters. Only robot speed was found to be significant across all at-
tributes, confirming past results in HRI. Naturalness was not affected by any of the other
parameters; we believe this is due to the inherent unnaturalness of a drone and LEDs.
In our post hoc analyses, we also found that only certain levels of each manipulated
variable were found to be significantly different. This was most often seen with color,
which was frequently treated as a binary variable (green, blue, white or red, orange). We
were also surprised to find that increasing the robot’s motion speed and the frequency of
the light signal pattern resulted in opposite effects on perceptions of robot properties.
These findings indicate that there are existing biases for humans towards certain light
signals and that these biases must be taken into account when designing signaling behav-
iors. As a first step towards creating a structured approach for signal design with these
perceptions in mind, we proposed a decision tree for generating light signal behaviors.
These results also provide greater understanding of what existing signals convey and
add to the young field of human-drone interaction. Despite these promising results, this
work is only a first step towards generating expressive robot light signal behaviors. As
there are many more parameters for affecting light signal design, more investigation is
needed to thoroughly explore the design space. The perceptual effects of viewing light on video also require
that these results be confirmed through in-person user studies, but these findings help to
narrow down the space of signals for evaluation.
4.7 Summary
In this chapter, we explored the design of communicative light signals for human-robot
interaction. An overarching goal of this thesis is to enable robots to effectively commu-
nicate internal state information using expressive nonverbal signals. We chose to employ
light as a modality for its versatility, robustness, and common usage across products. The
work presented in this chapter is a first approach towards understanding how to leverage
existing perceptions of light signals to design more intuitive and iconic signals. Under-
standing the effects of these variations is a necessary step towards modulating signals to
create a larger vocabulary without the need for a human designer.
We also presented a first step towards creating a systematic approach for generating
new light signals that can convey different state-related properties. The findings from this
chapter directly inform our communication framework, as its MDP model requires a set
of carefully designed nonverbal signals to make up its action space. In the next chapter,
we will apply the results of this work to several scenarios of HRI in a video-based
experiment.
5. Application of Robot Light Signals
This chapter presents an application of the results of the previous study in designing light
signaling behaviors for non-humanoid robots (Chapter 4). The goal of this application is
to validate our findings with different types of appearance-constrained robots and their
corresponding applications in a video-based experiment. This work also demonstrates the
generalizability of the proposed decision tree from Chapter 4 to different HRI scenarios.
5.1 Introduction
In this chapter, we aim to validate the findings of our previous study exploring the de-
sign and perception of robot light signals (Chapter 4). We use the design tree proposed
in Section 4.5 to generate a set of robot light signals for use in a variety of HRI contexts.
An underlying goal of this thesis is to facilitate generalizable nonverbal signal design and
promote standardization of robot signaling behaviors. These applications serve to show
that our previous findings (Section 4.4) can be employed by a variety of appearance-
constrained nonhumanoid robots, an important step for standardization.
The previous study design (Section 4.3) employed egocentric videos of a drone moving
about an empty room. This approach strictly constrained the environment and robot’s
actions in order to discover underlying human perceptions without other confounds.
Robots in the real world, however, are designed for specific tasks in human-oriented
environments. Therefore, in this chapter, we remove these constraints and evaluate the
signals using different robot and interaction contexts.
5.2 Experimental Design [E2]
We present a video-based user study validating the signaling behaviors proposed in Sec-
tion 4.5 across four sets of HRI contexts. These contexts involve different appearance-
constrained, non-humanoid robots performing their respective tasks (e.g., autonomous
car driving) and interacting with humans. For each robot, we generated videos of three
different scenarios in which they employ light signals for communication and had partic-
ipants rate the robot and its signals across several measures.
5.2.1 HRI Contexts
We employ four different types of non-humanoid robots: a free-flying robot, an industrial
mobile base, a telepresence robot, and an autonomous vehicle. We chose these robot plat-
forms as they are deployed in the real world in some capacity, and their popularity and
usage have grown in recent years.
Free-Flying Robot- The free-flying robot chosen for this study is the NASA Astrobee.
The Astrobee is designed to perform tasks on the International Space Station (ISS),
such as taking sensor measurements and acting as a remote camera for ground crew.
We chose to employ the Astrobee platform for two reasons: 1) it is already equipped
with signal lights designed to communicate information about the robot's state
and 2) unlike other free-flying robots, it is designed to operate in close proximity
and even collaborate with humans. A model of the Astrobee robot was provided by
NASA.
Figure 5.1: The simulated Astrobee free-flying robot in the ISS.
Industrial Mobile Base- The industrial mobile base used in this study is the Fetch
robot base. This robot is similar to other industrial bases, such as the Amazon
robotics (formerly Kiva) base and the Omron Adept. These robots are significantly
shorter than adult humans (approximately 16 inches tall) and are typically employed to transport
loads around industrial environments, such as factories and warehouses, as shown
below.
Figure 5.2: The Fetch mobile robot base is designed to transport loads across warehouse and industrial
environments.
Telepresence Robot- The telepresence robot used in this study is the Ohmni robot
by OhmniLabs. This robot is 48 inches tall and designed to provide face-to-face com-
munication capabilities in a remote environment for a human operator. While telep-
resence robots have been employed in a variety of environments, such as offices and
hospitals, the Ohmni is primarily designed for use in more personal environments,
including homes and classrooms.
Figure 5.3: The Ohmni telepresence robot is designed to enable mobile remote presence in a variety of
human-oriented environments, such as the home.
Autonomous Car- Autonomous cars are capable of sensing the environment and
navigating without human input. They typically employ a number of sensors, such
as a LIDAR, to map the environment. The autonomous car used in this study is
a Prius model taken from open source ROS repositories, modified with simulated
sensors.
Figure 5.4: An autonomous car model employed in the study.
We added lights to the latter three robot platforms. Lights are placed on both the left
and right sides (wrapping to the front) of the Astrobee, Fetch robot base, and autonomous
car to increase visibility. Since the Ohmni has a narrow body, only one set of lights that
are visible from both the front and sides of the robot were used. We employed the same
number of lights (approximately 22 lights) on all robots' sides.
5.2.2 Signaling Scenarios
An overview of the signaling scenarios employed for the four robot contexts is shown in
the table below.
5.2.3 Participants
We generated 12 videos for the scenarios described above, using the Unity 3D system. We
used a partially-within subjects design, in which each participant saw all three videos for
only one robot platform. The order of the videos was randomized and counterbalanced.
We recruited participants using the Amazon Mechanical Turk platform. Participants
were located in the United States, at least 18 years of age, and had an approval rating of
at least 95%. A total of 80 participants were recruited for the study.
Signaling scenarios for each robot type, with the signal type and urgency of each light signal:
Free-Flying Robot
Scenario 1 (Notification, Not Urgent): The robot is stationary in the ISS, operating correctly, and waiting for a new task.
Scenario 2 (Warning, Not Urgent): The robot attempts to dock to a handrail in the ISS with its perching arm and experiences an error.
Scenario 3 (Error, Urgent): The robot has lost control of its motion as it navigates around the ISS.
Industrial Mobile Base
Scenario 1 (Notification, Urgent): The robot encounters a person and waits for them to cross the robot's path before continuing on.
Scenario 2 (Warning, Urgent): The robot tries to maneuver through a narrow space of boxes and becomes stuck.
Scenario 3 (Error, Urgent): The robot is carrying a large load and begins overheating.
Telepresence Robot
Scenario 1 (Notification, Urgent): A remote operator attempts to log into the robot.
Scenario 2 (Warning, Urgent): The robot has low battery and cannot make it back to its dock before turning off.
Scenario 3 (Error, Not Urgent): The robot loses connection to the remote operator.
Autonomous Car
Scenario 1 (Notification, Not Urgent): A car with a safety driver is operating autonomously.
Scenario 2 (Warning, Not Urgent): The car slows down for a pedestrian crossing the road.
Scenario 3 (Error, Not Urgent): The car's sensor is hit by a piece of debris. The car pulls over to enable the passenger to check the sensor.
5.2.4 Measures
We had four primary measures for each of the videos:
1. Description- Participants were asked to describe the light signal and what they
believe it indicates. This was used as a manipulation check for the videos and
to check whether participants could infer the precise meanings behind the signals.
2. Signal Type- Participants chose which of the signal types, “notification,” “warning,”
or “error,” they believe best represented the robot’s light signal.
3. Urgency- Participants chose whether they believed the robot’s light signal to be
urgent or not.
4. Signal Usage- Participants rated on a 5-point Likert scale whether they found the
light signal to be helpful in understanding the robot’s actions.
In order to make these questions effective measures of the light signal design deci-
sion tree, we provided only minimal information about the robot platforms and their
environments.
5.3 Analysis
We present the results of this video study, primarily focusing on counts of the fixed choice
measures describing the light signals. We number the scenarios (1-3) for each of the robot
platforms in the order shown in the table above.
5.3.1 Signal Type
Notification: Scenario 1 for each of the robot platforms employed the signals for notifi-
cation. We found that these scenarios had the least ambiguity (i.e., the highest counts)
regarding the signal’s intended meaning, as shown in Figure 5.5. The telepresence sce-
nario, an operator logging into the robot, was chosen correctly every time, whereas the
autonomous car scenario, the operating mode, had the lowest count.
To better understand these results, we also looked at participants’ descriptions of the
light signals’ intended meanings. Although most participants gathered that the signals
denoted a notification of some type, participants had trouble identifying the notification
meaning for several of the videos.
Surprisingly, participants were most confused about the telepresence robot’s notifica-
tion. Since most people have not used or been around a telepresence robot, they were un-
sure of its operation and functionality. All participants were able to infer the meaning of
the industrial robot base’s notification signal, confirming its accuracy in the fixed-choice
measure.
Figure 5.5: Results of the signal type forced-choice measures (counts of notification, warning, and error choices for Scenarios 1-3 on each of the four robot platforms).
Warning: Scenario 2 for each of the robot platforms employed the signals for warn-
ing. For all four robot platform scenarios, a majority of participants correctly chose the
warning descriptor. The results of this measure were more mixed, however. Two of the
scenarios, for free-flyer and autonomous car, were split between notification and warning.
These platforms employed the non-urgent signal variation. The remaining two scenarios,
industrial base and telepresence robot, were split amongst all three descriptors.
Participants were most often able to infer the correct meaning behind the warning sig-
nal for the industrial base, despite its mixed results. This suggests that participants differ
in their opinion of whether the robot being stuck requires a warning or error. This effect
was seen for both scenarios that employed the urgent warning. This is unsurprising, since
warnings and errors are often described similarly and, in some systems, used interchangeably.
Error: Scenario 3 for each of the robot platforms employed the signals for error. For all
four robot platform scenarios, a majority of participants correctly chose the error descrip-
tor. Similar to warning, several participants had trouble differentiating between warning
and error signals. In their descriptions, participants had the most trouble identifying the
reason for the telepresence robot’s warning signal; however, this confusion seems to arise
from lack of familiarity with these platforms.
5.3.2 Urgency
The urgency fixed-choice measure was chosen correctly for all scenarios, except for urgent
notification signals. Almost all participants chose non-urgent for all four notification
measures. This indicates that our urgency manipulation for the notification signal type
failed. It may also be that participants never classify notifications as urgent due
to their unimportant nature.
Figure 5.6: Results of the signal urgency forced-choice measures (counts of not urgent and urgent choices for Scenarios 1-3 on each of the four robot platforms).
5.3.3 Signal Usage
Our final measure asked participants to rate on a 5-point Likert scale how helpful they
found the robot’s light signal in understanding the scenario they observed. For each of
Figure 5.7: Results of the 5-point Likert scale asking participants to rate the helpfulness of the robot's light signal for each signal type and robot platform.
the four robot types, participants rated the signal most helpful in the error scenario and
least helpful during the notification. Since notifications typically do not have any off-
nominal conditions, it may be more challenging to understand what is happening when
the notification signal occurred.
We also found that participants had more difficulty understanding the scenarios with
the Astrobee free-flying robot and the telepresence robot. This may be due to lack of
familiarity with these robot types, as evidenced by their comments. The Astrobee robot
also behaves differently than other free-flying robots due to the microgravity and unique
environment.
5.4 Discussion
This chapter presents an application of the findings from our exploratory study in light
signal design (Chapter 4). The initial user study (S1) employed videos of a free-flying
robot using a light signal while in motion. These controlled scenarios enabled us to dis-
cover humans’ underlying perceptions of common light signal parameters. We generated
a decision tree for light signal generation from the findings of the study (Section 4.5).
In this chapter, we used this decision tree to generate light signals for use in four
sets of human-robot interaction scenarios. Each set of scenarios had three videos cor-
responding to the signal type (notification, warning, error) and employed a common
non-humanoid robot type currently deployed in the world (free-flying robot, telepresence
robot, autonomous car, industrial base). The goal of this experiment was to validate the
proposed decision tree in realistic HRI scenarios.
The results of the experiment showed that while participants were most often able to
correctly identify a notification signal (82.5%), the signals did not help convey the robot’s
actual state information. This finding suggests that the wide breadth of information that
falls under the notification category makes it challenging to decode the robot’s light signal
without other cues. We also found that participants tended to have a hard time separating
scenarios that should employ warnings versus errors.
We also found that the manipulation for urgency was not successful for notification.
Comments from participants suggest that notifications by nature are viewed as not urgent.
The manipulation for the other two types of signals, however, succeeded. Overall, these
findings suggest that while the proposed decision tree is useful for generating warning
and error signals, more thought is needed to design effective notification signals.
5.5 Summary
In this chapter, we presented an application of robot light signals for communication
using the results of our previous study in light signal design (Chapter 4). We validated
the proposed signals from Section 4.5 in a video-based study with a free-flying robot,
telepresence robot, industrial mobile base, and autonomous car. Then, we created a sim-
ulation that enabled users to directly interact with the free-flying robot. Our results show
promise in developing a set of standardized light signals which can be employed by a
variety of non-humanoid robots. In the next chapter, we present research on designing
auditory signals for HRI.
6. Design of Robot Auditory Signals
The focus of this chapter is the design of communicative auditory signals for robots to
facilitate human-robot interaction (HRI). Towards this goal, we conducted a user study
investigating the design of auditory signals to enable humans to localize a robot in the
nearby environment.
6.1 Introduction
Auditory cues provide a wide range of contextual information that promote awareness of
a human’s surroundings (Dingler et al., 2008, Papadopoulos et al., 2012). Humans natu-
rally and often subconsciously produce auditory signals, such as breathing sounds, that
reveal information about their current physical state (Papadopoulos et al., 2012, Turchet
et al., 2016).
Auditory signals are highly salient, making them effective for grabbing human atten-
tion. This property has led to their widespread usage in alarms and alerts. Modulation of
auditory parameters, such as pitch and intensity, can alter a signal’s salience and enables
communication of a wide range of information. Compared to light signal design, there
is significantly more prior work in auditory signal generation due to the prominence of
speech, music, and sound effects in everyday life.
In this chapter, we explore the design of auditory signals for HRI through a user
study. This study also investigated the use of auditory icons, or sounds that naturally
reveal information about the robot’s state. Humans generate a variety of subconscious
and intentional auditory cues that indicate information about their activities and hence
their internal state. Since many robots do not naturally produce these state-expressive
and perceivable sounds, it can be difficult for a human to use auditory cues to localize a
robot in their surroundings.
As human knowledge of a robot’s location is key to both safety and collaboration,
the goal of this study was to explore the design of robot sounds that facilitate auditory
robot localization by co-located humans. A straightforward solution is for the robot to
use a loud, distinctive sound to indicate its presence. However, more noticeable auditory
signals are also more likely to annoy or distract humans (Cha and Matarić, 2016, Sneddon
et al., 2003). Instead, the design and usage of the robot’s auditory cues should take into
account the preferences of the human interactor, a major tenet of our communication
framework.
Towards this goal of enabling auditory localization in a user-acceptable manner, we
conducted a user study with 24 participants investigating the use of auditory icons that
mimic the robot’s actual operations. In addition, we also explored techniques for modu-
lating sound in order to minimize annoyance without significantly affecting localizability
or salience. This resulted in a comparison of two variations of an auditory signal, broad-
band and tonal, on a human’s perceptions of a robot during a human-robot collaboration
task. Tonal sounds contain a distinctive frequency and are hence more noticeable and
annoying than broadband sounds, which contain a larger range of frequencies.
The previous user study on light signal design (S1), presented in Chapter 4, employed
a free-flying robot or drone. To increase the generalizability of our research, this study
employed the Turtlebot 2, a mobile robot. This robot is a non-holonomic mobile base
that is short in height; the robot’s functionality and embodiment are similar to industrial
mobile bases (e.g., Amazon warehouse robot) and robot vacuums for the home (e.g.,
iRobot Roomba).
The next section provides background on relevant past works in auditory localization
and signal design. Then, the details of the user study and its results are presented. Finally,
we discuss our findings and their implications for HRI.
6.2 Related Work
A key challenge for robot nonverbal communication is creating signals that are intuitive
and easy to interpret (Cha and Matarić, 2016). Although training or prolonged exposure
can help humans to understand the nonverbal signals commonly employed by electronic
devices and products (Edworthy and Stanton, 1995, Garzonis et al., 2008), robots often
require a wider variety of signals than can easily be learned. Signals that are analogous
to cues humans already understand may help to overcome this challenge.
This approach is especially important when using communication modalities that are
less precise, such as sound. While vision is traditionally preferred in applications that
require a high level of perceptual discrimination (Brewster, 2002, Deatherage, 1972), we
envision robots to function in many scenarios and environments where the visual field
is limited, blocked, or noisy (Dingler et al., 2008, Keller and Stevens, 2004). In these
situations, auditory cues can augment a human’s knowledge of a co-located robot’s state,
including its presence and location in the environment (Dingler et al., 2008).
Prior work has demonstrated that artificial sounds (e.g., beeps and chirps) like those
used by the fictional robots R2D2 and WALL-E can convey many social properties, such
as affect and politeness (Read and Belpaeme, 2012). As these sounds more closely mimic
linguistic functions, however, they are less suited for conveying concrete information
about the robot’s state (Read and Belpaeme, 2014a). Instead, we propose the use of robot
sounds that act as auditory icons designed to mimic the noise normally produced by a
robot’s operations. These sounds can be amplified or manipulated to increase salience,
thereby augmenting the robot’s natural cues. Since these sounds more closely match
humans’ expectations, they are also easier to interpret than melodies or synthetic alarms.
Knowledge of the robot’s location enables coordination and prevents collisions (Alami
et al., 2006, Drury et al., 2003). This is crucial as many robots are limited in their ability to
quickly detect and avoid dynamic obstacles, putting the burden of safety on the human.
Humans and animals often utilize sounds to estimate the position of others in a process
called auditory localization, a concept that is largely unexplored for HRI in the context of
the human localizing the robot. Past approaches have instead relied on visual cues, such
as motion or light, to increase awareness of a robot (Baraka et al., 2015, Cha and Matarić,
2016).
Auditory localization typically uses two types of binaural cues: interaural time differ-
ence (ITD), or the time difference between the arrival of the same sound at each ear,
and interaural intensity difference (IID), or the difference in intensity (i.e., loudness) of the
same sound between ears (Farnell, 2010). In addition, the head-related transfer function (HRTF), or the difference between the sound at its source and the sound at the human's ears, is also used. One
example of an auditory cue utilized for localization is the sound of footsteps, which be-
come louder as a person comes closer. Although auditory localization is quite complex,
the problem can be simplified by only looking at relative localization of a sound source
in two dimensions.
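To make the binaural cues above concrete, the following sketch estimates the azimuth of a sound source from an interaural time difference using the standard far-field approximation ITD = (d/c) sin(theta). It is an illustrative example only; the head width and speed of sound are assumed values, and no such computation was performed in this thesis.

import numpy as np

def azimuth_from_itd(itd_sec, ear_distance_m=0.18, speed_of_sound=343.0):
    # Far-field approximation: ITD = (d / c) * sin(theta).
    # Clip to keep the arcsine defined under measurement noise.
    s = np.clip(itd_sec * speed_of_sound / ear_distance_m, -1.0, 1.0)
    return np.degrees(np.arcsin(s))

# A sound arriving 0.3 ms earlier at one ear lies roughly 35 degrees
# off the listener's midline toward that side.
print(azimuth_from_itd(0.0003))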
This study also explored the effects of two variations of sound. The first, tonal, is
often used in alarms and alerts for machinery because the sounds are difficult to ignore.
This also makes tonal sounds well suited for less frequent emergency scenarios where
noticeability is key (Patterson and Mayfield, 1990, Popoff-Asotoff et al., 2011). Broadband
sounds contain a larger range of frequencies, making the signal less distinctive and more
similar to ambient noise (Farnell, 2010, Perrott and Buell, 1982). However, research has
shown that broadband tones are easier to localize than pure tones (Middlebrooks and
Green, 1991). Both sound types were explored because tonal sounds are more distinctive, while broadband sounds are easier to listen to for long periods of time.
6.3 Sound Design
As a first step towards understanding how robots can effectively employ auditory sig-
nals to facilitate localization by humans, we generated and compared two variations of
sound: tonal sounds, characterized by a distinct tone, and broadband sounds, which contain many different frequencies. Both variations of auditory signal were designed to convey
that the robot is in motion.
6.3.1 Auditory Icon
Auditory icons are sounds that are designed to act as analogies to typical sounds or noises
found in everyday life (Gaver, 1986). Instead of sounds with learned meanings, these
sounds are more iconic, matching people’s expectations and experiences. For instance, to
indicate that a robot is powered on, it can mimic sounds that other machinery or people
emit (e.g., computer humming and breathing).
Since robots take many different forms, they produce a wide range of noises with
varying levels of noticeability. The use of these iconic signals can help create signaling
standards across platforms and reduce the number of cues humans must learn. As the
proposed auditory icons are analogous to the robot’s actual state, these signals are more
Figure 6.1: The tonal (left) and broadband (right) signals generated from the Turtlebot's motor sounds and used in the study.
intuitive and easier to learn, compared to melodies or other artificial sounds (e.g., beeps
and whistles). Thus, the use of auditory icons reduces the need for prior exposure or
training.
6.3.2 Tonal Robot Sound
To create a tonal robot sound, we first recorded the sound produced by the motion of the
Turtlebot, the mobile robot platform utilized in this study. The goal for the tonal signal
was to create a machine-like sound that clearly resembled the periodic noise of the robot’s
motors while in motion. Background noise and other unwanted noise components were
then removed from the raw sound signal. This made the sound more distinctive and
clearer than the original recording. To increase the saliency of the signal, the pitch was
altered to be approximately 10% higher; humans tend to be more sensitive to higher-
frequency noises within a certain range (Fletcher-Munson curves; Fletcher and Munson, 1933). The end result was a regular servo-like sound at a frequency of 700 Hz, as shown
in Figure 6.1 (left).
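As an illustration of this processing step, the sketch below applies an approximately 10% pitch increase (about 1.65 semitones, since 12 log2(1.1) is roughly 1.65) to a recorded motor sound using librosa. The file names are placeholders, and the noise removal applied to the original recording is not reproduced here.

import numpy as np
import librosa
import soundfile as sf

# Load the raw motor recording at its native sampling rate (placeholder path).
y, sr = librosa.load("turtlebot_motor_raw.wav", sr=None)

# A 10% frequency increase corresponds to 12 * log2(1.1) ~= 1.65 semitones.
n_steps = 12 * np.log2(1.1)
y_shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps)

sf.write("tonal_robot_sound.wav", y_shifted, sr)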
6.3.3 Broadband Robot Sound
Although broadband sounds do not have a distinctive pitch, we wanted the sound to
remain reminiscent of regular motor sounds. The broadband signal consisted of the tonal
robot sound with added Brownian noise, generated by integrating white noise with equal
intensity throughout the frequencies found within the human hearing range (20 Hz-20 kHz). Compared to white noise, Brownian noise is considered less harsh as it has more energy at lower frequencies, giving it a softer quality. We chose Brownian noise for its resemblance to the deeper, rumbling sounds of many types of machinery. The resulting sound,
as shown in Figure 6.1 (right), still contained a distinctive "wheel turning" component but
had no clear tone or frequency.
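A minimal sketch of one way to approximate this construction: Brownian noise is generated by cumulatively summing (integrating) white noise, normalized, and mixed with the tonal robot sound. The file names and the mixing ratio are illustrative assumptions rather than the exact values used in the study, and a mono recording is assumed.

import numpy as np
import soundfile as sf

def brownian_noise(n_samples, seed=0):
    # Integrate white noise to obtain Brownian (red) noise, then normalize.
    rng = np.random.default_rng(seed)
    brown = np.cumsum(rng.standard_normal(n_samples))
    brown -= brown.mean()
    return brown / np.max(np.abs(brown))

# Mix the tonal robot sound with Brownian noise (assumes a mono recording).
tonal, sr = sf.read("tonal_robot_sound.wav")
broadband = 0.6 * tonal + 0.4 * brownian_noise(len(tonal))
broadband /= np.max(np.abs(broadband))

sf.write("broadband_robot_sound.wav", broadband, sr)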
6.4 Experimental Design [S2]
To explore the effects of auditory icons on human-robot collaboration, we conducted a
within-subjects user study in which participants collaborated with the Turtlebot on a
physical human-robot collaborative task (S2). In addition, this study also investigated the
effects of two sound variations, broadband and tonal, on human satisfaction, an important
component of the communication framework presented in Chapter 3.
6.4.1 Hypotheses
Since the sounds produced by many robots are difficult to perceive against ambient noise,
we anticipated that the addition of either type of auditory signal would improve participants' ability to localize the robot. To evaluate this, we introduced a baseline sound condition in which no additional auditory signal was present. We also expected that one sound type would be preferred overall.
H1: Objective Collaboration Metrics. Auditory condition (baseline, tonal, broadband) will
significantly affect participants’ accuracy and time when inferring the robot’s location, with broad-
band sound being the best.
H2: Perception of the Sounds. The tonal and broadband signals will negatively affect par-
ticipants’ perceptions of annoyance but will positively affect noticeability and localizability of the
robot. Participants will perceive the broadband sound to be the easiest to localize and the tonal
sound to be the most noticeable and annoying.
H3: Sound Type Preference. Participants will prefer the broadband robot sound to the tonal
robot sound for use in a human-robot collaborative task.
Figure 6.2: An overhead view of the experimental setup, showing the robot's start position, the participant work area, the visual barrier, the drop-off locations and paths, the drop-off trajectories, and the other warehouse area.
6.4.2 Collaborative Task
Designing a collaborative task to evaluate these hypotheses presented several challenges.
To isolate the effects of auditory signals on localization, visual cues needed to be blocked
or minimized. Since participant performance needed to be compared across auditory
conditions, the task needed to be repeatable and consistent; positioning was especially
important as variations in distance or position can significantly affect sound perception.
A subset of the participants’ actions also needed to depend on their ability to localize the
robot in order to provide observable, objective metrics. Lastly, the task had to be realistic
to create an authentic experience for participants and motivate their performance. This
authentic framing also provides insight into how the results of this study can be utilized
by our communication framework in real applications and environments.
To satisfy the task criteria, we chose a collaborative packing activity in a mock ware-
house environment. The participant and robot worked together to package orders and
distribute them across the environment. The participant's role was to find the individual
items (building blocks) for each order, place them in a bag with the corresponding or-
der number, and hand off the package to the robot. The robot periodically retrieved the
participant’s completed orders and transported them across the “warehouse.”
In this user study, we employed the Turtlebot 2 mobile robot base with two additions:
1) a USB speaker was mounted on top of the robot for playing the auditory signals and
2) a plastic bin was mounted behind the speaker for holding completed orders.
Figure 6.2 shows a schematic of the task. The participant was on one side of the room, separated from the robot by a 5-foot-tall white curtain. The participant continuously packaged orders until a message on the monitor asked whether they were ready for the robot to pick up the orders. After the participant confirmed, the robot moved to one of the three drop-off areas behind the white curtain. The participant collected their completed orders, chose the drop-off area (left, right, or middle) at which they believed the robot was located, and moved to that area using the corresponding path marked by red tape. After the participant walked a certain distance past the workstation (marked by a gray dashed line), they were not allowed to change paths. This enabled us to distinguish which drop-off area the participant chose. The participant then pulled up the curtain, placed the orders in the robot's bin, and returned to their workstation.
To motivate their performance, participants were told they would receive a bonus
based on their accuracy and speed, such that an incorrect drop-off location would be
penalized and a faster drop-off time would be rewarded. Hence, participants aimed to
choose the correct drop-off location as quickly as possible. We also told participants that
completing the packaging of orders slightly affected the bonus to prevent them from
solely focusing on the robot.
6.4.2.1 Visual Barrier
The study employed a visual barrier to isolate the effects of the auditory cues on partic-
ipants’ ability to localize the robot. The visual barrier also prevented participants from
fixating on the robot due to its novelty, the design of the experiment, and the bonus.
Once participants reached their chosen drop-off location, they were allowed to look over
the barrier to check whether their prediction was correct.
6.4.2.2 Robot Trajectories
In order to know when to start listening for the auditory cues, participants triggered the
execution of the robot’s motion to the drop-off areas. The robot utilized three straight
line trajectories starting at the middle of the room, opposite the participant to get to the
drop-off areas (shown by dashed lines in Figure 6.2). The robot moved at 0.25
m
sec
to give
participants enough time to listen to the its auditory signal before making a prediction.
Figure 6.3: A participant prepares orders, locates the robot from behind the visual barrier, and drops off orders.
6.4.2.3 Environmental Noise
The ambient noise levels of the environment were also altered to create an effect consistent
with a real factory or warehouse. Industrial settings typically have sound measurements
ranging from 75 to 90 dBA. We utilized a continuous sound track of industrial noise for
the duration of the experiment. Using a sound level meter, ambient sound levels were
adjusted to be approximately 70 dBA. This level was chosen because the experimental
area was smaller and more enclosed than a real warehouse.
6.4.3 Procedure
As participants entered the experiment room, they were given a written overview of the
study with a picture of the robot. After obtaining informed consent, the experimenter
explained the collaborative task in detail. Participants were told that the goal of the
study was to explore methods to better coordinate human and robot actions using sound.
Participants were also told they could earn a 67% bonus (added to their compensation)
based on the performance metrics described in the previous section. After finishing the
task, participants completed a post-study survey.
6.4.4 Manipulated Variables
We manipulated a single variable, auditory signal type, the sound that the robot plays
while it is moving, to be baseline (i.e., absent), tonal, or broadband. The baseline sound con-
dition has no added auditory signal and no amplification; it represents a more typical use
case in which a robot moves about the environment emitting only the noises it naturally
produces.
A sound level meter was used to make the tonal and broadband auditory signals equal
in sound level intensity (i.e., volume). Due to the robot’s short height and the presence of
a curtain, we had to amplify the signals to overcome the physical interference. No sound
was played while the robot was stationary.
6.4.5 Participant Allocation
A total of 24 participants (15 males, 9 females; ages 18-35, M=23.83, SD=4.25) were re-
cruited from the local community. Four participants reported having prior experience
working with or using robots.
The study used a within-subjects design as it enabled participants to compare the three
different auditory signal conditions. Each auditory condition was used for six drop-offs,
equally divided among the three locations. The order of drop-offs and the conditions
were fully counterbalanced to control for ordering effects.
Participants were told that there were three different types of sounds that the robot
produced. They were also informed when a transition occurred between conditions and given a short break, to avoid confusion from a sudden sound change. To pre-
vent participants from continuously listening for the sounds, the robot drove around the
warehouse between drop-offs. This also enabled participants to familiarize themselves
with each auditory condition.
6.4.6 Dependent Measures
We utilized both objective and subjective measures to evaluate the effects of each auditory
condition on participants’ ability to localize the robot during a human-robot collaboration
task. The objective measures include the decision time and the accuracy for choosing drop-
off locations. The decision time is the amount of time it takes for the participant to choose
a drop-off location after prompting the robot to start its trajectory. Each drop-off was
scored as either 0 or 1 in accuracy.
The subjective measures consisted of 5-point Likert scale ratings for each auditory con-
dition’s noticeability, localizability, and annoyance. Additionally, participants were forced to
choose an auditory condition for each of these descriptors: most noticeable, easiest to
localize, most annoying, and most preferred to work with. Participants also described
each of the auditory conditions as a manipulation check and to provide insight into their
perceptions of each condition.
6.5 Analysis
The experimental task was divided into three sessions, one for each auditory condition.
Each session consisted of six drop-offs. This led to a total of 432 interactions or trials.
6.5.1 H1- Objective Collaboration Metrics
Our first hypothesis stated that the auditory condition (baseline, tonal, broadband) would significantly affect the objective collaboration metrics. We also predicted that the broadband signal would be the easiest to localize.
To evaluate this hypothesis, repeated-measures ANOVAs were performed on participants' prediction accuracy and decision time (i.e., inference speed). We found that the
auditory condition significantly affected both accuracy (F(2,432)= 108.67, p< 0.001; Fig-
ure 6.4) and inference speed (F(2,432)= 25.69, p< 0.001; Figure 6.4). However, a post-hoc
analysis using Tukey HSD showed that the tonal and broadband conditions were not
significantly different for either metric.
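For reference, an analysis of this form can be reproduced with a repeated-measures ANOVA such as the AnovaRM class in statsmodels. The data frame below is a placeholder showing the expected long format (one aggregated value per participant per condition), not the study's actual data.

import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Placeholder data: one accuracy value per participant per auditory condition,
# e.g., each participant's proportion of correct drop-offs in that condition.
df = pd.DataFrame({
    "subject":   [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "condition": ["baseline", "tonal", "broadband"] * 3,
    "accuracy":  [0.33, 1.00, 0.83, 0.50, 0.83, 1.00, 0.33, 0.83, 0.83],
})

# Repeated-measures ANOVA with auditory condition as the within-subject factor.
result = AnovaRM(df, depvar="accuracy", subject="subject",
                 within=["condition"]).fit()
print(result)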
Overall, participants had a lower average inference time for the broadband condition,
while the accuracy between the two conditions was about equal. Several participants com-
mented that the tonal signal was easier to localize from the start of the session, whereas
the broadband signal took some time to acclimate to.
Analysis of the video recordings also showed that several participants hesitated in the
initial drop-off trials for the broadband condition. They moved their bodies back and
forth before slowly walking to the drop-off location. This suggests that the broadband
sound is less intuitive and may require time for familiarization.
Objective Collaboration Measures
              Accuracy                    Avg. Time (sec)
Baseline      36.8% (53/144)              11.90 (SD=8.47)
Tonal         91.7% (132/144)             7.56 (SD=5.36)
Broadband     91.0% (131/144)             6.93 (SD=4.82)
              F(2,432)=108.67, p<0.001    F(2,432)=25.69, p<0.001
Figure 6.4: Objective collaboration metrics: inference accuracy, inference time, and cumulative percent of correct predictions by time.
6.5.2 H2- Perceptions of the Sounds
Our second hypothesis stated that the addition of auditory signals (i.e., the tonal and broadband conditions) would negatively affect participants' ratings of annoyance, but positively affect ratings of robot noticeability and localizability. We also predicted that participants would perceive the broadband signal as the easiest to localize and the tonal signal as the most noticeable and annoying.
We performed repeated-measures ANOVAs on participants’ ratings of all three sub-
jective collaboration metrics. As predicted, the auditory signal condition significantly
affected all three measures (Figure 6.5): noticeability (F(2,24)= 321.21, p< 0.001), local-
izability (F(2,24)= 222.32, p< 0.001), and annoyance (F(2,24)= 74.71, p< 0.001).
Participants rated the tonal signal as the most annoying and noticeable, while the
broadband signal was rated the most localizable. A post hoc analysis with Tukey HSD
confirmed that each auditory condition was rated significantly differently for noticeability
and annoyance. As with the objective collaboration metrics, the added auditory signal
conditions (tonal and broadband) were rated as significantly more localizable than the
baseline condition, but the tonal and broadband conditions were not rated significantly
differently from each other.
Subjective Collaboration Measures
              Noticeability               Localizability              Annoyance
Baseline      1.125 (SD=0.34)             1.125 (SD=0.34)             1.083 (SD=0.28)
Tonal         4.750 (SD=0.53)             4.083 (SD=0.72)             3.750 (SD=0.85)
Broadband     4.292 (SD=0.69)             4.375 (SD=0.65)             2.875 (SD=0.99)
              F(2,24)=321.21, p<0.001     F(2,24)=222.32, p<0.001     F(2,24)=74.71, p<0.001
Figure 6.5: Subjective metrics: 5-point Likert ratings for noticeability, localizability, and annoyance.
In a set of fixed-choice questions, participants were asked to choose the auditory condition they thought was the most noticeable, the easiest to localize, and the most annoying. 88% of participants chose the tonal condition as the most noticeable, while 83% of participants also chose this signal as the most annoying. In contrast, 67% of participants chose the broadband condition as the easiest to localize, reflecting the mixed results between the two added auditory signal conditions.
Participants commented that the tonal signal was the most annoying and noticeable
due to its high pitch. Several participants also mentioned that the tonal signal stood out
from the ambient sounds, compared to the broadband signal, which sounded "lower" and
more like the factory noise.
6.5.3 H3- Sound Type Preference
The final hypothesis predicted that participants would prefer the broadband auditory condition to the tonal auditory condition for use in a human-robot collaboration task. In the
post-study survey, participants were asked to choose the robot that they would prefer to
work with and explain their selection.
79% of participants preferred to work with the robot with the broadband auditory
condition while 21% of participants preferred the robot with the tonal condition. Partic-
ipants’ comments revealed that they felt that they were able to find both robots in the
environment, but that the tonal sound condition was more annoying.
Participants also commented that the higher pitch of the tonal signal was more dis-
tracting and difficult to ignore. A handful of participants also stated that they would be
unable to work for a long period of time in the presence of the tonal signal. More than
one participant commented that they would "lose it" if they had to listen to the tonal
sound for a full workday. Another participant summarized the experiment by saying,
"the first sound (baseline) would cause me to lose my job in the first week and the second
(tonal) would make me quit in the first week."
6.6 Discussion
This chapter proposed a novel HRI concept: the purposeful generation of auditory cues
designed to enable localization of the robot in the human’s workspace. To validate this
concept, we conducted a user study, in the context of a human-robot collaboration task,
exploring the design of auditory signals for enabling humans to detect the presence of a
nearby mobile robot in motion.
We also proposed the use of auditory icons, or sounds designed to mimic everyday
phenomena, for use as robot communication signals. State-expressive, auditory icons can
convey information about the robot more intuitively but have yet to be widely explored
in HRI. The auditory icon utilized in this study mimicked the sound of the robot’s motors
in motion.
We also investigated two types of sound by varying the base signal to create broad-
band and tonal versions of the auditory icon. Tonal sounds have a distinctive pitch or
frequency, whereas, broadband sounds contain a larger range of frequencies and are typ-
ically less annoying. We also included a baseline condition with no additional auditory
signal forcing participants to rely only on the sounds emitted from the robot’s motors.
The study setting was designed to resemble a warehouse setting, including its ambient
noise levels.
We hypothesized that the auditory signal condition (i.e., baseline, tonal, broadband)
would affect the accuracy and time to infer the robot's location. Results showed that
both auditory signals significantly improved localization over the baseline. This was ex-
pected as the ambient sound levels easily masked the low intensity of the robot’s motors.
Surprisingly, the broadband signal was not found to be significantly easier to localize
than the tonal signal. Instead, participants' performance under both conditions was nearly identical in accuracy.
Past work in psychophysics often takes place in controlled laboratory settings. However, this experiment introduced a number of other factors (e.g., realistic ambient noise, a moving sound source) to mimic a real-world setting. Hence, in the uncertain, dynamic conditions that real robots operate under, the past findings did not hold. This suggests that other results from psychophysics may suffer from similar issues and therefore require greater testing with robots in real-world settings.
Our second hypothesis explored humans’ perceptions of these auditory signals; we
expected that the tonal and broadband signals would negatively affect annoyance but
positively affect noticeability and localizability. The results confirmed that participants' ratings of the Turtlebot's noticeability and annoyance differed significantly across all three auditory conditions (baseline, tonal, and broadband). While participants rated the tonal
and broadband conditions as significantly more localizable than the baseline, a post hoc
analysis revealed that the two auditory signals were not rated significantly differently,
matching the results shown by our objective metrics. As expected, participants also chose
the tonal signal as the most noticeable and annoying and commented that this was due
to its “high pitch.”
Finally, we hypothesized that most participants would prefer to work with a robot
utilizing the broadband auditory signal during a human-robot collaboration task. This
was confirmed, with the majority of participants preferring the broadband signal due to the annoyance of the continuous tonal signal. These results indicate promise for using broadband signals for localization and auditory signaling; they are easy to generate and, due to their diversity of frequencies, are also able to blend into the background.
The results of this study indicate that nonverbal auditory signals can be a useful tool
for HRI. Employing such signals can facilitate coordination and improve safety. The results also suggest that, due to the many factors that affect humans' physical perception
of nonverbal signals, more work is needed to understand how psychophysics phenomena
are affected by the types of real-world, dynamic settings robots operate under. Overall,
these findings strongly support the use of auditory icons to enable localization of a co-
located robot by humans and the use of iconic broadband signals to minimize human
annoyance.
6.7 Summary
In this chapter, we explored the design of auditory signals for HRI. An overarching goal
of this thesis is to enable robots to effectively signal information about their internal state
with varying levels of salience, urgency, and other important communication properties.
This requires the robot to have a rich vocabulary of nonverbal signals as explained in Sec-
tion 3.2.
The work presented in this chapter is a first approach towards creating intuitive and
iconic auditory signals for human-robot communication. Similar to the work presented
in Chapter 4, this research drew from common user experiences to convey information
about the robot’s state. Auditory signals provide certain advantages, most notably hu-
mans’ ability to recognize and react to many auditory cues without significant cognitive
effort.
Understanding the effects of sound variations (e.g., broadband, tonal) is also a neces-
sary step towards generating signals that balance a robot’s task-oriented objectives with
human preferences for interaction. These findings directly inform our communication
framework which strives to plan robot behavior with humans in mind. In the next chap-
ter, we employ these findings in the first application of our communication framework.
7. Application of Robot Auditory Signals
This chapter presents an application of the results of our user study (S2) in auditory
signal design (Section 6.4) as well as a simplified version of our proposed communication
framework (Chapter 3). In this application, we sought to employ auditory signals for
localization in a more realistic setting in which visual cues are also present. We also
present insights from the first implementation of the communication framework to inform
future applications.
7.1 Introduction
As a first step towards validating the proposed communication framework, we applied
the MDP model described in Section 3.1 to the problem of auditory localization, the
communication scenario from Chapter 6. The robot’s goal is to enable humans in the
nearby environment to locate it via its auditory signals.
The previous user study (S2) exploring auditory localization (Section 6.4) had two
major findings. The first is that auditory icons, or sounds that correspond to real events in
the world, can act as effective signals for robot communication. Auditory icons naturally
express information about the robot's operations and are quick to transmit, making them
efficient and effective signals.
The second insight is that broadband sounds, which contain a larger range of frequen-
cies and no distinct tone, were found to be approximately equal in localizability to tonal
sounds. While past works in psychophysics found broadband auditory signals to be more
localizable, our results showed that this was not the case in the dynamic and uncontrolled
settings robots operate under (Chapter 6).
We incorporated these findings into our communication framework with the aim of
overcoming the major shortcomings, such as the removal of visual cues, from the initial
user study (S2) exploring auditory signal design (Section 6.4). Since humans tend to
continuously utilize cues to localize other agents and events in the environment, this is a
more realistic scenario for robot nonverbal communication.
Due to the complexity of the communication framework, we made several assump-
tions that enabled its model to be simplified to a set of one-state Markov Decision Pro-
cesses (MDPs). This reduces the complexity of learning the robot’s ideal communication
policy as the associated reward distributions are treated as stationary.
7.2 Model and Implementation
In this section, we describe an implementation of the communication framework (Chap-
ter 3) for auditory localization. This implementation is utilized in the context of a human-
robot collaboration task for an experimental data collection described in the next section.
7.2.0.1 State Space
In the user study (S2) exploring auditory localization (Chapter 6), we limited participants’
sensing capabilities by removing visual cues from the environment. Although this iso-
lated the effects of the study manipulations on a human’s ability to localize the robot,
this is not an accurate representation of human perceptual processes. To more accurately
represent how humans localize other agents in the real world, we assume that visual cues
will also be utilized in the localization process. This is incorporated as part of the model’s
state space.
The resulting state space S is a tuplefe
s
,v,dg consisting of the ambient sound level of
the environment e
s
, the visibility of the robot from the human observer’s perspective v,
and the distance between the robot and human d. We discretized each of these variables
in the implementation as follows.
e
s
=f1,2g- For ambient sound, the same industrial background track from the initial
user study (Chapter 6) was utilized at two different sound levels: 1) 55 dBA (similar
levels to an office or restaurant) and 2) 70 dBA (similar to an industrial environment).
We chose two ambient sound environments, since the sound levels utilized in the orig-
inal study are only applicable to a subset of non-humanoid robot applications. The
91
addition of an ambient environment which more closely mimics human-oriented set-
tings, such as public areas where service robots operate, increases the generalizability
of this work. The track was kept consistent as altering any parts of the task or envi-
ronment could present confounds requiring additional experimentation to resolve.
v = {1, 2, 3, 4} - Visibility of the robot takes into account whether the robot is in the
human’s biological viewing range (approximately 114 degrees) and whether the view
of the robot is obstructed by objects in the environment. This resulted in four levels of
visibility (lowest to highest): 1) obstructed and out of view, 2) obstructed and in view,
3) not obstructed and out of view, and 4) not obstructed and in view. This variable
does not explicitly model whether the human is looking at the robot, as that would require careful gaze, head, and body tracking, which is often infeasible in the real world. Instead,
we assume this will be handled by the model’s ability to handle uncertain or partially
random outcomes.
d = {1, 2} - Due to the limited size of the experimental space (17 feet by 19.5 feet), the distance d was varied to be either 1) close or 2) far. In larger areas, distance could be encoded as a continuous variable or used with psychophysics models if necessary. However, the small size of the open space in the data collection, combined with the size of the participant's work area and the robot, enabled us to reduce this variable to a binary state.
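A minimal sketch of this discretization, assuming the encodings above, simply enumerates the 2 x 4 x 2 = 16 world states; the variable and label names are ours, not those of the actual implementation.

from itertools import product
from collections import namedtuple

# Discretized world state: ambient sound level e_s, robot visibility v,
# and robot-human distance d, as described above.
State = namedtuple("State", ["ambient_sound", "visibility", "distance"])

AMBIENT_SOUND = {1: "55 dBA (office-like)", 2: "70 dBA (industrial)"}
VISIBILITY = {1: "obstructed, out of view", 2: "obstructed, in view",
              3: "not obstructed, out of view", 4: "not obstructed, in view"}
DISTANCE = {1: "close", 2: "far"}

STATES = [State(e, v, d)
          for e, v, d in product(AMBIENT_SOUND, VISIBILITY, DISTANCE)]
print(len(STATES))  # 16 discrete world states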
7.2.0.2 Action Space
The same auditory signal was used for all signaling policies π(s). The signal was a broad-
band sound recorded from the iRobot AVA mobile base (Figure 7.4), the robot employed
in the experiment. AVA’s naturally produced sounds were utilized due to the finding that
auditory icons act as effective signals for localization in Chapter 6. The resulting record-
ing resembled the noises produced by the operation of a desktop computer (e.g., hard
drive spinning, fans). Using this sound, four auditory signaling actions were created by
varying the sound level intensity over time, as shown in Figure 7.1.
The first action (Figure 7.1, 1) keeps the sound level consistent. This action is similar
to those typically utilized by household products (e.g., microwave beeping) and other
machinery (e.g., car alerts). Unlike these alerts, however, the auditory signal utilized in
this implementation lasts until the human has completed their response action. This is discussed in more detail in the next section.
Figure 7.1: The auditory signal action space used in the first application of the signaling framework: (1) steady, (2) increasing, (3) decreasing, and (4) triangle. The minimum and maximum sound intensity levels depended on the ambient environment.
The second action (Figure 7.1, 2) linearly increases the sound level over time until it
reaches a maximum value. The third action (Figure 7.1, 3) linearly decreases the sound
level over time until it reaches a minimum value. Both actions (2 and 3) maintained
a constant sound intensity level for the first second to give participants time to start
listening for the robot’s auditory signal. After reaching the minimum or maximum sound
intensity level at t= 4 sec, the sound intensity level for these two actions stayed constant
for the remainder of the interaction.
The last action (Figure 7.1, 4) linearly increases and decreases the sound intensity to
reach these maximum (L_imax) and minimum (L_imin) levels at a regular frequency. This triangle wave was designed to keep the average sound levels consistent over time but increase
the salience of the signal by continuously altering the intensity level.
The minimum and maximum sound intensity levels (L_imin and L_imax) were chosen based on the ambient sound level of the environment. For e_s = 1, L_imin = 65 dBA and L_imax = 75 dBA at the sound source. For e_s = 2, L_imin = 80 dBA and L_imax = 90 dBA at the sound source.
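A sketch of the four intensity envelopes, assuming the timing described above (a 1 s constant onset, ramps that complete at t = 4 s, and a 2 s triangle period). The steady action is assumed here to hold the maximum level, and the exact interpolation used in the implementation may differ.

import numpy as np

def intensity_envelope(action, t, l_min, l_max):
    # Sound level (dBA) at time t for each signaling action.
    t = np.asarray(t, dtype=float)
    if action == "steady":
        return np.full_like(t, l_max)        # assumed constant at L_imax
    if action == "increasing":               # hold 1 s, then ramp up by t = 4 s
        return l_min + (l_max - l_min) * np.clip((t - 1.0) / 3.0, 0.0, 1.0)
    if action == "decreasing":               # hold 1 s, then ramp down by t = 4 s
        return l_max - (l_max - l_min) * np.clip((t - 1.0) / 3.0, 0.0, 1.0)
    if action == "triangle":                 # 2 s period triangle wave
        phase = np.abs((t % 2.0) - 1.0)      # 1 -> 0 -> 1 over each period
        return l_min + (l_max - l_min) * (1.0 - phase)
    raise ValueError(action)

t = np.linspace(0.0, 5.0, 6)
print(intensity_envelope("increasing", t, l_min=65.0, l_max=75.0))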
We chose to explore a narrowly focused set of signaling actions to support our goal of
investigating an initial application of the proposed robot signaling model. Moreover, for
auditory localization, the robot expresses only one feature of its internal state, its relative
position, making it logical to utilize only one auditory signal. Variation of the sound
level intensity enabled us to alter the salience or noticeability of the signal, which may be
important in cases where the human observer cannot utilize visual cues or localization is
more critical, such as in emergency scenarios.
Figure 7.2: The reward function is composed of a reward term (task outcome) and a cost term (interaction outcome), shown for the steady, increasing, decreasing, and triangle actions.
7.2.0.3 Reward Function
The proposed framework (Chapter 3) uses reward formulations that balance the effectiveness of the signal with the preferences of the human observer. Effectiveness of the signal can include an updated human mental model (if the signal is informational) or a human's response. In this scenario, effectiveness is defined as the human's knowledge of the robot's location. We assume that the human observer prefers to minimize interruption and annoyance. Hence, the proposed reward function aims to balance the subject's ability to localize a robot in the nearby physical area with the salience of the signal. We equate the
signal’s salience to the sound intensity level of the robot’s auditory signal over time. We
formulate the reward as
R = \frac{1}{t^{norm}_{resp}} \cdot \frac{1}{\int_{0}^{t_{resp}} L_i(t)\, dt}    (7.1)
where t_resp is the subject's response time, t_resp^norm is the response time normalized by the distance to the robot, and L_i(t) is the sound level intensity of the auditory signal over time.
In this application, we structure the data collection task such that the human has an ob-
servable response to the robot’s signal. This enables better observation of the effectiveness
of the signaling actions.
This reward formulation values a lower response time, as shown by the first term in Eq. 7.1. The second term penalizes using higher sound level intensities, especially for longer periods of time. We chose to integrate over the duration of the signal as our prior studies showed that a continuous signal is considered more annoying. Hence, auditory signals that result in a rapid response by the human or do not utilize high intensity levels may yield a better reward than traditional static signals.
Figure 7.3: The diagram on the left shows the data collection setup, and the photograph on the right shows the actual space.
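A minimal sketch of this reward computation, following the reconstructed form of Eq. 7.1 above (the reciprocal of the normalized response time multiplied by the reciprocal of the integrated sound intensity). Treating normalization as division by distance and using a trapezoidal sum for the integral are our assumptions for illustration.

import numpy as np

def trial_reward(t_resp, distance_m, times, intensities):
    # Reward for one interaction under Eq. 7.1 as reconstructed above.
    # t_resp: response time (s); distance_m: starting robot-human distance;
    # times, intensities: samples of the signal L_i(t) from 0 to t_resp (dBA).
    t_resp_norm = t_resp / distance_m                     # assumed normalization
    exposure = np.sum((intensities[1:] + intensities[:-1]) / 2.0
                      * np.diff(times))                   # trapezoidal integral
    return (1.0 / t_resp_norm) * (1.0 / exposure)

# Illustrative values only: a 4 s response at 3 m with a steady 70 dBA signal.
t = np.linspace(0.0, 4.0, 41)
print(trial_reward(4.0, 3.0, t, np.full_like(t, 70.0)))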
7.3 Experimental Design [E2]
We applied the proposed model to auditory localization using a human-subjects data
collection to directly learn the optimal signaling policy for a subset of world states. Au-
ditory localization is primarily important for scenarios where a robot and human are in
relatively close proximity. Since this is a first attempt at solving the signaling model, we
discretized the state space and directly learned a reward distribution for each world
state.
7.3.1 Collaborative Task
At the start of the data collection session, the participant was given a bin of LEGO pieces
and assembly instructions. Throughout the session, the participant sat at a workstation
constructing a LEGO structure, as shown in Figure 7.3. Participants were instructed to
work on the structure at all times unless they were given a cue to interact with the robot.
The light-based cue was provided via a strip of LEDs attached to the perimeter of the
participant’s workstation.
Figure 7.4: The iRobot AVA mobile base used in the experimental data collection, equipped with a cylindrical speaker and a bin for LEGO pieces.
The robot was introduced as a way to provide participants with additional parts for their LEGO structure. When the LED strip turned green, the robot began its auditory signaling action. The light cue also prompted the participant to leave the workstation, walk to the robot, retrieve a LEGO piece, and return to their workstation. Both the LEDs and the auditory signal were turned off once the participant returned to the workstation.
Similar to the order packaging task from the initial auditory localization study (Sec-
tion 6.4), the structure building task was designed to distract the participant and to ma-
nipulate their attentional focus away from the robot. The complexity of the task also
ensured a moderate to high cognitive load. Between interactions, the robot randomly
moved around the room to familiarize participants with the auditory and visual cues
provided by its motion. Ambient environment sounds were provided by four speakers
located around the perimeter of the room for the duration of the session.
7.3.1.1 Robot
The robot used in this data collection was an iRobot AVA holonomic mobile base, as
shown in Figure 7.4. The robot was equipped with a cylindrical speaker and a bin for
holding the LEGO blocks. Similar to Chapter 8, we chose this robot platform for its height
and low intensity of sound emitted by its motors.
Robot Visibility (v)                     Distance (d)    Ideal Policy (π*)
Viewing Range    Obstructions                            55 dBA        70 dBA
Out of view      Occluded                Close           Increasing    Triangle
Out of view      Occluded                Far             Steady        Triangle
In view          Occluded                Close           Increasing    Increasing
In view          Occluded                Far             Steady        Steady
Out of view      Not Occluded            Close           Increasing    Steady
Out of view      Not Occluded            Far             Increasing    Steady
In view          Not Occluded            Close           Increasing    Increasing
In view          Not Occluded            Far             Increasing    Increasing
Figure 7.5: The auditory signal policies learned from the data collection for each world state.
7.3.2 Participant Allocation
A total of 16 participants (11 males, 5 females, ages 19-31, M=27.5, SD=3.92) were re-
cruited from the local community for the data collection. Participants were only exposed
to one of the ambient sound levels e_s to limit the total number of interactions and avoid
fatigue. Each participant was exposed to the remaining 8 world states (4 visibility levels
and 2 distances) and 4 auditory signaling policies, for a total of 32 interactions.
7.3.3 Measures
For each interaction, we measured the time for the participant to reach the robot (t_resp) and normalized this measure by the starting distance between the robot and human (t_resp^norm).
7.4 Analysis
The goal of this data collection was twofold: to apply the proposed signaling framework
to a real problem and to apply the results of our prior work in auditory signal design
to a real-world scenario of localization of a robot by humans. Since few works have ex-
plored auditory localization in this context, learning expected rewards of different signal-
ing actions is a first step towards creating more intelligent, adaptive robot communication
behaviors.
Due to the challenges of conducting experiments with a robot and human interacting
in the real world, we chose to utilize offline learning algorithms. Data gathered during the
experiment was used to perform policy evaluation using offline Monte Carlo sampling, which averages the reward (Eq. 7.1) for each episode and signaling action (i.e., policy).
The results are shown in Figure 7.5.
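A sketch of this offline evaluation step under our assumptions: logged rewards are grouped by world state and signaling action, averaged, and the action with the highest mean reward is taken as the learned policy for that state. The episode tuple format is illustrative.

from collections import defaultdict

def evaluate_policies(episodes):
    # episodes: iterable of (state, action, reward) tuples from the data collection.
    totals = defaultdict(lambda: [0.0, 0])          # (state, action) -> [sum, count]
    for state, action, reward in episodes:
        totals[(state, action)][0] += reward
        totals[(state, action)][1] += 1
    mean_reward = {key: s / n for key, (s, n) in totals.items()}

    # Reward-maximizing action per world state.
    best = {}
    for (state, action), value in mean_reward.items():
        if state not in best or value > mean_reward[(state, best[state])]:
            best[state] = action
    return mean_reward, best

# Illustrative log: two trials in one world state.
log = [(("70 dBA", "out of view, occluded", "close"), "increasing", 0.004),
       (("70 dBA", "out of view, occluded", "close"), "triangle", 0.005)]
print(evaluate_policies(log)[1])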
Surprisingly, we found that despite varying the minimum and maximum sound level
intensities L_imin and L_imax to match the ambient sound levels, the policies π* which maximized the reward varied between our two ambient sound conditions (55 dBA vs. 70 dBA)
for interactions when the robot was out of the human’s view. These results suggest that
in the absence of initial visual cues, more salient or higher cost signals best support local-
ization.
We found that the increasing sound action performed the best across several different
world states, despite having the longest average response time t_resp^norm. This was largely due to the low cost associated with the sound; this signaling policy always started at the minimal intensity, as shown in Figure 7.1 (2). These findings may also be affected by the
rate of change in sound level intensity utilized by the sound policies.
Considering the size of the data collection space, we assumed most positions could
be reached by participants within about three seconds. We also assumed participants
would need approximately one second to notice the LED light cue, stop their sometimes noisy manipulation of LEGO pieces, and begin searching for the robot. These assumptions were
supported by the results of the experiment; the median and average (not normalized)
response times were 4.0 seconds and 3.8 seconds, respectively.
Given the above assumptions, the increasing and decreasing auditory signaling ac-
tions held constant volume for one second initially and reached their constant end vol-
umes after four seconds. Thus, compared to the other auditory policies, the increasing
action had a much lower cost term for the first four seconds of signaling. The overall
signaling scheme also caused the decreasing auditory action to never be chosen as the
ideal policy, despite yielding the second lowest average response time across states.
The triangle action repeated every 2 seconds. Due to the constant change in intensity
throughout the signal, we predicted that this policy would be the most salient. The trian-
gle policy performed the best for the lowest visibility, highest ambient noise world states;
however, we found that for many other world states, the triangle policy performed the
worst. Since humans typically utilize intensity differences during localization (Brungart
and Rabinowitz, 1999), the constant variations in the triangle signal may have confused
participants.
7.5 Discussion
This chapter presents a first application of the proposed communication framework
(Chapter 3) in the context of a human-robot collaborative task. We also applied the findings of our user study (S2) in auditory signal design (Section 6.4) to a more realistic
interaction environment and scenario.
The initial user study (S2) showed the benefits of using auditory icons, which resemble real sounds found in the world, for communicating information. The auditory icon em-
ployed in S2 (Section 6.3) was designed to be highly salient, mimicking the sound of the
robot’s servo motors while in motion. We also found that broadband icons were highly
localizable but easier to ignore than tonal sounds.
In this chapter, we employed a broadband auditory icon that also aimed to reduce
the noticeability of the signal to enable regular usage without disturbing nearby humans.
To better simulate real interaction conditions, many of the constraints used to isolate the
effects of signal design in S2 (Section 6.4) were removed. The proposed communication
framework (Chapter 3) accounts for uncertainties in human perception and behavior that
occur in these more authentic conditions.
Since this application only looked at one communication scenario, the framework’s
model (Section 3.1) was reduced to a set of one-state MDPs. We also assumed a stationary reward distribution, which enabled us to apply bandit algorithms to solve for the robot's communication policy.
As expected, we found that the new auditory icon used to signal the robot’s location
was less distracting to participants but still localizable. Most participants stated they
were able to completely “tune out” the auditory signal when performing the distractor
task. These findings show that auditory icons can be designed to elicit different levels of awareness from co-located humans. Increasing the saliency of signals can be a useful tool
in communicating information with greater priority (e.g., notification vs. error).
We also found that more work is needed to refine the communication framework,
especially the formulation of the robot’s reward function. Employing time directly in
the function caused the cost term to dominate the reward function. One approach for
addressing this in future implementations is to incorporate the salience of a signal into
the reward function, instead of using a fixed reward term for all signaling actions.
In this implementation, the state space of the model does not include certain information that is often employed in communication, such as the human's urgency. More complex HRI scenarios will require the robot to signal a variety of information under different conditions. We address these issues in our final application of the communication framework, described in Chapter 9.
7.6 Summary
In this chapter, we presented the first implementation of our communication framework
and applied the results of our previous work on auditory signal design (S2). Due to the
constrained nature of the auditory localization scenario, we implemented a reduced form
of the framework’s model (Section 3.1) and assumed stationary reward distributions. We
conducted an experiment (E2) to validate the framework and apply the results of S2 in
a more realistic scenario. Our findings indicate that more work is needed to investigate
the model parameters and algorithms for planning the robot’s communication behavior.
We also found that the use of broadband icons was especially effective in combination
with visual cues for enabling users to localize the robot with low cognitive effort. In the
next chapter, we further explore the design space of nonverbal robot communication by
combining auditory and visual cues to create multimodal signals.
8. Design of Robot Multimodal Signals
The focus of this chapter is the design of communicative multimodal signals for robots
to facilitate human-robot interaction (HRI). Towards this goal, we conducted a user study
exploring the design of multimodal auditory and visual signals for requesting help from
a human interactor during a collaborative task.
8.1 Introduction
Chapters 4 and 6 investigated how to design nonverbal signals utilizing light or sound
through two user studies (S1,S2). The findings of these studies provided important in-
sights for designing effective nonverbal signals using a single modality. Humans, however,
often employ several channels during communication. Therefore, the goal of this chapter
is to investigate issues relating to the design of multimodal communication behaviors that
employ two or more signaling modalities.
Multimodal communication enables humans to combine many cues (e.g., gaze, gesture,
facial expression) in order to transmit greater amounts of information. Multimodal sig-
nals can also be used to reinforce a message, increasing the saliency of the communication
and its likelihood of being received and decoded.
When designing multimodal signals, inspiration can be drawn from many types of
human-human interaction. For instance, humans employ visual aids such as pictures,
physical models, or other tools in combination with speech or text during instruction
(e.g., teaching). Robots can employ similar methods using visual displays, projectors, and
other non-anthropomorphic modalities. A highly related field of work, multimodal inter-
action, seeks to enable more natural communication for devices by employing multiple
modalities for interaction.
Since humans continuously emit information through subconscious nonverbal cues, a
large portion of their communication is multimodal. Moreover, speech is often reinforced
through gesture (e.g., pointing) and gaze. Since effectively employing even one type of
nonverbal signaling is still an open HRI research challenge, multimodal communication
is only sparsely employed by robots.
An exception to this is task-oriented motion which can act as an unintentional non-
verbal cue (i.e., leakage), revealing information about the robot’s state. This type of cue
is not investigated in this chapter as our primary focus is intentional signals designed for
communication; task-oriented motion is only present to provide context for the robot’s
actions. Moreover, as mentioned in Section 2.3, robot motion is not always generated to
be state-expressive (e.g., rapidly exploring random tree (RRT) algorithms).
Towards the goal of designing effective multimodal signals for HRI, we present a user
study with 30 participants exploring the use of a combined visual and auditory signal for
requesting help during a human-robot collaborative task.
The previous studies (S1,S2) employed a free-flying robot (AR Parrot drone) and a
non-holonomic, short base (Turtlebot 2). To increase the generalizability of our research,
this study (S3) uses a different type of mobile robot, the iRobot AVA. Unlike the Turtle-
bot 2, this platform is holonomic and about half the height of a human. The robot’s
functionality and embodiment are similar to other service robots (e.g., Savioke Relay).
8.2 Related Work
In Chapter 1, we discussed the numerous challenges that arise when designing nonverbal
signals for robot communication. Employing multiple modalities can exacerbate these
issues; in addition to the challenges that arise when creating each individual signal, re-
searchers must also consider how these signals can be successfully combined.
Prior work in multimodal communication for HRI has primarily explored the combi-
nation of nonverbal signaling modalities with speech (Funakoshi et al., 2008). Much of
this research has attempted to replicate human multimodal communication by utilizing
anthropomorphic nonverbal channels, such as facial expression, gesture, and gaze (Mutlu
et al., 2009a, Salem et al., 2011, 2012, Tojo et al., 2000); fewer works have employed typical
non-humanoid modalities, such as light, in combination with speech (Funakoshi et al.,
2008).
A number of works have also explored the combination of anthropomorphic nonver-
bal modalities, with gesture and gaze as the most common pairing. This combination
is often due to the influence of both channels on a human’s attentional focus and their
ability to establish joint attention, an essential component of physical coordination. While
there is limited work in HRI exploring similar pairings between common non-humanoid
signaling modalities, we can draw insight from other areas of human-machine communi-
cation.
Multimodal signals are commonly utilized in product design, from smartphone alerts
to automobile indicators. It is common to pair a visual and auditory signal. The utiliza-
tion of signals that require perception by two different senses can increase the likelihood
that a signal is noticed and decoded correctly.
However, successful multimodal communication requires two or more signals em-
ploying different modalities to be perceived nearly equivalently. Generating these analo-
gous signals is still an ongoing area of research. A related problem in robotics is motion retargeting, which aims to recreate the same motions on different robot plat-
forms with varying degrees-of-freedom and constraints (Gleicher, 1998). Animators also
encounter these issues but have found ways to generate highly expressive motion, ges-
ture, and gaze on non-humanoid characters (e.g., Pixar lamp).
Alternatively, information can be broken up and divided between signals. This con-
cept is similar to the approach of sequential signaling, as discussed in Chapter 4, but
requires even greater cognitive effort, as the human receiver must perceive and decode
two signals simultaneously. While this is often done in human communication, humans
are not as adept at recognizing and reacting to analogous cues on non-humanoid modal-
ities (e.g., light).
Although the signal parameters that exist for one modality are often not present on
another, it may be possible to find pairings between parameters by utilizing similar perceptual phenomena. For instance, auditory signals possess a tempo, or the rate of movement in the sound. Visual signals can also alter parameters over time to create a similar speed or rate of change. These changes can also elicit perceptions of movement, which may provide insight into their pairing and manipulation.
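As a concrete illustration of such a pairing, the sketch below maps an auditory tempo (in beats per minute) onto a matching blink period for a light signal so that both channels share the same perceived rate of change. The mapping and duty cycle are our own illustrative assumptions rather than an established guideline.

def blink_period_from_tempo(bpm):
    # One blink cycle per beat: a 120 bpm sound pairs with a 0.5 s blink period.
    return 60.0 / bpm

def blink_schedule(bpm, duration_s, duty_cycle=0.5):
    # (on_time, off_time) pairs for a light signal matched to the auditory tempo.
    period = blink_period_from_tempo(bpm)
    on = period * duty_cycle
    n_cycles = int(duration_s / period)
    return [(on, period - on)] * n_cycles

print(blink_period_from_tempo(120), blink_schedule(120, 2.0))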
This study also examined signaling for a specific and important internal robot
state, needing help. In many cases, the robot’s capabilities and knowledge are not sufficient
for completing the task, either individually or in a collaboration (Rosenthal and Veloso,
2012). Hence, assistance from humans may be critical to the robot’s success, making it
important for a robot to be capable of signaling to co-located humans when it requires
help (Fischer et al., 2014, Hüttenrauch and Eklundh, 2003, Rosenthal and Veloso, 2012).
The collaborative context is unique in that eliciting help is critical to not only the
robot’s success, but the human’s as well. When requesting help from a bystander,
the robot must be careful to provide a clear, polite request that the human can under-
stand (Fischer et al., 2014, Rosenthal and Veloso, 2012). When working on a collaborative
task, however, shared goals and knowledge can enable each party to request help in a
more succinct and efficient manner.
Asking for help can be broken down into three phases: 1) get the responder’s attention,
2) alert the responder that help is needed, and 3) put forth the request (Fischer et al., 2014,
Hüttenrauch and Eklundh, 2003, Saulnier et al., 2011). Since the semantics of the request
are often dictated by the scenario, we focus on only the first two phases. In this scenario,
the robot seeks to convey with its signal that feedback, or a response, is needed from the
human receiver.
As discussed in Chapter 4, robot communication is often in the context of motion.
However, base motion is often not well suited for signaling as it interferes with the robot’s
task and can require significant time to observe the robot’s entire trajectory (Saulnier
et al., 2011). Instead, we employ a multimodal signal in conjunction with the robot’s
approaching motion for requesting help in this study (S3). These multimodal signals
consist of simultaneous visual (similar to light) and auditory signals.
8.3 Experimental Design [S3]
To better understand how auditory and visual signals can be combined, we con-
ducted a within-subjects user study in which participants collaborated with the iRobot
AVA platform on a physical human-robot collaborative task. In addition, this study also
aimed to understand how robot signaling behaviors should be altered to account for
different human states, a key component of the communication framework presented
in Chapter 3.
8.3.1 Hypotheses
As a first step towards these goals, we created a set of signals to request help using a visual
display to mimic a binary visual signal and an auditory signal. The signal parameters of
both individual signals are manipulated to convey different levels of urgency of the help
request. We predict that the request scenario (i.e., the signal urgency and the responder’s
availability) will affect the collaboration both objectively and subjectively. We also expect
that participants will prefer that the robot balance the annoyance of the signal with the
urgency of the request.
H1: Design of Nonverbal Signals. Non-humanoid robots can effectively utilize a combined
visual and auditory signal to create nonverbal signals that convey urgent and non-urgent requests
for help.
H2: Objective Request Metrics. The request scenario significantly affects the time it takes for
the human collaborator to react and respond to the request.
H3: Subjective Request Metrics. The human collaborator’s perception of the help interaction
is affected by the request scenario.
8.3.2 Collaborative Task
Several constraints were considered when designing a collaborative task to investigate
robot help requests. First, as the focus of the study is on how to ask for help, participants
must be motivated to assist the robot. To ensure this, the participant’s success should
depend on the robot’s ability to perform its part of the collaborative task. Second, to
create consistency, we minimized robot error by 1) limiting the robot to simple actions
within a confined area and 2) reducing the need for accurate perception. Third, the task
should be realistic. Simulating a real world task creates a more authentic experience that
motivates performance and increases willingness to collaborate.
To satisfy these constraints, we chose a task in a food service scenario: the participant
and robot work together to collect and input meal orders. Since the study took place in
a university research lab, participants were told the orders were from the surrounding
offices on the same floor. In order to remove potential task inconsistencies, the robot
Figure 8.1: A diagram of the experimental setup (left) and a participant interacting with the AVA mobile base (right).
and participant are given separate assignments: the robot drives around, collecting meal
orders, and the participant inputs orders dropped off by the robot into a computerized
system.
The robot also periodically requests help from the participant in retrieving a name
or room number from the building directory. The robot is idle only while waiting for
the participant to take a batch of meal orders from it or to answer its help request. The
participant is idle and available when they have no new orders to put in the system.
Figure 8.1 shows a diagram of the experimental setup. The ordering workstation
consists of a single table, chair, and computer located in the middle of a room with
entrances to both the right and left. During the study, the robot enters the room on either
side to perform one of three actions: drop off a batch of meal orders, request help from the
participant, or pass between the exterior rooms.
When dropping off orders or requesting help, the robot moves via a straight line
trajectory (with a slight arc for naturalness) from the entrance to the “interaction area” in
front of the participant’s workstation. After the interaction is complete, the robot leaves
the room using the same trajectories. The robot never moves behind the workstation
to prevent potentially startling the participant. We chose these paths as the distance
to the workstation is short and they are consistent with prior work in robot approach
behaviors (Dautenhahn et al., 2006, Hüttenrauch and Eklundh, 2003).
The robot employed in this study was the mobile base of the iRobot Ava (Figure 8.1,
right). The base is holonomic and automatically avoids both static and dynamic obstacles.
Two additions were made to the robot for the experiment: 1) a tablet was mounted on a
pole above the base for interacting with participants during help requests, and 2) a small
pink container was added to the top for holding meal orders.
When requesting help, the robot starts playing the help signal as it enters the room.
The visual portion of the signal was displayed on the tablet and the sound played from
its speakers. The robot always approached the participant with the tablet facing them.
After the robot stops at the workstation, the participant touches the tablet on the robot,
ending the help signal and displaying the robot’s help request. The participant inputs
their response on the tablet and the robot drives away. To drop off meal orders, the robot
stopped in front of the workstation and waited for the participant to retrieve the batch of
meal orders from the pink container.
8.3.3 Procedure
As participants entered the experiment room, they were given a brief overview of the
study and shown the robot. After obtaining informed consent, a pre-study questionnaire
was administered. The experimenter then explained in detail the task that participants
were about to perform with the robot and demonstrated how to use the meal order
system.
Participants were told that the goal of the study was to better understand how humans
and robots can work together. In order to motivate participants to perform the task well,
they were also told they could earn a 50% bonus (to their compensation) based on their
performance during the task and the overall team performance. Individual performance
was defined as the average order input time, average batch input time, and correctness of
orders. As we wanted this task to be representative of real world scenarios that involve
balancing individual needs with those of the team, participants were told individual
performance affected the bonus more than overall task performance. After assisting the
robot, participants rated the help interaction with the robot. At the end of the task, we
administered a post-study survey.
8.3.4 Manipulated Variables
We manipulated two variables: the availability of the human responder and the urgency of
the robot’s help request signal.
Responder availability was manipulated to be either busy or available. The responder was
busy if the participant was occupied by inputting a batch of meal orders. If the participant
was waiting for the robot to drop off new orders, they were available. We utilized a human-
in-the-loop to identify availability in order to minimize errors. The experimenter utilized
a live video stream to send the robot into the room at the appropriate times.
Urgency of the help signal was manipulated to be low, medium, or high. Each help
signal consisted of a synchronized visual and auditory binary signal. This resulted in a
simple visual signal that flashed between white and shades of red (i.e., blinking pattern)
and an audio signal of a sine wave intercut with silence (beeping sound). Red is typically
associated with errors, making it a natural color choice for signaling for help.
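To make the structure of these signals concrete, the following minimal Python sketch generates one cycle of such a binary blink/beep signal from a pitch, a period, and an on-color; the function and parameter names are illustrative and are not those of the study software.

import numpy as np

def binary_help_signal(pitch_hz, period_ms, on_color, sample_rate=44100, duty=0.5):
    """One cycle of a synchronized blink/beep help signal.

    The beep is a sine wave at pitch_hz that sounds for the first `duty`
    fraction of the period and is silent for the rest; the visual channel
    shows on_color during the same interval and white otherwise.
    """
    n_samples = int(sample_rate * period_ms / 1000.0)
    t = np.arange(n_samples) / sample_rate
    tone = np.sin(2 * np.pi * pitch_hz * t)
    gate = (t < duty * period_ms / 1000.0).astype(float)  # on, then silent
    audio = tone * gate
    # Blink schedule as (duration_ms, rgb) segments for the display.
    blink = [(duty * period_ms, on_color), ((1 - duty) * period_ms, (255, 255, 255))]
    return audio, blink

# Example: a 1000 Hz beep and red flash repeating every 1500 ms.
audio_cycle, blink_cycle = binary_help_signal(1000, 1500, (255, 0, 0))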
A simple binary signal was chosen for several reasons. First, similar signals are
already commonly used across a variety of domains and scenarios, including medical
alarms, fire alerts, weather alerts, and machine errors (Edworthy, 2013, McNeer et al.,
2007, Sirkka et al., 2014). Second, such signals have not been extensively explored in HRI
or collaborative robotics, making them potentially valuable tools. Lastly, varying other
factors, such as the timing or the shape of the sound wave, would have exponentially
increased the number of factors, making the study intractable.
In comparison to the study described in Chapter 6, this study employed explicit au-
ditory cues which are meant to draw the human’s attention. Since faults and errors often
require a higher level of effort and thought from a human, a more explicit cue can better
obtain the human’s attention and is often required for safety (e.g., smoke alarm).
To determine the three variations of help signal, we conducted a pilot study in which
participants were exposed to several auditory only and visual only signals through a
remote survey. These signals varied in “intensity” (i.e., pitch or color) and period. The
number of colors tested was limited to those in the pink-to-red range that were different enough to be distinctly perceived from a distance. Sound pitch choices were sampled from the range
known to be comfortable for human hearing and were selected to be easily differentiated
from one another (Yost, 1994). The signal period (or pattern frequency) test values were
chosen such that change in intensity was perceivable for the smallest period and that the largest period allowed for several cycles to pass during the robot’s approach.

Figure 8.2: Participant ratings of urgency and annoyance for auditory only signals (top) and visual only signals (bottom) in the pilot study. Each panel plots mean 5-point Likert ratings of urgency or annoyance against signal period (ms); the auditory panels compare pitches of 500, 700, 1000, and 2500 Hz, and the visual panels compare dark, medium, and light red intensities.
A total of 15 participants (8 males, 7 females; ages 23-37, M = 27.67, SD = 3.77) were
recruited from the local community. The results of the survey are shown in Figure 8.2.
Participants rated the urgency and annoyance of each of the sound and visual signals on
a 5-point Likert scale.
A repeated measures analysis of variance (ANOVA) on the urgency ratings showed a significant effect for sound pitch (F(3,297) = 5.09, p = 0.002) and period (F(4,296) = 133.81, p < 0.001), as well as visual signal intensity (F(2,298) = 208.94, p < 0.001) and period (F(3,297) = 121.05, p < 0.001). Likewise, an ANOVA on the annoyance ratings also showed a significant effect for sound pitch (F(3,297) = 70.26, p < 0.001) and period (F(4,296) = 25.66, p < 0.001), as well as visual signal intensity (F(2,298) = 147.67, p < 0.001) and period (F(3,297) = 107.82, p < 0.001).
From these results, three levels of sound pitch and signal period were chosen. The low
signal consisted of a very light red color and a 500 Hz sine wave at a period of 3000 ms.
The medium signal consisted of a moderate red color and a 1000 Hz sine wave at a period
of 1500 ms. The high signal consisted of a very dark red color and a 2500 Hz sine wave at
a period of 250 ms. We restricted the minimum period to 250 ms, as survey participants complained that signals with shorter periods were extremely disorienting and uncomfortable to look at.
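For reference, the three chosen signal levels can be summarized as a small lookup table, as in the sketch below; the color names are approximate descriptions of the red shades used.

# Help-signal levels chosen from the pilot study: each level pairs a red
# shade with a sine-wave pitch and a shared blink/beep period.
HELP_SIGNALS = {
    "low":    {"color": "very light red", "pitch_hz": 500,  "period_ms": 3000},
    "medium": {"color": "moderate red",   "pitch_hz": 1000, "period_ms": 1500},
    "high":   {"color": "very dark red",  "pitch_hz": 2500, "period_ms": 250},
}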
8.3.5 Participant Allocation
A total of 30 participants (19 males, 11 females; ages 18-35, M = 25.53, SD = 5.42) were recruited from the local community. Participants rated their familiarity with robots at an average of 2.63 (SD = 1.06) on a 5-point Likert scale.
The experiment used a within-subjects design that enabled participants to compare
the different help request scenarios. Participants were not told that there were different
scenarios in order to avoid biasing their reactions to the different signals. The order of
the conditions was fully counterbalanced to control for ordering effects. To eliminate
the novelty effect of the robot, the robot dropped off orders and passed through the
experiment room several times before requesting help.
8.3.6 Dependent Measures
The dependent measures included both objective and subjective measures for evaluating
the effectiveness of the help requests and participants’ perceptions of the interactions.
The objective measures consisted of reaction time, response time, and gaze duration.
The reaction time is the amount of time that passes from when the robot arrives at the
workstation with a help request until the participant starts moving to respond to the
robot. The response time is the the total amount of time it takes for the participant to
respond to the request, starting from when the robot arrives with the help request un-
til the participant completes their response via the tablet interface. The gaze duration is
the percentage of the robot’s approach, from when it enters the room to when it arrives
at the workstation, that the participant’s gaze is focused on the robot. Each of these
measures was obtained from video data taken at three points in the room, coded by the
experimenters.
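As an illustration of how these measures reduce to a few coded timestamps, the sketch below computes them from hypothetical event times; the variable names are placeholders, and in practice the values were coded from video by the experimenters.

def objective_measures(t_enter, t_arrive, t_start_move, t_response_done, gaze_intervals):
    """Compute the three objective measures from coded timestamps (in seconds).

    gaze_intervals: list of (start, end) times during the approach in which
    the participant's gaze was on the robot.
    """
    reaction_time = t_start_move - t_arrive
    response_time = t_response_done - t_arrive
    approach = t_arrive - t_enter
    on_robot = sum(min(end, t_arrive) - max(start, t_enter) for start, end in gaze_intervals)
    gaze_duration = on_robot / approach  # fraction of the approach spent looking at the robot
    return reaction_time, response_time, gaze_duration

# Example with made-up timestamps.
print(objective_measures(0.0, 8.0, 9.2, 14.5, [(1.0, 3.0), (6.5, 8.0)]))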
The subjective measures consisted of 5-point Likert scale ratings of each help interaction for the following measures: the robot’s ability to get the participant’s attention, the urgency of the robot’s request, the participant’s annoyance with the help signal, and the overall interaction quality. Urgency and annoyance ratings were used to better understand how humans per-
ceive the signals in the context of a collaborative task versus in a vacuum, as in the initial
survey. The ability to get people’s attention is also a common metric for alarms and char-
acteristic of notification systems (Fischer et al., 2014, McNeer et al., 2007, Sirkka et al.,
2014). Finally, participants were asked to rate the overall interaction.
After the study, a post-study questionnaire and interview were administered. Both
sets of questions included measures about the task, working with the robot, and the help
signals.
8.4 Analysis
8.4.1 H1- Design of Nonverbal Signals
The first hypothesis investigated whether a simultaneous visual and sound signal is ef-
fective for indicating a need for help and a level of urgency for that request.
As a manipulation check, participants were asked in the post-study survey to de-
scribe the robot’s method for requesting help in detail. 100% of participants noticed that
the robot used different signals for requesting help. Most participants, however, only
mentioned two sets of help signals, urgent and non-urgent. After further probing, we
found that only 8 out of 30 (26.67%) participants noticed the difference between low and
medium urgency signals during the task, indicating that the manipulation between those
signals failed. When participants were shown the signal during the post-study survey,
however, all noticed three distinct signals. This suggests that the perceptual difference
between low and medium levels is not large enough for participants to differentiate be-
tween the two, when concentrating on a different task (Maljkovic and Nakayama, 1994).
This confirms that more research is needed to understand signal perception in more real-
istic conditions (Chapter 6).
After verifying the manipulation between high and low/medium urgency, partici-
pants were asked to interpret the difference between signals. All participants reported
that the different signals represented different levels of urgency.

Figure 8.3: Objective Collaboration Metrics: means (top) and ANOVA results (bottom) of reaction time, response time, and gaze duration by availability and urgency.

                         Reaction Time                     Response Time                    Gaze Duration
Availability             F(1,179) = 1228.66, p < 0.001     F(1,179) = 251.50, p < 0.001     F(1,179) = 15154.62, p < 0.001
Urgency                  F(2,178) = 300.74, p < 0.001      F(2,178) = 81.78, p < 0.001      F(2,178) = 183.04, p < 0.001
Availability, Urgency    F(5,175) = 238.31, p < 0.001      F(5,175) = 46.38, p < 0.001      F(5,175) = 100.77, p < 0.001

Participants were also
asked whether they used the auditory or visual signal more to gauge the urgency of the
request. 60% of participants responded that the visual signal, especially its color, was the
best indicator of urgency. Only 16.67% of participants chose sound as the better indicator,
while 23.3% chose both. Most participants stated the sound got their attention and alerted
them to the robot needing help. They could then glance over to the screen and see what
color was flashing and determine the urgency of its request.
8.4.2 H2- Objective Collaboration Metrics
The second hypothesis examined whether the request scenario affects the time it takes
for participants to react and respond to the help request. Repeated measures ANOVAs
on participants’ reaction time, response time, and gaze duration demonstrated a signifi-
cant effect for both responder availability and signal urgency, confirming this hypothesis.
Furthermore, an interaction effect between availability and urgency was also seen for all
three measures. The results of these ANOVAs can be seen in Figure 8.3, bottom.
Figure 8.3, left shows a graph of the mean reaction time by condition. As expected,
participants reacted quicker when they were available with no meal orders to input. Fig-
ure 8.3, middle shows a similar result: participants not only started to move towards the
robot faster, but actually completed the help prompt more quickly when they were busy
inputting orders. The post-study survey also supported these findings, as several par-
ticipants commented that they attempted to help the robot more quickly when they had
remaining meal orders to input.
We also analyzed participants’ gaze behavior and found that when busy, participants
rarely glanced at the robot (Figure 8.3, right), whereas when available, their gaze rarely
left the robot. In the post-study survey, participants commented that in the available
state, they wished the robot moved more quickly as they became impatient waiting for it
to arrive at the workstation.
As shown in Figure 8.3 bottom, the urgency of the signal also significantly affected
participants’ response. Participants reacted and responded more quickly when the robot
used a higher urgency signal. Since most participants grouped the medium and low
urgency signals together as not urgent, their reaction times were closer together. The
response times showed a similar but less pronounced trend, as it took a certain amount
of time to go through the robot’s help prompt.
In the post-study survey, participants were asked whether they treated the various
help signals differently. 90% of participants said they treated the high urgency signal dif-
ferently than the other signals. Participants immediately responded to the high urgency
signal but waited until finishing at least the current order for the low or medium urgency
signals.
Although most participants reported responding to the low and medium urgency
signals the same way, we found a significant difference in their reaction and response
times. This stands in contrast to the higher urgency signal in which many participants
stated they consciously decided to help the robot immediately, as it seemed better for the
task. This suggests that even in this simplified scenario, the factors affecting participants’
decisions are not well understood.
Participants also gazed at the robot more, as shown in Figure 8.3 right, when a higher
urgency signal was present. Interestingly, in the busy state, participants glanced more
often at the robot when it used the high urgency signal (29%) than the lower urgency
signals (5%,8%). In the post-study survey, participants commented that they kept looking
to see how far away the robot was to prepare for its arrival.
Figure 8.4: Subjective Collaboration Metrics: means (top) and ANOVA results (bottom) of participants’ ratings of attention and urgency.

                         Attention                        Urgency
Availability             F(1,179) = 11.93, p = 0.001      F(1,179) = 4.58, p = 0.034
Urgency                  F(2,178) = 259.24, p < 0.001     F(2,178) = 424.38, p < 0.001
Availability, Urgency    F(2,178) = 0.02, p = 0.981       F(2,178) = 0.18, p = 0.833
8.4.3 H3- Subjective Interaction Metrics
The last hypothesis examined how participants perceive the help interaction based on
the request scenario. Participants were asked to rate each help interaction for the robot’s
ability to get their attention, the urgency of the request, the annoyance of the signal, and
the quality of the interaction on a 5-point Likert scale. The results are shown in Figure 8.4
and Figure 8.5.
We performed repeated measures ANOVAs on the ratings and found that request
urgency significantly affected all four measures (Figure 8.4 and Figure 8.5, bottom). Re-
sponder availability only had a significant effect on ratings of attention, urgency, and an-
noyance. No significant effect was found for the interaction quality. An interaction effect
between availability and urgency was only found in the ratings for interaction quality.
In the post-study survey, all participants rated the signals used by the robot as ac-
ceptable. However, most participants also suggested changes to the non-urgent signals
to make them more discreet and less obtrusive. Participants were most satisfied with the
high urgency signal and commented that it made them “react immediately” and would
work well in a collaborative task, particularly more urgent ones. Only 3 participants
mentioned being bothered that the robot interrupted them while they were busy.
On average, participants rated the collaboration experience with the robot as 4.17
(SD= 0.52) and the robot as a partner as 4.17 (SD= 0.70) on a 5-point Likert scale. How-
ever, participants’ comments indicated that they believed they would eventually grow
weary of the robot if it were not able to modify its behavior to act more intelligently over time (e.g., learn not to bother users, learn from its previous help requests).

Figure 8.5: Subjective Collaboration Metrics: means (top) and ANOVA results (bottom) of participants’ ratings of annoyance and interaction quality.

                         Annoyance                        Interaction
Availability             F(1,179) = 10.08, p = 0.002      F(1,179) = 0.21, p = 0.648
Urgency                  F(2,178) = 233.85, p < 0.001     F(2,178) = 148.63, p < 0.001
Availability, Urgency    F(2,178) = 0.21, p = 0.814       F(2,178) = 148.63, p = 0.014
8.5 Discussion
The goal of this chapter is to understand how non-humanoid robots can employ more
complex communication through multimodal signaling for HRI. Few works in HRI have
explored multimodal signal design for non-anthropomorphic modalities. We conducted
a user study, in the context of a human-robot collaboration task, that investigated the
design of a multimodal visual and auditory signal to request help from a human. Multi-
modal signals can be a valuable tool for robots to communicate with, particularly as the
modalities utilized by non-humanoids are more abstract.
This study (S3) confirmed that a simple auditory signal and visual signal can be ef-
fectively combined to request assistance from a human observer, with different levels of
urgency (H1). While auditory signals are frequently modulated to convey urgency, there
is less research in this area for light and similar visual signals. By pairing complementary
signal parameters and modulating these parameters together, we were able to successfully
create a multimodal signal. This study also confirmed that signals that have conflicting
modulations within these complementary parameters (e.g., beep frequency and visual
blinking frequency) can cause confusion, further suggesting that these parameters may
be naturally associated.
Results also showed that participants not only reacted faster to a more urgent signal
but responded to the request for help quicker as well, confirming our second hypothesis
(H2). This suggests that the designed help signal is an effective medium for conveying
the robot’s needs to the human collaborator.
A manipulation check also showed that participants only consciously perceived two
levels of urgency during the task. Their reaction time, response time, and gaze duration,
however, show a clear and significant difference in how they treat each of the three signals.
Further complicating these results are many participants’ comments indicating that they
actively chose to treat urgent and non-urgent signals differently due to the nature of the
collaborative tasks. This suggests that participants’ reactions to the signals are governed
by both conscious and subconscious cognitive processes. Hence, further exploration into
how humans react to different perceptual phenomena when interacting with a robot can
yield essential knowledge for designing robot behavior.
A main goal of this study was to better understand how human users want a robot to
behave during collaboration. While we initially assumed participants would be somewhat
tolerant of the high urgency signal due to its necessity in critical situations, we were sur-
prised to find that participants were not only tolerant but thought the signal was highly
appropriate, despite its annoyance. They found the high urgency signal to be "great at
getting their attention" and "good, when used properly." Only two participants said they
would change anything about the high urgency signal.
As expected, the low and medium urgency signals were less liked, despite being rated
as less annoying than the high urgency signal. While these findings support our notion
that the human collaborator’s perception of the interaction is affected by the request scenario
(H3), we still are limited in our understanding of how these and other factors of the signal
will affect the interaction.
These findings suggest that although the design of the help-seeking multimodal signal
was effective at altering human collaborators’ responses, it is important to consider the
robot’s request scenario when planning its communication behavior. This supports our
inclusion of this information as a parameter of the proposed communication framework
(Chapter 3).
Participants’ dissatisfaction with the simplicity of the multimodal signals in S3 also suggests that HRI researchers must consider long-term effects when designing robot be-
havior. The ability to adapt and evolve its behavior over time is a critical component for
creating an intelligent robotic agent and an eventual goal of our communication frame-
work.
8.6 Summary
In this chapter, we explored the design of multimodal signals for HRI. Since human com-
munication often utilizes several communication channels simultaneously, it is important
for the robot to be able to effectively combine signals for interaction. Multimodal signals
can increase the effectiveness, saliency, and capacity of a communication, as well as over-
come environmental interference. The work presented in this chapter is a first approach
towards understanding how to combine non-anthropomorphic communication channels,
such as light and sound. This study (S3) directly informs the model of our communi-
cation framework as it requires a set of nonverbal signals to comprise its action space.
Our findings also provide important insights for balancing human and robot preferences
during interaction. In the next chapter, we describe an application of the communication framework proposed in Chapter 3.
9. Application of the Communication Framework
This chapter presents an application of our proposed communication framework (Chap-
ter 3) to a human-robot collaborative task. We implement the model described in Sec-
tion 3.1 with the goal of validating its usage for human-robot communication. We employ
a simulated human-robot interaction (HRI) platform to learn policies for a robot’s com-
munication behaviors from interaction with human users. Finally, we present the results
of this experiment and discuss their implications on HRI research.
9.1 Introduction
The primary goal of this thesis work is enabling robots to intelligently communicate with
humans using nonverbal signals. In Chapter 3, we proposed a computational framework
that formulates robot communication as a decision-making problem. The robot must
choose when and how to communicate with humans while dealing with the inherent
uncertainties of the world.
A key assumption of this framework is that humans will not always focus their at-
tention on or be willing to interact with a robot. These instances can occur purposefully
if the human is busy or annoyed with the robot’s interruption, or inadvertently due to
interference in the signal’s transmission or the human’s unavailability. To overcome these
challenges, we incorporate models of the human, robot, and environment into our for-
malism for planning the robot’s communication.
This information is encoded in the state and action space of the framework’s model
(Section 3.1). This thesis described several user studies (S1-S3) and applications (E1-E3)
exploring the design space of nonverbal robot signals to inform the action and state spaces
of this model. The findings of these studies also offered insight into how humans believe
robots should behave during communication, an important component of the model’s
reward function.
In the initial application (E3) of the communication framework (Chapter 7), we found
shortcomings in the implementation of the model, particularly in its reward function
(Section 7.2). The implementation also employed a reduced model that consisted of a
set of one-state Markov Decision Processes (MDPs) with stationary reward distributions.
This assumption was made because only one type of information (the robot’s location)
was communicated during the experiment.
To address these issues, we apply the communication framework to a simulated
human-robot collaborative task. It takes considerable time and effort for researchers to
enable a human and a robot to interact in the real world. Many of the potential techniques
for solving the framework’s MDP model also require a high number of interactions. The
use of a simulation decreases the overhead for running experiments, enabling the simu-
lated robot and real human users to interact more frequently.
In this chapter, we describe an implementation of the communication framework and
its validation with human users. The simulated robot continuously learns more optimal
policies for its nonverbal communication using model-free reinforcement learning. We
present the results of this experiment and discuss their implications on HRI research.
9.2 Related Work
Mediating the flow of communication has been studied extensively in human-computer
interaction (HCI) for building systems that provide notifications or alerts. Such systems
often attempt to transmit notifications in a manner that provides the most value to hu-
mans. This cost based approach requires the system to probabilistically infer the human
user’s state and use this information to determine whether a communication from the
system will be disruptive or beneficial.
While several works discuss this cost-based approach at a high level, only a few ex-
plore in depth how a utility function can be constructed to determine the gain or cost of a
communication. Instead, more focus is often given to the research challenges associated
with estimating a human’s interruptibility or designing notifications with varying levels
of saliency or attention from the human user.
Methods for estimating a human’s interruptibility enable systems to weigh the hu-
man’s current state when deciding whether to transmit a communication by assigning a
higher cost to communication actions when the human is less interruptible. Interruptibil-
ity can be defined by several factors relating to a human’s current task and engagement.
Availability, or whether the human is already engaged in a task, is one of the easiest
metrics to employ. However, this approach ignores the variance in potential tasks, which
range in their physical and cognitive effort and make certain tasks more sensitive to
interruption.
The temporal dynamics of a human’s current task also impact interruptibility. Certain
tasks that are more time sensitive are worse for interruptions since there can be critical
delays that cause negative impacts to the person or task. Prior work exploring which moments of a task are best suited for interruptions has found that certain breakpoints enable easier resumption of a task at a later point (Adamczyk and Bailey, 2004, Borst et al., 2015, Cutrell et al., 2001). As a result, models for predicting task breakpoints have been well researched (Iqbal and Bailey, 2008). The content of the interruption has also been shown to be important, as irrelevant communications take longer to process and cause humans greater annoyance (Cutrell et al., 2000).
Another challenge is that humans are able to multitask, either performing several
tasks simultaneously or quickly switching between tasks. To enable consideration of
these multiple factors, some systems attempt to reason about humans’ attentional state
using Bayesian inference (Horvitz et al., 1999). Different costs can then be assigned to
interruptions in each attentional state. This method offers flexibility in that the attentional
state can encapsulate different tasks, breakpoints, and other variables related to human
interruptibility. The primary drawback is that it requires potential attentional states to be
known in advance and assigned a cost, which may be challenging for robots that operate
in unstructured settings. Noisy observations, such as a robot’s sensor measurements, can
also lead to more inaccurate state estimations.
Since humans can be in different states of attentional awareness, communication ac-
tions need to also take on varying levels of saliency, urgency, and other properties Bliss
et al. (1995), McCrickard and Chewar (2003), McCrickard et al. (2003a), Pousman and
120
Stasko (2006). These properties can also be used to express information about the sys-
tem’s desired response. There has already been significant research on perceived urgency,
or how immediately action is required by the human. Higher saliency communication sig-
nals are often used for more urgent situations since they elicit more attention and a faster
response from human users.
Similar to when the human is in a more involved task state, communication actions
that draw greater attention from the human should result in higher costs. The cost of
a communication has two components: the cost of the human attending to the commu-
nication signal and the cost of the human’s subsequent response (Horvitz et al., 1999). Previously, these costs have been measured by the time each takes away from the human’s main task (Horvitz et al., 1999). Thus, if a human ignores the communication, only
the first component exists. These costs can then be weighted by those associated with the
human’s state.
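A minimal sketch of this cost-based reasoning is given below, assuming a known belief over attentional states and hand-set costs; all names and values are hypothetical.

def expected_cost_of_notifying(p_state, attend_cost, response_cost, p_respond):
    """Expected cost of a notification under the two-component view: the human
    always pays the cost of attending, and pays the response cost only if they
    respond; both are weighted by the belief over attentional states."""
    return sum(p * (attend_cost[s] + p_respond[s] * response_cost[s])
               for s, p in p_state.items())

# Hypothetical two-state example: the human is probably busy.
p_state = {"busy": 0.7, "idle": 0.3}
attend_cost = {"busy": 2.0, "idle": 0.5}
response_cost = {"busy": 5.0, "idle": 1.0}
p_respond = {"busy": 0.4, "idle": 0.9}

value_of_message = 3.0  # expected benefit of delivering the information now
if value_of_message > expected_cost_of_notifying(p_state, attend_cost, response_cost, p_respond):
    print("notify now")
else:
    print("defer")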
Few works in HRI have employed similar techniques for mediating communication.
Much of the past research has focused on generating effective communication actions
given a robot’s unique embodiment (Cha et al., 2018). A select number of these works investigated communication signals specifically for minimizing interruptions (Rousseau et al., 2013, Saulnier et al., 2011). The ability to effectively communicate under uncertainty is important for robots that collaborate with or require assistance from humans (Cha and Matari´ c, 2016, Rosenthal and Veloso, 2011). However, there is very limited work in HRI employing the models and findings from notifications and interruptibility research in HCI (Trafton et al., 2012).
In this work, our goal is to enable robots to probabilistically reason about communi-
cation actions to act as a more thoughtful interactor. We build off past work in HCI by
addressing the research problem of designing an effective utility function for deciding
when and how a robot should communicate. We employ only nonverbal signals in this
work as they are an important part of communication and facilitate fluid, effortless coordi-
nation. Since robots are embodied agents that humans attribute more complex properties
to, we also aim to further investigate how past findings in mediating communication are
affected by the use of a robot.
9.3 Model and Implementation
In this section, we describe an application of the communication framework, including
its model and its implementation for a simulated human-robot collaborative task. We
employ small, discrete state and action spaces to reduce the amount of data needed to
solve the MDP model.
9.3.1 State Space
In the previous application of the communication framework (E3), only information about
the environment was included in the state space of the MDP model (Section 7.2). Con-
versely, this application (E4) focuses on the other proposed components of the model’s
state space since it takes place in a simulated environment.
Robot State: The first component of the model’s state space (s_R) involves information relating to the robot. We can separate s_R into information regarding the robot and information regarding its communication. We use the following variables to describe s_R:
Urgency- Urgency is defined as requiring immediate attention and is related to
criticality, priority, and safety. The robot’s communication can convey three levels
of urgency: low, medium, and high. Prior work, including our study in multimodal
signaling (S3), has found that urgency is an important property that affects how a
human will react to a communication. This property is also critical for the robot’s
operations as high urgency signals can be used to indicate dangerous situations for
either the robot or co-located humans.
Duration- The robot’s communication and the human’s subsequent response have
an expected duration of either short or long. Prior research in notification systems
employed time (for the signal and the human’s response) as an important factor for
deciding when to communicate (Horvitz et al., 1999). If significant time is required
for the human to address a communication, the robot can choose to convey its
request when the human has greater availability.
Effort- The expected effort that the robot’s communication requires of the human to
resolve is either low or high. If the robot interrupts a human who is already engaged
in a task, the human’s ability to return to this task depends on the time and effort
involved in resolving the communication.
Operational- The final variable describes the robot’s operational state, or whether
it is capable of performing tasks and communicating. In this implementation, the
robot can either be operational or not operational.
Since the human’s state is difficult to directly observe and may deviate from models of
expected behavior, the duration and effort variables are defined as the expected value if
the human behaves as the robot intends.
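For concreteness, one possible encoding of this robot-side state is sketched below; the names are illustrative rather than those of the actual implementation.

from dataclasses import dataclass
from enum import Enum

class Urgency(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3

class Duration(Enum):
    SHORT = 1
    LONG = 2

class Effort(Enum):
    LOW = 1
    HIGH = 2

@dataclass(frozen=True)
class RobotState:
    """s_R: the robot-side component of the model's state."""
    urgency: Urgency      # urgency of the information to be communicated
    duration: Duration    # expected duration of the communication and response
    effort: Effort        # expected human effort to resolve the request
    operational: bool     # whether the robot can still act and communicate

example = RobotState(Urgency.HIGH, Duration.SHORT, Effort.LOW, operational=True)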
Human State: The second component of the model’s state space (s_H) involves information relating to the human. In this implementation, this information is encoded into one
variable, the human’s receptiveness to communication, which can be low or high. We
assume that a human’s receptiveness is dependent not only on their current state, but also on their past history of interactions with the robot.
Prior work often treats interruptions as independent events; however, humans likely
take into account past interactions for new interruptions. If the robot has interacted with
the human recently or many times in the past, we predict the human’s receptiveness to
additional interruptions by the robot will decrease. However, as time passes, the effects
of these interactions on the human’s receptiveness are likely to lessen. This assumption
mimics cognitive models of task interruption and its effects on human memory and goals.
Therefore, we define receptiveness a summation of the human and robot’s previous com-
munications, as shown below.
h
r
=
n
å
i=1
e
t nt
i
5
a
i
(9.1)
We calculate the value of each communication as the product of its expected duration
and the human’s availability at that time. Availability is determined by the moment
of the robot’s communication. If the human is between tasks or at a breakpoint in the
task sequence, they are more available than if they are in the middle of an activity. The
receptiveness value is thresholded to determine its discrete category.
We eliminated certain states that rarely occur in the world (e.g., high urgency and low
effort scenarios) and were left with 16 starting states. In addition to these states, there is
a state in which the human and robot are interacting, a negative state in which the robot does not receive a response and becomes inoperational, and a terminal state when the interaction
ends.
Figure 9.1: Interactions between the human and robot are treated as episodes as proposed in Chapter 3.
9.3.2 Action Space
The action space of the model consists of four communication actions: no action, low
salience signal, medium salience signal, and high salience signal. We constrained the set of
available nonverbal signals to just a small number of signals derived from the findings of
our previous studies in signal design (S1-S3).
The signals are composed of a blinking light on the robot and a simultaneous beeping
sound, similar to the signals used in Chapter 8. The main difference is we employed three
different light colors (green, yellow, red) from the findings of Chapter 4. As the saliency
of the signal increases, the frequency of the beeping and blinking increase. The robot can
also choose not to use any communication signal as a case of deferring communication
action (Horvitz and Apacible, 2003).
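The resulting action space can be written down compactly, as in the sketch below; the blink/beep rates are placeholders that only illustrate the monotone increase with saliency.

# Communication actions available to the robot; "none" defers communication.
# Blink and beep share a rate that rises with saliency; the rates here are
# placeholders, not the parameters used in the experiment.
ACTIONS = {
    "none":   None,
    "low":    {"color": "green",  "rate_hz": 0.5},
    "medium": {"color": "yellow", "rate_hz": 1.0},
    "high":   {"color": "red",    "rate_hz": 2.0},
}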
9.3.3 Reward Function
In Chapter 3, we proposed a reward function with two terms, r_R to describe the robot’s needs and r_H to describe the human’s needs. In the previous application (Chapter 7), we only used r_R because the robot communicated just one component of its state (location). However, in this application, we include both terms, as shown below.

r = r_R + r_H    (9.2)
We defined r_R and r_H as the sum of the explicit (task) and implicit (information gain) rewards that each agent gains. In this application, we combine these terms and only consider the expected value of these rewards, as defined in the notification systems literature.
Action    Response    r_R                            r_H
None      N/A         -(c_1 (b^u_i - 1) + inop)      0
Signal    No          -(c_2 b^u_i + inop)            -a_i v_i b^u_i
Signal    Yes         c_3 b^u_i - inop               -c_2 (b^d_i b^e_i) + a_i v_i b^u_i
r_R is composed of two terms. One term rewards the robot if the human responds to
the robot’s communication action. A higher reward is given for more urgent commu-
nication scenarios. The second term is a cost related to the robot failing to resolve its
communication scenario if the human does not respond to its communication action. An
additional cost term is included if the robot passes through a negative state that causes it
to become inoperational.
When the robot takes no communication action, there is no direct effect on the human
and therefore, r_H is 0. There is also no reward term in r_R since the human cannot respond
to a request they have not been told about.
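The sketch below captures the structure of this reward: a positive robot-side term for a successful response scaled by urgency, costs for an unresolved request and for becoming inoperational, and human-side costs for attending and responding offset by the information gained. The coefficients are placeholders rather than the values used in the experiment.

def reward(action, responded, urgency, duration, effort, avail, value, became_inop,
           c_fail=1.0, c_success=1.0, c_attend=0.5, c_respond=0.5, c_inop=2.0):
    """Illustrative r = r_R + r_H with hypothetical coefficients.

    urgency, duration, effort, avail, and value are numeric encodings of
    the corresponding state variables.
    """
    inop = c_inop if became_inop else 0.0
    if action == "none":
        r_R = -(c_fail * urgency + inop)   # unresolved request, scaled by urgency
        r_H = 0.0                          # no direct effect on the human
    elif not responded:
        r_R = -(c_fail * urgency + inop)
        r_H = -c_attend * value            # the human only paid the cost of attending
    else:
        r_R = c_success * urgency - inop   # resolved; more urgent -> larger reward
        r_H = -c_respond * duration * effort + avail * value  # response cost vs. information gain
    return r_R + r_H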
9.3.4 Learning Algorithm
In this work, we employ epsilon-soft on-policy Monte Carlo control, a model-free rein-
forcement learning algorithm that alternates between exploiting the current optimal
action and exploring other actions. Since interactions between the human and robot are
treated as episodic, we use the robot’s actual rewards, reducing the bias of its learned
values.
Typically, the biggest challenge when employing Monte Carlo methods is the high
number of samples needed for the algorithm to converge (i.e., variance). Since we em-
ploy a small, discrete space in a simulated environment, exploration of all of the states
several times is possible. Our goal is for the robot to behave more intelligently than when
employing static policies so we also value the improvements in the robot’s policy even
before the algorithm converges.
Figure 9.2: We employed a Monte Carlo control method alternating between exploiting the maximum value action and exploring other sub-optimal actions.
ε = max(0.2, min(1, (0.01 n)^{-1}))    (9.3)

ε denotes the exploration rate and depends on the number of interactions n, as shown in Figure 9.2. The values of ε were bounded between 0.2 and 1.
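A compact sketch of this learner (first-visit, epsilon-soft, on-policy Monte Carlo control with the exploration schedule above) is given below; the environment interface (reset/step) is hypothetical and stands in for the simulated collaborative task.

import random
from collections import defaultdict

ACTIONS = ["none", "low", "medium", "high"]

def epsilon(n):
    # Exploration schedule from Equation 9.3: 1 for roughly the first 100
    # interactions, then 1 / (0.01 n), floored at 0.2.
    return max(0.2, min(1.0, 1.0 / (0.01 * n)))

def run_episode(env, policy):
    """Roll out one human-robot interaction; returns [(state, action, reward), ...]."""
    trajectory, state, done = [], env.reset(), False
    while not done:
        action = policy(state)
        next_state, reward, done = env.step(action)
        trajectory.append((state, action, reward))
        state = next_state
    return trajectory

def mc_control(env, n_episodes):
    Q = defaultdict(float)      # action-value estimates, keyed by (state, action)
    counts = defaultdict(int)
    for n in range(1, n_episodes + 1):
        eps = epsilon(n)

        def policy(s):
            if random.random() < eps:
                return random.choice(ACTIONS)                # explore
            return max(ACTIONS, key=lambda a: Q[(s, a)])     # exploit

        episode = run_episode(env, policy)
        # Undiscounted returns following each step, computed backwards.
        G, returns_at = 0.0, []
        for state, action, reward in reversed(episode):
            G += reward
            returns_at.append((state, action, G))
        returns_at.reverse()
        seen = set()
        for state, action, G_t in returns_at:
            if (state, action) in seen:                      # first-visit update only
                continue
            seen.add((state, action))
            counts[(state, action)] += 1
            Q[(state, action)] += (G_t - Q[(state, action)]) / counts[(state, action)]
    return Q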
9.4 Experimental Design [E4]
We applied the model above to learn better communication policies for the robot during
a loosely collaborative task with human users. The robot continues to learn new policies
while interacting with real human users. In this section, we describe the design of the
simulated environment and the details of the experiment.
9.4.1 Collaborative Task
The simulated human-robot collaborative task takes place in a grid world environment
resembling a farm, as shown in Figure 9.3. The participant has a human avatar which they
move around the environment to complete a variety of tasks (shown on the bottom left)
to earn money. The robot acts as the human’s teammate, taking care of the farm’s fields
and crops as the participant completes other tasks related to the farm (e.g., harvesting
wheat). Between tasks, the participant can earn additional money by gathering items
found around the farm.
Figure 9.3: The grid world environment used for the simulated human-robot collaborative task.
Periodically, the robot alerts the participant via a nonverbal signal that it wants to
interact with them. If the participant chooses to engage with the robot, they move the
avatar close to the robot’s location and press a button to trigger the robot to communicate
its message. The urgency of the information in the message is varied. At the lowest level,
the robot provides helpful information for earning money. As the urgency increases, the
robot’s message becomes more important for the human and robot.
The presented tasks vary in their complexity and time requirements, enabling us to
manipulate the human’s state (s_H). For instance, certain tasks negatively affect the hu-
man’s score if they are not completed within a time limit. In this scenario, the human has
to choose whether to ignore the robot or lose some of their earned money. Participants
were told that the performance of the farm as a whole would also affect their final score.
9.4.2 Pilot
First, we ran a pilot in which 12 participants from the USC community interacted with the robot in the simulation environment. The simulation ran between 30 and 35 minutes for each participant. Participants were first given thorough instructions about the game
before starting. The pilot enabled us to test the simulation environment and tune the
parameters of the algorithm, including the exploration rate.
Figure 9.4: The results of the pilot experiment. As the exploration rate ε decreases (left), the average return (right) increases.
From the results of the pilot, we looked at the average reward over time. Initially, while
the exploration rate was high, the average reward varied drastically across trials.
As the exploration rate decreased, the average reward continued to steadily increase.
We also looked at the maximum action for each state. We found that in states with low
urgency and low receptivity, the general maximum value action was no communication.
This is consistent with our expectation that lower saliency signals should be employed for
less urgent information. Since the information the robot provides has little direct value,
it should only be communicated when the human is receptive.
Consistent with the results from Chapter 8, we found the highest saliency communica-
tion action often yielded the most reward for high urgency states. Since this information
is often critical for the success of the collaborative task, it should always be communicated
regardless of receptivity.
9.4.3 Participants
For the experiment, we recruited 44 participants from the local community. Participants
were aged 18-29 and were compensated $25 for completing the study.
9.4.4 Procedure
Participants first provided informed consent for the experiment. They were then given written instruc-
tions about the simulation. Before starting, the experimenter reviewed the instructions.
The simulation ran for approximately 35 minutes, during which participants were al-
lowed to ask questions to the experimenter. Post surveys were administered afterwards.
Figure 9.5: The results of running the Monte Carlo algorithm in the simulated human-robot collaborative task. As the exploration rate ε decreases (left), the average return (right) increases until convergence after approximately 500 interactions.
9.5 Analysis
The results of running the Monte Carlo control algorithm are shown in Figure 9.5. The exploration rate ε starts at 1 so that all actions are chosen with equal probability for the first 100 interactions. As ε decreases, the average return increases until converging
after approximately 500 interactions.
We also looked at the maximum value actions a* for each starting state s_i after running
the algorithm. For the low urgency robot states, the results showed that it was better for
the robot to not use any communication action unless the human’s receptiveness was
high and the interaction was short and required little effort from the human.
For medium urgency robot states, the maximum value action alternated between the
medium and high saliency actions, based on the human’s receptiveness. Since receptive-
ness primarily represents the human’s current availability, a higher saliency communica-
tion was only appropriate when the human was more receptive.
For the high urgency robot states, we found that the highest saliency action yielded
the greatest reward. This is consistent with results from our past work (Cha and Matari´ c, 2016), which showed that humans were more willing to be disrupted with higher saliency
signals if they perceived the urgency of the request to be higher.
At the end of the experiment, we also asked participants to rate the robot’s thoughtful-
ness in communicating on a 5-point Likert scale. The results (Figure 9.6) show that as the
robot’s rate of exploration decreases, participants’ ratings of the robot’s thoughtfulness
increase. This suggests that the policy the robot learns is better at balancing the human’s
needs and preferences than acting randomly.
Figure 9.6: Participants were also asked to rate the robot’s thoughtfulness in communicating on a 5-point Likert scale. The plot shows individual participant ratings and the average rating of the robot as thoughtful.
9.6 Discussion
This chapter presents an application of the communication framework proposed in Chap-
ter 3, within the context of a simulated human-robot collaborative task. We also in-
corporated the findings of our previous user studies in nonverbal signal design (Chap-
ter 4, Chapter 6, Chapter 8) into the action space of the framework’s MDP model.
In the experiment, we had participants control a human avatar in a simulated farm
environment. The human and robot acted as a loosely-collaborative team performing
tasks on the farm. We employed three different multimodal nonverbal signals of varying
saliency. We varied the human’s and robot’s state through the tasks they were instructed
to complete. The robot would periodically alert the human who could choose to then
interact or ignore the robot’s communication.
We employed a Monte Carlo control algorithm in which we continuously alternate
between policy evaluation and policy improvement. Since it was challenging to com-
pletely control the human’s state, we employed an epsilon greedy algorithm for policy
improvement. We started with a high rate of exploration and gradually decreased it as
the number of trials (i.e., interactions) increased.
The results show that as the exploration rate decreases, the average reward continues
to increase. This suggests that the robot is able to continuously improve its policy by taking the maximum value action. We also found that the maximum value actions learned by the robot match our previous results, particularly the findings of Chapter 8. This work
serves as a first validation of the communication framework and its potential for HRI.
9.7 Summary
In this chapter, we presented an application of our proposed communication framework.
The framework was applied to the problem of human-robot collaboration in a simulated
environment; real human users interacted with a simulated robotic agent performing
tasks in a “farm” setting. The findings of this experiment indicated that the robot is able
to successfully learn a policy for communication that balances the needs of both agents.
In the next chapter, we describe a hardware solution to support nonverbal signaling for
non-humanoid robot platforms.
10. Supporting Nonverbal Signaling through
Hardware Solutions
The focus of this chapter is the design of hardware solutions to support nonverbal com-
munication for non-humanoid robots. In the previous chapters, we described research
exploring the design and application of nonverbal robot signals. While it is important to
understand how to design effective nonverbal communication, researchers must also ad-
dress how these signals can be employed by different types of non-humanoid platforms
with a range of embodiments.
The work presented in this thesis has primarily utilized light and sound as signaling
modalities, because they are low in cost, can have high salience, and are often employed
in human-machine interaction. Enabling non-humanoid robots to utilize sound is rela-
tively easy as it typically only requires the addition of a speaker. There are often more
challenges to enabling non-humanoid robots to effectively employ visual signals, such as
light, because they require additional modifications and are dependent on visibility of the
robot and its signaling hardware. For instance, a binary indicator light is only visible at
certain positions and angles; if the human is located at the opposite side of the light, the
light is completely obscured.
The wide range of existing non-humanoid robot shapes and sizes also makes it chal-
lenging for researchers to quickly prototype and evaluate different light configurations
and signal designs. This is a large barrier to creating generalizable, standard signals
which can be employed more broadly in human-robot interaction (HRI). To overcome
these issues, we propose the use of a signaling platform to support visual signaling for
non-humanoid robots.
Towards this goal, we present the design of ModLight, a modular research tool con-
sisting of a set of low cost light blocks that can be easily reconfigured to fit a myriad
of robots and applications. ModLight also provides researchers, designers, and students
with software tools that enable them to visually design new signals and easily integrate
them into existing systems.
10.1 Introduction
Recent work has shown the potential for lights to act as simple, yet expressive signaling
mechanisms for a variety of robots and applications (Baraka et al., 2015, Rea et al., 2012,
Szafir et al., 2015). The design and success of the signals utilized in these works are often
dependent on the shape, configuration, and placement of the lights on the robot. Since
the construction and integration of a light signaling system requires significant time and
effort, it is difficult to try out many potential designs.
Moreover, as systems are typically designed and constructed for a particular robot,
it can be burdensome to validate signaling behaviors across multiple platforms. Hence,
there is a need for research tools that enable roboticists to more quickly and efficiently
explore the design and use of light behaviors for HRI (Löcken et al., 2014).
We aim to address this need through ModLight, a modular low-cost research tool that
provides light signaling capabilities and can easily reconfigure to a variety of shapes and
sizes. ModLight is self-contained and allows users to more quickly explore the design
space of light signals. Since it is controlled and powered externally, it is easy to integrate
onto different robots which facilitates cross-platform validation.
ModLight provides a software library and visual programming interface that enables
users of varying technical backgrounds to quickly and easily create complex light behav-
iors. In addition, users can create their own new libraries building off our original code.
This approach also promotes collaborative design, a method which has been shown to be
beneficial when designing light behaviors (Löcken et al., 2014).
10.2 Related Work
For many years, electronic devices have incorporated lights for communication pur-
poses (Harrison et al., 2012). Lights provide a simple, yet reliable communication channel
and are often used to reveal information about a device’s state. More recently, HRI re-
searchers have utilized light as a communication channel to indicate information about a
robot’s internal state and operations. Their efficiency, low-cost, small size, and durabil-
ity, also make LEDs well suited to the wide range of environments and applications that
non-humanoid robots operate under.
Although many individual studies have shown the promise of light-based communi-
cation, more work is needed to completely explore its design space. This is an especially
complex issue due to the bandwidth of light signaling systems. While a light source can
be used as a simple on/off signal to communicate a binary state, the use of different col-
ors and lights can greatly expand the expressiveness and information capacity of a single
light source. As multiple lights are combined together in different configurations, their
capacity grows exponentially. The placement of lights may also affect how humans per-
ceive light signals. Hence, ModLight is designed to enable users to more easily explore
these research questions in the context of different robot platforms and applications.
Tools that enable researchers to replicate HRI behaviors across different platforms
also facilitate the design and creation of more generalizable signals. Since designers
currently create specialized signals for each platform, the process of developing new
robots can be costly, require a large amount of work, and result in inconsistencies between
different types and models of robot. This research seeks to facilitate the standardization of
nonverbal signals by enabling many types of non-humanoid robots to utilize equivalent
signals through flexible signaling platforms.
To support this goal, we suggest the use of modularized light blocks which can be
reconfigured to form different shapes and sizes. Modularity has often been utilized in
building robotic platforms due to the versatility, robustness, and cost of the approach (Yim
et al., 2002). The ability to reconfigure the modules enables the same robot to perform
many different tasks. Similarly, the use of light modules allows the platform to fit a wide
range of scenarios and physical forms.
Augmenting robots with light signaling capabilities has typically been an ad-hoc pro-
cess. A limited set of works (Jacobsson et al., 2008, Rea et al., 2012) have explored the
physical design of light signal platforms on robots. However, many works in HCI have
investigated the design and creation of displays integrated with lights for a number of dif-
ferent applications, such as notifications, directions, and exercise encouragement (Chang
et al., 2001, Fortmann et al., 2013, Hansson and Ljungstrand, 2000, Matviienko et al., 2016).
10.3 Design Guidelines
Creating a light signaling platform requires careful consideration of both a robot’s goals
and how it will interact with humans. In addition to the physical construction, the soft-
ware must provide a wide range of signaling functionalities and be able to be integrated
into the robot’s current architecture. Since the use of lights as an expressive communica-
tion medium for HRI is still an open research area, the ability to quickly construct and
test different designs provides researchers with greater flexibility and efficiency.
To guide the design of this system, we derived three sets of requirements. User require-
ments identify the general needs that must be met for the system to be a viable research
tool from the perspective of targeted users. Usage requirements take into account the func-
tionalities that are required to meet the signaling needs of a wide range of robots. Lastly,
we use these requirements to generate system requirements, or general design criteria for
the mechanical, electrical, and software components of the system.
10.3.1 User Requirements
We identified three potential users: researchers, designers, and students or hobbyists. As
the targeted users vary greatly in technical ability and available resources, it is important
that the system be flexible, easy to use, and simple to manufacture. With these goals in
mind, we generated the following requirements:
R1: The system should be adaptable in both its size and shape to fit a wide range of
robot platforms.
R2: The system should not interfere with the main functionality of the robot in any way.
R3: The system should be self-contained and not require significant modification of the
robot platform to install.
R4: The system should be low-cost and affordable for all potential users.
R5: The system should utilize easy-to-obtain components that can be recycled for other
applications.
R6: The system should utilize fabrication methods that are widely available and low-
cost.
R7: The system should be easy to program and integrate into existing robot software.
R8: The system should be quickly and easily reconfigurable.
Figure 10.1: Non-humanoid robot use cases (ground vehicles, industrial robots, service robots, flying robots, telepresence), applications (transportation, service, communication, surveillance, manipulation, domestic), and signaling requirements (condition, knowledge, and activity, including faults/errors, operating mode, task, and map).
Since the system targets a variety of users, these requirements are broad but provide the
inspiration for more specific hardware and software criteria which we discuss below.
10.3.2 Signaling Requirements
We envisioned the system to be used primarily for communication in HRI and there-
fore, took into account the signaling needs of many current robot platforms and appli-
cations. Although light signals can be beneficial for many robots, this work focuses on
non-humanoid robots with limited communication modalities. This section also aims to
motivate the need for greater research into nonverbal signaling and in particular, the need
for more signaling hardware.
In Figure 10.1, a broad overview of non-humanoid robot types, as well as their pri-
mary uses and signaling needs, is presented. Five types of robot are considered: ground
vehicles, flying robots, industrial robots, service robots, and telepresence robots. In deter-
mining their signaling needs, we consider the operating environment, usage, who will be
co-located with the robot, and scenarios that may require communication with co-located
humans.
We identified three broad categories of information that are important for different
non-humanoid robots to express. The features of these categories will only be briefly
touched upon to illustrate the signaling needs for this platform. A more detailed dis-
cussion of the information in a robot's internal state and its signaling needs is provided
in (Cha et al., 2017).
The first signaling category relates to the robot’s condition. A robot’s condition encom-
passes a wide set of characteristics relating to its general operation, including its health,
control mode, and faults. For safety and successful coexistence, it is vital for a robot co-
located with humans to effectively communicate its condition to enable both collaborators
and bystanders to assess whether the robot is safely operating and whether it can benefit
from human intervention (Breazeal et al., 2005, Fischer et al., 2014, Takayama et al., 2011).
Information about the robot's faults or errors functions to warn nearby people that the
robot has a problem so that they can provide assistance, notify the correct personnel, or
stay away in certain more hazardous cases (Currie and Peacock, 2002). Acknowledging
errors or mistakes has also been shown to improve people’s attitudes towards robots (Cha
et al., 2015, Lee et al., 2010).
As many robots can function with a range of autonomy, it is important to know the
robot's current operating mode. If the robot is being operated by a person who receives
visual or audio feedback, privacy concerns may arise if people are unaware they are
being observed (Lee et al., 2011). Prior research has also shown that people tend to
trust and treat the robot differently when they believe it is autonomous versus manually
controlled (Kraft and Smart, 2016, Lee et al., 2014). This can be further complicated in
scenarios where the same robot may be operated in multiple modes or by different people
(e.g., telepresence robots or autonomous vehicles).
The second category relates to the robot’s knowledge, both about itself and the world.
For robots that are expected to directly communicate with people, this knowledge can
be critical to the robot’s success. Moreover, knowledge about the robot’s capabilities
is essential for robots that operate in close proximity with humans, particularly in more
private areas. For instance, many robots may have inherent surveillance or data gathering
capabilities that can be detrimental if humans are unaware. This information can also help
humans to set correct expectations when interacting with or tasking robots.
The primary and most common signaling need relates to the activity of the robot.
Since all of the identified robot types utilize motion, it is important to provide cues that
enable coordination and prevent collisions. These cues which express information such
as motion intent and timing (i.e., when a motion will start or occur) help humans to plan
their own actions with the robot in mind (Dragan et al., 2015, Takayama et al., 2011).
Since many robots are functional in nature, expressing information about their task
provides insight into the robot's behavior and what actions it may take in the future.
This also enables humans to coordinate their own actions with the robot so they can
collaborate. In cases where a nearby human is responsible for assigning tasks to the
robot, task progress helps them to manage allocation of future tasks.
Keeping this wide range of information in mind, we aimed to provide a high level
of functionality and flexibility to users. Since the signals should be visible in different
environments and while the robot is potentially moving, the system should provide a
wide range of colors, speeds, and brightness levels. Moreover, colors already have strong
associations and can hence be a powerful tool for gaining attention, conveying different
states (e.g., error vs. okay), or indicating different "levels" of signaling (e.g., low vs.
high battery).
Also, as the same lights may be used to convey several different types of information,
we aimed to provide the software functionality to create more complex behaviors. This is
one of the primary concerns with light signaling as many existing devices rely on simple
behaviors (e.g., on vs. off) which do not scale well for high information content. Hence, to
generate signals that are intuitive and contain high information content, greater flexibility
is needed. Lastly, we also took into account how light has already been used to convey
information for a range of products, including robots, electronic devices, transportation,
and other applications. We confirmed that this platform can replicate both the signals
and light configurations employed in these past works.
10.3.3 System Requirements
Module: A module is defined as an individual piece with a contained light source, as-
sociated electronics, light diffuser, and physical connectors to combine with additional
modules.
1. Modularity- Each module should be self-contained and possess the mechanical con-
nectors to form common shapes such as linear arrays, grids, and rings.
2. Reconfigurability- The modules should be easily reconfigurable without significant
effort or modification.
3. Size- The design should minimize module size and the transition between light-
emitting faces of the modules.
4. Fabrication- Fabrication should be low-cost and easily accessible. Modules should be
designed for both indoor and outdoor use and scalable in size and quantity.
5. Front- The front face of the module must be removable and effectively diffuse the
light source throughout.
6. Safety- The design should minimize access to electronic components and the risk of
catastrophic breakdowns.
Electronics:
1. Lights- The lights should be minimal in size, easy to control, and have high visibility.
They should also be individually programmable and have a large range of colors for
greater flexibility.
2. Power- The entire system should run off an independent, external power source.
3. Controller- Lights should be individually addressable and controlled by a common,
low-cost microcontroller.
4. Speed- Lights should have a fast refresh rate to support high speed light behaviors.
Timing requirements should accommodate several potential controllers.
Software:
1. Software Library- The system should provide a software library using a higher-level
programming language that removes the need to directly program the microcon-
troller and enables users to more easily generate complex light behaviors. Identified
core behaviors for each light include turning on, blinking, and pulsing. The library
should provide functions for both core behaviors as well as common behaviors in-
volving groups of lights. The library should also support combining or chaining
multiple behaviors to create more complex signals while dealing with the intricacies
involved such as timing.
2. Visual Programming Interface- The system should provide a graphical user interface
for programming the lights. It should provide the same range of light behaviors as
the software library but require no programming knowledge. It should also generate
the corresponding code using the software library for users to embed in their own
programming applications.
Figure 10.2: The ModLight System: an external power pack, an Arduino microcontroller for driving the
system, and 3D printed modules containing an LED and light diffusing acrylic.
10.4 System Overview
ModLight consists of a set of physical modules, an external power pack, a microcontroller,
LEDs, and a software interface. Modules are constructed individually, so the system can be
assembled in different numbers and configurations.
10.4.1 Mechanical Design
ModLight consists of individual modules that contain an RGBW LED and a piece of
frosted acrylic for uniform light diffusion. We experimented with several different mod-
ule designs, varying the shape of the modules as well as their connective linkages. Our
final physical prototype is in the shape of a truncated pyramid, which was chosen for
its tapered sides. These enable a greater degree of relative rotation between the modules
compared to rectilinear sides.
The linkages used to connect individual modules are two-section hinge joints. Each
module contains one half of the hinge; the two halves are joined together by a pan head screw.
A nut is placed opposite the screw head to tighten the linkage at a particular angle.
This mechanism enables rotation along the edge of each module along one dimension.
Reconfiguration is quick and easy and only requires removing the washer and screw. This
design is simple, easy to produce, and permits a variety of geometrical shapes. It also has
a high tolerance for errors in the fabrication of the linkages.
The modules were fabricated using 3D printing, due to its cost effectiveness and ease
of access. However, this also affected the design of the modules. For instance, the resolu-
tion of many home-use 3D printers limits the size and complexity of the linkage design.
Figure 10.3: Several designs for the ModLight module.
Even smaller versions of the current design had significant flaws when printed on lower
cost systems.
The prototype module size was determined by the size of the LED. However, more
compact LEDs are available which can be used for applications where a smaller module
size is preferred. For larger modules, an additional LED can be added to maintain the
same level of brightness.
Although the front face of the module was initially designed to be square, the
prototype was slightly rectangular to accommodate the LED system used during testing.
The front face of the module is a piece of frosted acrylic with 54% transmission made by
Acrilyte. This transmission rate was chosen, as it evenly diffused the light and created a
softer look.
Several other module shapes were also developed and prototyped, including a trun-
cated pyramid and a hexagonal prism. The mechanism for combining linkages was also
investigated, with the aim of enhancing flexibility and removing parts that can wear out
(e.g., screws).
10.4.2 Electronics
The primary component of ModLight is the light emitting diodes (LEDs) contained in the
modules. When evaluating potential LEDs, we looked for bendability, high luminosity,
low power consumption, low cost, and high diffusion. We also wanted the LEDs to be
individually controllable to provide a greater range of capability. For these reasons, the
Adafruit NeoPixel Digital RGBW LEDs were chosen (on a flexible PCB). Adafruit also
provides an Arduino Library making the NeoPixels a popular choice for both researchers
and hobbyists (Baraka et al., 2015, Szafir et al., 2015).
Figure 10.4: ModLight Software: C++ library that generates light behaviors and the Visual Programming
Interface.
An Arduino microcontroller board was used to control the LEDs due to its low cost,
high performance, and popularity. Although there are many Arduino boards, an Arduino
Mega was chosen for several reasons. First, the protocol used by the LEDs is very timing-
specific and can only be controlled by microcontrollers whose timing is highly repeatable
(on the order of 100 ns). Also, to run a large number of LED modules, the whole array must
be buffered into the memory. The Arduino Mega has an accurate clock cycle and a
large amount of RAM for storing the array. The Arduino Uno can be substituted, but
the memory size limits the number of LEDs that can be controlled. Since ModLight
is intended to be portable and require no modifications to the robot, the system was
powered using a small external power bank that outputs the required 7 to 9 Volts. An
overview of the system is shown in Figure 10.2.
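For illustration, a minimal Arduino sketch that drives a short chain of modules directly with the Adafruit NeoPixel library might look like the following; the data pin, pixel count, and color order are assumptions that must be matched to the actual wiring, and the ModLight software described next removes the need to write this kind of code by hand.

#include <Adafruit_NeoPixel.h>

#define LED_PIN    6      // data pin wired to the first module (assumed)
#define LED_COUNT  8      // one RGBW LED per module (assumed)

// RGBW NeoPixels on an 800 KHz data line, GRBW color order
Adafruit_NeoPixel strip(LED_COUNT, LED_PIN, NEO_GRBW + NEO_KHZ800);

void setup() {
  strip.begin();             // initialize the pixel buffer
  strip.show();              // push the (all off) buffer to the LEDs
  strip.setBrightness(128);  // limit brightness to reduce current draw
}

void loop() {
  // Sweep a white dot across the modules, a simple "scanner" behavior
  for (int i = 0; i < LED_COUNT; i++) {
    strip.clear();
    strip.setPixelColor(i, strip.Color(0, 0, 0, 255));  // white channel only
    strip.show();
    delay(100);
  }
}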
10.4.3 Software
We developed a generic C++ library which interacts with the Arduino to control the
RGBW LEDs contained in the modules. The library wraps the provided Adafruit Arduino
libraries while adding behaviors and timing functionality. We
also developed a Visual Programming Interface which uses the library and allows non-
programmers to create and prototype light behaviors.
C++ Library
The C++ library was created to provide functions to help users easily create complex light
behaviors. This also removes the need for users to directly program the microcontroller
and facilitates integration of light signaling code with preexisting software used to control
robots.
The NeoPixel Code Converter Library is a generic library that generates .ino files based
on the input configuration. These generated .ino files use Arduino system calls to execute
the desired Light Pattern behaviors. The library hides the complexity of maintaining LED
states and wraps the logic of updating the LEDs with the correct delays and color
combinations.
The library is initialized using a vector of LightParameter objects. Each LightParame-
ter object represents a group of LEDs along with their assigned Light Pattern behavior.
The currently supported Light Patterns are defined by the ActivePattern enum in the
LightParameter.h header:
enum ActivePattern { NO_PAT = 0,
RAINBOW_CYCLE = 1,
THEATER_CHASE = 2,
COLOR_WIPE = 3,
SCANNER = 4,
FADE = 5,
BLINK = 6,
ON_AND_OFF = 7,
PULSATING = 8,
LOADING = 9,
STEP = 10
};
Calls to the create() function loop through the input vector and generate the Ar-
duino setup() and loop() function bodies. The loop() function uses the LightSignal
Arduino library to update the LEDs with the desired delays and color changes.
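As a sketch of how such a configuration might look on the host side (in the spirit of the main.cpp generated by the GUI), the example below builds a vector of LightParameter objects and passes it to create(); the field names and the exact create() signature are assumptions based on the description above, not the library's documented API.

#include <vector>
#include "LightParameter.h"   // defines LightParameter and the ActivePattern enum

int main() {
  std::vector<LightParameter> groups;

  // First group of LEDs: blink to indicate an error state
  LightParameter status;
  status.pattern    = BLINK;     // core behavior from the ActivePattern enum
  status.brightness = 128;       // hypothetical parameter name
  groups.push_back(status);

  // Second group of LEDs: a loading animation to show task progress
  LightParameter progress;
  progress.pattern = LOADING;
  groups.push_back(progress);

  // create() loops over the vector and emits the Arduino setup()/loop() bodies
  create(groups);
  return 0;
}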
The NeoPixel Code Converter Library is built using CMake, which generates a
static library, libNeoPixelCodeConverter.a. The main.cpp file generated by the graphical
user interface (GUI) depends on this static library. Examples of various light behaviors
can be found in the /Examples folder.
The LightSignal Arduino Library is built as a third-party library for Arduino. This library
handles the logic to update the LEDs with different light behaviors. Code for this library is
located in the Arduino_Libs/LightSignal folder. The Arduino sketch calls the library inside
the loop() function using the mainLoop() function. When mainLoop() is called with a
pointer to a LightParameter object, the library verifies whether the pattern needs to be
refreshed based on its start and stop times. If a refresh is needed, it calls Update() to set
the correct colors and delays on the LEDs to reflect the desired Light Pattern.
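The generated sketch therefore reduces to a thin wrapper around mainLoop(), roughly along the lines of the sketch below; the header and object names and the exact argument list are assumptions, since only the overall call pattern is described above.

#include <LightSignal.h>      // third-party ModLight library for Arduino (assumed header name)
#include "LightParameter.h"

LightParameter statusGroup;   // normally filled in by the generated setup() code
LightSignal lightSignal;      // assumed driver object wrapping the LED strip

void setup() {
  statusGroup.pattern = BLINK;   // e.g., blink a group of status LEDs
  // ...remaining fields (timing, colors, brightness) set by the generated code
}

void loop() {
  // The library checks the start/stop times and, when a refresh is due,
  // calls Update() internally to push new colors and delays to the LEDs.
  lightSignal.mainLoop(&statusGroup);
}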
ROS Integration
To make the system more useful for roboticists, a simple Robot Operating System (ROS)
integration was created (Quigley et al., 2009). The ROS package, ros_modlight, contains
a ROS node which subscribes to the topic lightbehavior_cmd and publishes its status to
modlight_status.
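A minimal roscpp client for this node might look like the following; the std_msgs/String message type and the command payload are assumptions, as the actual message definitions of ros_modlight are not specified here.

#include <ros/ros.h>
#include <std_msgs/String.h>

// Print any status message published by the ModLight node
void statusCallback(const std_msgs::String::ConstPtr& msg) {
  ROS_INFO("ModLight status: %s", msg->data.c_str());
}

int main(int argc, char** argv) {
  ros::init(argc, argv, "modlight_client");
  ros::NodeHandle nh;

  ros::Publisher cmd_pub =
      nh.advertise<std_msgs::String>("lightbehavior_cmd", 10);
  ros::Subscriber status_sub =
      nh.subscribe("modlight_status", 10, statusCallback);

  ros::Rate rate(1);  // send one command per second
  while (ros::ok()) {
    std_msgs::String cmd;
    cmd.data = "BLINK";          // placeholder command payload
    cmd_pub.publish(cmd);
    ros::spinOnce();
    rate.sleep();
  }
  return 0;
}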
Visual Programming Interface
We also created a visual programming interface for users with limited programming
experience. The GUI provides users with the same functionality as the C++ library and
also allows them to directly upload light behaviors to the Arduino. This makes it a
valuable tool for all users as it enables them to quickly explore different light signals and
see them in real life, before spending time and effort to integrate them into their system.
The GUI was developed using the Qt framework in C++ (see Figure 10.4). It requires
users to first add LEDs and then configure their behaviors. Users can add multiple LEDs
and assign IDs to each. LEDs are represented by circles marked with the assigned ID
value. The LEDs are then grouped into sections, and each section is configured with a
Light Pattern to display.
Light Patterns also have additional parameters such as start time, end time, total
run cycles, and brightness level (all parameters can be found in the LightParameter
struct in LightParameter.h). Code for the GUI application is located in the
/GUI_Code/BlockTrial folder. The project was created using the Qt Creator IDE and can be
opened and built using the BlockTrial.pro project file.
After configuring the LEDs, the GUI can be used to either upload the code directly to
the Arduino or generate a main.cpp file that uses the NeoPixel Code Converter library.
The generated main.cpp file can later be compiled to produce the .ino files for flashing
the Arduino microcontroller.
10.5 Discussion
ModLight provides an alternative to the current approach of designing and constructing
specialized light signaling platforms for each robot. It is robot agnostic and prevents
users from having to commit to a specific configuration, allowing them greater freedom in
their design and research. The system also provides a broad set of functionalities making
it an effective platform for many robotic applications.
The creation of ModLight was guided by the specified design criteria. However,
the general nature of the tool creates many design constraints. The most significant of
these constraints concerned the platform’s availability to not only robotics researchers and
designers, but students and hobbyists as well. This greatly limited the cost, hardware, and
methods that could be used to build ModLight.
Our final design employed two main fabrication methods: 3D printing (for modules)
and laser cutting (for acrylic). Both methods are becoming more widely available to the
general public through lower cost machines targeted for home use as well as commu-
nity fabrication or “build” labs. There is also a burgeoning online marketplace for these
technologies making them cost-effective and increasingly attractive for future use.
For hardware, we chose components that are widely available and commonly used
in research and hobbyist applications. We chose the Adafruit Neopixel RGBW LEDs for
their affordability, software library, and support. For controlling the LEDs, we used an
Arduino Mega microcontroller board. Arduino boards are popular, easy to use, flexible,
and provide software tools and support for several operating systems. Users can also
adapt the ModLight to work with the Arduino UNO, a lower cost, entry level platform.
The overall price of the ModLight system can be broken down into the base cost for
the system and the cost per module. The base cost is approximately $68 and includes the
cost of the Arduino microcontroller, portable power source, and other one-time system
components. The cost per module is approximately $2.50 and includes the cost of printing
each module from an outside service, the LED, the diffusion material, and connectors
between the modules.
The ModLight software consists of a C++ library and a visual programming interface
which remove the need for users to directly program the Arduino and LEDs. Instead, the
C++ library handles the messier implementation details, such as timing, while providing
higher level light behavior functions. This makes it more accessible to users of varying
technical backgrounds and easier to integrate in pre-existing robot software. In addi-
tion, the visual programming interface enables users to quickly prototype different light
behaviors without writing any code.
The accessibility of these components as well as ModLight’s design and software tools
makes it possible for users to easily add new functionalities. Users can also customize
the size, shape, and connections of each module to better fit their own needs.
10.6 Summary
In this chapter, we presented ModLight, a modular interactive light signaling platform
for HRI. The primary goal of this thesis is enabling non-humanoid robots to effectively
employ nonverbal signals to communicate information about their internal state. How-
ever, this also requires appropriate robot hardware to support the use of nonverbal signals
across the wide range of non-humanoid robot embodiments. Such tools enable the broader
utilization of the signal design insights from the previous chapters and support
standardization of signaling behaviors. In the next chapter, we summarize the contribu-
tions of this thesis and discuss its implications for HRI research and robots deployed in
the real world.
11. Summary and Conclusions
This thesis addressed a number of research problems relating to generating nonverbal
communication for non-humanoid robots during HRI (Figure 1.2). An approach to
planning a robot’s communication behaviors based on the formalism of signaling as a
decision-making problem was presented. In this framework, the robot aims to balance
its own task-oriented needs with the preferences of the human interactor. To inform the
state and action space of the framework’s MDP model, we conducted three user stud-
ies on visual and auditory signal design (S1-S3) and three experiments (E1-E3) applying
their findings to HRI scenarios. We presented an implementation of the communication
framework using insights from our user studies and applications (S1-S3, E1-E3) applied
to a human-robot collaborative task in a simulation environment and an experiment eval-
uating its effectiveness (E3). To increase the generalizability of this research, we employed
a wide range of appearance-constrained, non-humanoid robots and applications in our
studies and experiments.
11.1 Contributions
Enabling a non-humanoid robot to effectively employ nonverbal communication for in-
teracting with humans poses numerous challenges. The robot’s communication actions
must convey information concisely and intuitively such that the human does not have to
spend significant time or effort decoding the robot’s actions. The robot must also take
into account the human interactor’s state in order to act as an effective and considerate
interaction partner.
These challenges resulted in two goals for this thesis. First, this work aimed to enable
robots to plan their communication actions intelligently. By taking into account its human
interactors, the environment, and the uncertainty in the world, the robot can increase its
functional and social success. This work also explored the design space of nonverbal
robot signals to facilitate standardization of signals across the range of existing non-
humanoid robot platforms. This goal will become increasingly important as
more non-humanoid robots are deployed in human environments, particularly in more
safety-critical scenarios, such as autonomous driving. Although these studies employed
non-humanoid robots, many of our findings can also provide insight for communication
with a wide range of systems, ranging from smart devices to anthropomorphic robots.
Towards these larger goals, we made the following contributions:
11.1.1 Framework for Robot Communication
In Chapter 3, we presented a computational framework for planning robot communica-
tion. We mathematically formulated communication as a decision-making problem in
which the robot must choose when and how to communicate with humans in the nearby
environment. We modeled this problem using a Markov Decision Process (MDP) and in-
corporated models of the human, robot, and environment in the state space of the MDP.
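In standard notation, and with the state factored as described above (the exact symbols used in Chapter 3 may differ), the model can be written as

$\mathcal{M} = \langle S, A, T, R, \gamma \rangle, \qquad S = S_{\text{human}} \times S_{\text{robot}} \times S_{\text{env}},$

where $A$ is the set of communication actions, $T(s' \mid s, a)$ the transition function, $R(s, a)$ the reward that balances the human's and the robot's objectives, and $\gamma \in [0, 1)$ the discount factor.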
A major goal of this framework was balancing the human’s and robot’s objectives
when planning the robot’s communication. We described different methods for encoding
these contrasting objectives into the reward function of the MDP. We also reviewed meth-
ods for solving the MDP to generate a policy that optimizes this reward function while
remaining tractable. This framework is a first step towards enabling robots to communi-
cate more intelligently under the inherent uncertainty of real human environments.
A primary limitation of the framework is its assumption that all components of the
state space and reward function are fully observable. In the real world, this is unlikely
to be true. Rather, certain aspects of the MDP will likely need to be estimated proba-
bilistically from the robot’s observations. This makes other variations of MDPs, such as
the Partially Observable Markov Decision Process (POMDP) and the Mixed Observability
Markov Decision Process (MOMDP), better suited to encoding robot communication.
11.1.2 Design of Nonverbal Signals
The communication framework described in this thesis requires a vocabulary of nonver-
bal signals with well known effects on human perception and behavior. To inform the
framework, we investigated the design space of auditory and visual signals for robot
communication in three user studies (S1-S3). A major goal of these studies was to iden-
tify underlying signal design principles that can be used to generate a standardized set
of nonverbal signals across the wide range of non-humanoid robot platforms. We val-
idated the findings of these studies in three applications (E1-E3) of HRI using several
types of non-humanoid robots and applications. The results of these works also pro-
vided insights into how humans believe the robot should behave when communicating,
informing the reward function of our framework. We directly employed signals from our
studies (S1,S3) in our implementation and validation of the communication framework
(Chapter 9).
                                                 Signaling Modality     Communication
Chapter                                          Light       Sound      Framework
S1  4. Design of Robot Light Signals             X
E1  5. Applications of Robot Light Signals       X
S2  6. Design of Robot Auditory Signals                      X
E2  7. Application of Robot Auditory Signals                 X          X
S3  8. Design of Robot Multimodal Signals        X           X
E3  9. Application of Communication Framework    X           X          X
In S1 and E1, we investigated the design and usage of robot light signals. First, we
found that humans have certain connotations of light signals from prior experiences, es-
pecially when combined with robot motion (S1). We then applied these results to different
HRI scenarios and found that the usage of these light signals helped humans to better
understand certain robot behaviors (E1). However, we also found that without the context
of the robot's actions, light signals are often too abstract for people to attribute specific
state-related information to them.
In S2 and E2, we investigated the design and usage of robot auditory signals, par-
ticularly for enabling localization of the robot’s position in the nearby environment. In
our study, we found that the use of auditory icons enabled humans to better localize the
robot in the absence of visual cues (S2). We also compared broadband and tonal auditory
signals and found both to be highly localizable but broadband sounds to be significantly
less annoying and distracting. We used the findings of this study in an application of our
framework (E2) and found that discrete auditory icons in the presence of visual cues are
effective for localization.
Finally, we investigated the use of multimodal signals in S3 and E3, examining
how humans react to multimodal robot signals for requesting help. We found that hu-
mans are more willing to be interrupted, especially with more salient signals, in higher
urgency scenarios. We also found that humans’ responses are affected by their availabil-
ity. We applied the results of this study to the final application of our framework in which
the robot learned an optimal policy for communication (E3).
11.1.3 Design of Robot Hardware for Signaling
Since non-humanoid robots range so widely in size, shape, and form, another goal of this
thesis work was to enable this wide range of platforms to employ the same nonverbal
signals. To support our research on nonverbal signal design, we explored the design of
hardware solutions to support the signaling modalities employed in this research. To-
wards this goal, we presented ModLight, a modular research tool consisting of a set of
low cost light blocks that can be reconfigured to fit a myriad of different robots. A sec-
ondary goal of the creation of ModLight was enabling a wider range of people, such as
students and hobbyists, to also explore robot communication.
11.1.4 Application of Communication Framework
Our final contribution was the application of our communication framework to a sim-
ulated human-robot collaborative task. The use of a simulated environment enabled a
larger and more diverse set of interactions without the significant overhead of deploying
a robot in the physical world. We presented an implementation of the communication
framework that continuously learned new policies online for the robot's behavior using
a Monte Carlo control algorithm that averages sampled returns after each episode (i.e.,
interaction). The results of this work showed that incorporating the human and their
preferences into the robot’s communication planning improves collaboration. This appli-
cation also validated the communication framework and showed its potential for HRI.
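In its standard incremental form (the specific variant used is detailed in Chapter 9), the Monte Carlo control update applied after each episode is

$Q(s, a) \leftarrow Q(s, a) + \frac{1}{N(s, a)}\big(G - Q(s, a)\big),$

where $G$ is the return sampled for the visit to the state-action pair $(s, a)$ during the episode and $N(s, a)$ is the number of visits observed so far; the policy is then improved by acting (e.g., $\epsilon$-greedily) with respect to the updated $Q$.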
11.2 Future Work
Although this thesis presented significant findings in human-robot communication, this
field of research is young and evolving. The usage of probabilistic models for planning
robot communication, in particular, is relatively unexplored and therefore requires sig-
nificant future work. Our usage of an MDP for modeling the communication problem goes
beyond past research in HCI but still fails to take into account the difficulty in measuring
human state. Therefore, a natural extension of this work is to use observations to estimate
a belief state over certain variables.
More work is also needed to refine the reward function. As a first step, we used
the expected values of a human’s response. However, this can be estimated through
observations or by more closely aligning the reward function with observable events. For
simulated interactions, such as the scenario employed in E3, this requires careful design
such that the scoring of the simulation aligns with how events occur in the real world.
This work can also be used to model communication between multiple interactors.
While we assumed a dyadic interaction with one human and one robot, we can extend
the MDP to account for multiple agents of each type. However, further investigation is needed
to understand how the framework must be adapted to account for these differences,
especially as the tractability of solving the model decreases.
In this thesis, we primarily focused on two communication modalities: light and
sound. However, there are other channels of communication that must be accounted
for (e.g., speech) and can provide a rich design space for interaction (e.g., projector). Fur-
ther investigation of signals employing these channels and how they can be combined
together is needed in the future.
Lastly, this work can also be employed to better track human models of the robot
which are important for predicting future behavior. In more formalized interaction, such
as human-robot teaming, this work can be employed to mediate communication for cer-
tain goals. In teaming, a shared mental model is often crucial for both individual agents'
and the entire team's success. Maintaining such a model requires continuous communication
and can thus benefit from the use of formalized methods for communication (Gervits
et al., 2018). In the future, more work is needed to investigate these challenges.
11.3 Final Words
Nonverbal communication is an important tool for robots that interact with humans in
the real world. Our long-term vision is for non-humanoid robots to effectively employ
nonverbal signals to create fluid, robust interactions. This goal requires robots to maintain
a probabilistic representation of the world that captures the uncertainty of interacting
with humans. Using this representation, the robot can reason about its communication
actions to balance both the human's and its own needs. This reasoning also requires a
vocabulary of nonverbal signals that can be employed by robots with vastly different
embodiments and applications to communicate their state.
This thesis serves as a first step towards our vision by proposing a framework for
robot communication (Chapter 3) and applying its model to human-robot collaboration
(Chapter 7 and Chapter 9). We also conducted studies exploring the design space of
nonverbal signals (Chapter 4, Chapter 6, Chapter 8) to support the systematic gen-
eration of nonverbal signals for non-humanoid robots. We validated the findings of these
studies in several applications of HRI (Chapter 5, Chapter 7, Chapter 9).
This thesis is only a first step towards understanding how to reason about a robot’s
communication actions with a human’s preferences in mind. However, our hope is that
this work will serve as a foundation for future research in human-robot communication
as there remain many open research challenges.
Bibliography
P. D. Adamczyk and B. P. Bailey. If not now, when?: the effects of interruption at different
moments within task execution. In Proceedings of the SIGCHI conference on Human factors
in computing systems, pages 271–278. ACM, 2004.
H. Admoni. Nonverbal Communication in Socially Assistive Human-Robot Interaction.
AI Matters, 2(4):9–10, 2016.
H. Admoni, T. Weng, and B. Scassellati. Modeling communicative behaviors for object
references in human-robot interaction. In Robotics and Automation (ICRA), 2016 IEEE
International Conference on, pages 3352–3359. IEEE, 2016.
J. K. Aggarwal and M. S. Ryoo. Human Activity Analysis: A Review. ACM Computing
Surveys, 43(3):16, 2011.
R. Alami, A. Albu-Schäffer, A. Bicchi, R. Bischoff, R. Chatila, A. De Luca, A. De Santis,
G. Giralt, J. Guiochet, G. Hirzinger, et al. Safe and Dependable Physical Human-Robot
Interaction in Anthropic Domains: State of the Art and Challenges. In IEEE/RSJ Inter-
national Conference on Intelligent Robots and Systems. IEEE, 2006.
S. Andrist, T. Pejsa, B. Mutlu, and M. Gleicher. Designing Effective Gaze Mechanisms
for Virtual Agents. In ACM Conference on Human Factors in Computing Systems, pages
705–714. ACM, 2012.
G. R. Arrabito, T. A. Mondor, and K. J. Kent. Judging the urgency of non-verbal auditory
alarms: A case study. Ergonomics, 47(8):821–840, 2004.
K. Baraka and M. M. Veloso. Mobile Service Robot State Revealing Through Expressive
Lights: Formalism, Design, and Evaluation. International Journal of Social Robotics, pages
1–28, 2017.
K. Baraka, A. Paiva, and M. Veloso. Expressive Lights for Revealing Mobile Service Robot
State. In Robot 2015: Second Iberian Robotics Conference, pages 107–119. Springer, 2015.
K. Baraka, S. Rosenthal, and M. Veloso. Enhancing Human Understanding of a Mobile
Robot's State and Actions using Expressive Lights. In IEEE International Symposium
on Robot and Human Interactive Communication, pages 652–657. IEEE, 2016.
C. Bartneck. From Fiction to Science–A cultural reflection of social robots. In ACM
Conference on Human Factors in Computing Systems: Workshop on Shaping Human-Robot
Interaction, pages 1–4, 2004.
C. Bartneck, T. Suzuki, T. Kanda, and T. Nomura. The influence of people's culture
and prior experiences with Aibo on their attitude towards robots. AI & Society, 21(1-2):
217–230, 2007.
C. Bartneck, D. Kulić, E. Croft, and S. Zoghbi. Measurement Instruments for the An-
thropomorphism, Animacy, Likeability, Perceived Intelligence, and Perceived Safety of
Robots. International Journal of Social Robotics, 1(1):71–81, 2009.
A. Bauer, D. Wollherr, and M. Buss. Human-Robot Collaboration: A Survey. International
Journal of Humanoid Robotics, 5(01):47–66, 2008.
J. M. Beer, A. Prakash, T. L. Mitzner, and W. A. Rogers. Understanding Robot Acceptance.
Technical Report HFA-TR-1103, Georgia Institute of Technology, 2011.
J. Billingsley, A. Visala, and M. Dunn. Robotics in agriculture and forestry. In Springer
Handbook of Robotics, pages 1065–1077. Springer, 2008.
J. P. Bliss, R. D. Gilson, and J. E. Deaton. Human probability matching behaviour in
response to alarms of varying reliability. Ergonomics, 38(11):2300–2312, 1995.
C. Bodden, D. Rakita, B. Mutlu, and M. Gleicher. Evaluating Intent-Expressive Robot Arm
Motion. In IEEE International Symposium on Robot and Human Interactive Communication,
pages 658–663. IEEE, 2016.
D. A. Boehm-Davis and R. Remington. Reducing the disruptive effects of interruption:
A cognitive framework for analysing the costs and benefits of intervention strategies.
Accident Analysis & Prevention, 41(5):1124–1129, 2009.
J. P. Borst, N. A. Taatgen, and H. van Rijn. What makes interruptions disruptive?: A
process-model account of the effects of the problem state bottleneck on task interrup-
tion and resumption. In Proceedings of the 33rd annual ACM conference on human factors
in computing systems, pages 2971–2980. ACM, 2015.
C. Breazeal, C. D. Kidd, A. L. Thomaz, G. Hoffman, and M. Berlin. Effects of Nonverbal
Communication on Efficiency and Robustness in Human-Robot Teamwork. In IEEE/RSJ
International Conference on Intelligent Robots and Systems, pages 708–713. IEEE, 2005.
C. Breazeal, J. Gray, and M. Berlin. An Embodied Cognition Approach to Mindreading
Skills for Socially Intelligent Robots. International Journal of Robotics Research, 28(5):656–
680, 2009.
S. A. Brewster. Non-Speech Auditory Output, pages 220–239. Lawrence Erlbaum Associates,
2002.
D. S. Brungart and W. M. Rabinowitz. Auditory localization of nearby sources. head-
related transfer functions. The Journal of the Acoustical Society of America, 106(3):1465–
1479, 1999.
M. Bualat, J. Barlow, T. Fong, C. Provencher, T. Smith, and A. Zuniga. Astrobee: Devel-
oping a free-flying robot for the International Space Station. In AIAA SPACE Conference
and Exposition, 2015.
R. L. Buckner and D. C. Carroll. Self-projection and the brain. Trends in Cognitive Sciences,
11(2):49–57, 2007.
B. Buren, S. Uddenberg, and B. J. Scholl. The automaticity of perceiving animacy: Goal-
directed motion in simple shapes influences visuomotor behavior even when task-
irrelevant. Psychonomic Bulletin & Review, 23(3):797–802, 2016.
E. M. Caruso, Z. C. Burns, and B. A. Converse. Slow motion increases perceived intent.
Proceedings of the National Academy of Sciences, 113(33):9250–9255, 2016.
J. R. Cauchard, K. Y. Zhai, and J. A. Landay. Drone & Me: An Exploration Into Nat-
ural Human-Drone Interaction. In ACM International Joint Conference on Pervasive and
Ubiquitous Computing, pages 361–365. ACM, 2015.
E. Cha and M. Matarić. Using Nonverbal Signals to Request Help During Human-Robot
Collaboration. In IEEE/RSJ International Conference on Intelligent Robots and Systems,
pages 5070–5076. IEEE, 2016.
E. Cha, A. D. Dragan, and S. S. Srinivasa. Perceived Robot Capability. In IEEE International
Symposium on Robot and Human Interactive Communication, pages 541–548. IEEE, 2015.
E. Cha, Y. Kim, T. Fong, and M. J. Matarić. A Survey of Nonverbal Signaling Methods for
Non-Humanoid Robots. Foundations and Trends in Robotics, 2017.
E. Cha, Y. Kim, T. Fong, and M. J. Matarić. A Survey of Nonverbal Signaling Methods for
Non-Humanoid Robots. Foundations and Trends in Robotics, 2018.
A. Chang, B. Resner, B. Koerner, X. Wang, and H. Ishii. Lumitouch: an emotional com-
munication device. In ACM Conference on Human Factors in Computing Systems Extended
Abstracts, pages 313–314. ACM, 2001.
H. H. Clark and S. E. Brennan. Grounding in Communication. Perspectives on Socially
Shared Cognition, 13(1991):127–149, 1991.
T. J. Clarke, M. F. Bradshaw, D. T. Field, S. E. Hampson, and D. Rose. The perception of
emotion from body movement in point-light displays of interpersonal dialogue. Percep-
tion, 34(10):1171–1180, 2005.
M. Coeckelbergh. Humans, Animals, and Robots: A Phenomenological Approach to
Human-Robot Relations. International Journal of Social Robotics, 3(2):197–204, 2011.
T. E. Cohn. Method and apparatus for enhancing visual perception of display lights,
warning lights and the like, and of stimuli used in testing for ocular disease, Jan. 20
1998. URL https://www.google.com/patents/US5710560. US Patent 5,710,560.
N. J. Currie and B. Peacock. International space station robotic systems operations-a
human factors perspective. In The Human Factors and Ergonomics Society Annual Meeting,
volume 46, pages 26–30. SAGE Publications, 2002.
E. B. Cutrell, M. Czerwinski, and E. Horvitz. Effects of instant messaging interruptions
on computing tasks. In CHI’00 extended abstracts on Human factors in computing systems,
pages 99–100. ACM, 2000.
E. B. Cutrell, M. Czerwinski, and E. Horvitz. Notification, disruption, and memory:
Effects of messaging interruptions on memory and performance. In Human-Computer
Interaction: INTERACT, volume 1, page 263, 2001.
K. Dautenhahn, M. Walters, S. Woods, K. L. Koay, C. L. Nehaniv, A. Sisbot, R. Alami, and
T. Siméon. How May I Serve You? A Robot Companion Approaching a Seated Person
in a Helping Context. In ACM/IEEE International Conference on Human-Robot Interaction,
pages 172–179. ACM, 2006.
B. H. Deatherage. Auditory and other sensory forms of information presentation. Human
Engineering Guide to Equipment Design, pages 123–160, 1972.
T. Dingler, J. Lindsay, and B. N. Walker. Learnability of sound cues for environmental
features: Auditory icons, earcons, spearcons, and speech. In International Conference on
Auditory Display, 2008.
W. H. Dittrich and S. E. Lea. Visual perception of intentional motion. Perception, 23:
253–253, 1994.
W. H. Dittrich, T. Troscianko, S. E. Lea, and D. Morgan. Perception of emotion from
dynamic point-light displays represented in dance. Perception, 25(6):727–738, 1996.
A. Dragan and S. Srinivasa. Generating Legible Motion. In Robotics: Science and Systems,
2013.
A. D. Dragan. Legible Robot Motion Planning. PhD thesis, The Robotics Institute, Carnegie
Mellon University, 2015.
A. D. Dragan. Robot Planning with Mathematical Models of Human State and Action.
CoRR, abs/1705.04226, 2017.
A. D. Dragan, K. C. Lee, and S. S. Srinivasa. Legibility and Predictability of Robot Motion.
In ACM/IEEE International Conference on Human-Robot Interaction, pages 301–308. IEEE,
2013.
A. D. Dragan, S. Bauman, J. Forlizzi, and S. S. Srinivasa. Effects of Robot Motion on
Human-Robot Collaboration. In ACM/IEEE International Conference on Human-Robot
Interaction, pages 51–58. ACM, 2015.
J. L. Drury, J. Scholtz, and H. A. Yanco. Awareness in Human-Robot Interactions. In IEEE
International Conference on Systems, Man and Cybernetics, volume 1, pages 912–918. IEEE,
2003.
B. R. Duffy. Anthropomorphism and the social robot. Robotics and Autonomous Systems,
42(3):177–190, 2003.
J. Edworthy. Medical audible alarms: a review. Journal of the American Medical Informatics
Association, 20(3):584–589, 2013.
J. Edworthy and N. Stanton. A user-centred approach to the design and evaluation of
auditory warning signals: 1. methodology. Ergonomics, 38(11):2262–2280, 1995.
M. R. Endsley. Toward a Theory of Situation Awareness in Dynamic Systems. Human
Factors, 37(1):32–64, 1995.
N. Epley, A. Waytz, and J. T. Cacioppo. On Seeing Human: A Three-Factor Theory of
Anthropomorphism. Psychological Review, 114(4):864, 2007.
F. Eyssel, D. Kuchenbrandt, S. Bobinger, L. de Ruiter, and F. Hegel. ‘If You Sound Like Me,
You Must Be More Human’: On the Interplay of Robot and User Features on Human-
Robot Acceptance and Anthropomorphism. In ACM/IEEE International Conference on
Human-Robot Interaction Late Breaking Report, pages 125–126. ACM, 2012.
A. Farnell. Designing Sound. MIT Press, Cambridge MA, 2010.
K. Fischer, B. Soto, C. Pantofaru, and L. Takayama. Initiating Interactions in Order to Get
Help: Effects of Social Framing on People’s Responses to Robots’ Requests for Assis-
tance. In IEEE International Symposium on Robot and Human Interactive Communication,
pages 999–1005. IEEE, 2014.
H. Fletcher and W. A. Munson. Loudness, Its Definition, Measurement and Calculation .
Bell Labs Technical Journal, 12(4):377–430, 1933.
T. Fong, J. R. Zumbado, N. Currie, A. Mishkin, and D. L. Akin. Space Telerobotics Unique
Challenges to Human–Robot Collaboration in Space. Reviews of Human Factors and
Ergonomics, 9(1):6–56, 2013.
J. Forlizzi and C. DiSalvo. Service Robots in the Domestic Environment: A Study of the
Roomba Vacuum in the Home. In ACM/IEEE International Conference on Human-Robot
Interaction, pages 258–265. ACM, 2006.
J. Fortmann, T. C. Stratmann, S. Boll, B. Poppinga, and W. Heuten. Make Me Move
at Work! An Ambient Light Display to Increase Physical Activity. In International
Conference on Pervasive Computing Technologies for Healthcare, pages 274–277. Institute for
Computer Sciences, Social-Informatics and Telecommunications Engineering, 2013.
C. Frith and U. Frith. Theory of mind. Current Biology, 15(17):644–645, 2005.
K. Funakoshi, K. Kobayashi, M. Nakano, S. Yamada, Y. Kitamura, and H. Tsujino. Smooth-
ing Human-robot Speech Interactions by Using a Blinking-Light as Subtle Expression.
In International Conference on Multimodal Interfaces, pages 293–296. ACM, 2008.
S. R. Fussell, S. Kiesler, L. D. Setlock, and V. Yew. How People Anthropomorphize Robots.
In ACM/IEEE International Conference on Human-Robot Interaction, pages 145–152. IEEE,
2008.
S. Garzonis, C. Bevan, and E. O’Neill. Mobile Service Audio Notifications: intuitive se-
mantics and noises. In Australasian Conference on Computer-Human Interaction: Designing
for Habitus and Habitat, pages 156–163. ACM, 2008.
W. W. Gaver. Auditory icons: Using sound in computer interfaces. Human-Computer
Interaction, 2(2):167–177, 1986.
D. Gentner and A. L. Stevens. Mental Models. Psychology Press, 2014.
F. Gervits, T. W. Fong, and M. Scheutz. Shared mental models to support distributed
human-robot teaming in space. In 2018 AIAA SPACE and Astronautics Forum and Expo-
sition, page 5340, 2018.
M. Gleicher. Retargetting Motion to New Characters. In Conference on Computer Graphics
and Interactive Techniques, pages 33–42. ACM, 1998.
J. Goetz, S. Kiesler, and A. Powers. Matching Robot Appearance and Behavior to Tasks
to Improve Human-Robot Cooperation. In IEEE International Symposium on Robot and
Human Interactive Communication, pages 55–60. IEEE, 2003.
M. A. Goodrich and A. C. Schultz. Human-Robot Interaction: A Survey. Foundations and
Trends in Human-Computer Interaction, 1(3):203–275, 2007.
A. Guillaume, C. Drake, M. Rivenez, L. Pellieux, and V. Chastres. Perception of urgency
and alarm design. In International Conference on Auditory Display. Elsevier, 2002.
D. Hadfield-Menell, S. Milli, P. Abbeel, S. J. Russell, and A. Dragan. Inverse reward
design. In Advances in Neural Information Processing Systems, pages 6765–6774, 2017.
R. Hansson and P. Ljungstrand. The Reminder Bracelet: Subtle Notification Cues for
Mobile Devices. In ACM Conference on Human Factors in Computing Systems Extended
Abstracts, pages 323–324. ACM, 2000.
K. S. Haring, C. Mougenot, F. Ono, and K. Watanabe. Cultural Differences in Perception
and Attitude towards Robots. International Journal of Affective Engineering, 13(3):149–157,
2014.
C. Harrison, J. Horstman, G. Hsieh, and S. Hudson. Unlocking the Expressivity of Point
Lights. In ACM Conference on Human Factors in Computing Systems, pages 1683–1692.
ACM, 2012.
M. Hegarty, M. S. Canham, and S. I. Fabrikant. Thinking About the Weather: How Display
Salience and Knowledge Affect Performance in a Graphic Inference Task. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 36(1):37, 2010.
M. W. Hoffman, D. B. Grimes, A. P. Shon, and R. P. Rao. A probabilistic model of gaze
imitation and shared attention. Neural Networks, 19(3):299–310, 2006.
E. Horvitz and J. Apacible. Learning and reasoning about interruption. In Proceedings of
the 5th international conference on Multimodal interfaces, pages 20–27. ACM, 2003.
E. Horvitz, A. Jacobs, and D. Hovel. Attention-Sensitive Alerting. In Conference on Uncer-
tainty in Artificial Intelligence, pages 305–313. Morgan Kaufmann Publishers Inc., 1999.
C.-M. Huang and B. Mutlu. Modeling and Evaluating Narrative Gestures for Humanlike
Robots. In Robotics: Science and Systems, pages 57–64, 2013.
H. Hüttenrauch and K. S. Eklundh. To Help or Not To Help a Service Robot. In IEEE
International Symposium on Robot and Human Interactive Communication, 2003.
S. T. Iqbal and B. P . Bailey. Effects of intelligent notification management on users and
their tasks. In Proceedings of the SIGCHI Conference on Human Factors in Computing Sys-
tems, pages 93–102. ACM, 2008.
H. Ishii and B. Ullmer. Tangible Bits: Towards Seamless Interfaces between People, Bits
and Atoms. In ACM Conference on Human Factors in Computing Systems, pages 234–241.
ACM, 1997.
M. Jacobsson, J. Bodin, and L. E. Holmquist. The see-Puck: A Platform for Exploring
Human-Robot Relationships. In ACM Conference on Human Factors in Computing Systems
Extended Abstracts, pages 141–144. ACM, 2008.
N. Jarrassé, J. Paik, V. Pasqui, and G. Morel. How can human motion prediction increase
transparency? In IEEE International Conference on Robotics and Automation, pages 2134–
2139. IEEE, 2008.
P. N. Johnson-Laird. Mental models: Towards a cognitive science of language, inference, and
consciousness. Harvard University Press, 1983.
S. Keizer, M. E. Foster, O. Lemon, A. Gaschler, and M. Giuliani. Training and evaluation
of an mdp model for social multi-user human-robot interaction. In Proceedings of the
SIGDIAL 2013 Conference, pages 223–232, 2013.
P. Keller and C. Stevens. Meaning From Environmental Sounds: Types of Signal-Referent
Relations and Their Effect on Recognizing Auditory Icons. Journal of Experimental Psy-
chology: Applied, 10(1):3–12, 2004.
R. Kelley, A. Tavakkoli, C. King, M. Nicolescu, M. Nicolescu, and G. Bebis. Understand-
ing Human Intentions via Hidden Markov Models in Autonomous Mobile Robots. In
ACM/IEEE International Conference on Human-Robot Interaction, pages 367–374. ACM,
2008.
R. Kelley, A. Tavakkoli, C. King, A. Ambardekar, M. Nicolescu, and M. Nicolescu.
Context-Based Bayesian Intent Recognition. IEEE Transactions on Autonomous Mental
Development, 4(3):215–225, 2012.
O. Khatib, K. Yokoi, O. Brock, K. Chang, and A. Casal. Robots in Human Environments:
Basic Autonomous Capabilities. International Journal of Robotics Research, 18(7):684–696,
1999.
S. Kiesler. Fostering Common Ground in Human-Robot Interaction. In IEEE International
Symposium on Robot and Human Interactive Communication, pages 729–734. IEEE, 2005.
S. Kiesler and J. Goetz. Mental Models and Cooperation with Robotic Assistants. In ACM
Conference on Human Factors in Computing Systems. ACM, 2002.
M.-g. Kim, H. S. Lee, J. W. Park, S. H. Jo, and M. J. Chung. Determining Color and
Blinking to Support Facial Expression of a Robot for Conveying Emotional Intensity.
In IEEE International Symposium on Robot and Human Interactive Communication, pages
219–224. IEEE, 2008.
R. Kittmann, T. Fröhlich, J. Schäfer, U. Reiser, F. Weißhardt, and A. Haug. Let me in-
troduce myself: I am care-o-bot 4, a gentleman robot. In Mensch Und Computer, pages
223–232. De Gruyter Oldenbourg, 2015.
M. L. Knapp, J. A. Hall, and T. G. Horgan. Nonverbal Communication in Human Interaction.
Cengage Learning, 2013.
H. Knight and R. Simmons. Expressive Motion with X, Y and Theta: Laban Effort Fea-
tures for Mobile Robots. In IEEE International Symposium on Robot and Human Interactive
Communication, pages 267–273. IEEE, 2014.
K. Kobayashi, K. Funakoshi, S. Yamada, M. Nakano, T. Komatsu, and Y. Saito. Blinking
Light Patterns as Artificial Subtle Expressions in Human-Robot Speech Interaction. In
IEEE International Symposium on Robot and Human Interactive Communication, pages 181–
186. IEEE, 2011.
K. Kraft and W. D. Smart. Seeing is Comforting: Effects of Teleoperator Visibility in
Robot-Mediated Health Care. In ACM/IEEE International Conference on Human-Robot
Interaction, pages 11–18. IEEE, 2016.
T. Kruse, P . Basili, S. Glasauer, and A. Kirsch. Legible Robot Navigation in the Proximity
of Moving Humans. In IEEE Workshop on Advanced Robotics and its Social Impacts, pages
83–88. IEEE, 2012.
D. Kuli´ c and E. Croft. Physiological and subjective responses to articulated robot motion.
Robotica, 25(1):13–27, 2007.
P . A. Lasota, T. Fong, J. A. Shah, et al. A Survey of Methods for Safe Human-Robot
Interaction. Foundations and Trends in Robotics, 5(4):261–349, 2017.
H. Lee, J. J. Choi, and S. S. Kwak. Will You Follow the Robot’s Advice? The Impact
of Robot Types and Task Types on People’s Perception of a Robot. In International
Conference on Human-Agent Interaction, pages 137–140. ACM, 2014.
M. K. Lee, S. Kielser, J. Forlizzi, S. Srinivasa, and P . Rybski. Gracefully Mitigating Break-
downs in Robotic Services. In ACM/IEEE International Conference on Human-Robot Inter-
action, pages 203–210. IEEE, 2010.
160
M. K. Lee, K. P . Tang, J. Forlizzi, and S. Kiesler. Understanding Users! Perception of
Privacy in Human-Robot Interaction. In ACM/IEEE International Conference on Human-
Robot Interaction Late Breaking Report, pages 181–182. ACM, 2011.
B. A. Lewis and C. L. Baldwin. Equating Perceived Urgency Across Auditory, Visual, and
Tactile Signals . In The Human Factors and Ergonomics Society Annual Meeting, volume 56,
pages 1307–1311. Sage Publications, 2012.
C. F. Lewis and M. K. McBeath. Bias to experience approaching motion in a three-
dimensional virtual environment. Perception, 33(3):259–276, 2004.
C. Lichtenthäler and A. Kirsch. Legibility of Robot Behavior: A Literature Review. 2016.
A. Löcken, H. Müller, W. Heuten, and S. C. Boll. Exploring the Design Space of Ambient
Light Displays. In ACM Conference on Human Factors in Computing Systems Extended
Abstracts, pages 387–390. ACM, 2014.
K. F. MacDorman, S. K. Vasudevan, and C.-C. Ho. Does Japan really have robot mania?
Comparing attitudes by implicit and explicit measures. AI & society, 23(4):485–510,
2009.
V . Maljkovic and K. Nakayama. Priming of pop-out: I. role of features. Memory & Cogni-
tion, 22(6):657–672, 1994.
N. Martelaro, V . C. Nneji, W. Ju, and P . Hinds. Tell Me More Designing HRI to Encourage
More Trust, Disclosure, and Companionship. In ACM/IEEE International Conference on
Human-Robot Interaction, pages 181–188. IEEE, 2016.
E. Martinson and D. Brock. Improving Human-Robot Interaction through Adaptation to
the Auditory Scene. In ACM/IEEE International Conference on Human-Robot Interaction,
pages 113–120. ACM, 2007.
D. Matsui, T. Minato, K. F. MacDorman, and H. Ishiguro. Generating Natural Motion
in an Android by Mapping Human Motion. In IEEE/RSJ International Conference on
Intelligent Robots and Systems, pages 3301–3308. IEEE, 2005.
A. Matviienko, A. Löcken, A. El Ali, W. Heuten, and S. Boll. NaviLight: investigating
ambient light displays for turn-by-turn navigation in cars. In International Conference on
Human-Computer Interaction with Mobile Devices and Services, pages 283–294. ACM, 2016.
D. S. McCrickard and C. M. Chewar. Attuning Notification Design to User Goals and
Attention Costs. Communications of the ACM, 46(3):67–72, 2003.
D. S. McCrickard, R. Catrambone, C. M. Chewar, and J. T. Stasko. Establishing tradeoffs
that leverage attention for utility: empirically evaluating information display in notifi-
cation systems. International Journal of Human-Computer Studies, 58(5):547–582, 2003a.
D. S. McCrickard, C. M. Chewar, J. P . Somervell, and A. Ndiwalana. A Model for No-
tification Systems Evaluation-Assessing User Goals for Multitasking Activity. ACM
Transactions on Computer-Human Interaction, 10(4):312–338, 2003b.
161
R. R. McNeer, J. Bohórquez, Ö. Özdamar, A. J. Varon, and P . Barach. A new paradigm
for the design of audible alarms that convey urgency information. Journal of Clinical
Monitoring and Computing, 21(6):353–363, 2007.
J. C. Middlebrooks and D. M. Green. Sound localization by human listeners. Annual
Review of Psychology, 42(1):135–159, 1991.
K. Mombaur, A. Truong, and J.-P . Laumond. From human to humanoid locomotion-an
inverse optimal control approach. Autonomous Robots, 28(3):369–383, 2010.
B. Mutlu, T. Shiwa, T. Kanda, H. Ishiguro, and N. Hagita. Footing In Human-Robot Con-
versations: How Robots Might Shape Participant Roles Using Gaze Cues. In ACM/IEEE
International Conference on Human-Robot Interaction, pages 61–68. ACM, 2009a.
B. Mutlu, F. Yamaoka, T. Kanda, H. Ishiguro, and N. Hagita. Nonverbal Leakage
in Robots: Communication of Intentions through Seemingly Unintentional Behavior.
In ACM/IEEE International Conference on Human-Robot Interaction, pages 69–76. ACM,
2009b.
S. Nikolaidis and J. Shah. Human-Robot Cross-Training: Computational Formulation,
Modeling and Evaluation of a Human Team Training Strategy. In Proceedings of the 8th
ACM/IEEE international conference on Human-robot interaction, pages 33–40. IEEE Press,
2013.
L. Oestreicher and K. S. Eklundh. User Expectations on Human-Robot Co-operation.
In IEEE International Symposium on Robot and Human Interactive Communication, pages
91–96. IEEE, 2006.
S. Paepcke and L. Takayama. Judging a Bot By Its Cover: An Experiment on Expecta-
tion Setting for Personal Robots. In ACM/IEEE International Conference on Human-Robot
Interaction, pages 45–52. IEEE, 2010.
K. Papadopoulos, K. Papadimitriou, and A. Koutsoklenis. The Role of Auditory Cues in
the Spatial Knowledge of Blind Individuals. International Journal of Special Education, 27
(2):169–180, 2012.
R. D. Patterson and T. Mayfield. Auditory warning sounds in the work environment.
Philosophical Transactions of the Royal Society of London: Biological Sciences, 327(1241):485–
492, 1990.
D. R. Perrott and T. N. Buell. Judgments of sound volume: Effects of signal duration,
level, and interaural characteristics on the perceived extensity of broadband noise. The
Journal of the Acoustical Society of America, 72(5):1413–1417, 1982.
E. Phillips, S. Ososky, J. Grove, and F. Jentsch. From Tools to Teammates: Toward the
Development of Appropriate Mental Models for Intelligent Robots. In Human Factors
and Ergonomics Society Annual Meeting, volume 55, pages 1491–1495. SAGE Publications,
2011.
162
P . Popoff-Asotoff, J. Holgate, and J. Macpherson. Which is Safer–Tonal or Broadband
Reversing Alarms?). In Acoustics, 2011.
Z. Pousman and J. Stasko. A Taxonomy of Ambient Information Systems: Four Patterns
of Design. In Working Conference on Advanced Visual Interfaces, pages 67–74. ACM, 2006.
A. Powers and S. Kiesler. The Advisor Robot: Tracing People’s Mental Model from a
Robot’s Physical Attributes. In ACM/IEEE International Conference on Human-Robot In-
teraction, pages 218–225. ACM, 2006.
M. Quigley, K. Conley, B. Gerkey, J. Faust, T. Foote, J. Leibs, R. Wheeler, and A. Y. Ng.
Ros: an open-source robot operating system. In IEEE International Conference on Robotics
and Automation, volume 3, page 5. IEEE, 2009.
D. J. Rea, J. E. Young, and P . Irani. The Roomba Mood Ring: An Ambient-Display Robot.
In ACM/IEEE International Conference on Human-Robot Interaction Late Breaking Report,
pages 217–218. ACM, 2012.
R. Read and T. Belpaeme. How to Use Non-Linguistic Utterances to Convey Emotion in
Child-Robot Interaction. In ACM/IEEE International Conference on Human-Robot Interac-
tion Late Breaking Report, pages 219–220. ACM, 2012.
R. Read and T. Belpaeme. Non-Linguistic Utterances Should be Used Alongside Lan-
guage, Rather than on their Own or as a Replacement. In ACM/IEEE International
Conference on Human-Robot Interaction Late Breaking Report, pages 276–277. ACM, 2014a.
R. Read and T. Belpaeme. Situational Context Directs How People Affectively Interpret
Robotic Non-Linguistic Utterances. In ACM/IEEE International Conference on Human-
Robot Interaction, pages 41–48. ACM, 2014b.
L. D. Riek, T.-C. Rabinowitch, B. Chakrabarti, and P . Robinson. How Anthropomorphism
Affects Empathy Toward Robots. In ACM/IEEE International Conference on Human-Robot
Interaction Late Breaking Report, pages 245–246. ACM, 2009.
S. Rosenthal and M. Veloso. Modeling humans as observation providers using pomdps.
In RO-MAN, 2011 IEEE, pages 53–58. IEEE, 2011.
S. Rosenthal and M. M. Veloso. Mobile Robot Planning to Seek Help with Spatially-
Situated Tasks. In AAAI Conference on Artificial Intelligence, page 1, 2012.
S. Rosenthal, A. K. Dey, and M. Veloso. Using decision-theoretic experience sampling
to build personalized mobile phone interruption models. In International Conference on
Pervasive Computing, pages 170–187. Springer, 2011.
V . Rousseau, F. Ferland, D. Létourneau, and F. Michaud. Sorry to Interrupt, But May I
Have Your Attention? Preliminary Design and Evaluation of Autonomous Engagement
in HRI. Journal of Human-Robot Interaction, 2(3):41–61, 2013.
D. Sadigh, S. S. Sastry, S. A. Seshia, and A. Dragan. Information Gathering Actions
over Human Internal State. In IEEE/RSJ International Conference on Intelligent Robots and
Systems, pages 66–73. IEEE, 2016.
163
M. Salem, K. Rohlfing, S. Kopp, and F. Joublin. A Friendly Gesture: Investigating the
Effect of Multimodal Robot Behavior in Human-Robot Interaction. In IEEE International
Symposium on Robot and Human Interactive Communication, pages 247–252. IEEE, 2011.
M. Salem, S. Kopp, I. Wachsmuth, K. Rohlfing, and F. Joublin. Generation and Evaluation
of Communicative Robot Gesture. International Journal of Social Robotics, 4(2):201–217,
2012.
M. Salem, M. Ziadee, and M. Sakr. Marhaba, how may I help you? Effects of Politeness
and Culture on Robot Acceptance and Anthropomorphization. In ACM/IEEE Interna-
tional Conference on Human-Robot Interaction, pages 74–81. ACM, 2014.
P . Saulnier, E. Sharlin, and S. Greenberg. Exploring Minimal Nonverbal Interruption in
HRI. In IEEE International Symposium on Robot and Human Interactive Communication,
pages 79–86. IEEE, 2011.
A. Sauppé and B. Mutlu. Robot Deictics: How Gesture and Context Shape Referen-
tial Communication. In ACM/IEEE International Conference on Human-Robot Interaction,
pages 342–349. ACM, 2014.
B. Scassellati. Theory of Mind for a Humanoid Robot. Autonomous Robots, 12(1):13–24,
2002.
C. E. Shannon and W. Weaver. The mathematical theory of communication. University of
Illinois press, 1998.
R. M. Siino and P . J. Hinds. Robots, Gender & Sensemaking: Sex Segregation’s Impact On
Workers Making Sense Of a Mobile Autonomous Robot. In IEEE International Conference
on Robotics and Automation, pages 2773–2778. IEEE, 2005.
A. Sirkka, J. Fagerlönn, S. Lindberg, and R. Frimalm. An Auditory Display to Convey
Urgency Information in Industrial Control Rooms. In International Conference on Engi-
neering Psychology and Cognitive Ergonomics, pages 533–544. Springer, 2014.
M. Sneddon, K. Pearsons, and S. Fidell. Laboratory study of the noticeability and an-
noyance of low signal-to-noise ratio sounds. Noise Control Engineering Journal, 51(5):
300–305, 2003.
R. Stalnaker. Common Ground. Linguistics and philosophy, 25(5):701–721, 2002.
A. Steinfeld, T. Fong, D. Kaber, M. Lewis, J. Scholtz, A. Schultz, and M. Goodrich. Com-
mon Metrics for Human-Robot Interaction. In ACM/IEEE International Conference on
Human-Robot Interaction, pages 33–40. ACM, 2006.
S. S. Sundar, T. F. Waddell, and E. H. Jung. The Hollywood Robot Syndrome Media Effects
on Older Adults’ Attitudes toward Robots and Adoption Intentions. In ACM/IEEE
International Conference on Human-Robot Interaction, pages 343–350. IEEE, 2016.
D. Szafir, B. Mutlu, and T. Fong. Communication of Intent in Assistive Free Flyers. In
ACM/IEEE International Conference on Human-Robot Interaction, pages 358–365. ACM,
2014.
164
D. Szafir, B. Mutlu, and T. Fong. Communicating Directionality in Flying Robots. In
ACM/IEEE International Conference on Human-Robot Interaction, pages 19–26. ACM, 2015.
D. Szafir, B. Mutlu, and T. Fong. Designing planning and control interfaces to support
user collaboration with flying robots. International Journal of Robotics Research, pages
1–29, 2017.
D. J. Szafir. Human Interaction with Assistive Free-Flying Robots. PhD thesis, The University
of Wisconsin-Madison, 2015.
L. Takayama and H. Harris. Presentation of (Telepresent) Self: On the Double-Edged
Effects of Mirrors. In ACM/IEEE International Conference on Human-Robot Interaction,
pages 381–388. IEEE, 2013.
L. Takayama, D. Dooley, and W. Ju. Expressing Thought: Improving Robot Readabil-
ity with Animation Principles. In ACM/IEEE International Conference on Human-Robot
Interaction, pages 69–76. ACM, 2011.
K. Terada and A. Ito. Can a robot deceive humans? In ACM/IEEE International Conference
on Human-Robot Interaction Late Breaking Report, pages 191–192. IEEE Press, 2010.
K. Terada, T. Shamoto, A. Ito, and H. Mei. Reactive Movements of Non-humanoid Robots
Cause Intention Attribution in Humans. In IEEE/RSJ International Conference on Intelli-
gent Robots and Systems, pages 3715–3720. IEEE, 2007.
A. Thomaz, G. Hoffman, M. Cakmak, et al. Computational Human-Robot Interaction.
Foundations and Trends in Robotics, 4(2-3):105–223, 2016.
T. Tojo, Y. Matsusaka, T. Ishii, and T. Kobayashi. A Conversational Robot Utilizing Facial
and Body Expressions. In IEEE International Conference on Systems, Man, and Cybernetics,
pages 858–863. IEEE, 2000.
J. G. Trafton, A. C. Schultz, M. Bugajska, and F. Mintz. Perspective-taking with Robots:
Experiments and models. In IEEE International Symposium on Robot and Human Interac-
tive Communication, pages 580–584. IEEE, 2005.
J. G. Trafton, A. Jacobs, and A. M. Harrison. Building and verifying a predictive model of
interruption resumption. Proceedings of the IEEE, 100(3):648–659, 2012.
L. Turchet, S. Spagnol, M. Geronazzo, and F. Avanzini. Localization of self-generated
synthetic footstep sounds on different walked-upon materials through headphones.
Virtual Reality, 20(1):1–16, 2016.
L. D. Turner, S. M. Allen, and R. M. Whitaker. Interruptibility prediction for ubiquitous
systems: conventions and new directions from a growing field. In Proceedings of the 2015
ACM international joint conference on pervasive and ubiquitous computing, pages 801–812.
ACM, 2015.
165
C. Urmson, J. Anhalt, D. Bagnell, C. Baker, R. Bittner, M. Clark, J. Dolan, D. Duggins,
T. Galatali, C. Geyer, M. Gittleman, S. Harbaugh, M. Hebert, T. M. Howard, S. Kolski,
A. Kelly, M. Likhachev, M. McNaughton, N. Miller, K. Peterson, B. Pilnick, R. Rajku-
mar, P . Rybski, B. Salesky, Y.-W. Seo, S. Singh, J. Snider, A. Stentz, W. R. L. Whittaker,
Z. Wolkowicki, and J. Ziglar. Autonomous Driving in Urban Environments: Boss and
the Urban Challenge. Journal of Field Robotics, 25(8):425–466, June 2008.
D. Vogel and R. Balakrishnan. Interactive Public Ambient Displays: Transitioning from
Implicit to Explicit, Public to Personal, Interaction with Multiple Users. In ACM Sym-
posium on User Interface Software and Technology, pages 137–146. ACM, 2004.
M. L. Walters, D. S. Syrdal, K. Dautenhahn, R. Te Boekhorst, and K. L. Koay. Avoiding
the Uncanny Valley - Robot Appearance, Personality and Consistency of Behavior in
an Attention-Seeking Home Scenario for a Robot Companion. Autonomous Robots, 24
(2):159–178, 2008.
L. E. Wool, S. J. Komban, J. Kremkow, M. Jansen, X. Li, J.-M. Alonso, and Q. Zaidi.
Salience of unique hues and implications for color theory. Journal of Vision, 15(2):10–10,
2015.
P . R. Wurman, R. D’Andrea, and M. Mountz. Coordinating Hundreds of Cooperative,
Autonomous Vehicles in Warehouses. AI magazine, 29(1):9, 2008.
M. Yim, Y. Zhang, and D. Duff. Modular Robots. IEEE Spectrum, 39(2):30–34, 2002.
W. A. Yost. Fundamentals of hearing: An introduction. Academic Press, 1994.
A. Zhou, D. Hadfield-Menell, A. Nagabandi, and A. D. Dragan. Expressive Robot Motion
Timing. In ACM/IEEE International Conference on Human-Robot Interaction, pages 22–31.
ACM, 2017.
166
Abstract
As robots increasingly perform tasks in a diverse set of real-world environments, they are expected not only to operate in close proximity to humans but also to interact with them. This has led to great interest in the communication challenges associated with the varying degrees of coordination and collaboration required between humans and robots for these tasks. Non-humanoid robots can benefit from the use of nonverbal signals, as they often lack the communication modalities that humans intrinsically rely on to obtain important state information.

The goal of this thesis is to enable non-humanoid robots to intelligently utilize nonverbal signals to communicate information about their internal state. As interaction is a complex process, we propose a computational framework that formalizes the robot's communication behavior as a decision-making problem under uncertainty. Building on prior work in notification systems, this framework takes into account information about the human and the robot and attempts to balance their individual objectives to create more acceptable robot behavior.

To inform the framework's Markov Decision Process model, we explored the design space of light, sound, and motion for nonverbal signaling during a human-robot collaboration task. We present three user studies that identify underlying signal design principles based on human perceptions. We applied the findings of these studies to interaction scenarios in three different experiments. To increase the generalizability of this research, we employed several types of non-humanoid robot platforms that vary in appearance and capabilities.

Finally, we applied the communication framework to a simulated human-robot collaboration task. A policy for the robot's nonverbal signaling behavior was generated using model-free reinforcement learning. This experiment evaluated the impact of the robot's actions on participants' perceptions of the robot as a teammate. Results showed that the use of this framework enables the robot not only to improve its own task-oriented outcomes but also to act as a more thoughtful and considerate agent during interaction with humans.

This research contributes to both the design and planning of nonverbal communication for non-humanoid robot platforms with both theoretically and empirically driven methodologies. Although the number of non-humanoid robots deployed in the world is growing, this field of research is still maturing. This work provides a foundation for future human-robot interaction research in these areas while promoting generalizability and standardization of robot behaviors across the diverse set of existing non-humanoid robots.
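To make the framing above concrete, the sketch below shows the kind of model-free reinforcement learning setup the abstract describes, reduced to a toy problem: a robot that needs help chooses between staying quiet, a subtle signal, and an urgent signal while the human's availability fluctuates, and a tabular Q-learning agent learns when to signal. The states, actions, reward values, and transition probabilities here are hypothetical placeholders, not the model used in the thesis; they are included only to illustrate the structure of a nonverbal signaling policy learned with model-free RL.

```python
import random
from collections import defaultdict

# Minimal illustrative sketch (not the thesis implementation): a toy signaling
# MDP in which a robot that needs help decides, at each step, whether to stay
# quiet or to emit a low- or high-urgency nonverbal signal. Signaling when the
# human is interruptible yields help; signaling when they are busy incurs an
# annoyance cost. All quantities below are hypothetical placeholders.

ACTIONS = ["no_signal", "subtle_signal", "urgent_signal"]

def step(state, action, rng):
    """Toy environment dynamics. state = (robot_needs_help, human_busy)."""
    needs_help, busy = state
    reward = 0.0
    done = False
    if action != "no_signal":
        if busy:
            reward -= 2.0 if action == "urgent_signal" else 0.5   # interruption cost
        elif needs_help:
            reward += 5.0                                          # help obtained
            done = True
    if needs_help and action == "no_signal":
        reward -= 1.0                                              # robot stays blocked
    # Human availability changes stochastically between steps.
    busy = rng.random() < (0.7 if busy else 0.3)
    return (needs_help, busy), reward, done

def q_learning(episodes=5000, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)  # Q[(state, action)]
    for _ in range(episodes):
        state = (True, rng.random() < 0.5)   # robot needs help; human may be busy
        for _ in range(20):                  # finite horizon per episode
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: Q[(state, a)])
            next_state, reward, done = step(state, action, rng)
            best_next = max(Q[(next_state, a)] for a in ACTIONS)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
            if done:
                break
    return Q

if __name__ == "__main__":
    Q = q_learning()
    for busy in (True, False):
        s = (True, busy)
        best = max(ACTIONS, key=lambda a: Q[(s, a)])
        print(f"human_busy={busy}: learned signaling action -> {best}")
```

In this toy setup the learned policy trades the cost of interrupting a busy human against the cost of remaining blocked, which mirrors, in miniature, the balance between human and robot objectives that the framework is designed to manage.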