DECISION SUPPORT SYSTEMS FOR ADAPTIVE EXPERIMENTAL DESIGN OF
AUTONOMOUS, OFF-ROAD GROUND VEHICLES
by
Jason M. Gregory
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)
May 2024
Copyright 2024 Jason M. Gregory
Dedication
To my family and friends, both human and robotic.
Acknowledgements
This endeavor was made possible by the help from many people over the past several years, and I am
forever grateful for their contributions. First, I am significantly indebted to my advisor, Dr. Satyandra K.
Gupta, who taught me countless lessons - both technical and non-technical. I would not be the researcher
I am today if it weren’t for Dr. Gupta’s wisdom, patience, foresight, and willingness to discuss material
at the dry erase board any time of the day (or night). Words simply cannot express my gratitude for
how Dr. Gupta championed my professional development and personal growth, first and foremost, while
promoting my technical skill sets. I am also thankful for the members of my committee, Drs. Gaurav
Sukhatme, Heather Culbertson, Stefanos Nikolaidis, and Quan Nguyen, because of the time and feedback
that they provided to improve the quality of my work.
I would like to extend my sincere gratitude to my University of Southern California lab mates, Sarah
Al-Hussaini, Ariyan Kabir, and Brual Shah, who welcomed me to Los Angeles and helped make my PhD
journey a memorable one. The late nights of working in the lab together were incredibly rewarding for
me and have become cherished memories. Additionally, completing this dissertation would not have been
possible without the generous support from the DEVCOM Army Research Laboratory (ARL), who financed
my research. I am profoundly grateful for the contributions my ARL colleagues made to my academic and
professional careers as well as the lifelong friendships that we’ve developed while pouring blood, sweat,
and tears into our field experiments. I would like to specifically thank Dave Baran, Jon Fink, Ethan Stump,
John Rogers, Felix Sanchez, Eli Lancaster, Long Quang, Daniel Sahu, Trevor Rocks, and Julie Foresta.
Through my research with my ARL colleagues, I also collaborated with world-class roboticists at the Defense Advanced Research Projects Agency. I am truly thankful for the opportunities afforded to me and
decades of wisdom imparted by Stuart Young, Mike Perschbacher, Scott Fish, and Doug Hackett.
I would be remiss if I didn’t express my deepest appreciation to my family and friends who unknowingly provided me with the necessary support and inspiration to cross the finish line. My parents, Cheryle
and Rick, may never understand how important their unconditional love, selflessness, encouragement,
and reminders to hang tough were to me. My siblings and nephews, Kim, Rob, Robbie, and Andrew, have
always fostered my passions and motivated me to pursue a career in robotics. Finally, I cannot imagine
completing this journey without the love and support of my significant other, Karen Reed. In addition
to responding to every paper deadline and technical hurdle with patience and compassion, she offered
encouragement, stability, and confidence in me when I needed it the most.
Table of Contents
Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
Chapter 1: Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3 Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.4 Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.5 Publication Note . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Chapter 2: Literature Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.2 Cyber Physical Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.3 Field Robots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.4 Model Based Systems Engineering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.5 Experimental Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.6 System Understanding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.7 Expert Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.8 Decision Support Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.9 Shared Autonomy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.10 Active Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.11 Generative AI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.12 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Chapter 3: Taxonomy of Decision Support Systems for Experimental Design . . . . . . . . . . . . . 45
3.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.2 Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.3 Guiding Insights From Literature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.3.1 Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.3.2 DSS Assistance Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.3.3 Evaluation and Metrics for DSS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.4 Stages of Decision Support Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.4.1 Concept of Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.4.2 Stage 0: No Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.4.3 Stage 1: Design Assistance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.4.4 Stage 2: Design Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.4.5 Stage 3: Conditional Design Recommendation . . . . . . . . . . . . . . . . . . . . . 63
3.4.6 Stage 4: Single Design Recommendation . . . . . . . . . . . . . . . . . . . . . . . . 63
3.4.7 Stage 5: Sequential Design Recommendation . . . . . . . . . . . . . . . . . . . . . 64
3.5 Construction Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.6 Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Chapter 4: Design Assistance for Structured Experimental Design . . . . . . . . . . . . . . . . . . . 70
4.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.2 Stage 1 DSS Development . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.2.1 Empirical Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.2.2 Support Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
4.3 Exploratory User Study on Design Assistance . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.3.1 Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.3.2 Participants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.3.3 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.3.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
4.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.5 Study Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Chapter 5: Design Monitoring for Proactive Decision Support . . . . . . . . . . . . . . . . . . . . . 94
5.1 The Need for Proactive Decision Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.2 Stage 2 DSS Development . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
5.2.1 Graph Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
5.2.2 Alert Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.3 Exploratory User Study on Design Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . 111
5.3.1 Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
5.3.2 Participants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
5.3.3 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
5.3.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
5.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
5.5 Study Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
5.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
Chapter 6: Design Recommendation for More Informed Experiment Selection . . . . . . . . . . . . 135
6.1 The Role of Design Recommendations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
6.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
6.3 Stage 3 DSS Development . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
6.3.1 Bayesian Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
6.3.2 Experimental Design Mathematical Framework . . . . . . . . . . . . . . . . . . . . 141
6.4 Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
6.4.1 System Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
6.4.2 Formulation and Evaluation for Autonomous Navigation . . . . . . . . . . . . . . . 145
6.4.3 Adaptive Experimental Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
6.4.4 Adaptive Experimental Design Using Prior Knowledge . . . . . . . . . . . . . . . . 148
6.5 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
6.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
Chapter 7: Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
7.1 Intellectual Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
7.2 Anticipated Benefits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
7.3 Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
7.4 Opportunities to Bridge the HRI Gap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
List of Tables
3.1 An overview of the proposed six-stage taxonomy of decision support systems for
adaptive experimental design in field robotics. . . . . . . . . . . . . . . . . . . . . . 61
5.1 Training and validation results from 5-fold cross validation. . . . . . . . . . . . . . . . . . 111
6.1 Results from Scenario 1 of an adaptive experimental design. Components the
experimenter explored were the: mapper (M), global planner (GP), local planner
(LP), and controller (C). Test assessments include: failure (F), success with
unsatisfied constraints (SU), and success with fully satisfied constraints (S). . . 151
6.2 Results from Scenario 2 of an adaptive experimental design where the experimenter leveraged prior knowledge. In addition to the component definitions in
Table 6.1, the experimenter also explored the hardware (HW). . . . . . . . . . . . . 154
List of Figures
1.1 Future autonomous robotic systems are expected to operate alongside humans in the
real world, so they require insightful testing and evaluation methods for humans to
sufficiently understand performance and limitations. . . . . . . . . . . . . . . . . . . . . . 2
1.2 Current methods for experimentation can be incredibly labor intensive and require many
people for operations, networking, testing, analysis, and other logistics. . . . . . . . . . . . 4
1.3 Risk to human and robot safety is oftentimes an inevitable aspect of experimentation
of field robotics that must be properly managed. For example, a robot can tip over and
damage sensors, but experimenters can also learn a lot about performance and limitations
by conducting experiments at the edge of operational limits. . . . . . . . . . . . . . . . . . 4
1.4 The Systems Engineering Process was designed to provide a path for improving the cost
effectiveness over the course of the system lifecycle, but has been recently challenged
with the prominence of AI/ML-enabled, experimental robotics. [167] . . . . . . . . . . . . 6
1.5 The software architecture diagram of the ARL Ground Autonomy Software Stack.
Autonomous ground vehicles require an end-to-end navigation stack, such as this one,
that takes sensor input and produces motor commands. The focus of this dissertation is to
investigate decision support systems that augment the human’s decision making abilities
when designing experiments for autonomous ground robots and navigation stacks. . . . . 8
1.6 An example of the Robot Operating System graph containing nodes and topics of the ARL
Ground Autonomy Software Stack shown in Figure 1.5. The node and topic labels are
intentionally unreadable to capture the vast complexity of the overall system. . . . . . . . 9
1.7 Context is a defining challenge in experimentation of robotics. Take, for example, (a) the
traditional academic problem of shortest path motion planning. In real-world applications,
a variant of this problem could be (b) to plan paths with respect to key, structured terrain
or (c) with respect to kinodynamic constraints to ensure vehicle roll-over safety. Either of
these variants could additionally include (d) formations and high-level maneuvers or (e)
adversarial aspects that alter the resulting paths and subsequent evaluation metrics. . . . 12
1.8 The scope of this dissertation lies in system development where significant experimentation is required to develop an understanding of autonomous ground robots. . . . . . . . . 13
1.9 Examples of off-road-capable ground vehicles. The scope of this dissertation lies strictly
within experimental design for autonomous ground field robots, such as these. . . . . . . 15
1.10 My previous work on autonomous navigation capabilities, such as SLAM, path planning,
and control, has informed the evaluation of individual autonomy components in this work. 16
1.11 My previous work on multi-robot teaming, such as information exchange through
rendezvous, robot rescue strategies, and multi-robot task allocation informed how larger,
complex systems can be evaluated. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.12 My previous work on human-robot interaction techniques, including augmented reality,
robot-based change detection, virtual reality, and gesture control provided insights for
how experimenters interact with robotic systems and possible avenues for information
flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.13 My previous work on alert generation frameworks for human-supervised robot teams
informed approaches in this work for proactively notifying decision makers. . . . . . . . . 19
2.1 The proposed architecture from [170] for Intelligent Decision Support systems, which add
artificial intelligence functions to traditional DSSs. . . . . . . . . . . . . . . . . . . . . . . . 36
2.2 An illustration borrowed from [247] that shows the goal of using ChatGPT to enable users
on the loop that can seamlessly deploy platforms. . . . . . . . . . . . . . . . . . . . . . . . . 41
3.1 Photos from my experiments testing autonomous navigation capabilities of field robots,
specifically off-road ground vehicles. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.2 Stage 0: No Support, which is representative of how the majority of field robotics
experimentation is conducted today. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.3 Stage 1: Design Assistance - support is provided in the form of information-aiding prompts. 49
3.4 Stage 2: Design Monitoring - support is provided in the form of alerts pertaining to
experiments that could be or already have been conducted. . . . . . . . . . . . . . . . . . 50
3.5 Stage 3: Conditional Design Recommendation - support is provided in the form of
a partially-defined experiment recommendation with respect to either parameters or
components. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.6 Stage 4: Single Design Recommendation - support is provided in the form of one complete
experiment recommendation that could feasibly be executed without modification. . . . . 52
3.7 Stage 5: Sequential Design Recommendation - support is provided in the form of a
recommendation for a series of experiments that could feasibly be executed sequentially
without modification. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.8 The SAE J3016 levels of driving automation provided inspiration for the taxonomy of
experimental design DSSs in this work. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.9 Three illustrative scenarios of adaptive experimental design in a hypothetical study
containing several experiments. An experimenter can choose to monotonically increase
the stages of decision support (blue line), increase until a certain stage but then plateau
(green line), or increase and then decrease the DSS stage (purple line). Importantly,
the experimenter can choose to use any stage (provided that the DSS has the necessary
features) for any number of experiments and change the stage in response to new
observations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.10 The six-stage taxonomy of DSSs for experimental design was extensively validated
through field experimentation involving different biomes, sensor suites, platforms, and
autonomy stacks. Every decision observed during experimentation could be mapped to
support defined in a stage within the taxonomy. . . . . . . . . . . . . . . . . . . . . . . . . 66
4.1 One of the two missions conducted in the forest using Research Platform 1. Blue discs
indicate the ordered goal locations of the waypoint mission. . . . . . . . . . . . . . . . . . 75
4.2 One of the two missions conducted in the desert using Research Platform 2. Blue discs
indicate the ordered goal locations of the waypoint mission. . . . . . . . . . . . . . . . . . 76
4.3 A diagram showing the deployment of the proposed Stage 1 DSS. The system consists of
two questionnaires, one which is answered by the human experimenter before conducting
an experiment and one to be answered after the experiment. The key themes of the
questions are shown for brevity. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.4 The participants were shown an image of a real robot (Figure 4.1b), the equivalent robot
in simulation (Figure 4.4a), and a video of a simulated waypoint mission (a representative
screenshot is shown in Figure 4.4b) as introductory material. For designing experiments,
participants were shown a simulated dense forest environment (Figure 4.4c) and given
written context that included the robot’s size, number of waypoints, total length of the
mission, environmental characteristics (e.g., trees, logs, trails, bushes, tall grass, ravines,
and rolling hills), the season, and a weather forecast for the simulated conditions. . . . . . 83
4.5 The participants constructed experiments by choosing from a list of 12 hardware
components, 16 software components, and 9 algorithmic parameters. The baseline
configuration is represented by the checkmarked components and default parameter values. 84
4.6 The first five questions of the Stage 1 DSS questionnaire that were provided to the
participants in the assisted group. Participants were given the option to answer silently or
by typing in the free response text box. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
4.7 The remaining four questions of the Stage 1 DSS questionnaire that were provided to the
participants in the assisted group. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.8 Survey results for the dense forest scenario. Check marks indicate the participant
performed the task indicated by the corresponding column title and ‘X’ marks indicate
they did not. Red-shaded and blue-shaded cells represent the corresponding mistakes and
suboptimal decisions, respectively. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.9 Results from the exit survey of the human subjects study where participants were asked
to answer 5-point Likert scale questions to indicate whether they thought experiment
planning using the DSS questionnaire was useful and burdensome. Note, the control group
(left column of graphs) were asked to retrospectively consider the DSS questionnaire
after designing their experiment whereas the assisted group (right column) used the
questionnaire. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.1 Examples of experimental designs illustrating a mistake and suboptimal decisions
depending on the selection of hardware and software components given certain context,
namely the testing environment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.2 A system architecture diagram of a mobile manipulator system with an alert generation
module from my previous work [104]. The alert generation module accepts environmental
models, plans, and alert condition settings to produce plan and risk visualizations for a
human supervisor before robot commands are sent to the controller. This architecture
represents one paradigm for alert generation in robot deployment, which provides some
useful insights but is characteristically different than alert generation for experimental
design. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
5.3 An example borrowed from [254] illustrating (a) 2-D convolution where neighbors of the
red node are ordered and have a fixed size; and (b) graph convolution where neighbors
are unordered and variable in size. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.4 One possible representation of an experimental design for a waypoint mission in a forest
with the ARL Ground Autonomy Software Stack using a Graph Neural Network. . . . . . 102
5.5 A combined diagram showing the GNN structure and the subsequent use of inferred
decision quality. If a mistake or suboptimal decision is inferred by the model then an alert
is issued. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
5.6 A representative example of a graph representation included in the training dataset to
build the GNN-based alert generation framework. This experimental design corresponds
to Participant 2 in the control group of Study 1 and contains a mistake because the
participant forgot to include the necessary camera hardware input after they added trail
detection software. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
5.7 A representative example of a graph representation included in the training dataset to
build the GNN-based alert generation framework. This experimental design corresponds
to Participant 5 in the assisted group of Study 1 and contains a suboptimal decision by not
including software components that would be useful in the forest environment, such as
trail detection, terrain classification, or height mapping. . . . . . . . . . . . . . . . . . . . . 107
5.8 A representative example of a graph representation included in the training dataset to
build the GNN-based alert generation framework. This experimental design corresponds
to Participant 5 in the control group of Study 1 and contains a suboptimal decision by
including road detection software, which is deemed unnecessary in the forest environment. 108
5.9 A representative example of a graph representation included in the training dataset to
build the GNN-based alert generation framework. This experimental design corresponds
to Participant 6 in the assisted group of Study 1 and contains no mistakes or suboptimal
decisions, so it is labeled as “good”. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.10 A representative example of a graph representation included in the training dataset to
build the GNN-based alert generation framework. This experimental design is artificially
created by an expert experimenter and represents how synthetic data can be incorporated
into the dataset using existing knowledge of the system. . . . . . . . . . . . . . . . . . . . 110
5.11 A screenshot from the video of the simulated robot using the baseline configuration in the
forest environment, which was provided to the participants as an initial experiment. . . . 113
5.12 Alerts provided to the participants whose experimental designs included either mistakes
or suboptimal design decisions in the context of testing autonomous ground robots in the
forest environment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
5.13 A screenshot from the video of the simulated robot after the participant designed an
experiment with a mistake. The top-left shows a third-person view of the simulated robot
in the forest, the top-right shows visualization of the robot’s mapping and planning during
the autonomous waypoint navigation, and the bottom table provides quantitative results.
In this case, the mistake prevented the robot from completing the mission. . . . . . . . . . 116
5.14 Screenshots from the two possible videos that participants received as feedback if their
experimental designs did not contain a mistake. . . . . . . . . . . . . . . . . . . . . . . . . 117
5.15 Participants were asked to conduct post-experiment analysis by stating their observations
related to positive and negative outcomes after receiving feedback from their experimental
designs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
5.16 Participants were asked whether they would conduct another experiment and what
changes they would make as a way to emulate the tradeoff analysis considerations that
are required of experimenters of real-world systems. . . . . . . . . . . . . . . . . . . . . . 120
5.17 The distribution of participants that chose to respond to the questionnaire-based planning
stage silently, by typing, or using a mixture of both. . . . . . . . . . . . . . . . . . . . . . . 122
5.18 The results of Study 2 including (a) the original experimental designs and (b) the final
experimental designs after some participants were alerted based on the inference of the
Stage 2 DSS. The green-highlighted rows in (b) indicate improvements in experimental
design quality after the experimenter was notified by an alert and revised their
experiment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
5.19 The participants’ opinion-based responses to the exit survey in Study 2. . . . . . . . . . . . 125
5.20 Categorizations of the participants’ next experimental designs. In this case “N/A” indicates
that the participants did not want to conduct another experiment. . . . . . . . . . . . . . . 126
5.21 A representation of the ARL Ground Autonomy Software Stack shown in Figure 1.5 using
a Heterogeneous Graph Neural Network. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
6.1 Top-down map of Test 1 in Scenario 1. Light gray regions indicate free space in the
map built online by the robot, blue discs indicate waypoints, bold cyan lines show the
autonomous robot’s path, light green lines represent replans, and red lines represent
remote control by the experimenter. In this case, the test failed using the experimenter’s
initial system configuration. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
6.2 Top-down map of Test 2 in Scenario 1. In this case, the experimenter chose to reduce the
unknown cell cost after Test 1, which enabled planning to the final waypoint. Recall, the
color representations are explained in the caption of Fig. 6.1. . . . . . . . . . . . . . . . . . 153
6.3 Top-down map of Test 3 in Scenario 2 where the experimenter leveraged prior knowledge.
In this case, the robot navigates the mission successfully, but the experimenter rates the
qualitative score unacceptably low. Recall, the color representations are explained in the
caption of Fig. 6.1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
6.4 Top-down maps of Test 4 in Scenario 2 where the experimenter leveraged prior knowledge.
In this case, the experimenter chose to increase the robot’s forward velocity and disable
loop closure, which produced a faster completion time but required more interventions
and achieved a lower qualitative score. Recall, the color representations are explained in
the caption of Fig. 6.1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
7.1 Different experimental robotic systems that could benefit from AI-enabled RDT&E tools
using insights and approaches investigated in this dissertation. . . . . . . . . . . . . . . . 163
Abstract
The rapid advancement of artificial intelligence, machine learning, human-robot interaction, and safe
learning and optimization has led to significant leaps in component- and behavior-level capabilities for autonomous robots. However, human capability-enhancing research, development, testing, and evaluation
toward understanding and building trust in next-generation autonomous robots still requires additional
attention. This is important because autonomous robots cannot be deployed safely alongside humans
without sufficient understanding of system performance, limitations, and trustworthiness of capabilities,
which necessitates experimentation. The process of constructing experiments, i.e., experimental design, is
a pivotal step in the concept-to-fielding life cycle of an autonomous robot because it dictates the amount
of information gained by the experimenter, the cost of information acquisition, and the rate of building
system understanding. Conducting experiments is challenging, though, due to complexity and context. Autonomous robots can be massively complex, multi-disciplinary systems that use artificial intelligence and
machine learning across a range of components (e.g., perception, state estimation, localization, mapping,
path planning, and control), and experiments are specific to a given scenario, system, experimenter, and
set of multi-objective metrics defined by the intended application. To assist with the adaptive, sequential
decision-making process of experimental design, a Decision Support System (DSS) can potentially augment
the human’s abilities to construct more informative, less wasteful experiments. This dissertation aims to
provide conceptual and computational foundations for DSSs in the domain of adaptive experimental design
for autonomous, off-road ground vehicles. First, I present a six-stage taxonomy of DSSs for experimental
design of ground robots, which is informed and inspired by the vast body of literature of DSS development
in domains outside of robotics. This taxonomy also serves as a roadmap to guide ongoing development of
DSSs for experimental design. Next, I develop and evaluate a Stage 1 DSS that provides design assistance
to experimenters in the form of prompts for the purposes of experimental design conceptualization and
structured thought analysis. Building on this, I propose and evaluate a Stage 2 DSS to provide proactive
decision support in the form of alerts so that low-value experimental designs might be avoided. Finally, I
lay the groundwork for a Stage 3 DSS to provide narrowly-scoped experimental design recommendations
for assisting with subsequent experiment selection. I anticipate that this work will help improve human decision-making in experimental design for real-world autonomous ground vehicles.
Chapter 1
Introduction
1.1 Motivation
Over the past several decades, the scientific community has produced an explosion of autonomous robotic systems through the development of artificial intelligence (AI), machine learning (ML), human-robot interaction (HRI), and safe learning and optimization. These scientific leaps have spanned
nearly all domains of life, including industrial, transportation, home assistive, social, medical, agricultural,
humanitarian assistance, and military applications, as illustrated in Figure 1.1. Robotic research in these
areas has benefited greatly from the massive investments by industry and government. For example,
investment in the self-driving car industry is now in the hundreds of billions of US dollars. Similarly, the
United States military has identified autonomy and AI as key enabling technologies in their Robotic and
Autonomous Systems Strategy and continues to invest funds in the research and development of new capabilities for human-agent teams. Given the expectation that many autonomous systems will be ubiquitous and work alongside humans within the next 10 years, building trustworthiness and explainability in our robotic systems is required for deployment.
(a) Autonomous Vehicles (b) Manufacturing
(c) Humanitarian Assistance/Disaster Relief (d) Healthcare
(e) Agriculture (f) Military
Figure 1.1: Future autonomous robotic systems are expected to operate alongside humans in the real world, so they require insightful testing and evaluation methods for humans to sufficiently understand
performance and limitations.
Despite these extraordinary advancements and investments, the community continues to be plagued
by insufficient and inferior robotic autonomy with respect to human performance. For example, the auto
industry has yet to advertise a level 5 autonomous car on the market after decades of investment, and robots on the battlefield are largely teleoperated by humans. Moreover, there are numerous examples of
AI/ML-based approaches that produce catastrophic results after learning unintended policies or exhibiting
prohibitive brittleness to minor domain shifts in real-world conditions. Commonly cited examples of this
in the literature include reinforcement learning-based agents that prefer doing nothing (or performing destructive tasks) rather than working toward the defined goal, as well as image classification algorithms that fail to identify
stop signs that have negligible perturbations (e.g., minor graffiti). These undesirable, and in some cases
inexplicable, characteristics lead researchers to oftentimes treat AI/ML models as black boxes and rely
on increasingly large training datasets; however, this does not facilitate trust, transparency, or deeper
understanding of autonomous robots, and is not always a feasible solution in data-scarce applications.
The scientific community has a growing focus on component- and behavior-level capabilities, but insufficient attention is paid to rigorous, human capability-enhancing research, development, testing, and evaluation (RDT&E) that could strengthen our collective understanding and trust in autonomous robots. RDT&E efforts are integral to systems engineering (SE) for realizing autonomous systems because they characterize the state-of-the-art, identify critical technology gaps, and explore the state-of-the-possible –
all of which can inform the requirement writing and SE communities. This dissertation argues that as
the level of robotic autonomy grows, so too should the sophistication of the experimental methods for
system evaluation and characterization. Investment in human-centric experimental design solutions can
significantly reduce research barriers and accelerate the development and deployment of trustworthy, explainable autonomous robots.
Figure 1.2: Current methods for experimentation can be incredibly labor intensive and require many people
for operations, networking, testing, analysis, and other logistics.
Figure 1.3: Risk to human and robot safety is oftentimes an inevitable aspect of experimentation of field
robotics that must be properly managed. For example, a robot can tip over and damage sensors, but experimenters can also learn a lot about performance and limitations by conducting experiments at the edge of
operational limits.
Collectively, the community’s current RDT&E strategy is an unsustainable model because it is labor
intensive, slow, expensive, in some cases risky with respect to human and robot safety, and increasingly
problematic as investments in AI grow. An example of a team conducting field experiments of ground
vehicles is shown in Figure 1.2 and an example of a potential outcome of experiments is shown in Figure 1.3.
Writing software, training AI models, and running experiments can require several weeks to months and
cost tens to hundreds of thousands of dollars. Ultimately, the system engineer’s ability to understand and
efficiently test learning-based systems becomes one of the primary bottlenecks in the system life cycle and
introduces significant delays to technology adoption and system deployment.
There are several key challenges and limitations that must be addressed to enable the envisioned deployment of AI-enabled autonomous robots. First, modern-day approaches to autonomous robot experimentation are outdated and too slow compared to the rate of robotic capability development and the need
for trustworthiness. The literature has historically used Design of Experiments to construct a-priori test
plans and system designers have developed Testing, Evaluation, Verification, and Validation (TEV&V) techniques for assessing solutions with respect to pre-defined requirements and assurance cases. The United
States Department of Defense has a long history of investments in TEV&V [13, 199] and the Systems Engineering Process, shown in Figure 1.4, has been demonstrated to enable system designers to build large,
complex systems. However, these methods tend to span years, involve higher technology readiness levels, investigate trivial or relatively well-understood system behaviors, incur tremendous expenses, and are
not designed for rapidly evolving robotics research. Second, the community faces a unique challenge with
next-generation autonomy because large, complex AI/ML models and emergent behaviors complicate rule-based testing, the rapid rate of autonomy development introduces technology obsolescence in the testing
cycle, and robots with higher levels of autonomy necessitate more trust from the humans they operate
alongside. Third, RDT&E is complicated by the growing presence of heterogeneous, multi-agent systems
and context-dependent, multi-objective design metrics. Finally, there exists a gap in the scope of effort,
priorities, and communication between academia and industry that, if bridged, could greatly improve our
collective RDT&E efforts. New testing methods will enable adaptive, non-prescriptive experiments that
efficiently identify scientific underpinnings of autonomous robots and evolve with the experimenter’s understanding of the robot in a “discovery learning” fashion. There are several opportunities for improving
the community’s RDT&E approaches by drawing inspiration from, adapting, and building upon advancements in AI, ML, HRI, and safe learning.
Figure 1.4: The Systems Engineering Process was designed to provide a path for improving the cost effectiveness over the course of the system lifecycle, but has been recently challenged with the prominence of
AI/ML-enabled, experimental robotics. [167]
New testing methods will enable adaptive, non-prescriptive experiments that efficiently identify scientific underpinnings of autonomous robots and evolve with the experimenter’s understanding of the robot.
Practitioners and system designers already encode knowledge in their tests using constraints, but rigid
test procedures lack the flexibility to excite and explore autonomous robots with state-of-the-art capabilities in varying contexts. Future experimental design techniques could introduce the necessary knobs for
an experimenter to perform on-the-fly constraint satisfaction, system characterization, and risk-reward
analysis that otherwise isn’t feasible. This greater understanding of research-level autonomy enabled by
advanced experimental design procedures will, in turn, have cascading effects in systems engineering and
TEV&V because robots will be better understood throughout the maturation process, accelerating the rate of realizing and deploying trustworthy autonomous systems.
1.2 Problem Statement
This dissertation aims at increasing the value, decreasing the cost, and balancing the risk associated with
experimentation of autonomous ground vehicles. Towards this, the central problem is how to construct experiments of the robotic system for the benefit of the human experimenter. Experimental design in robotics
research is inherently a human-in-the-loop, sequential decision-making process. Here, a human uses prior
knowledge and insights gained from previously conducted experiments to construct a subsequent experiment that they believe will maximize their understanding of the robot’s limitations and performance while
incurring some cost to conduct the experiment. This process is also a peculiar HRI problem where the interaction between the human and single- or multi-agent autonomous robotic system is in the form of an
experiment. The human has control over the level of autonomy and structure of the team, and the human – not the robot – is the learner with respect to the robot’s ability to perform some domain-specific
task(s) during the experiment. The selected experiment dictates the nature of information exchange and
the human’s understanding correlates to the trustworthiness of the autonomous system.
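To make this sequential decision-making loop concrete, the sketch below expresses it in simplified Python. It is purely illustrative and not an interface defined in this dissertation; the helpers design_next_experiment and run_experiment are hypothetical stand-ins for choices the human experimenter (possibly aided by a DSS) would make and for the costly act of conducting a field or simulation experiment.

```python
import random

# Illustrative sketch only: all names are hypothetical placeholders, and the
# human remains the decision maker throughout the loop.

def design_next_experiment(history):
    # Stand-in policy: try an untested configuration first, otherwise revisit one.
    candidates = ["baseline", "baseline + trail detection", "baseline + terrain classification"]
    tested = {design for design, _ in history}
    untested = [c for c in candidates if c not in tested]
    return untested[0] if untested else random.choice(candidates)

def run_experiment(design):
    # Stand-in for conducting the experiment: returns an observed outcome and a cost.
    outcome = {"design": design, "mission_success": random.random() > 0.3}
    return outcome, 1.0

budget = 3.0   # total cost the experimenter is willing to spend
history = []   # previously conducted experiments and their outcomes

while budget > 0:
    design = design_next_experiment(history)   # human-in-the-loop decision
    outcome, cost = run_experiment(design)     # information gained at some cost
    history.append((design, outcome))          # insights inform the next design
    budget -= cost

for design, outcome in history:
    print(design, "->", "success" if outcome["mission_success"] else "failure")
```

The essential point the sketch conveys is that each iteration trades experiment cost against the information the experimenter expects to gain, and that the design policy itself evolves as outcomes accumulate.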
Figure 1.5: The software architecture diagram of the ARL Ground Autonomy Software Stack. Autonomous
ground vehicles require an end-to-end navigation stack, such as this one, that takes sensor input and produces motor commands. The focus of this dissertation is to investigate decision support systems that augment the human’s decision-making abilities when designing experiments for autonomous ground robots
and navigation stacks.
Figure 1.6: An example of the Robot Operating System graph containing nodes and topics of the ARL
Ground Autonomy Software Stack shown in Figure 1.5. The node and topic labels are intentionally unreadable to capture the vast complexity of the overall system.
Naturally in this problem statement there are two key challenges: complexity and context. Field robots
operating in unstructured environments alongside humans are inherently complex, multi-disciplinary systems that prohibit a-priori experiment planning. Oftentimes these systems have tens of different applicable
sensors, software architectures for navigation that contain thousands of modular nodes and dynamically
reconfigurable parameters, and ever-evolving AI/ML components to enable learning, which have black-box interactions and cascading effects. An example of such a complex software architecture is the ARL Ground Autonomy Software Stack, which is shown in Figures 1.5 and 1.6.
Similarly, the design, testing, and application of field robots are dependent on the context in which they will be used. We use the definition of context from [51], which states “Context is any information that can be used to characterise the situation of an entity. An entity is a person, place, or object that is considered relevant to the interaction between a user and an application, including the user and applications themselves.” In our case, the entity is the autonomous field robot, the user is the experimenter, the application
is the anticipated use for the robot, and context is relevant information. For example, the software stack
of a field robot shown in Figure 1.5 could be used for a variety of different applications, such as the ones
shown in Figure 1.7, with slight modifications to algorithms, parameters, or AI models within the modular
architecture. As a result, experimental design is scenario-, system-, and experimenter-specific. The scenarios for evaluation reveal information about system performance, the specific system under test dictates
what experiments are feasible, and experimenters inherently have varying levels of expertise and trust
with the robotic system based on experience. Furthermore, the testing scenario and intended deployment
application dictate the metrics for evaluation, which oftentimes leads to multi-objective criteria.
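As a simplified illustration of why experimental design is scenario-, system-, and experimenter-specific, one could imagine bundling this information into a single design record, as sketched below. The field names and values are hypothetical and far coarser than a real configuration of the ARL Ground Autonomy Software Stack; they only show that a design couples context (the scenario), a system configuration (hardware, software, parameters), and application-driven, multi-objective metrics.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical, heavily simplified record of a single experimental design.
# A real design over the ARL stack spans far more components and parameters.
@dataclass
class ExperimentalDesign:
    scenario: str                 # context: environment, mission, and conditions
    hardware: List[str]           # selected sensors and platform components
    software: List[str]           # selected autonomy stack components
    parameters: Dict[str, float]  # dynamically reconfigurable parameter values
    metrics: List[str]            # multi-objective evaluation criteria

design = ExperimentalDesign(
    scenario="forest waypoint mission, light rain",
    hardware=["3D lidar", "IMU", "wheel encoders"],
    software=["SLAM", "global planner", "local planner", "controller"],
    parameters={"max_velocity": 1.5, "unknown_cell_cost": 50.0},
    metrics=["completion time", "operator interventions", "qualitative score"],
)
print(f"{design.scenario}: {len(design.software)} software components selected")
```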
We approach RDT&E of autonomous ground robots from the perspective of human-AI teaming and
take inspiration from “discovery learning” [223], where the human experimenters, e.g., researchers and
practitioners, learn about the behaviors and capabilities of the autonomous robot by collaborating with
AI-enabled tools to augment and boost human decision making during RDT&E. Specifically, our objectives
are to develop novel Decision Support Systems (DSSs) to provide cognitive assistance and decision-aiding
tools that catalyze scientific discovery and knowledge acquisition of autonomous robots while balancing
risk.
(a) Shortest Path
(b) Structured Terrain (c) Safe Slope Traversal
(d) Formation and Maneuver (e) Tactics
Figure 1.7: Context is a defining challenge in experimentation of robotics. Take, for example, (a) the traditional academic problem of shortest path motion planning. In real-world applications, a variant of this
problem could be (b) to plan paths with respect to key, structured terrain or (c) with respect to kinodynamic constraints to ensure vehicle roll-over safety. Either of these variants could additionally include (d)
formations and high-level maneuvers or (e) adversarial aspects that alter the resulting paths and subsequent evaluation metrics.
1.3 Scope
The scope of this dissertation is to investigate DSSs for the purposes of augmenting the human’s decision-making abilities in the experimental design of autonomous ground robots. In the concept-to-fielding life
cycle, shown in Figure 1.8, the work in this dissertation sits squarely in the middle at system development.
Some requirements and concepts may have been determined for a desired application or domain but the
system under test is not mature enough to warrant validation, verification, or deployment yet. Instead,
system development is pursued through RDT&E of the autonomous ground robots, which necessitates
extensive experimentation.
Figure 1.8: The scope of this dissertation lies in system development where significant experimentation is
required to develop an understanding of autonomous ground robots.
The specific field robots within scope of this work are Ackermann and differential drive ground vehicles, such as the ones shown in Figure 1.9, that possess autonomous navigation capabilities. In this
dissertation specifically, the ARL Ground Autonomy Software Stack, shown in Figure 1.5, is used as a
representative navigation architecture that provides opportunities for meaningful RDT&E.
It is important to note that the work in this dissertation seeks to investigate new AI-enabled approaches
for DSSs to assist with the experimental design of autonomous ground robots, which may also use AI.
Proposing new AI techniques coupled with DSSs that reveal or assist humans with important experimental
design aspects that might not otherwise be considered is squarely within scope of this work. Developing
the AI techniques used in the navigation capabilities of the autonomous ground robots is not within scope
of this dissertation and is assumed to be fixed. In other words, experiments of the autonomous ground
robot can be constructed by changing the various perception, Simultaneous Localization and Mapping
(SLAM), planning, and control components of the autonomy stack, and any of these may or may not use
AI; however, modifying or developing new AI-based approaches for individual stack components is not
the focus of this dissertation.
1.4 Organization
This dissertation first introduces decision support systems in the context of field robotic experimental design and continues with the exploration of increasingly mature instantiations of different DSSs. First, related
works are presented in Chapter 2 to provide the reader with context regarding the research community.
Chapter 3 then presents a proposed taxonomy for DSSs aimed at experimental design for autonomous
ground vehicles that builds on existing DSS work from domains outside of robotics. In that chapter, the
concept of operations, requirements, and responsibilities for the human and DSS are discussed. The first,
most basic form of a DSS, referred to as Stage 1, is presented and evaluated in Chapter 4. A Stage 2 DSS
and Stage 3 DSS are then presented in Chapters 5 and 6, respectively. In each case, greater functionality
is added to the DSS instantiation to proactively assist the experimenter with constructing experiments.
The defining feature of the Stage 2 DSS is alerts issued to the experimenter before an experiment is conducted, and the defining feature of the Stage 3 DSS is a recommendation for the subsequent experiment built on Bayesian Optimization. This dissertation concludes with a discussion of the contributions and
future directions in Chapter 7.
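Because the Stage 3 recommendation in Chapter 6 is built on Bayesian Optimization, the sketch below shows the generic form of that idea: fit a Gaussian process surrogate to scores from past experiments and recommend the candidate design that maximizes an acquisition function (here, Expected Improvement). This is a minimal illustration using scikit-learn, not the formulation developed in Chapter 6; the two-dimensional design encoding, the example scores, and the acquisition choice are all assumptions made for the example.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Hypothetical encoding: each past experiment is a 2-D parameter vector
# (e.g., normalized velocity limit and unknown-cell cost) with a scalar score
# summarizing the experimenter's multi-objective assessment of the outcome.
past_designs = np.array([[0.2, 0.8], [0.5, 0.5], [0.9, 0.1]])
past_scores = np.array([0.4, 0.7, 0.3])

# Fit a Gaussian process surrogate of score as a function of the encoded design.
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
gp.fit(past_designs, past_scores)

# Expected Improvement over randomly sampled candidate designs.
candidates = np.random.default_rng(0).uniform(size=(200, 2))
mu, sigma = gp.predict(candidates, return_std=True)
sigma = np.maximum(sigma, 1e-9)
improvement = mu - past_scores.max()
z = improvement / sigma
expected_improvement = improvement * norm.cdf(z) + sigma * norm.pdf(z)

# The DSS would surface the maximizer as a recommendation; the human decides.
recommended = candidates[np.argmax(expected_improvement)]
print("Recommended next design (encoded):", recommended)
```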
(a) Clearpath Jackal (b) Clearpath Husky
(c) iRobot PackBot (d) Clearpath Warthog
(e) HDT Wolf (f) Polaris RZR
Figure 1.9: Examples of off-road-capable ground vehicles. The scope of this dissertation lies strictly within
experimental design for autonomous ground field robots, such as these.
1.5 Publication Note
Portions of this dissertation were published in peer-reviewed venues. Specifically, the terminology and
taxonomy in Chapter 3 appear in [85], the Stage 1 DSS and evaluation described in Chapter 4 appear in
[89], and the mathematical formulation presented in Chapter 6 appears in [88].
I completed a number of research projects and collaborative efforts during my dissertation studies
that are not explicitly discussed here, but revealed the ubiquity of experimental design of autonomous
robots and informed my pursuit of DSSs for testing ground autonomy. These include my publications on
autonomous navigation capabilities [241, 190, 90, 256, 38, 41, 257, 44], multi-robot teaming [83, 84, 109,
107, 86], human-robot interaction using augmented reality, virtual reality, and gesture control [87, 184, 49,
186, 185, 183], and alert-generation frameworks for robot tasking [110, 106, 105, 104, 108]. Examples of
each of these are shown in Figures 1.10-1.13, respectively.
(a) This figure was borrowed from [190].
(b) This figure was borrowed from [38].
Figure 1.10: My previous work on autonomous navigation capabilities, such as SLAM, path planning, and control, has informed the evaluation of individual autonomy components in this work.
(a) This figure was borrowed from [107].
(b) This figure was borrowed from [109].
Figure 1.11: My previous work on multi-robot teaming, such as information exchange through rendezvous,
robot rescue strategies, and multi-robot task allocation informed how larger, complex systems can be
evaluated.
(a) This figure was borrowed from [183].
(b) This figure was borrowed from [49].
(c) This figure was borrowed from [184].
Figure 1.12: My previous work on human-robot interaction techniques, including augmented reality, robot-based change detection, virtual reality, and gesture control provided insights for how experimenters interact with robotic systems and possible avenues for information flow.
(a) This figure was borrowed from [105].
(b) This figure was borrowed from [105].
Figure 1.13: My previous work on alert generation frameworks for human-supervised robot teams informed approaches in this work for proactively notifying decision makers.
Chapter 2
Literature Review
2.1 Overview
This chapter provides an overview of the literature as it pertains to experimental design for robotic systems
to establish scientific community context for this dissertation. First, an introduction to Cyber Physical Systems is presented as a broad category of robotic systems, and the subcategory of field robots is described since these systems are the focus of this dissertation. Next, the field of Model-Based Systems Engineering is outlined to explain techniques and tools that are used during a complex system “life cycle”, which includes
everything from conceptualization through deployment and maintenance in the real world. As part of
the robotic system life cycle, experimentation and ways to build system understanding play integral roles
that enable technology maturation and adoption. An in-depth presentation of the challenges and existing
approaches of both experimental design and system understanding is provided as the basis for existing
works that this dissertation will build from. Over the past several decades, new fields of research have been
created specifically to assist humans with decision making, which is relevant because experimental design
is a sequential decision making problem. Expert Systems and Decision Support Systems are two such
examples of decision making tools, and both are described in considerable detail since Decision Support
Systems are the key technology investigated in this dissertation. To help differentiate the paradigm of
human-supervised experimental design from user intervention to dynamically regulate system behavior, a
section on shared autonomy is provided. Finally, two promising areas that are expected to be relevant
in experimental design are presented. An explanation of Active Learning literature is given from the
perspective of assisting humans with decision making toward the goal of learning about robotic systems,
and Generative AI is highlighted given its enormous success and anticipated impact on experimental design
recommendation.
2.2 Cyber Physical Systems
A Cyber Physical System (CPS) is defined in the literature as a new generation of systems with integrated
computational and physical capabilities that can interact with humans through many modalities [19]. The
research community has contributed significant advancements in CPSs, especially in terms of system level
design [197] and software development [20]. Generally speaking, researchers in this field seek to develop
tools and frameworks for system engineers and designers to manage the complexity associated with CPSs.
This includes technologies such as hierarchical design space exploration of heterogeneous embedded systems using symbolic search and multi-granular simulation [156], a framework for component-based modeling using an abstract layered model based on behavior and interaction models [81], approaches for system
engineering of human-machine systems [25], and composition theory for stability of heterogeneous systems [229]. Since CPS design is currently so time-consuming and challenging, recent efforts have focused
on new AI-based methods for symbiotic, correct-by-construction design of CPS where automated systems
seek to predict CPS properties with high fidelity prior to construction as a way to eliminate the need for
costly design-test-redesign cycles [47]. To achieve new advancements in CPS design, researchers have
looked to improve modeling and simulation techniques, including model- and data-driven tools [261],
virtual prototyping for integration [164], new tool suites such as OpenMETA [22, 228], and simulation
testbeds for security and resilience [163]. In addition to modeling and simulation, the research community
has also sought formal analysis methods for the design of CPS which can increase the speed at which systems are designed while respecting requirements and specifications [238, 69, 131, 206]. Similar to existing
CPS works, my dissertation also seeks to reduce the costs of system development; however, this
work is explicitly focused on experimental design for improving human understanding in real-world testing, as opposed to system design, modeling, and simulation, to address some of the monumental challenges
of field robotics.
2.3 Field Robots
An area related to CPS is field robots, which consist of machines that work in harsh, unstructured environments, spanning virtually all environmental domains (i.e., land, sea, air, space) and in complex applications (e.g., mining, agriculture, underwater exploration, highways, planetary exploration, coastal surveillance and rescue) [236]. Existing examples of field robotics include inspection robots [144], information
gathering for humanitarian efforts [143, 83], off-road autonomy [119], and agricultural robots [23]. Once
sufficiently mature, field robots are expected to have immense impact on real-world missions, including
military [30] and human assistance and disaster relief operations [152]. However, because the operational
environments are so challenging and limit the operational speed of autonomous systems, field robots are
not yet commonplace; instead, humans perform the dirty and dangerous tasks for which autonomous systems would be useful. The research community has identified the “grand challenge” in
field robotics as creating smart, reliable mobile machines that can move more capably than equivalent
manned machines in unstructured environments, which remains an open challenge even after decades
of research [236]. Within this grand challenge, the community is collectively investigating perception
challenges related to terrain understanding, human interfaces for working cooperatively with human
supervisors, behaviors for working safely alongside humans, new mobility modalities for traversing complex terrain, resilient and adaptive learning capabilities, and multi-agent cooperation.
My dissertation is focused on improving adaptive experimental design of field robots. While I aspire to
develop domain-agnostic insights useful to the scientific community at large, my work is centered around
adaptive experimental design specifically for off-road, autonomous ground vehicles in field robotic applications.
2.4 Model Based Systems Engineering
Systems engineering (SE) is a critical, interdisciplinary field that studies how to design, integrate, and
manage complex systems over the “life cycle”, i.e., all phases from conceptualization through development,
distribution, maintenance, retirement, phase-out, and disposal [28, 137, 154]. More recently, the field of
Model Based Systems Engineering (MBSE) has emerged as a way to use representations, i.e., models, as an
integral part of the systems development process spanning the life cycle [60]. One of the fundamental steps
within the MBSE process of CPS and field robots is proper verification and validation to ensure capabilities
and behaviors are expected as the designer intended. Verification is defined as confirmation, through the
provision of objective evidence, that specified requirements have been fulfilled and validation is defined
as confirmation, through the provision of objective evidence, that the requirements for a specific intended
use or application have been fulfilled [113]. Processes for both of these have a long history in the SE and
research communities, dating back to the 1980s with software development [248]. Best practices for these,
especially validation of simulation models [132], have been developed and they largely revolve around
precise problem formulation, assumption definition, subject matter expert utilization, and communication
with the decision maker.
The field has grown to include expert systems [165], simulation models [198], and physical systems,
which has led to the name testing, evaluation, verification, and validation (TEV&V). Within TEV&V, research typically focuses on assessing a system’s ability to meet certain pre-defined requirements or safety-critical specifications, e.g., compliance with respect to regulations [221]. From a systems engineering
perspective, the TEV&V community has extensively studied different methodologies for designing and assessing systems [95]. Perhaps most notably, the SE community developed the “Vee” model [191], which has
demonstrated significant success and maturation over several decades [149]. Similarly, the MBSE community has developed the System Modeling Language, known as SysML [93], that provides a general-purpose
system architecture modeling language for Systems Engineering applications [234]. However, the value
and benefits of MBSE reported in literature to date are mainly based on expectation and evidence remains
inconclusive [100]. While some efforts in SE focus specifically on AI-enabled systems [133], traditional
TEV&V approaches do not yet address AI/ML-enabled field robots, such that there exist many open problems
due to system complexity and human-system interaction [175]. In fact, the authors of [37] conducted an
extensive systematic mapping study of model-driven engineering for mobile robot systems and found that
there exists a strong need for better validation to more accurately identify system limits, scope, applicability, and characteristics. The authors also noted that sound and rigorous empirical experiments are not yet
widely carried out by the scientific community. Specifically for the Department of Defense and relevant
AI-enabled systems, the authors of [187] found that testing and evaluation still requires work in six areas:
user engagement, policy and guidance, measures and metrics, data, infrastructure, and cybersecurity.
A critical concern of the TEV&V community is assurance case construction and adherence, where an
assurance case is generally understood to be a way to determine if a system is sufficiently dependable
for fielding within an operational context. Several efforts have looked at assurance case automation [48],
assurance case-driven design for safety and security requirements [219], and run-time assurance [216].
A similar focal point of the TEV&V community is scenario-based approaches for testing and evaluation.
Here, literature looks to develop tools for generating scenarios that will allow system designers to assess
whether requirements are met [160, 159]. As a direct result of tremendous investments by industry, autonomous vehicle (AV) research has attracted the attention of the TEV&V community and precipitated
numerous advancements in software verification with respect to safety [180], testing methods [103], testing frameworks [61], modeling and simulation approaches [8], and automated scenario-based approaches
for testing safety metrics and vehicle behaviors [53, 70, 111]. With these contributions, the research community has also documented the vast challenges in TEV&V of autonomous vehicles [99, 125], which continue
to direct ongoing research efforts.
A key concept originating from the AV community is Operational Design Domain (ODD) [135, 235],
which is defined by SAE J3016 as “Operating conditions under which a given driving automation system or feature thereof is specifically designed to function, including, but not limited to, environmental,
geographical, and time-of-day restrictions, and/or the requisite presence or absence of certain traffic or
roadway characteristics” [195]. Following this definition, the concept of an Operational Design Condition (ODC) was
introduced, which consists of 1) an ODD; 2) the subject vehicle capabilities; and 3) the driver capabilities.
ODDs and ODCs provide a common framework for defining the specific conditions against which requirements and
test cases can be developed to scope the design of AVs. Also within the AV community is the notion of
informed safety, where the user is aware of what a system can and cannot do. A recent 56-participant study demonstrated
that trust in an automated system increased from 32.4% to 65.4% when users
were provided knowledge about the true capabilities and limitations of the automated system compared
to when no knowledge was provided [120].
While TEV&V offers many insightful considerations and techniques, this field considers systems that
are at higher technology readiness levels than what is in scope within the context of my dissertation.
Instead of verifying and validating mature system designs, or assisting an engineer with assessing systems
with respect to long-standing community standards and requirements, I seek to improve an experimenter’s
ability to learn about field robotic systems that may be experimental or only have notional, mission-specific
requirements. My work involves experimental design of a system for which an initial design has been
made, but before the system has matured to a technology readiness level high enough for formal TEV&V. With
that being said, some concepts from the AV community, such as ODDs and ODCs, may provide guiding
principles for designing new tools and frameworks for decision-assisting experimental design.
The most relevant parallel between traditional TEV&V and the work herein is the goal of informing
a human about system capabilities with respect to some pre-defined objectives. The authors of [35] identify
several opportunities in the automated vehicle testing field and some of these could take an analogous
form in experimental design of field robotics, including building data-efficient methodologies for identifying known and unknown scenarios, generating diverse, critical, and natural scenarios to meet different
requirements, building performant surrogate models, and designing metrics that objectively quantify criticality of scenarios.
2.5 Experimental Design
Testing robotic systems takes many forms, including field testing, logging and playback, simulation, plan-based testing, compliance testing, unit testing, performance testing, hardware testing, robustness testing,
regression testing, continuous integration, and test maintenance [3]. These are difficult, however, and the
authors of [3] found in a 12-participant user study that robotic testing challenges fall into three themes:
real-world complexities, community and standards, and component integration. Specifically, experienced
roboticists face challenges with respect to environmental complexity, software and hardware integration,
engineering complexity, high costs, unpredictable corner cases, lack of documentation, lack of an oracle,
coordination and collaboration among relevant people, the culture of testing, and distrust of simulation.
Among the challenges that all real-world robotics testing faces, the inevitably unforgiving operational
environment is a defining characteristic that especially torments field robotics applications. Not only are
these environments challenging for autonomous systems to operate within, but they are notoriously difficult to model, which introduces the large gap between simulation environments and reality that the research community acknowledges as a major challenge [4]. As such, field robotics development necessitates
real-world experimentation. Historically, experimentation in many domains is performed using Design of
Experiments [158, 42, 43, 123] and Optimal Experimental Design (OED) [39, 62, 263]. While these approaches provide principled methodologies rooted in statistics for the selection of experiments, such as
factorial designs [31], response surface models [16], and optimality criteria [102, 77], a priori experiment
plans have limited value in field robotics. The large systems that are common in autonomous robots
are composed of interdependent components with labyrinthine interactions that prohibit high-fidelity
modeling, closed-form solutions for performance, one-shot optimization, or combinatorial search over
the parameter space. Literature has also investigated sequential OED [213] as a way to address the need
for making experimental design decisions after obtaining some observations. However, experimentation
and system performance in many field robotics applications involve context-dependent, qualitative analysis, such that there is rarely a single configuration that is optimal under all conditions; instead, a configuration
is typically locally optimal and produces desirable results under representative conditions, especially for
robots operating alongside humans. AI/ML-enabled robotic systems also complicate the commonly-used
rule-based techniques and introduce emergent behaviors that make it difficult to write the testing requirements needed for TEV&V. Furthermore, the nature of field robotics applications inherently introduces risk
to experimentation and deployment in the real world, which must be properly managed and necessitates
human-in-the-loop decision making. My dissertation aims to build assistive tools for human-in-the-loop
decision making in adaptive experimental design because it is an underdeveloped but highly impactful
area for the realization of deployable field robotic systems.
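To make the sequential paradigm described above concrete, the following is a minimal, illustrative sketch of a sequential experimental design loop in the spirit of sequential OED, not the approach developed in this dissertation. It assumes a single tunable parameter and a scalar performance metric; the surrogate is a NumPy-only Gaussian process, the next experiment is the candidate with the largest predictive variance, and run_experiment is a hypothetical stand-in for conducting a field trial. In practice, a human experimenter would remain in the loop to accept, modify, or reject each proposed experiment.

# A minimal sketch of sequential experimental design over one tunable
# parameter (e.g., a replanning rate), assuming a scalar metric can be
# measured per experiment. The parameter, metric, and run_experiment
# function are invented for illustration only.
import numpy as np

def rbf_kernel(a, b, length=1.0, var=1.0):
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-3):
    # Standard Gaussian process regression with an RBF kernel.
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf_kernel(x_train, x_query)
    Kss = rbf_kernel(x_query, x_query)
    alpha = np.linalg.solve(K, y_train)
    mean = Ks.T @ alpha
    cov = Kss - Ks.T @ np.linalg.solve(K, Ks)
    return mean, np.diag(cov)

def run_experiment(param):
    # Placeholder for a real field trial; returns a noisy metric.
    return np.sin(3.0 * param) + 0.1 * np.random.randn()

candidates = np.linspace(0.0, 2.0, 50)       # feasible parameter settings
x_obs = np.array([0.2, 1.8])                 # initial experiments
y_obs = np.array([run_experiment(p) for p in x_obs])

for _ in range(5):                           # budget of five more experiments
    mean, var = gp_posterior(x_obs, y_obs, candidates)
    next_param = candidates[np.argmax(var)]  # most uncertain candidate
    x_obs = np.append(x_obs, next_param)
    y_obs = np.append(y_obs, run_experiment(next_param))
    print(f"ran experiment at {next_param:.2f}")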
2.6 System Understanding
The fundamental objective of experimental design is to construct experiments that will reveal information
about a system to increase a human’s understanding of performance, capabilities, and limitations. The
research community has considered how to build a principled understanding of systems from various perspectives, including understanding the effect of learning-enabled components with respect to the failure
of an entire CPS [55], understanding the impact of heterogeneity on the composition and compositionality
properties of models and tools in CPSs [227], finding Pareto-optimal, human-understandable interpretations of machine learning black-box models [239], learning-enabled machines to improve scientific methods
[174], and causal graphical models for robot understanding. In this vein, the research community has also
investigated tools to assist humans with designing, testing, benchmarking, and understanding systems,
including real-time analysis tools of model training for novices [205], prediction tools of mission-based
reliability in component-based systems [161], and indoor scene generation and synthesis frameworks for
generating large-scale simulation environments in embodied AI research [262]. One prominent example
of a test-generation tool is Klee [34], which is a symbolic execution tool that generates high-coverage tests
and detects whether any input exists that could cause an error, where coverage is defined as exercising every line of
executable code.
There have also been several efforts for developing tools and guidelines to assist with the development
and testing of robotic software. For example, the authors of [145] developed evidence-based architectural
guidelines for ROS-based robotic software by surveying developers and mining repositories to characterize
the state of the practice. While many repositories do not document software architectures, those that
do are primarily concerned with maintainability, performance, and reliability. In terms of tools, Robot
Web Tools is a collection of open-source libraries and tools for building web-based robot apps with ROS
[240]. This tool provides a rosbridge protocol for enabling use over wide area networks and human-robot
interaction through web browsers. Similarly, Roboturk is a crowdsourcing platform that enables imitation-guided skill learning for manipulation tasks via data collection over the web [147]. There also exist tools
for automatically setting up, starting, resuming, and replicating user-defined experiments to help manage
the burden of robotics software [226]. Literature has also investigated the use of Signal Temporal Logic to
assist with automated testing of robotic controllers with respect to task specifications and limited budgets
[112]. Methods like this one have demonstrated the ability to leverage task specification information to
help build a model of system robustness when performing tests of specific robotic components, such as
robotic controllers; however, temporal logic-based techniques for system validation have been shown to
lack inherent human interpretability, and the formal methods community tends to be overconfident in its
estimation of interpretability [218].
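For reference, the snippet below is a minimal sketch, not drawn from [112], of the robustness idea that underlies such STL-based testing: the margin of an "always" requirement over a logged run, where a negative value flags a violating test. The signals and thresholds are invented for illustration.

# A minimal sketch of Signal Temporal Logic-style robustness for a logged
# run, assuming a requirement of the form "always, speed stays below v_max
# and tracking error stays below e_max". Robustness of "always" is the
# minimum margin over time; a negative value flags a violating test.
# The signal values below are made up for illustration.

def robustness_always_below(signal, threshold):
    # rho(G (x < threshold)) = min over time of (threshold - x_t)
    return min(threshold - x for x in signal)

def robustness_and(rho_a, rho_b):
    # Conjunction takes the weaker (smaller) margin.
    return min(rho_a, rho_b)

speed = [0.8, 1.2, 1.9, 1.4, 0.9]           # m/s, logged per timestep
tracking_error = [0.05, 0.10, 0.30, 0.12]   # m, logged per timestep

rho = robustness_and(
    robustness_always_below(speed, threshold=2.0),
    robustness_always_below(tracking_error, threshold=0.25),
)
print("robustness:", rho, "->", "satisfied" if rho > 0 else "violated")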
Underscoring how difficult the problem of system design and understanding is, DARPA has proposed
two separate programs looking at related challenges. First, the DARPA Symbiotic Design for CPS program
[46] sought to develop AI-based approaches to enable correct-by-construction design of military-relevant
CPS. The focus within this program was specifically on predictability, convergence, and exploration of
state space to enable humans to build systems more efficiently. Second, the DARPA Competency Aware
ML program [45] aimed to enable learning systems to be aware of their own competency, which would
improve human-machine teaming and task synergies.
Although the current research in system understanding does not address the specific challenges of
field robotics or adaptive experimental design, many of the underlying goals and techniques are relevant
to my work. More specifically, I explore how experimenters build interpretations and mental models of the
performance of field robotic systems, which can be black box models. I investigate reactive and proactive
methods for assisting the experimenter with the decision making process in experimental design in an
effort toward increasing the value of selected experiments.
2.7 Expert Systems
An expert system (ES) is a computer program that reasons with domain-specific knowledge that is symbolic
and mathematical to perform tasks as well as human specialists in a problem area [33]. The goal of an ES
is to mimic the decision-making abilities of a human expert using AI, specifically knowledge engineering.
Expert systems are called knowledge-based because their performance is dependent on the use of facts and
heuristics used by experts [222] and, as a result, ESs fall under the broader umbrella of knowledge-based
systems, which are computer architectures that use a knowledge base to solve complex problems.
ESs are typically comprised of four major components: a knowledge base, an inference engine, a
knowledge engineering tool, and a user interface [136]. The knowledge base is created by aggregating
and representing human expert knowledge (i.e., facts and experiences) using a knowledge engineering
tool so that the inference engine can reason and derive new knowledge to answer questions when queried
via the user interface. To achieve this, these systems use three different reasoning paradigms: forward,
backward, and opportunistic reasoning [33]. Forward reasoning is a data-directed approach and consists
of a forward-chaining process that uses a collection of facts to draw feasible conclusions or predictions;
whereas backward reasoning starts with a goal (e.g., hypothesis), does not require all data to be available when inferences begin, and chains subgoals in a backward fashion to draw conclusions about why
something happened. Opportunistic reasoning combines elements of both forward and backward reasoning where emerging data enables new inferences and new conclusions lead to additional questions [33].
Importantly, multiple experts can contribute to the ES knowledge base and multiple ESs can be run simultaneously for a given problem, so the breadth of expertise offered to a user is greater, and less biased,
than that of any one expert. It then follows that knowledge acquisition is considered to be the most common
disadvantage and biggest challenge to building ESs.
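For intuition about the data-directed paradigm described above, the following is a generic, minimal sketch of forward chaining; the facts and rules are invented placeholders rather than an actual expert system knowledge base.

# A minimal sketch of forward chaining: rules fire whenever all of their
# premises are present in the working set of facts, and newly derived
# facts may enable further rules. The rules and facts are illustrative
# placeholders only.
rules = [
    ({"lidar_degraded", "heavy_rain"}, "perception_unreliable"),
    ({"perception_unreliable"}, "reduce_max_speed"),
    ({"gps_denied", "no_visual_landmarks"}, "localization_unreliable"),
]

facts = {"lidar_degraded", "heavy_rain"}

changed = True
while changed:
    changed = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)     # derive new knowledge
            changed = True

print(sorted(facts))
# -> includes 'perception_unreliable' and 'reduce_max_speed'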
ESs were among the earliest branches of AI to be commercialized [76] with the first system, called
DENDRAL, originating in the 1960s [139], and others, such as MYCIN, XCON, and the Dipmeter Advisor System,
following in the 1970s and 1980s [33]. With the maturation of ESs, the field has been applied to virtually all domains of decision making, including agriculture, business, chemistry, communications, education, engineering, geology, image processing, information management, law, manufacturing, mathematics,
medicine, meteorology, military, power systems, space technology, and transportation, among others [56].
ESs have been applied to so many domains because they specifically target solving tasks that humans must
complete, including control, design, diagnosis, instruction, interpretation, monitoring, planning, prediction, prescription, selection, and simulation [56]. The vast exploration of ESs has been fueled by three
noteworthy advantages that ESs have been attributed with: 1) managing complexity through incremental
competence building; 2) offering interpretation and explanation by examining the entire state of the system
and its environment; and 3) reasoning about specialized, problem-specific knowledge [33]. Furthermore,
literature has identified declarative knowledge, i.e., explicit facts, as the key insight and representational
principle from the AI field to enable interpretation and reasoning in complex problems [33].
Following reported successes in scientific research, ESs have seen waves of commercial success, the
first of which came in the early and mid-1980s and declined to significant disuse or abandonment from 1987
to 1992 [76]. The author of [76] notes that ESs are built either from a technical perspective, focusing on
the technological issues, or from an organizational perspective, relating to deployment issues. Interestingly,
literature found that ES decline was not always attributable to technical or economic failure, but rather to
managerial issues, such as low user acceptance, system developer retention, and dynamic organizational
priorities [76]. Despite this decline, ESs effectively laid the groundwork for, and transformed into, rule-based
systems that saw great success in the early 2000s and remain an active area of technology development
today, such as [266].
In the past three decades, expert systems have been applied to the field of robotics in various ways
[217]. Many of these ES applications have involved control of robotic systems, including manipulation
and swarming robots, and have also incorporated Multidimensional Informational Variable Adaptive Reality (Mivar)-based
ESs [245]. There have also been unique applications, such as using ESs in librarian robots to provide services for librarians and information resources specialists [15]. Using ESs for
the purposes of experimental design of field robots, however, has received little attention. In this dissertation I specifically seek to build systems to support the experimenter’s decision making process. It’s worth
noting that designing an ES for experimental design might be feasible, but it will inevitably be difficult
and limited due to the experimental nature of the desired application. A necessary component of an ES is
a knowledge base built by human experts and the objective of experimental design is to construct experiments that will reveal information about a robotic system. Experts working with a specific robotic system
could encode their existing knowledge regarding both the system under test as well as the experiment
best practices, but they will need to regularly update this knowledge base after new information is gained
from one or more experiments. It remains an open question as to how to build this knowledge base for an
experimental robotic system where very little may be known about the system performance, limitations,
and operational environment, including by experts, since the robotic system is not necessarily mature or
well tested.
2.8 Decision Support Systems
Research in Decision Support Systems (DSSs) dates back to the 1970s [9, 118, 214], originating in managerial decision support [140, 78], but has since spanned a vast range of other domains [171]. Researchers have
documented the benefits offered by a system capable of assisting a human with decision making in many
domains outside of robotics, including marketing, government, and agriculture [57, 59, 58, 212]. As a result, DSSs have been explored in a growing number of applications such as business and managerial [177],
medical [200], architecture [74, 10], warehouse management [1], and housing evaluation [162]. DSSs have also
been applied to domains spanning forestry management [250], human-robot interaction [24], human-in-the-loop planning [207], and path planning and collision avoidance using optimization [134] and settings
with multiple criteria [173]. Specific to robotics, there exist examples of DSSs in robot design [194, 98] and
deployed systems [172, 215], but these don’t apply to an experimenter’s decision-making process during
the experimental design of a field-worthy robotic system comprised of hardware, software, and parameter
selections. Given the wide applicability and documented successes in literature [58, 212], but noticeable
absence of mature DSSs in the area of field robotics, I seek to develop a DSS for adaptive experimental
design, specifically for off-road, autonomous vehicles and the unique corresponding challenges.
Most often, DSSs operate with respect to a domain- or task-specific decision-making cycle. For example, the work of [176] formalizes the stages of diagnostic and therapeutic measures in medical DSSs.
Medical professionals can receive diagnostic information from DSSs in the form of prompts and alerts.
These types of DSSs are being used widely and have reached acceptable performance for several medical
conditions. Future DSSs in the medical domain will seek to provide support for more challenging tasks,
such as therapeutic measures in the form of treatment and rehabilitation recommendations.
DSSs also operate with respect to the time horizon in which decisions must be planned or made. The
authors of [259] survey DSSs for agriculture 4.0 and describe short-term planning (day-to-day activity),
mid-term planning (seasonal), and long-term planning (yearly) for decision making. Their work highlights that current agricultural DSSs mainly focus on short-term planning and lack consideration of
mid-term and long-term planning. DSSs that account for longer-term planning pose more technological
challenges, but can provide more informed assistance towards better decision making.
In the literature, there are several forms of assistance that DSSs typically provide to the decision maker.
One type of decision support can be viewed as information aiding decision making, and examples of this
include alarms, “what-if” analysis, and anomaly monitoring and detection [243, 117, 17, 249, 215, 148, 105,
104]. These DSSs provide some data to assist in the decision making process, which can result in improved
decision making; however, much of the burden for decision making still lies with the human.
A second, more sophisticated type of support is suggestion aiding decision making, which typically
takes the form of a DSS directly recommending the best or optimal decision or decision parameters in different
applications [196, 75, 94, 142, 242, 176, 193]. DSSs that provide this type of assistance alleviate some of
the decision-making burden by proactively suggesting decisions. There are also works in literature that
provide a combination of both information and suggestions for aiding decision making [246, 79, 237],
which can provide greater flexibility.
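As a toy illustration of these two assistance types, invented for this discussion rather than drawn from the cited works, the snippet below contrasts an information-aiding alert with a suggestion-aiding recommendation computed from the same experiment history.

# A minimal, invented sketch contrasting the two assistance types described
# above: information aiding (an alert that a similar experiment was already
# run) versus suggestion aiding (a recommended next parameter value).
history = [
    {"max_speed": 1.0, "success": True},
    {"max_speed": 2.0, "success": False},
]

def information_aiding(proposed_speed, history, tol=0.25):
    # Surface relevant facts; the human still decides what to do with them.
    near = [h for h in history if abs(h["max_speed"] - proposed_speed) <= tol]
    return [f"Alert: similar experiment already run at {h['max_speed']} m/s "
            f"(success={h['success']})" for h in near]

def suggestion_aiding(history):
    # Proactively recommend a decision: probe midway between the fastest
    # success and the slowest failure.
    best_ok = max(h["max_speed"] for h in history if h["success"])
    worst_fail = min(h["max_speed"] for h in history if not h["success"])
    return {"max_speed": (best_ok + worst_fail) / 2.0}

print(information_aiding(1.1, history))
print("Recommended next experiment:", suggestion_aiding(history))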
The evaluation process for DSSs outside of robotics is a well-studied topic in the literature. In most
cases, evaluation refers to ex-post evaluation, where the observed result of DSS implementation is evaluated, that is,
the quality of the decision [2]. One of the most common ways to evaluate a particular DSS is by evaluating
the outcome of DSS implementation. Examples of such evaluation are common in the medical [157] and
marketing [244] domains. Alternatively, literature has considered ex-ante evaluation where either the
impact of the DSS on the decision making process or the processes within the DSS are assessed, and the
likely impact of DSS alternatives is predicted. This type of evaluation can help determine the information
necessary to optimally design a DSS or optimally select a DSS portfolio [233, 72]. For example, the authors
of [189] investigate how to determine which evaluation information to obtain to maximize utilization. The
work of [259] discusses evaluating components in agricultural DSS processes, such as prediction, expert
knowledge, and use of historical data. Finally, some evaluations include process- and outcome-oriented
evaluation [173], as well as a combination of both [211].
Generally speaking, dimensions and metrics for evaluating recommender systems are an active area of
research in the literature. The authors of [18] present a range of metrics categorized by recommendation-,
user-, system-, and delivery-centric evaluation. Recommendation-centric evaluation includes measuring
the correctness of recommendations, coverage, diversity, and confidence in the recommendation. User-centric evaluation is defined by trustworthiness, serendipity, utility, and risk metrics. System-centric
evaluation can be based on robustness, learning rate, scalability, stability, and privacy. Delivery-centric
evaluation uses metrics such as usability and user preference. Consideration of certain risk factors is also
critical in the design and implementation of DSSs. The authors of [224] discuss some pitfalls of clinical
DSSs, which can easily apply to many applications. For example, fragmented workflows, alert fatigue and
inappropriate alerts, impact on user skill, system and content maintenance, poor data quality, and lack of
transportability and interoperability are some potential negative impacts of DSS usage.
Figure 2.1: The proposed architecture from [170] for Intelligent Decision Support systems, which add
artificial intelligence functions to traditional DSSs.
A DSS that makes use of AI or ML to construct alternative options in the decision making cycle is referred to as an Intelligent DSS or IDSS [14, 116]. An early example of a proposed IDSS architecture is shown in
Figure 2.1. The published IDSSs to date have used artificial neural networks, support vector machines, evolutionary algorithms, decision tree learning, Bayesian learning, case-based reasoning, and pattern recognition
[151]. Applications for IDSSs have included text analytics and mining, ambient intelligence, internet of things, biometrics, expert systems [116], skin cancer detection [230], adaptive oceanic
sampling for AUVs with energy constraints [264], and human decision-making in swarm robotics [171].
Perhaps the existing work most relevant to this dissertation is that of [172], which investigates how an
IDSS can improve the threat assessment capability of an autonomous robotic system and the affected user.
While that work provides a number of key insights and inspirations, this work deviates in the nature of
the fundamental problem I seek to address; namely, I am researching IDSS for human-centric, adaptive experimental design of field robotics so that experimenters can more rapidly acquire system understanding.
It’s important to note the differences between a DSS and ES, since one is the focus of this work, and
since there is significant overlap [63]. An ES encodes knowledge from an expert into a computer program
with the goal of mimicking human decision making for a well-defined task while a DSS provides various
forms of support for a user to make a decision. Both systems provide some form of a user interface to
enable interaction between a user and the system. Where the ES is comprised of a knowledge base and
inference engine, the DSS uses data and models that are not necessarily hand-encoded by human experts.
Importantly, a DSS seeks to support the user in making a decision by providing access to data and models
relevant to the decision at hand, and these data and models can be used however the user desires; on the
other hand, an ES seeks to provide a conclusion or decision to a query posed by a user, typically a non-expert, that is more correct or of better quality than the user could provide themselves [67]. DSSs typically
have greater flexibility than classical ESs because the user has more freedom to use or manipulate the data
and models during the decision making process rather than receiving outputs from an inference engine.
In this work we are bound to the conditions under which an experimenter of a field robotic system will
operate. These conditions include working with an experimental robotics system for which little may be
known given the low technology readiness level, being required to make several hypotheses, inferences,
decisions, and conclusions during an adaptive, sequential decision making process for experimentation,
and evaluating robotic systems under some context-specific objectives for which an experimenter chooses.
Due to these conditions, building extensive databases may be disproportionately difficult or time-consuming with respect to the experimenter’s intended testing and evaluation cycle. An experimenter will conduct
an experiment as part of the experimentation to gain insights with respect to a hypothesis, and this is a
necessary step in the scientific process regardless of whether a DSS or an ES is used. It will inherently be difficult for an
ES to draw conclusions about an experimenter’s hypotheses because experts, including system designers,
will likely not be able to encode facts about the complex system interactions operating in the real world
without conducting experiments. Therefore, the key question is how to construct these experiments to
be maximally informative, which is a user-specific process where the emphasis is placed on enabling the
experimenter to make the best decision possible rather than providing them with a specific conclusion. All of this
is to say that while ESs could be a supporting technology for experimental design, I anticipate more value
and impact from a properly designed DSS; therefore, investigation into the design and role of DSSs is
the focus of this dissertation.
2.9 Shared Autonomy
It is important to distinguish DSSs and the field of shared autonomy for the context of this work. Shared
autonomy, and the similar (sometimes interchangeably used) fields of sliding autonomy [50, 96, 97, 52] and
shared control [54], traditionally considers the degree of user intervention in the dynamic regulation of
the behavior of a system [32, 71], and has been expanded to multidimensional perspectives to include
interactive scenarios for cooperative tasks [204]. Analogous to scenario-based approaches for TEV&V,
recent efforts in shared autonomy research have investigated quality diversity scenario generation as a
way to explore failure cases [66, 65]. DSSs are similar to shared autonomy approaches in that both are
human-centered technologies with a focus on maximizing a human’s abilities; however, they differ in the
context of decision making. In the scope of this work DSSs aim to enhance the human’s decision making for
experimental design during a study, whereas shared autonomy seeks to provide the human with the ability
and decision-making authority to release or take control of a system while performing operations. Shared autonomy
is not typically concerned with experimental design or focused on building the human’s understanding of
the system with respect to performance and limitations. In this work, I seek to develop DSSs (and IDSSs) for
adaptive experimental design recommendation to aid an experimenter with decision making. The human,
off-road autonomous system, and DSS will not share autonomy or control. Instead, the human will always
maintain full decision authority of which experiment to conduct, the off-road ground vehicle of interest
will perform autonomous navigation, and the DSS will provide supplemental capabilities for improving
decision superiority with respect to experimental design.
2.10 Active Learning
Along the lines of building system understanding through experimentation, there have been recent advances in the field of active learning, which originated as an engaging instructional method in the field of education [178] and has since been adopted by the machine learning and robotics communities [210, 208,
5, 232]. Active learning allows a “learner” to choose the information from which they learn to expedite
the information gathering process and has been applied to experimental design [209, 203, 82] for tasks
such as discovery of biological networks [225], text categorization and sensor placement [258], and the
entire experimentation process including hypothesis generation and testing [122]. In the context of field
robotics, there are fewer examples of active learning applied to experimental design. My previous works
explored online monitoring and characterization of autonomous ground vehicles to assist experimenters
with investigating system performance [241] as well as experimental design from the perspective of active learning of the experimenter’s decision making process (as opposed to a robot or algorithm) during
experimentation of an off-road autonomous navigation system [88]. In this dissertation, I explore the role
of active learning frameworks and methods that could be beneficial in the context of experimental design
and DSSs for assisting human decision making.
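As a concrete, generic illustration of the query-selection idea, not a reproduction of any cited system, the sketch below performs pool-based active learning with uncertainty sampling; it assumes scikit-learn is available and uses a synthetic oracle in place of an experimenter or ground-truth labeler.

# A minimal sketch of pool-based active learning with uncertainty sampling.
# The synthetic "oracle" stands in for an experimenter or labeling source;
# in practice each query would correspond to an experiment or annotation.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
pool = rng.uniform(-3, 3, size=(200, 2))                 # unlabeled candidates
pool = np.vstack([pool, [[2.5, 2.5], [-2.5, -2.5]]])     # two seed points
oracle = lambda X: (X[:, 0] + X[:, 1] > 0).astype(int)   # stand-in labeler

labeled_idx = [len(pool) - 2, len(pool) - 1]             # one from each class
labels = {i: int(oracle(pool[i:i + 1])[0]) for i in labeled_idx}

for _ in range(10):                                      # labeling budget
    X = pool[labeled_idx]
    y = np.array([labels[i] for i in labeled_idx])
    model = LogisticRegression().fit(X, y)
    proba = model.predict_proba(pool)[:, 1]
    uncertainty = np.abs(proba - 0.5)                    # small -> near boundary
    uncertainty[labeled_idx] = np.inf                    # never re-query labels
    query = int(np.argmin(uncertainty))
    labels[query] = int(oracle(pool[query:query + 1])[0])
    labeled_idx.append(query)

print(f"labeled {len(labeled_idx)} of {len(pool)} candidates")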
2.11 Generative AI
Generative Artificial Intelligence refers to a field of artificial intelligence that uses models and techniques
for creating new content. These methods are designed to learn the underlying patterns,
structures, and characteristics of the training data, which enables the generation of novel examples that resemble it. Generative AI utilizes techniques such as deep neural networks to capture and imitate the statistical distribution
of the training data. When these models are trained on large datasets they can generate new examples
that exhibit similar characteristics to the training data. They have been successfully used to generate a
variety of outputs, including images, texts, audio, and videos [121, 141, 7]. Recent advances in Generative AI, such as ChatGPT [166] and DALL-E 2 [181], are expected to revolutionize many industry sectors
and we’ve already seen unprecedented improvements and impact in some fields, including manufacturing
[130], education [169], and medical research [155].
Large Language Models (LLMs) have exploded in popularity, both in the research community and
commercial industry, due to the recent success of interactive tools, such as ChatGPT. There are a number
of opportunities for LLMs to significantly improve software engineering tasks, including specification
generation, just in time developer feedback, generating unit tests, documentation of contracting language
and regulator requirements, and language translation to modernize legacy software systems [168]. For
example, the authors of [192] created a requirements modeling tool that uses ChatGPT to extract the
required elements from text and assemble the results into a requirements model using a rule-based method.
Importantly, to realize such benefits, there are a number of concerns regarding the use of LLMs; most
notably, the quality and representativeness of data must be managed carefully since LLMs require massive
amounts of data and biases will be amplified; LLMs pose privacy violation and plagiarism concerns by
including content in the training data that is owned by others; LLMs have unprecedented environmental
concerns due to unsustainable power requirements; and LLMs lack explainability, which could lead
to unintended consequences [168]. The use of LLMs has also begun to expand in robotics applications.
The authors of [247] explore how ChatGPT can be used in aerial navigation, manipulation, and embodied
agents by defining a high-level robot function library, building an objective-defining prompt for ChatGPT,
and integrating a feedback loop for users to evaluate code generated by ChatGPT before the final code
is deployed on the robot. An illustration of their goal using ChatGPT in robotic deployment is shown in
Figure 2.2. This vision is comparable to the experimental design paradigm envisioned in this dissertation.
Where an LLM is the key enabling technology that alleviates burden from the user to generate code that
is deployed on the robot for various tasks, this work seeks to develop DSSs that alleviate burden from the
experimenter by providing decision support that is used to select experiments in various contexts.
Figure 2.2: An illustration borrowed from [247] that shows the goal of using ChatGPT to enable users on
the loop that can seamlessly deploy platforms.
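To ground the workflow of [247] described above, the snippet below sketches only the prompt-construction step under assumed names: FUNCTION_LIBRARY, build_prompt, and query_llm are invented for illustration, and query_llm is a placeholder rather than a real LLM API call. In a complete system, the generated code would still pass through the human feedback loop before deployment.

# A minimal sketch of the prompt-construction step in an LLM-assisted robot
# workflow: expose a small, high-level function library, state the objective,
# and ask the model for code that only uses that library. `query_llm` is a
# placeholder for an actual LLM call and simply echoes the prompt here.
FUNCTION_LIBRARY = {
    "set_waypoint(x, y)": "Command the vehicle to navigate to a map coordinate.",
    "set_max_speed(v)": "Limit the vehicle's speed in meters per second.",
    "get_terrain_class()": "Return the terrain type under the vehicle.",
}

def build_prompt(objective: str) -> str:
    lines = ["You may only call the following functions:"]
    lines += [f"- {sig}: {doc}" for sig, doc in FUNCTION_LIBRARY.items()]
    lines += ["", f"Objective: {objective}",
              "Write Python that achieves the objective using only these functions."]
    return "\n".join(lines)

def query_llm(prompt: str) -> str:
    # Placeholder: a real system would call an LLM API here and return code
    # for the human to review before it is ever deployed on the robot.
    return "# (model-generated code would appear here)\n" + prompt

print(query_llm(build_prompt("Traverse the field slowly if the terrain is muddy.")))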
Broadly speaking, Generative AI marks a new frontier in human-robot integration and offers opportunities to remove burden from the human so that they can focus more on supervision while maintaining
necessary levels of safety and efficiency. There are significant investments in this area due to the enormous
amount of interest in Generative AI ideas and the anticipated impact. Recent efforts are showing early
signs of success in using Generative AI in robotics, such as grasping [129], manipulator dynamics [188],
and gripper design [92], with the breadth of applications expected to grow. However, I have outlined
grand opportunities to leverage Generative AI and improve interaction and trustworthiness, collaboration
and cooperation, robot motion, robot perception, synthetic scenario generation, testing and evaluation,
failure detection, and robot design [91].
One of the most exciting aspects of Generative AI is the ability to provide recommendations for humans
by creating content from a representative distribution trained on input data. For example, the authors
of [255] created a multi-task, multi-fidelity Bayesian Optimization framework that performs amortized
auto-tuning for hyperparameter recommendation. Other examples come from the chemistry and medical
fields. The authors of [126] developed a Bayesian Optimization-based framework for generating organic
molecules that possess specific molecular properties. Similarly, literature has investigated candidate participant recommendations for drug discovery studies. The authors of [114] developed a method using
GFlowNets and active learning with epistemic uncertainty estimation to produce diverse batches of useful and informative candidates, which was demonstrated on biological sequence design tasks. GFlowNets
promise to be a valuable tool for AI-driven scientific discovery [115] because they are designed to
enable sample diversity, Bayesian posterior estimation for causal models, epistemic uncertainty
quantification, and information gain prediction, which are desirable properties for experimental design.
The marriage of experimental design, DSSs, and Generative AI is especially compelling because it
brings together the strengths of both humans and machines. If DSSs were able to leverage Generative AI
effectively, they could make experimental recommendations using quantitative analysis and uncertainty
estimation with big data while still preserving the human experimenter’s decision authority so that they
can prioritize experiment objectives and perform context-dependent reasoning for desired applications.
The focus of this dissertation is to lay the groundwork toward this vision of sophisticated DSSs that
improve experimental design of autonomous ground robots.
2.12 Summary
Literature is actively developing hardware and software solutions for cyber physical systems, including
field robots, for a variety of real-world applications. To build these complex systems in a principled manner, the MBSE community has invested in building models to perform verification and validation with
respect to requirements. Prior to deployment, verification, or validation, the scientific community creates
prototypical and experimental robotic systems that may not have well-defined requirements or hardened
solutions. Especially for these systems with lower technology readiness levels, the sequential decision
making process to construct experiments is critical to maximize the experimenter’s understanding. Literature has spent many decades developing ESs and DSSs to assist humans with making more informed
decisions in many domains, but the community has not yet applied these to experimental design of field
robots.
This dissertation aims to build new DSSs for augmenting human decision making abilities in experimental design of field robots, specifically prototypical autonomous ground robots. DSSs can take many
forms and this work explores the underlying technologies with respect to varying amounts of decision
support and its impact on human decision making. The work in this dissertation will build off the decades
of research on DSSs in other domains, leveraging insights on information- and suggestion-aiding decision
support and interactions with humans to maximize benefit (see Chapters 3 - 5). This dissertation will also
take inspiration from the Active Learning and Bayesian Optimization literature to develop frameworks
that monitor a human’s experimental design for tailored and useful decision support (see Chapter 6). Note that
ESs are intentionally not used in this dissertation due to their anticipated limited applicability, given that
expert systems typically require significant amounts of information from one or more experts, which is a
prohibitive requirement for experimental robotic systems. A common characteristic of prototypical and experimental robotic systems is the lack of system understanding and efficient experimentation seeks to build
working knowledge quickly. Instead, DSSs that achieve general use, offer flexibility to add system-specific
decision supporting technologies, and could feasibly automate knowledge engineering are investigated
here. Future work is also presented in the context of the existing Generative AI literature because this
marks the next frontier of recommendation-based DSSs that are expected to have the greatest capabilities
in terms of proactive decision support (see Chapter 7).
Chapter 3
Taxonomy of Decision Support Systems for Experimental Design
3.1 Background
Robots that operate in harsh environments, i.e., field robots [236], typically consist of very complex systems and require experimentation in real-world settings to fully characterize behaviors, capabilities, and
limitations. This is because the natural world is unstructured and introduces spatio-temporal effects, non-negligible stochasticity, and sources of system failure. Importantly, a human is required to lead field robot
experimentation in order to effectively prioritize objectives so testing is consistent with the scope of the
specific field robotic application, manage risk to personnel and hardware, and provide context-dependent
reasoning for desired system behaviors. The construction of experiments, referred to here as experimental design, must be carefully considered to maximize the information gained by the experimenter while
simultaneously minimizing the cost (e.g., money, energy, and time to implement, set up, and execute) and
risk (e.g., personal injury, system damage, and wasted resources). By observing humans conducting field
experiments, our previous work discovered that experimental design consists of a sequential decision-making process [88]. A human selects a specific system configuration to test (e.g., hardware, software
components, and parameter values) based on their hypotheses and then uses new information collected
from empirical observations of the previous experiment to inform the decision for the next experiment.
Representative examples of this process from experiments I have conducted with autonomous ground robots
are shown in Figure 3.1. During this adaptive, human-centric process, the experimenter effectively learns
about the robotic system through interactions in the form of experiments. Therefore, we view adaptive
experimental design as a unique problem within the field of human-robot interaction (HRI).
(a) Experimentation in the Spring
(b) Experimentation in the Fall
(c) Experimentation in the Winter
Figure 3.1: Photos from my experiments testing autonomous navigation capabilities of field robots, specifically off-road ground vehicles.
Researchers in the human-machine interaction community have found that a Decision Support System
(DSS) can vastly improve a human’s decision quality when interacting with very complex systems [78, 14,
12, 29, 11]. DSSs have been well studied for several decades [57, 59, 58, 151] and have demonstrated significant impact in a number of different domains, including managerial [177], medical [200], forestry [250],
architectural [74], warehouse management [1], and housing evaluation [162] problem settings.
Specific to the robotics community, DSSs have been combined with robots for ocean exploration [79],
construction processes [148], and underwater sampling [264]. Researchers have also investigated incorporating artificial intelligence (AI) in a DSS, oftentimes referred to as an Intelligent DSS (IDSS) [116, 14, 231],
for problems such as robotic surface vehicle navigation [215] and threat assessment [172]. DSSs and IDSSs
in HRI have been used primarily in literature to improve robot decision making to perform some task or
human decision making in the physical design of robotic systems [98], i.e., not in experimental design to
explore system performance. Because machines excel at reasoning over large-scale data, estimating uncertainty, and making unbiased decisions, we believe that there is a great opportunity to develop DSSs and
IDSSs for the experimental design of field robotic systems.
Given the absence of IDSSs for experimental design in HRI, and inspired by the large body of DSS literature in other domains, we seek to build a taxonomy to guide the research and development of IDSSs. The
goal of this work is to lay the groundwork for defining terminology and functionality requirements that
the scientific community can collectively investigate in the coming years. We envision DSSs, conceptually
depicted in Figures 3.2–3.7 and outlined in the Stages of Decision Support Systems Section, that can lead
to the selection of more informative tests and subsequently reduce experimental costs by providing different forms of decision support to the human experimenter. Ultimately, the experimenter is the decision
maker with full authority over which experiment is conducted, including the ability to partially accept or
reject support from a DSS, but we believe such an advisory system can greatly improve the quality of the
human’s decisions.
Figure 3.2: Stage 0: No Support, which is representative of how the majority of field robotics experimentation is conducted today.
Figure 3.3: Stage 1: Design Assistance - support is provided in the form of information-aiding prompts.
Figure 3.4: Stage 2: Design Monitoring - support is provided in the form of alerts pertaining to experiments
that could be or already have been conducted.
Figure 3.5: Stage 3: Conditional Design Recommendation - support is provided in the form of a partially-defined experiment recommendation with respect to either parameters or components.
Figure 3.6: Stage 4: Single Design Recommendation - support is provided in the form of one complete
experiment recommendation that could feasibly be executed without modification.
Figure 3.7: Stage 5: Sequential Design Recommendation - support is provided in the form of a recommendation for a series of experiments that could feasibly be executed sequentially without modification.
Contributions. In this chapter, we:
1. define common terminology for adaptive experimental design of field robotic systems to facilitate
more coordinated research efforts by the scientific community.
2. provide an overview of the literature on existing DSS and IDSS development and an
analysis of general themes and requirements.
3. propose a six-stage taxonomy of DSSs and IDSSs for adaptive experimental design in field robotics
informed by systems presented in the literature. We believe that this roadmap provides the scientific community with common goals and requirements that are necessary to realize principled
experimental design solutions. We define each stage, the experimenter’s responsibilities, and the
feature requirements for the DSS.
4. identify critical technical gaps with respect to our proposed taxonomy to help guide future research
toward the realization of different DSS solutions.
3.2 Terminology
In this section we define necessary terms in the context of adaptive experimental design for field robotics.
This seeks to provide a common terminology for researchers and reduce confusion given that existing DSS
development spans many different domains.
• Autonomy Under Test (AUT): a reconfigurable field robotic system, equipped with autonomous capabilities, that an experimenter seeks to understand through experimentation.
• Study: a series of one or more iterations of the scientific process, including hypothesis construction,
experimental design, conducting a field experiment, and gathering and analyzing results.
• Experimentation: the process of testing hypotheses about the AUT. For clarity, automating experimentation is not within the scope of this taxonomy.
• Experimenter: the human decision maker that chooses and conducts the field experiments.
• Experiment: one iteration of evaluating the AUT with respect to selected inputs in order to obtain
an observation.
• Experimental design: the construction of field experiments, which consists of defining experiment
inputs. Proposing a taxonomy for DSSs that assist with experimental design is the focus of this
paper.
• Input: a fully-defined configuration of the subsystems, components, and parameters that comprise
a field experiment.
• Observation: a record related to the outcome of an experiment, including both quantitative and
qualitative performance.
• Decision Support System (DSS): a system that aids the experimenter with making decisions pertaining
to experimental design.
• Subsystem: a self-contained system within the AUT comprised of one or more components (e.g., a
perception subsystem or a motion planning subsystem).
• Component: a reconfigurable class of elements in an AUT’s subsystem (e.g., hardware on the AUT
such as a perception sensor or software capabilities used by the AUT such as a global motion planner).
• Parameter: a reconfigurable characteristic in a set that collectively defines the configuration of a
component (e.g., a specific LiDAR sensor or value for replanning rate) or experimental control variable (e.g., location of navigation goals).
3.3 Guiding Insights From Literature
In this section we draw connections to the literature presented in Chapter 2 to provide context and
inspiration for the construction of our DSS taxonomy. We use the general trends, themes, and lessons
learned from literature with respect to system structure to guide its development.
3.3.1 Operations
As previously described, DSSs typically operate with respect to a domain- or task-specific decision-making
cycle. Taking the work of [176] as an example owing to its demonstrated performance, diagnostic and therapeutic measures in medical DSSs take the form of two stages. The first is diagnosis, which is similar to our Stages 1–2 with prompts and alerts. The second is treatment and rehabilitation recommendation, where current technology is mostly lacking; this gap provides motivation for Stages 3–5 in our taxonomy, where support graduates from prompts and alerts to suggestions.
Chapter 2 also describes how the time horizon for decision making is critical to DSS operations. As in
the work of [259], decision planning can be made in short-term, mid-term, and long-term horizons, where
mid- and long-term hold significant potential but are also among the most difficult and least mature in
literature. In response, we intentionally include in our taxonomy stages that span the full spectrum of
horizons. Specifically, Stages 3–5 refer to going from recommending one partially defined experiment to
one fully defined experiment and then several fully-defined experiments.
The taxonomy proposed in this paper defines the requirements for DSSs specifically for assisting an
experimenter with the task of experimental design in the domain of field robotics. We envision DSSs developed in the context of this taxonomy will operate in different stages (described in the Stages of Decision
Support System Section), and the decision planning horizon will grow for higher stages.
3.3.2 DSS Assistance Types
DSSs in literature have taken the form of providing either information aiding decision making, suggestion
aiding decision making, or both. We sought to ensure that our taxonomy encompasses all types of feasible decision support that could be provided and, as a result, designed stages that account for both and
offer unique types of assistance. Specifically, Stages 1 and 2 in our proposed six-stage taxonomy provide
information aiding decision making through the use of prompts and alerts, which is inspired by the literature that uses "what-if" analysis, while Stages 3–5 are more proactive and provide suggestion aiding decision making for experimental design with varying detail and time horizons.
Literature has also investigated human preference in decision support because it is typically beneficial
to provide appropriate assistance catered to a specific decision maker [202, 148, 79]. Using this insight, the
proposed taxonomy allows the experimenter to select the DSS stage for every experiment in a study so
that the decision maker maintains control, receives their preferred decision support, and enjoys maximum
flexibility during their experimentation process.
3.3.3 Evaluation and Metrics for DSS
The six-stage taxonomy presented in this paper defines both the requirements of the DSS as well as the
experimenter’s responsibility for each stage of operation. These are developed and presented to enable
both ex-post and ex-ante evaluation as well as to leverage existing evaluation metrics from the literature.
We note that field robotics will likely require recommendation-, user-, system-, and delivery-centric evaluation given the nature of testing and evaluation of complex systems by humans with varying amounts of
expertise. Some metrics may be universally applied to DSS for experimental design in field robotics, such
as cost of experiments, trustworthiness, confidence, scalability, and stability. However, many applications
require domain-specific metrics for evaluation, such as those tailored to coverage, diversity, risk, usability,
and user preference.
3.4 Stages of Decision Support Systems
In this section, we propose a six-stage taxonomy of DSSs for adaptive, experimental design. This taxonomy
is inspired by literature and other domains, but is tailored to the unique challenges present in field robotics
relating to complex systems and real-world environmental constraints. We begin by providing an overview
of DSSs and then define the responsibilities of the experimenter and the requirements for each stage of the
DSS.
3.4.1 Concept of Operations
A DSS for experimental design must first build a knowledge database using data from the experimenter and
experiments in order to provide relevant decision support. As shown in Figure 3.2–3.7, the DSS receives as
input prior knowledge, the experimenter’s preferences and risk tolerance, intermediate feedback from the
evaluator, and all experimental results. All of these sources of data could be inputted to the DSS manually
(e.g., the experimenter verbally or textually specifies their own prior knowledge, risk tolerance, preference,
or qualitative observations), but there are vast opportunities for both inputting and processing this data
in an automated fashion in order to improve efficiency. For example, a database could operate on multiple
studies to curate prior knowledge, human preference and risk may be inferred by decision monitoring,
and experimental results can be empirically monitored and analyzed. We also envision a workflow where
DSSs observe some number of experiments before providing any decision support in order to properly
initialize a sufficiently rich database; this initialization process may also be dependent on the size, quality,
and relevancy of prior knowledge applicable to the AUT and experimenter.
We define the experimental design DSS operations in terms of stages, where the experimenter selects
one for each experiment in a study. A stage is defined by the functions and roles of the support system, which are inversely proportional to the experimenter's responsibilities. As stages increase, the
DSS inherits the functionality of the previous stage(s) and adds more automated capabilities, which, in
turn, reduces the required responsibilities of the experimenter. An illustrative example of this concept is
presented in Figure 3.9 and an overview of each stage is summarized in Table 3.1.
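To make this concept of operations concrete, the sketch below encodes the six stages and the per-experiment loop in which the experimenter selects a stage before each experiment and the DSS accumulates the capabilities of lower stages. The object interfaces (dss, experimenter) and their method names are illustrative assumptions rather than a prescribed implementation.

```python
from enum import IntEnum


class DSSStage(IntEnum):
    """The six stages of the proposed taxonomy for experimental design DSSs."""
    NO_SUPPORT = 0
    DESIGN_ASSISTANCE = 1
    DESIGN_MONITORING = 2
    CONDITIONAL_DESIGN_RECOMMENDATION = 3
    SINGLE_DESIGN_RECOMMENDATION = 4
    SEQUENTIAL_DESIGN_RECOMMENDATION = 5


def run_study(dss, experimenter, num_experiments):
    """Per-experiment loop: the experimenter chooses the stage of support every time.

    Higher stages inherit the functionality of the lower stages, so prompts and
    alerts remain available when recommendations are offered.
    """
    for i in range(num_experiments):
        stage = experimenter.select_stage(i)  # human decision; may change from experiment to experiment
        if stage >= DSSStage.DESIGN_ASSISTANCE:
            dss.prompt_experimenter()         # Stage 1: prompts for conceptual formulation
        if stage >= DSSStage.DESIGN_MONITORING:
            dss.issue_alerts()                # Stage 2: alerts on critical considerations
        recommendation = None
        if stage == DSSStage.CONDITIONAL_DESIGN_RECOMMENDATION:
            recommendation = dss.recommend_partial()   # one partially-defined experiment
        elif stage == DSSStage.SINGLE_DESIGN_RECOMMENDATION:
            recommendation = dss.recommend_full()      # one fully-defined experiment
        elif stage == DSSStage.SEQUENTIAL_DESIGN_RECOMMENDATION:
            recommendation = dss.recommend_sequence()  # several fully-defined experiments
        experiment_input = experimenter.finalize_input(recommendation)
        observation = experimenter.conduct_experiment(experiment_input)
        dss.update_database(experiment_input, observation)
```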
In defining the requirements for each stage we draw inspiration from the levels of driving automation
[195], shown in Figure 3.8; however, adaptive experimental design is dependent on a fluid decision making
process, which is an important distinction from self-driving vehicles. To this end, the stage selection
for a given experiment is the experimenter’s decision and, unique to adaptive experimental design, can
change throughout a series of experiments as the human performs more testing, obtains more data, and
builds an understanding of the AUT within the experimenter-defined context. Generic, information-based
thresholds are intentionally not used to determine when to transition to different stages because the goal of
experimentation is to build the experimenter’s understanding of the AUT. Defining machine-interpretable
thresholds for generalizable, autonomous stage transitioning is likely intractable and also offers less control
and flexibility to the experimenter than if they were to choose their desired decision support. Humans play
a central role because they determine if the DSS should graduate to any higher stages, to which stages the
Figure 3.8: The SAE J3016 levels of driving automation provided inspiration for the taxonomy of experimental design DSSs in this work.
DSS should increase (given a sufficiently capable DSS) and when experiments should halt. Note that an
experimenter is not required to increase the stage for any specific experiment, monotonically increase
stages, or adjust stages incrementally. We anticipate that experimenters will adjust the DSS stage to meet
their needs, manage their workload, and transfer decision autonomy of experimental design analogous to
the principles in the field of shared autonomy [71, 204].
Although not yet realized, we believe that continuous advances in AI and machine learning (ML) could
serve as key enabling technologies for more sophisticated DSSs such that any stage in the taxonomy could
take the form of an IDSS. We envision that AI / ML-enabled DSSs will offer a range of capabilities, including: searching the design space and alleviating the experimenter's burden of identifying critical components and parameters; making actionable recommendations that the human might not otherwise consider; learning about the human's preferences and risk tolerance through intuitive interactions to make recommendations more valuable to the experimenter; reminding the experimenter of vital experimental considerations to reduce the frequency of oversights and poor decisions; learning from the experimenter's mistakes just as a human would; adapting to changing conditions and new information, also as a human would; and enabling intelligent selection of what data the DSS and the experimenter should learn from for more efficient experiments.
Figure 3.9: Three illustrative scenarios of adaptive experimental design in a hypothetical study containing
several experiments. An experimenter can choose to monotonically increase the stages of decision support
(blue line), increase until a certain stage but then plateau (green line), or increase and then decrease the
DSS stage (purple line). Importantly, the experimenter can choose to use any stage (provided that the
DSS has the necessary features) for any number of experiments and change the stage in response to new
observations.
3.4.2 Stage 0: No Support
In the first stage, stage 0, the experimenter is responsible for all aspects of experimental design, including
all component and parameter selections. Under these conditions, the DSS does not offer any information
or recommendation, and the full burden of experimental design is placed on the experimenter. Note that
much of the experimental design in the current field robotics literature corresponds to stage 0 in that there
is no DSS or experiment recommendation.
Table 3.1: An overview of the proposed six-stage taxonomy of decision support systems for adaptive experimental design in field robotics.
Stage 0 (No Support). DSS output: none. DSS requirements: none. Experimenter responsibilities: all aspects of experimental design, including subsystem configuration, component selection, and parameter selection.
Stage 1 (Design Assistance). DSS output: prompt(s). DSS requirements: aiding with conceptualizing experiment objectives, anticipated outcomes, and reasoning about experimental design decisions. Experimenter responsibilities: all aspects of experimental design, including subsystem configuration, component selection, and parameter selection.
Stage 2 (Design Monitoring). DSS output: alert(s). DSS requirements: monitoring of experiments and generating notifications for the experimenter regarding critical experimentation considerations. Experimenter responsibilities: all aspects of experimental design, including subsystem configuration, component selection, and parameter definition.
Stage 3 (Conditional Design Recommendation). DSS output: one partially-defined experiment. DSS requirements: 1) subsystem- or component-only recommendations for new experiments, or 2) parameter-only recommendations for previously-conducted experiments. Experimenter responsibilities: the remaining subsystem or component selections and parameter definitions that are not fully defined by the decision support system.
Stage 4 (Single Design Recommendation). DSS output: one fully-defined experiment. DSS requirements: full definition of an experiment, including component selections and parameter definitions. Experimenter responsibilities: approval or rejection of recommendations made by the decision support system.
Stage 5 (Sequential Design Recommendation). DSS output: several fully-defined experiments. DSS requirements: full definitions for a sequence of experiments, including component selections and parameter definitions. Experimenter responsibilities: none required.
3.4.3 Stage 1: Design Assistance
A stage 1 experimental design DSS assists the experimenter with conceptually formulating the experiments, as depicted in Figure 3.3. Similar to stage 0, the DSS does not make any recommendations, but
it does assist with experimental design by prompting the experimenter for considerations, predictions,
and justifications of design decisions as a means for increasing the probability that experiments produce
meaningful, defensible, and intended outcomes with respect to a specific hypothesis. Field robotic systems
and the environments in which they operate are inherently complex, which motivates the use of simple
prompts as a way to tractably support understanding and pursuing experiment objectives. This stage is
inspired by "what-if" analysis used in other applications [243, 117] and could be implemented by different
forms of interaction, including pen-and-paper, web-based tools, or verbal communication. Furthermore,
as in stage 0, the experimenter is responsible for all decisions pertaining to the experimental design.
3.4.4 Stage 2: Design Monitoring
An experimental design DSS with stage 2 capabilities can monitor operations and issue alerts related to
critical experiments throughout the adaptive experimental design process. For example, alerts may be
generated for experiment inputs that: 1) do not align with the experimenter’s preferences or risk tolerances based on previous decisions and experiments; 2) lack a sufficient number of observations but may be
valuable to conduct; or 3) have an exceedingly high probability of failure, which is an ever-present challenge in field robotics due to the unstructured operational environment. As a result, this stage will likely
require data related to AUT performance metrics, information gain, and sources of experiment failure.
These alerts serve to direct the experimenter’s attention to some piece of information that could affect the
human’s decision, but the DSS offers no specific experiment recommendation or remedy for resolving the
alert. In the literature this kind of support has already demonstrated value in many different applications
[249, 215, 106]. The automatically generated alerts in experimental design may be provided to the experimenter using a digital interface (e.g., a graphical user interface), and how the experimenter responds to the alerts can provide feedback to the DSS, from which its database is updated and decision support is refined.
Alerts are expected to be presented before or during the next experiment so informed decisions can be
made. The specific alert conditions, timing, interface, and information displayed should be reconfigurable
to the specific experimenter so that the human does not become complacent and their decision-making is
not hindered, as cautioned in the literature [252].
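A minimal sketch of the kind of rule-based alert generation a stage 2 DSS might perform is given below. The summary statistics, field names, and thresholds are assumptions chosen to mirror the three example triggers above; a fielded DSS would derive them from AUT performance metrics, information gain estimates, and observed failures.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CandidateInput:
    """Summary statistics the DSS is assumed to maintain for a candidate experiment input."""
    name: str
    predicted_failure_probability: float  # inferred from prior failures under similar configurations
    num_observations: int                 # how many times a similar input has already been run
    expected_information: float           # estimated value of another observation, normalized to [0, 1]
    matches_preferences: bool             # consistency with the experimenter's preferences and risk tolerance


def generate_alerts(candidate: CandidateInput,
                    risk_tolerance: float = 0.7,
                    min_observations: int = 1,
                    info_threshold: float = 0.5) -> List[str]:
    """Return human-readable alerts; the DSS directs attention but offers no remedy."""
    alerts = []
    if not candidate.matches_preferences:
        alerts.append(f"{candidate.name}: input conflicts with previously expressed preferences or risk tolerance.")
    if candidate.num_observations < min_observations and candidate.expected_information > info_threshold:
        alerts.append(f"{candidate.name}: few observations exist, but the experiment may be valuable to conduct.")
    if candidate.predicted_failure_probability > risk_tolerance:
        alerts.append(f"{candidate.name}: predicted probability of failure exceeds the configured tolerance.")
    return alerts
```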
3.4.5 Stage 3: Conditional Design Recommendation
This is the first stage for which an experimental design DSS can make recommendations, albeit only under
certain experimental conditions. These conditions fall under two categories: 1) new experiments (i.e., an
observation does not yet exist) for which the DSS can only make recommendations for subsystems or components and not specific parameters; or 2) experiments that have already been conducted (i.e., there exists
an observation) for which the DSS can only make recommendations to parameters. The two categories
capture the exploration-exploitation tradeoff that experimenters often face when testing field robotic systems; new experiments may reveal capabilities or limitations of the AUT while tuning parameters may
improve performance to an acceptable threshold. For either scenario, a stage 3 DSS suggests partially-defined recommendations using a digital interface to define some experiment inputs, and the experimenter
is responsible for constructing the full experiment definition. Given the complexity of formulating an intelligent experiment, there are a myriad of opportunities to leverage AI and, as a result, we believe IDSSs will
be necessary to realize useful decision support in stage 3. AI-enabled capabilities could include searching
the design space to optimize parameters of interest, pattern recognition to identify components of interest,
and performance inference for information-theoretic experimental design and cost-benefit analysis. The
quality of decision support will likely be improved by reasoning over the history of decisions made by the
experimenter coupled with the observed outcomes.
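The two conditional branches of stage 3 could take the following illustrative form, in which a new experiment receives a component-only suggestion and a previously-conducted experiment receives parameter-only tuning. The scoring callables are placeholder assumptions standing in for the AI-enabled capabilities described above (e.g., learned performance predictors or a Bayesian optimizer), not a specific proposed algorithm.

```python
from typing import Callable, Dict, List, Optional, Tuple


def recommend_stage3(candidate_components: List[str],
                     parameter_bounds: Dict[str, Tuple[float, float]],
                     past_observation: Optional[dict],
                     component_score: Callable[[str], float],
                     parameter_score: Callable[[str, float], float]) -> dict:
    """Produce a partially-defined recommendation under the two stage 3 conditions."""
    if past_observation is None:
        # Condition 1: no observation exists yet, so recommend a subsystem/component only.
        best_component = max(candidate_components, key=component_score)
        return {"type": "component", "recommendation": best_component}

    # Condition 2: an observation exists, so recommend parameter changes only.
    suggested_parameters = {}
    for name, (low, high) in parameter_bounds.items():
        # Coarse grid search as a stand-in for a real optimizer over the design space.
        grid = [low + (high - low) * k / 10.0 for k in range(11)]
        suggested_parameters[name] = max(grid, key=lambda value: parameter_score(name, value))
    return {"type": "parameters", "recommendation": suggested_parameters}
```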
3.4.6 Stage 4: Single Design Recommendation
This is the first stage for which an experimental design DSS is required to make fully-defined recommendations that an experimenter could feasibly execute without modification. A recommendation from a stage
4 DSS will define all of the component selections and parameter values that compose the experiment. The
recommendations will also account for the experimenter’s preferences and risk tolerance, which may be
manually inputted, learned in an automated fashion, and/or inferred from observations of the AUT in previous experiments and interactions with the experimenter. The DSS will use the experimenter’s decision
of experimental inputs to construct features, identify trends, and reason about the quantitative results and
the experimenter’s qualitative assessment within the human-provided context. Under these conditions,
the experimenter is responsible for the decision to accept or reject a recommendation from the DSS. In
the case of rejection, the experimenter is also responsible for defining their choice of experiment input,
which could be a partial or full configuration change depending on how much of the recommendation the
experimenter chooses to use. Finding the optimal decision for recommendation is a heavily-researched
topic for a variety of other applications [196, 94, 142], from which relevant approaches may emerge.
3.4.7 Stage 5: Sequential Design Recommendation
A DSS with stage 5 features can recommend an ordered sequence of fully-defined experiment inputs,
all of which take into account the experimenter’s preferences and risk tolerance and do not necessarily
require human interaction. We envision that recommending a sequence of experiments is the most technologically advanced stage of DSS and could be hugely beneficial for the decision-making process and outcome where
the decision maker needs to make sequential decisions over a period of time. This is comparable to the
long-term planning capability of envisioned DSSs as opposed to the existing DSSs that use short-term
planning [259].
In this stage, the experimenter has no explicit responsibilities but can optionally participate in experimental design decisions by accepting or rejecting the recommendation, which could be any amount of
the sequence ranging from a small change in a single experiment to the entire sequence of experiments.
Regardless, conducting the sequence of experiments remains the experimenter’s responsibility, as the DSS
is only providing a sequence of experiment input definitions. The anticipated value for defining such a sequence of experiments is for domains where reliable communication is not guaranteed, which is a common
challenge in many field robotic applications (e.g., humanitarian assistance and disaster relief, military operations, and space exploration). Designs for sequences of experiments can serve either as a necessary means
of experimentation or a contingency plan to overcome intermittent communications. In these cases, degraded communication can negatively impact the experimenter’s ability to make observations in real-time
and configure the AUT for the next experiment. Examples of this type of experiment condition include
long-distance experimentation where the experimenter and AUT are not always physically co-located,
such as space robotics where communication to the AUT can take on the order of days.
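One way a stage 5 DSS could assemble such a sequence is sketched below as a greedy selection that maximizes estimated marginal value per unit cost under a fixed budget. The value and cost functions are assumptions; a deployed DSS would derive them from its models of the AUT, the experimenter, and the information gained so far.

```python
from typing import Callable, List


def recommend_sequence(candidates: List[dict],
                       value_fn: Callable[[dict, List[dict]], float],
                       cost_fn: Callable[[dict], float],
                       budget: float) -> List[dict]:
    """Greedily order fully-defined experiment inputs within a total cost budget.

    value_fn(candidate, plan_so_far) is assumed to estimate the marginal information
    gained given the experiments already in the plan, so redundant experiments score low.
    """
    plan: List[dict] = []
    remaining = list(candidates)
    spent = 0.0
    while remaining:
        scores = [value_fn(c, plan) / max(cost_fn(c), 1e-9) for c in remaining]
        best_index = max(range(len(remaining)), key=scores.__getitem__)
        best, best_score = remaining[best_index], scores[best_index]
        if best_score <= 0.0 or spent + cost_fn(best) > budget:
            break  # no remaining experiment is worth its cost, or the budget is exhausted
        plan.append(best)
        spent += cost_fn(best)
        del remaining[best_index]
    return plan
```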
3.5 Construction Validation
The DSS taxonomy presented here has been evaluated in over 300 hours of observing or actively conducting field experiments with over 50 researchers from nine different government and academic organizations. This has included weekly test days over the course of six months as well as six, 1 week-long joint
experiments with academic collaborators in ARL’s essential research program and research collaborative
alliance as well as four, 1 week-long joint experiments with performers in the DARPRA RACER program.
These experiments have provided numerous diverse opportunities to validate the construction of the proposed stages and requirements in this taxonomy. This has included testing across all seasons and weather
conditions in different biomes ranging from forest, off-trail, and desert environments using different sensor suites, platforms, autonomous navigation stacks, and experimenters with varying experience levels.
During every experiment, exemplary observations, such as the ones shown in Figure 3.10, were made of
what the experimenter is faced with, the decisions they make, challenges they struggle with, and support
they desire. To date, the proposed taxonomy accounts for every decision an experimenter has made or
wished they could make, which suggests the taxonomy formulation remains relevant and there are no
known gaps.
Figure 3.10: The six-stage taxonomy of DSSs for experimental design was extensively validated through field experimentation involving different biomes, sensor suites, platforms, and autonomy stacks: (a) biome - forest environment; (b) biome - desert environment; (c) sensor suite; (d) platform; (e) autonomy stack - ARL; (f) autonomy stack - NeBula; (g) validating examples. Every decision observed during experimentation could be mapped to support defined in a stage within the taxonomy.
3.6 Future Directions
In the pursuit of DSSs and IDSSs for experimental design in field robotics, there are a number of open
questions and technical gaps that will need to be addressed before a range of decision support can be
offered to an experimenter. We highlight several noteworthy examples here to help direct the scientific
community and expect that new challenges will be revealed during future research efforts.
System Models. One pre-requisite for providing principled decision support is a mathematical framework to represent the AUT with respect to the various subsystems, their components, intra-system interactions, and empirical observations. Such a framework could supplement the existing decision making
process with formal methods and information-theoretic approaches. Importantly, methodologies and representations will need to scale in order to provide decision support for large and complex systems because
field robotic systems oftentimes have many subsystems, components, parameters, and permutations of
feasible configurations; a probabilistic framework will likely provide the basis for stages 2 through 5. Recent literature on how roboticists can select sensors [179] and explore trade-offs in the design space when
building robots [194] may provide useful insights into understanding and modeling systems.
Information Models. In addition to modeling the AUT, future research will be needed to mathematically model information specific to an experimenter, experimental design, and field robotic application.
The goal of experimentation is to maximize the information gained by the experimenter in the context of
their defined objectives while simultaneously reducing the cost to obtain such information. Interestingly,
experimenters typically seek to reduce the number of experiments by avoiding system configurations that
result in catastrophic failures; however, these negative examples can be useful to prove infeasibility under
real conditions or to generate new, important ideas that robustify autonomy in the AUT. New models for
experimental design, including the value of information and the value of experiments, will be needed to
support the decisions that directly affect these competing objectives, comparable to what is referred to in
the literature as information economics [27]. To this end, DSS development would benefit from defining
measures of decision effectiveness [78] so that the impact of information and a proposed DSS can be appropriately assessed. DSSs and IDSSs typically require an appreciable amount of data to contribute intelligent
and efficient support, but the cost of conducting field experiments typically limits the number of available observations. As a result, DSSs and IDSSs will need to be data efficient, include real-time updating,
and transform uncertain and incomplete data [214] by reasoning effectively with sparse information and
inferring the effects of system configurations in order to achieve a more favorable cost-benefit ratio.
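As a concrete, if simplified, expression of this cost-benefit view, the value of a candidate experiment can be scored as expected information gain net of a weighted, normalized cost. The discrete belief representation and the weighting below are illustrative assumptions, not a proposed information model.

```python
import math
from typing import List


def entropy(probabilities: List[float]) -> float:
    """Shannon entropy (in bits) of a discrete belief over AUT outcomes."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0.0)


def experiment_value(prior: List[float],
                     expected_posterior: List[float],
                     cost: float,
                     cost_weight: float = 1.0) -> float:
    """Expected information gain of an experiment minus a weighted, normalized cost.

    prior and expected_posterior are beliefs over a discrete outcome space before and
    after the anticipated observation; cost is a normalized experiment cost in [0, 1].
    """
    information_gain = entropy(prior) - entropy(expected_posterior)
    return information_gain - cost_weight * cost


# Example: an experiment expected to sharply reduce uncertainty about a failure mode
# may be worth its cost, even though the experimenter might prefer to avoid the failure.
print(experiment_value(prior=[0.5, 0.5], expected_posterior=[0.9, 0.1], cost=0.3))
```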
Experimenter Models. An important aspect of DSS operations is decision support customized to the
experimenter. Building on information-theoretic approaches for the experimenter, future research efforts
should consider intuitive and informative methods for modeling, learning, inferring, and incorporating
experimenter-specific preferences and risk tolerance. HRI literature has already investigated a number of
human factors pertaining to preferences and risk tolerance, but the sequential decision making process
in field experimental design presents unique challenges that require further investigation. This includes
preferences and risk tolerance with respect to a range of different factors (e.g., the experimenter, robotic
system, performance, limitations including failure, and experimental costs), as well as dynamic conditions,
preferences, and risk tolerance that change based on experimental environments, time constraints, current
knowledge, and expected information gain.
Decision Making Models. Planning and decision making across varying time horizons is an open
question in virtually all DSS applications, but is especially important in experimental design for field
robotics. A study of a robotic system consists of a number of experiments, all of which work toward
the common goal of maximizing the human’s understanding of system performance and limitations. Approaches to orchestrate the sequential decision making of experimental design are required to realize a
stage 5 DSS and could provide useful insights in the maturation of lower stage DSSs to maximize the efficiency of experimentation. However, this is a particularly difficult problem because it requires inference
over both the experimental outcomes with respect to a single, selected experiment configuration and the
trend of information gain over multiple experiments with respect to an experimenter.
3.7 Summary
In this work, we propose a six-stage taxonomy of human-centered DSS for adaptive experimental design
in the context of field robotics. We provide supporting terms, definitions, and related work to arrive at a
taxonomy that we believe is generally useful for field robotics experimentation and provides a common
framework for future research. To provide more assistance to the research community, we also identify
technical gaps and specific future directions with respect to the requirements in our proposed taxonomy
and the existing literature.
Chapter 4
Design Assistance for Structured Experimental Design
4.1 Background
To confidently deploy autonomous systems in unstructured environments, we require trustworthiness in
the abilities and intent of the agents. Trustworthiness in systems can be built by conducting extensive,
meaningful experiments that reveal the performance and limitations of the robot or team. Importantly,
these experiments need to be conducted rapidly and adaptively to match the brisk pace of experimental research developments; otherwise, the technology under test will be obsolete and the findings could
have limited applicable value. To date, the scientific community has devoted a disproportionate amount
of resources to the research and development of learning-enabled, (sub)system-level capabilities, and how
humans interact with robots, compared to the rigorous and expedited test methods for experimental autonomous systems. The construction of experiments, referred to as experimental design [85], warrants
more attention from the community because it directly affects the amount of knowledge gained and, if
done intelligently, can reduce the human experimenter’s effort, increase trustworthiness, and accelerate
the concept-to-fielding cycle.
In addition to the lack of attention given to experimental design, there exists a gap between academia
and industry in the development of trustworthy autonomous systems that improved experimental design
methods may help bridge. Robotics research in academia has largely focused on the development of behaviors, capabilities, and interactions for autonomous robots by making significant advancements in artificial
intelligence (AI), machine learning (ML), and human-robot interaction (HRI). Meanwhile, trust research in robotics has focused on trust measurement, estimation, transparency, the design of robots, theoretical frameworks, and formal verification [124], among other topics that do not include experimental design. On the
other hand, industry has focused on product-grade system design and maturation with well-defined Testing, Evaluation, Verification, and Validation (TEV&V) techniques at the expense of timely adoption of
state-of-the-art approaches. With more profound experimental design solutions, academia can catalyze
the discovery and communication of the scientific underpinnings for next-generation AI, ML, and HRI
technologies. Novel experimental design solutions will also help industry make more informed design
decisions to expedite the systems engineering cycle and equip practitioners with real-world discoveries
to influence academic efforts. Together, academia and industry can share experimental design procedures
and lessons learned that collectively benefit both groups and collaboratively improve robotics research
while remaining agnostic of intellectual property boundaries.
Experimental design is an adaptive, human-centric procedure [88] as well as a unique human-robot
interaction problem where the interaction between the human and autonomous robot(s) is in the form of an
experiment. A human experimenter executes a sequential decision making cycle to configure autonomous
systems, interprets empirical results after executing an experiment, and uses newly-obtained insights to
inform decisions in the next experiment. The human has control over the level of autonomy and structure
of the team, and the human – not the robot – is the learner with respect to the robot’s ability to perform
some domain-specific task(s). The selected experiment dictates the nature of information exchange, and
the human must intelligently choose valuable experiments in order to mitigate the high cost and risk
associated with conducting a test. The human plays an integral role in experimental design because they
excel at risk management, context-dependent reasoning, and the prioritization of objectives to ensure that
experimentation is consistent with the expected system capabilities within the scope of the specific robotic
application.
While the experimenter provides several indispensable skills, humans struggle with reasoning about
large volumes of data, estimating uncertainty from noisy observations, and making unbiased decisions to maximize the efficiency of exploration and exploitation [64]. These shortcomings are exacerbated when conducting experiments in the physical world due to the stressful and cognitively-burdensome
conditions of interacting with intricate systems. As a result, humans are prone to making errors and suboptimal decisions that waste time and resources, increase risk, decrease the value of selected experiments,
prolong the process of gaining system knowledge, and delay deployment.
To overcome these inherent deficiencies and accelerate the pace of experimentation, we seek to augment the human's decision making abilities. In other domains, decision support systems (DSSs) have for many years demonstrated measurable benefits [58, 212] and become productized. Simple checklists have also been shown to introduce structure into the decision making process and provide monumental benefits during taxing and time- and safety-critical operations, such as the prevention of injury
or death for millions of people in routine medical and construction procedures [73]. We take inspiration
from these well-documented successes and investigate how to improve a human’s decision making ability
in experimental design using checklist-style DSSs. As depicted in Figures 3.3-3.7, we envision DSSs that
provide appreciable support to a human and lead to the mitigation of suboptimal decisions or mistakes
by selecting more informative experiments. Ultimately, the support offered by the DSS and more fruitful
interactions between the human and robot will lead to increased explainability of system performance and
limitations.
In this pursuit of improving human decision making in the experimental design of autonomous systems, we pose three research questions:
1. Do experimenters make suboptimal decisions or mistakes during the experimental design process?
2. What form of decision support and DSS interaction could be beneficial for experimenters?
3. Do experimenters value experimental design assistance?
Contributions. Through these research questions we seek to better understand how well humans
make experimental design decisions and whether a DSS could provide assistance. We propose an information aiding DSS aimed specifically at providing decision support with minimal integration cost as a way
to synergize with rapidly-evolving robotics research. This is in stark contrast to suggestion-aiding DSSs,
which are typically more sophisticated and require large knowledge databases to make proactive recommendations; to the best of our knowledge, we propose the first Stage 1 DSS for experimental design of
autonomous robots [85]. We then seek to address our research questions by:
1. Investigating our proposed DSS for experimental design based on interactions between experimenters and two off-road autonomous navigation systems in a collection of experiments in realworld environments. To the best of our knowledge, this is the first effort toward realizing a DSS for
experimental design aimed at reducing suboptimal decisions or mistakes in experimental robotics.
2. Conducting an exploratory study to investigate human decision making and the potential of decision support in experimental design. Our results reveal that experimenters, including experienced
roboticists, make suboptimal decisions and mistakes, our DSS offers some promising value, and experimenters generally appreciate the decision support. This evidence suggests that experimental
design could benefit from decision support.
4.2 Stage 1 DSS Development
In this section, we develop a DSS for experimental design using interactions between experimenters and
two different off-road autonomous ground vehicles. Instead of requiring considerable amounts of pre-existing, domain-specific data to build system-specific knowledge databases and models, we take a simple, questionnaire-based approach inspired by checklists to remind the experimenter of important experimental design considerations with minimal effort or integration cost.
4.2.1 Empirical Data Collection
In our previous work, we described two experimental scenarios that explored the performance and limitations of the Army Research Laboratory Ground Autonomy Software Stack, which is an end-to-end software
architecture for autonomous navigation. We provide a brief overview of these experiments here and refer
the reader to [88] for more details. The experiments were conducted by having an experimenter choose the
hardware and software components and parameters of the robotic system, and then observe navigation
performance in waypoint missions. An experimenter with a working knowledge of the robotic system
but minimal a-priori knowledge of the environment adaptively chose and conducted 11 experiments in a
forested environment and an additional 11 experiments in a new forested environment with greater knowledge of both the system and environment. The latter of these forested missions, along with the research
platform, are shown in Figures 4.1a and 4.1b. The experimenter chose to change system parameters, such
as global and local path planning algorithms, inertial measurement unit sensors, mapping inflation radii,
and velocity limits. System performance was measured by the total duration of the mission, the number
of safety interventions initiated by a human operator, the number of collisions, and the experimenter’s
qualitative score of performance on a scale of 1 to 5 with respect to a human baseline.
(a) Forest mission
(b) Research Platform 1
Figure 4.1: One of the two missions conducted in the forest using Research Platform 1. Blue discs indicate
the ordered goal locations of the waypoint mission.
(a) Desert mission
(b) Research Platform 2
Figure 4.2: One of the two missions conducted in the desert using Research Platform 2. Blue discs indicate
the ordered goal locations of the waypoint mission.
In a separate study, an experimenter conducted a similar procedure of choosing system configurations
and explored the performance of Team CoSTAR’s NeBula architecture [6] using the same metrics in two
different desert missions. One of the two missions and the research platform for this study are depicted in
Figures 4.2a and 4.2b. In this case, the experimenter was considered an expert with the robotic system but
had no a-priori knowledge of the operational environment. They chose to conduct 10 and 12 experiments,
respectively, and explored the effect of changing system parameters, such as local planning algorithms,
cost function weights, replanning frequencies, traversability mapping thresholds, and velocity limits.
4.2.2 Support Formulation
Throughout all 44 experiments of both robotic systems we recorded the configuration changes that
were used as experiment inputs, the subsequent experiment observations, and the experimenters’ verbal
communications. We also recorded the experimenters’ dialogue pertaining to the experiment, including
thought processes, questions, concerns, and decisions, which were logged in natural form so that the decision making process was uninterrupted. This produced a large amount of unstructured, free response
data that is not amenable to formal, mathematical clustering algorithms. We manually clustered the experimenters’ aggregated conversations by topic and time of event, and then constructed corresponding questions that could have been asked to elicit the recorded comments from the experimenters. In terms of the
specific categorical topics used for clustering we found that much of the experimenters’ dialogue directly
or indirectly related to general themes, which were: predicted achievable performance; differences from
previous observations; quantitative performance thresholds; qualitative performance thresholds; problem
identification; proposed solutions; and risk.
We heavily weighted the common questions that experimenters repeatedly posed as a way to capture
the dominant aspects of the experimental design decision making process. This also improves the likelihood that our DSS facilitates valuable thought analysis for experimental design, given that experimenters
have demonstrated these types of questions can produce salient information. As a result, our proposed
DSS is a set of carefully crafted questions that are system-agnostic to increase applicability and are deployed by prompting responses from the experimenter at specific times in the decision making cycle. Our
questionnaire-based DSS does not serve as a data logging framework to record every experiment detail,
but rather serves in an advisory role to assist the experimenter with conceptually formulating experiments
through directed thought analysis in order to select the next experiment. All questions are intended to be
answered by the experimenter either verbally or in brief, written form, e.g., pen-and-paper or a web-based
tool. In either case, the DSS evokes explicit responses from the experimenter to ensure intentional thought
from the human, but encourages informal interactions to ensure that the DSS does not become distracting,
burdensome, or time consuming. Similar to checklists, our proposed DSS serves as a reminder for the experimenter to consider noteworthy aspects of experimental design that a human might otherwise forget
or overlook due to high system complexity and cognitive demands during experimentation.
We find that questions can be naturally clustered into two time-based categories: before the experiment is conducted and after an observation has been made. As a result, our DSS is designed to prompt
the experimenter with a pre-experiment questionnaire to better understand and facilitate experimental design decisions. Then, after the experimenter conducts the experiment and makes an observation, the
DSS prompts the experimenter with a post-experiment questionnaire to facilitate analysis, conclusions, and
preparation for the next experiment. Our questions for the pre-experiment questionnaire are:
1. Metrics: What are you measuring for evaluation?
2. Motivation: Why are you conducting this experiment?
3. Change: What specific changes are you making?
4. Hypothesis: What is your hypothesized outcome(s)?
5. Exploitation: What knowledge are you exploiting?
6. Analysis: What useful data will the experiment generate, and how will it be used for analysis?
7. Previous: What were the real-world or simulated results of relevant, previously-conducted experiments?
8. Exploration: What are you changing for the first time to explore their effects, if any?
9. Risk: On a scale of 1–10 (1 being risk-free and 10 being almost too risky to perform), how much
risk does your experiment have: a) to human safety; b) to robot safety; and c) due to anticipated
system limitations/failure, and why?
Our proposed post-experiment questionnaire is:
1. Outcome: What did you observe?
2. Impact: Did your changes produce the hypothesized outcome?
3. Deviation: What differences from your intended experimental plan occurred and why were they
necessary?
4. Root cause: Which changes or effects do you suspect caused sub-optimal performance or failure?
5. Objective: Do you consider the observed outcome(s) quantitatively satisfactory, and why?
6. Subjective: In what ways could the robotic system improve qualitatively?
7. Informative: What did you learn from this experiment?
8. Repeat: Is there any reason to conduct this experiment again?
9. Next steps: What questions are still unanswered and what new questions can be asked based on
these results?
Our pre- and post-experiment questionnaires are designed to provide an effective means for systematic and structured experimental design. This domain-agnostic, question-based format seeks to provide a
favorable cost-benefit ratio by avoiding the requirement of pre-existing data that more sophisticated DSSs
use to construct knowledge databases [1, 162]. A diagram showing the workflow of the Stage 1 DSS is
shown in Figure 4.3.
Figure 4.3: A diagram showing the deployment of the proposed Stage 1 DSS. The system consists of two
questionnaires, one which is answered by the human experimenter before conducting an experiment and
one to be answered after the experiment. The key themes of the questions are shown for brevity.
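A minimal, dependency-free sketch of deploying this questionnaire workflow as a command-line tool is shown below. The prompt strings condense the questions listed above, and the logging format is an assumption; the proposed DSS is deliberately agnostic to whether the interaction is pen-and-paper, web-based, or verbal.

```python
PRE_EXPERIMENT_QUESTIONS = [
    "Metrics: What are you measuring for evaluation?",
    "Motivation: Why are you conducting this experiment?",
    "Change: What specific changes are you making?",
    "Hypothesis: What is your hypothesized outcome(s)?",
    "Exploitation: What knowledge are you exploiting?",
    "Analysis: What useful data will the experiment generate, and how will it be used?",
    "Previous: What were the results of relevant, previously-conducted experiments?",
    "Exploration: What are you changing for the first time to explore its effects, if any?",
    "Risk: On a scale of 1-10, how risky is this experiment (human, robot, system), and why?",
]

POST_EXPERIMENT_QUESTIONS = [
    "Outcome: What did you observe?",
    "Impact: Did your changes produce the hypothesized outcome?",
    "Deviation: What differences from your intended plan occurred, and why were they necessary?",
    "Root cause: Which changes or effects do you suspect caused sub-optimal performance or failure?",
    "Objective: Do you consider the observed outcome(s) quantitatively satisfactory, and why?",
    "Subjective: In what ways could the robotic system improve qualitatively?",
    "Informative: What did you learn from this experiment?",
    "Repeat: Is there any reason to conduct this experiment again?",
    "Next steps: What questions remain unanswered, and what new questions can be asked?",
]


def administer(questions, experiment_id, phase):
    """Prompt the experimenter; an empty answer means the question was considered silently."""
    responses = {}
    for question in questions:
        responses[question] = input(f"[{phase} | experiment {experiment_id}] {question}\n> ")
    return responses


if __name__ == "__main__":
    pre = administer(PRE_EXPERIMENT_QUESTIONS, experiment_id=1, phase="pre-experiment")
    # ... the experimenter conducts the experiment and records an observation ...
    post = administer(POST_EXPERIMENT_QUESTIONS, experiment_id=1, phase="post-experiment")
```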
4.3 Exploratory User Study on Design Assistance
To explore the quality of decision making and potential room for improvement in experimental design, we
conducted a between-subjects, online user study. This section presents the setup, population, evaluation,
and results of our user study, which are then used in the Discussion section to answer our originally-posed
research questions.
4.3.1 Setup
In this study, the independent variable is the use of decision support provided by our DSS and the dependent
variables are the number of suboptimal and mistaken decisions. We desire as few instances of suboptimal
decisions and mistakes as possible and believe that some form of decision support may be able to help
reduce their occurrences.
Our user study was conducted using two versions of an anonymous, online form where participants
were asked to design experiments. Participants voluntarily completed the survey on their own computers at the time and location of their choosing, which helped mitigate interview bias. The participants’
experimental designs were qualitatively evaluated by an expert experimenter who has over 10 years of experience with the specific robotic system used in this study, including conducting real-world experiments
in the scenario that participants were shown [88].
After providing consent and basic demographic information (i.e., occupation and years of field robotic
experience), participants were given background information about the concept of experimental design,
the layout of the study, and their objective of designing experiments. Included in this background information was a picture of a real autonomous ground robot, a simulated instantiation, and a video of an
exemplary waypoint mission, as shown in Figures 4.1b and 4.4. The participant was informed that they would
construct experiments of the ground vehicle by defining configurations of hardware and software components as well as algorithmic parameters that they believe will result in the fastest autonomous navigation
in the given waypoint mission while experiencing the fewest collisions and human interventions.
As depicted in Figure 4.5, the participants were given a list of 12 hardware components, 16 software
components, and 9 algorithmic parameters to choose from when constructing experiments. A baseline
configuration was defined by the expert experimenter and is represented in Figure 4.5 by the hardware
and software components with check marks and the numerical values of the parameters. This was provided to the participants as a starting point for every experiment and represented the minimum set of
components and parameter definitions for a functional autonomous ground robot. Note that the available
options provided to the participant are only a subset of what an experimental system in robotics research
could include. A complete set of components and parameters for the real autonomous system was intentionally not provided in order to avoid unfairly overwhelming the participants and requiring substantial
training; however, the participants had a selection of 37 items to emulate some system complexity. To increase familiarity with the autonomous robot in this study, the participant was shown a video of simulated
autonomous navigation using the baseline configuration in an open meadow environment.
The participants were asked to design an experiment for an autonomous waypoint mission in a dense
forest. A photo of the robot in this environment, as seen in Figure 4.4c, along with written context of
the experimental conditions were provided to the participants before they were asked to design their
experiment. A participant designed an experiment by clicking on boxes to add or remove hardware and
software components and moving sliders to change the parameter values. All of the participant’s selections
for hardware, software, and parameters were recorded and evaluated offline.
We used a between-subjects design for this study to minimize the learning and transfer effects among
participants. Our population was randomly divided into two groups, and only one was provided with decision support before making their experimental design decision, while the other group was asked to make
decisions unassisted. The participants assigned to the group that did not receive decision support represent the current methodology of robotics research and are considered the control group. The participants
assigned to the group that proactively received decision support, referred to as the "assisted group", were
prompted with the fixed set of questions in our pre-experiment questionnaire, as described in the DSS
Development Section, and given a free-response text box where they could optionally type any form of response they chose. Alternatively, they were given the choice to answer the pre-experiment questionnaire
prompts silently. These participants were required to spend a minimum of two minutes on the survey page
with the pre-experiment questionnaire, but they could take as much time as they wanted.
(a) Background: simulated robot
(b) Background: example waypoint mission
(c) Scenario: dense forest
Figure 4.4: The participants were shown an image of a real robot (Figure 4.1b), the equivalent robot in
simulation (Figure 4.4a), and a video of a simulated waypoint mission (a representative screenshot is shown
in Figure 4.4b) as introductory material. For designing experiments, participants were shown a simulated
dense forest environment (Figure 4.4c) and given written context that included the robot’s size, number
of waypoints, total length of the mission, environmental characteristics (e.g., trees, logs, trails, bushes, tall
grass, ravines, and rolling hills), the season, and a weather forecast for the simulated conditions.
Figure 4.5: The participants constructed experiments by choosing from a list of 12 hardware components,
16 software components, and 9 algorithmic parameters. The baseline configuration is represented by the
checkmarked components and default parameter values.
Figure 4.6: The first five questions of the Stage 1 DSS questionnaire that were provided to the participants
in the assisted group. Participants were given the option to answer silently or by typing in the free response
text box.
After designing experiments, the participants completed an opinion-based exit survey consisting of
four questions. The participants were asked two questions to indicate how much they agreed with the
assertive statements: "The prompted questions were useful for designing experiments effectively." and "The time required to answer the questions was burdensome.", where the participant indicated their level
of agreement on a 5-point Likert scale (strongly disagree=0, disagree=1, neutral=2, agree=3, strongly
agree=4). They were also asked two free-response questions: "Please indicate any questions that you feel were unnecessary, or type 'None'" and "Please indicate any questions or topics that you feel should be added, or type 'None'".
Figure 4.7: The remaining four questions of the Stage 1 DSS questionnaire that were provided to the participants in the assisted group.
4.3.2 Participants
We identified active researchers for participation in our study using publicly available information from
robotics conferences, journals, and articles. Requests to participate in this online study were sent via email
to researchers and roboticists that have previously published or worked on autonomous robots, as indicated
by published materials. Participants were intentionally recruited and selected from the same population to
reduce selection bias and ensure sufficient relevant experience to represent decision making in the robotics
community. No compensation was offered to participants, and this study was approved by the USC Institutional Review Board. Participants that have experience with conducting field robotic experiments were
preferred, and the minimum qualification was familiarity with both the key components of functional mobile robots (i.e., localization, perception, planning, and control) and the
associated challenges of operating in the real world (i.e., experience beyond an academic course). This
requirement ultimately ensured that participants have an understanding of the components and how they
could fit together for a robotic system so that intentional decisions are made when constructing experiments. As a result, this study allowed basic researchers, engineers, practitioners, and experiment-support
staff to participate. After one person was removed from the study due to insufficient recent experience
with ground robots specifically, our study population consisted of N(1) = 12 people. Note, the notation
N(1) is used to denote the population size of this human subjects study – the first presented in this dissertation – because a second study is presented in Chapter 5 and there the population size is reported using
N(2). Each participant in the population was randomly assigned to one of two groups based on a coin flip,
such that the control group contained N(1)_c = 6 participants and the assisted group contained N(1)_a = 6
participants. Based on self-reported demographic information, the control group consisted of five people
with 1–5 years of field robotics experience and one person with 10–15 years of experience, while the assisted group consisted of two people with 1–5 years of experience, one person with 5–10 years, two people with 10–15 years, and one person with 15+ years of field robotics experience.
4.3.3 Evaluation
We categorize the participants’ responses as good, suboptimal, and bad experimental designs. A bad experimental design contains at least one mistake, which is defined as an objectively flawed decision; for
example, adding GPS localization but forgetting to add a corresponding GPS sensor would be a mistake
because the software component would not have the necessary data input and, therefore, the autonomous robot would not be able to function as desired. A suboptimal experimental design is defined as containing a
selection that significantly differs from the expert’s experimental design; for example, adding road detection in the dense forest setting is ill-advised because there are no roads in the environment, and, thus, this
decision would lead to wasted computational resources. Finally, a good experimental design is defined as
an experimental configuration that contains no mistakes or suboptimal decisions.
The expert experimenter defined their experimental design, and we compared the participants' responses against it to determine decision quality. For the dense forest, the expert chose to 1) add a camera and 2) add off-road capabilities in response to testing in the forest environment; they also advised against adding road
detection and following, increasing the obstacle safety radius beyond 100 cm, or increasing the roll and
pitch constraints beyond 50 degrees because of the expected terrain challenges presented by trees, logs,
and elevation change in the forest. For the purposes of evaluating experimental designs, the participants
were given credit for including off-road capabilities if they added any of the following: 1) visual inertial
odometry; 2) height mapping; 3) terrain classification; and/or 4) trail detection and following.
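The categorization applied by the expert can be written down as a small rule set, sketched below. The configuration keys are hypothetical, but the rules correspond to the criteria described above: a missing sensor for a selected software component is a mistake, while road detection in a roadless forest, an obstacle safety radius above 100 cm, or roll and pitch constraints above 50 degrees are suboptimal choices.

```python
def categorize_design(config: dict) -> str:
    """Classify an experimental design as 'bad', 'suboptimal', or 'good'.

    config is assumed to map component names to booleans and parameter names to numbers, e.g.,
    {"gps_localization": True, "gps_sensor": False, "camera": True, "road_detection": False,
     "off_road_capability": True, "obstacle_safety_radius_cm": 80, "max_roll_pitch_deg": 40}.
    """
    mistakes = []
    suboptimal = []

    # Mistakes: objectively flawed decisions, e.g., software without the sensor that feeds it.
    if config.get("gps_localization") and not config.get("gps_sensor"):
        mistakes.append("GPS localization selected without a corresponding GPS sensor")
    if config.get("off_road_capability") and not config.get("camera"):
        mistakes.append("perception-based off-road capability selected without a camera")

    # Suboptimal decisions: selections that significantly differ from the expert's design.
    if config.get("road_detection"):
        suboptimal.append("road detection in a roadless dense forest wastes computation")
    if config.get("obstacle_safety_radius_cm", 0) > 100:
        suboptimal.append("obstacle safety radius beyond 100 cm")
    if config.get("max_roll_pitch_deg", 0) > 50:
        suboptimal.append("roll and pitch constraints beyond 50 degrees")
    if not config.get("camera"):
        suboptimal.append("no camera added for the forest environment")

    if mistakes:
        return "bad"
    if suboptimal:
        return "suboptimal"
    return "good"
```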
4.3.4 Results
The participants’ responses and experimental design categorizations for the two groups are shown in
Figure 4.8. We see from these results that the assisted group performed qualitatively better by producing
three good, two suboptimal, and one bad experimental design, compared to one good, four suboptimal, and one bad in the control group.
In the opinion-based exit survey, the participants in the assisted group agreed the DSS questions were
useful (µ = 2.83, σ = 0.41) and disagreed with the assertion that the questions are burdensome (µ =
1.83, σ = 0.75), where µ and σ are the mean and standard deviation, respectively, and values 1, 2, and 3
correspond to disagree, neutral, and agree on our 5-point scale. After making their experimental designs,
the participants in the control group were also shown the pre-experiment questionnaire for reflection
and asked the same agree/disagree opinion-based questions. The participants in the control group agreed
that the DSS appears to be useful (µ = 3.00, σ = 0.63) and not burdensome (µ = 1.33, σ = 1.03).
By soliciting free-response feedback on extraneous and missing aspects of the DSS, we found only one
participant thought some of the prompting questions were extraneous (Pre-Experiment Questions 5 and
6) and no participants suggested a fundamentally new question; participants only suggested rewording or
clarifying of existing questions.
(a) Control group
(b) Assisted group
Figure 4.8: Survey results for the dense forest scenario. Check marks indicate the participant performed
the task indicated by the corresponding column title and ‘X’ marks indicate they did not. Red-shaded and
blue-shaded cells represent the corresponding mistakes and suboptimal decisions, respectively.
4.4 Discussion
Our exploratory study revealed several important aspects of human-in-the-loop experimental design. First,
we observed empirical evidence to suggest experimenters make suboptimal decisions and mistakes. We
observed 1 mistake and at least 2 suboptimal decisions in both groups. Importantly, this includes experienced field roboticists since all of our participants have at least 1 year of field robotics experience and are
actively conducting robotics research.
For our second research question, we proposed a DSS where the interaction is in the form of thought-provoking questions that the experimenter could type or answer silently. This questionnaire-based DSS
introduced zero integration costs to the robotic system and added no more than 5.2 minutes of additional
time to an experimental design decision for participants in the assisted group. We observe that the DSS,
as presented, is showing promise in reducing suboptimal decisions. All of the participants in the assisted
group added cameras, whereas one participant in the control group forgot a camera and mistakenly added
perception-based off-road capabilities. There was also one fewer instance of both adding unnecessary
software (i.e., road detection and following) and choosing risky parameter values (i.e., excessively large
pitch constraint) by the participants in the assisted group compared to the control group. Admittedly,
there is still room for improvement, including more targeted ways to reduce the number of mistakes.
This is especially paramount with regards to the interdependence of hardware and software components,
given that one participant in the assisted group made a mistake by adding a GPS localization software
component without the corresponding sensor. The results of our exit survey, shown in Figure 4.9, reveal
that the 12 participants in this study felt favorable in terms of the usefulness and lack of burden of the DSS
for the decision making process of experimental design. Interestingly, the participants in the control group
viewed the DSS slightly more favorably than the participants in the assisted group, which may be because
they can more easily imagine the utility of assisted thought after having made their decisions unassisted.
It’s also worth noting that participants preferred to type their responses to the DSS questions, at least
partially, rather than answering silently. None of the participants in the assisted group answered the entire
questionnaire silently and 4 participants typed their responses to at least 7 of the 9 prompting questions,
which participants reported as leading to “more thorough" responses and “thought-out" decisions. While
the questionnaire-based DSS is showing some promise with enabling more structured thought analysis
for the population in this study, passive-only decision support is inherently limited. That is, the DSS does
not actively use the information provided by the experimenter and the decision support offered does not
reflect any of the decisions being considered or previously made by the human. Proactive decision support
is investigated in Chapters 5-6 to assess the feasibility of positively affecting the experimenter’s decision
making.
(a) Usefulness viewed by the Control group (b) Usefulness viewed by the Assisted group
(c) Burden viewed by the Control group (d) Burden viewed by the Assisted group
Figure 4.9: Results from the exit survey of the human subjects study where participants were asked to
answer 5-point Likert scale questions to indicate whether they thought experiment planning using the
DSS questionnaire was useful and burdensome. Note, the control group (left column of graphs) were asked
to retrospectively consider the DSS questionnaire after designing their experiment whereas the assisted
group (right column) used the questionnaire.
4.5 Study Limitations
Our results help shed light on the quality of decision making in experimental design; however, there are
several limitations with our human subjects study that are relevant for analyzing our findings and can
influence future studies.
Simplified state space exploration: The participants designed their experiments by making selections
from a list of 37 items, which are a small subset of what would be required by a deployable autonomous
ground vehicle. This was to remove the requirement of having prior experience with a specific robotic
system and make the survey accessible to any roboticist with experience of a similar system. As the
number of components and parameters in deployable, real-world robotic systems increases, we suspect
experimenters will be more likely to make suboptimal decisions or mistakes during experimental design
if unassisted.
Idealized decision making conditions: The participants completed the online survey under ideal conditions that are not necessarily representative of real-world conditions, i.e., without the stress or fatigue
that is common in experimental design. Practitioners and experimenters are oftentimes required to make
decisions “in the wild", which can involve dirty or harsh environmental conditions, e.g., extreme heat or cold,
and this erodes their decision making quality. The longer experimenters conduct tests, the more physically
and cognitively burdened they become. Therefore, we expect mistakes and suboptimal decisions to be even
more likely than what is found in the study presented in this chapter when experimenters are conducting
tests under less forgiving conditions.
Lack of decision feedback: The participants in this study were asked to design an experiment for a
given scenario in order to investigate the thought and experiment construction processes, but they did not
observe the outcomes of their design decisions. Future studies could benefit from showing the participants
the robot’s execution of the selected experiment. The benefits of this could include opportunities to excite
multiple iterations of the decision making cycle, evaluate the post-experiment questionnaire, assess the
human’s analytical thinking skills, and explore whether experimenters make insufficient or flawed analysis
that influences their subsequent experimental design.
Limited sample size: Although the N^(1) = 12 participants originated from the same population and
possessed relevant robotics experience, a study with a larger sample size might reveal more fundamental
trends in decision making or characteristics of experimental design problems that could be of interest to
the scientific community.
4.6 Summary
In this chapter, we explore the human decision making process in the experimental design of autonomous
robots and what role decision support could play. From our exploratory, online user study, we discovered
examples of experienced field roboticists making suboptimal decisions and mistakes when designing experiments in relatively idealized conditions with simplified robotic systems. As a first step toward reducing
the frequency of poor decisions, we constructed a questionnaire-based DSS using empirical interactions
between experimenters and two off-road autonomous robotic systems. The questionnaire format was inspired by simple checklists due to their simplicity, minimal integration cost, and demonstrated benefit of preventing routine tasks from being overlooked or forgotten. With this in mind, we found that our proposed system shows promise in helping humans in some specific situations related to hardware,
software, and parameter selection, where the experimental design process can be overwhelming and the
likelihood of mistakes or suboptimal decision making is increased. Importantly, users in the human subjects study felt as though our DSS is useful and not burdensome, which bodes well for technology adoption
and use.
Chapter 5
Design Monitoring for Proactive Decision Support
5.1 The Need for Proactive Decision Support
As we discovered in the previous chapter, prompting experimenters with checklist-style questions introduces useful structure with minimal integration cost, but has limited effect due to the passive nature of
support. There is greater opportunity to improve the experimenter’s decision making ability and the quality of decision support by introducing proactive support. That is, if DSSs provide decision support before
a decision is made, the human could potentially alter which test they choose to conduct and, in doing
so, the human may effectively reduce or eliminate the selection of low value experiments that they might
otherwise accidentally or unknowingly conduct. Take, for example, the mistake made by Participant 3 in
the human subjects study in Chapter 4.3.4. If this participant were proactively notified that they forgot
to include the GPS sensor that is necessary for the GPS localization that they wished to use, they could
reconfigure their experiment to correct for this mistake and save the setup time for an objectively flawed
experiment. As a second example, consider Participant 5 from the study in Chapter 4.3.4. While their
experiment will yield some useful results because there are no mistakes in their experimental design, they
are not maximizing the potential information gain by excluding off-road capabilities. Given the context
that the experiments are conducted in the forest, it would behoove the experimenter to include
terrain classification, trail detection, and/or height map generation as a way to evaluate components that
will likely provide necessary capabilities for improved performance. To further illustrate these scenarios,
examples of experimental designs are shown in Figure 5.1. Note, the road detection and following software
component that leads to a suboptimal decision for an experiment conducted in the forest would be applicable in an urban setting and, conversely, the trail detection and following software component would be
a suboptimal inclusion for an urban testing environment.
(a) Mistake due to missing hardware for a software selection.
(b) Suboptimal decision due to inclusion of unnecessary software for a forest environment.
(c) Suboptimal decision due to inclusion of unnecessary software for an urban environment.
Figure 5.1: Examples of experimental designs illustrating a mistake and suboptimal decisions depending
on the selection of hardware and software components given certain context, namely the testing environment.
Toward providing experimenters with proactive decision support, we look to introduce intelligent
alerts based on related documented successes in literature of assistive technologies, such as anomaly detection enabling control correction during robot deployment [215], as well as our previous work in alert
generation in human-robot teams [110, 106, 105, 104, 108]. For example, my previous work presented a
threshold-based approach to generating alerts for remote operators of mobile manipulator systems so that
human supervisors can quickly assess the situation and respond accordingly [104]. The system diagram
from that work containing the alert generation module is shown in Figure 5.2. The common theme across
alert generation for the human supervising the mobile manipulator and humans constructing experiments
is the presentation of timely, actionable information that allows a human to act before an event occurs in
order to reduce risk or prevent unwanted outcomes, e.g., collision in a manipulation task or selection of a
low-value experiment. A fundamental difference, however, between these two applications is the decision
making process and available information. In my previous works an environmental model was built in
the observable configuration space and a motion planner was used to show the intended trajectory of the
mobile manipulator so that an operator could intervene on short-horizon timescales. Uncertainty in this
case was with respect to the motion of the robotic system. For experimental design, the decision making
process spans longer time horizons and the uncertainty is with respect to experimental inputs, outcomes,
and system interactions for an environment that is not necessarily entirely observable. Using a DSS to
facilitate the sequential decision making process is especially appealing to help mitigate these challenges,
and alert generation is an exciting addition to DSSs because it has the potential to greatly impact the experimenter’s selection in a flexible and unobtrusive fashion. This chapter describes one possible instantiation
of a Stage 2 DSS that uses an alert generation framework to provide proactive decision support.
Figure 5.2: A system architecture diagram of a mobile manipulator system with an alert generation module from my previous work [104]. The alert generation module accepts environmental models, plans, and
alert condition settings to produce plan and risk visualizations for a human supervisor before robot commands are sent to the controller. This architecture represents one paradigm for alert generation in robot
deployment, which provides some useful insights but is characteristically different than alert generation
for experimental design.
5.2 Stage 2 DSS Development
Much like a blind spot monitoring system in an on-road vehicle, alerts can provide a small amount of
supporting information that requires no additional input from the human but can directly supplement or
augment decision making in a meaningful way, e.g., preventing an accident while driving. Alert generation
should ideally maximize the benefit added to the experimenter while minimizing the overhead and cost
for both integration and deployment. Towards this goal, our guiding principles for alert generation are
that alerts in decision making for experimental design must be timely, easily interpretable, relevant, and
judicious. Alerts must be provided in a timely fashion with respect to the decision cycle so that the human
has sufficient time to act upon the information or event the alert pertains to. Alerts must also be provided
in a fashion that the human intuitively and quickly understands so that the human’s subsequent decisions
and actions respond appropriately. In order for alerts to offer value, they must prudently provide information or assistance that is related to the human’s decision; otherwise, alerts could be viewed as irrelevant
or as a nuisance that the human would prefer not to use. In the case of the blind spot monitoring example,
alerts must be provided to the driver before an accident occurs, in a form the driver comprehends in under
a second, and only issued for other vehicles in the immediate vicinity that truly pose a threat of collision.
In the case of sequential decision making for experimental design of autonomous ground robots, alerts
must be provided before the subsequent experiment is selected, in a form the experimenter understands so that they can make corresponding adjustments to the experiment configuration, and must pertain to desirable experiments the experimenter is interested in conducting.
From the results presented in Chapter 4.3.4 we see that there is room for improvement in terms of
preventing mistakes and suboptimal decisions. We will use the examples and lessons learned from that
human subjects study to guide our development of alert generation in this chapter. First, we notice that
mistakes oftentimes take the form of architectural design issues where experimenters do not select the
necessary input component(s), typically sensors, for other desired components. This suggests that the
structure of the system plays a critical role in designing feasible experiments, especially in the case of
exploring new hardware-software pairs. Next, we notice that five out of six of the suboptimal decisions
were because participants either excluded off-road capabilities or included road detection capabilities – the
former being desirable and the latter being unnecessary, specifically in the forest environment. From this
we conclude that suboptimal decisions can be related to the context of the operational environment and
intended application, and capturing this context within our alert generation framework will be paramount.
5.2.1 Graph Neural Networks
We propose using Graph Neural Networks (GNN) as the mechanism to generate alerts in a DSS for experimental design of autonomous ground robots. As their name implies, GNNs are a neural network-based
approach to process data represented by graphs. Following their introduction by [80], GNNs have attracted growing interest in the literature and demonstrated noteworthy successes across a wide range of domains, such as social networks, bioinformatics, drug discovery, and recommendation systems, among others [253]. GNNs are an exciting avenue for DSSs because experimental design oftentimes includes relational or structural
information, which naturally lends itself to a graph representation. In this subchapter, a brief background
on GNNs is provided to facilitate the description of using GNNs for alert generation in DSSs that will be
described in the next subchapter. For a more detailed explanation of GNNs, the reader is referred to [254].
The mathematical notation used here is borrowed from [254] to provide the reader with a more comprehensible explanation. A graph G = (V, E) contains a set of nodes, V, and a set of edges, E. The corresponding adjacency matrix A ∈ R^{n×n} is defined as A_ij = 1 if e_ij = (v_i, v_j) ∈ E and A_ij = 0 if e_ij ∉ E. A graph may have node attributes X ∈ R^{n×d} or edge attributes X^e ∈ R^{m×c}. In the case of node attributes, node v has an associated feature vector x_v ∈ R^d, and in the case of edge attributes, there is a feature vector x^e_{v,u} ∈ R^c for edge (v, u). Generalizing from the convolution operation on grids, GNNs compute analogous convolutions over node neighborhoods, where the neighborhood of a node v ∈ V is defined as N(v) = {u ∈ V | (v, u) ∈ E}. We then define node v's hidden feature vector as h_v ∈ R^b and the node hidden feature matrix H ∈ R^{n×b}.
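To ground this notation, the following is a minimal sketch, assuming PyTorch Geometric is available, of how such a graph can be held in code; the nodes, edges, and neighborhood helper below are illustrative examples rather than anything defined in this dissertation.

# Minimal sketch (illustrative only): node features X, undirected edges in COO form,
# the dense adjacency matrix A, and a neighborhood lookup N(v).
import torch
from torch_geometric.data import Data
from torch_geometric.utils import to_dense_adj

# Four hypothetical nodes with scalar features (d = 1)
x = torch.tensor([[0.], [1.], [2.], [5.]])                 # X in R^{n x d}
# Undirected edges (0,1), (1,2), (1,3), stored in both directions
edge_index = torch.tensor([[0, 1, 1, 2, 1, 3],
                           [1, 0, 2, 1, 3, 1]], dtype=torch.long)

graph = Data(x=x, edge_index=edge_index)                   # G = (V, E) with node attributes
A = to_dense_adj(edge_index)[0]                            # adjacency matrix A in R^{n x n}

def neighborhood(v: int) -> set:
    # N(v) = {u in V | (v, u) in E}, read directly from the adjacency matrix
    return {u for u in range(A.size(0)) if A[v, u] == 1}

print(neighborhood(1))  # {0, 2, 3}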
Intuitively, graphs are an effective means of representing objects (or concepts) and relationships using
nodes and edges, respectively. An object is defined by features and related objects, which are captured
in GNNs by node feature vectors and edges to adjacent nodes. A GNN defines and computes a state
for each node and then iteratively updates the state using the states of neighboring nodes. The literature has proposed different models to propagate this information throughout the graph, e.g., layering [153]
and recursive [201], but the key idea is that structured information in the form of node representations
and neighborhoods are used in the data processing step to reason over the broader graph structure, as
illustrated in Figure 5.3. GNNs are powerful tools in the ML literature because they can operate on a
general class of graphs, including cyclic, directed, dynamic, or heterogeneous graphs, as well as in non-Euclidean and Euclidean spaces. Following the taxonomy presented in [254] and the similar review in [265],
there are generally four categories of GNNs: recurrent GNNs, convolutional GNNs, Graph Autoencoders,
and Spatial-Temporal GNNs. Recurrent GNNs use recurrent neural architectures as the basis for learning node representations and nodes exchange information with neighbors until an equilibrium is reached;
whereas, convolutional GNNs generalize the convolutional operator and aggregate node and neighbor
features with multi-layer neural networks. Graph Autoencoder-based GNNs use an encoder-decoder architecture to encode nodes or entire graphs into a latent feature space and then reconstruct, or decode,
graph data. Spatial-Temporal GNNs operate specifically on spatial-temporal graphs and typically combine
graph convolutions with recurrent or convolutional neural networks to simultaneously learn over spatial
and temporal dependencies. Generally speaking, GNNs perform node-, edge-, and graph-level tasks, including classification and regression; and depending on the task and availability of labeled data, GNNs are
trained in either supervised, semi-supervised, or unsupervised settings.
(a) 2D convolution (b) Graph convolution
Figure 5.3: An example borrowed from [254] illustrating (a) 2-D convolution where neighbors of the red
node are ordered and have a fixed size; and (b) graph convolution where neighbors are unordered and
variable in size.
5.2.2 Alert Generation
GNNs are an attractive representation for encoding experimental design aspects because robotic system
architectures and information flow can naturally be captured in a graph. Importantly, we seek to build
GNNs that account for the context of the experimental design with respect to the system and environment,
and whose outputs are actionable decision support for the experimenter. Our goal is to build an alert-generation framework for a Stage 2 DSS by training a GNN to perform graph classification. This GNN
will provide the ability to infer the decision quality of the experimenter’s proposed experimental design
before the experiment is conducted and provide an opportunity for the experimenter to revise their design,
if needed, by issuing an alert with respect to the graph classification. The trained GNN will output a
label corresponding to “good", “suboptimal", and “bad" classifications, similar to the expert experimenter’s
evaluation in the user study presented in Chapter 4.3.3, except in this case there is no human oversight
in the assessment. The anticipated value offered by this GNN-based alert generation framework is the
introduction of proactive, context-dependent decision support that the Stage 1 DSS did not possess.
We begin by describing the construction of the graph representation, which encapsulates both structural and non-structural information for a single experimental design. For the purposes of this work, we
will start with the most basic graph formulation to convey the key concepts and then explain how graph
representations could mature and add new functionality. That is, the initial formulation uses undirected,
homogeneous graphs that are static for each iteration of experimental design, and future work could include heterogeneous graphs, as discussed in Chapter 5.6.
Let each node in the set, V , of graph G be a component or concept, where components represent the
structure defined by the system architecture and concepts encode contextual, non-structural information.
For example, there are nodes for hardware and software components, representing things such as a LiDAR
sensor or a SLAM algorithm, as well as nodes for concepts, such as the environment that experiments will
be conducted in. To manage computational complexity, we define single nodes for entire algorithms when
representing software components, rather than the literal implementation where an algorithm could have
any number of nodes. Continuing with the notation from the previous subchapter, every node has an
associated feature vector x_v ∈ R^d, where d = 1 such that a scalar is assigned to each node to indicate its feature. The scalar encoding is shown in Figure 5.4, where the “hardware", “perception", “estimation", and “planning and control" nodes are structural and the “mission type" and “environment" nodes are context-dependent, non-structural nodes.
Figure 5.4: One possible representation of an experimental design for a waypoint mission in a forest with
the ARL Ground Autonomy Software Stack using a Graph Neural Network.
Let the set of edges, E, in the graph indicate meaningful relationships between nodes, such as input-output or influential relationships. We assume initially that these edges are undirected and have no feature
attributes. Future graph formulations will make use of directed edges to explicitly encode the direction of
dependency and data flow, and could introduce per-edge attributes to encode more complex interactions.
In the case of connected structural nodes, edges indicate that one component produces some data that
is the input for another component, e.g., point clouds from a LiDAR sensor are provided to the SLAM
component for localization and mapping. In the case of non-structural nodes, edges indicate that some
information or relationship has direct impact on the connected node, e.g., a node representing a forest
environment will influence the performance of the SLAM component due to vegetation, natural obstacles,
canopies, and occlusions. These non-structural nodes are critical in the construction of the graph because
they provide a novel way to encode context beyond what is captured in the structural nodes of the system
architecture. Note, these graphs are considered to be small, i.e., |V | + |E| < 1000, such that the adjacency
matrices can be stored and processed on a modern computer.
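As an illustration of this construction, the short sketch below, assuming PyTorch Geometric, builds one experimental-design graph with scalar node features for structural and non-structural nodes, undirected edges for input-output and influence relationships, and a graph-level decision-quality label; the node names, feature values, and edge list are hypothetical stand-ins rather than the actual encoding from Figure 5.4.

import torch
from torch_geometric.data import Data

# Hypothetical scalar encoding per node; the actual mapping used in this work is in Figure 5.4.
NODE_FEATURES = {
    "lidar": 0.0, "camera": 0.0,                      # hardware
    "slam": 1.0, "trail_detection": 1.0,              # perception / estimation software
    "planner": 3.0,                                   # planning and control
    "forest_env": 5.0, "waypoint_mission": 6.0,       # non-structural context nodes
}
nodes = list(NODE_FEATURES)
index = {name: i for i, name in enumerate(nodes)}

# Undirected input-output or influence relationships, mirrored into both directions below
relations = [("lidar", "slam"), ("camera", "trail_detection"),
             ("slam", "planner"), ("trail_detection", "planner"),
             ("forest_env", "trail_detection"), ("waypoint_mission", "planner")]
edge_index = torch.tensor(
    [[index[a] for a, b in relations] + [index[b] for a, b in relations],
     [index[b] for a, b in relations] + [index[a] for a, b in relations]],
    dtype=torch.long)

x = torch.tensor([[NODE_FEATURES[n]] for n in nodes])     # one scalar feature per node (d = 1)
y = torch.tensor([3])                                     # graph-level label, e.g., 3 = "good"
design_graph = Data(x=x, edge_index=edge_index, y=y)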
Graph classification is achieved by training the GNN using supervised learning. A dataset D is a collection of graphs G_i, i = 1, …, |D|, where each graph is assigned a label L(G_i) that indicates the decision quality of the experimental design. For this work, alert triggering conditions are encoded using the expert experimenter's evaluation from the study in the previous chapter. This is chosen to illustrate how an expert might codify their assessment to imbue GNN-based alert generation with similar experimental design critique capabilities. The set of labels is {0, 1, 2, 3}, corresponding to “mistake-hardware", “suboptimal-missing", “suboptimal-unnecessary", and “good", respectively. Label 0 for “mistake-hardware" specifically means that a component lacks the necessary hardware input. Labels 1 and 2 for “suboptimal-missing" and “suboptimal-unnecessary" are given to graphs that are missing software components that would be useful in the experiment-specific environment or that include components that are extraneous for the experiment-specific environment, respectively. A GNN is constructed using the “GraphConv" neural network operator offered in PyTorch Geometric∗. The GNN comprises three convolutional layers with rectified linear units (ReLU). A readout layer using global mean pooling computes batch-wise graph-level outputs that average node features and build a global, compact representation of the graph structure. Finally, the output layer is implemented using dropout (p = 0.5) followed by a linear transformation to classify outputs; the dropout effectively provides regularization and helps prevent overfitting. The resulting trained GNN accepts graphs as inputs and predicts the decision quality as output. As shown in Figure 5.5, an alert is issued if the inferred decision quality is either a mistake or a suboptimal decision; otherwise, the experiment is conducted as the human designed.
Figure 5.5: A combined diagram showing the GNN structure and the subsequent use of inferred decision
quality. If a mistake or suboptimal decision is inferred by the model then an alert is issued.
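A minimal sketch of this architecture, assuming PyTorch Geometric, is given below; the hidden dimension and class ordering are illustrative choices, while the layer structure (three GraphConv layers with ReLU, global mean pooling, dropout with p = 0.5, and a linear classifier) follows the description above.

import torch
import torch.nn.functional as F
from torch_geometric.nn import GraphConv, global_mean_pool

class DesignQualityGNN(torch.nn.Module):
    def __init__(self, num_node_features: int = 1, hidden: int = 64, num_classes: int = 4):
        super().__init__()
        self.conv1 = GraphConv(num_node_features, hidden)
        self.conv2 = GraphConv(hidden, hidden)
        self.conv3 = GraphConv(hidden, hidden)
        self.classifier = torch.nn.Linear(hidden, num_classes)

    def forward(self, x, edge_index, batch):
        # Message passing over node neighborhoods
        x = F.relu(self.conv1(x, edge_index))
        x = F.relu(self.conv2(x, edge_index))
        x = F.relu(self.conv3(x, edge_index))
        # Readout: average node embeddings into one vector per graph
        x = global_mean_pool(x, batch)
        # Regularization, then classification over {mistake-hardware, suboptimal-missing,
        # suboptimal-unnecessary, good}
        x = F.dropout(x, p=0.5, training=self.training)
        return self.classifier(x)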
To train the GNN to perform graph classification, we must build the dataset D to include labeled examples of experimental designs. We achieve this by translating all 12 of the experimental designs from
∗ https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.nn.conv.GraphConv.html
the study in the previous chapter (6 from the control group and 6 from the assisted group) to graphs in
the form of adjacency matrices. Each graph is then assigned an appropriate label based on the decision
quality. We further expand the size of the dataset by using synthetic data. Depending on the level of expertise a system designer or experimenter has with a specific system in an environment, they can construct
and label graphs for hypothetical experimental designs. Examples of experimental designs containing
mistakes are typically easy and cheap to construct because these oftentimes correspond to architectural
design flaws. One could imagine automating this process by starting with a “good" experimental design
and then arbitrarily removing edges between structural nodes to break input-output connections of hardware and software nodes. More interesting, and correspondingly more challenging, is the process of constructing
suboptimal experimental designs. In some instances, suboptimal experimental designs have intuitive characteristics that are well understood by humans, e.g., road detection is categorically unnecessary in a dense
forest environment, which allows it to be constructed and added to the dataset easily. Sometimes suboptimal experimental designs are more nebulous, e.g., which nodes should be connected to non-structural
nodes with “environment" features, and so it will likely require experiential insights and lessons learned
by conducting experiments. In any case, this process provides a way to encode expert knowledge similar
to that of an expert system, and the dataset can continuously be updated as any experimenter, novice or
expert, conducts more experiments.
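As a hedged illustration of the automated approach mentioned above, the sketch below fabricates a “mistake-hardware" example by deleting one hardware input edge from a known-good design graph; the helper name, label constant, and edge-list argument are hypothetical.

import copy
import random
import torch
from torch_geometric.data import Data

MISTAKE_HARDWARE = 0  # label 0 in the encoding described earlier

def make_mistake_example(good_graph: Data, hardware_edges) -> Data:
    # Remove one hardware-to-software edge (in both directions) to break an input-output connection
    broken = copy.deepcopy(good_graph)
    u, v = random.choice(hardware_edges)
    src, dst = broken.edge_index
    keep = ~(((src == u) & (dst == v)) | ((src == v) & (dst == u)))
    broken.edge_index = broken.edge_index[:, keep]
    broken.y = torch.tensor([MISTAKE_HARDWARE])
    return broken

For example, applied to the design_graph from the earlier sketch with hardware_edges such as [(0, 2), (1, 3)], the returned graph would represent a design whose SLAM or trail detection software lacks its sensor input.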
To train the GNN for the alert-generation framework in our Stage 2 DSS we constructed a dataset
of 26 graphs. In addition to the 12 graphs from the previous study, 14 synthetic graphs were generated,
where 5 represented experimental designs with mistakes, 5 were designed with missing components useful
for the forest environment, and 5 were designed to include the unnecessary road detection software. The
synthetic data was specifically crafted to assist with class balance across the training dataset so that in total
there were 7 examples each of “mistake-hardware", “suboptimal-missing", and “suboptimal-unnecessary"
labels, and 5 examples of “good" experimental designs. Representative examples of graphs in this dataset
are shown in Figures 5.6 - 5.10.
Figure 5.6: A representative example of a graph representation included in the training dataset to build
the GNN-based alert generation framework. This experimental design corresponds to Participant 2 in the
control group of Study 1 and contains a mistake because the participant forgot to include the necessary
camera hardware input after they added trail detection software.
Figure 5.7: A representative example of a graph representation included in the training dataset to build
the GNN-based alert generation framework. This experimental design corresponds to Participant 5 in the
assisted group of Study 1 and contains a suboptimal decision by not including software components that
would be useful in the forest environment, such as trail detection, terrain classification, or height mapping.
Figure 5.8: A representative example of a graph representation included in the training dataset to build
the GNN-based alert generation framework. This experimental design corresponds to Participant 5 in the
control group of Study 1 and contains a suboptimal decision by including road detection software, which is
deemed unnecessary in the forest environment.
Figure 5.9: A representative example of a graph representation included in the training dataset to build
the GNN-based alert generation framework. This experimental design corresponds to Participant 6 in the
assisted group of Study 1 and contains no mistakes or suboptimal decisions, so it is labeled as “good".
Figure 5.10: A representative example of a graph representation included in the training dataset to build
the GNN-based alert generation framework. This experimental design is artificially created by an expert
experimenter and represents how synthetic data can be incorporated into the dataset using existing knowledge of the system.
Using the dataset of 26 graphs, the GNN was trained using 5-fold cross validation, where 21 graphs were used for training and 5 graphs were used for validation across 5 rolling iterations of the dataset. Training in each fold was performed over 50 epochs using the Adam optimizer with a learning rate of 0.001 and a batch size of 4. Cross entropy was selected as the loss function because it measures the classification performance of the model as a probability and facilitates interpretation for issuing alerts. As shown in Table 5.1, the GNN achieved an average training accuracy of 0.96 and an average validation accuracy of 0.86.
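A sketch of this training procedure, assuming PyTorch Geometric and scikit-learn, is shown below; the dataset is assumed to be a list of labeled graph objects like those above, and DesignQualityGNN refers to the earlier illustrative model rather than the exact implementation used in this work.

import torch
from sklearn.model_selection import KFold
from torch_geometric.loader import DataLoader

def cross_validate(dataset, epochs: int = 50, lr: float = 1e-3, batch_size: int = 4):
    fold_accuracy = []
    for train_idx, val_idx in KFold(n_splits=5, shuffle=True).split(dataset):
        model = DesignQualityGNN()
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        criterion = torch.nn.CrossEntropyLoss()
        train_loader = DataLoader([dataset[i] for i in train_idx], batch_size=batch_size, shuffle=True)
        val_loader = DataLoader([dataset[i] for i in val_idx], batch_size=batch_size)

        for _ in range(epochs):
            model.train()
            for batch in train_loader:
                optimizer.zero_grad()
                out = model(batch.x, batch.edge_index, batch.batch)
                loss = criterion(out, batch.y)
                loss.backward()
                optimizer.step()

        # Validation accuracy for this fold
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for batch in val_loader:
                pred = model(batch.x, batch.edge_index, batch.batch).argmax(dim=1)
                correct += int((pred == batch.y).sum())
                total += batch.y.size(0)
        fold_accuracy.append(correct / total)
    return fold_accuracy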
With a trained GNN, our Stage 2 DSS can accept candidate experimental designs and infer with some
probability the associated decision quality. In a real-world deployment of the GNN-equipped DSS, the
Table 5.1: Training and validation results from 5-fold cross validation.
Accuracy Fold 1 Fold 2 Fold 3 Fold 4 Fold 5 Average Std Dev.
Training 0.97 0.95 0.95 0.99 0.95 0.96 0.02
Validation 0.80 0.79 0.98 0.80 0.94 0.86 0.09
GNN would predict if an experimental design has either a mistake or suboptimal decision and then issue
a corresponding alert for the experimenter to consider. In this work we are interested in both the feasibility of training GNNs as well as the impact of alert generation on decision making. Therefore, we issue
alerts for all participants that trigger the conditions corresponding to the GNN labels (“mistake-hardware",
“suboptimal-missing", and “suboptimal-unnecessary") and predefine the text of each alert so there is a one-to-one mapping between the GNN inference and a specific alert, as shown in Figure 5.12. It's important to note that while a traditional model checker in the systems engineering literature could certainly achieve the same functionality for proactively detecting mistakes in architectural configurations, the GNN-based alert generation framework offers additional functionality not offered by model checking, namely reasoning about suboptimal decisions in the context of a specific experimental environment.
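The sketch below illustrates such a one-to-one mapping from an inferred label to a predefined alert; the alert wording here is paraphrased and hypothetical, as the exact text shown to participants appears in Figure 5.12.

from typing import Optional

# Hypothetical, paraphrased alert text keyed by the predicted label
ALERTS = {
    0: "Mistake: a selected software component is missing its required hardware input.",
    1: "Suboptimal: no off-road capabilities were selected for the forest environment.",
    2: "Suboptimal: the design includes software that is unnecessary for the forest environment.",
}

def maybe_alert(predicted_label: int) -> Optional[str]:
    # Label 3 ("good") has no entry, so no alert is issued
    return ALERTS.get(predicted_label)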
5.3 Exploratory User Study on Design Monitoring
In this chapter, we describe a second exploratory human subjects study that was conducted to evaluate
the efficacy and impact of the proposed Stage 2 DSS. Here, the proposed Stage 2 DSS adds an alert generation framework to assist participants beyond the Stage 1, questionnaire-based DSS that was studied in
Chapter 4.3.
5.3.1 Setup
The setup of the human subjects study in this chapter is similar to the study described in Chapter 4.3.1.
Again, participants completed an online survey to construct experiments in a forested environment and
their experimental designs were evaluated by an expert experimenter to determine a quality rating of good,
suboptimal, or mistake. Participants provided demographic information pertaining to their occupation
and years of experience with field robotic systems, and participants with less than one year of experience were
excluded. The participants in this study also received the same background information, including images
of the real autonomous ground robot in Figure 4.1b, the simulated version of the robot, and a video of an
exemplary waypoint mission, shown in Figure 4.4. The participants also received the same set of hardware
components, software components, algorithmic parameters, baseline configuration, and mechanism to
construct experiments, i.e., selecting and de-selecting boxes to include or exclude components as well as
sliders to change parameter values.
However, there are a few noteworthy differences between this study and the one presented in Chapter 4.3.1. First, participants were provided an example of the baseline configuration running in the testing
environment as a form of an initial experiment. This was provided to the participants in the form of an
exemplary video of the simulated robot in the forest and a table of quantitative results, shown in Figure 5.11. The video showed the ground robot successfully completing the waypoint mission, but veering
off the trail throughout the mission. As reported in the empirical results table of the survey, the robot
experienced four collisions with trees and logs and required five interventions from the simulated human
operator. This material gave the participants an opportunity to make context-dependent qualitative and
quantitative assessments before constructing their experimental design so that they were more familiar
with the autonomy under test and environmental challenges specific to the forest. This also resembles
real-world experimentation strategies, where an experimenter oftentimes uses some known baseline configuration as a starting point and selects subsequent experiments based on initial findings.
A second important difference in this study was the introduction of the Stage 2 DSS. In addition to
providing the assisted group with the Stage 1 questionnaire before the participants constructed their experiment, we also evaluated their experimental design and provided corresponding alerts before the experiment was finalized and virtually conducted. Although the trained GNN in Chapter 5.2.2 achieved training
Figure 5.11: A screenshot from the video of the simulated robot using the baseline configuration in the
forest environment, which was provided to the participants as an initial experiment.
and validation accuracy of 0.96 and 0.86, respectively, the primary goal of this user study was to evaluate
the impact of alerts on decision making. As a result, experimental designs were manually evaluated so
that all instances of mistaken and suboptimal decisions were detected. Alerts that could feasibly be generated by the trained GNN were then provided corresponding to the classifications “mistake-hardware",
“suboptimal-missing", “suboptimal-unnecessary", and “good". In the case of a mistake, corresponding to
an experimental design where a software component does not have the necessary sensory input, the alert
shown in Figure 5.12a was issued. If an experimental design was designated the label “suboptimal-missing",
then the experimenter did not include off-road capabilities in their experimental design and the alert shown
in Figure 5.12b was issued. Finally, in the case of a “suboptimal-unnecessary" inference, the DSS provided
the alert shown in Figure 5.12c to indicate that there are extraneous software components in the participant's
design. If an alert was generated, the participants were given the option to either revise their experimental
design before making their final selection, or ignore the alert and proceed with their original experimental
design.
A third distinction with this study is the introduction of feedback regarding the participants’ experimental design, which directly addresses the lack of decision feedback limitation of the first study identified
in Chapter 4.5. This feedback is important because it provides more opportunity to understand what participants are considering during experimental design, to inform future DSS development, as well as areas
where decision support may be useful. Specifically in this study, the participants were shown one of three
videos depending on the quality of their experimental design. If an experimental design contained a mistake then the participant was shown a video where the robot doesn’t move from the starting location and
they were notified that the robot did not complete the waypoint, which simulates an ill-configured robotic
system. A screenshot of the video and empirical results table is presented in Figure 5.13. If an experimental
design was suboptimal, by excluding software components specifically useful for the forest environment or by including unnecessary components such as road detection, then the participant was shown
a video where the robot performed slightly better than what they saw in the baseline configuration video,
as seen in Figure 5.14a. The robot remained on the trail somewhat more frequently leading to a slight
reduction in interventions and collisions to four and two, respectively. Finally, if a participant created a
good experimental design, void of mistakes and suboptimal decisions, they were provided the third video,
which corresponds to the best empirical results. As shown in Figure 5.14b, the simulated robot achieved a
significant reduction in mission duration, interventions, and collisions as a result of remaining on the trail
and operating at higher speeds over the course of the mission.
(a) Alert for a mistake due to exclusion of system-critical components
(b) Alert for a suboptimal decision related to exclusion of forest-relevant components
(c) Alert for a suboptimal decision related to inclusion of forest-irrelevant components
Figure 5.12: Alerts provided to the participants whose experimental designs included either mistakes or
suboptimal design decisions in the context of testing autonomous ground robots in the forest environment.
Figure 5.13: A screenshot from the video of the simulated robot after the participant designed an experiment with a mistake. The top-left shows a third-person view of the simulated robot in the forest, the
top-right shows visualization of the robot’s mapping and planning during the autonomous waypoint navigation, and the bottom table provides quantitative results. In this case, the mistake prevented the robot
from completing the mission.
(a) Minor improvement over baseline due to suboptimal experimental design
(b) Significant improvement over baseline due to a good experimental design
Figure 5.14: Screenshots from the two possible videos that participants received as feedback if their experimental designs did not contain a mistake.
After participants watched the video and reviewed the empirical results, they were asked to conduct
post-experiment analysis by responding to the free-response questions pertaining to what positive and
negative outcomes they observed. As shown in the survey screenshot in Figure 5.15, the participants
described their qualitative assessment based on what they saw in the video as well as a quantitative assessment based on the empirical results of their experimental design with respect to the baseline configuration. All three videos and empirical results tables intentionally included changes in navigation quality
and metrics from that of the baseline configuration so all participants were guaranteed to see noteworthy
differences corresponding to their experimental design decisions.
Figure 5.15: Participants were asked to conduct post-experiment analysis by stating their observations
related to positive and negative outcomes after receiving feedback from their experimental designs.
As part of the post-experimental analysis, the online study also included questions, as presented in
Figure 5.16, about follow-on actions with respect to whether the participant felt another experiment was
warranted. By design, even in the case of the good experimental design, the participants were shown
empirical results with non-zero interventions and collisions to create an opportunity where the participant had to decide whether a subsequent experiment would be valuable. If the participant’s first, good
experimental design led to the globally optimal results, which rarely occurs in field robotic experimentation [88], then they would likely not consider conducting another experiment. The participants had to
weigh the tradeoff between constructing another experiment versus the potential information gain and
performance improvement. If the participants expressed desire to conduct another experiment, they were
asked to describe in words what changes they would make for their next experiment. A free-response
question was used here to provide the participants with the maximum amount of flexibility in formulating
another experiment in case they knew what improvements they'd like to precipitate but were not sure how to create the necessary changes using the provided options. Collectively, the addition of this set of post-experiment questions to the online study effectively forms one complete cycle of the adaptive sequential
decision making process where participants construct an experimental design, simulate the conducting of
an experiment, make observations, analyze results, create new hypotheses, and consider another iteration
for a subsequent experiment.
Finally, participants conducted an exit survey as a way to capture their opinions about the Stage 2 DSS.
The exit survey contained all of the questions that were used in the study described in Chapter 4 as well
as one additional Likert-scale question (“The alerts were useful. If you did not receive any alerts, please
select N/A"), one integer-response question (“How many alerts did you receive during your experimental
design") to confirm understanding, and one free-response question (“Please design any alerts or additional
assistance you wish you had during experimental design") for suggestions on future design support.
5.3.2 Participants
For this study, researchers from the academic community were invited to participate via email. As with
the study in Chapter 4, and using the same approval from the USC Institutional Review Board, candidates
Figure 5.16: Participants were asked whether they would conduct another experiment and what changes
they would make as a way to emulate the tradeoff analysis considerations that are required of experimenters of real-world systems.
with relevant experience in experimental ground robotics were identified using publicly available information from robotics conferences, journals, and articles, e.g., IEEE International Conference on Robotics
and Automation, IEEE International Conference on Intelligent Robots and Systems, IEEE Robotics and Automation Letters, the International Journal of Robotics Research, and the Journal of Field Robotics. The
minimum qualifying experience to participate in this study was familiarity with the key components of a functional mobile robot and the associated challenges of operating in the real world so that
an intentional decision could be made when constructing experiments.
The goal of this study is to demonstrate the existence of mistakes, suboptimal decisions, and alert
conditions so that the impact of intervening with alerts can be measured. Therefore, the population size
for this study was selected to be N^(2) = 20 because it provides at least 95% confidence in detecting these events. Let p_m, p_s, and p_a be the probability of a mistake, suboptimal decision, or alert condition occurring, respectively. The quantity q_i = 1 − p_i is the probability an event does not occur, and we assume these events are independent and identically distributed (i.e., iid). For sample size N^(2) = 20, the probability that there is no occurrence of a given event in the study is q_i^20. For estimating values of p_i, we look to empirical results from the user studies. In the previous user study that is described in Chapter 4.4 and consisted of N^(1) = 12 participants, the probability of a mistake was 0.167, and Chapter 5.3.4 will show that the probability of a mistake in this user study consisting of 20 participants is 0.15. As a result, if the probability of a mistake is expected to be in the range [0.15, 0.167], then the probability that the study with sample size N^(2) = 20 will not include this event is [2.60 × 10^−2, 3.90 × 10^−2]; this corresponds to at least 96.10% confidence in the existence of a mistake. The probability of a suboptimal decision in the user study presented in the previous chapter and this chapter is 0.50 and 0.25, respectively. If the probability of a suboptimal decision is expected to be within the range [0.25, 0.50], then the probability it does not occur is [9.54 × 10^−7, 3.17 × 10^−3]; this corresponds to at least 99.68% confidence in the existence of a suboptimal decision. Likewise, the 5-fold cross validation results presented in Chapter 5.2.2 show that the trained GNN model achieves a validation accuracy of 0.86 when inferring decision quality, which is used as an estimate for the performance of detecting alert conditions. The probability an alert is not detected in a study of 20 participants is 8.38 × 10^−18; this corresponds to more than 99.99% confidence in the existence of alert condition triggering, assuming the presence of suboptimal decisions.
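These confidence figures follow from simple arithmetic under the iid assumption; the short check below, with probability ranges taken from the text, is illustrative and not part of the study materials.

# With 20 iid participants, an event with per-participant probability p is absent
# from the whole study with probability (1 - p)**20.
N = 20

def prob_event_absent(p: float, n: int = N) -> float:
    return (1.0 - p) ** n

for name, (low, high) in {"mistake": (0.15, 1 / 6), "suboptimal": (0.25, 0.50)}.items():
    absent_range = (prob_event_absent(high), prob_event_absent(low))
    confidence = 1 - max(absent_range)
    print(f"{name}: absent with probability in [{absent_range[0]:.2e}, {absent_range[1]:.2e}], "
          f"confidence >= {100 * confidence:.2f}%")

# Alert conditions, using the 0.86 validation accuracy as the per-participant detection estimate
print(f"alert never detected: {prob_event_absent(0.86):.2e}")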
Since this is a continuation of the first study from Chapter 4, the entirety of the new population is
assigned to the assisted group using the Stage 2 DSS, i.e., N_a^(2) = 20, and compared to the control group
from the previous study. Based on self-reported demographic information from the participants, the population of this study consisted of nine people with 1 − 5 years of field robotics experience, five people
with 5 − 10 years of experience, three people with 10 − 15 years of experience, and three people with
more than 15 years of experience. One person reported having a Bachelor’s degree, ten participants had
master’s degrees, and the remaining nine participants had doctoral degrees.
5.3.3 Evaluation
Borrowing from the evaluation procedure of the first study, experimental designs constructed by the participants were evaluated by an expert experimenter who determined if there were any mistakes or suboptimal
decisions, i.e., objectively or subjectively flawed experimental designs, respectively. Since participants in
this study were given the opportunity to revise their experimental design if an alert was generated and
the participant opted to make a change, the decision making quality of some participants was evaluated twice. That is, the original experimental designs for all participants were evaluated and any additional,
revised experimental designs were also evaluated using the same criteria.
5.3.4 Results
As seen in Figure 5.17, half the participants chose to plan their experimental design by answering the DSS
questionnaire silently. Seven of the remaining ten participants chose to type their responses to at least six of the
nine questions. Interestingly, some participants chose to plan their experimental designs using a hybrid of
typing and answering silently; in this case, these participants typed their response to at least one, but no
more than five questions in the questionnaire.
Figure 5.17: The distribution of participants that chose to respond to the questionnaire-based planning
stage silently, by typing, or using a mixture of both.
The evaluation of the participants' original experimental designs is shown in Figure 5.18a and a summary of the corresponding alert generation, subsequent decisions, and decision quality for their final experimental designs is shown in Figure 5.18b. In their initial designs, 12 participants constructed “good"
experiments, five participants constructed “suboptimal" experiments, and three had mistakes. The three
participants that introduced mistakes into their experimental designs did not configure a functional robot
with the necessary inputs. More specifically, two of these participants forgot to include a GPS sensor for the
GPS SLAM component, and one participant forgot to add a camera that is needed for imagery-based classification algorithms. Of the five participants that made suboptimal decisions, two had multiple instances
where their designs could be improved. The most common suboptimal decision from this population was
omitting off-road capabilities that would be useful specifically for navigation in the forest. For the eight
participants that designed experiments with either mistakes or suboptimal decisions, seven alerts were
generated. Six participants chose to revise their experimental designs such that the population achieved
20 experimental designs with only one mistake and four suboptimal decisions.
The results of the opinion-based exit survey, shown in Figure 5.19, provide positive feedback for the
use of the DSS, which is consistent with the results found in Chapter 4.3.4. First, the participants in this
study agreed that the planning stage using the DSS was useful (µ = 3.05, σ = 0.51), the planning stage
is generally viewed as necessary (µ = 3.15, σ = 0.75), and disagreed with the assertion the time required
to answer the questionnaire is burdensome (µ = 1.60, σ = 0.99). Recall that in the coding of the Likert scale used here, 0 corresponds to strongly disagree and 4 corresponds to strongly agree. It's also worth noting
that the population in this study was slightly in favor of responses to the DSS questionnaire being typed
or written (µ = 2.35, σ = 0.93). Finally, six out of seven participants that received alerts also selected
“Agree" in the exit survey when asked their opinion on the assertion that “The alerts were useful."
Of the population in Study 2, 16 of the 20 participants desired conducting another experiment after
observing the results of their experimental design. The participants were asked to describe in words what
(a) Original experimental designs
(b) Selected experimental designs after alert generation
Figure 5.18: The results of Study 2 including (a) the original experimental designs and (b) the final experimental designs after some participants were alerted based on the inference of the Stage 2 DSS. The green-highlighted rows in (b) indicate improvements in experimental design quality after the experimenter was notified by an alert and revised their experiment.
(a) The planning stage was useful for designing experiments effectively.
(b) The planning stage was necessary for designing experiments effectively.
(c) The time required to answer the planning questions
was burdensome.
(d) The planning questions must be typed or written to
be effective.
Figure 5.19: The participants’ opinion-based responses to the exit survey in Study 2.
subsequent experiment they would conduct and these were grouped into experiment variable (i.e., sensor, algorithm, parameter, and multiple) as well as targeted system capability (i.e., hardware, perception,
planning, control, and a combination of the former). From the free responses, we note that the population
identified trail detection and following as the primary objective for subsequent experiments and their desired changes for future experimental designs reflected this direction. As seen in Figure 5.20, participants
that indicated a desire to conduct another experiment were almost evenly split on changing algorithms
and parameters. In terms of desired capability change, participants were mostly focused on planning in an
effort to elicit more effective trail following.
The exit survey provided free-response questions so that the participants were given an opportunity
to provide feedback for revising the Stage 1 questionnaire functionality and suggest new capabilities for
the Stage 2 alert generation. 16 of the 20 participants did not feel there were any unnecessary questions
(a) Which variable of the experimental configuration to change
(b) Which capability of the robotic system is targeted
Figure 5.20: Categorizations of the participants’ next experimental designs. In this case “N/A" indicates
that the participants did not want to conduct another experiment.
in the questionnaire. However, four participants believed the questionnaire could be shortened. Three of
these participants did not recommend any specific questions to omit while one participant suggested the
omission of Pre-Experiment Questions 5, 7, and 9.
In terms of new questions that should be added to the questionnaire, there was no consensus from
the participants that would warrant additions. Perhaps most interestingly, there was a recommendation
to reformulate the consideration of risk when assisting the experimenter with experimental design. One
participant suggested that, instead of prompting the experimenter for their estimate of risk on a scale, the
system should perform risk analysis to identify novel or additional risks that the experimenter could work
to mitigate. This suggestion directly aligns with the goals of alert generation-based Stage 2 DSSs, rather
than the questionnaire-based Stage 1 DSS, and could be a fruitful avenue of future work.
Finally, the participants were asked the intentionally open-ended question of what additional assistance they think would be beneficial for experimental design. Participants identified various forms of
proactive support that would provide new information for the experimenter. Examples of this include: 1)
integration insights, i.e., hardware requirements, budget constraints, and performance benefits for each
software functionality under consideration; 2) alerts if the predicted experimental result would lead to
worse performance than the baseline or previous experiment; and 3) causality analysis, i.e., information
about what caused interventions and the nature of the intervention.
5.4 Discussion
First, we see that the results in this study corroborate the results in the study presented in Chapter 4.3.
In the first study, 0.167 of the population constructed an experimental design with a mistake, 0.50 made suboptimal decisions, and 0.333 designed “good" experiments. Here, in their original experimental designs, 0.15 of the population made a mistake, 0.25 designed suboptimal experiments, and 0.60 of the population
proposed “good" experiments. A participant further validated the contribution of the DSS by stating in their
free response when talking about their preferred method of answering questions, “Silently. I knew what I
was doing and I did not need to formally communicate it. That said, having looked at the questions, nudged
and seeded my mind to be focused on what can be done to improve performance." We also demonstrated the
flexibility of the DSS because half the participants were able to answer the entire questionnaire silently,
while the other half used their preferred method of decision support, which was to type responses to
at least some of the questions in the questionnaire. Some participants felt comfortable answering the
Pre-Experiment Questionnaire silently based on their previous work with similar robots, while other participants felt that typing their responses was important, citing the importance of committing to an
objective, specifying the criteria before performing the test so objectives don’t change mid-execution, and
indicating what metrics would be important. Interestingly, one participant noted that they answered the
DSS questions silently for this study because they were alone, but if they were doing this in a lab or with a
team of experimenters they would prefer to write their responses on a whiteboard, and they suggested that formal
documentation of responses is preferred in real-life experiments.
The introduction of the Stage 2 DSS resulted in greater benefit than the Stage 1 DSS alone. The number of
mistakes present in selected experimental designs dropped by two-thirds, from three to one, after participants
received alerts and adjusted their originally-bad experimental design to a good experimental design. The
number of suboptimal decisions also reduced by 37.5%, from eight to five. In this case, there were two
instances of upgrading a suboptimal experimental design to a good one, and one of those instances included
modifications to address multiple suboptimal aspects.
While the results of this study are promising, there are still some noteworthy shortcomings of the current system. There were two instances where the participant was alerted and they modified their experimental design to no effect in terms of decision quality, i.e., the mistakes and suboptimal decisions were not
addressed. This is because the alert provides some general guidance as to what may be an issue, but there
is no specific identification, guidance, or proposed resolution. It’s worth noting that our current DSS was
only evaluating the first experimental design proposed by the participant. Future instantiations of DSSs
should continuously check all proposed experimental designs, i.e., the revised designs after alert-initiated
modifications. This is an important implementation detail because continuously checking experimental
design proposals could help mitigate the lack of explicit problem identification in the generic alert-based system. In other words, an experimenter might make changes after receiving an alert and believe they have
addressed the problem, but an additional alert after evaluating their new design could reveal that problems
still persist.
The proposed DSS also did not detect the suboptimal scenario where a participant includes an extraneous sensor, for example selecting a GPS sensor while excluding the GPS localization software.
Because this condition is not sufficiently represented in the training data, no alert was generated and the participant was not given the opportunity to address this suboptimal decision. This effectively captures a broader
consideration with this GNN-based DSS, which is that all alert conditions are predicated on having sufficient representation and diversity in the training dataset.
Another important aspect of DSS design and evaluation is the freedom of choice that the experimenter,
and in the case of this study the participant, has in experimental design. The DSS is intentionally designed
to only provide decision support and the human has total authority to make whatever decision they want.
The results in Figure 5.18b show that there was one instance where a participant was alerted of a suboptimal
decision and that participant (ID 14) chose not to modify their experimental design leading to a suboptimal
decision. While the DSS should likely never be designed to override the human’s decision, this example
indicates that there could potentially be an opportunity for DSSs to provide more proactive and descriptive
insight to illustrate the impact of the alerted issue. One can imagine situations where an experimenter
may be more encouraged to take action following an alert if they better understood the consequences of
ignoring the alert. It’s also worth noting that this was the only participant that received alerts and did
not agree with the Likert-scale question in the exit survey regarding the usefulness of alerts. The notion
of proactive decision support was further emphasized by a participant’s free response when they noted
that the system should assist with risk assessment rather than asking an experimenter for an evaluation on
a 1-10 scale, as currently implemented. With a better understanding of the implications and associated
risk to either the human, robot, or experimental outcomes, an experimenter will likely be able to make
better decisions when designing experiments.
The exit survey results shown in Figure 5.19 provide additional support to the findings in the first
study that the DSS is considered to be both useful and not overly burdensome by users. This is a welcome finding because if experimenters do not value the support offered by a DSS, or find that the cost of
using it outweighs the benefit, then they are less likely to make use of the DSS. While the population from
both studies generally did not feel the DSS was overly burdensome, the results offer insight to potential
opportunities for further minimizing cost and maximizing contributed support. One participant in this
study identified three questions in the Pre-Experiment Questionnaire that they thought should be omitted
and another three participants suggested the set of questions could be reduced without providing specific
recommendations. In response, it could be worth investigating an adaptive questionnaire that adds or subtracts questions on a per-experiment and per-experimenter basis. The questionnaire could be customized
so that a more experienced, knowledgeable, or efficient experimenter is asked fewer questions so that their
time commitment, and in turn burden, is less while they still receive some benefit of structured experimental design and thought analysis. Since the majority of both populations from the two studies did not
recommend removing questions, we still feel that the set of questions should be included but future work
could investigate how to choose the minimum set of these questions without loss of decision support or
quality.
5.5 Study Limitations
Simplified state space exploration and idealized decision making conditions: The value of using an online
form, such as in the studies in this dissertation, is that more participants can be recruited and participants
require shorter commitments compared to running live experiments due to greater accessibility and minimized training times. However, the tradeoff is that online studies created generically for any field roboticist
with experience in ground robotics will inherently have limitations related to simplified state spaces and
idealized decision making conditions. Experimenters of deployable autonomous ground robots, such as
those using the ARL Ground Autonomy Software Stack, will have significantly more complex decisions
since there are, collectively, two to three orders of magnitude more components and parameters; furthermore, these decisions will be made under harsher conditions due to weather, fatigue, and the
natural environment. We, again, expect more frequent mistakes and suboptimal decisions in real-world
experimentation compared to what is observed in this study.
In a similar vein, the experimental results were provided to participants immediately and there were
no constraints enforced on the participant’s decision making. There was no delay between the participant
constructing an experimental design and receiving the video of simulated results, so both their decisions
and observations were top of mind. The participants were not given any information about the
computational capabilities of the robotic system and were free to select as many hardware and software
components as they’d like. This included the possibility of duplicate algorithms, such as multiple path
planners or controllers. Real-world experimentation requires considerable time and effort to prepare the
system given some physical limitations, such as the available CPU and GPU resources, conduct the experiment, and collect and analyze the results - all of which complicate decision making.
Lack of sequential design decision making: The study in this chapter included more of the decision making process than the study in Chapter 4.3, namely participants received simulated results corresponding
to the quality of their experimental design decisions. Participants were also asked to describe their experimental observations in words and then given the opportunity to describe in words their next experiment,
if they wanted to conduct one. However, future studies should include multiple rounds of experimental
design to more thoroughly characterize the quality and challenges associated with decision making as well
as the evolution of the experimenter’s understanding. Such a study may require greater training for the
participants because we already observe in this study that participants without hands-on experience with a
mobile robot in a specific natural environment oftentimes have a general idea of what behavioral changes
they wish to produce, but are unsure which system components to change to achieve the desired effect.
5.6 Summary
This chapter presented a novel Stage 2 DSS that adds a GNN-based alert generation framework on top of
the existing Stage 1 questionnaire functionality. By issuing alerts to experimenters based on the inferred
decision quality, the DSS demonstrated the ability to reduce the number of experimental designs containing
mistakes and suboptimal decisions in a user study with 20 participants. The participants in this study
corroborated the findings from the user study in the previous chapter, giving evidence that users feel the
proposed DSS is both useful for experimental design planning and not burdensome. The key to improving
experimental designs more effectively than Stage 1 DSSs is the introduction of proactive decision support
where potential concerns are raised to the experimenter before they make their final selection for which
experiment to conduct.
While there are several promising findings offered in this chapter, there is still great opportunity to
mature GNN-based Stage 2 DSSs to provide even more effective decision support. First, the GNN used
in this chapter operated on homogeneous graphs and therefore was limited in its ability to represent
complex robotic systems and experimental designs. Instead, future work should consider the use of Heterogeneous GNNs (HetGNNs) [260] for more expressive representations. A HetGNN is a graph neural
network that contains multiple types of nodes and edges. For example, we could instead define the graph
as G = (V, E, O_V, R_E), where O_V is the set of node-specific attributes and R_E is the set of edge-specific
attributes. Node-specific attributes open the door for incorporating parameters that were otherwise not
considered in our GNN-based formulation. Likewise, edge-specific attributes could correspond to semantic representations of data and information flow that are more human interpretable. Consider the example
where “hardware" nodes produce data for “perception" and “estimation" nodes, which produce maps for
“planning and control" nodes that produce plans and execute actions in environments defined in “environment" nodes where the mission defined by the “mission" node is executed. An example of this concept of
using HetGNN representations in experimental design is provided in Figure 5.21.
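To make this concrete, the following is a minimal sketch of how such a typed experimental-design graph could be assembled, assuming PyTorch Geometric's HeteroData container; the node types, edge relations, and feature sizes are illustrative placeholders rather than the actual schema used in this work.

```python
# A minimal sketch of a heterogeneous experimental-design graph, assuming
# PyTorch Geometric is available; node/edge types and feature sizes are
# illustrative, not the dissertation's actual schema.
import torch
from torch_geometric.data import HeteroData

design = HeteroData()

# Typed nodes: each row is one component's feature vector (e.g., a one-hot
# encoding of the selected algorithm plus a few normalized parameters).
design['hardware'].x = torch.randn(3, 8)      # e.g., lidar, imu, gps
design['perception'].x = torch.randn(2, 8)    # e.g., mapper, terrain analysis
design['planning'].x = torch.randn(2, 8)      # e.g., global planner, local planner
design['mission'].x = torch.randn(1, 8)       # the waypoint mission definition

# Typed edges encode semantic data flow: hardware produces data for perception,
# perception produces maps for planning, and planning executes the mission.
design['hardware', 'produces_data_for', 'perception'].edge_index = torch.tensor(
    [[0, 1, 2], [0, 0, 1]])
design['perception', 'produces_maps_for', 'planning'].edge_index = torch.tensor(
    [[0, 1], [0, 1]])
design['planning', 'executes', 'mission'].edge_index = torch.tensor(
    [[0, 1], [0, 0]])

print(design)  # summarizes node and edge types for this candidate design
```

A HetGNN built from relation-specific message-passing layers could then learn separate transformations for each node and edge type, which is precisely the added expressiveness motivated above.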
Figure 5.21: A representation of the ARL Ground Autonomy Software Stack shown in Figure 1.5 using a
Heterogeneous Graph Neural Network.
In addition to graph classification, GNNs and HetGNNs could also be used for graph regression and
offer unique forms of effective decision support, specifically static analysis. The graph classification application in this chapter led to the inference of experimental design decision quality. Instead, graph regression
might enable the inference of estimated numeric system performance without physically conducting the
experiment. For example, if the dataset D were updated to replace qualitative labels with quantitative
values, such as mission duration, average autonomous speed, or average interventions or collisions per
unit distance, then the GNN might be trained to predict empirical results. The anticipated benefit of this
static analysis is that if the GNN prediction is sufficiently accurate then the experimenter could effectively explore the experimental design state space with significantly reduced experimentation costs. Static
analysis would also increase the saliency and timeliness of alert generation because alerts could be issued
when the predicted empirical performance of a candidate experimental design will be worse than either
the baseline configuration or previously-conducted experiments. We hypothesize that a GNN or HetGNN
cannot realistically be trained for an experimental field robot such that real-world experimentation is no
longer required altogether. However, given the success of GNNs at learning complex relationships in the literature,
we anticipate that some notional system performance could be learned that reduces some experimental
costs, even if it is in a narrowly-scoped experimental design application. It’s worth noting that the dataset
for training a GNN or HetGNN to perform graph regression could likely be continually updated as more
experiments are performed, in a similar fashion to the graph classification demonstrated in this chapter.
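As an illustration of the graph-regression variant described above, the following sketch swaps a classification head for a single scalar output; it assumes a homogeneous PyTorch Geometric graph and a hypothetical target such as mission duration, and is not the model trained in this chapter.

```python
# A minimal sketch of graph regression for predicting empirical performance of
# an experimental design graph; architecture and target are illustrative.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, global_mean_pool


class DesignPerformanceRegressor(torch.nn.Module):
    def __init__(self, in_dim, hidden_dim=64):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, hidden_dim)
        self.head = torch.nn.Linear(hidden_dim, 1)  # scalar estimate, e.g., mission duration

    def forward(self, x, edge_index, batch):
        h = F.relu(self.conv1(x, edge_index))
        h = F.relu(self.conv2(h, edge_index))
        h = global_mean_pool(h, batch)           # one embedding per design graph
        return self.head(h).squeeze(-1)


# Training replaces the cross-entropy loss used for qualitative quality labels
# with a regression loss against measured outcomes, e.g.:
#   loss = F.mse_loss(model(batch.x, batch.edge_index, batch.batch), batch.y)
```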
A third avenue of future work will involve an investigation into pragmatic aspects of GNN-based alert
generation to facilitate use by practitioners. Among these aspects will be addressing the curse of dimensionality, which is a common challenge for GNN applications and complex systems. Research into clever
structural and non-structural node encodings, general representation learning, or discovery of underlying
experimental design structures that help focus graph classification and regression could be fruitful. For
example, in real-world scenarios, it may be possible to automate the process of instantiating and connecting the structural nodes from a system architecture diagram, but translating the complete architecture
diagram may be unnecessarily complicated and drastically increase the computational cost. The creation
of context-dependent, non-structural nodes is not yet codified for experimental robotic systems and so
this will also complicate DSS deployment. How to automate both the proper representation of system
architectural components to GNNs and codifying context-based aspects of experimental design remains
an open question.
Chapter 6
Design Recommendation for More Informed Experiment Selection
6.1 The Role of Design Recommendations
The previous chapter demonstrated the value of proactive decision support where, if given the opportunity,
experimenters could improve their experimental designs before conducting an experiment. This opportunity was afforded by the DSS through monitoring the experimenter’s experimental designs and issuing
appropriate alerts. While these alerts had a measurable impact on the quality of decision making, there’s
still room for improvement given that the alerts don’t instruct or assist the experimenter with mitigating
the suspected or inevitable problems. In this vein, recommendation is the next significant step in proactive
decision support as well as the defining feature of Stage 3 DSSs, per the taxonomy presented in Chapter 3.
The key to design recommendation is the construction of actionable experimental design information that
the experimenter would want or need such that they can make equally good, or better, decisions but are
required to perform less work.
While one approach to design recommendation could be to perform root cause analysis on the experimental design that resulted in an alert being generated from the Stage 2 DSS in Chapter 5, there are two
monumental challenges. First, designing a DSS capable of performing automated root cause analysis for an
experimental robot assumes that the system can be modeled with sufficient fidelity; however, we assume
the robotic system and its performance under context-dependent objectives are a black box. Second, even
if the root cause could be identified it would still be difficult to codify the resolution in a meaningful way.
For the cases where an experimenter knows exactly how to resolve an issue that warrants an alert, there is
little-to-no value in having the DSS recommend a resolution. For the remaining cases, where the experimenter is not sure how to resolve an issue, it's unclear how or if such complex interactions can be encoded by
a machine given the vast number of external factors that go into context-dependent experimental design.
We seek to bypass these challenges altogether and focus on proactive design recommendation in parallel with an experimenter as opposed to in response to their proposed design. A DSS that proposes some
portion of the experimental design is an easier task, still with significant potential, because it leverages
previous observations that inherently include some information about what the experimenter has deemed
valuable in the context of their objectives and doesn’t necessarily require modeling causal relationships.
In this chapter we explore the challenges of constructing a DSS capable of providing some form of
recommendation. Our goal is to lay the groundwork for a future Stage 3 DSS that could recommend
a partially-defined experiment, i.e., propose a component for a new experiment or a parameter from a
previously-conducted experiment.
6.2 Background
Generally speaking, the goal of experimentation is to learn the performance and limitations of a system
while minimizing the effort to gain this information. The classical approach to system experimentation
in many domains is Design of Experiments (DOE) [158, 123] where an experimenter produces an a-priori
plan to conduct tests, excites the system in certain ways, and tests some hypotheses. For field robotic
experimentation, this approach is limited in many ways. First, field robotic systems are not simple; they
consist of many components, each of which can have many parameters. System components usually
exhibit complex interactions that prohibit a closed-form solution for performance, one-shot optimization,
or combinatorial search over the parameter space. The system will also produce variable performance
depending on the fidelity of the experimentation and environment it is tested in. Field robots inherently
operate in the natural world so measures of system performance will be more accurate if experimentation
is conducted in the real-world, compared to simulation, but this introduces stochasticity, spatio-temporal
changes, and the increased threat of hardware and software failure – all of which complicate a-priori test
planning. Similarly, system performance for many applications is context-dependent and therefore there
is likely no single set of system parameters that are optimal under all conditions, but rather a set of system
parameters that produce desirable results under representative conditions. Lastly, tests can be very costly
(e.g., experimenter’s time and energy, expensive equipment, time to implement, setup, and execute) and
this cost introduces an inherent risk to conducting a test (e.g., personal injury, system damage, and wasted
resources). An experimental design with an exhaustive or large number of tests is typically not feasible,
but more importantly the risk of each test must be properly managed; this necessitates human-in-the-loop experimental design. Tests in a field robotics experiment should be adaptively selected using insights
gained throughout the experimentation process. Rather than attempting to build a complete model of
the system, we seek to balance the tradeoff between system performance and the experimenter’s incurred
cost and risk of conducting tests; however, formal methods for human decision making in the context of
experimental design for field robotics remains an open question.
The problem of experimental design can be viewed from the perspective of active learning, which is
an engaging, instructional method from the field of education [178] and has since been adopted by the
machine learning and robotics communities [208, 232]. Unlike many active learning settings for robotics
[40]–[36], we treat the experimenter as the “learner", rather than the robot, and view conducting an experiment as an engagement or query with an “oracle" who reveals information about the system. The goal
is for an experimenter to find the system configuration that produces the best performance in some setting, rather than a distribution of performance in all settings, thus we find inspiration from sequential and
adaptive Bayesian Optimization (BO) techniques [220]–[82]. An experimenter begins with a system and
some belief about system performance, defines some objective and constraints for experimentation, and
chooses an initial system configuration. After each experiment the experimenter receives observations,
and can choose either the system configuration for the subsequent experiment using this information or to
stop conducting tests; an appropriately-selected acquisition function can be used to aid with this decision.
Conceptually, we view this process as learning from human decision making. By allowing an experimenter
to choose the data from which they learn, in this case the experiment configuration, we assert that
they can achieve both greater system understanding and performance in a reduced number of tests.
Contributions: In this work, we present:
1. a retrospective case study (Section 6.4) of an adaptive experimental design of a real-world, autonomous, off-road navigation system. This case study includes two distinct scenarios: one where
the experimenter had minimal knowledge of the system and one in a different environment while
using prior knowledge from previous experiments. We identify options the experimenter was presented with, their decisions, and the resulting outcomes, which serves to provide illuminating examples of the adaptive, decision making process for experimentation of a complex, field robotics
system.
2. analysis of the decision making process in our case study (Section 6.5) to assess whether adaptive
experimental design can be modeled by active learning-based frameworks. We identify trends in the
case study and draw connections to mathematical models, which we believe provides preliminary
knowledge useful for building a DSS that can systematically guide experimenters in decision making of adaptive, field robotic experimentation. This includes identifying formal representations and
recommendations for key aspects during the decision making process.
6.3 Stage 3 DSS Development
This subchapter describes design recommendation from the perspective of Active Learning and BO. A
Stage 3 DSS seeks to offer design recommendations with respect to a component for a new experiment or
a parameter of a previously-conducted experiment, which is more computationally feasible because the
state space is intentionally restricted. Active Learning and BO have both been applied to other domains
where guided exploration and exploitation are desired so their applicability is studied here.
6.3.1 Bayesian Optimization
As described in Chapter 6.2, DOE has limited applicability in experimental design of field robots due to the
desire for adaptation to newly revealed information during experimentation. DOE and optimal design are
primarily focused on validity, reliability, and replicability, and while experimental design for field robots
also seeks to achieve these characteristics from experimentation, the ultimate goal is to maximize the human experimenter’s understanding of the autonomy under test. In this sense, conducting a small number
of experiments, even if they aren’t comprehensive, can still be incredibly informative and satisfy the experimenter’s objectives. Furthermore, the desire for adaptation during experimentation directly opposes
the fundamental principles of DOE, which include randomization of tests and test “blocking" where the
order is intentionally controlled for assessment and bias mitigation - both of which may be intractable
depending on the complexity of the robotic system and dimensionality of the experimental design problem. Designing experiments with classical methods inherently becomes prohibitively costly and time-consuming, which is mitigated with sequential experimental design provided experiments can be designed
in a sample-efficient, data-driven manner.
One critical aspect to enabling such an experimental design paradigm is to incorporate knowledge and
beliefs about the system to guide future experimental design. By no coincidence, this is a fundamental
principle of BO and effectively answers the question: given some amount of information, what should be
the next evaluation for acquiring new information? In this subchapter we provide general background
information as an introduction to the key concepts of BO before drawing connections to experimental
design for field robots. We refer the reader to [68] for more details on BO.
The goal of BO is to perform global optimization over expensive, black-box functions in a limited
budget of queries. This relates to experimental design because the interactions of the robotic system and
the evaluation in noisy, real-world settings form a black-box system. Conducting experiments is expensive
because it requires time and effort to prepare the hardware and software, conduct experiments (in our case
execute waypoint missions), and analyze results, which necessitates a concerted effort to limit the number
of iterations of experimentation.
Borrowing notation from [68], we can represent this in the following way. Let f be a black-box
function that is continuous, expensive to evaluate, void of special structure (e.g., convexity or concavity),
derivative-free (i.e., no first- or second-order information is provided upon evaluating f), and possibly
noisy. Let the feasible set of f, denoted X, be a simple set for which membership is easy to assess. We can
query f with a point x to obtain an observation y ∼ f(x) + ϵ, where ϵ is noise assumed to be independent
and normally distributed with unknown variance.
The key insight in BO is to use a probabilistic, predictive model to aid in the optimization while balancing the exploration of the state space and exploitation of acquired information. BO achieves this using
a surrogate model and an acquisition function. The surrogate model uses all existing data in a dataset to
approximate the landscape over which we seek to optimize and is cheaper to compute than the black-box
function. The acquisition function, α : X → R, provides the strategy for selecting the next query point to
evaluate by assigning a scalar to a given point x representing its value or usefulness as a query point.
The iterative process is as follows: first the surrogate model is fit using all available data, which corresponds to updating the posterior probability distribution on f using Bayes' Theorem. Next, the acquisition
function is used to solve an optimization problem of the form
\[
x_{t+1} = \arg\max_{x \in \mathcal{X}} \alpha(x) \tag{6.1}
\]
to select the next query point. The black-box function is queried to obtain a new observation y_i, which is
added to the dataset, and the process is repeated for some limited number of iterations. Importantly, the
posterior probability distribution is used by the acquisition function to estimate f(x) for values of x that
haven't been observed, and by continually updating the dataset, all available information is used to guide
the selection of the next query point.
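The following is a minimal, self-contained sketch of this loop, using scikit-learn's Gaussian process as the surrogate and Expected Improvement as the acquisition function; the two-dimensional configuration space and the synthetic stand-in for conducting a field test are assumptions made purely for illustration.

```python
# A hedged sketch of the BO loop described above: fit surrogate, maximize the
# acquisition function, run the next test, append the observation, repeat.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)
lo, hi = np.array([0.5, 1.0]), np.array([5.0, 4.0])  # e.g., (replan rate, max velocity)


def conduct_experiment(x):
    # Placeholder for physically executing a waypoint mission with configuration x;
    # a noisy quadratic stands in for the observed mission time (to be minimized).
    return 60.0 + 10.0 * np.sum((x - np.array([3.0, 2.5])) ** 2) + rng.normal(0.0, 2.0)


def expected_improvement(candidates, gp, y_best, xi=0.01):
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    improvement = y_best - mu - xi          # minimization form of EI
    z = improvement / sigma
    return improvement * norm.cdf(z) + sigma * norm.pdf(z)


# Initial tests, then iterate for a limited experimentation budget.
X = rng.uniform(lo, hi, size=(3, 2))
y = np.array([conduct_experiment(x) for x in X])
for _ in range(5):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
    candidates = rng.uniform(lo, hi, size=(500, 2))
    x_next = candidates[np.argmax(expected_improvement(candidates, gp, y.min()))]
    X = np.vstack([X, x_next])
    y = np.append(y, conduct_experiment(x_next))

print(f"Best observed mission time: {y.min():.1f}s at configuration {X[np.argmin(y)]}")
```

In a real experimental campaign, conduct_experiment would correspond to preparing the robot with configuration x, executing the waypoint mission, and recording the observed cost.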
6.3.2 Experimental Design Mathematical Framework
This subchapter first presents an overview of a mathematical framework for the general problem of adaptive experimental design for field robotics, to which we will draw connections in our analysis of a
case study. We refer to inputs as the configuration of parameters of a field robotics system, which define
an experiment, and can correspond to software parameters or hardware components. The term parameters
is used broadly to encapsulate everything from single variables to large changes in system logic. Outputs,
in this context, refer to performance observations after executing an experiment using the field robotics
system with some set of inputs.
Our formulation of adaptive experimental design relies on several assumptions: The experimenter has
access to the system inputs and can change the respective values so that tests in an experiment correspond
to changes in the input space. The experimenter can build some reasonable belief model on system performance which corresponds to how inputs affect outputs. This could be informed by design principles,
expert opinion, simulation of varying fidelity, or some initial experimentation with the system under some
conditions or environmental settings. In this setting, observations of system performance will have some
amount of noise, which reflects the real-world constraint that field robots do not typically exhibit deterministic behaviors. There's a highly complex relationship between system inputs and outputs such that
system performance has no closed-form solution and can be treated as a black box. We treat the system
as non-chaotic, i.e., performance is expected to be continuous within a local region of the input space, and
changes in inputs do not lead to inexplicable outputs.
We cast adaptive experimental design as a problem of finding inputs that produce the (locally) best
performance while satisfying some set of constraints. Adapting the notation from [68], we mathematically
formulate this as
\[
\underset{x \in \mathcal{X}}{\arg\min}\; f(x) \tag{6.2}
\]
\[
\text{subject to}\quad g_i(x) \geq 0, \quad i = 1, \ldots, I \tag{6.3}
\]
where x ∈ R^d is the configuration of system parameters, X is the set of all inputs, g represents a set of
I constraints that define the feasible set of system configurations, and f is a derivative-free, expensive to
evaluate function that lacks known special structure. This formulation assumes the objective function is
defined with respect to cost and therefore is minimized, but an equivalent formulation can be derived for
applications with reward-based objective functions. Importantly, an observation of the objective function,
f(x), is revealed sequentially, assumed to be noisy, and may warrant conducting additional tests using the
same system configuration to reduce uncertainty of an observation. g could include both quantitative and
qualitative constraints on the system that are defined or desired by the experimenter.
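One hedged sketch of how this formulation might be encoded in a DSS is shown below; the AdaptiveDesignProblem name and its fields are hypothetical, and each constraint is expressed so that g_i(x) >= 0 indicates satisfaction, matching (6.3).

```python
# A small, hypothetical encoding of the constrained formulation in (6.2)-(6.3).
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class AdaptiveDesignProblem:
    # f(x): expensive, noisy objective observed by physically running a test.
    objective: Callable[[Sequence[float]], float]
    # Each g_i(x) is satisfied when it evaluates to a non-negative value.
    constraints: Sequence[Callable[[Sequence[float]], float]]

    def feasible(self, x: Sequence[float]) -> bool:
        return all(g(x) >= 0.0 for g in self.constraints)


# Example: require the second input (e.g., max velocity) to stay at or below 3.0.
problem = AdaptiveDesignProblem(
    objective=lambda x: sum(x),                # placeholder for a field test
    constraints=[lambda x: 3.0 - x[1]],
)
print(problem.feasible([2.0, 2.5]), problem.feasible([2.0, 3.5]))  # True False
```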
Concepts from BO could be leveraged for addressing adaptive experimental design. As is typical in
many BO applications, a Gaussian Process (GP) can be used as the surrogate model for representing the
experimenter’s belief about the experimental design space. GPs are a collection of random variables for
which a finite number have a joint Gaussian distribution [182]. Borrowing notation from [182], a GP is
defined as
\[
f(x) \sim \mathcal{GP}(m(x), k(x, x')) \tag{6.4}
\]
where m(x) is a mean function
\[
m(x) = \mathbb{E}[f(x)] \tag{6.5}
\]
and k(x, x') is a covariance function
\[
k(x, x') = \mathbb{E}[(f(x) - m(x))(f(x') - m(x'))] \tag{6.6}
\]
The covariance function, also referred to as kernel, is used to compute the covariance matrix and enables
the computation of the dot product of two vectors in a given feature space, oftentimes referred to as the
“generalized dot product". The kernel can be thought of as a similarity function because two samples that
are close in the input space are assigned a larger covariance, indicating greater similarity, than samples
that are far apart. As a result, the selection of the kernel plays a pivotal role in the GP
model.
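The similarity interpretation can be seen directly by evaluating a kernel on nearby and distant inputs; the sketch below uses scikit-learn's RBF kernel with an arbitrary length scale purely as an example, not as a recommended kernel for experimental design.

```python
# A brief illustration of the kernel-as-similarity intuition, assuming scikit-learn.
import numpy as np
from sklearn.gaussian_process.kernels import RBF

k = RBF(length_scale=1.0)
reference = np.array([[0.0]])
nearby = np.array([[0.2]])
far_away = np.array([[3.0]])

print(k(reference, nearby))    # close in input space -> covariance near 1
print(k(reference, far_away))  # far apart -> covariance near 0
```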
Additionally, an acquisition function can be constructed to identify the most promising subsequent
experiment configuration by using previous observations and estimating the value of conducting a new
test. Literature has already investigated a number of acquisition functions with different properties and
under different settings. This includes functions that suggest subsequent experiments where improvement
is most likely (i.e., Probability of Improvement referred to as PI), the expected magnitude of improvement
is greatest (i.e., Expected Improvement referred to as EI), the expected magnitude of improvement of the
posterior is greatest (i.e., Knowledge Gradient referred to as KG), the largest decrease in entropy or uncertainty (i.e., Entropy Search referred to as ES and Predictive Entropy Search referred to as PES), and the
minimization of regret (i.e., GP Upper Confidence Bound referred to as GP-UCB). The reader is directed to
more detailed works on acquisition functions, such as [68] and [82].
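For concreteness, hedged sketches of two of these strategies are shown below, written against the same scikit-learn GP interface as the earlier Expected Improvement example; the trade-off parameters xi and beta are illustrative defaults, and the second function is the lower-confidence-bound form of GP-UCB appropriate for minimization.

```python
# Hedged sketches of two acquisition functions for a minimization problem; both
# return scores to be maximized over a candidate set of configurations.
import numpy as np
from scipy.stats import norm


def probability_of_improvement(candidates, gp, y_best, xi=0.01):
    # PI: probability that a candidate improves on the best observed value.
    mu, sigma = gp.predict(candidates, return_std=True)
    return norm.cdf((y_best - mu - xi) / np.maximum(sigma, 1e-9))


def gp_lcb(candidates, gp, beta=2.0):
    # Lower-confidence-bound analogue of GP-UCB: favor candidates with low
    # predicted cost and high uncertainty; maximize the returned score.
    mu, sigma = gp.predict(candidates, return_std=True)
    return -(mu - beta * sigma)
```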
6.4 Case Study
In this section, we present a retrospective case study of an adaptive experimentation of an autonomous,
off-road navigation system. Specifically, we present our field robotics system, one instantiation of an
experimental design problem for autonomous navigation, and two different scenarios.
6.4.1 System Definition
For this case study we use the ARL Ground Autonomy Software Stack, which is an experimental, end-to-end field robotics system that implements long-range, off-road autonomous navigation in unknown and
unstructured environments. Our system includes a number of state-of-the-art components for state estimation, simultaneous localization and mapping (SLAM), terrain understanding, motion planning, trajectory
optimization, control, and user interfaces for defining navigation missions [83]. A key characteristic of our
system is that it is designed to be modular and generalizable across a wide range of sensors, platforms, and
algorithms, such that components can be easily modified or replaced. This allows for rapid, system-wide
adaptation to perturbations in challenging environments as well as quantification of individual components
with respect to impact on autonomous navigation. This modularity also allows for easy configuration of
individual parameters, components, and algorithms.
While our software suite is platform-agnostic, we evaluated autonomous, off-road navigation capabilities
on a Clearpath Warthog, equipped with an Ouster OS1 LiDAR, a KVH Geo-Fog 3D Inertial Measurement
Unit (IMU), a Microstrain 3DM-GX5-35 IMU, and Ubiquiti BulletAC-IP67 2.4GHz and 5GHz radios for
communications. For computing, the Warthog makes use of two Neousys Nuvo 7166GC computers, each
with an Intel i7-8700T CPU and an Nvidia GTX 1660 Ti. The Warthog has a MasterClock GMR1000 time
server, which provides NTP and PTP time synchronization, as well as outputting a PPS signal used to sync
the Microstrain IMU. A system diagram and a photo of the physical platform are shown in Fig. 1.5.
6.4.2 Formulation and Evaluation for Autonomous Navigation
Performance of our autonomous, off-road navigation system is evaluated using a series of waypoint missions where the experimenter defined the objective as navigation to the predefined goal locations as quickly
as possible with constraints on the number of human interventions, collisions, and quality of navigation.
Here, the quality of navigation is required to be “human-like" navigation as evaluated by the experimenter,
which we define for this work as autonomous navigation that aligns with how a human would teleoperate
the robot, i.e., smooth and predictable. We define the adaptive experimental design problem for the desired
autonomous navigation as
\[
\underset{x \in \mathcal{A}}{\arg\min}\; f(x) = w_1 T_1(x) + w_2 T_2(x) \tag{6.7}
\]
\[
g_1(x) \leq D(j) \quad \forall j \in \{1, 2, \ldots, G\} \tag{6.8}
\]
\[
g_2(x) \geq G \tag{6.9}
\]
\[
g_3(x) \leq N \tag{6.10}
\]
\[
g_4(x) \leq C \tag{6.11}
\]
\[
g_5(x) \geq H \tag{6.12}
\]
where T_1(x) corresponds to the amount of time required for the robot to successfully execute the mission, T_2(x) corresponds to the amount of time required to conduct the test, and the weights w_1, w_2 ∈
[0, 1], with w_1 + w_2 = 1, allow the experimenter to express preference on the two components. By designing the objective function in this way, the value of conducting a new experiment must outweigh the cost
of conducting the test in order for subsequent experiments to be desirable. According to constraint (6.8),
the distance from the robot's position to the center of the waypoint must come within the experimenter-defined distance D(j) for waypoint j to be considered "achieved", i.e., g_1(x) = ||ρ_r − ρ_j||_2 ≤ D(j), where
ρ_r ∈ R^3 is the pose of the robot in three-dimensional space and ρ_j ∈ R^3 is the center location of the j-th
waypoint. Constraint (6.9) requires that the number of waypoints "achieved" in a test, g_2(x), be at least
G waypoints. When G is equal to the total number of waypoints, this constraint enforces completion of
the entire mission in order for an experimental configuration to be an acceptable solution. Constraints (6.10)
and (6.11) require that the number of human interventions, g_3(x), and the number of collisions, g_4(x), be
no greater than thresholds N and C, respectively. Finally, constraint (6.12) requires that the qualitative
evaluation provided by a human rater, g_5(x), be no less than some threshold H, where higher
scores correspond to more “human-like" navigation. In our case study, we consider adaptive experimental design with minimal a-priori system knowledge (Section 6.4.3) and with a-priori system knowledge
from previous experimentation (Section 6.4.4), where each scenario uses a different waypoint mission in a
unique off-road environment.
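To illustrate how a single test could be scored against objective (6.7) and constraints (6.8)-(6.12), the following sketch checks waypoint achievement and constraint satisfaction for one mission; the function name, the sample robot path, the test-time value, and the equal weights are hypothetical, while the distance thresholds echo those reported for Scenario 1.

```python
# A hedged sketch of scoring one test against objective (6.7) and constraints
# (6.8)-(6.12); the inputs and sample numbers are illustrative.
import numpy as np


def score_test(path, waypoints, D, mission_time, test_time, interventions,
               collisions, rating, w1=0.5, w2=0.5, G=4, N=0, C=0, H=3):
    path = np.asarray(path, dtype=float)                # robot positions, shape (T, 3)
    achieved = 0
    for j, wp in enumerate(np.asarray(waypoints, dtype=float)):
        # Waypoint j is "achieved" if the robot ever comes within D[j] of its center.
        if np.min(np.linalg.norm(path - wp, axis=1)) <= D[j]:
            achieved += 1
    objective = w1 * mission_time + w2 * test_time      # equation (6.7)
    satisfied = (achieved >= G and interventions <= N and
                 collisions <= C and rating >= H)       # constraints (6.9)-(6.12)
    return objective, achieved, satisfied


# Illustrative call with made-up positions, times, and the Scenario 1 thresholds.
obj, ach, ok = score_test(
    path=[[0, 0, 0], [10, 0, 0], [50, 5, 1], [120, 10, 2], [210, 15, 3]],
    waypoints=[[12, 0, 0], [52, 5, 1], [118, 10, 2], [208, 15, 3]],
    D=[2.35, 3.52, 5.32, 6.37],
    mission_time=94.1, test_time=600.0, interventions=0, collisions=0, rating=3)
print(obj, ach, ok)
```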
6.4.3 Adaptive Experimental Design
In the first scenario of our case study, referred to as Scenario 1, the system was evaluated in a 4-waypoint
mission, spanning over 200m from paved roads near a building to a gravel road in a forest. The experimenter required the robot to navigate the entire mission (G = 4), coming within D(j) = {2.35, 3.52, 5.32,
6.37} meters for waypoints j = 1, 2, 3, 4, with zero interventions (N = 0), zero collisions (C = 0), and a
qualitative rating of at least H = 3 where qualitative scores can range from 1 to 4 and a score of 4 indicates a predictable autonomous system. The experimenter had minimal a-priori knowledge of the system
performance for this specific mission and platform, but had some working knowledge of the system based
on implementing and testing the system components in a physics-based simulation environment.
Table 6.1 presents the results of this adaptive experimental design. Each row corresponds to a test
where the experimenter specified a system configuration with the intention of testing a dedicated component or set of components. The Configuration Change column refers to what specific parameter within
the targeted component was altered for the test. Initially, the experimenter started with their best estimate of a feasible system configuration and observed a failed mission due to the robot’s inability to plan
to the last waypoint. The increased system understanding gleaned from the failed mission led the experimenter to suspect the mapping and navigation subsystems as the source of the failure. For the second test
the experimenter chose a system configuration that reduces the unknown cell cost of the mapper component; this resulted in a successful, albeit slow, experiment. To improve the overall quality of planning the
experimenter then increased the re-planning frequency of the global planning component - this change
reduced the amount of time for the robot to navigate the mission. In the fourth test the experimenter chose
to repeat the configuration from the third test to collect more data regarding the repeatability of system
performance. The experimenter observed worse performance (i.e., larger completion time and a collision)
which suggests there is likely variance in the system’s ability to autonomously navigate this mission.
The experimenter chose to explore different path planning algorithms in the fifth and sixth tests by
trying different permutations of SBPL [138] and GLS [146] for global planning and NLOPT [83, 101] and
MPPI [251] for local planning. These tests reveal that SBPL with MPPI (Test 6) can produce a shorter
mission duration than GLS and MPPI (Test 5) or SBPL and NLOPT (Test 4) for this mission.
The experimenter sought to reduce the mission duration even further in the seventh experiment by
increasing the forward velocity from 3.0m/s to 4.0m/s and using the NLOPT local planner. The experimenter was forced to abort this test because the robot did not correctly account for navigating over roadside curbs resulting in unsafe behavior at higher speeds. The eighth test was conducted using the slower
maximum forward velocity of 3.0m/s; however, the robot was unable to achieve the last two waypoints
due to poor initialization of the mapping frame of reference and accumulating additional error over time.
Test 9 targeted an improvement for this by increasing the requirement for initializing the mapping frame
of reference. Finally, the experimenter chose to evaluate the local planner, MPPI, and global planner, GLS,
with the increased initialization requirement in Tests 10 and 11. The experimenter decided that sufficient
performance had been achieved and additional tests would not outweigh the cost. They noted that if there
were further experimentation it should focus on a previously-tested configuration to increase confidence
in performance. It’s worth noting that the experimenter observed the best quantitative performance in
Test 6 and best qualitative performance in Test 10, but chose to conduct 11 experiments in total.
6.4.4 Adaptive Experimental Design Using Prior Knowledge
In the second scenario, referred to as Scenario 2, the system was evaluated in a 5-waypoint mission that
started in a forest, progressed through a field, down a road, and to the top of a large hill totaling over
200m. This mission exposes the robot to several different ground surfaces, including dirt, concrete, and
grass, as well as cluttered and clear environments to plan over. Similar to Scenario 1, the experimenter
required full execution of the mission (G = 5) with zero interventions, zero collisions, and a relatively
high qualitative rating (N = 0, C = 0, H = 3), while the experimenter-defined threshold for acceptable
waypoint distance was D(j) = {3.83, 5.43, 9.95, 9.95, 9.95} meters for waypoints j = 1, 2, . . . , 5.
Uniquely, the experimenter had prior knowledge of system performance for these tests from the experimentation conducted in Scenario 1. The operational environments used in these two scenarios have both
comparable and mutually exclusive characteristics that led the experimenter to believe the system could
perform similarly, but not exactly the same in the two scenarios. Both environments require traversal
through a forest and on a paved road; and while Scenario 1 requires navigation on a gravel road, Scenario
2 requires navigation in a field and up a large hill. As a result, the experimenter chose to use an initial
configuration that was nearly identical to the final configuration from the previous scenario. The only
difference was that the experimenter reduced the maximum velocity from 3.0m/s to 1.5m/s, which was influenced by observations of failed, high-speed tests during Scenario 1, safety concerns, and the uncertainty
of system performance in the new environment.
This scenario is summarized in Table 6.2. Interestingly, the robot was unable to navigate to the first
waypoint using the initial system configuration because the planner could not find any feasible paths in
the forest. Due to a-priori system knowledge the experimenter investigated the obstacle height threshold
parameter of the mapper over the next two tests to allow the robot to find a collision-free path out of the
forest as well as along the road with overgrown grass on either side. After the robot was able to execute
the entire mission in Test 3, but experienced some stalling at lower speeds, the experimenter targeted
improving performance by increasing the linear velocity. They also disabled loop closure in the mapping
component because they believed the risk of false positives was unnecessary in a mission that does not
require navigation that revisits the same location. Next, the experimenter recalled that the MPPI local
planner produced the two fastest runs from Scenario 1 (Tests 6 and 10 in Table 6.1) so they decided to
use this algorithm for Test 5 and it led to the first test that completed the mission and satisfied all of the
constraints. To gain greater confidence in the repeatability of this result, the experimenter conducted Test
6 using the same configuration as Test 5, but the robot failed due to significant map drift. The experimenter
hypothesized that the Geo-Fog IMU was introducing error because it was not configured to use PTP time
synchronization, whereas the robot from the previous test used the Microstrain IMU with time synchronization. Using this information and their observations from Scenario 1, the experimenter conducted Tests
7 and 8 using the Microstrain IMU; this change led to the two best missions thus far due to consistently
less map drift. Finally, the experimenter decided to conduct exploratory Tests 9 and 10 in an effort to improve
the qualitative rating. They did this by increasing the radius of inflated obstacles so that path planning
would be more centered on the road; however, this had the unintended consequence of decreasing planning performance in the forest where there are sporadic obstacles. As a result, the experimenter reverted
back to the configuration of Test 8 for Test 11 and observed another high-quality run. With three observations of similar, acceptable performance (Tests 7, 8, and 11), the experimenter chose to stop conducting
additional tests. Again, the experimenter chose to conduct 11 experiments in total, but in this scenario the
experimenter observed the best quantitative and qualitative performance in Test 8.
Table 6.1: Results from Scenario 1 of an adaptive experimental design. Components the experimenter explored were the: mapper (M), global planner (GP), local planner (LP), and controller
(C). Test assessments include: failure (F), success with unsatisfied constraints (SU), and success
with fully satisfied constraints (S).
Test ID | Targeted Components | Configuration Change | T_1(x) (seconds) | g_2(x) | g_3(x) | g_4(x) | g_5(x) | Assessment
1 - None (initial run) 448.34 3 1 0 1 F
2 M Reduced unknown cell cost 112.40 4 0 0 3 S
3 GP Increased replanning frequency 94.09 4 0 0 3 S
4 GP None (repeat) 125.36 4 0 1 2 SU
5 GP, LP Switched to GLS+MPPI 184.22 4 2 0 2 SU
6 GP Switched to SBPL 91.70 4 0 0 3 S
7 LP, C Switched to NLOPT, increased velocity 31.07 1 1 0 1 F
8 C Decreased velocity 146.56 2 1 0 1 F
9 M Increased frame alignment requirement 164.45 4 1 0 2 SU
10 LP Switched to MPPI 92.05 4 0 0 4 S
11 GP Switched to GLS 101.95 4 0 0 3 S
Figure 6.1: Top-down map of Test 1 in Scenario 1. Light gray regions indicate free space in the map built
online by the robot, blue discs indicate waypoints, bold cyan lines show the autonomous robot's path,
light green lines represent replans, and red lines represent remote control by the experimenter. In this
case, the test failed using the experimenter’s initial system configuration.
Figure 6.2: Top-down map of Test 2 in Scenario 1. In this case, the experimenter chose to reduce the
unknown cell cost after Test 1, which enabled planning to the final waypoint. Recall, the color representations are explained in the caption of Fig. 6.1.
Table 6.2: Results from Scenario 2 of an adaptive experimental design where the experimenter
leveraged prior knowledge. In addition to the component definitions in Table 6.1, the experimenter also explored the hardware (HW).
Test ID | Targeted Components | Configuration Change | T_1(x) (seconds) | g_2(x) | g_3(x) | g_4(x) | g_5(x) | Assessment
1 C Reduced velocity 107.51 0 1 0 1 F
2 M Increased height threshold 356.90 4 0 0 2 F
3 M Reduced height threshold slightly 300.96 5 0 0 2 SU
4 C, M Increased velocity, disabled loop closure 256.57 5 2 0 1 SU
5 LP Switched to MPPI 163.83 5 0 0 3 S
6 LP None (repeat) 223.65 3 2 1 1 F
7 HW Switched to Microstrain IMU 160.78 5 0 0 4 S
8 HW None (repeat) 154.06 5 0 0 4 S
9 M Increase global and local costmap inflation 330.31 5 2 0 1 SU
10 M Reverted global costmap inflation 263.04 5 2 0 2 SU
11 M Reverted local costmap inflation 155.80 5 0 0 4 S
Figure 6.3: Top-down map of Test 3 in Scenario 2 where the experimenter leveraged prior knowledge. In
this case, the robot navigates the mission successfully, but the experimenter rates the qualitative score
unacceptably low. Recall, the color representations are explained in the caption of Fig. 6.1.
Figure 6.4: Top-down maps of Test 4 in Scenario 2 where the experimenter leveraged prior knowledge. In
this case, the experimenter chose to increase the robot’s forward velocity and disable loop closure, which
produced a faster completion time but required more interventions and achieved a lower qualitative score.
Recall, the color representations are explained in the caption of Fig. 6.1.
6.5 Analysis
From the two scenarios in our case study, we notice several interesting trends that could contribute to
the realization of a risk-managing advisory system for adaptive experimental design with preferences.
First, we see that throughout the case study the experimenter decomposed the problem of selecting input
configurations of a high-dimensional system into simpler sub-problems that targeted specific sub-systems
in each test. The experimenter typically only changed parameters for a single component, and never more
than two, which helped keep system exploration and exploitation tractable. It is also worth noting that the
experimenter demonstrated explicit acts of risk management in their configuration of Test 1 in Scenario
2 by intentionally reducing the linear velocity of the platform due to concerns about personal and system
safety in a new environment.
For component configuration, we observe that the experimenter demonstrated several examples of
choosing to explore or exploit their belief of system performance and the information gained from previous
tests. Tests 5 and 6 in Scenario 1, for example, explored the impact of novel global and local planners and
Tests 9−11 in Scenario 2 collected new data points for exploring the qualitative impact of unique mapping
settings. As for exploitation, the experimenter justified the configurations of Tests 2, 3, 7, and 9 of Scenario
1, and Tests 2-5 and 7 of Scenario 2, as specifically seeking performance improvements. In both cases of
exploration and exploitation, these tests resemble feasible suggestions from EI or KG acquisition functions
because they correspond to strategically maximizing improvement.
The experimenter also exhibited tendencies of choosing experiment configurations that sought to reduce uncertainty about system performance. Tests 4, 8, and 10-11 in Scenario 1 as well as Tests 6 and 8 of
Scenario 2 were repeated experiments of identical, or extremely similar, prior configurations. Not only did
this confirm that the observations of system performance have some variance, but two experiments (Test 8
of Scenario 1 and Test 6 of Scenario 2) failed, revealed critical issues with the system configuration, and
were immediately followed by an experiment that sought to maximize performance. Future advisory systems might leverage PES to model this type of decision making because the experimenter actively sought
to reduce the uncertainty of performance associated with some experiment configuration.
In terms of preferences, the experimenter noted that they generally sought to characterize the system performance over minimizing experimentation costs in Scenario 1 because they had minimal a-priori
knowledge, whereas they prioritized reducing costs in the second scenario because they had already built
greater working knowledge. We deduce that experimenters might choose w_1 ≥ w_2 in experiments where
they feel their belief model is not sufficiently informed, such as initial experimentation with a new system,
and w_1 ≤ w_2 if they have relevant prior knowledge.
6.6 Summary
This chapter identified a need for formal decision-making methods of adaptive experimental design for
field robotics. We seek to build a Stage 3 DSS, intended for both novice and experienced experimenters,
that can leverage newly-acquired information from previous experiments to inform the selection of subsequent experiments. Toward realizing such a system, we present and analyze a case study of off-road,
autonomous navigation from the perspective of Active Learning. We draw connections between Bayesian
Optimization and observations from an adaptive experimental design, and note that several different acquisition functions would likely be necessary to model the decisions in our case study, including EI, KG,
and PES.
Using the case study insights and BO-based mathematical formulation presented in this chapter, there
are several fruitful avenues of future work. First, literature would benefit from the construction of a BO-based Stage 3 DSS, which will require an investigation into the selection of kernel functions for the GP
and acquisition functions for the query point selection strategy. Exploring the applicability of EI, KG,
and PES could be fruitful given that our case study shows evidence of human decision making strategies
that resemble these acquisition functions. Likewise, existing works in deciding whether to replicate an
experiment or explore elsewhere in the state space, such as [26] which focuses on computer simulation
experiments, could guide the development of GPs for Stage 3 DSSs.
An in-depth characterization of GP construction and deployment is especially important because literature has shown that BO approaches are oftentimes highly sensitive to the surrogate model performance.
Literature has already documented several challenges with using BO in real-world settings so these will
likely present similar technical hurdles in the application for experimental design. Most notably, the curse
of dimensionality will need to be managed. Proposed BO approaches typically restrict the input dimensionality to less than or equal to 20, BO can be slow to converge in high-dimensional spaces, and GP models
have been shown to scale poorly. The authors of [26] use integrated mean-squared prediction error (IMSPE) and propose a lookahead scheme that is both computationally tractable and designed to assist with
the question of exploration versus replication. Additionally, the authors of [150] propose an acquisition
function to quantify information gain of state-action pairs in the optimal solution of Markov decision processes for plasma control for nuclear fusion. We hypothesize that experiment configurations will likely
require intelligent representations to be sufficiently small, and underlying mathematical frameworks will
need to focus on tractability, to manage computational complexity for real-world applicability.
Chapter 7
Conclusions
This chapter presents the intellectual contributions and anticipated benefits from the work proposed in
this dissertation as well as future research directions to build on existing efforts.
7.1 Intellectual Contributions
This dissertation makes several contributions toward realizing DSSs for adaptive experimental design of
field robots, specifically, off-road ground vehicles.
1. Development of a six-stage taxonomy. In Chapter 3 I introduced a taxonomy and new concept
of operations for experimental design in field robots, which takes the form of a human experimenter
designing experiments using a DSS that produces tailored decision support. This is significantly
different than modern day experimentation in field robotics. My taxonomy serves as a roadmap for
DSS development and outlines technical gaps and specific future research directions to facilitate the
research community in the pursuit of effective decision support.
2. Development and evaluation of a Stage 1 DSS. I proposed and evaluated a Stage 1 DSS in Chapter 4 where initial results from an exploratory study demonstrate that 1) experienced field roboticists
make suboptimal decisions in their experimental design, which suggests there exists an opportunity for DSSs to have an appreciable impact in RDT&E; 2) a simple checklist-style questionnaire
can help experimenters with conceptually formulating experiments to avoid suboptimal decision
making; and 3) DSS users generally find experimentation decision support useful and not burdensome, which provides promise for DSS adoption. The most notable outcome from this investigation
is the importance of proactive decision support, which necessitates the development of Stage 2-5
DSSs.
3. Development and evaluation of a Stage 2 DSS. To address the need for proactive decision support
I designed a Stage 2 DSS using Graph Neural Networks to generate alerts for the experimenter
regarding their experimental design. To the best of my knowledge, this is the first GNN-based DSS for
experimental design of field robots. A human subjects user study revealed that alert generation is an
effective tool in further assisting experienced roboticists with reducing the number of mistakes and
suboptimal decisions in experimental design. In addition to an exploration of the state-of-the-possible
for proactive decision support, my findings in Chapter 5 uncovered the need for experimental design
recommendation, where participants were promptly alerted of an experimental design shortcoming
but unable to resolve the issue.
4. Real-world case study evaluation to inform the development of a Stage 3 DSS. The retrospective case study I conducted in Chapter 6 provided exemplary insights into the sequential decision making process of experimenters performing experiments on autonomous ground vehicles
in complex environments. Using these insights, I outlined an architecture for a Stage 3 DSS from
the perspective of using Bayesian Optimization to recommend high-value subsequent experimental
designs.
7.2 Anticipated Benefits
The anticipated benefits of this work are threefold. First, equipped with new AI-enabled RDT&E tools and
concepts such as next-generation DSSs, researchers and systems engineers should gain improved decision making abilities so that experiments are of higher value and less wasteful. DSSs can provide new
or more useful information in a way that human experimenters can leverage to construct experiments
tailored to their objectives, more so than if they had no support. Decision aid tools should specifically assist experimenters and system engineers with quantitative analysis, uncertainty quantification, and bias
mitigation when testing and evaluating learning-based systems. Importantly, while DSSs and other decision aid tools should help augment the experimenter’s decision making abilities, the human will retain
decision authority so that they can ensure selected experiments align with their prioritized experimental
objectives and risk tolerance in the context of a desired field application. This will be especially important as RDT&E is applied to different robotic systems, including the various ground, aerial, legged, and
multi-robot systems, as shown in Figure 7.1.
Figure 7.1: Different experimental robotic systems that could benefit from AI-enabled RDT&E tools using
insights and approaches investigated in this dissertation.
Second, frameworks and insights from this dissertation could help produce data-efficient methods for
providing decision support and more intelligent experimental design. This dissertation investigated how
to represent experimental designs in novel ways, such as graphs and Bayesian Optimization formulations,
so that they are machine interpretable and can be used for reasoning efficiently. The Stage 1 DSS developed
in this work required no a priori data and had virtually no integration cost. The Stage 2 DSS was designed
using the simplest graph formulation to illustrate how alert generation could be performed efficiently and
still provide some benefit to decision making, and Bayesian Optimization was specifically investigated for
a Stage 3 DSS to explore data-efficient reasoning over select aspects of sequential decision making for
experimental design.
Finally, experiments that leverage DSSs similar to those explored in this dissertation should increase
the level of trust and explainability humans have in experimental field robotics. We anticipate that DSS
development for experimental design will have downstream impact on V&V methodology, assurance case
construction, and system transparency that are common pre-requisites to building trust, explainability,
and rapid deployment of learning-based systems. This is because experimenters who are better equipped
to select experiments will more efficiently find system strengths and deficiencies that will inform future experiments, research, and development. A more holistic view of the performance and limitations of complex
robotic systems gained through such experimentation should reveal correlational and causal relationships
between the robotic system and testing conditions that would help experimenters explain system behavior
and build trust.
7.3 Future Directions
There are several efforts building off the work presented in this dissertation that could have immediate impact and advance the capabilities of Stage 2 and 3 DSSs. First, as alluded to in Chapter 5, HetGNNs
could be a key enabling technology in a Stage 2 DSS for more expressive representations of experimental designs and richer alert conditions. The alert conditions explored in this work were based on structural information with respect to a high-level representation of the system diagram and non-structural information
with respect to experimentation context. The ability to define different features per node and edge could
enable the encoding of important structural and non-structural information that follows an experimenter’s
intuition and aligns with more transparent system understanding; a toy sketch of such a representation follows this paragraph. This type of expressive experimental design representation is also likely necessary for sufficiently accurate static analysis due to the sophisticated
component and environmental interactions of autonomous ground systems. Based on the existing GNN literature, the curse of dimensionality is expected to be a primary challenge in designing HetGNNs for experimental design, so representation learning and experimental design encodings will likely be a focus
of future work. Along these lines, future work for Stage 3 DSSs using Bayesian Optimization will likely
face similar concerns regarding the curse of dimensionality. DSSs for experimental design would benefit
from efficient ways to represent experimental inputs, design components, and observations so that the
feasible set dimensionality is reduced but the saliency of information used for global optimization is not
lost. Experimenters may also find value in using Bayesian Optimization for a subset of the experimental
design, i.e., only deciding whether to include a certain hardware or software component, or choosing a
certain parameter value, rather than the entire experimental input configuration. Furthermore, specifically
for the BO-based DSS as discussed in Chapter 6, open questions remain surrounding the construction of
the GP kernel and acquisition functions, which are likely experimenter-, system-, and objective-specific.
It’s possible that an experimenter’s experimental design strategy changes over the course of conducting
several tests and, as a result, future DSSs will require new or multiple acquisition functions to provide
relevant decision support.
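The toy sketch below, referenced in the preceding paragraph, encodes a hypothetical experimental design as a heterogeneous graph in which each node type and edge type carries its own feature set, together with one illustrative structural alert rule. The component names, features, and rule are assumptions made purely for illustration and do not correspond to the GNN formulation evaluated in Chapter 5.

from dataclasses import dataclass, field

@dataclass
class Node:
    node_type: str                 # e.g., "sensor", "software", "environment"
    features: dict = field(default_factory=dict)

@dataclass
class Edge:
    src: str
    dst: str
    edge_type: str                 # e.g., "publishes_to", "operates_in"
    features: dict = field(default_factory=dict)

@dataclass
class ExperimentGraph:
    nodes: dict = field(default_factory=dict)   # node name -> Node
    edges: list = field(default_factory=list)

    def add_node(self, name, node_type, **features):
        self.nodes[name] = Node(node_type, features)

    def add_edge(self, src, dst, edge_type, **features):
        self.edges.append(Edge(src, dst, edge_type, features))

# Hypothetical design: a lidar feeding a kinodynamic planner tested on a slope.
g = ExperimentGraph()
g.add_node("lidar", "sensor", rate_hz=10, max_range_m=100)
g.add_node("planner", "software", lookahead_s=3.0, replan_hz=5)
g.add_node("test_site", "environment", slope_deg=15, vegetation="sparse")
g.add_edge("lidar", "planner", "publishes_to", topic="/points", latency_ms=30)
g.add_edge("planner", "test_site", "operates_in", max_speed_mps=3.0)

# Illustrative structural alert rule: every software node should receive sensor input.
for name, node in g.nodes.items():
    if node.node_type == "software":
        has_sensor_input = any(e.dst == name and g.nodes[e.src].node_type == "sensor"
                               for e in g.edges)
        print(f"{name}: {'ok' if has_sensor_input else 'ALERT: no sensor input'}")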
There are also exciting avenues for Stage 4 and 5 DSSs that were not explored in this dissertation.
Specifically, emerging advancements in Generative AI, such as GFlowNets [115], could lead to the realization of proactive experimental design recommendation capabilities. GFlowNets learn to sample from a
distribution given by a reward function and can be used to form efficient and amortized Bayesian posterior
estimators for models conditioned on experimental data. As noted by the authors of [115], such posterior
models can estimate epistemic uncertainty and information gain that could be paramount in focusing
experimental design policies. The authors note that GFlowNets can become valuable tools specifically in
scenarios of very large candidate spaces where there is access to cheap but inaccurate measurements or to
expensive but accurate measurements. GFlowNets for experimental design of autonomous ground vehicles are envisioned to be used for recommending candidate experiments for a human to select from, where
the candidate space could be populated with cheap, inaccurate simulation-based results or expensive but
accurate field experiment-based empirical results. The value of GFlowNets was previously demonstrated
in the work of [114], where an Active Learning algorithm generated diverse and informative candidates
for experimental design in a biological sequence study on the anti-microbial activity of peptides.
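As a toy stand-in for the sampling behavior a trained GFlowNet would provide, the sketch below draws candidate experiments with probability proportional to a reward by direct enumeration. A real GFlowNet amortizes this sampling over candidate spaces far too large to enumerate; the reward function and candidate fields here are assumptions made only for illustration.

import numpy as np

def sample_candidates(candidates, reward_fn, n_samples=2, rng=None):
    # Target distribution that a GFlowNet learns to approximate: p(x) proportional to R(x).
    rng = rng or np.random.default_rng()
    rewards = np.array([reward_fn(c) for c in candidates], dtype=float)
    probs = rewards / rewards.sum()
    idx = rng.choice(len(candidates), size=n_samples, replace=False, p=probs)
    return [candidates[i] for i in idx]

# Hypothetical reward: a cheap simulation score penalized by estimated field cost.
def reward_fn(config):
    return max(config["sim_score"] - 0.1 * config["field_cost"], 1e-6)

candidates = [{"name": f"exp_{i}", "sim_score": s, "field_cost": c}
              for i, (s, c) in enumerate([(0.9, 4), (0.6, 1), (0.8, 2), (0.3, 1)])]
for cand in sample_candidates(candidates, reward_fn, rng=np.random.default_rng(1)):
    print(cand["name"])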
Looking beyond the work in this dissertation, the scientific community can draw inspiration from,
adapt, and build upon advancements in AI, ML, HRI, and safe learning to revolutionize experimental design,
which will include emerging autonomy in experimentation technologies that enhance the experimenter’s
decision-making ability and trust in autonomous robots. In the short-term, research in experimental methods of autonomous robots could focus on a collective effort to align the scientific community’s interests,
needs, and existing knowledge. The pursuit of a broader taxonomy to classify the existing experimental
methods currently being used across greater domains and applications in the robotics research ecosystem
would provide the added benefit of developing a standard terminology, which is especially important given
that RDT&E is multidisciplinary. The community could also develop a common ontology or framework
for specifying relationships and constraints of conceptual or existing experimental methods. This could
help to identify courses of action, research roadmaps, and paths to leverage AI, ML, HRI, and safe learning effectively. The community should collectively build, maintain, and refine one or more knowledge
bases for experimental design because of the critical role they play in decision making and decision support. Knowledge bases will aggregate lessons learned from previous experimental procedures and facilitate
RDT&E to expedite the deployment of trustworthy, explainable autonomous robots. Multiple knowledge
bases that span different domains could offer the benefit of overarching best practices along with specialized knowledge for a given domain, for example one for autonomous ground vehicles and a separate one for
manipulation. The curation of such knowledge bases could also make the creation of applicable Expert
Systems more feasible.
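As a simplified illustration of what an entry in such a knowledge base might contain, the sketch below defines a hypothetical schema; the field names and example values are assumptions for illustration rather than a proposal drawn from any existing community effort.

import json
from dataclasses import dataclass, asdict, field
from typing import List

@dataclass
class KnowledgeBaseEntry:
    domain: str                          # e.g., "autonomous ground vehicles"
    platform: str                        # system under test
    experiment_objective: str
    environment_conditions: List[str]
    design_decisions: List[str]          # what was varied and why
    outcome_summary: str
    lessons_learned: List[str]
    references: List[str] = field(default_factory=list)

entry = KnowledgeBaseEntry(
    domain="autonomous ground vehicles",
    platform="skid-steer UGV with lidar and stereo cameras",
    experiment_objective="evaluate waypoint navigation on sloped, vegetated terrain",
    environment_conditions=["15 deg slope", "sparse vegetation", "dry soil"],
    design_decisions=["replanning rate fixed at 5 Hz", "max speed swept from 1 to 3 m/s"],
    outcome_summary="tracking error grew nonlinearly above 2 m/s on slopes",
    lessons_learned=["log wheel slip explicitly", "replicate runs near the speed knee"],
)

# Entries serialized as JSON could be aggregated, searched, and later consumed by
# DSSs for alert generation or experiment recommendation.
print(json.dumps(asdict(entry), indent=2))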
In the long-term, the community could leverage AI and ML to catalyze RDT&E with respect to new
capabilities designed for more efficient scientific discovery, knowledge representations, and experimental
methods. New tools could assist experimenters and system designers with state space exploration and
quantitative analysis. Uncertainty estimation appears to be critical in the sequential decision making process of experimental design, and emerging techniques from conformal prediction may prove to be valuable
for determining precise levels of confidence in new predictions, which could help guide the selection of successive experiments [21] (a minimal sketch follows this paragraph). Generative AI presents a monumental opportunity in RDT&E to boost human
decision making in the design, development, and testing stages. Generative AI-based tools could provide
enhanced RDT&E efficiency, reasoning, inference, and recommendations, as well as contribute to predictive theory that increases the value of experiments. Future experimental methods could introduce the
necessary knobs for an experimenter to perform on-the-fly constraint satisfaction, system characterization, and risk-reward analysis that otherwise isn’t feasible. This will be especially important as researchers
conduct experiments with multi-agent systems and more complex experimental objectives. Greater understanding of research-level autonomy will, in turn, have cascading effects in systems engineering and
TEV&V because robots will be better understood throughout the maturation process, which will accelerate the rate
of realizing and deploying trustworthy autonomous systems. Opportunities for HRI in AI-enabled RDT&E
could involve imitation learning, learning from demonstration, human modeling, hybrid intelligence, and
knowledge representation for tailoring experiments to the specific experimenter. Interactive experimental
design techniques may also benefit from incorporating physical grounding and semantics so that humans
and robots can collaboratively communicate salient information and observations pertaining to experimental results and decisions. Techniques from safe learning and optimization, such as active learning,
could help improve robustness, manage the inherent experimental risk, and formulate experiments that
account for user-specific risk preferences.
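The minimal sketch below, referenced in the preceding paragraph, illustrates split conformal prediction: a held-out calibration set turns any point predictor into prediction intervals with approximately the desired coverage. The linear predictor and synthetic data are placeholders, not models or measurements from this dissertation.

import numpy as np

def conformal_interval(predict, X_calib, y_calib, X_new, alpha=0.1):
    # Nonconformity scores: absolute residuals on the held-out calibration set.
    residuals = np.abs(y_calib - predict(X_calib))
    n = len(residuals)
    # Finite-sample-corrected quantile of the residuals.
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    q = np.quantile(residuals, q_level)
    preds = predict(X_new)
    return preds - q, preds + q            # ~(1 - alpha) marginal coverage

# Toy usage with a simple least-squares predictor on synthetic data.
rng = np.random.default_rng(0)
X_train = rng.random((50, 1)); y_train = 2.0 * X_train[:, 0] + rng.normal(0, 0.1, 50)
X_calib = rng.random((25, 1)); y_calib = 2.0 * X_calib[:, 0] + rng.normal(0, 0.1, 25)
coef = np.linalg.lstsq(X_train, y_train, rcond=None)[0]
predict = lambda X: X @ coef
lower, upper = conformal_interval(predict, X_calib, y_calib, rng.random((3, 1)))
print(np.column_stack([lower, upper]))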
Finally, the community at large should consider the broader implications of DSSs for experimental
design in parallel with technical efforts to maximize acceptance and impact of future developments. Perhaps lessons from the decline of ESs in the 1990s can provide guiding principles on the development and
deployment of future DSSs throughout the science and engineering fields. For example, the authors of
[76] note that successes of an ES in the technical or economic sense did not guarantee high levels of
adoption or long-term use, and other managerial and organizational issues led to systems falling into disuse
or abandonment. Common themes in the decline and disuse of ESs that may be applicable to DSS development include changes in desired tasks, high long-term maintenance costs, failure to recognize
the size of the task domain, and a lack of focus on solving problems that users perceived to be critical.
For future DSS development, researchers should have a clear understanding of the experimental design
tasks that experimenters feel are the most important so that decision support techniques can be properly
scoped and DSSs can offer direct benefit to experimenters. If a DSS requires data, integration costs,
or maintenance that outweigh the perceived benefit of the decision support it offers, the research
and engineering communities will likely not make use of such advancements.
7.4 Opportunities to Bridge the HRI Gap
The investigations and development of DSSs in this work have interesting implications for the pursuit
of bridging the HRI gap between academia and industry because they serve to identify critical needs and
avenues for future research in decision support for human-in-the-loop experimental design that could accelerate research and development in the robotics community. First, we believe that further investigation
into directed thought analysis and structure for experimental design can help strengthen communication
between academia and industry, specifically regarding more realistic quantification of the state-of-the-art
and key technological gaps, including prioritized problems of interest and shortcomings in systems and
knowledge. Such efforts should lead to more rigorous methods for greater reproducibility, principled experimentation of AI/ML-enabled technologies, and the explainability of complex, autonomous systems.
This aligns with recent trends in the scientific community where code, data, and implementations are expected to be submitted with publications to further progress the state-of-the-art and ensure reproducible
results. The scientific community will collectively benefit by having more systematic ways to document,
communicate, and build on experimental designs. DSSs specifically appear to be a promising path forward
in experimental design and could provide general frameworks, such as the questionnaire-based solution
proposed here, across broad robotic domains as well as be tailored for specific applications or fields of
interest to academia and industry. To this end, academia and industry could benefit from a way to collectively build knowledge bases about common experimental design procedures and insights. A joint effort
between academia and industry to build knowledge bases for DSSs could enable more sophisticated DSS
functionality involving AI and ML that expedite knowledge acquisition, scientific discovery, and improved
decision making. Building such databases is oftentimes difficult, but a concerted effort to align the community’s experimental design procedures could lay the groundwork for collecting the necessary knowledge
products.
Bibliography
[1] Riccardo Accorsi, Riccardo Manzini, and Fausto Maranesi. “A decision-support system for the
design and management of warehousing systems”. In: Computers in Industry 65.1 (2014),
pp. 175–186.
[2] Leonard Adelman. “Experiments, quasi-experiments, and case studies: A review of empirical
methods for evaluating decision support systems”. In: IEEE Transactions on Systems, Man, and
Cybernetics 21.2 (1991), pp. 293–301.
[3] Afsoon Afzal, Deborah S Katz, Claire Le Goues, and Christopher S Timperley. “A study on the
challenges of using robotics simulators for testing”. In: arXiv preprint arXiv:2004.07368 (2020).
[4] Afsoon Afzal, Deborah S Katz, Claire Le Goues, and Christopher S Timperley. “Simulation for
robotics test automation: Developer perspectives”. In: 2021 14th IEEE Conference on Software
Testing, Verification and Validation (ICST). IEEE. 2021, pp. 263–274.
[5] Charu C Aggarwal, Xiangnan Kong, Quanquan Gu, Jiawei Han, and S Yu Philip. “Active learning:
A survey”. In: Data Classification: Algorithms and Applications. CRC Press, 2014, pp. 571–605.
[6] Ali Agha, Kyohei Otsu, Benjamin Morrell, David D Fan, Rohan Thakker,
Angel Santamaria-Navarro, Sung-Kyun Kim, Amanda Bouman, Xianmei Lei, Jeffrey Edlund, et al.
“Nebula: Quest for robotic autonomy in challenging environments; team costar at the darpa
subterranean challenge”. In: arXiv preprint arXiv:2103.11470 (2021).
[7] Nuha Aldausari, Arcot Sowmya, Nadine Marcus, and Gelareh Mohammadi. “Video generative
adversarial networks: a review”. In: ACM Computing Surveys (CSUR) 55.2 (2022), pp. 1–25.
[8] Hesham Alghodhaifi and Sridhar Lakshmanan. “Autonomous Vehicle Evaluation: A
Comprehensive Survey on Modeling and Simulation Approaches”. In: IEEE Access (2021).
[9] Steven Alter. “A taxonomy of decision support systems”. In: Sloan Management Review (pre-1986)
19.1 (1977), p. 39.
[10] Sarah C Andersen, Kathrine L Møller, Simon W Jørgensen, Lotte B Jensen, and Morten Birkved.
“Scalable and quantitative decision support for the initial building design stages of
refurbishment”. In: Journal of Green Building 14.4 (2019), pp. 35–56.
[11] Virgil L Anderson and Robert A McLean. Design of experiments: a realistic approach. CRC Press,
2018.
[12] Jiju Antony. Design of experiments for engineers and scientists. Elsevier, 2014.
[13] US Army. “Verification, Validation and Accreditation of Army Models and Simulations”. In:
Department of the Army Pamphlet PAM (1993), pp. 5–11.
[14] Jay E Aronson, Ting-Peng Liang, and Richard V MacCarthy. Decision support systems and
intelligent systems. Vol. 4. Pearson Prentice-Hall Upper Saddle River, NJ, USA: 2005.
[15] Asefeh Asemi, Andrea Ko, and Mohsen Nowkarizi. “Intelligent libraries: a review on expert
systems, artificial intelligence, and robot”. In: Library Hi Tech 39.2 (2020), pp. 412–434.
[16] Anthony Atkinson, Alexander Donev, and Randall Tobias. Optimum experimental designs, with
SAS. Vol. 34. Oxford University Press, 2007.
[17] Shady Attia, Elisabeth Gratia, André De Herde, and Jan LM Hensen. “Simulation-based decision
support tool for early stages of zero-energy building design”. In: Energy and buildings 49 (2012),
pp. 2–15.
[18] Iman Avazpour, Teerat Pitakrat, Lars Grunske, and John Grundy. “Dimensions and metrics for
evaluating recommendation systems”. In: Recommendation systems in software engineering.
Springer, 2014, pp. 245–273.
[19] Radhakisan Baheti and Helen Gill. “Cyber-physical systems”. In: The impact of control technology
12.1 (2011), pp. 161–166.
[20] Krishnakumar Balasubramanian, Aniruddha Gokhale, Gabor Karsai, Janos Sztipanovits, and
Sandeep Neema. “Developing applications using model-driven design environments”. In:
Computer 39.2 (2006), pp. 33–40.
[21] Vineeth Balasubramanian, Shen-Shyang Ho, and Vladimir Vovk. Conformal prediction for reliable
machine learning: theory, adaptations and applications. Newnes, 2014.
[22] Theodore A Bapty, Jason Scott, Sandeep Neema, and Robert Owens. “Integrated modeling and
simulation for cyberphysical systems extending multi-domain M&S to the design community”.
In: Proceedings of the Symposium on Model-driven Approaches for Simulation Engineering. 2017,
pp. 1–12.
[23] Avital Bechar and Clement Vigneault. “Agricultural robots for field operations: Concepts and
components”. In: Biosystems Engineering 149 (2016), pp. 94–111.
[24] Jenay M Beer, Arthur D Fisk, and Wendy A Rogers. “Toward a framework for levels of robot
autonomy in human-robot interaction”. In: Journal of human-robot interaction 3.2 (2014), p. 74.
[25] Kirstie L Bellman and Christopher Landauer. “Towards an integration science: The influence of
Richard Bellman on our research”. In: Journal of Mathematical Analysis and Applications 249.1
(2000), pp. 3–31.
[26] Mickaël Binois, Jiangeng Huang, Robert B Gramacy, and Mike Ludkovski. “Replication or
exploration? Sequential design for stochastic simulation experiments”. In: Technometrics 61.1
(2019), pp. 7–23.
[27] Urs Birchler and Monika Bütler. Information economics. Routledge, 1999.
[28] Benjamin S Blanchard, Wolter J Fabrycky, and Walter J Fabrycky. Systems engineering and
analysis. Vol. 4. Prentice hall Englewood Cliffs, NJ, 1990.
[29] Robert H Bonczek, Clyde W Holsapple, and Andrew B Whinston. Foundations of decision support
systems. Academic Press, 2014.
[30] Jon Bornstein. DoD autonomy roadmap: autonomy community of interest. Tech. rep. Army
Research Laboratory Aberdeen Proving Ground United States, 2015.
[31] George EP Box, William H Hunter, Stuart Hunter, et al. Statistics for experimenters. Vol. 664. John
Wiley and sons New York, 1978.
[32] Jeffrey M Bradshaw, Paul J Feltovich, Hyuckchul Jung, Shriniwas Kulkarni, William Taysom, and
Andrzej Uszok. “Dimensions of adjustable autonomy and mixed-initiative interaction”. In:
International Workshop on Computational Autonomy. Springer. 2003, pp. 17–39.
[33] Bruce G Buchanan and Reid G Smith. “Fundamentals of expert systems”. In: Annual review of
computer science 3.1 (1988), pp. 23–58.
[34] Cristian Cadar, Daniel Dunbar, Dawson R Engler, et al. “Klee: Unassisted and automatic
generation of high-coverage tests for complex systems programs”. In: OSDI. Vol. 8. 2008,
pp. 209–224.
[35] Jinkang Cai, Weiwen Deng, Haoran Guang, Ying Wang, Jiangkun Li, and Juan Ding. “A survey on
data-driven scenario generation for automated vehicle testing”. In: Machines 10.11 (2022), p. 1101.
[36] Roberto Calandra, André Seyfarth, Jan Peters, and Marc Peter Deisenroth. “Bayesian optimization
for learning gaits under uncertainty”. In: Annals of Mathematics and Artificial Intelligence 76.1
(2016), pp. 5–23.
[37] Giuseppina Lucia Casalaro, Giulio Cattivera, Federico Ciccozzi, Ivano Malavolta,
Andreas Wortmann, and Patrizio Pelliccione. “Model-driven engineering for mobile robotic
systems: a systematic mapping study”. In: Software and Systems Modeling 21.1 (2022), pp. 19–49.
[38] Mateo Guaman Castro, Samuel Triest, Wenshan Wang, Jason M Gregory, Felix Sanchez,
John G Rogers, and Sebastian Scherer. “How does it feel? Self-supervised costmap learning for
off-road vehicle traversability”. In: 2023 IEEE International Conference on Robotics and Automation
(ICRA). IEEE. 2023, pp. 931–938.
[39] Kathryn Chaloner and Isabella Verdinelli. “Bayesian experimental design: A review”. In:
Statistical Science (1995), pp. 273–304.
[40] Crystal Chao, Maya Cakmak, and Andrea L Thomaz. “Transparent active learning for robots”. In:
5th ACM/IEEE International Conference on Human-Robot Interaction (HRI). 2010, pp. 317–324.
[41] Zheng Chen, Zhengming Ding, Jason M Gregory, and Lantao Liu. “IDA: Informed Domain
Adaptive Semantic Segmentation”. In: arXiv preprint arXiv:2303.02741 (2023).
[42] David Roxbee Cox. “Planning of experiments”. In: American Psychological Association (1958).
[43] David Roxbee Cox and Nancy Reid. The theory of the design of experiments. CRC Press, 2000.
[44] Eric Damm, Jason M Gregory, Eli Lancaster, Felix Sanchez, Daniel Sahu, and Thomas Howard.
“Terrain-Aware Kinodynamic Planning with Efficiently Adaptive State Lattices for Mobile Robot
Navigation in Off-Road Environments”. In: 2023 IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS). IEEE. 2023.
[45] DARPA. Competency-Aware Machine Learning (CAML).
https://www.darpa.mil/program/competency-aware-machine-learning. Accessed: 2023-06-03.
[46] DARPA. Symbiotic Design for Cyber Physical Systems.
https://www.darpa.mil/program/symbiotic-design-for-cyber-physical-systems. Accessed:
2023-06-03.
[47] Defense Advanced Research Projects Agency (DARPA). Symbiotic Design Proposers Day Brief 2019.
DARPA. 2019. (Visited on 08/12/2019).
[48] Ewen Denney, Ganesh Pai, and Josef Pohl. “AdvoCATE: An assurance case automation toolset”.
In: International Conference on Computer Safety, Reliability, and Security. Springer. 2012, pp. 8–21.
[49] Mark Dennison, Christopher Reardon, Jason Gregory, Theron Trout, and John G Rogers III.
“Creating a mixed reality common operating picture across C2 echelons for human-autonomy
teams”. In: Virtual, Augmented, and Mixed Reality (XR) Technology for Multi-Domain Operations.
Vol. 11426. SPIE. 2020, pp. 137–144.
[50] Munjal Desai and Holly A Yanco. “Blending human and robot inputs for sliding scale autonomy”.
In: ROMAN 2005. IEEE International Workshop on Robot and Human Interactive Communication,
2005. IEEE. 2005, pp. 537–542.
[51] Anind K Dey. “Understanding and using context”. In: Personal and ubiquitous computing 5 (2001),
pp. 4–7.
[52] M Bernardine Dias, Balajee Kannan, Brett Browning, E Jones, Brenna Argall, M Freddie Dias,
Marc Zinck, M Veloso, and Anthony Stentz. “Sliding autonomy for peer-to-peer human-robot
teams”. In: Proceedings of the international conference on intelligent autonomous systems. 2008,
pp. 332–341.
[53] Wenhao Ding, Chejian Xu, Haohong Lin, Bo Li, and Ding Zhao. “A Survey on Safety-critical
Scenario Generation from Methodological Perspective”. In: arXiv preprint arXiv:2202.02215 (2022).
[54] Anca D Dragan and Siddhartha S Srinivasa. “A policy-blending formalism for shared control”. In:
The International Journal of Robotics Research 32.7 (2013), pp. 790–805.
[55] Tommaso Dreossi, Alexandre Donzé, and Sanjit A Seshia. “Compositional falsification of
cyber-physical systems with machine learning components”. In: Journal of Automated Reasoning
63.4 (2019), pp. 1031–1053.
[56] John Durkin. “Expert systems: a view of the field”. In: IEEE Intelligent Systems 11.02 (1996),
pp. 56–63.
[57] Hyun B Eom and Sang M Lee. “A survey of decision support system applications (1971–April
1988)”. In: Interfaces 20.3 (1990), pp. 65–79.
[58] Sean Eom and E Kim. “A survey of decision support system applications (1995–2001)”. In: Journal
of the Operational Research Society 57.11 (2006), pp. 1264–1278.
[59] Sean B Eom, Sang M Lee, Eyong B Kim, and C Somarajan. “A survey of decision support system
applications (1988–1994)”. In: Journal of the Operational Research Society 49.2 (1998), pp. 109–120.
[60] Jeff A Estefan et al. “Survey of model-based systems engineering (MBSE) methodologies”. In:
Incose MBSE Focus Group 25.8 (2007), pp. 1–12.
[61] Gregory Falco and Leilani H Gilpin. “A stress testing framework for autonomous system
verification and validation (V&V)”. In: 2021 IEEE International Conference on Autonomous Systems
(ICAS). IEEE. 2021, pp. 1–5.
[62] Valerii Vadimovich Fedorov. Theory of optimal experiments. Elsevier, 2013.
[63] Paul N Finlay. “Decision support systems and expert systems: a comparison of their components
and design methodologies”. In: Computers & operations research 17.6 (1990), pp. 535–543.
[64] Paul M Fitts. “Human engineering for an effective air-navigation and traffic-control system.” In:
(1951).
[65] Matthew Fontaine and Stefanos Nikolaidis. “A quality diversity approach to automatically
generating human-robot interaction scenarios in shared autonomy”. In: arXiv preprint
arXiv:2012.04283 (2020).
[66] Matthew C Fontaine and Stefanos Nikolaidis. “Evaluating Human-Robot Interaction Algorithms
in Shared Autonomy via Quality Diversity Scenario Generation”. In: ACM Transactions on
Human-Robot Interaction (2020).
[67] F Nelson Ford. “Decision support systems and expert systems: A comparison”. In: Information &
Management 8.1 (1985), pp. 21–26.
[68] Peter I Frazier. “A Tutorial on Bayesian Optimization”. In: arXiv preprint arXiv:1807.02811 (2018).
[69] Daniel J Fremont, Johnathan Chiu, Dragos D Margineantu, Denis Osipychev, and Sanjit A Seshia.
“Formal analysis and redesign of a neural network-based aircraft taxiing system with VerifAI”. In:
International Conference on Computer Aided Verification. Springer. 2020, pp. 122–134.
[70] Daniel J Fremont, Edward Kim, Yash Vardhan Pant, Sanjit A Seshia, Atul Acharya, Xantha Bruso,
Paul Wells, Steve Lemke, Qiang Lu, and Shalin Mehta. “Formal scenario-based testing of
autonomous vehicles: From simulation to the real world”. In: 2020 IEEE 23rd International
Conference on Intelligent Transportation Systems (ITSC). IEEE. 2020, pp. 1–8.
[71] Lex Fridman. “Human-centered autonomous vehicle systems: Principles of effective shared
autonomy”. In: arXiv preprint arXiv:1810.01835 (2018).
[72] Charles L Gardner, James R Marsden, and David E Pingry. “The design and use of laboratory
experiments for DSS evaluation”. In: Decision Support Systems 9.4 (1993), pp. 369–379.
[73] Atul Gawande. Checklist manifesto, the (HB). Penguin Books India, 2010.
[74] Mumine Gercek and Zeynep Durmuş Arsan. “Energy and environmental performance based
decision support process for early design stages of residential buildings under climate change”.
In: Sustainable Cities and Society 48 (2019), p. 101580.
[75] Seyed Hassan Ghodsypour and Christopher O’Brien. “A decision support system for supplier
selection using an integrated analytic hierarchy process and linear programming”. In:
International Journal of Production Economics 56 (1998), pp. 199–212.
[76] T Grandon Gill. “Early expert systems: Where are they now?” In: MIS quarterly (1995), pp. 51–81.
[77] Josep Ginebra. “On the measure of the information in a statistical experiment”. In: Bayesian
Analysis 2.1 (2007), pp. 167–211.
[78] Michael J Ginzberg and Edward A Stohr. “Decision support systems: issues and perspectives”. In:
(1982).
[79] Kevin Gomes, Danelle Cline, Duane Edgington, Michael Godin, Thom Maughan, Mike McCann,
Tom O’Reilly, Fred Bahr, Francisco Chavez, Monique Messié, et al. “ODSS: A decision support
system for ocean exploration”. In: 2013 IEEE 29th International Conference on Data Engineering
Workshops (ICDEW). IEEE. 2013, pp. 200–211.
[80] Marco Gori, Gabriele Monfardini, and Franco Scarselli. “A new model for learning in graph
domains”. In: Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005.
Vol. 2. IEEE. 2005, pp. 729–734.
[81] Gregor Gössler and Joseph Sifakis. “Composition for component-based modeling”. In: Science of
Computer Programming 55.1-3 (2005), pp. 161–183.
[82] Stewart Greenhill, Santu Rana, Sunil Gupta, Pratibha Vellanki, and Svetha Venkatesh. “Bayesian
Optimization for Adaptive Experimental Design: A Review”. In: IEEE access 8 (2020),
pp. 13937–13948.
[83] Jason Gregory, Jonathan Fink, Ethan Stump, Jeffrey Twigg, John Rogers, David Baran,
Nicholas Fung, and Stuart Young. “Application of multi-robot systems to disaster-relief scenarios
with limited communication”. In: Field and Service Robotics. Springer. 2016, pp. 639–653.
[84] Jason M Gregory, Iain Brookshaw, Jonathan Fink, and Satyandra K Gupta. “An investigation of
goal assignment for a heterogeneous robotic team to enable resilient disaster-site exploration”. In:
2017 IEEE International Symposium on Safety, Security and Rescue Robotics (SSRR). IEEE. 2017,
pp. 133–140.
[85] Jason M Gregory, Sarah Al-Hussaini, Ali-akbar Agha-mohammadi, and Satyandra K Gupta.
“Taxonomy of A Decision Support System for Adaptive Experimental Design in Field Robotics”.
In: arXiv preprint arXiv:2210.08397 (2022).
[86] Jason M Gregory, Sarah Al-Hussaini, and Satyandra K Gupta. “Heuristics-based multi-agent task
allocation for resilient operations”. In: 2019 IEEE International Symposium on Safety, Security, and
Rescue Robotics (SSRR). IEEE. 2019, pp. 1–8.
[87] Jason M Gregory, Christopher Reardon, Kevin Lee, Geoffrey White, Ki Ng, and Caitlyn Sims.
“Enabling intuitive human-robot teaming using augmented reality and gesture control”. In: arXiv
preprint arXiv:1909.06415 (2019).
[88] Jason M Gregory, Daniel Sahu, Eli Lancaster, Felix Sanchez, Trevor Rocks, Kaukeinen,
Jonathan Fink, and Satyandra K. Gupta. “Active Learning for Testing and Evaluation in Field
Robotics: A Case Study in Autonomous, Off-Road Navigation”. In: IEEE International Conference
on Robotics and Automation (ICRA). 2022.
[89] Jason M Gregory, Felix Sanchez, Eli Lancaster, Ali-akbar Agha-mohammadi, and
Satyandra K. Gupta. “Using Decision Support in Human-in-the-Loop Experimental Design
Toward Building Trustworthy Autonomous Systems”. In: IEEE International Conference on Robot
and Human Interactive Communication (RO-MAN). 2023.
[90] Jason M Gregory, Garrett Warnell, Jonathan Fink, and Satyandra K Gupta. “Improving trajectory
tracking accuracy for faster and safer autonomous navigation of ground vehicles in off-road
settings”. In: 2021 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR).
IEEE. 2021, pp. 204–209.
[91] Satyandra K Gupta and Jason M Gregory. “Opportunities For Generative Artificial Intelligence To
Accelerate Deployment of Human-Supervised Autonomous Robots”. In: Artificial Intelligent for
Human-Robot Interaction Symposium at the AAAI Fall Symposium Series (2023).
[92] Huy Ha, Shubham Agrawal, and Shuran Song. “Fit2Form: 3D generative model for robot gripper
form design”. In: Conference on Robot Learning. PMLR. 2021, pp. 176–187.
[93] Laura E Hart. “Introduction to model-based system engineering (MBSE) and SysML”. In: Delaware
Valley INCOSE Chapter Meeting. Vol. 30. Ramblewood Country Club Mount Laurel, New Jersey.
2015.
[94] Ahnaf Rashik Hassan and Abdulhamit Subasi. “A decision support system for automated
identification of sleep stages from single-channel EEG signals”. In: Knowledge-Based Systems 128
(2017), pp. 115–124.
[95] Brian A Haugh, David A Sparrow, and David M Tate. The status of test, evaluation, verification,
and validation (TEV&V) of autonomous systems. Tech. rep. Institute for Defense Analyses, 2018.
[96] Frederik W Heger, Laura M Hiatt, Brennan Sellner, Reid Simmons, and Sanjiv Singh. “Results in
sliding autonomy for multi-robot spatial assembly”. In: (2005).
[97] Frederik W Heger and Sanjiv Singh. “Sliding autonomy for complex coordinated multi-robot
tasks: Analysis & experiments”. In: (2006).
[98] Tapio Heikkilä, Jukka Koskinen, and Lars Dalgaard. “Decision support for designing autonomous
robot systems”. In: 60th Anniversary Seminar of Finnish Society of Automation. Finnish Society of
Automation. 2013.
[99] Philipp Helle, Wladimir Schamai, and Carsten Strobel. “Testing of autonomous
systems–Challenges and current state-of-the-art”. In: INCOSE international symposium. Vol. 26.
Wiley Online Library. 2016, pp. 571–584.
[100] Kaitlin Henderson and Alejandro Salado. “Value and benefits of model-based systems engineering
(MBSE): Evidence from the literature”. In: Systems Engineering 24.1 (2021), pp. 51–66.
[101] Thomas M Howard and Alonzo Kelly. “Optimal rough terrain trajectory generation for wheeled
mobile robots”. In: The International Journal of Robotics Research 26.2 (2007), pp. 141–166.
[102] Xun Huan and Youssef M Marzouk. “Simulation-based optimal Bayesian experimental design for
nonlinear systems”. In: Journal of Computational Physics 232.1 (2013), pp. 288–317.
[103] WuLing Huang, Kunfeng Wang, Yisheng Lv, and FengHua Zhu. “Autonomous vehicles testing
methods review”. In: 2016 IEEE 19th International Conference on Intelligent Transportation Systems
(ITSC). IEEE. 2016, pp. 163–168.
[104] Sarah Al-Hussaini, Neel Dhanaraj, Jason M Gregory, Rex Jomy Joseph, Shantanu Thakar,
Brual C Shah, Jeremy A Marvel, and Satyandra K Gupta. “Seeking Human Help to Manage Plan
Failure Risks in Semi-Autonomous Mobile Manipulation”. In: Journal of Computing and
Information Science in Engineering (JCISE) 22.5 (2022), p. 050906.
[105] Sarah Al-Hussaini, Jason M Gregory, Neel Dhanaraj, and Satyandra K Gupta. “A
Simulation-Based Framework for Generating Alerts for Human-Supervised Multi-Robot Teams in
Challenging Environments”. In: 2021 IEEE International Symposium on Safety, Security, and Rescue
Robotics (SSRR). IEEE. 2021, pp. 168–175.
[106] Sarah Al-Hussaini, Jason M Gregory, Yuxiang Guan, and Satyandra K Gupta. “Generating alerts
to assist with task assignments in human-supervised multi-robot teams operating in challenging
environments”. In: 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems
(IROS). IEEE. 2020, pp. 11245–11252.
[107] Sarah Al-Hussaini, Jason M Gregory, and Satyandra K Gupta. “A policy synthesis-based
framework for robot rescue decision-making in multi-robot exploration of disaster sites”. In: 2018
IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR). IEEE. 2018, pp. 1–7.
[108] Sarah Al-Hussaini, Jason M Gregory, and Satyandra K Gupta. “Generating Task Reallocation
Suggestions to Handle Contingencies in Human-Supervised Multi-Robot Missions”. In: IEEE
Transactions on Automation Science and Engineering (T-ASE) (2023).
[109] Sarah Al-Hussaini, Jason M Gregory, and Satyandra K Gupta. “Generation of context-dependent
policies for robot rescue decision-making in multi-robot teams”. In: 2018 IEEE/RSJ International
Conference on Intelligent Robots and Systems (IROS). IEEE. 2018, pp. 4317–4324.
[110] Sarah Al-Hussaini, Jason M Gregory, Shaurya Shriyam, and Satyandra K Gupta. “An
alert-generation framework for improving resiliency in human-supervised, multi-agent teams”.
In: arXiv preprint arXiv:1909.06480 (2019).
[111] Francis Indaheng, Edward Kim, Kesav Viswanadha, Jay Shenoy, Jinkyu Kim, Daniel J Fremont,
and Sanjit A Seshia. “A Scenario-Based Platform for Testing Autonomous Vehicle Behavior
Prediction Models in Simulation”. In: arXiv preprint arXiv:2110.14870 (2021).
[112] Craig Innes and Subramanian Ramamoorthy. “Automated testing with temporal logic
specifications for robotic controllers using adaptive experiment design”. In: 2022 International
Conference on Robotics and Automation (ICRA). IEEE. 2022, pp. 6814–6821.
[113] ISO/IEC/IEEE. Systems and software engineering — System life cycle processes. Standard.
International Organization for Standardization, Institute of Electrical and Electronics Engineers,
2023.
[114] Moksh Jain, Emmanuel Bengio, Alex Hernandez-Garcia, Jarrid Rector-Brooks,
Bonaventure FP Dossou, Chanakya Ajit Ekbote, Jie Fu, Tianyu Zhang, Michael Kilgour,
Dinghuai Zhang, et al. “Biological sequence design with gflownets”. In: International Conference
on Machine Learning. PMLR. 2022, pp. 9786–9801.
[115] Moksh Jain, Tristan Deleu, Jason Hartford, Cheng-Hao Liu, Alex Hernandez-Garcia, and
Yoshua Bengio. “GFlowNets for AI-driven scientific discovery”. In: Digital Discovery 2.3 (2023),
pp. 557–577.
[116] Arturas Kaklauskas. “Intelligent decision support systems”. In: Biometric and intelligent decision
making support. Springer, 2015, pp. 31–85.
[117] Arnold Kamis, Marios Koufaris, and Tziporah Stern. “Using an attribute-based decision support
system for user-customized products online: an experimental investigation”. In: MIS quarterly
(2008), pp. 159–177.
[118] Peter GW Keen. “Decision support systems: the next decade”. In: Decision Support Systems 3.3
(1987), pp. 253–265.
[119] Alonzo Kelly, Anthony Stentz, Omead Amidi, Mike Bode, David Bradley, Antonio Diaz-Calderon,
Mike Happold, Herman Herman, Robert Mandelbaum, Tom Pilarski, et al. “Toward Reliable Off
Road Autonomous Vehicles Operating in Challenging Environments”. In: The International
Journal of Robotics Research 25.5-6 (2006), pp. 449–483.
[120] Siddartha Khastgir, Stewart Birrell, Gunwant Dhadyalla, and Paul Jennings. “Calibrating trust
through knowledge: Introducing the concept of informed safety for automation in vehicles”. In:
Transportation research part C: emerging technologies 96 (2018), pp. 290–303.
[121] Sungwon Kim, Sang-gil Lee, Jongyoon Song, Jaehyeon Kim, and Sungroh Yoon. “FloWaveNet: A
generative flow for raw audio”. In: International Conference on Machine Learning (2018).
[122] Ross D King, Kenneth E Whelan, Ffion M Jones, Philip GK Reiser, Christopher H Bryant,
Stephen H Muggleton, Douglas B Kell, and Stephen G Oliver. “Functional genomic hypothesis
generation and experimentation by a robot scientist”. In: Nature 427.6971 (2004), pp. 247–252.
[123] Jack PC Kleijnen. “Design and analysis of simulation experiments”. In: International Workshop on
Simulation. Springer. 2015, pp. 3–22.
[124] Bing Cai Kok and Harold Soh. “Trust in robots: Challenges and opportunities”. In: Current
Robotics Reports 1 (2020), pp. 297–309.
[125] Philip Koopman and Michael Wagner. “Challenges in autonomous vehicle testing and validation”.
In: SAE International Journal of Transportation Safety 4.1 (2016), pp. 15–24.
[126] Ksenia Korovina, Sailun Xu, Kirthevasan Kandasamy, Willie Neiswanger, Barnabas Poczos,
Jeff Schneider, and Eric Xing. “Chembo: Bayesian optimization of small organic molecules with
synthesizable recommendations”. In: International Conference on Artificial Intelligence and
Statistics. PMLR. 2020, pp. 3393–3403.
[127] Oliver B Kroemer, Renaud Detry, Justus Piater, and Jan Peters. “Combining active learning and
reactive control for robot grasping”. In: Robotics and Autonomous Systems 58.9 (2010),
pp. 1105–1116.
[128] Johannes Kulick, Marc Toussaint, Tobias Lang, and Manuel Lopes. “Active learning for teaching a
robot grounded relational symbols”. In: 23rd International Joint Conference on Artificial
Intelligence. 2013.
[129] Sulabh Kumra, Shirin Joshi, and Ferat Sahin. “Antipodal robotic grasping using generative
residual convolutional neural network”. In: 2020 IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS). IEEE. 2020, pp. 9626–9633.
[130] Andrew Kusiak. “Convolutional and generative adversarial neural networks in manufacturing”.
In: International Journal of Production Research 58.5 (2020), pp. 1594–1604.
[131] Zsolt Lattmann, James Klingler, Patrik Meijer, Jason Scott, Sandeep Neema, Ted Bapty, and
Gábor Karsai. “Towards an analysis-driven rapid design process for cyber-physical systems”. In:
2015 International Symposium on Rapid System Prototyping (RSP). IEEE. 2015, pp. 90–96.
[132] Averill M Law. “How to build valid and credible simulation models”. In: 2022 Winter Simulation
Conference (WSC). IEEE. 2022, pp. 1283–1295.
[133] William F Lawless, Ranjeev Mittu, Donald A Sofge, Thomas Shortell, and Thomas A McDermott.
Systems Engineering and Artificial Intelligence. Springer, 2021.
[134] Agnieszka Lazarowska. “Ant colony optimization based navigational decision support system”.
In: Procedia computer science 35 (2014), pp. 1013–1022.
[135] Chung Won Lee, Nasif Nayeer, Danson Evan Garcia, Ankur Agrawal, and Bingbing Liu.
“Identifying the operational design domain for an automated driving system through assessed
risk”. In: 2020 IEEE Intelligent Vehicles Symposium (IV). IEEE. 2020, pp. 1317–1322.
[136] Shu-Hsien Liao. “Expert system methodologies and applications—a decade review from 1995 to
2004”. In: Expert systems with applications 28.1 (2005), pp. 93–103.
[137] Bob Lightsey. “Systems engineering fundamentals”. In: Defense acquisition univ ft belvoir va
(2001).
[138] Maxim Likhachev. Search-Based Planning Library. https://github.com/sbpl/sbpl.
[139] Robert K Lindsay. “Applications of artificial intelligence for organic chemistry: the DENDRAL
project”. In: (No Title) (1980).
[140] John DC Little. “Decision support systems for marketing managers”. In: Journal of Marketing 43.3
(1979), pp. 9–26.
[141] Ming-Yu Liu, Xun Huang, Jiahui Yu, Ting-Chun Wang, and Arun Mallya. “Generative adversarial
networks for image and video synthesis: Algorithms and applications”. In: Proceedings of the IEEE
109.5 (2021), pp. 839–862.
[142] Xiang Liu, Rui Lin Ma, Jingwen Zhao, Jia Ling Song, Jian Quan Zhang, and Shuo Hong Wang. “A
clinical decision support system for predicting cirrhosis stages via high frequency ultrasound
images”. In: Expert Systems with Applications 175 (2021), p. 114680.
[143] Yugang Liu and Goldie Nejat. “Robotic Urban Search and Rescue: A Survey from the Control
Perspective”. In: Journal of Intelligent & Robotic Systems 72.2 (2013), pp. 147–165.
[144] Shouyin Lu, Ying Zhang, and Jianjun Su. “Mobile Robot for Power Substation Inspection: A
Survey”. In: IEEE/CAA Journal of Automatica Sinica (2017).
[145] Ivano Malavolta, Grace Lewis, Bradley Schmerl, Patricia Lago, and David Garlan. “How do you
architect your robots? State of the practice and guidelines for ROS-based systems”. In: Proceedings
of the ACM/IEEE 42nd International Conference on Software Engineering: Software Engineering in
Practice. 2020, pp. 31–40.
[146] Aditya Mandalika, Sanjiban Choudhury, Oren Salzman, and Siddhartha Srinivasa. “Generalized
lazy search for robot motion planning: Interleaving search and edge evaluation via event-based
toggles”. In: Proceedings of the International Conference on Automated Planning and Scheduling.
Vol. 29. 2019, pp. 745–753.
[147] Ajay Mandlekar, Yuke Zhu, Animesh Garg, Jonathan Booher, Max Spero, Albert Tung, Julian Gao,
John Emmons, Anchit Gupta, Emre Orbay, et al. “Roboturk: A crowdsourcing platform for robotic
skill learning through imitation”. In: Conference on Robot Learning. PMLR. 2018, pp. 879–893.
[148] Carmen Marcher, Andrea Giusti, and Dominik T Matt. “On the Design of a Decision Support
System for Robotic Equipment Adoption in Construction Processes”. In: Applied Sciences 11.23
(2021), p. 11415.
[149] Sonali Mathur and Shaily Malik. “Advancements in the V-Model”. In: International Journal of
Computer Applications 1.12 (2010), pp. 29–34.
[150] Viraj Mehta, Biswajit Paria, Jeff Schneider, Stefano Ermon, and Willie Neiswanger. “An
experimental design perspective on model-based reinforcement learning”. In: arXiv preprint
arXiv:2112.05244 (2021).
[151] Johannes Merkert, Marcus Mueller, and Marvin Hubl. “A Survey of the Application of Machine
Learning in Decision Support Systems.” In: ECIS. 2015.
[152] Elena Messina and Adam Jacoff. “Performance standards for urban search and rescue robots”. In:
Unmanned Systems Technology VIII. Vol. 6230. International Society for Optics and Photonics.
2006, p. 62301V.
[153] Alessio Micheli. “Neural network for graphs: A contextual constructive approach”. In: IEEE
Transactions on Neural Networks 20.3 (2009), pp. 498–511.
[154] MIL-STD-499A. Mil-STD-499A Engineering Management. 1974.
[155] Filip Miljković, Raquel Rodríguez-Pérez, and Jürgen Bajorath. “Impact of artificial intelligence on
compound discovery, design, and synthesis”. In: ACS omega 6.49 (2021), pp. 33293–33299.
[156] Sumit Mohanty, Viktor K Prasanna, Sandeep Neema, and J Davis. “Rapid design space exploration
of heterogeneous embedded systems using symbolic search and multi-granular simulation”. In:
ACM SIGPLAN Notices 37.7 (2002), pp. 18–27.
[157] Alan A Montgomery, Tom Fahey, Tim J Peters, Christopher MacIntosh, and Deborah J Sharp.
“Evaluation of computer based clinical decision support system and risk chart for management of
hypertension in primary care: randomised controlled trial”. In: Bmj 320.7236 (2000), pp. 686–690.
[158] Douglas C Montgomery. Design and analysis of experiments. John Wiley & Sons, 2017.
[159] Galen E Mullins, Austin G Dress, Paul G Stankiewicz, Jordan D Appler, and Satyandra K Gupta.
“Accelerated testing and evaluation of autonomous vehicles via imitation learning”. In: 2018 IEEE
International Conference on Robotics and Automation (ICRA). IEEE. 2018, pp. 5636–5642.
[160] Galen E Mullins, Paul G Stankiewicz, R Chad Hawthorne, and Satyandra K Gupta. “Adaptive
generation of challenging scenarios for testing and evaluation of autonomous vehicles”. In:
Journal of Systems and Software 137 (2018), pp. 197–215.
[161] Saideep Nannapaneni, Abhishek Dubey, Sherif Abdelwahed, Sankaran Mahadevan,
Sandeep Neema, and Ted Bapty. “Mission-based reliability prediction in component-based
systems”. In: International Journal of Prognostics and Health Management 7.1 (2016).
[162] Eduardo Natividade-Jesus, João Coutinho-Rodrigues, and Carlos Henggeler Antunes. “A
multicriteria decision support system for housing evaluation”. In: Decision Support Systems 43.3
(2007), pp. 779–790.
[163] Himanshu Neema, Bradley Potteiger, Xenofon Koutsoukos, Gabor Karsai, Peter Volgyesi, and
Janos Sztipanovits. “Integrated simulation testbed for security and resilience of cps”. In:
Proceedings of the 33rd Annual ACM Symposium on Applied Computing. 2018, pp. 368–374.
[164] Himanshu Neema, Janos Sztipanovits, Cornelius Steinbrink, Thomas Raub, Bastian Cornelsen,
and Sebastian Lehnhoff. “Simulation integration platforms for cyber-physical systems”. In:
Proceedings of the Workshop on Design Automation for CPS and IoT. 2019, pp. 10–19.
[165] Robert M O’Keefe and Daniel E O’Leary. “Expert system verification and validation: a survey and
tutorial”. In: Artificial Intelligence Review 7.1 (1993), pp. 3–42.
[166] OpenAI. GPT-4 Technical Report. 2023. arXiv: 2303.08774 [cs.CL].
[167] Leon F Osborne, Jeff Brummond, Robert Hart, Mohsen Zarean, Steven M Conger, et al. Clarus:
Concept of operations. Tech. rep. FHWA-JPO-05-072. United States. Federal Highway
Administration, 2005.
[168] Ipek Ozkaya. “Application of Large Language Models to Software Engineering Tasks:
Opportunities, Risks, and Implications”. In: IEEE Software 40.3 (2023), pp. 4–8.
[169] John V Pavlik. “Collaborating with ChatGPT: Considering the implications of generative artificial
intelligence for journalism and media education”. In: Journalism & Mass Communication Educator
78.1 (2023), pp. 84–93.
[170] G Phillips-Wren, Manuel Mora, Guisseppi A Forgionne, and Jatinder ND Gupta. “An integrative
evaluation framework for intelligent decision support systems”. In: European Journal of
Operational Research 195.3 (2009), pp. 642–652.
[171] Gloria Phillips-Wren. “Intelligent systems to support human decision making”. In: Artificial
Intelligence: Concepts, Methodologies, Tools, and Applications. IGI Global, 2017, pp. 3023–3036.
[172] Gloria Elizabeth Phillips-Wren. An agent-based intelligent decision support system for autonomous
robotic systems. University of Maryland, Baltimore County, 2002.
[173] Gloria Elizabeth Phillips-Wren, Eugene Hahn, and Guisseppi Forgionne. “A multiple-criteria
framework for evaluation of decision support systems”. In: Elsevier. Omega. 2004, pp. 323–332.
[174] Luca Pion-Tonachini, Kristofer Bouchard, Hector Garcia Martin, Sean Peisert, W Bradley Holtz,
Anil Aswani, Dipankar Dwivedi, Haruko Wainwright, Ghanshyam Pilania, Benjamin Nachman,
et al. “Learning from learning machines: a new generation of AI technology to meet the needs of
science”. In: arXiv preprint arXiv:2111.13786 (2021).
[175] Daniel J Porter and John W Dennis. “Test & evaluation of AI-enabled and autonomous systems: A
literature review”. In: (2020).
[176] Anatoliy I Povoroznyuk, Anna E Filatova, Georgij R Mumladze, Alexandra S Zlepko,
Dmytro H Shtofel, Andrii I Bezuglyi, Waldemar Wójcik, and Ulzhalgas Zhunissova.
“Formalization of the stages of diagnostic and therapeutic measures in decision support systems
in medicine”. In: Photonics Applications in Astronomy, Communications, Industry, and High-Energy
Physics Experiments 2019. Vol. 11176. SPIE. 2019, pp. 866–873.
[177] Daniel J Power. Decision support systems: concepts and resources for managers. Greenwood
Publishing Group, 2002.
[178] Michael Prince. “Does active learning work? A review of the research”. In: Journal of Engineering
Education 93.3 (2004), pp. 223–231.
[179] Hazhar Rahmani, Dylan A Shell, and Jason M O’Kane. “Sensor selection for detecting deviations
from a planned itinerary”. In: 2021 IEEE/RSJ International Conference on Intelligent Robots and
Systems (IROS). IEEE. 2021, pp. 6511–6518.
[180] Nijat Rajabli, Francesco Flammini, Roberto Nardone, and Valeria Vittorini. “Software verification
and validation of safe autonomous cars: A systematic literature review”. In: IEEE Access 9 (2020),
pp. 4797–4819.
[181] Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. “Hierarchical
text-conditional image generation with clip latents”. In: arXiv preprint arXiv:2204.06125 (2022).
[182] Carl Edward Rasmussen. “Gaussian Processes in Machine Learning”. In: Summer school on
machine learning. Springer. 2003, pp. 63–71.
[183] Christopher Reardon, Jason Gregory, Kerstin Haring, and John G Rogers III. “Factors affecting
human understanding of augmented reality visualization of changes detected by an autonomous
mobile robot”. In: Virtual, Augmented, and Mixed Reality (XR) Technology for Multi-Domain
Operations III. Vol. 12125. SPIE. 2022, pp. 7–14.
[184] Christopher Reardon, Jason Gregory, Carlos Nieto-Granda, and John G Rogers. “Enabling
Situational Awareness via Augmented Reality of Autonomous Robot-Based Environmental
Change Detection”. In: Virtual, Augmented and Mixed Reality. Design and Interaction: 12th
International Conference, VAMR 2020, Held as Part of the 22nd HCI International Conference, HCII
2020, Copenhagen, Denmark, July 19–24, 2020, Proceedings, Part I 22. Springer. 2020, pp. 611–628.
[185] Christopher Reardon, Jason Gregory, Carlos Nieto-Granda, and John G Rogers III. “Designing a
mixed reality interface for autonomous robot-based change detection”. In: Virtual, Augmented,
and Mixed Reality (XR) Technology for Multi-Domain Operations II. Vol. 11759. SPIE. 2021,
pp. 136–143.
[186] Christopher Reardon, Kerstin Haring, Jason M Gregory, and John G Rogers. “Evaluating human
understanding of a mixed reality interface for autonomous robot-based change detection”. In:
2021 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR). IEEE. 2021,
pp. 132–137.
[187] Florence Reeder, Carol Pomales, Diane Kotras, and Jim Lockett. “Enabling the Department of
Defense’s Future to Test and Evaluate Artificial Intelligence Enabled Systems”. In: IEEE
Instrumentation & Measurement Magazine 26.5 (2023), pp. 31–38.
[188] Hailin Ren and Pinhas Ben-Tzvi. “Learning inverse kinematics and dynamics of a robotic
manipulator using generative adversarial networks”. In: Robotics and Autonomous Systems 124
(2020), p. 103386.
[189] Sharon L Riedel and Gordon F Pitz. “Utilization-oriented evaluation of decision support systems”.
In: IEEE Transactions on Systems, Man, and Cybernetics 16.6 (1986), pp. 980–996.
[190] John G Rogers, Jason M Gregory, Jonathan Fink, and Ethan Stump. “Test your SLAM! the
SubT-tunnel dataset and metric for mapping”. In: 2020 IEEE International Conference on Robotics
and Automation (ICRA). IEEE. 2020, pp. 955–961.
[191] Paul Rook. “Controlling software projects”. In: Software engineering journal 1.1 (1986), pp. 7–16.
[192] Kun Ruan, Xiaohong Chen, and Zhi Jin. “Requirements Modeling Aided by ChatGPT: An
Experience in Embedded Systems”. In: 2023 IEEE 31st International Requirements Engineering
Conference Workshops (REW). IEEE. 2023, pp. 170–177.
[193] Rubén Ruiz, Concepción Maroto, and Javier Alcaraz. “A decision support system for a real vehicle
routing problem”. In: European Journal of Operational Research 153.3 (2004), pp. 593–606.
[194] Fatemeh Zahra Saberifar, Dylan A Shell, and Jason M O’Kane. “Charting the trade-off between
design complexity and plan execution under probabilistic actions”. In: 2022 International
Conference on Robotics and Automation (ICRA). IEEE. 2022, pp. 135–141.
[195] I SAE. “Taxonomy and definitions for terms related to driving automation systems for on-road
motor vehicles”. In: SAE (2018).
[196] Shady Salama and Amr B Eltawil. “A decision support system architecture based on simulation
optimization for cyber-physical systems”. In: Procedia Manufacturing 26 (2018), pp. 1147–1158.
[197] Alberto Sangiovanni-Vincentelli. “Quo vadis, SLD? Reasoning about the trends and challenges of
system level design”. In: Proceedings of the IEEE 95.3 (2007), pp. 467–506.
[198] Robert G Sargent. “Validation and verification of simulation models”. In: WSC’99. 1999 Winter
Simulation Conference Proceedings.’Simulation-A Bridge to the Future’(Cat. No. 99CH37038). Vol. 1.
IEEE. 1999, pp. 39–48.
[199] Robert G Sargent and Osman Balci. “History of verification and validation of simulation models”.
In: 2017 winter simulation conference (WSC). IEEE. 2017, pp. 292–307.
[200] Santosh Kumar Satapathy, D Loganathan, Hari Kishan Kondaveeti, and Rama Krushna Rath. “An
Improved Decision Support System for Automated Sleep Stages Classification Based on Dual
Channels of EEG Signals”. In: Proceedings of International Conference on Computational
Intelligence and Computing. Springer. 2022, pp. 169–184.
[201] Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini.
“The graph neural network model”. In: IEEE transactions on neural networks 20.1 (2008), pp. 61–80.
[202] J Ben Schafer, Joseph A Konstan, and John Riedl. “Meta-recommendation systems:
user-controlled integration of diverse recommendations”. In: Proceedings of the Eleventh
International Conference on Information and Knowledge Management. 2002, pp. 43–51.
[203] Andrew I Schein and Lyle H Ungar. “Active learning for logistic regression: an evaluation”. In:
Machine Learning 68.3 (2007), pp. 235–265.
[204] Malte Schilling, Stefan Kopp, Sven Wachsmuth, Britta Wrede, Helge Ritter, Thomas Brox,
Bernhard Nebel, and Wolfram Burgard. “Towards a multidimensional perspective on shared
autonomy”. In: 2016 AAAI Fall Symposium Series. 2016.
[205] Eldon Schoop, Forrest Huang, and Björn Hartmann. “Scram: Simple checks for realtime analysis
of model training for non-expert ml programmers”. In: Extended Abstracts of the 2020 CHI
Conference on Human Factors in Computing Systems. 2020, pp. 1–10.
[206] Adriana Schulz, Cynthia Sung, Andrew Spielberg, Wei Zhao, Robin Cheng, Eitan Grinspun,
Daniela Rus, and Wojciech Matusik. “Interactive robogami: An end-to-end system for design of
robots with ground locomotion”. In: The International Journal of Robotics Research 36.10 (2017),
pp. 1131–1147.
[207] Sailik Sengupta, Tathagata Chakraborti, Sarath Sreedharan, Satya Gautam Vadlamudi, and
Subbarao Kambhampati. “Radar—a proactive decision support system for human-in-the-loop
planning”. In: 2017 AAAI Fall Symposium Series. 2017.
[208] Burr Settles. “Active learning literature survey”. In: (2009).
[209] Burr Settles. “From theories to queries: Active learning in practice”. In: Active Learning and
Experimental Design workshop In conjunction with AISTATS 2010. JMLR Workshop and Conference
Proceedings. 2011, pp. 1–18.
[210] Burr Settles and Mark Craven. “An analysis of active learning strategies for sequence labeling
tasks”. In: Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing.
2008, pp. 1070–1079.
[211] Guy Shani and Asela Gunawardana. “Evaluating recommendation systems”. In: Recommender
Systems Handbook. Springer, 2011, pp. 257–297.
[212] Ramesh Sharda, Steve H Barr, and James C McDonnell. “Decision support system effectiveness: a
review and an empirical test”. In: Management science 34.2 (1988), pp. 139–159.
185
[213] Wanggang Shen and Xun Huan. “Bayesian Sequential Optimal Experimental Design for
Nonlinear Models Using Policy Gradient Reinforcement Learning”. In: arXiv preprint
arXiv:2110.15335 (2021).
[214] Jung P Shim, Merrill Warkentin, James F Courtney, Daniel J Power, Ramesh Sharda, and
Christer Carlsson. “Past, present, and future of decision support technology”. In: Decision Support
Systems 33.2 (2002), pp. 111–126.
[215] Iurii E Shishkin, Aleksandr N Grekov, and Vladimir V Nikishin. “Intelligent decision support
system for detection of anomalies and unmanned surface vehicle inertial navigation correction”.
In: 2019 International Russian Automation Conference (RusAutoCon). IEEE. 2019, pp. 1–6.
[216] Sumukh Shivakumar, Hazem Torfah, Ankush Desai, and Sanjit A Seshia. “SOTER on ROS: a
run-time assurance framework on the robot operating system”. In: International Conference on
Runtime Verification. Springer. 2020, pp. 184–194.
[217] Ajay KS Singholi and Divya Agarwal. “Review of expert system and its application in robotics”.
In: Intelligent Communication, Control and Devices: Proceedings of ICICCD 2017. Springer. 2018,
pp. 1253–1265.
[218] Ho Chit Siu, Kevin Leahy, and Makai Mann. “STL: Surprisingly Tricky Logic (for System
Validation)”. In: arXiv preprint arXiv:2305.17258 (2023).
[219] Vladimir V Sklyar and Vyacheslav S Kharchenko. “Assurance Case Driven Design based on the
Harmonized Framework of Safety and Security Requirements.” In: ICTERI. 2017, pp. 670–685.
[220] Jasper Snoek, Hugo Larochelle, and Ryan P Adams. “Practical Bayesian Optimization of Machine
Learning Algorithms”. In: Advances in Neural Information Processing Systems 25 (2012).
[221] Paul G Stankiewicz and Galen E Mullins. “Improving evaluation methodology for autonomous
surface vessel colregs compliance”. In: OCEANS 2019-Marseille. IEEE. 2019, pp. 1–7.
[222] Mark Stefik, Jan Aikins, Robert Balzer, John Benoit, Lawrence Birnbaum, Frederick Hayes-Roth,
and Earl Sacerdoti. “The organization of expert systems, a tutorial”. In: Artificial intelligence 18.2
(1982), pp. 135–173.
[223] Florence R Sullivan and Mary A Moriarty. “Robotics and discovery learning: Pedagogical beliefs,
teacher practice, and technology integration”. In: Journal of Technology and Teacher Education
17.1 (2009), pp. 109–142.
[224] Reed T Sutton, David Pincock, Daniel C Baumgart, Daniel C Sadowski, Richard N Fedorak, and
Karen I Kroeker. “An overview of clinical decision support systems: benefits, risks, and strategies
for success”. In: NPJ digital medicine 3.1 (2020), pp. 1–10.
[225] Yuriy Sverchkov and Mark Craven. “A review of active learning approaches to experimental
design for uncovering biological networks”. In: PLoS Computational Biology 13.6 (2017), e1005466.
186
[226] Stan Swanborn and Ivano Malavolta. “Robot runner: a tool for automatically executing
experiments on robotics software”. In: 2021 IEEE/ACM 43rd International Conference on Software
Engineering: Companion Proceedings (ICSE-Companion). IEEE. 2021, pp. 33–36.
[227] J Sztipanovits, T Bapty, Z Lattmann, and S Neema. “Composition and Compositionality in CPS”.
In: Handbook of System Safety and Security. Elsevier, 2017, pp. 15–38.
[228] Janos Sztipanovits, Ted Bapty, Xenofon Koutsoukos, Zsolt Lattmann, Sandeep Neema, and
Ethan Jackson. “Model and tool integration platforms for cyber–physical system design”. In:
Proceedings of the IEEE 106.9 (2018), pp. 1501–1526.
[229] Janos Sztipanovits, Xenofon Koutsoukos, Gabor Karsai, Nicholas Kottenstette, Panos Antsaklis,
Vijay Gupta, Bill Goodwine, John Baras, and Shige Wang. “Toward a science of cyber–physical
system integration”. In: Proceedings of the IEEE 100.1 (2011), pp. 29–44.
[230] Teck Yan Tan, Li Zhang, and Ming Jiang. “An intelligent decision support system for skin cancer
detection from dermoscopic images”. In: 2016 12th International conference on natural
computation, fuzzy systems and knowledge discovery (ICNC-FSKD). IEEE. 2016, pp. 2194–2199.
[231] Ahmad Tariq and Khan Rafi. “Intelligent decision support systems-A framework”. In: Information
and Knowledge Management. Vol. 2. 6. Citeseer. 2012, pp. 12–20.
[232] Annalisa T Taylor, Thomas A Berrueta, and Todd D Murphey. “Active Learning in Robotics: A
Review of Control Principles”. In: Mechatronics 77 (2021), p. 102576.
[233] Dov Te’eni. “Feedback in DSS as a source of control: experiments with the timing of feedback”. In:
Decision Sciences 22.3 (1991), pp. 644–655.
[234] PivotPoint Technology. SysML Open Source Project. https://sysml.org/. Accessed: 2023-09-01.
2023.
[235] Eric Thorn, Shawn C Kimmel, Michelle Chaka, Booz Allen Hamilton, et al. A framework for
automated driving system testable cases and scenarios. Tech. rep. United States. Department of
Transportation. National Highway Traffic Safety, 2018.
[236] Chuck Thorpe and Hugh F Durrant-Whyte. “Field Robots”. In: ISRR. 2001, pp. 329–340.
[237] Nava Tintarev and Judith Masthoff. “Explaining recommendations: Design and evaluation”. In:
Recommender Systems Handbook. Springer, 2015, pp. 353–382.
[238] Hazem Torfah, Sebastian Junges, Daniel J Fremont, and Sanjit A Seshia. “Formal Analysis of
AI-Based Autonomy: From Modeling to Runtime Assurance”. In: International Conference on
Runtime Verification. Springer. 2021, pp. 311–330.
[239] Hazem Torfah, Shetal Shah, Supratik Chakraborty, S Akshay, and Sanjit A Seshia. “Synthesizing
pareto-optimal interpretations for black-box models”. In: 2021 Formal Methods in Computer Aided
Design (FMCAD). IEEE. 2021, pp. 153–162.
187
[240] Russell Toris, Julius Kammerl, David V Lu, Jihoon Lee, Odest Chadwicke Jenkins,
Sarah Osentoski, Mitchell Wills, and Sonia Chernova. “Robot web tools: Efficient messaging for
cloud robotics”. In: 2015 IEEE/RSJ international conference on intelligent robots and systems (IROS).
IEEE. 2015, pp. 4530–4537.
[241] Jeffrey N Twigg, Jason M Gregory, and Jonathan R Fink. “Towards online characterization of
autonomously navigating robots in unstructured environments”. In: 2016 IEEE/RSJ International
Conference on Intelligent Robots and Systems (IROS). IEEE. 2016, pp. 1198–1205.
[242] Chinonso Udokporo, Anthony Anosike, and Ming Lim. “A decision-support framework for Lean,
Agile and Green practices in product life cycle stages”. In: Production Planning & Control 32.10
(2021), pp. 789–810.
[243] Gerrit H Van Bruggen, Ale Smidts, and Berend Wierenga. “Improving decision making by means
of a marketing decision support system”. In: Management Science 44.5 (1998), pp. 645–658.
[244] Gerrit H Van Bruggen, Ale Smidts, and Berend Wierenga. “The impact of the quality of a
marketing decision support system: An experimental study”. In: International Journal of Research
in Marketing 13.4 (1996), pp. 331–343.
[245] Oleg Varlamov. ““Brains” for Robots: Application of the Mivar Expert Systems for
Implementation of Autonomous Intelligent Robots”. In: Big Data Research 25 (2021), p. 100241.
[246] Sergey Vasilyev, Vladimir Slabunov, Oleg Voevodin, and Alexandra Slabunova. “Development of a
decision support system at the stages of pre-design studies and design of irrigation systems based
on IDEFo functional modelling methodology”. In: Irrigation and Drainage 69.4 (2020), pp. 546–558.
[247] Sai Vemprala, Rogerio Bonatti, Arthur Bucker, and Ashish Kapoor. “ChatGPT for robotics: Design
principles and model abilities”. In: Microsoft Auton. Syst. Robot. Res 2 (2023), p. 20.
[248] Dolores R Wallace and Roger U Fujii. “Software verification and validation: an overview”. In: Ieee
Software 6.3 (1989), pp. 10–17.
[249] Rebecca Wiczorek, Dietrich Manzey, and Anna Zirk. “Benefits of Decision-Support by Likelihood
versus Binary Alarm Systems: Does the number of stages make a difference?” In: Proceedings of
the Hhuman Factors and Ergonomics Society Annual Meeting. Vol. 58. SAGE Publications Sage CA:
Los Angeles, CA. 2014, pp. 380–384.
[250] Peder Wikström, Lars Edenius, Björn Elfving, Ljusk Ola Eriksson, Tomas Lämås, Johan Sonesson,
Karin Öhman, Jörgen Wallerman, Carina Waller, and Fredrik Klintebäck. “The Heureka forestry
decision support system: an overview”. In: (2011).
[251] Grady Williams, Paul Drews, Brian Goldfain, James M Rehg, and Evangelos A Theodorou.
“Aggressive driving with model predictive path integral control”. In: 2016 IEEE International
Conference on Robotics and Automation (ICRA). IEEE. 2016, pp. 1433–1440.
[252] Julia L Wright, Jessie YC Chen, and Michael J Barnes. “Human–automation interaction for
multiple robot control: the effect of varying automation assistance and individual differences on
operator performance”. In: Ergonomics 61.8 (2018), pp. 1033–1045.
188
[253] Lingfei Wu, Peng Cui, Jian Pei, Liang Zhao, and Xiaojie Guo. “Graph neural networks:
foundation, frontiers and applications”. In: Proceedings of the 28th ACM SIGKDD Conference on
Knowledge Discovery and Data Mining. 2022, pp. 4840–4841.
[254] Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and S Yu Philip. “A
comprehensive survey on graph neural networks”. In: IEEE Transactions on Neural Networks and
Learning Systems 32.1 (2020), pp. 4–24.
[255] Yuxin Xiao, Eric P Xing, and Willie Neiswanger. “Amortized Auto-Tuning: Cost-Efficient
Bayesian Transfer Optimization for Hyperparameter Recommendation”. In: arXiv preprint
arXiv:2106.09179 (2021).
[256] Junhong Xu, Kai Yin, Zheng Chen, Jason M Gregory, Ethan A Stump, and Lantao Liu.
“Kernel-based Diffusion Approximated Markov Decision Processes for Off-Road Autonomous
Navigation and Control”. In: arXiv preprint arXiv:2111.08748 (2021).
[257] Junhong Xu, Kai Yin, Jason M Gregory, and Lantao Liu. “Causal Inference for De-biasing Motion
Estimation from Robotic Observational Data”. In: 2023 IEEE International Conference on Robotics
and Automation (ICRA). IEEE. 2023, pp. 3008–3014.
[258] Kai Yu, Jinbo Bi, and Volker Tresp. “Active learning via transductive experimental design”. In:
Proceedings of the 23rd International Conference on Machine Learning. 2006, pp. 1081–1088.
[259] Zhaoyu Zhai, José Fernán Martínez, Victoria Beltran, and Néstor Lucas Martínez. “Decision
support systems for agriculture 4.0: Survey and challenges”. In: Computers and Electronics in
Agriculture 170 (2020), p. 105256.
[260] Chuxu Zhang, Dongjin Song, Chao Huang, Ananthram Swami, and Nitesh V Chawla.
“Heterogeneous graph neural network”. In: Proceedings of the 25th ACM SIGKDD international
conference on knowledge discovery & data mining. 2019, pp. 793–803.
[261] Qishen Zhang, Tamas Kecskes, Janos Mathe, and Janos Sztipanovits. “Towards bridging the gap
between model-and data-driven tool suites for cyber-physical systems”. In: 2019 IEEE/ACM 5th
International Workshop on Software Engineering for Smart Cyber-Physical Systems (SEsCPS). IEEE.
2019, pp. 7–13.
[262] Yizhou Zhao, Kaixiang Lin, Zhiwei Jia, Qiaozi Gao, Govind Thattai, Jesse Thomason, and
Gaurav S Sukhatme. “LUMINOUS: Indoor Scene Generation for Embodied AI Challenges”. In:
arXiv preprint arXiv:2111.05527 (2021).
[263] Ning Zheng, Deng Huang, et al. “Optimal experimental design for systems involving both
quantitative and qualitative factors”. In: Proceedings of the 2003 Winter Simulation Conference,
2003. Vol. 1. IEEE. 2003, pp. 556–564.
[264] Hexiong Zhou, Zheng Zeng, and Lian Lian. “Adaptive re-planning of AUVs for environmental
sampling missions: A fuzzy decision support system based on multi-objective particle swarm
optimization”. In: International Journal of Fuzzy Systems 20.2 (2018), pp. 650–671.
189
[265] Jie Zhou, Ganqu Cui, Shengding Hu, Zhengyan Zhang, Cheng Yang, Zhiyuan Liu, Lifeng Wang,
Changcheng Li, and Maosong Sun. “Graph neural networks: A review of methods and
applications”. In: AI open 1 (2020), pp. 57–81.
[266] Zhi-Jie Zhou, Guan-Yu Hu, Chang-Hua Hu, Cheng-Lin Wen, and Lei-Lei Chang. “A survey of
belief rule-base expert system”. In: IEEE Transactions on Systems, Man, and Cybernetics: Systems
51.8 (2019), pp. 4944–4958.
190
Abstract
The rapid advancement of artificial intelligence, machine learning, human-robot interaction, and safe learning and optimization has led to significant leaps in component- and behavior-level capabilities for autonomous robots. However, the research, development, testing, and evaluation needed to enhance the human's ability to understand and build trust in next-generation autonomous robots still requires additional attention. This matters because autonomous robots cannot be deployed safely alongside humans without sufficient understanding of system performance, limitations, and trustworthiness of capabilities, and that understanding requires experimentation. The process of constructing experiments, i.e., experimental design, is a pivotal step in the concept-to-fielding life cycle of an autonomous robot because it dictates the amount of information gained by the experimenter, the cost of acquiring that information, and the rate at which system understanding is built. Conducting experiments is challenging, though, because of system complexity and experimental context: autonomous robots can be massively complex, multi-disciplinary systems that use artificial intelligence and machine learning across a range of components (e.g., perception, state estimation, localization, mapping, path planning, and control), and experiments are specific to a given scenario, system, experimenter, and set of multi-objective metrics defined by the intended application. To assist with the adaptive, sequential decision-making process of experimental design, a Decision Support System (DSS) can potentially augment the human's ability to construct more informative, less wasteful experiments. This dissertation aims to provide conceptual and computational foundations for DSSs in the domain of adaptive experimental design for autonomous, off-road ground vehicles. First, I present a six-stage taxonomy of DSSs for experimental design of ground robots, informed and inspired by the vast body of literature on DSS development in domains outside of robotics; this taxonomy also serves as a roadmap to guide ongoing development of DSSs for experimental design. Next, I develop and evaluate a Stage 1 DSS that provides design assistance to experimenters in the form of prompts for experimental design conceptualization and structured thought analysis. Building on this, I propose and evaluate a Stage 2 DSS that provides proactive decision support in the form of alerts so that low-value experimental designs can be avoided. Finally, I lay the groundwork for a Stage 3 DSS that provides narrowly-scoped experimental design recommendations to assist with subsequent experiment selections. I anticipate that this work will help improve human decision-making in experimental design for real-world autonomous ground vehicles.
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
Efficiently learning human preferences for proactive robot assistance in assembly tasks
AI-driven experimental design for learning of process parameter models for robotic processing applications
Quality diversity scenario generation for human robot interaction
Algorithms and systems for continual robot learning
Multi-robot strategies for adaptive sampling with autonomous underwater vehicles
Leveraging prior experience for scalable transfer in robot learning
Towards socially assistive robot support methods for physical activity behavior change
Leveraging cross-task transfer in sequential decision problems
Sample-efficient and robust neurosymbolic learning from demonstrations
Closing the reality gap via simulation-based inference and control
Data-driven acquisition of closed-loop robotic skills
Program-guided framework for interpreting and acquiring complex skills with learning robots
Robust loop closures for multi-robot SLAM in unstructured environments
Data scarcity in robotics: leveraging structural priors and representation learning
Decision making in complex action spaces
Planning and learning for long-horizon collaborative manipulation tasks
High-throughput methods for simulation and deep reinforcement learning
Adaptive sampling with a robotic sensor network
Nonverbal communication for non-humanoid robots
Scaling robot learning with skills
Asset Metadata
Creator: Gregory, Jason Michael (author)
Core Title: Decision support systems for adaptive experimental design of autonomous, off-road ground vehicles
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Computer Science
Degree Conferral Date: 2024-05
Publication Date: 02/26/2024
Defense Date: 11/21/2023
Publisher: Los Angeles, California (original); University of Southern California (original); University of Southern California. Libraries (digital)
Tag: adaptive experimental design, Decision Support Systems, mobile robotics, off-road ground vehicles, sequential decision making
Format: theses (aat)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Gupta, Satyandra (committee chair); Culbertson, Heather (committee member); Nguyen, Quan (committee member); Nikolaidis, Stefanos (committee member); Sukhatme, Gaurav (committee member)
Creator Email: gregoryjasonm@gmail.com, jasongre@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-oUC113842369
Unique identifier: UC113842369
Identifier: etd-GregoryJas-12673.pdf (filename)
Legacy Identifier: etd-GregoryJas-12673
Document Type: Dissertation
Rights: Gregory, Jason Michael
Internet Media Type: application/pdf
Type: texts
Source: 20240304-usctheses-batch-1127 (batch); University of Southern California (contributing entity); University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright.
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email: cisadmin@lib.usc.edu