ACTIVE SENSING IN ROBOTIC DEPLOYMENTS

by

Christopher Evan Denniston

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)

August 2023

Copyright 2023 Christopher Evan Denniston

Dedication

To my family: Tracey, Owen, Alexander, and Jacob, for the continuing support and love they have provided me through the years. To my aunts, uncles, and grandparents: Jeffrey, Megan, and Ethan, Mary Lee, Charles, Constance, and Keith, for their support of me as a child and young adult, as well as their broad perspectives on life. To the countless people who have helped, encouraged, and mentored me along the way.

Acknowledgements

Thank you to everyone who co-authored a paper with me: Isabel Rayas, Gautam Salhotra, Baskın Şenbaşlar, Eric Heiden, David Millard, Yun Chang, Andrzej Reinke, Oriana Peltzer, Joshua Ott, Sangwoo Moon, Kamak Ebadi, Sung Kim, Aravind Kumaraguru, Thomas Krogstad, Stephanie Kemna, Ali Aghamohammadi, Luca Carlone, Fabio Ramos, Benjamin Morrell, and David A. Caron.

Thank you to my advisor, Gaurav S. Sukhatme, for his continued support, advice, editing help, and for taking a chance on me. He is truly a shining star among Ph.D. advisors.

Thank you to everyone who has helped me in immeasurable ways: Abigail Perez, Frankie “Flumpo”, Buddy, Vlad, Charlie, Joey, Spicy, and all my friends who have aided me, encouraged me, and helped me de-stress to continue this program. Thank you to Paul Browning for design advice on many of the figures present in this work and others, as well as for creating fig. 1.2. Thank you to everyone in the Robotic Embedded Systems Laboratory and elsewhere who has helped me along the way.

Table of Contents

Dedication
Acknowledgements
List of Tables
List of Figures
Abstract

Chapter 1: Introduction

Chapter 2: Background
   2.1 Common Notation
   2.2 Modeling Techniques
      2.2.1 Gaussian Processes
      2.2.2 Factor Graphs
         2.2.2.1 Factor Graph Based Informative Path Planning
         2.2.2.2 Pose Graph Optimization
   2.3 Planning
      2.3.1 Online Informative Path Planning (IPP)
      2.3.2 Objective Functions for IPP
         2.3.2.1 Submodular Objectives
         2.3.2.2 Bayesian Optimization Objectives
      2.3.3 Informative Path Planning as a POMDP
      2.3.4 Partially Observable Monte Carlo Planning (POMCP)
   2.4 Graph Neural Networks
   2.5 Continuous Non-Convex Gradient-Free Optimization
      2.5.1 Cross Entropy
      2.5.2 Simulated Annealing
      2.5.3 Bayesian Optimization

Chapter 3: Automatic and Efficient Deployment of Active Sensing Robots
   3.1 Active Sensing using POMDPs with Domain-Specific Considerations
      3.1.1 Background
         3.1.1.1 Multi-Armed Bandits (MAB)
      3.1.2 Formulation and Approach
         3.1.2.1 Rollout Allocation
         3.1.2.2 Exploration Algorithm
         3.1.2.3 Plan Commitment
      3.1.3 Experiments
         3.1.3.1 Grid Search for Rollout Allocation
         3.1.3.2 Comparison of Exploration Algorithms
         3.1.3.3 Comparison of Plan Commitment Algorithms
         3.1.3.4 Comparison of Baseline with Combined Improvements
         3.1.3.5 Comparison of Time Savings to Baseline
      3.1.4 Conclusion
   3.2 Learned Parameter Selection for Robotic Information Gathering
      3.2.1 Background
         3.2.1.1 Reinforcement Learning
         3.2.1.2 Combining Informative Path Planning and Reinforcement Learning
      3.2.2 Formulation and Approach
         3.2.2.1 Objective Functions
         3.2.2.2 POMDP Solver
         3.2.2.3 Parameter Selection Agent
         3.2.2.4 Data Acquisition
      3.2.3 Experiments
         3.2.3.1 Simulation Experiments
         3.2.3.2 Simulation Results
         3.2.3.3 Field Experiment
      3.2.4 Conclusion

Chapter 4: Sensing with Robot State Uncertainty
   4.1 Incorporating Localization Noise Into Active Sensing
      4.1.1 Problem Statement
      4.1.2 Related Work
      4.1.3 Approach
         4.1.3.1 Gaussian Processes With Input Uncertainty
         4.1.3.2 Path Planning for Adaptive Sensing with Input Uncertainty
      4.1.4 Experiments
         4.1.4.1 Simulation Experiments
         4.1.4.2 Comparison of Monte-Carlo Methods
         4.1.4.3 Noisy Observations
         4.1.4.4 Experiments with Data Collected from Real Environment
         4.1.4.5 Pose Graph-Based Localization
         4.1.4.6 Particle Filter-Based Localization
      4.1.5 Conclusion
   4.2 Loop Closure Prioritization for Efficient and Scalable Multi-Robot SLAM
      4.2.1 Background and Related Work
         4.2.1.1 Point Cloud Registration
         4.2.1.2 Loop Closure Prioritization Systems
         4.2.1.3 Active Loop Closure
         4.2.1.4 Graph Neural Networks and SLAM
      4.2.2 Approach
         4.2.2.1 SLAM System Overview
         4.2.2.2 Loop Closure Generation
         4.2.2.3 Loop Closure Prioritization
         4.2.2.4 Loop Closure Computation
      4.2.3 Graph Information Prioritization
         4.2.3.1 Computing the Objective Function
         4.2.3.2 Minimizing the Objective Function
         4.2.3.3 Observability Prioritization
      4.2.4 Experiments
         4.2.4.1 Experimental Setup
         4.2.4.2 Ablation Results
         4.2.4.3 System Results
      4.2.5 Conclusion

Chapter 5: Sensing for Environmental Interaction
   5.1 Active Sensing to Estimate Quantiles for Environmental Analysis
      5.1.1 Background
         5.1.1.1 Physical Specimen Collection
         5.1.1.2 Estimation of Quantiles and Quantile Standard Error
      5.1.2 Formulation
      5.1.3 Approach
         5.1.3.1 Informative Path Planning
         5.1.3.2 Location Selection
      5.1.4 Experiments
         5.1.4.1 Informative Path Planning: Objective Functions
         5.1.4.2 Location Selection
         5.1.4.3 Field Trial
      5.1.5 Conclusion

Chapter 6: Modeling Challenging Signal and Environment Interactions
   6.1 Fast and Scalable Signal Inference for Active Robotic Source Seeking
      6.1.1 Related Work
      6.1.2 Formulation
      6.1.3 Approach
         6.1.3.1 World Traversability Belief
         6.1.3.2 Signal Belief
         6.1.3.3 Hierarchical Solver
      6.1.4 Experiments
         6.1.4.1 Simulation Experiments
         6.1.4.2 Hardware Experiments
         6.1.4.3 Extended Hardware Experiment
      6.1.5 Conclusion
   6.2 Active Signal Emitter Placement In Complex Environments
   6.3 Introduction
   6.4 Background / Related Work
   6.5 Notation and Formulation
   6.6 Approach
      6.6.1 Analytical Lighting Model
      6.6.2 Probabilistic Lighting Model
      6.6.3 Informative Path Planning
      6.6.4 Light Source Reconfiguration Trigger
      6.6.5 Light Source Configuration Algorithm
   6.7 Experiments
      6.7.1 Component Experiments
         6.7.1.1 Probabilistic Lighting Model
         6.7.1.2 Light Source Reconfiguration Trigger
      6.7.2 System Experiments
      6.7.3 Field Tests
   6.8 Conclusion

Chapter 7: Conclusions

Bibliography

List of Tables

2.1 Informative Path Planning as a POMDP. After [135, 38].

3.1 Time Comparison. Wall-clock time (seconds) required to complete five episodes.

3.2 Parameters for the POMDP solver which are chosen by the parameter selection agent. These parameters critically impact the effectiveness of the POMDP solver and are difficult to set in the field for non-technical users.

3.3 Description of Compared Methods. Naive POMCP is a POMDP solver using manually chosen conservative parameters and the t-test heuristic [116]. Empirically-tuned POMCP improves upon Naive POMCP with domain-specific features and tuned parameters [116]. Metadata RL-POMCP has access to the current task metadata. Fixed-Length RL-POMCP has access to the metadata as well as the last ten samples (locations and values). Metadata RL-POMCP and Fixed-Length RL-POMCP demonstrate the need for the parameter selection module to use the complete set of samples.
LSTM methods have access to the previous sample and use an LSTM with a hidden state to access information from previous environment steps. GNN methods use all of the current samples in a graph neural network.

4.1 Comparison on Datasets. ATE and verified or inlier loop closures for different prioritization methods as % of queued loop closures.

6.1 Task complexity levels for system experiments. The number of obstacles, number of unknown light sources, number of target light sources used to generate the desired light intensities, and number of configured light sources that the robot can control increase from task level A to C.

List of Figures

1.1 Survey in Clear Lake, California. The red area is the area that can be covered by a drone flying for a thirty-minute battery life, doing a reasonable-overlap back-and-forth coverage pattern.

1.2 Active sensing robot team for harmful algae bloom monitoring. The team consists of a base station operator, a team of active sensing robots and fixed sensors, and humans performing measurement and sample collection.

1.3 The active sensing robot. The environment is the physical location of the robot as well as the sensed signal. Measurements from the environment inform the model of the environment. The planner uses the model of the environment to infer signal strengths at previously unseen locations and instructs the robot on actions, typically movement and sensing, to execute in the environment.

1.4 Informative Path Planning Comparison. Figure 1.4a shows the ground truth test function, fig.
1.4b shows a fixed lawnmower path halfway complete with 600 measurements. Figure 1.4c shows a coverage-based informative path planning plan taking as many measurements as fig. 1.4b. Figure 1.4d shows an informative path planning path which seeks to find maximal areas with the same amount of measurements as fig. 1.4b.

1.5 Sensed Signals. Various signals which will be discussed. We will discuss similarities and differences in signals, but rough genres can be biological processes (fig. 1.5a, fig. 1.5b, and fig. 1.5c), electromagnetic waves (fig. 1.5d, fig. 1.5e), and robot state (fig. 1.5f).

2.1 Example of Gaussian process. A robot has taken 5 measurements at the white circles, taking the path described by the white arrows. The Gaussian process tends to smooth data from incomplete measurements and allows for estimates at unseen locations.

2.2 Factor Graph Environment Modeling. Figure 2.2c and fig. 2.2d show the factor graph after two measurements are added. Between i_0, i_1 and i_2, i_3 there is an obstacle which the signal can permeate through, represented by an increase in the uncertainty in the link factor. Figure 2.2e and fig. 2.2f show the factor graph after four measurements, showing that the uncertainty has decreased overall and the predicted posterior has become more accurate. Additionally, we can see that the factor graph accurately models the posterior uncertainty due to the per-measurement uncertainty in the measurement.

2.3 Rollout-Based POMDP Solver for active sensing. The POMCP-based solvers maintain both action and observability. Reward estimates are calculated from action nodes by performing rollouts, i.e. randomly taking actions until a horizon length h.

3.1 Overview of Approach. Fig. a shows an agent’s trajectories on a hyperspectral orthomosaic collected in Clearlake, California. Blue is low value, red is high value.
The baseline trajectory overlaps with itself (wasted time), goes outside the bounds of the orthomosaic (no reward outside the workspace), and mostly samples near the starting position. The trajectory from our proposed method avoids such behavior and samples regions further away. Fig. c shows POMDP planning with the POMCP planner. Portions in grey are areas we study and improve in this work.

3.2 Results from a grid search over possible rollout curves. Fig. 3.2a presents the curves that were searched over; fig. 3.2b shows the three best curves and a linear allocation curve, colored by their mean accumulated reward at the end of the episode.

3.3 Comparison of Exploration Algorithms (a, b, and c) and Plan Commitment Algorithms (d and e). UCT, Fixed is the baseline which evenly splits the rollouts at each step and uses the UCT exploration algorithm (the default for MCTS). Other results use a curved rollout allocation. For plan commitment, fig. 3.3d shows the reward accumulation and fig. 3.3e shows the number of rollouts used in all POMCP calls for the whole episode. A small offset value is added to methods which overlap.

3.4 Comparison of the combined proposed improvements (Proposed Method) against the baseline for all environments. Figs. 3.4a and 3.4b are the reward and number of rollouts used by the agent in the dynamic function environment, figs. 3.4c and 3.4d are the reward and number of rollouts used by the agent in Validation Environment 1, and figs. 3.4e and 3.4f are the reward and number of rollouts used by the agent in Validation Environment 2.

3.5 Datasets and Sample Trajectories. Figure 3.5a shows a dataset collected with an underwater robot, and fig. 3.5b and fig. 3.5c show example trajectories from a baseline implementation and our proposed implementation respectively.
Figure 3.5d shows another, more complex, dataset collected in the same location; fig. 3.5e and fig. 3.5f show example trajectories from a baseline implementation and our proposed implementation respectively.

3.6 Left: Overall workflow. Environment is the robot’s sensing space (e.g. a lake) and Environment model is its current understanding of the environment (modeled here by a Gaussian process). We introduce a learned Parameter Selection agent that selects parameters (e.g. number of rollouts) of Action Selection, which determines the actions the robot should take (a POMDP solver in our work). The process repeats until a fixed number of informative path planning iterations are completed. Right: Field Trial. We demonstrate that our approach transfers to a previously unseen real robot setup using a light sensor mounted on a mobile robot.

3.7 Results on Ocean-CTD from the dataset of publicly available glider trajectories which the models were trained on. We compare over 30 different trajectories and show the cumulative reward after 50 timesteps. We compare the three objective functions described in section 2.3.2. We find that our proposed methods outperform previous methods in all three objective functions.

3.8 Results on Lake-Chlorophyll from a dataset of smaller lake trajectories using a chlorophyll fluorescence sensor that the models were not trained on. We compare each objective function on 5 datasets with 3 seeds each. We find that GNN RL-POMCP outperforms previous methods with the entropy objective function. All RL-POMCP agents perform well on expected improvement and probability of improvement.

3.9 Results for Bay-Camera, a simulated drone flight environment. In this environment, a drone is simulated flying over previously collected imagery.
This imagery was collected using a hyperspectral camera over a lake with algae mats on it. We evaluate on 2 images with 3 seeds each. Features which are not water were masked out. We find that our proposed methods perform comparably to prior methods on entropy. On the Bayesian optimization objective functions our methods perform well compared to the other approaches, demonstrating the ability of our approach to generalize to previously unseen sensor types.

3.10 Trajectories for each parameter across all evaluation environments for the expected improvement objective function (eq. (2.6)). LSTM RL-POMCP and GNN RL-POMCP learn distinct policies: GNN RL-POMCP uses many rollouts with a shorter horizon until the end, when it tends to increase the horizon and reduce the number of rollouts, while LSTM RL-POMCP varies around a mean value the entire time.

3.11 Field Results. Top: Comparison of trajectories for Empirically Tuned POMCP and GNN RL-POMCP deployed on a real robot taking light measurements. The red arrows show the trajectory taken by the robot. Two lights were deployed with existing light in the area. Middle: Comparison of the sum of the measurements taken over the number of actions taken by the robot and a picture of the experimental setup. Bottom: Experimental setup used with two placed lights.

4.1 Error when predicting from noisy data using algorithm 1. Environment 1 is sampling eq. (4.3), Environment 2 is eq. (4.4). All MC methods were run for 1000 iterations. Six seeds each.

4.2 Error when predicting from noisy data using algorithm 1. Environment 1 is sampling eq. (4.3), Environment 2 is eq. (4.4). The robot is simulated for a fixed budget and compared against a noiseless, evenly spaced ground truth.
Each method runs with three seeds.

4.3 Environments created from data gathered in Clear Lake, California, collected by a UAV with a hyperspectral camera (fig. 4.3d). Figure 4.3e shows an algal mass in true color while fig. 4.3f shows the hyperspectral reflectivity at the target band.

4.4 Error between noisy samples and ground truth on four environments represented in section 4.1.4.4. Noise is generated from a pose graph-based localization using the UAV’s GPS/IMU. 10 seeds used.

4.5 Error between noisy samples and ground truth on four environments collected from the environments represented in section 4.1.4.4. A particle filter with noisy measurements to 10 landmarks is used. 3 seeds run.

4.6 Demonstration of the benefit of our loop closure prioritization system. We show the accuracy of SLAM maps generated with (b) and without (a) loop closure prioritization. Data is from four robots driving over six km in the Kentucky Underground Mine. We are only able to compute 0.5% of the candidate loop closures. Picture taken within the mine to demonstrate composition.

4.7 Diagram of the SLAM multi-robot front-end and loop closure processing pipeline. Modules in blue are proposed in this work. Robot maps are merged on a base station and passed to the generation module, which performs proximity-based analysis to generate candidate loop closures. These candidate loop closures are sorted by the prioritization modules, which feed into a queue. The loop closure solver computes the transformations between pairs of lidar scans, which are given to the SLAM back-end, which adds a loop closure factor.
4.8 Example output of the Graph Information Prioritization module on the Urban dataset. Green lines are selected loop closures; red are not selected. Orange, blue, and purple edges are individual robot odometry.

4.9 Visual illustration of the point cloud map with intensity colored by the normalized observability score. Note how the lower-observability areas are usually those of long straight corridors with fewer features. With observability prioritization, we prioritize loop closures at corners and junctions, which are less susceptible to perceptual aliasing.

4.10 Selected loop closure candidate to inlier ratio with the different prioritization methods. With no prioritization ("Bsln"), little to no inliers are detected from the candidates in the latter part of the run for the Urban, Finals, and Kentucky Underground datasets.

4.11 Direct comparison with and without prioritization on the Tunnel dataset. The red line marks the end of the mission. The full system is able to find a similar number of inlier loop closures and a lower error at the end of the mission than the baseline can after computing all loop closures.

4.12 Trajectory error comparison for the consumer laptop on the four underground datasets. We find the full system is able to consistently outperform the baseline, odometry, and random solutions.

4.13 Trajectory error comparison on the powerful server on the four underground datasets. We find that the powerful server and the less powerful computer (fig. 4.12) have similar performance when using our system.

5.1 Full system.
First, the parameters for the robotic survey are chosen, such as the area bounds and the quantiles for specimen collection. The robot performs informative path planning using our proposed objective functions, creating an environment model and an estimate of the quantile values. The quantile locations are then selected by minimizing our proposed loss function. These quantile locations correspond to physical locations which, when measured, have the estimated quantile values. After the quantile locations are chosen, humans go to them to collect field specimens which they later analyze in a laboratory setting. This work focuses on the steps shown in the blue shaded boxes.

5.2 Simulated drone planning experiments with real data. Error between ground truth quantile values and estimated quantile values. Datasets A and B were collected using a hyperspectral camera in Clearlake, California. Units are 400 nm channel pixel intensity (0–255). Each dataset and objective function pairing is run three times.

5.3 Experimental dataset distributions. Drone data is measured in pixel intensity; AUV data in µg/L chlorophyll.

5.4 Simulated AUV planning experiments with real data. Error between ground truth quantile values and estimated quantile values. Datasets C and D were collected from a reservoir in California using an underwater robot with a chlorophyll fluorescence sensor. Units are µg/L chlorophyll. Each dataset and objective function pairing is run three times.

5.5 Comparison between quantile standard error with the POMDP solver and the Spiral-STC coverage planner [52]. Comparison on Drone Environment B estimating deciles, each over 3 trials.

5.6 Quantile location selection results. RMSE between the ground truth values at the selected locations and the corresponding true quantile values.
Units areµg/L chlorophyll for AUV,400nm channel pixel intensity (0− 255) for Drone. . . . . . . . . . . . . . . . . . 97 5.7 Physical locations (stars) selected by the cross-entropy optimizer for deciles on thedrone experiment.[Left] Black lines: true quantile value contours, overlaid on the absolute error betweenµ (X # ) andGT(X # ). Note that the lowest error tends to follow the quantile contour lines.[Right] Red crosses: locations the robot visited, overlaid on the ground truth image. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100 5.8 Physical locations (stars) selected by the cross-entropy optimizer for the upper extrema quantiles with an AUV using a chlorophyll point sensor. Blue/green points are the measured locationsX. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 5.9 Visualizationofafieldtrialmodelingacrophealthtask. Left: Crosses are locations where the drone took images. The drone is limited to only visit 20% of the possible locations to take images. The quantiles of interest are the deciles and the locations are chosen by cross-entropy. The 9 stars show locations suggested to collect physical specimen. The measurement of interest is the amount of green in each pixel.Right: The drone used in the field trial, in flight. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 xiii 6.1 Overviewofapproach. Both the information roadmap (which models the traversabilty of the environment) and the signal inference system maintain global and local representations of the environment. The signal inference system topology is built from the information roadmap. The environment is a paved outdoor walkway which can be seen in the image as well as the traversability map. The local signal representation maintains both the mean (shown as colored spheres) as well as the variance (flat squares), both are shown from blue (low) to red (high). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . 107 6.2 The signal belief is represented by two factor graphs, described in section 6.1.3.2. (a) The global factor graph represents the belief over the signal strength at locations the robot has received measurements, represented as the valuesg 1 ,..,g n . (b) The local factor graph is centered at the robot using the valuesl 1 ,...,l n to infer the signal locally around the robot. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111 6.3 MapsoftheSignaltoNoiseRatiocollectedinarealparkinggarage. SNR is colored from blue (low) to yellow (high). Areas in white are not traversable by the robot. . . . . . 116 6.4 Simulation experiment results over two real-world environments. Left: the maximal signal reading up until timet. Right: the sum of the measurements taken until timet. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117 6.5 Comparisonoftimerequiredtoperforminferenceforthethreemodelsoverthe numberofmeasurementsinthemodel. Left: the data binned data with the standard deviation, Right: the raw data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119 6.6 Experiment of our system on a real Spot robot. The robot explores a man made structure to find the radio. The collected measurements, colored from blue (low) to yellow (high), as well as the robot’s trajectory are shown. Black dots show the non-traversable areas such as walls. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120 6.7 ExperimentofoursystemonarealSpotrobotinextendedfieldconditions. Top: The robot explores a man made structure to find the radio. The collected measurements, colored from blue (low) to yellow (high), as well as the robot’s trajectory are shown. Black dots show the non-traversable areas such as walls and cars. Bottom: Spot robot platform used in this experiment and the experiment in section 6.1.4.2. . . . . . . . . . . . . . . . . 
121 6.8 Overview: a) The desired lighting intensity, chosen by randomly placing lights, b) the lighting intensity in the environment which is unknown to the robot a priori, as well as the obstacles, c) the final light source configuration, lighting intensity and robot path produced by the proposed system, d) the final light source configuration, lighting intensity and robot path produced by the baseline system, e) example field trial. Lights circled in red are deployed by the robot, while lights circled in green are unknown to the robot. . . 122 6.9 OverviewofourSystem: A description of each module and their connections to other modules are described in section 6.6. The principle modules we propose in this work are the light source reconfiguration trigger, the light source configuration algorithm and the probabilistic lighting model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126 xiv 6.10 ProbabilisticLightingModel. This model, described in section 6.6.2, allows the robot to combine the ray marching based analytical lighting model for the configured light sources with the real-world measurements of the light intensities using additive measurement factors,f m . In this example, the robot has taken a measurement at locations (0,0) and (3,1). The robot has previously taken a measurement at (1,1) before the light sources were re-configured, which is reflected to the previous configuration factor, f p . There is an obstacle at(2,1), therefore, there are no variables there and distance based link factors, f l , are removed. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129 6.11 Light Belief Model Results We show the reduction in error when using the model proposed in section 6.6.2 when compared to a Gaussian process baseline model. . . . . . . 134 6.12 LightSourceReconfigurationTriggers. Comparison of RMSE when different triggers with different parameters are used. 
We see that there is a balance between reconfiguring the light sources more and less often in both the Logprob and Every n triggers. Logprob α = 1.07 results in the lowest RMSE.
6.13 Cumulative Number of Light Source Reconfigurations. The cumulative number of times the robot has re-configured the lights over steps in the environment. The Logprob triggers tend to trigger more often early in the task, when the uncertainty about unknown light sources is high.
6.14 Module and System Improvement: Top: Improvement in our system due to various per-module improvements vs. baseline modules. Bottom: Improvement in our system across various environment difficulties; see section 6.7.2 for a description of the task complexities. We report percent error due to differences in error magnitude between complexities and seeds.
6.15 Field Test Results. We compare our approach in a scenario with a real robot and light sources. We find that our approach is better able to find light source configurations which match the desired lighting configuration in the field.

Abstract

Robots have the potential to greatly increase our ability to measure complex environmental phenomena, such as harmful algae blooms, which can harm humans and animals alike through drinking water. Such phenomena require study and measurement at a scale beyond what can be accomplished by robots that plan to completely measure the area. Despite this, many sensing robots are still deployed with non-active behaviors, such as fixed back-and-forth patterns. The lack of deployment of active sensing systems in practice is due to difficulties encountered in real-world robotic deployments.
We identify and address four main issues that plague complex real-world active sensing deployments. First, active sensing systems are difficult to use, with complex deployment-time decisions that affect the efficiency of sensing. We describe systems that eschew these decisions, allowing for efficient and automatic deployment. We find that these systems provide a non-technical deployment procedure and outperform hand-tuned behaviors.

Second, these systems assume certainty in the state of the robot. While this assumption is valid in laboratory settings where these algorithms are developed, these systems must be deployed with uncertain sensing of the position of the sensor. We discuss a way to propagate this state uncertainty through the active sensing pipeline, and a system to use these active sensing techniques to reduce state uncertainty for a team of robots.

Third, active sensing robots tend to perform a survey that maximizes some general goal and requires the user to interpret the collected data. We propose a system that, instead, plans for the specific user task of collecting physical samples at limited, unknown locations. We demonstrate that planning for this specific task while sensing allows for more efficiency in the active sensing behavior.

Finally, existing models for active sensing do not accurately model the interaction of the sensed signal and obstacles in the environment. We propose two novel modeling techniques which allow active sensing of signals that have complex interactions with obstacles, such as electromagnetic waves. Both outperform traditional modeling techniques and enable scalable active sensing with a large number of measurements on a real robot. Additionally, we find that they allow the robot to actively place signal-emitting devices while sensing the signals from these placed devices.
Chapter 1

Introduction

One of the most important capabilities of modern humans is the ability to measure natural phenomena and to use these measurements to predict and influence those phenomena. A motivating example we will look at throughout this work is the biological phenomenon of harmful algae blooms, which threaten our drinking water and natural ecosystems. Harmful algae blooms are important to study and understand due to their complex interactions with human and environmental life. These blooms present myriad difficulties, such as contaminating drinking water, hurting fisheries, and destroying recreational tourism. The health hazards to humans are not well understood, but drastic increases in the rates of hospitalization have been shown when blooms are present. In one study, hospital emergency department diagnoses increased by 19% for pneumonia, 40% for gastrointestinal illnesses, and 54% for respiratory illnesses compared to a period when harmful algae blooms were not present in the area [22]. The economic impacts of harmful algae blooms are enormous. By one estimate, the fishing industry loses as much as $34 million per year [3], and the costs of hospital visits for respiratory illness alone during a single harmful algae bloom in a Florida county ranged between $0.5 and $4 million [66]. These harmful algae blooms, and other environmental phenomena, require the use of sensors to study their characteristics, and to determine their causes and remedies.

Figure 1.1: Survey in Clear Lake, California. The red area is the area that can be covered by a drone flying for a thirty-minute battery life, doing a reasonable-overlap back-and-forth coverage pattern.

While sensors can be deployed statically, these would only provide measurements at one spatial location. This is a common approach in real scientific studies, but robots can provide a better approach.
Robots can move sensors through the environment to provide measurements at multiple spatial locations. Unfortunately, robots are still limited in their coverage ability due to their reliance on limited energy sources, such as batteries. The simplest deployment of robots, and the most common, is to have them follow simple fixed paths. These fixed paths generally cover the environment in a lawnmower pattern, wherein the robot goes back and forth across a workspace at some fixed spacing between the long-axis lines∗. This pattern works well for small-scale studies and can provide the end-user with useful data, but fails when the phenomenon of interest is spread out across a large area. As an example, in fig. 1.1, we see that a fixed lawnmower pattern of a drone can only cover a small area of a lake of interest.

Fortunately, a robot can exploit spatial correlations in the measured data to make more intelligent decisions of where to measure. A robot making intelligent decisions of where to measure, based on correlations in the measurements, is said to perform active sensing.

∗ See fig. 1.4b for an example.

Figure 1.2: Active sensing robot team for harmful algae bloom monitoring. The team consists of a base station operator, a team of active sensing robots and fixed sensors, and humans performing measurement and sample collection.

A proposed active sensing team for harmful algae blooms can be seen in fig. 1.2. A base station coordinator guides multiple robots and human teams. This is enabled through easy deployment, high-level goal specification, and efficient robot behaviors. This is in contrast to what we currently see in the field, where multiple humans must coordinate each robot individually and provide low-level, non-active, waypoint-based commands to the robot, based on expert knowledge of the phenomenon being studied.
This traditional type of fixed robotic behavior does not enable large-scale study of these systems, and requires intense coordination between human teams to ensure mission success.

Active sensing falls under the umbrella of active perception. Active perception is generally a broad class of problems in which the robot takes actions to increase its understanding of the world. The primary technique of active sensing we will look at is informative path planning†. In informative path planning, a robot tries to plan a path which maximizes the information it gathers about some previously unknown signal. The key distinction of informative path planning which separates it from other forms of active perception, such as next-best-view planning, is the focus on how the entire trajectory the robot navigates contributes to the robot's understanding of the sensed signal in the environment.

† Informative path planning is often also called adaptive sampling.

Figure 1.3: The active sensing robot. The environment is the physical location of the robot as well as the sensed signal. Measurements from the environment inform the model of the environment. The planner uses the model of the environment to infer signal strengths at previously unseen locations and instructs the robot on actions, typically movement and sensing, to execute in the environment.

This active understanding of the world separates active sensing schemes from traditional robotic planning. Typically, a traditional robot would have a high-level goal location assigned by some user-specified criteria. The robot progresses towards this goal while avoiding obstacles and expending minimal energy.
The robot may contain a probabilistic model of the environment in the form of obstacle probabilities and may try to minimize risk in its plan, but it does not actively increase its understanding of these obstacle probabilities. Many times, this is good enough for vehicles moving through an environment. A human analogue‡ may be that when humans drive a car they do not actively look around much when on an open road, but only begin actively perceiving the world in more detailed driving scenarios. In contrast, sensing these signals efficiently requires careful choice of measurement locations, in order to be able to accurately describe the characteristics of the signal.

‡ We will tend to avoid analogies to human behavior and rely on empirical robot behavior after this.

The technical process for active sensing can be seen in fig. 1.3. The key components for an active sensing robot are a model of the environment and a planner. The model of the environment, also called the environment belief model, allows the robot to use measurements to infer the strength of the signal in the environment at previously unseen locations. These environment belief models assume there is some unknown underlying probabilistic function which represents the true state of the environment, operating similarly to a surrogate model in Bayesian optimization [135]. These environment belief models often use ideas from geospatial statistics, where this is often called kriging, to inform how they use spatial correlations to predict signals [26]. Critically, these environment belief models also quantify the uncertainty they have in these predictions. This uncertainty is generally informed by the spatial coverage near the queried location.

Given the environment belief model, and its ability to determine the uncertainty in prediction, the second important portion of an active sensing robot is the planner. The planner aims to drive the robot to gather information relevant to the desired user task.
Generally in active sensing, the planner reduces the uncertainty in the belief model for some specific goal. In the broadest class of planner goals, the robot seeks to evenly cover the entire area. Another common class of goals is to find maximal areas of the signal, often applicable in the study of harmful algae blooms.

Active sensing robots tend to be studied in lab settings, where we assume many problems related to robots deployed in real environments can be ignored. This provides a convenient abstraction, allowing the study of the behavior of the robot in isolation to provide direct comparison between methods, but unfortunately, it does not allow the study of the effects of the real world on the behavior of the robot. One commonly ignored effect is uncertainty in robot state sensing due to noisy position sensors such as inertial measurement units and global positioning system sensors. Because of the noise and drift in these measurements, we do not know the exact position of the sensor when a measurement is taken. Typically, we assume the mean of the belief over robot positions is a good enough approximation. We show in chapter 4 that we can do better than the mean value assumption in some situations, and additionally, that we can treat the problem of robot state estimation as a quasi-active sensing problem to allow for good estimates of the locations of robot teams.

Figure 1.4: Informative Path Planning Comparison. (a) Ground truth (real signal); (b) lawnmower path (halfway complete); (c) informative path planning with coverage (entropy) objective; (d) informative path planning with Bayesian maximization (UCB) objective. Figure 1.4a shows the ground truth test function; fig. 1.4b shows a fixed lawnmower path halfway complete with 600 measurements. Figure 1.4c shows a coverage-based informative path planning plan taking as many measurements as fig. 1.4b.
Figure 1.4d shows an informative path planning path which seeks to find maximal areas with the same number of measurements as fig. 1.4b.

Systems developed for active sensing should attempt to focus their sensing on some quantity in which the end-user of the system is interested. Many times, this is to provide a good enough map of the sensed signal in the environment with a fixed energy budget. Other times, this may be to find the location of the maxima of the sensed signals. These two genres of active sensing provide a good base for many types of end-user tasks, but in many cases do not exactly provide the user with the set of measurements most useful to their task at hand. An example task we will work with in chapter 5 is to select locations to take physical samples of the environment. This could be in the form of collected vials of water or cuttings of plants. We will show that directly planning for this end-user task can outperform the more generic genres of active sensing.

Typically, when modeling the environment, active sensing robots will rely on a model which uses spatial correlations to predict the sensed signal at unmeasured locations. This model, typically a Gaussian process, provides a good approximation and admits a continuous representation of both the predicted signal and the model's uncertainty in that prediction, but does not easily allow for representation of obstacles. Gaussian processes also have difficulty directly representing per-measurement uncertainty, which makes it onerous to integrate different sensors. Per-measurement uncertainty also permits the integration of analytical models of signals into the modeling process, allowing for more accurate modeling of the environment. We will discuss a factor graph based environment modeling technique in chapter 6 which allows for obstacle modeling and per-measurement uncertainty, and will allow for real-time efficient planning onboard a robot.
Active sensing can be used to sense various signals which are important to understand for applications such as the biological sciences, search and rescue, and sensor placement. Here, we consider six types of signals, shown in fig. 1.5. These signals share a strong spatial correlation; that is, they can be reasonably accurately predicted from measurements which are spatially close to the queried location. This has been shown to be a fundamentally important feature of the signal for informative path planning [135], as it allows accurate uncertainty and predictions to be made. At an extreme, if a signal is completely spatially uncorrelated, there can be no informative path planning, as all measurements are equally informative. This defines a boundary on how effective informative path planning can be for a given signal type, i.e., the more spatial correlation, the better informative path planning can be over fixed path planning. This theory also takes into account sensor noise, that is, uncertainty in the sensing device. If a sensor is at rest and the measured signal is not time-varying, the variance in the output of the sensor can be thought of as the uncertainty in the sensing device. This sensing uncertainty is one of the most common types of uncertainty which affects the ability of the robot to do informative path planning effectively.

Figure 1.5: Sensed Signals. (a) Hyperspectral pixel intensity; (b) chlorophyll concentration; (c) plant health; (d) radio signal strength; (e) light intensity; (f) robot state. Various signals which will be discussed. We will discuss similarities and differences in signals, but rough genres are biological processes (fig. 1.5a, fig. 1.5b and fig. 1.5c), electromagnetic waves (fig. 1.5d, fig. 1.5e), and robot state (fig. 1.5f).

Chapter 2

Background

We discuss background topics which are common to the different projects contained within this work.
We also use a common notation scheme across many works, but may differ in the exact terminology due to the specifics of the problem at hand. Many active sensing solutions follow the outline presented in fig. 1.3, and we describe the algorithms commonly used to implement the planner and belief modeling.

2.1 Common Notation

We will primarily use a grid-based representation of the planning space, G#, which defines the set of locations that the robot could visit. For a robot that moves in R^d, G# ⊂ R^d. While this grid-based planning may seem restrictive, it allows for fair comparison between different systems, as there are a finite number of locations at which the robot can measure.

We define X# as the set of locations at which the robot could measure. A sensing function ψ : G# × G# → X# defines the set of locations that can be sensed by moving from one robot pose to another. At each x ∈ X the robot can sense a value y ∈ Y. If the robot sensor has finer resolution than G#, e.g. if the robot uses a camera sensor or takes measurements while traversing between grid points, then |X#| > |G#|. We define X_{0:t} as the locations at which the robot has measured up to time t and Y_{0:t} as the values the robot has measured up to time t. We define Y# as every value the robot could sense, such that |X#| = |Y#|. In most projects described here, time will be discrete, such that t ∈ Z, and the robot typically completes one action per timestep t_i.

Figure 2.1: Example of Gaussian process. (a) Signal in environment; (b) predicted mean, µ(x); (c) predictive uncertainty, σ²(x). A robot has taken 5 measurements at the white circles, taking the path described by the white arrows. The Gaussian process tends to smooth data from incomplete measurements and allows for estimates at unseen locations.

2.2 Modeling Techniques

An important aspect of an active sensing robot is the technique used to model the belief over the environment. Models used for signal belief representation typically use correlations in collected measurements to predict the measurement at unknown locations. As these are belief models, they are able to quantify their uncertainty in the prediction of the sensed signal, typically by using the spatial distance between the predicted location and the previously measured locations. We will mainly focus on two techniques: Gaussian processes and factor graphs. Gaussian processes tend to be the most common type of model for informative path planning due to their flexibility and relative simplicity. Other techniques exist, such as voxel-based representations, which are used in specific deployments of active sensing [89].

2.2.1 Gaussian Processes

Gaussian processes (GPs) are widely used modeling tools for active sensing because they provide a non-parametric, continuous representation with uncertainty quantification. Gaussian processes approximate an unknown function from its known outputs by utilizing the similarity between points from a kernel function k(·,·) [108]. GPs have been used to represent the belief distribution from observations in POMDP formulations of sequential Bayesian optimization and IPP [135, 87, 38]. An example Gaussian process can be seen in fig. 2.1. Function values y_* at any input location x_* are approximated by a Gaussian distribution:

y_* | y ∼ N( K_* K^{-1} y, K_{**} − K_* K^{-1} K_*^T )    (2.1)

where y is the training output, y_* is the test output, x is the training input, and x_* is the test input. K is k(x, x), K_* is k(x_*, x), and K_{**} is k(x_*, x_*).

Gaussian processes have a principal problem in their efficiency: they require matrix inversion. While this can be alleviated in practice by using a Cholesky decomposition [108], or by other adaptive-sensing-specific techniques [91], we present modeling techniques in chapter 6 which sidestep kernel inversion altogether.
2.2.2 Factor Graphs

Factor graphs are a common way to represent estimation problems such as Simultaneous Localization and Mapping (SLAM). A factor graph is a graphical model involving factor nodes and variable nodes. The variables, or values, represent the unknown random variables in the estimation problem, such as robot poses. The factors are probabilistic information on the variables and represent measurements, such as of the signal strength, or constraints between values. Performing inference in a factor graph is done through optimization, making it suitable for large-scale inference problems [32].

Factor graphs have been used for very large-scale SLAM problems [16] and for estimation of gas concentrations [50]. Factor graphs can be solved in different ways, by either running a sparsity-exploiting optimizer to find the maximum likelihood estimate for each value, or by keeping a tree and performing local updates [32].

Figure 2.2: Factor Graph Environment Modeling. Panel (a) shows the sensing scenario: a signal source, a measuring robot, a reflection, and factors with obstacle uncertainty. Panel (b) shows the graph structure: unary measurement factors, binary link factors, and the inferred signal values at locations. Panels (c)-(f) plot signal intensity over spatial position, showing the posterior mean µ and 2σ bounds at the inferred values i0, ..., i4, with the obstacle and measured locations marked. Figure 2.2c and fig. 2.2d show the factor graph after two measurements are added. Between i0, i1 and i2, i3 there is an obstacle which the signal can permeate, represented by an increase in the uncertainty of the link factor. Figure 2.2e and fig. 2.2f show the factor graph after four measurements, showing that the uncertainty has decreased overall and the predicted posterior has become more accurate.
Additionally, we can see that the factor graph accurately models the posterior uncertainty due to the per-measurement uncertainty in the measurements.

2.2.2.1 Factor Graph Based Informative Path Planning

Factor graphs have been used as a model of a continuous distribution by modeling the problem as a Markov random field. This approach has been used to monitor time-varying gas distributions in spaces with obstacles by defining a factor graph over the entire space with an unknown value at each cell [50]. It has also been used to jointly estimate the concentration of gas and the wind direction [56].

We will use factor graphs for estimation of light intensity in section 6.2 and radio signal strength in section 6.1. Factor graphs for informative path planning typically follow a formulation of imposing a mesh over X#, with factors linking the nodes in the mesh along the edges. We term these binary factors which link the nodes in the graph together link factors; they allow information to flow spatially through the graph. This can be thought of as somewhat analogous to the kernel function in Gaussian processes. Another common type of factor is a unary measurement factor. This type of factor conditions the estimated value on a measurement, produced either by a real sensor or by an analytical model.

Factor graphs allow for a more flexible modeling technique than Gaussian processes in some regards. Firstly, factor graphs allow for per-measurement uncertainty, as there is no constraint that each measurement factor have the same uncertainty. This can be useful when there is spatial heteroscedasticity in the noise of the sensor, such that some areas have more noise in the sensed value than others. This can also help when measurements come from two sources, such as from an analytical model and from measured sensor values.
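To make the link-factor and measurement-factor construction concrete, here is a small illustrative sketch, not the solver used in this work: a one-dimensional signal graph posed as weighted linear least squares, where each factor contributes one row scaled by its inverse standard deviation. In practice a sparsity-exploiting optimizer such as those described in [32] would be used. Inflating one link sigma models an obstacle the signal may not permeate.

```python
import numpy as np

def solve_signal_graph(n, links, measurements):
    """MAP estimate of a 1-D signal factor graph via weighted linear
    least squares.  `links` are (i, j, sigma) binary link factors that
    pull neighboring signal values together; `measurements` are
    (i, z, sigma) unary measurement factors."""
    rows, rhs, weights = [], [], []
    for i, j, sigma in links:                 # residual: g_i - g_j = 0
        r = np.zeros(n); r[i], r[j] = 1.0, -1.0
        rows.append(r); rhs.append(0.0); weights.append(1.0 / sigma)
    for i, z, sigma in measurements:          # residual: g_i = z
        r = np.zeros(n); r[i] = 1.0
        rows.append(r); rhs.append(z); weights.append(1.0 / sigma)
    A = np.array(rows) * np.array(weights)[:, None]
    b = np.array(rhs) * np.array(weights)
    est, *_ = np.linalg.lstsq(A, b, rcond=None)
    return est
```

With tight links (small sigma) within two regions and a weak link (large sigma) between them, a measurement on one side barely influences the other, which is the obstacle behavior illustrated in fig. 2.2.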
Additionally, because there is no requirement that link factors have the same uncertainty, or that the topology of the graph is regular, it is easy to represent obstacles. Two possibilities are presented in this work. The first possibility arises when obstacles are fully known (not occupancy probabilities) and it is known that the signal does not permeate the obstacle. In this case, values inside the obstacle can be removed, and link factors which link through the obstacle can be removed. This leads to an irregular mesh, and prevents estimates from flowing through the obstacle, while reducing the computational load of the factor graph solver by increasing the sparsity. This is the technique applied in section 6.2, as the sensed quantity is light intensity and the obstacles are fully opaque.

The second possibility arises when the obstacle/signal interaction is uncertain due to two possibilities: either the signal can permeate the obstacle, or it is not known whether the obstacle fully blocks the signal. This can arise when the sensed quantity is an electromagnetic wave, such as a radio signal, which may permeate seemingly solid obstacles. Additionally, on a real robot, obstacles may be estimated using occupancy probabilities [75], which encode the probability that a cell is traversable. This can be used as a proxy for the ability of the obstacle to block signal propagation. In either case, the solution is to increase the uncertainty on the link factor, either by estimating the signal permeation uncertainty, or by increasing it proportionally to the probability that it blocks the sensed signal. This is the technique applied in section 6.1, where the robot senses radio signals and only has a probabilistic estimate of the obstacles onboard. An example of a sensing factor graph with probabilistic obstacles can be seen in fig. 2.2, where two obstacles are present.

2.2.2.2 Pose Graph Optimization

A more common use for factor graphs is pose-graph SLAM [16, 12, 41].
In the context of pose-graph SLAM, the graph nodes are robot poses to be estimated, and the graph factors connect these nodes and represent measurements such as odometric information or loop closures [33]. The combination of factors and poses leads to a commonly used optimization problem called pose-graph optimization [12]. While regularly used to great effect, pose-graph optimization can be costly to perform at a high rate, as is needed when evaluating the impact of many loop closure candidates, as we will see in section 4.2. This is because pose-graph optimization is highly non-linear, as robot poses typically belong to SE(3), whereas sensed values in informative path planning tend to lie in R, leading to a linear problem.

2.3 Planning

Using a modeling technique, the active sensing robot is able both to understand where the model is uncertain and to predict the strength of the signal we wish to measure. This allows the robot to make plans and policies to improve this model. Planning techniques must take into account the information gained by all the measurements gathered along the path jointly, which will typically be less than the sum of the information gathered by each sample individually, due to the submodularity of information gathering [78]. It is also not necessarily true that the optimal path when considering the measurements individually is the optimal path when considering the measurements jointly, which causes solutions that greedily (myopically) consider only one action to be sub-optimal [60]. In general, planning for information gathering is NP-hard [60, 85] due to the curse of dimensionality, and we will make use of rollout-based planners to allow long-horizon planning without exponential time [85].

2.3.1 Online Informative Path Planning (IPP)

Online informative path planning consists of alternating between planning and taking an action, which usually corresponds to moving and measuring a value at the new location.
A plan, described by a partial trajectory p, is created to maximize some objective function f over the measured locations X and measured values Y. The combined plan P is the concatenation of partial trajectories p. The plan and act steps iterate until the cost c(P) exceeds some predefined budget B [68, 38]. Formally, this can be described by eq. (2.2):

P* = argmax_{P ∈ Φ} f(P)  s.t.  c(P) ≤ B    (2.2)

where Φ is the space of full trajectories and P* is the optimal trajectory.

Planning for robotic information gathering has been extensively studied in the literature and tested in field systems. Recently, rollout-based solutions have gained popularity for non-myopic planning in challenging missions, including multi-robot deployments [90, 6] and missions with multiple objectives [20]. Field tests of these systems show great potential for widespread use [47, 90, 104, 29]. Rollout-based algorithms are popular for informative path planning as they allow anytime computation of plans on constrained hardware [90, 6]. In particular, Monte Carlo tree search-based solutions allow long-horizon plans to be built without requiring exhaustive search at all branch levels [87, 125].

2.3.2 Objective Functions for IPP

Objective functions are used to specify the mission objective to maximize during planning. These functions encode the mission goals for the active sensing robot and describe its behavior. High-level goals can include acquiring good spatial coverage of the phenomenon, finding the maxima or high-concentration areas, and other specialized objectives.

Entropy-based coverage objective functions are used when a robot has to spatially cover an area and maximally reduce the uncertainty about the concentration in the model. Bayesian optimization objective functions are used when the maximal values of a spatial field are desired, or when it is desirable to spend more time exploring higher-concentration areas.
Quantile estimation objective functions, introduced in section 5.1, are similar to Bayesian optimization objective functions but are used when particular quantile values are desired, instead of simply the maximal values.

These objective functions take a sensed location x_i and produce a scalar reward r_i = f(x_i), determined by the utility of sensing a value at that location. Given a set of points produced by the sensing function X_j = ψ(g_{j−1}, g_j), the objective function is denoted by f(X_j) = Σ_{x_i ∈ X_j} f(x_i).

2.3.2.1 Submodular Objectives

Often, the objective in IPP is to generate good coverage of a distributed phenomenon using an information-theoretic objective such as entropy [73] or mutual information [60]. Entropy is a common objective function for active sensing when only good spatial coverage of the environment is desired [73, 60, 34]. It provides a good baseline, as it is often used when the specific values of the underlying concentration are unknown. Entropy is defined in eq. (2.3):

f_en(x) = (1/2) log(2πeσ²(x))    (2.3)

Entropy exhibits submodularity characteristics [60]. Formally, a function F is submodular if ∀A ⊂ B ⊂ V and ∀s ∈ V \ B, we have that F({s} ∪ A) − F(A) ≥ F({s} ∪ B) − F(B) [78]. This naturally describes the diminishing returns exhibited in many active sensing problems, where a measurement provides more information the fewer samples have been taken before it.

2.3.2.2 Bayesian Optimization Objectives

Bayesian optimization is a popular method for finding the maxima of a black-box function. IPP and Bayesian optimization share a common theoretical framework, especially when the maximal values of a spatial field are desired [87, 135, 8]. Sequential Bayesian optimization objectives are used to locate extreme values or areas of high concentration, and have shown success in doing so in simulation and in real field scenarios [87, 8, 128]. These specialized objective functions are typically tuned to locate a single point, often the extremum.
The simplest Bayesian optimization objective used in practice is the upper confidence bound (UCB). UCB contains a term β which allows a user-defined exploration-exploitation trade-off [88]. The UCB objective is defined by eq. (2.4):

f_ucb(x) = µ(x) + βσ²(x)    (2.4)

Two popular objective functions are Expected Improvement and Probability of Improvement, which provide various improvements over UCB for Bayesian optimization; namely, they do not require tuning of the β parameter and tend to trade off exploration and exploitation more automatically, though in practice they do have hyper-parameters. Both objectives use the notion of improvement, defined as I = µ(x_i) − max(µ(X^#)). This improvement is used to calculate a Z-score, defined as Z = I / σ²(x_i).

Probability of Improvement is an objective function which seeks areas with a high probability of yielding a measured value higher than the largest previously measured value. Probability of Improvement is defined as

f_pi(x) = Φ(Z)    (2.5)

where Φ is the CDF of the normal distribution.

Expected Improvement differs from Probability of Improvement by incorporating how much the new measured value is expected to improve over the previously seen maximal value, rather than just the likelihood of improvement. It has been used in IPP as an objective function when the maxima of a spatial field are desired [109, 86]:

f_ei(x) = IΦ(Z) + σ(x)ϕ(Z)    (2.6)

where Φ and ϕ are the CDF and PDF of the normal distribution.

2.3.3 Informative Path Planning as a POMDP

Partially observable Markov decision processes (POMDPs) are a broad class of problems in which an agent must make decisions when some portion of the state of the world can only be estimated, rather than directly observed [135]. Formulating the informative path planning problem as a POMDP provides a framework for planning to take measurements, and allows the use of generic solvers.
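The objective functions of eqs. (2.3) to (2.6) can be sketched for a single candidate location as below, following the text's conventions (including Z = I/σ²(x_i)); the function names and the stand-in `y_best` for max(µ(X^#)) are ours:

```python
import math
from statistics import NormalDist

def f_entropy(sigma):
    # eq. (2.3): differential entropy of the Gaussian posterior at x
    return 0.5 * math.log(2 * math.pi * math.e * sigma**2)

def f_ucb(mu, sigma, beta=1.0):
    # eq. (2.4): upper confidence bound; beta sets exploration vs. exploitation
    return mu + beta * sigma**2

def f_pi(mu, sigma, y_best):
    # eq. (2.5): probability of improving on the best mean seen so far
    z = (mu - y_best) / sigma**2
    return NormalDist().cdf(z)

def f_ei(mu, sigma, y_best):
    # eq. (2.6): expected improvement over the best mean seen so far
    imp = mu - y_best
    z = imp / sigma**2
    return imp * NormalDist().cdf(z) + sigma * NormalDist().pdf(z)
```

For a point whose posterior mean equals the incumbent best (I = 0), f_pi returns 0.5 and f_ei is driven entirely by the σϕ(Z) term, matching the intuition that improvement there is uncertain but possible.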
POMDPs are a framework that can determine optimal actions when the environment is not fully observable or there is uncertainty in the environment. Formulating IPP as a partially observable Markov decision process (POMDP) has been explored [135]; previous works have used this formulation to find maxima [87] and to adapt the parameters of a rollout-based POMDP solver online to improve its efficiency [38]. We use these works to formulate and solve the informative path planning problem.

Table 2.1: Informative Path Planning as a POMDP. After [135, 38].

POMDP        | Informative Path Planning
States       | Robot position g_t, underlying unknown function GT
Actions      | Neighboring search points
Observations | Robot position g_t, measured location(s) X_t = o(g_t), measured value(s) Y_t = GT(X_t)
Belief       | GP conditioned on previously measured locations (X_{0:t−1}) and values (Y_{0:t−1})
Rewards      | f(X_t)

Using observations as measurements from the environment and a model to represent the belief distribution, informative path planning can be formulated as a Bayesian search game. The state of the environment (the ground truth GT) is the hidden state, and sensor measurements y_t = GT(x_t) are the observations [135]. To adapt the Bayesian search game formulation to IPP, the belief state is augmented with the state of the robot g_t, and the actions the planner is allowed to take are restricted to local movements which are feasible for the robot [87]. A complete description of the POMDP for IPP can be seen in Table 2.1. The reward is defined by the objective function f(x_t), with future rewards discounted by γ_d.

2.3.4 Partially Observable Monte Carlo Planning (POMCP)

POMDPs have been used for adaptive sampling and informative path planning in many situations [80, 75, 87, 62]. Many of these use a traditional dynamic programming solution to solve the underlying POMDP. This is infeasible for the large state spaces or complex belief representations typically present in many common informative path planning problems.
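The components of Table 2.1 can be sketched as a simple generative model. The grid world, 4-connected action set, and names below are illustrative assumptions, not the dissertation's implementation:

```python
from dataclasses import dataclass

# Illustrative generative model for the IPP POMDP of Table 2.1. The hidden
# state contains the unknown ground-truth field GT; the agent only ever sees
# its own position and the values it measures.

@dataclass
class IPPState:
    g: tuple    # robot position g_t
    gt: dict    # hidden ground-truth field GT: position -> value

def actions(state):
    # Actions: neighboring search points (4-connected moves that stay on GT)
    r, c = state.g
    moves = [(r + dr, c + dc) for dr, dc in ((0, 1), (0, -1), (1, 0), (-1, 0))]
    return [m for m in moves if m in state.gt]

def step(state, action, f):
    # Transition: move to the chosen point; observe the measured location and
    # value y_t = GT(x_t); the reward is the objective f at that location.
    next_state = IPPState(g=action, gt=state.gt)
    observation = (action, state.gt[action])
    reward = f(action, state.gt[action])
    return next_state, observation, reward

# Tiny example: a 2x2 field, with reward equal to the measured value.
gt = {(0, 0): 0.1, (0, 1): 0.7, (1, 0): 0.2, (1, 1): 0.9}
s = IPPState(g=(0, 0), gt=gt)
s, obs, r = step(s, (0, 1), lambda x, y: y)
print(obs, r)
```

A solver never touches `gt` directly; it works with the belief (the GP or factor graph of section 2.2) and with simulated observations drawn from it.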
Recently, attention has focused on solvers which are locally accurate, using probabilistic rollout updates [87]. Work in applying these POMDP solvers to informative path planning has progressed in parallel with work to advance the solvers themselves. A state-of-the-art algorithm for solving large POMDPs online is the Partially Observable Monte-Carlo Planning solver (POMCP) [124]. POMCP uses a Monte Carlo tree search (MCTS) which propagates reward information from simulated rollouts in the environment. At every iteration in the planner, the agent performs a search through the generated tree T to choose actions, using Upper-Confidence Tree (UCT) [77] exploration for partially observable environments, until it reaches a leaf node L of T.

Figure 2.3: Rollout-Based POMDP Solver for active sensing. The POMCP-based solvers maintain both action and observation nodes. Reward estimates are calculated from action nodes by performing rollouts, i.e., randomly taking actions until a horizon length h.

From node L, the planner performs a rollout using a pre-defined policy (usually a random policy) until the agent reaches the planning horizon. The reward the agent collects while simulating this trajectory is then backpropagated up through the visited states. Finally, the first node the agent visited from the rollout is added as a child of L, and the tree is expanded. Once the specified number of iterations (rollouts) is completed, the tree is considered adequate for planning. The action from the root node that gives the highest expected reward is then chosen, and the agent executes the action in the environment. At each observation node of T, the observation is estimated with µ_x, where x is the agent's position at that node. To update the belief b, the state-observation pair is integrated into the model. This process, seen in fig.
2.3, allows the robot to plan for the information gathered by collective sets of measurements, rather than each measurement individually.

2.4 Graph Neural Networks

Graph neural networks allow neural networks to be learned on graph structures [150]. Graph structures arise in two ways in these works. Firstly, we can treat a collection of locations as a graph by encoding the local spatial structure into the edges of a k-nearest neighbors graph, as we will see in section 3.2. Secondly, we can approximate factor graph solves to speed up inference in factor graphs, as we will see in section 4.2. In particular, we use a Gaussian mixture model graph convolution, which generalizes convolutional neural networks to graph-based domains using mixture models [93]. This model learns an embedding of the nodes in a graph by passing the node embeddings as messages and integrating the edge features.

2.5 Continuous Non-Convex Gradient-Free Optimization

Continuous non-convex gradient-free optimization is used to solve general black-box optimization problems where the objective is non-convex and the variables are continuous.

2.5.1 Cross Entropy

The cross-entropy (CE) method is a popular approach which works by maintaining an estimate of the distribution of good solutions, and iteratively updating the distribution parameters based on the quality of the solutions sampled at each step, where quality is determined via some function of the solution configuration [31]. More precisely, a prior set Pr and a posterior set Po of samples are maintained. Po contains n configurations sampled from N(µ, σ), where the mean µ and standard deviation σ are computed from Pr, the best η% of configurations from Po. This continues iteratively, minimizing the cross-entropy between the maintained and target distributions [31].

2.5.2 Simulated Annealing

Simulated annealing (SA) [76] is an iterative algorithm inspired by statistical mechanics.
SA optimizes an energy function, similar to a loss function in other optimization schemes, as configurations with lower energy are preferred. SA begins with a temperature T = T_max. At each iteration, T is decreased exponentially, and the configuration is slightly perturbed at random. The perturbed state is either accepted or rejected probabilistically based on its energy and T.

2.5.3 Bayesian Optimization

Bayesian optimization (BO) is a method for selecting parameters for difficult optimization problems such as hyperparameter selection [126]. BO is similar to informative path planning and sequential Bayesian optimization [87] in that these methods build a model using a Gaussian process and select points which maximally improve this model. In pure BO, the optimizer is allowed to select any point within a defined boundary, as opposed to informative path planning and sequential Bayesian optimization, where the robot can only select points it can locally travel to. BO uses acquisition functions, such as expected improvement, and iteratively selects the best point to sample [72, 107].

Chapter 3: Automatic and Efficient Deployment of Active Sensing Robots

In this chapter we discuss problems which arise when deploying active sensing robots in the real world, and strategies to alleviate them. Despite the apparent benefits of active sensing, deployment of active sensing robots has not seen much real-world use beyond a few academic trials [47, 91]. There are many reasons for this limited real-world use, such as difficulty with parameter setting, inefficiency of planning on real robots, and safety.

Parameters of the planner and belief model are often difficult to set, especially for non-roboticists, who use these systems to solve real-world measurement problems. These parameters are largely non-physical and are difficult to interpret without robotics knowledge. Nevertheless, they are extremely important and greatly affect the performance of the overall system.
Inefficiency of the planner on real, computationally constrained robots is a large barrier to real-world use. Many times these systems are developed in an academic lab with large computational resources, and only their ability to outperform a baseline is studied. It is important, however, that these methods be flexible in their computational requirements, and that their effectiveness not diminish greatly on resource-constrained robots. In the worst case, a robot will stop to re-plan and, only once the plan is computed, continue to move. This is almost always sub-optimal in active sensing, as it is generally more useful to gather measurements on a sub-optimal plan than to gather fewer measurements more optimally.

Finally, the largest barrier to deployment of these systems is safety. Safety of robotic systems is often discussed in reference to their ability to damage humans or valuable property. While this is a large consideration in the deployment of these robots, as they are often expensive and adaptive algorithms tend to be more bug-prone than simpler ones, the bigger barrier is mission safety. Mission safety is the likelihood that a mission will fail due to some problem in the adaptive sensing routine. As we have said previously, adaptive sensing tends to make better use of the mission time than lawnmower patterns, but if there are problems with the adaptive sensing behavior, or incorrect parameters, an entire mission can be forfeited.

For these three reasons and more, adaptive sensing has not been deployed much in real-world scientific studies. In section 3.1, we will investigate alleviating these problems using hand-tuned heuristics which promise easier parameter setting and reduced computational burden, thus increasing the mission safety of the active sensing robot.
Following that, in section 3.2 we will autonomously set the parameters for the mission planner, allowing automatic deployments of these robots without manual parameter specification.

3.1 Active Sensing using POMDPs with Domain-Specific Considerations*

*Originally appeared as [114]

Adaptive sensing can make use of domain-specific information when it is available. For example, researchers in the area of algal bloom monitoring in aquatic ecosystems find areas of high chlorophyll concentration more valuable to study. Such use-cases naturally lend themselves to the integration of Bayesian optimization in adaptive sampling [30]. Solving adaptive sensing problems exactly is known to be NP-hard [78]. However, these problems can be solved using exact solvers [7], sampling-based planners [68], or Monte Carlo tree search (MCTS)-based solvers which sample random trajectories from the final reward distribution [87, 6].

Figure 3.1: Overview of Approach. Fig. (a) shows an agent's trajectories on a hyperspectral orthomosaic collected in Clearlake, California. Blue is low value, red is high value. The baseline trajectory overlaps with itself (wasted time), goes outside the bounds of the orthomosaic (no reward outside the workspace), and mostly samples near the starting position. The trajectory from our proposed method avoids such behavior and samples regions further away. Fig. (c) shows POMDP planning with the POMCP planner. Portions in grey are areas we study and improve in this work.

Here we focus on using MCTS-based solvers to effectively sample complex environments. These iterative solvers generally use rollouts to sample a reward value for a given state by following trajectories from that state.
The process of sampling from a final reward distribution, using a random trajectory from a state at a leaf of the planning tree, is called a rollout. Rollouts are used in local planners, such as POMCP [124], to sample discounted rewards over trajectories from an unknown reward distribution. In adaptive sampling, this reward distribution is defined by some objective function over samples. Typically, rollouts are used to build an estimate of the mean reward for an action. By performing more rollouts, the planner improves its estimate of the expected reward for a particular sequence of actions.

Often, planning for adaptive sampling is done online. A fixed number of environment steps (typically one) is enacted after planning for a fixed number of iterations. This process is repeated until the finite budget (e.g., path length or energy [68]) is exhausted. Here, we verify that this process of committing to a fixed number of steps and rollouts at each invocation of the planner can be modified to reduce the total number of rollouts needed over the entire episode. We find that, in information gathering problems, there is a period when the information gathered is sufficient to predict the field accurately enough to make more useful plans. The intuition behind our result is that this period should be allocated more rollouts than the period when less information is known, or when gathering more samples does not result in as much reward. Additionally, more environment steps can be enacted from a single POMCP planning tree, because the reward for future actions can be accurately predicted. We cast the adaptive sampling problem as a Partially Observable Markov Decision Process (POMDP), which is solved using a local solver that updates the expected rewards for action sequences by sampling some reward distribution in an iterative fashion using rollouts.
Specifically, we investigate minimizing the number of rollouts performed to achieve comparable accumulated reward by altering three parameters: the number of rollouts to perform, the choice of which actions to take in the planning tree during a planning iteration, and how many steps of the planned trajectory to follow. Fig. 3.1a shows a sample trajectory of a drone on a lake dataset, produced by our method and by the baseline POMCP solver.

3.1.1 Background

3.1.1.1 Multi-Armed Bandits (MAB)

MAB are a family of problems in which K actions are available and each action has an unknown associated reward distribution D_k. At each time step, the agent chooses an action k ∈ K and receives a reward drawn from D_k. The goal of the agent is to maximize the cumulative reward over time, or to minimize risk, defined as the difference between the agent's cumulative reward and that of some optimal policy. There is a natural exploration-exploitation trade-off in the MAB problem, because at each step the agent receives more information about the distribution of one of the actions by sampling its reward [11]. This framework provides the mechanism for action selection in a variety of rollout-based algorithms, including POMCP [124], and is used when each rollout can be viewed as a draw from the reward distribution conditioned on the currently selected actions.

In contrast to optimal action selection algorithms, there is a family of algorithms which seek to identify the best action in terms of mean reward. These algorithms work in two settings: fixed-budget and fixed-confidence. In the fixed-budget setting, the agent finds the best action from a fixed number of samples. In the fixed-confidence setting, the agent finds the best arm having P[risk > ϵ] < δ in the fewest number of samples [51].

3.1.2 Formulation and Approach

Most online informative path planners use the same planning duration at all points in the planning sequence.
We propose to modify the POMCP [124] planner, an online, rollout-based POMDP planner, using knowledge about the underlying problem. Our method selects how many rollout iterations to use at each environment interaction and which actions should be tried in the planner. Using this tree, it also adaptively selects how much of the tree can be incorporated into the executed plan, based on the rewards received during these rollouts. We show that we can produce similar models of the environment in fewer overall iterations of the (expensive) rollout sequence. An overview of how our improvements fit into the planning and action pipeline with POMCP is shown in fig. 3.1c.

3.1.2.1 Rollout Allocation

The first improvement we propose is to alter how the overall rollout budget is handled. Typically, the total rollout budget is divided evenly, and a fixed number of rollouts is allocated each time the planner is called to compute a new partial trajectory. This results in a linear increase in the number of rollouts used as the number of environment steps increases. We propose that this rollout allocation method should take advantage of three key ideas: cold-starting, submodularity, and starvation. The idea of cold-starting is well studied in adaptive sampling [73] and captures the notion that the planner cannot make useful decisions with little information to plan on. Planning with little information is futile, since far away points
Additionally, because this information is used in planning later on, the importance of good early information gathering is compounded. There is, of course, a trade-off to allocating rollouts early on: plans which allocate too many rollouts early on could suffer from the problem of starvation of rollouts at later stages. This can cause the planner to make poor decisions when there is rich information to plan on and the areas of interest are well-known and understood. This rollout allocation trade-off in planning is an instance of the exploration-exploitation trade-off which is common in information gathering tasks. 3.1.2.2 ExplorationAlgorithm MCTS-based planners, such as POMCP, treat each action at each layer in the tree as an MAB problem. In the usual MCTS algorithm, the objective is to optimally explore possible trajectories by choosing an action at each layer according to some optimal action selection criterion [124]. We propose using optimal arm identification algorithms instead of optimal exploration algorithms because the final goal of the POMCP is to choose and execute the action with the highest mean reward, not to maximize the cumulative reward during searching. Shifting from the cumulative-reward setting to the fixed-budget setting allows the exploration algo- rithm to decide the exploration-exploitation trade-off based on the number of allocated rollouts at each planning step. When many rollouts are allowed, the algorithm can be more explorative and consider each 28 action for longer, while with fewer rollouts the algorithm typically becomes more exploitative. A fixed- budget action selection algorithm can only be used for the first action in the tree as these actions are the only ones for which the total number of combined rollouts is fixed. In this work, we investigate three exploration algorithms. The first, Upper Confidence Tree (UCT), is an optimal exploration algorithm and is the default choice for most MCTS-based solvers, including POMCP [124]. 
UCT provides a trade-off between exploration and exploitation by adding an exploration bonus to each action based on the number of times the parent and child have been explored. UCT does not take a budget into account, but instead tries to maximize the sum of the reward samples during planning. Because of this, UCT may tend to be highly explorative [4]. The remaining two algorithms are fixed-budget algorithms that explicitly incorporate the number of rollouts allotted and attempt to maximize the probability that the chosen action is, in fact, the best action. The first, UGapEb, uses an upper bound on the simple regret of the actions to choose the action which is most likely to become the best one to exploit [51]. This algorithm incorporates the total budget into the confidence interval to make the most efficient use of the provided budget. It contains a (difficult to estimate) parameter H_ϵ, which requires the gap between actions to be known ahead of time. Both UCT and UGapEb additionally depend on a (difficult to estimate) parameter b, which is multiplied by the bound to create the exploration-exploitation trade-off. In this work, we use the difference between the highest and lowest discounted rewards ever seen, but this may not be the optimal bound, and it requires the agent to know these values before exploring the workspace. The final exploration algorithm is Successive Rejects. This algorithm is unique in that it does not use a confidence bound like UCT and UGapEb. Instead, it chooses each action a number of times in successive rounds, eliminating an action in each round until a single action remains [4]. This algorithm is preferable in some situations because it is parameter-free, while UCT and UGapEb have hard-to-tune parameters. However, it may waste some rollouts when an action is obviously inferior.

3.1.2.3 Plan Commitment

Each call to the POMCP planner produces a planning tree of estimated rewards for action sequences.
Typically, the agent executes the first action with the highest expected reward and replans with the part of the tree it went down. Since an adaptive sampling agent is approximating the underlying function using an initial belief, the generative model is different after each call to the POMCP planner. This is because every time the agent takes an action in the environment and receives new samples, the initial belief for the underlying POMDP changes. Hence, the tree must be discarded and completely replanned after incorporating the observation from the environment into the initial belief state. We propose to take more than one action from the tree when there is certainty at lower levels about the optimality of the action.

In order for the agent's performance to be unaffected by taking further actions, there are two considerations it must make. The first is the quality of the estimate of the reward for an action. If rollouts are spread evenly, the number of rollouts performed for lower actions (deeper in the tree) will be exponentially smaller than for higher actions (closer to the root), causing the estimate of their reward to be poor. The second is the quality of the estimate of the observation at locations further away. The initial belief greatly affects the quality of observation estimates. Trajectories further from what the agent has seen will generally have worse estimates of what their observations will likely be. With these two considerations, the agent may be able to extract more than one plan step from the tree without significant deterioration in the accumulated reward.

The simplest method is to take a fixed number of actions from the MCTS and execute them. This does not take into account any consideration of the quality of the reward estimates for actions. With this method, the agent may have an inadequate understanding of the underlying world state at some point, but still take actions from the tree.
If the agent accounts for the statistics of the reward samples for actions, more complex methods can be used. We use a method based on UGapEc, a fixed-confidence MAB scheme [51]. UGapEc determines if there is an action that has a mean reward higher than all other actions with probability 1 − δ. This algorithm can be used to verify that a fixed-confidence threshold is met, by checking whether the algorithm would not choose to explore further.

Another method, which uses a statistical test similar to UGapEc, is a two-tailed Welch's t-test [139]. This test assumes the distributions the samples come from are Gaussian, but does not assume the standard deviations are known or equal. Since the error in the estimate of the standard deviation is quadratic in the number of samples, our estimate of the standard deviation deteriorates much faster than our estimate of the mean. Because of this, a more complex test must be used than a simple Gaussian confidence interval test, which may underestimate the sample standard deviation [61]. The unequal-variances two-tailed t-test tests the null hypothesis that the reward distributions for two actions have identical expected values. This method returns an estimate of the probability (p-value) that the means are equal. A threshold is set, and if the p-value of the null hypothesis is below the threshold, the action is considered safe to take. This method is statistically robust. It causes the action not to be chosen in two cases: first, when there are not enough samples of each action, and second, when the means are too close to distinguish with the number of samples gathered. This method uses a Student's t-distribution [57] and calculates the t statistic and the v value with eq. (3.1), which can be used to compute the p-value with a Student's t-distribution.
t = \frac{\bar{\mu}_1 - \bar{\mu}_2}{\sqrt{\bar{\sigma}_1^2/N_1 + \bar{\sigma}_2^2/N_2}}, \qquad v \approx \frac{\left(\bar{\sigma}_1^2/N_1 + \bar{\sigma}_2^2/N_2\right)^2}{\frac{\bar{\sigma}_1^4}{N_1^2(N_1-1)} + \frac{\bar{\sigma}_2^4}{N_2^2(N_2-1)}} \quad (3.1)

where \bar{\mu}_i is the sample mean reward for distribution i, \bar{\sigma}_i is the sample standard deviation for distribution i, and N_i is the sample size for distribution i. We compare the reward distributions of the top two actions, those with the highest expected reward, to determine the p-value. We ignore other actions because the asymmetric nature of an MCTS tree causes the worst actions to have very few rollouts.

Figure 3.2: Results from a grid search over possible rollout curves. fig. 3.2a presents the curves that were searched over. fig. 3.2b shows the three most optimal curves and a linear allocation curve, colored by their mean accumulated reward at the end of the episode.

3.1.3 Experiments

We assess the performance of our improvements in three environments. The first environment is a test function for testing the effectiveness of sequential Bayesian optimization using POMCP [87]. We use a dynamic (time-varying) two-dimensional function as the underlying ground truth for testing our individual improvements. It corresponds to a Gaussian curve circling a fixed point twelve times:

g(x,y,t) = e^{-\left(\frac{x - 2 - 1.5\sin(24\pi t)}{0.7}\right)^2} e^{-\left(\frac{y - 2 - 1.5\cos(24\pi t)}{0.7}\right)^2} \quad (3.2)

where x ∈ [0,5], y ∈ [0,5], t ∈ [0,1]. In this environment the agent starts at the bottom-center of the time box and progresses toward the top, choosing actions in the x-y plane. In the other two environments, called Validation Environment 1 (fig. 3.5a) and Validation Environment 2 (fig. 3.5d), we use chlorophyll concentration data collected from a YSI Ecomapper robot as input data.
These datasets are 186m by 210m by 15m deep. We interpolate this with a Gaussian process to create the underlying function to estimate. In these scenarios the robot travels 3m between each sample and can freely travel in any direction at any point. The robot starts at the center of the environment at zero depth.

Figure 3.3: Comparison of Exploration Algorithms (a, b, and c) and Plan Commitment Algorithms (d and e). UCT, Fixed is the baseline which evenly splits the rollouts at each step and uses the UCT exploration algorithm (the default for MCTS). Other results use a curved rollout allocation. For plan commitment, fig. 3.3d shows the reward accumulation and fig. 3.3e shows the number of rollouts used in all POMCP calls for the whole episode. A small offset value is added to methods which overlap.

For all environments, the agent is allowed to take 200 environment steps and is assumed to have a point motion model that can move to neighboring locations.
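The dynamic test function of eq. (3.2) is a one-liner to transcribe; a direct Python rendering:

```python
import math

def g(x, y, t):
    """Dynamic test function of eq. (3.2): a Gaussian bump whose center
    circles the point (2, 2) at radius 1.5, completing twelve revolutions
    as t runs from 0 to 1."""
    cx = 2 + 1.5 * math.sin(24 * math.pi * t)
    cy = 2 + 1.5 * math.cos(24 * math.pi * t)
    return math.exp(-((x - cx) / 0.7) ** 2) * math.exp(-((y - cy) / 0.7) ** 2)
```

At t = 0 the peak sits at (2, 3.5), where g evaluates to 1, and the value decays rapidly away from the moving center.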
We use the objective function µ_x + cσ_x with c = 10 for the dynamic function and c = 100 for the validation environments. All experiments are run with five seeds each.

3.1.3.1 Grid Search for Rollout Allocation

To find the proper form of the rollout allocation, and to test the assertion that different parts of the POMDP planning process need different numbers of rollouts, we perform a grid search over different curves that describe the rollout allocation. For each curve, if it would allocate fewer than one rollout per action, we allow the planner to perform a single rollout per action. We parameterize these curves by cumulative beta distributions because of their flexibility in representing many different kinds of curves. These curves are parametrized by α and β parameters which determine the exact form of the curve. We search over α = [0.75, 1, 2, 3, 4, 5, 6] and β = [0.75, 1, 2, 3, 4, 5, 6]. These curves can be seen in fig. 3.2a. The results of this experiment are shown in fig. 3.2b, which indicate that an exponential increase in the rollout allocation is desirable and a very flat curve is undesirable. We find the best curve for the dynamic function to be α = 6, β = 1. We empirically find that this curve does worse on the validation environments, possibly due to overfitting, and that a curve with α = 4, β = 4 works best; we use the latter for tests involving a curved rollout allocation on those environments.

3.1.3.2 Comparison of Exploration Algorithms

We test the effectiveness of alternative exploration algorithms to UCT and the interaction between the rollout allocation method and the exploration algorithm. We test the three exploration algorithms described in section 3.1.2.2 (UGapEb, UCT, and Successive Rejects) on three environments. In fig. 3.3a all beta-curve-based methods outperform the fixed method and all allocators work almost equally well, with UCT having a slight performance boost. In fig.
3.3b, UGapEb and Successive Rejects with curved rollout allocation perform approximately equally but outperform UCT with both a fixed and a curved rollout allocation. In fig. 3.3c all three curved allocators are outperformed by a fixed rollout allocation. This is likely because the rollout curve used for this environment was not chosen by grid search and is therefore poorly suited to it. UGapEb outperforms all curved allocators by a significant margin.

3.1.3.3 Comparison of Plan Commitment Algorithms

We test the methods described in section 3.1.2.3 for determining how many steps to take once the MCTS tree is generated. We test the unequal-variances t-test and UGapEc methods with different parameters against a baseline method, which takes only the first action, across five seeds. fig. 3.3d and fig. 3.3e show the comparison of all these combinations against the baseline. UGapEc and the baseline largely overlap because UGapEc cannot confidently predict whether the chosen action is the best action with so few rollouts, and a larger epsilon does not make sense for the scale of the rewards. We believe that UGapEc may be of use for complex environments where the agent cannot make strong assumptions about the underlying reward distributions and many more rollouts will be required for the POMCP algorithm. The unequal-variances t-test performs best among the options. Within the t-test parameters, the p-value of 0.1 requires slightly fewer rollouts than a p-value of 0.05 for similar reward. However, choosing 0.1 implies riskier behavior, which can have a negative effect in complex environments and real-world datasets, such as our validation environments. Hence, we choose the unequal-variances t-test with p = 0.05 as our best choice for plan commitment. fig. 3.3d shows the accumulated reward for a trajectory execution for the baseline and our choice. In fig. 3.3e it is clear that the algorithms use vastly different numbers of rollouts for obtaining this result.
Hence, we see that the plan commitment t-test algorithm helps to significantly reduce the rollouts needed to solve the adaptive sampling problem.

3.1.3.4 Comparison of Baseline with Combined Improvements

fig. 3.4 shows the combined effect of all improvements from the preceding experiments: using a curved rollout allocation, using the UGapEb exploration algorithm, and using the t-test plan commitment algorithm. We compare against a baseline which uses an equal number of rollouts at each step, uses the UCT exploration algorithm, and takes only one action before replanning. We compare our method and this baseline for each environment. fig. 3.4a and fig. 3.4b show that the combined features achieve a much higher reward in fewer rollouts on the dynamic environment. fig. 3.4c and fig. 3.4d show that again the agent receives a higher reward in many fewer rollouts than the baseline method. fig. 3.4e and fig. 3.4f indicate that our method is comparable to the baseline in terms of reward but achieves this reward in fewer rollouts.
Figure 3.4: Comparison of the combined proposed improvements (Proposed Method) against the baseline for all environments. fig. 3.4a and fig. 3.4b are the reward and number of rollouts used by the agent in the dynamic function environment. fig. 3.4c and fig. 3.4d are the reward and number of rollouts used by the agent in Validation Environment 1, and fig. 3.4e and fig. 3.4f are the reward and number of rollouts used by the agent in Validation Environment 2.

3.1.3.5 Comparison of Time Savings to Baseline

Because our proposed method reduces the number of rollouts, the time to compute a plan is reduced. Additionally, rollouts from later environment steps are cheaper to compute because the remaining budget is lower. This causes our method to finish the episode faster because it allocates more rollouts to later environment steps. These two effects combine to produce a saving in wall-clock time for the entire episode, as shown in table 3.1. Experiments were run on a server with 2 Intel® Xeon Gold processors and 256GB RAM.
Method            Dynamic Function   Validation Environment 1   Validation Environment 2
Baseline          2061.84            2687.73                    3542.72
Proposed Method   1371.77            2497.36                    3120.90

Table 3.1: Time Comparison. Wall-clock time (seconds) required to complete five episodes.

Figure 3.5: Datasets and Sample Trajectories. Figure 3.5a shows a dataset collected with an underwater robot, and fig. 3.5b and fig. 3.5c show example trajectories from a baseline implementation and our proposed implementation, respectively. Figure 3.5d shows another, more complex, dataset collected in the same location; fig. 3.5e and fig. 3.5f show example trajectories from a baseline implementation and our proposed implementation, respectively.

3.1.4 Conclusion

We present improvements for online adaptive sampling with a Monte Carlo-based POMDP solver which use specific knowledge of the adaptive sampling problem structure. This allows the agent to estimate a non-parametric function by taking samples of the underlying phenomenon, such as the concentration of chlorophyll in a body of water. First, we show that by changing the number of rollouts allocated to more heavily favor later stages of planning, a better overall model of the environment can be created. We believe this is because later stages have more information to plan on and are therefore able to develop better and longer plans. We show that searching for an optimal curve can lead to large performance increases and that reasonably chosen curves can also increase performance.
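As an illustration of such a curve, a Beta-CDF rollout allocation of the kind searched over in section 3.1.3.1 can be sketched as follows (numeric midpoint integration stands in for a library incomplete-beta routine, and the per-step allocation scheme is our illustrative reading, not the exact implementation):

```python
import math

def beta_cdf(x, a, b, n=20000):
    """CDF of Beta(a, b) at x in (0, 1], via midpoint integration of the
    beta density (a sketch; use a library incomplete-beta in practice)."""
    if x <= 0.0:
        return 0.0
    norm = math.gamma(a) * math.gamma(b) / math.gamma(a + b)  # B(a, b)
    h = x / n
    acc = sum(((i + 0.5) * h) ** (a - 1) * (1 - (i + 0.5) * h) ** (b - 1)
              for i in range(n))
    return acc * h / norm

def rollout_allocation(total_rollouts, n_steps, a, b, min_per_step=1):
    """Split a rollout budget across planning steps along a Beta(a, b)
    cumulative curve, with at least min_per_step rollouts per step."""
    alloc, prev = [], 0.0
    for i in range(1, n_steps + 1):
        cum = beta_cdf(i / n_steps, a, b)
        alloc.append(max(min_per_step, round(total_rollouts * (cum - prev))))
        prev = cum
    return alloc
```

With α = 6, β = 1 (the curve found best for the dynamic function) the cumulative curve is x⁶, so early steps receive the minimum allocation and the final steps receive most of the budget.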
Second, we show that the agent's total reward can increase by changing the action exploration algorithm to one that explicitly incorporates knowledge of the number of rollouts allocated for each planning step. This works with the rollout allocation to improve selection when few rollouts are allocated. We also show that by modifying the number of steps the agent takes from a planning tree, the overall planning can be made more efficient. We show that a statistical test can be used to determine whether an action can be confidently declared the best action. With this test we are able to reduce the number of rollouts needed to reach a comparable accumulated reward. Finally, we show that these improvements are synergistic and, when used together, can greatly improve planning over a fixed-step, optimal-exploration, fixed-rollout-allocation planner.

3.2 Learned Parameter Selection for Robotic Information Gathering †

The dominant paradigm in field deployments is for robots to execute pre-planned paths, such as lawnmower trajectories [111]. On the other hand, adaptive sensing approaches [68, 116] allow for greater flexibility and efficiency [104]. However, such approaches are rarely used in practice because they typically require significant parameter tuning and their behavior is difficult to predict and understand - formidable barriers to adoption by non-technical users. We focus on the first, parameter tuning, for a broad class of adaptive sampling techniques collectively called informative path planning (IPP) [68], in which a robot creates a plan to take sequential measurements to build an approximate map of some continuous process of interest by maximizing an objective function. The map is used to guide planning for further measurement or scientific study. Active sensing based systems have seen limited field use [47, 90] but their uptake is hampered by the need to specify complicated parameters.
These parameters are abstract, difficult to explain to non-technical users, and can greatly affect system performance. In the worst case, mis-specifying them significantly reduces performance [90, 116] compared to a pre-planned path - a serious problem since a single missed day of data collection at sea can cost tens of thousands of USD on research vessels [95]. Conversely, correctly specifying them can more than double information-gathering performance in some scenarios [116].

† Originally appeared as [39]

Figure 3.6: Left: Overall workflow. Environment is the robot's sensing space (e.g. a lake) and Environment Model is its current understanding of the environment (modeled here by a Gaussian process). We introduce a learned Parameter Selection agent that selects parameters (e.g. number of rollouts) of Action Selection, which determines the actions the robot should take (a POMDP solver in our work). The process repeats until a fixed number of informative path planning iterations are completed. Right: Field Trial. We demonstrate that our approach transfers to a previously unseen real robot setup using a light sensor mounted on a mobile robot.

A typical formulation for the IPP problem is a Partially Observable Markov Decision Process (POMDP) whose (exact) solution is the (optimal) action for each possible belief over the world states (Action Selection in fig. 3.6). As the robot executes these actions in its surroundings (the Environment), it samples the phenomenon of interest and builds a representation (Environment Model) which becomes an input to the Action Selection process. We show that learning-based parameter selection results in a 9.53% mean improvement in cumulative reward across varied environments when compared to end-to-end learning-based approaches for
We also find a 3.82% mean improvement in the cumulative reward when compared to a baseline, while expertly chosen parameters only achieve a0.3% increase in mean cumulative reward when compared to the same baseline. A field experiment illustrates the superior performance of our system out of the box. 3.2.1 Background We frame the IPP problem as a POMDP and the problem of parameter selection as a Markov Decision Process (MDP) which we solve by training a policy network through reinforcement learning. We discuss related work in IPP and the associated objectives to be maximized through information gathering. Follow- ing this, we briefly discuss reinforcement learning as a solution for complex MDPs and how reinforcement learning and robotic information gathering have been previously combined and compare to our proposed system. 3.2.1.1 ReinforcementLearning Policy gradient methods are a class of reinforcement learning methods where the robot policy is repre- sented as a distribution over actions. Proximal Policy Optimization (PPO) is a popular policy gradient method used to solve robotics tasks, and has shown success in many continuous control tasks [120]. In this work, we compare against two end-to-end RL systems to demonstrate the need for combining classical IPP planning techniques with a learning system. Long short-term memory (LSTM) [67, 131] is a neural network architecture that has feedback connections and a hidden state, which enables it to process sequential data. Graph Neural Networks (GNNs) [93, 118] are a family of neural networks that can process data represented by graphical data structures. These networks can readily be applied to IPP because the LSTM cells can capture the temporal relationship whereas the GNNs can capture the spatial relation between the data samples. 40 3.2.1.2 CombiningInformativePathPlanningandReinforcementLearning Planning and optimization have been combined with RL for robotics applications [143, 141]. 
Specifically, informative path planning has been augmented with RL in different ways. In previous work, a learned system has been used to select between different adaptive sampling policies [23]. Recurrent policies have also been applied directly to IPP - the policy is trained end to end such that the policy network selects the best action at each step on a graph [138]. RL on graphs has shown great promise in information gathering tasks. In particular, agents have been trained to automatically predict exploratory actions by encoding a SLAM pose graph as the input to a graph neural network [18]. RL has also been combined with rollout-based solvers for IPP by learning to improve the policy used for tree exploration and rollouts, by learning to predict the value function [112].

3.2.2 Formulation and Approach

We approach the problem of finding an optimal path in two steps, namely planner parameter selection and action planning. At each IPP iteration the Parameter Selection agent chooses parameters to configure the planner, which then solves the underlying IPP POMDP. The Parameter Selection agent is trained using RL to allow it to perform long-horizon planning over the course of a trajectory.

3.2.2.1 Objective Functions

We compare three classes of objective functions, introduced in section 2.3.2, which are used for IPP to demonstrate the performance of our Parameter Selection agent.

3.2.2.2 POMDP Solver

To solve this POMDP we use the rollout-based solver POMCP [124] - a Monte Carlo tree search-based algorithm that continually improves the estimate of the value of each action by using rollouts. The rollouts simulate the robot taking actions and receiving observations to a fixed horizon h and update the value function [124]. We adopt the t-test heuristic to allow the robot to confidently take multiple steps from a single POMCP plan [116].
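The interaction between parameter selection, the POMDP solver, and the environment model (the loop in fig. 3.6) can be sketched as follows; all interfaces here are hypothetical placeholders for illustration, not the actual implementation:

```python
def run_ipp_episode(env, gp_model, param_agent, pomcp_solver, n_iterations):
    """One IPP episode: a learned agent picks solver parameters, the solver
    picks robot actions, and new measurements update the environment model."""
    state = env.reset()
    for _ in range(n_iterations):
        # 1. Parameter Selection: map samples/metadata to solver parameters
        params = param_agent.select(state, gp_model)  # e.g. rollouts, gamma, h
        # 2. Action Selection: solve the IPP POMDP with those parameters
        actions = pomcp_solver.plan(gp_model, **params)  # may return >1 step
        # 3. Execute and fold new measurements into the environment model
        for action in actions:
            location, value = env.step(action)
            gp_model.add_sample(location, value)
            state = env.observe()
    return gp_model
```

The key design point is that the learned component never chooses robot actions directly; it only reconfigures the solver that does.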
Because the observations GT(x) for unseen locations are not known during planning, the predicted value from the GP conditioned on previous observations is used [87].

Parameter               Range          Meaning
POMDP Rollouts          [10, 300]      Number of rollouts to use from the POMDP root node
POMDP Solver γ          [10^-1, 0.99]  Planner discount for future rewards
T-Test Value            [10^-3, 0.4]   Determines when the planner can take multiple steps from a single plan [116]
Max Planning Depth h    [3, 15]        Number of generator calls from the root node before a rollout is terminated

Table 3.2: Parameters for the POMDP solver which are chosen by the parameter selection agent. These parameters critically impact the effectiveness of the POMDP solver and are difficult to set in the field for non-technical users.

3.2.2.3 Parameter Selection Agent

Rollout-based POMDP solvers are effective in computing long-horizon, non-myopic plans for IPP [87]. They have a suite of parameters that can be modified to greatly improve the reward gained by the planner, thereby drastically impacting its performance [116]. These parameters can be set statically but do not correspond to physical quantities, so they are difficult to set even by experts. Previous work has shown that these parameters can be more effectively set if they are changed at each iteration of the planner [116]. We focus on the parameters in table 3.2 because of their difficulty to set by a human operator and their importance to the quality of the solution. We define the problem of setting the parameters for a POMDP solver as an MDP with a stochastic reward obtained by executing the POMDP solver's policy in the environment. The Parameter Selection agent's actions at time step t are the parameters θ sent to the POMDP solver, and can be represented as θ = a_t ∼ π_ϕ(·|s_t) with state (robot measurements and locations) s_t ∈ S and action (POMDP solver parameters)
The POMDP solver uses parametersθ to plan and execute an action for the robot. On execution in the training environment, the Parameter Selection agent receives a rewardr t = R(s t ,a t ) = f(x t ) from the environment. To solve this MDP, we employ proximal policy optimization (PPO), a widely adopted policy-gradient RL algorithm, which trains a network to directly output the selected parameters from the given state [120]. PPO trains a policy network which can be used later for task inference. A problem that arises in using RL for IPP is the lack of a clear fixed length vector input for the policy network as the agent needs to consider a variable length history of samples. Typically, in IPP the agent uses information from all gathered samples to determine the value of the next action. In the POMDP solver, this is is done by using a GP which allows the solver to reason about the sensed value at a location and the uncertainty associated with the sensed value at that location. We propose two approaches for the policy network. LSTM One solution is to use a recurrent neural network, such as a long short-term memory (LSTM), as the policy network. This architecture allows information from previous samples to be propagated forward to the current time step using the internal recurrent state. However, this method has a drawback for IPP as its structure does not capture the spatial relationship between points naturally, but instead captures a sequential relationship. We employ a network consisting of two 256 neuron layers followed by a 256 neuron LSTM layer. The LSTM network only receives the most recent sample and the metadata because it can preserve information about the previous samples in its recurrent hidden state. GNN A natural approach to encode spatial information between collected samples is to use a graph neural network (GNN). GNNs use the spatial relationships between samples by constructing a nearest neighbor graph [21]. 
This allows the policy network to use a non-fixed input size and to incorporate information about the spatial relationship of samples into its decision-making process. The GNN consists of two inputs: the graph side and the metadata side. The graph input is constructed by adding edges from each node representing a sample to the 6 nearest samples, and adding the local Cartesian coordinates to the edge features. The graph side contains three Gaussian mixture model layers with a kernel size of 3 [93]. This layer type was chosen because it captures the spatial relationship between samples as edge features. The graph layers are followed by a global additive layer which sums all of the node embeddings and is followed by a single 192-neuron dense layer. The metadata side consists of a single 92-neuron dense layer. These layers are concatenated together and processed by a 256-neuron dense output layer.

Ablations We perform two ablations of our method. Both are provided access to the metadata of the current episode (the remaining number of rollouts, remaining fraction of rollouts, remaining number of IPP iterations, remaining fraction of IPP iterations, and a one-hot encoded vector of the selected objective function). One is additionally provided the last 10 samples as a fixed-length history.

We train each model on all objective functions simultaneously. To train the Parameter Selection agent, we collect 50 environment steps on 32 parallel workers in the training environment for each agent update step. We update the agent 250 times, which allowed convergence for all agents. For all other parameters, such as learning rate and optimizer, we use the default values for the implementation of PPO from Ray 1.6 [94], as training appeared to be robust using these values. For performance on diverse objective functions and environments, we normalize the reward by the mean and variance (µ_obj, σ_obj) of the rewards for each objective function.
We also clip this reward to [−3, 3] and add a small survival bonus (b_survival = 1). To discourage the agent from superfluously using generator calls, we multiply the number of generator calls used by a small penalty (p_gen = 10^-5) designed to encourage it to use as few calls as possible. The resulting reward function is

R(r_t) = \mathrm{clip}\!\left(\frac{r_t - \mu_{\mathrm{obj}}}{\sigma_{\mathrm{obj}}},\ -3,\ 3\right) + b_{\mathrm{survival}} - p_{\mathrm{gen}}\, GC(t) \quad (3.3)

where GC(t) is the number of generator calls used in the POMDP solver at time t and r_t is the environment reward r_t = f(x_t). The environment reward is defined by computing the objective function using the environment value for the sample, not the Gaussian process belief µ(x_i), which is used during planning for the POMDP solver.

3.2.2.4 Data Acquisition

Learning-based approaches for field robotics have suffered from a lack of training data for the phenomena being sampled by the robot. We gather a large corpus (1080 trajectories) of publicly available data from ocean surveys performed by ocean glider robots tasked with measuring conductivity, salinity, and temperature. We use this dataset to train the Parameter Selection agent and show that, by training on this corpus, our approach generalizes to other environments and phenomena (e.g. chlorophyll sampling in small-scale lakes) on which the models are not trained. We assimilated data from three publicly available sources: the Southeast Coastal Ocean Observing Regional Association (SECOORA) [127], the Central and Northern California Ocean Observing System (CeNCOOS) [15], and the Integrated Ocean Observing System's National Glider Data Assembly Center (NGDAC) [137]. Each of these organizations collects data in a variety of locations (e.g. the Pacific and Atlantic oceans, the Gulf of Mexico, etc.). We collect and assemble this data using NOAA's ERDDAP system [96].
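The shaped training reward of eq. (3.3) above is direct to transcribe; a sketch using the stated constants as defaults (GC(t) would be reported by the POMDP solver):

```python
def shaped_reward(r_t, mu_obj, sigma_obj, gc_t,
                  b_survival=1.0, p_gen=1e-5):
    """Shaped reward of eq. (3.3): normalize by the objective's statistics,
    clip to [-3, 3], add a survival bonus, subtract a generator-call penalty."""
    normalized = (r_t - mu_obj) / sigma_obj
    clipped = max(-3.0, min(3.0, normalized))
    return clipped + b_survival - p_gen * gc_t
```

The clipping keeps outlier rewards from any one objective function from dominating the gradient, while the small p_gen term nudges the agent toward parameter settings that use fewer generator calls.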
We gather trajectories collected between January 1st, 2005 and July 25th, 2020 and manually visualize each trajectory, discarding those which were too small or discontinuous. This results in a corpus of 1080 trajectories which include the latitude, longitude, depth, and time values as well as the sensed values at those locations. We train all the models used in this paper on this dataset of robot trajectories. During training and evaluation, we subsample each trajectory to be the same spatial scale.

Method (Class)                             Sample Input      Uses POMDP Solver   Learned Model Chooses Actions   Trained on Glider Data   Has Metadata
Naive POMCP (Classical)                    N/A               True                False                           False                    False
Empirically Tuned POMCP [116] (Classical)  N/A               True                False                           False                    True
GNN RL (End-to-End RL)                     k-NN Graph        False               True                            True                     False
LSTM RL [138] (End-to-End RL)              Previous Sample   False               True                            True                     False
Metadata RL-POMCP (Ablation)               None              True                False                           True                     True
Fixed-Length RL-POMCP (Ablation)           Last 10 Samples   True                False                           True                     True
GNN RL-POMCP (Proposed)                    k-NN Graph        True                False                           True                     True
LSTM RL-POMCP (Proposed)                   Previous Sample   True                False                           True                     True

Table 3.3: Description of Compared Methods. Naive POMCP is a POMDP solver using manually chosen conservative parameters and the t-test heuristic [116]. Empirically Tuned POMCP improves upon Naive POMCP with domain-specific features and tuned parameters [116]. Metadata RL-POMCP has access to the current task metadata. Fixed-Length RL-POMCP has access to the metadata as well as the last ten samples (locations and values). Metadata RL-POMCP and Fixed-Length RL-POMCP demonstrate the need for the parameter selection module to use the complete set of samples. LSTM methods have access to the previous sample and use an LSTM with a hidden state to access information from previous environment steps. GNN methods use all of the current samples in a graph neural network.

We also randomly select a sensor
type (one of conductivity, salinity, and temperature) and objective function (section 2.3.2) for each sub-trajectory.

3.2.3 Experiments

To validate our approach we perform experiments in three simulated environments and a field scenario. Via the simulation experiments, we show that our approach can be trained on a large corpus of data from one environment and perform well on environments not used for training. In the field scenario, we show that our system is easy to deploy and achieves superior results with no in-field parameter tuning. In all experiments, the models are trained on the dataset described in section 3.2.2.4. We do not retrain the models for each environment.

3.2.3.1 Simulation Experiments

In the simulated environments, we compare cumulative reward after the robot takes 50 environment steps. We compare the three objective functions defined in section 2.3.2. We perform the simulation experiments in three types of environments. In each environment, we simulate a continuous environment by interpolating the datasets. We populate the GP with a small number of random samples before sampling.

Ocean-CTD is derived from the dataset used to train the models (section 3.2.2.4), a large corpus of publicly available glider robot trajectories. A subset of the data is held out for testing and used to validate the performance of the algorithms on data similar to what the models are trained on. We test on 30 different trajectories for each of the three objective functions.

Lake-Chlorophyll is a dataset of lake-scale surveys from a small underwater robot. This environment differs from the Ocean-CTD environment in two ways. First, the distance between robot states is reduced from 25m to 3m. Second, it uses chlorophyll fluorescence as the sensed quantity, which is not a sensor type the models are trained on. We use five different datasets collected in a small reservoir with three seeds each.
This data was collected during tests of other systems using an underwater robot equipped with a chlorophyll fluorescence sensor, between 2017 and 2018, using lawnmower surveys of the lake while oscillating between a fixed depth and the surface. This environment is used to validate that the models can generalize to environments they are not explicitly trained for. In tests with the Ocean-CTD and Lake-Chlorophyll environments, the robot (a simulated underwater vehicle) is able to move in 6 directions (up, down, left, right, forward, and back) and collects samples while traveling between locations.

Bay-Camera is an environment in which the robot is simulated as a fixed-height, fixed-yaw drone over hyper-spectral orthomosaics collected from previous drone flights over a small bay in a lake. The simulated robot chooses from four actions in a plane (forward, back, left, and right) and has a hyper-spectral camera which, unlike a chlorophyll fluorescence or salinity sensor, does not measure a single scalar at each
The first, Metadata RL-POMCP, only has access to the task metadata, which consists of the remaining number of rollouts, remaining fraction of rollouts, remaining number of IPP iterations, remaining fraction of IPP iterations, and a one-hot encoded vector of the selected objective function. The second, Fixed-Length RL-POMCP, has access to the metadata and the previous ten samples (locations and values). The third, GNN RL-POMCP, uses all measured samples (as input to a GNN, to indicate spatial correlation) as well as the metadata. The fourth, LSTM RL-POMCP, uses the current sample and internal recurrent state, as well as the metadata.

Metadata RL-POMCP and Fixed-Length RL-POMCP represent ablated versions of our system and demonstrate the need for considering the collected samples in parameter selection. Metadata RL-POMCP is a minimal encoding which allows for a minimal adaptive behavior. Fixed-Length RL-POMCP allows parameter selection to consider a finite history of samples, in contrast to LSTM RL-POMCP and GNN RL-POMCP, which allow for consideration of all samples.

We compare these variations to two methods which do not use learning to select the POMDP parameters (Classical methods), and two methods which do not use a POMDP solver but directly learn to map from states to actions using reinforcement learning (End-to-End RL methods).
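The task metadata above is a small fixed-size vector, so it can be pictured as a concatenation of budget counters and a one-hot objective indicator. A minimal sketch, assuming this layout (the function name and objective labels are our own, not from the released code):

```python
import numpy as np

def encode_metadata(rollouts_left, rollouts_total,
                    ipp_iters_left, ipp_iters_total,
                    objective, objectives=("entropy", "ei", "pi")):
    """Build the metadata vector described in the text: remaining rollout
    and IPP-iteration counts, their remaining fractions, and a one-hot
    encoding of the selected objective function."""
    one_hot = np.zeros(len(objectives))
    one_hot[objectives.index(objective)] = 1.0
    return np.concatenate([
        [rollouts_left, rollouts_left / rollouts_total,
         ipp_iters_left, ipp_iters_left / ipp_iters_total],
        one_hot,
    ])

# Example: 120 of 200 rollouts and 30 of 50 IPP iterations remaining,
# expected improvement selected: 4 scalar features + 3-way one-hot.
vec = encode_metadata(120, 200, 30, 50, "ei")
```

Because the vector has a fixed length regardless of how many samples have been collected, it can be fed to the parameter-selection agent alone (Metadata RL-POMCP) or alongside sample histories (the other variants).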
The first classical method uses a manually chosen set of parameters. The second classical method uses the empirically tuned planner in [116], tuned on data from the Lake-Chlorophyll environment. The two end-to-end RL methods use the LSTM and GNN in the same way as their RL-POMCP counterparts. However, they directly choose actions, as opposed to selecting parameters for a POMDP solver which chooses the actions.

Figure 3.7: Results on Ocean-CTD from the dataset of publicly available glider trajectories which the models were trained on. We compare over 30 different trajectories and show the cumulative reward after 50 timesteps. We compare the three objective functions described in section 2.3.2. We find that our proposed methods outperform previous methods in all three objective functions.

3.2.3.2 Simulation Results

Ocean-CTD The results for the Ocean-CTD environment can be seen in fig. 3.7. On the entropy objective function, the GNN, Fixed-Length, and Metadata RL models outperform the others. For the expected improvement objective function we find that all the RL-POMCP models perform well.
Figure 3.8: Results on Lake-Chlorophyll from a dataset of smaller lake trajectories using a chlorophyll fluorescence sensor that the models were not trained on. We compare each objective function on 5 datasets with 3 seeds each. We find that GNN RL-POMCP outperforms previous methods with the entropy objective function. All RL-POMCP agents perform well on expected improvement, and on probability of improvement they perform as well as the POMCP model that was empirically tuned on the Lake-Chlorophyll dataset.

For the probability of improvement objective function, all RL-POMCP methods outperform the baselines. In all environments, the naive POMCP and empirically tuned POMCP methods are comparable, which is likely because the empirically tuned model is tuned on data from the Lake-Chlorophyll dataset.

Lake-Chlorophyll The results for the Lake-Chlorophyll environment can be seen in fig. 3.8. This dataset is purposely chosen to show the generalization capabilities of our models when used on a dataset they were not directly trained on.
We find that our LSTM RL-POMCP model performs exceptionally well on the entropy objective function, followed by the GNN RL-POMCP model. In the expected improvement objective function, we find that both the LSTM RL-POMCP and GNN RL-POMCP models again perform well, and that the naive parameters and empirically tuned system perform well. For the probability of improvement objective function, the Metadata and Fixed-Length RL-POMCP models perform the best, followed by the GNN and LSTM RL-POMCP models.

Figure 3.9: Results for Bay-Camera, a simulated drone flight environment. In this environment, a drone is simulated flying over previously collected imagery. This imagery was collected using a hyper-spectral camera over a lake with algae mats on it. We evaluate on 2 images with 3 seeds each. Features which are not water were masked out. We find that our proposed methods perform comparably to prior methods on entropy. On the Bayesian optimization objective functions our methods perform well compared to the other approaches, demonstrating the ability of our approach to generalize to previously unseen sensor types.

Bay-Camera The results for the Bay-Camera environment can be seen in fig. 3.9.
This environment contains both a different sensor (a camera) and a different sensed quantity (hyper-spectral reflection on a lake) than the dataset the models were trained on. We find that on entropy and expected improvement the methods which just use a POMDP solver perform well and the methods which use end-to-end RL perform very poorly. GNN RL-POMCP performs well on these two objectives but is outperformed by the empirically tuned system. For the probability of improvement objective function, all RL-POMCP methods perform well, except for the Fixed-Length RL-POMCP. The naive POMCP performs slightly worse than the RL-POMCP methods.

Figure 3.10: Trajectories for each parameter across all evaluation environments for the expected improvement objective function (eq. (2.6)). The four panels show the number of rollouts, the POMDP solver gamma, the t-test value, and the maximum planning depth at each step. LSTM RL-POMCP and GNN RL-POMCP learn distinct policies: GNN RL-POMCP uses many rollouts with a shorter horizon until the end, when it tends to increase the horizon and reduce the number of rollouts, while LSTM RL-POMCP varies around a mean value the entire time.

Parameter Trajectories To show that the Parameter Selection agent learns a policy which varies the chosen parameters, we plot the parameter values over time in fig. 3.10. This demonstrates that the agents do not simply choose a fixed set of parameters for all environment steps, but instead vary their parameters with the received measurements.

3.2.3.3 Field Experiment

We deploy our system on a small mobile robot (Turtlebot2) with a visible light sensor. To model a complex distribution, we deploy two 1440 lumen light sources: one omnidirectional light source, and one directed one.
The robot moves in a 15×15 grid, spaced 0.5 m apart. The robot takes 50 steps in the environment (actions and measurements). Based on simulation results, we deployed GNN RL-POMCP because of its superior performance in unknown environments with the expected improvement objective function. The trajectories the robot took using GNN RL-POMCP and Empirically Tuned POMCP can be seen in fig. 3.11, as well as the cumulative sum of measurements over three trials for each method.

Figure 3.11: Field Results. Top: Comparison of trajectories for Empirically Tuned POMCP and GNN RL-POMCP deployed on a real robot taking light measurements. The red arrows show the trajectory taken by the robot. Two lights were deployed with existing light in the area. Middle: Comparison of the sum of the measurements taken over the number of actions taken by the robot and a picture of the experimental setup. Bottom: Experimental setup used with two placed lights.

Our system is able to find areas of high light even in the presence of complex distributions and reflections against the walls. Qualitatively, we find that the system delivers on our goal of easy deployment for IPP. In the field, we had to select the area for the robot to move in, the number of measurements to take, and the objective function. This is comparable to the number and type of parameters in a lawnmower survey. All the parameters are comprehensible to a non-technical user, in contrast to the typical parameters in table 3.2.

3.2.4 Conclusion

We presented a reinforcement learning-based approach to automatically tune the parameters governing action selection for a broad class of adaptive sampling techniques collectively called informative path planning.
To train the parameter selection agent, we collected a corpus of 1080 publicly available and highly diverse field robot trajectories. We evaluated on 37 instances of 3 distinct environments, where our proposed method for learning-based parameter selection results in a 9.53% mean improvement in the cumulative reward when compared to end-to-end RL approaches and a 3.82% mean improvement in the cumulative reward when compared to a baseline method, while expertly chosen parameters only achieve a 0.3% mean improvement when compared to the same baseline. We demonstrated our approach in the field, showing that it is practical and effective for automatic deployment of IPP.

Chapter 4

Sensing with Robot State Uncertainty

In this chapter we discuss how to make use of the uncertainty in the robot’s state. Uncertainty in the robot’s state can arise from imperfect sensing of the environment by sensors such as inertial measurement units and global positioning systems, which have noise and drift over time. Typically, in active sensing these real-world problems are ignored, and the algorithm simply assumes perfect localization of the robot, or that localization of the robot is a secondary task.

In section 4.1 we discuss how to use the uncertainty estimates from heterogeneous probabilistic state estimators to increase the accuracy and properly propagate uncertainty to the sensing planner. In section 4.2 we discuss how to prioritize loop closure information for a SLAM system to produce the best possible estimate of a multi-robot team’s location and map. We use strategies and algorithms from active sensing and treat the problem as a quasi-active sensing problem in which loop closures are treated as the sensed signal, and the certainty in robot state is treated as the model. This also serves as a precursor to chapter 6, as we are building techniques to perform sampling on factor graphs.
4.1 Incorporating Localization Noise Into Active Sensing∗

∗Originally appeared as [36]

Adaptive sensing systems generally assume that the samples are taken at precise locations. In practice, this assumption is violated because there is always some uncertainty in the robot’s pose. This is especially true in the case of autonomous underwater vehicles (AUVs) when the robot does not have GPS and relies on integrating inertial measurements. To model this uncertainty, a state estimator is used to derive a belief distribution over the current pose of the robot. Here, we utilize this distribution to more accurately model the phenomenon of interest instead of using the expectation of the robot’s location as a point estimate. When the noise is heteroskedastic, the robot should pay more attention to areas of higher positional noise, since the model in these regions can be improved with additional noisy samples [105].

We are motivated by the design of aquatic sampling missions that exploit a previously generated map of a phenomenon of interest, built from overhead imagery from an unmanned aerial vehicle (UAV), as a prior for an autonomous underwater vehicle (AUV). The adaptive sensing method must take into account both the uncertainty on the prior data from the UAV as well as the uncertainty of the sampling robot while it is progressing in the mission. In this work we aim to extend prior methods for long-horizon adaptive sensing and adaptive sensing with uncertain inputs to produce a method which can perform long-term planning with localization uncertainty and is agnostic to the belief distribution over the robot state.

4.1.1 Problem Statement

Active sensing with localization uncertainty and a finite budget can be formulated as

\[ \mathcal{P}^* = \operatorname*{argmax}_{\mathcal{P} \in \Phi} \; \mathbb{E}[F(\mathcal{P})] \quad \text{s.t.} \quad c(\mathcal{P}) \le B \tag{4.1} \]

where \(\Phi\) is the space of all robot trajectories, \(B\) is some budget, \(F\) is some objective function, \(c\) is some cost function, and \(\mathcal{P}^*\) is the optimal trajectory [68].
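Over a finite set of candidate trajectories, eq. (4.1) reduces to a constrained argmax; the difficulty in practice is that the trajectory space is combinatorially large, which is what the sampling-based planner later in this section addresses. A minimal sketch of the finite-set case (function names are our own):

```python
def best_trajectory(candidates, expected_objective, cost, budget):
    """Constrained argmax of eq. (4.1) over a finite candidate set:
    maximize E[F(P)] subject to c(P) <= budget."""
    feasible = [P for P in candidates if cost(P) <= budget]
    return max(feasible, key=expected_objective) if feasible else None

# Toy example: trajectories as waypoint tuples, cost = number of waypoints.
candidates = [(0, 1), (0, 1, 2), (0, 1, 2, 3)]
best = best_trajectory(candidates, expected_objective=sum, cost=len, budget=3)
```

The monotone cost and submodular objective assumptions mentioned below are what make tree-based search over this space tractable with performance guarantees.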
In this work we maximize the agent’s expectation of the objective over the trajectory. This differs from the statement in [68], which does not use the expected objective, but the objective directly. Our work makes the same assumptions about the form of the cost function and objective function (i.e. the cost function is monotonically increasing and the objective function is submodular) as those in [68].

4.1.2 Related Work

The adaptive sampling survey [70] notes the usage of filters to remove sensor and localization noise, but does not cover methods explicitly incorporating this noise. Prior work [82] using UAV data to determine sampling locations does not explicitly incorporate either the UAV’s or the ground robot’s localization noise. A framework for Bayesian optimization when a robot’s pose is uncertain is found in [97]. The authors probabilistically bound the regret and show applications to robots, using a greedy method for objective maximization. A framework for multi-step prediction with uncertainty on the input location, using a Monte Carlo approximation, is given in [54].

An approach [68] incorporating Rapidly Exploring Random Trees (RRT) into the adaptive sampling setting shows that a modified RRT algorithm (RIG-Tree) can maximize submodular functions. It assumes that there is certainty in the pose of the robot and that the pose can be fully specified. In [105] the effectiveness of informative path planning with uncertainty on the input is shown, with the key insight that modeling localization uncertainty can improve the overall mapping quality. This is achieved by augmenting the acquisition function with an explicit term for uncertainty. The modeling assumption in [105] is that the mean position is a good description of the state when the sample is taken, which may not be true in multimodal localization scenarios or when localization is particularly bad in some areas.
Previous works have investigated using noisy samples from the environment to decrease the uncertainty in the robot pose using techniques similar to adaptive sampling [142]. Unlike our work, [142] does not try to use the characteristics of these noisy samples to approximate the underlying concentration. Our work extends the frameworks in [105, 97] by utilizing the RIG-Tree algorithm to perform longer-term planning, using a different modeling technique to incorporate noise, and allowing non-Gaussian sampling distributions.

4.1.3 Approach

4.1.3.1 Gaussian Processes With Input Uncertainty

To incorporate localization uncertainty into the environmental model, we extend the framework in [108] for querying a Gaussian process with uncertainty on the input points. The Gaussian process posterior with uncertain inputs [54] is calculated as

\[ p(y \mid D) = \int p(y \mid x)\, p(x)\, dx \]

where \(y\) is the estimated sample value and \(x\) is a sampling location drawn from \(D\) (the belief distribution over states at that time from the state estimator). When it is possible to sample from \(D\), Monte Carlo sampling can be used, resulting in

\[ p(y \mid D) \simeq \frac{1}{N} \sum_{n=1}^{N} p(y \mid x_n) \]

where the \(x_n\) are independent samples from \(D\). Approximating \(p(y \mid D)\) as normal by taking the mean and variance of the resulting samples yields \(\bar{\mu}(D)\) and \(\bar{\sigma}(D)\):

\[ \bar{\mu}(D) = \mathbb{E}_x[\mu(x)] \qquad \bar{\sigma}^2(D) = \mathbb{E}_x[\sigma^2(x)] + \operatorname{var}_x(\mu(x)) \tag{4.2} \]

We extend this formulation in two ways. We calculate the likelihood of the samples according to the Gaussian process distribution, and draw the sampled points from their distribution at every step. Critically, our formulation only depends on the ability to sample the robot localizer’s belief state at past and future locations and is agnostic to the parameterization of the belief state, which allows this method to support localizers such as pose graphs, Kalman filters and particle filters.
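The moment-matched estimate in eq. (4.2) can be sketched compactly in numpy. This is a simplified illustration, not the dissertation's implementation: a standard RBF-kernel GP stands in for the pretrained model, the sampler callables stand in for draws from the localizer's belief, and the likelihood weights used for weighting/sub-sampling are omitted for brevity:

```python
import numpy as np

def gp_predict(x_t, y_t, x_q, ell=0.3, noise=1e-2):
    """Standard GP regression with a unit-variance RBF kernel."""
    k = lambda a, b: np.exp(
        -0.5 * ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1) / ell**2)
    K = k(x_t, x_t) + noise * np.eye(len(x_t))
    Ks = k(x_q, x_t)
    mu = Ks @ np.linalg.solve(K, y_t)
    var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
    return mu, np.maximum(var, 0.0)

def mc_gp_query(sample_train, y_t, sample_query, iters=50):
    """Monte Carlo GP query under input uncertainty.
    sample_train() / sample_query() draw pose hypotheses from the belief
    distributions over past sample poses (D_t) and query poses (D_q)."""
    mus, vars_ = [], []
    for _ in range(iters):
        mu_q, var_q = gp_predict(sample_train(), y_t, sample_query())
        mus.append(mu_q); vars_.append(var_q)
    mus, vars_ = np.array(mus), np.array(vars_)
    # eq. (4.2): E_x[mu(x)], and E_x[sigma^2(x)] + var_x(mu(x))
    return mus.mean(0), vars_.mean(0) + mus.var(0)
```

The second term of the returned variance, the variance of the per-draw means, is what inflates the uncertainty in regions where localization noise is large.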
Algorithm 1: Gaussian Process With Sample Uncertainty
D_q is the belief distribution over the query points, D_t is the belief distribution over the previously collected points, l_i is the i-th likelihood (used for weighting and sub-sampling), and θ are pretrained hyper-parameters.

for i ∈ iterations do
    x_t^i ← sample(D_t)
    l_i ← P(y_t | GP(x_t^i, y_t, θ)) P(x_t^i | D_t)
    x_q^i ← sample(D_q)
    μ_q^i, σ_q^i ← GP(x_t^i, y_t, θ).query(x_q^i)
μ ← E_q[μ_q]
σ² ← E_q[σ_q²] + var_q(μ_q)

4.1.3.2 Path Planning for Adaptive Sensing with Input Uncertainty

RIG-Tree [68] is an algorithm for long-horizon adaptive sensing planning. Each node in the tree keeps track of the total objective and cost along the path to that node. The objective functions for adaptive sensing generally use μ and σ from the Gaussian process model, which can be extended to the stochastic case using algorithm 1. The objective used for adaptive sensing is conditional entropy (which has also been used in prior work as the objective for sensor placement [59]).

Given the ability to calculate the stochastic objective function, we present Uncertain Input RIG-Tree, an extension of RIG-Tree to the stochastic case using algorithm 1 to compute the expected information. Uncertain Input RIG-Tree must also extend the formulation of RIG nodes to keep track of the state of the filter at each proposed location along a trajectory. This is done in order to calculate the objective at each proposed sampling location.

4.1.4 Experiments

4.1.4.1 Simulation Experiments

To test the performance of algorithm 1 when applied to adaptive sampling, we test both the ability of the Monte Carlo approximation to approximate the underlying distribution and the ability to approximate the objective function. Testing was done in environments designed to test interpolation methods (eqs. (4.3) and (4.4)), where x, y and z are the robot’s global coordinates.
\[ f(x,y,z) = 4\left(x - 2 + 8y - 8z^2\right)^2 + (3 - 4y)^2 + 16\sqrt{z+1}\,(2z-1)^2 \tag{4.3} \]

\[ f(x,y,z) = 100\left(e^{-2/x^{1.75}} + e^{-2/y^{1.5}} e^{-2/z^{1.25}}\right)^2 \tag{4.4} \]

4.1.4.2 Comparison of Monte Carlo Methods

In order to test the predictive power of algorithm 1, we sample the environment at 1000 evenly spaced locations and add Gaussian noise. Permuted subsets of the samples are used as training data for the different methods to be tested. These training data are used to predict the ground truth, and the error between the two is compared. The data span a 3D cube with all dimensions in [0, 1]. The noisy data were generated by adding Gaussian noise with σ = 0.01. The following are tested: a noiseless GP representing the ground truth, a traditional GP representing a model on noisy data without any corrections, subsampling the data based on likelihood, and weighting the data based on likelihood. Figure 4.1 shows the outcome of this experiment, illustrating that likelihood weighting and keeping all samples produces the best estimate. When subsampling, All keeps all iterations, 100 keeps only the 100 most likely iterations, and Single keeps only the most likely iteration.

4.1.4.3 Noisy Observations

We compare our approach with RIG-Tree without noise modeling. Both methods use eq. (2.3) as the objective, but the baseline does not use algorithm 1 for computing the objective. The environments are the same as in fig. 4.1. The robot runs for a fixed budget of 3000 m in a 30×30×30 m cube, taking samples every meter. The internal model is compared to a ground truth model over 30³ evenly spaced points. Prior data are given to the robot in the form of 10 by 10 samples (modeled after pictures from a UAV) taken as noisy samples from the function, with σ = 5.
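For reference, the two benchmark environments of eqs. (4.3) and (4.4) can be written as plain functions. The exact grouping of terms is our reading of the typeset originals (in particular the product of the last two exponentials in eq. (4.4)), so treat this as a sketch:

```python
import math

def env1(x, y, z):
    """Eq. (4.3), as we read it: a smooth 3-D interpolation test function
    defined on the unit cube (z >= -1)."""
    return (4 * (x - 2 + 8 * y - 8 * z**2) ** 2
            + (3 - 4 * y) ** 2
            + 16 * math.sqrt(z + 1) * (2 * z - 1) ** 2)

def env2(x, y, z):
    """Eq. (4.4), as we read it; requires x, y, z > 0."""
    return 100 * (math.exp(-2 / x**1.75)
                  + math.exp(-2 / y**1.5) * math.exp(-2 / z**1.25)) ** 2
```

Sampling either function on a regular grid and adding zero-mean Gaussian noise reproduces the setup of the comparison above.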
For the noise characteristics, we use a simplified noise model. The noise characteristics for the underwater vehicle are varied linearly from σ = 1 at minimum depth to σ = 3 at maximum depth in all spatial directions.

Figure 4.1: Error when predicting from noisy data using algorithm 1. Environment 1 is sampling eq. (4.3), Environment 2 is eq. (4.4). All MC methods were run for 1000 iterations. Six seeds each.

Figure 4.2 illustrates the results of this experiment. Using Monte Carlo sampling reduces the variance between seeds and, in the first environment, lowers the average error. In environment two, the average error is comparable, but the variance between runs is smaller.

4.1.4.4 Experiments with Data Collected from a Real Environment

In order to evaluate the usability of our algorithm on real-world data, we evaluate the error of a robot taking samples from orthomosaics created by flying a drone over water. These orthomosaics were created from data taken at Clear Lake, California, using structure-from-motion software. The UAV used to collect these images carries a hyperspectral imaging camera with a 710 nm band. We use the reflectivity of this band as the desired concentration to be modeled.

Figure 4.2: Error when predicting from noisy data using algorithm 1. Environment 1 is sampling eq. (4.3), Environment 2 is eq. (4.4). The robot is simulated for a fixed budget and compared against a noiseless, evenly spaced ground truth. Each method runs with three seeds.
4.1.4.5 Pose Graph-Based Localization

To model the covariance of the agent at different locations, we take the GPS/IMU readings from the UAV and construct and solve a pose graph using GTSAM [46]. This models the error the robot would have if it were to plan using the raw imagery provided by the UAV. The robot is run for 2000 m, re-planning every 20 m. We compare with a baseline method which simply integrates the noisy samples and attempts to predict from those, using the RIG-Tree planner as in the last experiment. This is compared against two other methods. The first method does not use algorithm 1 in calculating the variance for the objective (eq. (2.3)), but does use Monte Carlo estimation for computing the estimate of the underlying concentration used to compute the error. The final method uses algorithm 1 for calculating both the objective function and the estimate of the underlying concentration.

The results of the experiment are seen in fig. 4.4. The full method outperforms both the baseline method and the method which does not plan with the added variance from the uncertainty in the robot’s pose. Both improvements work together to provide a boost in accuracy, from both a better estimate of the underlying value and better planning that takes more samples from higher-noise areas. In environment two the baseline method on average performs better but has a very high variance in the final error, while the Monte Carlo methods tend to have a low variance in the final error. In environment four the robot cannot make a better model by taking noisy samples because there is too much noise on the samples to accurately reconstruct without doing a Monte Carlo estimation.

Figure 4.3: Environments created from data gathered at Clear Lake, California by a UAV with a hyperspectral camera (fig. 4.3d). Figure 4.3e shows an algal mass in true color while fig. 4.3f shows the hyperspectral reflectivity at the target band.
4.1.4.6 Particle Filter-Based Localization

To test the ability of the proposed method to work with non-parametric posterior distributions from state estimators, we simulate a surface vehicle which takes noisy range measurements from 10 landmarks and localizes itself using a particle filter. The robot is run for 2000 m, re-planning every 20 m. To update the localizer, the robot receives noisy measurements to 10 randomly placed landmarks with σ = 3. The filter is initialized with 100 particles normally distributed around the robot’s true starting location. The baseline method uses the weighted mean of the particle filter as the sampling location, while the Monte Carlo methods sample the distribution of particles according to their weights to sample from D. To simulate the filter when planning, the particles and weights are added to each node. When extending samples, the particle filter is simulated forward with the parent’s particles and weights in order to calculate the objective for the sampling location.

Figure 4.4: Error between noisy samples and ground truth on the four environments represented in section 4.1.4.4. Noise is generated from a pose graph-based localization using the UAV’s GPS/IMU. 10 seeds used.

As seen in fig. 4.5, the Monte Carlo based methods outperform the baseline method with a non-Gaussian posterior. In environment four, the Monte Carlo run is outperformed by the baseline method. This environment has the lowest data range of all the environments and may not benefit as much from using a Monte Carlo estimation. Both Monte Carlo methods do suffer from high variance, which could potentially be reduced with more iterations of the model estimator.
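Sampling D from a particle filter, as done here, amounts to drawing particles in proportion to their weights; the baseline's point estimate is the weighted mean. A minimal sketch of both (function names are our own):

```python
import numpy as np

def sample_from_particles(particles, weights, n, rng=None):
    """Draw n pose hypotheses from the particle filter belief over the
    robot's pose, with probability proportional to particle weight.
    particles: (P, d) array of poses; weights: (P,) nonnegative values."""
    rng = rng if rng is not None else np.random.default_rng()
    w = np.asarray(weights, dtype=float)
    idx = rng.choice(len(particles), size=n, p=w / w.sum())
    return particles[idx]

def weighted_mean(particles, weights):
    """The baseline's point estimate: the weighted mean of the particles."""
    w = np.asarray(weights, dtype=float)
    return (w[:, None] * particles).sum(0) / w.sum()
```

Feeding `sample_from_particles` to a Monte Carlo GP query preserves multimodality in the belief that the weighted mean collapses away.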
4.1.5 Conclusion

We propose a novel way to incorporate localization noise into adaptive sampling. When there is sufficient and varying noise on the state observations, the robot’s state estimator will not be accurate. This is common in field scenarios where the pose observations contain noise. We find that frameworks which use the state estimator’s maximum likelihood estimate of the robot’s pose may fail to properly model the true underlying field being sampled.

Figure 4.5: Error between noisy samples and ground truth on the four environments represented in section 4.1.4.4. A particle filter with noisy measurements to 10 landmarks is used. 3 seeds run.

By doing Monte Carlo estimation from the belief distribution over robot states generated by the localizer, we are able to better estimate the concentration of an external field from sparse samples compared to previous methods. We show that the field estimation may be poor if the uncertainty in the robot’s state estimator is not taken into account. Our framework is a novel approach for adaptive sampling under different types of robot pose uncertainty, such as those generated by a state-of-the-art pose estimation framework. We show that doing a Monte Carlo estimate of both the objective function during planning and the field estimation improves the planning and model used in adaptive sampling. This modification of an existing adaptive sampling algorithm improves performance when modeling from noisy pose observations and reduces the variance in the error between the robot’s model and the ground truth. We validate our findings on synthetic functions, showing that the plan and model can be improved by our method.
We also validate our findings on four real-world datasets from aerial imagery over a lake, using complex state estimators with real-world dynamics.

4.2 Loop Closure Prioritization for Efficient and Scalable Multi-Robot SLAM†

Simultaneous Localization and Mapping (SLAM) has become an essential piece in the perception stack of most robots and autonomous vehicles. SLAM systems typically rely on two sources of information: odometry measurements (which describe the motion of the robot) and loop closures (which occur when a robot revisits an already seen location). Loop closures are especially relevant with long operation times or large environments, since they are necessary in order to correct odometry drift when a robot revisits a previously seen part of the map. In the multi-robot scenario, loop closures also serve the purpose of aligning the maps from individual robots to create a consistent, drift-free map. This is especially important in the context of the DARPA Subterranean Challenge, where robots must find hidden artifacts in a large underground environment [27].

For a multi-robot system operating in a large-scale environment, it is typically impractical to process every detected loop closure in the allotted mission time, since the number of possible candidates grows rapidly with time and the number of robots. For example, in the Tunnel dataset (see Section 4.2.4.1) we have around 150,000 candidate loop closures. As our system can process these at around 0.13 seconds per loop closure, it would take up to four and a half hours to process every loop closure, compared to a mission time of one hour. Additionally, not all loop closures provide improvement in the overall SLAM solution; in fact, many loop closures, such as those in parallel hallways, are false positives, and can cause the overall SLAM solution to degrade if they are added.
To address this challenge, we propose a loop closure system that prioritizes certain loop closure candidates, so as to ensure that loop closures that are likely to be effective are computed first. The system consists of three modules that each address the problem of choosing loop closures in a different way. We develop this system in a centralized multi-robot SLAM framework [42, 16].

†Originally appeared as [35]

Figure 4.6: Demonstration of the benefit of our loop closure prioritization system. We show the accuracy of SLAM maps generated with (b) and without (a) loop closure prioritization. Data is from four robots driving over six km in the Kentucky Underground Mine. We are only able to compute 0.5% of the candidate loop closures. Picture taken within the mine to demonstrate composition.

The first module approaches the prioritization problem by inspecting the underlying pose graph and predicting which loop closures will result in the greatest reduction in pose uncertainty. The second module prioritizes loop closures by assessing the observability of the associated sensor data. In this work we focus on lidar, where more observable point clouds have less ambiguity in geometry. Higher observability reduces the chance of false loop closures and increases the chance of accurate alignment. The third module uses artificial, deployed beacons to provide high-priority loop closures. Here, we focus on deployed radio beacons that transmit a unique ID to provide unambiguous place detection (like [49]), but use the Received Signal Strength Indicator (RSSI) for approximate proximity detection.
This is either based on iterative techniques to align dense point clouds, in the case of Iterative Closest Point (ICP) [122], or on the detection and matching of local geometric structures, in the case of feature-based registration methods [43, 40]. Importantly, point cloud registration is an expensive operation and cannot be performed for every possible candidate in the case of large-scale, multi-robot SLAM. In this work, we prioritize loop closures before performing registration to maximize the accuracy improvements for the computation expended.

4.2.1.2 Loop Closure Prioritization Systems

The system we present in this paper performs loop closure prioritization to support large-scale, multi-robot SLAM. At the time of publication we know of no other directly comparable system; however, there is existing work in loop closure selection [144, 83, 19]. These systems perform an outlier rejection function by selecting the most consistent inliers from a set of loop closures. To do this selection, however, these systems require the loop closure transforms to be computed first, a computationally expensive step. In contrast, our work selects and prioritizes loop closures before computation, to increase robustness and accuracy in a computationally efficient way.

4.2.1.3 Active Loop Closure

Another area similar to our system is active SLAM, where an agent is guided to create a loop closure that helps the overall SLAM solution [13, 12]. Our work is methodologically similar to these works because we estimate the utility of candidate loop closures, but differs in that we are passively observing candidate
loop closures. We also differ in our goal: active SLAM typically guides robot navigation, whereas our approach seeks to maximize the efficiency of computation.

Figure 4.7: Diagram of the SLAM multi-robot front-end and loop closure processing pipeline. Modules in blue are proposed in this work. Robot maps are merged on a base station and passed to the generation module, which performs proximity-based analysis to generate candidate loop closures. These candidate loop closures are sorted by the prioritization modules, which feed into a queue. The loop closure solver computes the transformations between pairs of lidar scans, which are given to the SLAM back-end, which adds a loop closure factor.

4.2.1.4 Graph Neural Networks and SLAM

Graph neural networks (GNNs) have been used with SLAM, factor graphs, and other probabilistic models before, in a similar way to what we propose [145, 133, 117]. The closest to our work is the application of GNNs for selecting actions which increase the localization certainty in target coverage [151]. Our work differs by selecting loop closure candidates to perform point cloud registration on, rather than selecting actions, as in [151].

4.2.2 Approach

In this section, we first contextualize the multi-robot SLAM front-end in LAMP [42, 16], the system in which our approach was tested. Then, we provide a description of the various steps and modules in the multi-robot SLAM front-end. Finally, we highlight the prioritization modules, which are the main contribution of this work.

4.2.2.1 SLAM System Overview

LAMP consists of three main components: the single-robot front-end, the multi-robot front-end, and the multi-robot back-end. The single-robot front-end runs on all the robots and processes raw input sensor data to send pose graphs and keyed scans to the base station. The nodes in the pose graph represent estimated poses placed every 2 m of traversal, and the edges represent the relative odometric poses between nodes.
A keyed lidar scan is attached to each node in the pose graph and is the point cloud collected at the corresponding time. The multi-robot front-end runs at the base station and consists of the loop closure detection system, which collects the graph nodes and keyed scans to produce loop closures. The multi-robot back-end then collects the graphs produced by the single-robot front-ends and the loop closures into a pose graph. The back-end solves a pose graph optimization problem using Graduated Non-Convexity (GNC) [144] for additional robustness to erroneous loop closures. We then stitch the keyed scans together using the optimized poses to create the final LAMP map. For more detail see [42, 16]. The system does not assume that the robots are always in communication with the multi-robot base station. If a robot loses connection with the base station, it accumulates incremental pose graphs and keyed scans until communication is regained. The system is initialized by aligning all robots to a common fixed frame using a specially designed calibration gate. The full multi-robot SLAM front-end is shown in Fig. 4.7 and consists of three main steps: loop closure generation, loop closure prioritization, and loop closure computation. All the SLAM front-end modules run in parallel and operate in a queued message-passing system.

4.2.2.2 Loop Closure Generation

While there are numerous ways to generate potential loop closures, such as place recognition [14, 119, 25] or junction detection [43], we use proximity-based generation in our experiments.
This approach generates candidates from nodes that lie within a certain distance d from the most recent node in the pose graph. The distance d is adaptive: for the single-robot case, d = α|n_current − n_candidate|, which depends on the relative traversal between two nodes; for the multi-robot case, d = α·n_current, which depends on the absolute traversal. Here n_current and n_candidate are the indices of the current and candidate nodes, respectively, and α is a constant (0.2 m). Loop closures are generated when new keyed scans are sent from a robot to the base station. If the robots are not in communication range, the prioritization and computation nodes continue to work on the backlog of loop closures.

4.2.2.3 Loop Closure Prioritization

To select the best loop closures to process during mission execution, we propose three independent modules that order the loop closures into batches for processing. The Graph Information Prioritization module, which we discuss in Section 4.2.3, uses a graph neural network to determine which loop closure candidates would decrease the error the most if they were added to the graph. The Observability Prioritization module, which we discuss in Section 4.2.3.3, prioritizes loop closure candidates with point clouds that have many geometric features and are particularly suitable for ICP-based alignment. The RSSI Prioritization module, which we do not discuss here, attempts to find loop closures which are close to a known radio beacon.

A simple round-robin queue combines the loop closure candidates suggested by the prioritization modules, taking candidates from each module equally. Once these loop closure candidates are ordered, they are stored until the loop closure computation module is done computing the transforms of the previous loop closures.
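The round-robin combination of module outputs can be sketched as follows. This is a minimal illustration, not the actual implementation; the module names and the deduplication step are our own assumptions:

```python
from collections import deque

def round_robin_merge(module_queues, batch_size):
    """Interleave candidates from each prioritization module equally.

    module_queues: dict mapping module name -> list of candidates,
    ordered best-first by that module. Returns one merged batch.
    """
    queues = [deque(c) for c in module_queues.values()]
    merged, seen = [], set()
    while len(merged) < batch_size and any(queues):
        for q in queues:
            if q and len(merged) < batch_size:
                cand = q.popleft()
                if cand not in seen:  # a candidate may be ranked by several modules
                    seen.add(cand)
                    merged.append(cand)
    return merged

# e.g. merging three hypothetical module rankings into one batch of four:
batch = round_robin_merge(
    {"graph_info": ["a", "b", "c"], "observability": ["b", "d"], "rssi": ["e"]}, 4)
print(batch)  # ['a', 'b', 'e', 'd']
```

Taking one candidate per module per round keeps any single module from monopolizing the computation budget.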
Each time the loop closure computation node is done processing the previous set, the queue sends another fixed-size set of loop closures for computation. This allows the generation and computation rates to be independent, and the loop closure computation module to be saturated as often as possible.

4.2.2.4 Loop Closure Computation

The loop closure computation module performs a two-stage process to compute the loop closure transform: we first use SAmple Consensus Initial Alignment (SAC-IA) [113], a feature-based registration method, to find an initial transformation, and then perform Generalized Iterative Closest Point (GICP) [122] to refine it. We discard loop closures whose mean error exceeds a maximum threshold after SAC-IA (32 m in our experiments) or after GICP (0.9 m in our experiments). The remaining loop closures are then sent to the back-end, and the computation module requests more loop closures from both the queuing and the Graph Information Prioritization modules. To compute the loop closures in an efficient and scalable manner, we use an adaptable-size thread pool to perform computation in parallel across the current consolidated loop closure candidates. The most computationally expensive step in this system is loop closure computation. The prioritization approach is designed to achieve maximum benefit from the computation expended in loop closure computation by only processing the candidates most likely to succeed (Observability, RSSI) and to improve localization (Graph Information).

4.2.3 Graph Information Prioritization

The graph information prioritization module seeks to find sets of loop closures that reduce the uncertainty in the poses in the underlying pose graph.
The graph information module selects a set of B loop closures each time the loop closure computation module is idle, selecting the next set of B loop closures while the loop closure computation module processes the current set. The graph information prioritization module continues generating sets of loop closures until the mission ends. We propose a system which analyzes the desirability of a loop closure by learning to estimate the reduction in uncertainty attained by adding a set of loop closures. Formally, we seek

l* = \argmin_{l \subset L} e(PG \cup l) \quad \text{s.t.} \quad |l| = B

where PG is a pose graph, L is the set of all current candidate loop closures, B is a fixed number of loop closures per batch, and e is the objective function: the trace of the covariance of the pose estimates after pose graph optimization. Directly optimizing this equation is difficult for two reasons. First, computing e(PG ∪ l) directly requires solving the pose graph with each hypothetical set of loop closures, a large computational cost for each evaluation of the objective function. Second, there are numerous possible combinations l ⊂ L, more precisely O(L^B).

4.2.3.1 Computing the Objective Function

Computing e(PG ∪ l) requires solving the underlying pose graph, which is prohibitively expensive for optimization, as this objective must be queried many times. To compute e(PG ∪ l) efficiently, we train a graph neural network (GNN) which learns to predict the covariance trace after the pose graph is solved. We call this estimate of the objective ẽ(PG ∪ l). The graph neural network is quicker to evaluate than solving the underlying pose graph, allowing us to compute the objective function online many times during optimization. We found that in the Tunnel dataset (see Section 4.2.4.1) the graph neural network takes on average 0.8 seconds to evaluate, while the pose graph optimizer takes an average of 15 seconds.
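To get a sense of the combinatorial term above, consider the batch size used later in this section (B = 250 selected from |L| = 500 candidates); even counting the subsets shows that exhaustive search is hopeless:

```python
import math

# Number of ways to choose a batch of B loop closures from L candidates.
L, B = 500, 250  # sizes from the Urban example later in this section
subsets = math.comb(L, B)

# The count has ~150 decimal digits, far beyond anything enumerable,
# which is why a stochastic anytime search is used instead.
print(len(str(subsets)))
```

This is why the thesis pairs a fast learned surrogate for the objective with a stochastic search over batches, rather than evaluating candidates exhaustively.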
The input to the graph neural network is constructed by adding a GNN node for each pose in the pose graph. The features of these nodes are the translation of the pose as well as the translational covariance of each pose, in the world frame. The GNN edges are constructed by taking the factors between poses and labeling each edge with a one-hot encoding of its factor type, such as loop closure or odometry. The network consists of three layers of graph convolutions followed by a global additive layer and four dense 64-neuron-wide layers interleaved with batch normalization layers. In this work, we use the Gaussian Mixture Model layer with four kernels for the GNN layers [93]. This layer type is chosen for its ability to handle edge features, such as the type of factor. To train the graph neural network we compile a dataset of pose graphs solved with various loop closures from previous simulated and real datasets. The target output is the trace of the covariances of the pose graph after it is solved by the back-end, equivalent to e(PG ∪ l). By using a graph neural network, we are able to compute e(PG ∪ l) much more efficiently, albeit with less accuracy.

4.2.3.2 Minimizing the Objective Function

We aim to minimize e(PG ∪ l) by selecting a set of loop closures l. This set of |l| = B loop closures is chosen while the loop closure computation module aligns the previous set of loop closures. The number of loop closures in each set, B, is chosen to trade off the time it takes to optimize the set of loop closures against the delay in incorporating new candidate loop closures into L, the set of all current candidate loop closures being optimized over. To minimize the objective function we modify the simulated annealing optimization algorithm [5]. This algorithm maintains a set of loop closures and iteratively chooses new sets of loop closures which decrease the estimated error.
Simulated annealing is used to explore sets of loop closures stochastically without the use of gradient information. The cooling constant, c, determines the probability of accepting a worse solution than the current one; this acceptance probability exponentially decreases with the number of steps. Simulated annealing is an anytime algorithm, which is required as we do not know ahead of time how long the current set of loop closures will take to process in the loop closure computation module, and we would like to improve our solution in parallel while the previous set of loop closures is processed. Simulated annealing has been shown to be effective at minimizing information-theoretic objectives, such as in sensor positioning [79, 134].

Figure 4.8: Example output of the Graph Information Prioritization module on the Urban dataset. Green lines are selected loop closures; red are not selected. Orange, blue, and purple edges are individual robot odometry.

Algorithm 2: Graph Information Prioritization Algorithm
  PG is the current pose graph, L is the set of candidate loop closures, T_max is the starting temperature, T_min is the final minimum temperature, c is the cooling constant, step is the current iteration, U is the uniform distribution.
  l ← HEURISTIC(PG, L)
  err ← ẽ(PG ∪ l)
  while loop closure computation module is not idle do
      T ← max(T_max · c^step, T_min)
      l′ ← EDGE_SWAP(l, L \ l)
      err′ ← ẽ(PG ∪ l′)
      δ ← err′ − err
      if δ < 0.0 or e^(−δ/T) > U[0, 1] then
          l, err ← l′, err′
  l ← BEST(l)

We modify the standard simulated annealing algorithm in a few ways; our adapted algorithm can be seen in Algorithm 2. First, we use a HEURISTIC function to select the initial set of loop closures. We compared a few different heuristics and selected one which attempts to maximize the spatial distance between the loop closures. Secondly, we
This penalizes swapping a loop closure for another loop closure which is a large distance away. This encourages each set of loop closure produced to be spatially diverse. This spatially distanced heuristic is designed to avoid having sets of loop closures only focused on a single high uncertainty junction, and instead add loop closures at different loca- tions to reduce the overall graph uncertainty. When the loop closures are aligned by the solver, a message is generated which stops the optimization process and produces a solution. The Graph Information Pri- oritization module then generates another set of loop closures in parallel to the loop closure computation module aligning the current set. We show an example output of the Graph Information Prioritization module in fig. 4.8. Loop closures are spatially spread out where there is overlap of trajectories. In this example, the Graph Information Prioritization module selectsB =250 loop closures from a set of|L|=500 loop closures. 4.2.3.3 ObservabilityPrioritization Perceptual aliasing is the problem of mistaking two non-overlapping locations as overlapping in a lidar scan due to the presence of similar perceptual features. This is especially common in long feature-less corridors. Filtering out the feature-poor loop closure candidates saves us from spending computationally expensive point cloud registration time on error-prone loop closures. Prioritizing feature-rich areas allows us to perform point cloud registration on areas that will more likely return better loop closures. To quantify how feature-rich or feature-poor a point cloud scan is, we draw inspiration from the observability metric and inverse condition number presented in [132, 148] where the observability is described in terms of the minimum eigenvalue of the Information matrixA of the Point-To-Plane ICP cost for two cloudsi andj. A= n X i=0 H ⊤ k H k (4.5) 76 In eq. 
(4.5),n is the number of correspondences andH k is the residual Jacobian, which can be derived as, H k = (p (k) j × (R ∗ ij ) ⊤ n (k) i ) ⊤ ((R ∗ ij ) ⊤ n (k) i ) ⊤ (4.6) wherep (k) j is the position of the k-th point correspondence in cloud j,R ∗ ij is the rotation computed by ICP, andn (k) i is the normal of thek-th correspondence in cloudi. Since we want to quantify the observability before executing ICP the rotationR ∗ ij in eq. (4.6) is un- known. However, we observe that for two identical point clouds, we can assume thatR ∗ ij = I 3 . Hence, we define observabilityforasinglepointcloud based on the eigenvalues of the matrix ˜ A, ˜ A= n X i=0 ˜ H ⊤ k ˜ H k (4.7) wheren is the number of points in the point cloud and ˜ H k = (p (k) × n (k) ) ⊤ (n (k) ) ⊤ (4.8) We define the observabilityscore of a scan cloud as the minimum eigenvalue of the observability matrix ˜ A and thenormalizedobservabilityscore as the observability score divided by the largest observability score calculated up to the current point in time (i.e. the first point cloud will always have a normalized observability score of 1). Fig. 4.9 shows an example map colored by the normalized observability score. We observe that long corridors have low observability scores, while intersections and feature-rich areas have high observability scores. For prioritization, we first filter out the candidates for which the sum of the normalized observability scores of the two point clouds involved in the loop closure are below a certain threshold. The remaining candidates are then sorted based on the normalized observability score sum, 77 Figure 4.9: Visual illustration of point cloud map with intensity colored by the normalized ob- servability score. Note how the lower observability areas are usually those of long straight corridors with less features. With observability prioritization, we prioritize loop closures at corners and junctions: which are less susceptible to perceptual aliasing. 
and the candidates with the higher scores are prioritized for addition to the candidate queue for loop closure computation.

4.2.4 Experiments

We demonstrate the performance of loop closure prioritization in challenging environments, showing that our system reduces the error when compared to a baseline system. This baseline system is the older version of LAMP, presented in [42], which does not include any loop closure prioritization and processes loop closure candidates for loop closure computation in the order they are generated. We also compare against a solution which only uses the odometric data from the robots and does not perform any loop closures. We perform an ablation study of our system to demonstrate the combined power of our proposed modules.

4.2.4.1 Experimental Setup

We evaluate the loop closure prioritization modules on four datasets collected by Team CoSTAR. The first dataset is the Tunnel Dataset, which includes two Huskies traversing up to 2.5 km combined in a coal mine that consists of mostly featureless narrow tunnels. The second dataset is the Urban Dataset, which includes two Huskies and a Spot traversing up to 1.5 km combined in an abandoned nuclear power plant consisting of a two-floor environment with open areas, small rooms, narrow passageways, and stairs. The third dataset is the Finals Dataset, which includes three Spots and a Husky traversing up to 1.2 km combined in the DARPA-built course including tunnel, cave, and urban-like environments. The last dataset is the Kentucky Underground Dataset, which includes four Huskies traversing up to 6 km combined in a limestone mine that consists of large, 10-20 m wide tunnels. We are only able to evaluate the RSSI prioritization module on the Finals Dataset due to the requirement of deploying radio beacons. We compare a variety of systems in both the ablation study (Section 4.2.4.2) and the system study (Section 4.2.4.3).
Bsln is the baseline system, which does not have any loop closure prioritization and processes the loop closures in the order they arrive. Obs uses only Observability prioritization, GI uses only Graph Information prioritization, and Full uses both prioritization methods. Rand selects loop closures randomly. Odom does not use any loop closures and only uses the odometry measurements from the robots. The experiments are performed using the recorded field data on a laptop with an Intel i7-8750H processor with 12 cores. To compare results on a machine with higher computational power, we also perform a set of experiments on a powerful server, which has an AMD Ryzen Threadripper 3990x processor with 64 cores.

4.2.4.2 Ablation Results

We perform an ablation study by running LAMP [42] on the centralized server with no loop closure prioritization, with the individual prioritization modules, and with the complete system. For each test, we allow the system to run only for the duration of the mission (i.e., we stop the run and record the results as soon as there are no more new sensor messages). The full ablation results are shown in Table 4.1, which details the number of loop closures, as a percentage of the loop candidates added to the computation queue, in three categories: passing loop closure computation ("Verified"), passing GNC as inliers ("GNC-Inliers"), and having error to the ground truth under a threshold (0.05 m, 0.05 rad) ("Inliers"). Since prioritization also prunes a number of candidates, in general, the number of loop candidates in the computation queue is higher for the baseline system. We also include the final absolute trajectory error (ATE) for context. Notice that with prioritization, we are able to process significantly more loop closures that end up passing computation and are true inliers (with the exception of the Tunnel dataset), since many loop closure candidates are left unprocessed without prioritization.
Experiments with observability prioritization show a higher percentage of loop closures that pass computation, because observability prioritization favors loop closure candidates that are suitable for ICP-based alignment. In all the datasets, with either graph information prioritization or observability prioritization, or both, we are able to process more inliers. The higher percentage of true inliers shows that we are able to process a larger number of accurate loop closures with prioritization, and the higher percentage of GNC inliers shows that more loop closures survive outlier rejection to contribute to refining the trajectory estimate of the SLAM system, which is reflected in the better trajectory estimates for all of the datasets except Urban when using prioritization. Fig. 4.10 shows the inlier-to-queued loop closure ratio over the course of the run. We see that for the baseline method without prioritization, the ratio is usually heavily biased towards the beginning, since the loop closures are computed in the order of generation, and the system will not compute any loop closures in the later part of the mission before the end of the mission. With loop closure prioritization, we are able to maintain a reasonable ratio for the entire length of the experiment, showing that we are able to successfully detect loop closures that are inliers across the complete length of the experiment. Moreover, the higher inlier-to-queued ratio for the experiments running the prioritization methods shows that we are spending less loop closure computation time on outliers, while the low ratio for the baseline means that most computation is spent on outliers.

Table 4.1: Comparison on dataset ATE and verified or inlier loop closures for different prioritization methods, as % of queued loop closures.

                           Bsln     Obs      Rand     GI       Full
  Tunnel  Verified (%)     18.192   13.816   14.353   14.406   13.445
          GNC-Inliers (%)   3.724    7.973    5.838    6.300    7.200
          Inliers (%)       3.851    4.679    3.684    4.056    4.082
          ATE (m)           1.12     0.75     0.72     0.77     0.94
  Urban   Verified (%)      3.578    9.200    6.421    6.017    6.319
          GNC-Inliers (%)   0.518    4.312    2.460    2.271    2.213
          Inliers (%)       0.06     0.899    1.098    0.858    0.928
          ATE (m)           0.93     0.93     0.99     1.00     1.06
  KU      Verified (%)      0.062    0.555    0.418    0.539    0.561
          GNC-Inliers (%)   0.009    0.344    0.247    0.2293   0.334
          Inliers (%)       0.0      0.059    0.044    0.061    0.056
          ATE (m)           4.41     6.02     3.87     3.32     3.00
  Finals  Verified (%)      4.105    5.462    5.382    4.953    4.614
          GNC-Inliers (%)   0.271    1.461    1.435    1.318    1.466
          Inliers (%)       0.242    1.219    1.224    1.065    0.97
          ATE (m)           0.69     0.27     0.34     0.32     0.23

In fig. 4.11 we further showcase the importance of prioritization. On the Tunnel dataset, we allowed the system to run past the duration of the mission and plot the number of loop closures detected along with the trajectory error. With prioritization, we were able to detect more loop closures in less than one-quarter of the time compared to without prioritization. The trajectory error is also reduced earlier due to the earlier detection of the loop closures, giving us a better trajectory estimate before the conclusion of the mission.

(a) Tunnel (b) Kentucky Underground
Figure 4.10: Selected loop closure candidate to inlier ratio with the different prioritization methods.
With no prioritization ("Bsln"), little to no inliers are detected from the candidates in the latter part of the run for the Urban, Finals, and Kentucky Underground datasets.

4.2.4.3 System Results

Comparisons of the final trajectory error with the different configurations are shown in Fig. 4.12, showing that any prioritization method improves the final result relative to the baseline. Fig. 4.6 compares the final mapping results without loop closure prioritization and with our full proposed system, which provides a large increase in performance. The results also show that the combination of observability and graph information prioritization improves overall system performance. For instance, in the Kentucky Underground environment, we find that observability performs more poorly than a policy which takes random loop candidates, but that graph information performs well, causing the full system to perform well. In all environments except Urban, the Graph Information Prioritization module has better median performance than randomly selecting edges, the baseline, and the odometry-only solution. The same is true for Observability Prioritization in all environments except the Kentucky Underground. We find that in the Finals dataset, the baseline performs worse than an odometry-only solution. We believe this is because there are many false-positive loop closures which make the overall SLAM solution worse.

(a) Loop closures accepted over time (b) Average trajectory error over time
Figure 4.11: Direct comparison of with and without prioritization on the Tunnel dataset. The red line marks the end of the mission. The full system is able to find a similar number of inlier loop closures and a lower error at the end of the mission than the baseline can after computing all loop closures.

(a) Tunnel (b) Urban (c) Finals (d) Kentucky Underground
Figure 4.12: Trajectory error comparison for the consumer laptop on the four underground datasets.
We find the full system is able to consistently outperform the baseline, odometry, and random solutions. Finally, in Fig. 4.13 we provide another set of results by running the different configurations on the powerful server. Notice that the results obtained here are very similar to those obtained on the laptop, which further emphasizes the benefit our proposed techniques have on the scalability challenge in loop closure detection. We again find that the baseline and random solutions are outperformed by an odometry-only solution in some situations. This is because many false-positive loop closures are generated which make the SLAM solution worse, and these loop closures are chosen less frequently by our prioritization methods.

(a) Tunnel (b) Urban (c) Finals (d) Kentucky Underground
Figure 4.13: Trajectory error comparison on the powerful server on the four underground datasets. We find that the powerful server and the less powerful computer (fig. 4.12) have similar performance when using our system.

4.2.5 Conclusion

Loop closure prioritization is a central problem in large-scale multi-robot SLAM, as good loop closures allow for the creation of a large drift-free map. In this paper, we tackled the problem of making loop closure detection more scalable for multi-robot systems by providing techniques to prioritize loop closure candidates for computation. We demonstrated our techniques in the LAMP system, showcased our results on challenging field datasets, and demonstrated that, with our techniques, we were able to improve the performance of the SLAM system by selecting better loop closures during mission execution. We also demonstrated that our results are largely invariant to the computational power of the central computing system for a multi-robot team, reinforcing how efficient the system is.
Chapter 5

Sensing for Environmental Interaction

When active sensing robots are deployed, there is typically a high-level goal that a user is interested in achieving. Often, in the literature, an active sensing robot is said to be seeking good coverage of the environment, such as in section 2.3.2.1, or seeking the maxima, such as in section 2.3.2.2. These objectives work well for certain applications, but somewhat forsake the benefits promised by active sensing by not making decisions in accordance with the task at hand. These generic behaviors, afforded by generic objective functions, can be changed by modifying the informative path planning objective to more closely model the task at hand. Generally, directly optimizing for the task at hand is impossible, such as when the task is to choose measurement locations that substantiate a theory; it is often too difficult to write down that sort of goal in a way that can be sequentially maximized in a mission-safe way on a robot. We will show that for some tasks, such as choosing locations to take physical samples (i.e., water samples in vials), there are objective functions which directly optimize for the task at hand while remaining executable on a real robot. We will also demonstrate that these objective functions outperform generic entropy-based and Bayesian optimization-based objective functions.

Figure 5.1: Full system. First, the parameters for the robotic survey are chosen, such as the area bounds and the quantiles for specimen collection.
The robot performs informative path planning using our proposed objective functions, creating an environment model and an estimate of the quantile values. The quantile locations are then selected by minimizing our proposed loss function. These quantile locations correspond to physical locations which, when measured, have the estimated quantile values. After the quantile locations are chosen, humans go to them to collect field specimens which they later analyze in a laboratory setting. This work focuses on the steps shown in the blue shaded boxes.

5.1 Active Sensing to Estimate Quantiles for Environmental Analysis ∗

∗ Originally appeared as [110]

To understand biological phenomena, scientists take a small number of physical specimens for later analysis to characterize the biological community and the contextual environmental conditions at the site. A marine biologist may capture water in a container or on filter media at a location for later analysis, and an agricultural scientist may capture a small portion of a plant. Scientists later analyze these captured portions of the environment in a detailed laboratory setting. It is often expensive to collect these physical specimens, as the scientist must go to the location and use a physical reagent, such as a container or filter media, which are limited in the field. Traditionally, expert heuristics underpin the selection of locations for scientific analysis. These expert heuristics generally attempt to spread out the analysis locations in the phenomena of interest in order to take specimens with differing concentrations for characterization of heterogeneity [64]. The heuristics may make use of robotic surveys, but currently most robotic surveys use either pre-programmed paths [64] or autonomy which seeks maxima [87] or spatial coverage [73, 60], and do not directly plan for the specimen collection tasks.
In contrast, we propose performing an adaptive robotic survey to find locations of interest for scientific analysis. To specify these locations, we propose calculating the quantiles of the distribution of interest so that scientists can capture specimens at varied locations in the environment based on differing values of some phenomenon of interest to them. For instance, if a marine biologist is interested in taking 9 physical water specimens spread over a range of chlorophyll concentrations, they may choose to use the deciles of the concentration. If only a small number of locations can be analyzed and the upper extrema are of interest, they may choose to perform analysis at the locations of the (0.90, 0.95, 0.99) quantiles. The quantiles of interest are a flexible way to describe the objective of specimen collection that is largely invariant to the exact phenomenon being measured.

Our goal is to select locations for detailed analysis by scientists in two steps. We use a robot to perform an adaptive survey, and then find locations to suggest for specimen collection based on the measurements it takes. Specifically, we first aim to find the desired quantile values of the measurement distribution by adaptively selecting robot measurement locations that maximize an objective function designed to estimate quantile values; we then produce suggested locations for specimen collection that are likely to contain these values. Figure 5.1 summarizes the approach, showing how our objective function for informative path planning and our loss function for location selection fit into an overall physical specimen collection pipeline.

5.1.1 Background

5.1.1.1 Physical Specimen Collection is the process of collecting portions of the environment for later analysis; in this work, the collection locations are chosen using the quantiles of the distribution of the measured value.
The choice of physical specimen collection locations is driven by the spatial heterogeneity of the studied phenomena [64], which include freshwater algal growth and crop health, two potential applications of this work. We are primarily motivated by the study of algal growth and distribution in freshwater and marine ecosystems. Spatial heterogeneity of algal and cyanobacterial blooms is often obvious as conspicuous accumulations at down-wind and down-current locations at the surface of lakes, and vertical heterogeneity can occur due to water column stratification, differential growth at different depths, or active vertical migration by some planktonic species [121, 69]. Such heterogeneity thwarts studies designed to quantify the spatial distribution of algal biomass, and the concentrations of algal toxins that may be produced by some algae and cyanobacteria. Nonetheless, accurately characterizing this heterogeneity is fundamental for investigating average conditions (indicated by median quantiles) and for investigating trends across many lakes or large geographical areas [65]. Assessing heterogeneity is also essential for assessing worst-case scenarios for exposure of animals and humans to algal or cyanobacterial toxins (specifically the highest quantiles). Such information contributes to ecotoxicological studies used to develop thresholds that constitute significant exposure to these toxins [92]. Robotic approaches for quantifying algal biomass across quantiles (often using chlorophyll or chlorophyll fluorescence as a proxy for algal biomass) have become a mainstay for quantitatively documenting heterogeneity in natural ecosystems [149, 123]. Adaptive surveys for selecting collection locations have typically focused on finding high-concentration areas from which to measure, performing the specimen collection onboard the robot or with a following robot [28, 84].
Our work differs from such works in that we do not seek only to find maximal concentration areas, but to find a variety of concentration values specified by the quantiles.

5.1.1.2 Estimation of Quantiles and Quantile Standard Error has been proposed using statistical techniques. We estimate the quantile value from measurements as ṽ = x_⌊h⌋ + (h − ⌊h⌋)(x_⌈h⌉ − x_⌊h⌋),† where h = (n − 1)q + 1, n is the number of measurements, and q is the quantile (Equation 7 in [71]). The standard error of a quantile estimate is a measure of its uncertainty, and is typically used in the construction of confidence intervals for a quantile estimate. Given a probability density function p, the standard error of the qth quantile is √(q(1 − q)) / (√n · p(ṽ)). Typically p is not known, but it can be estimated from samples using a density estimator [140]. Here we use a Gaussian kernel density estimator when planning.

5.1.2 Formulation

We define the ground truth quantile values as V = quantiles(Y#, Q), where quantiles is the function described in Section 5.1.1.2 which computes the values V of the quantiles Q of a set of measurements, in this case Y#. To define the robot's estimated quantile values, we compute Ṽ = quantiles(µ(X#), Q), that is, the quantile values of the predicted values from the robot's current Gaussian process for all locations the robot could sense. This is done to prevent the number of measurements from which the quantile values are estimated from changing as the robot explores (instead of, e.g., using µ(X)). By doing this, we ensure we always estimate the quantiles across the entire measurable area. During planning, we aim to minimize the error in the quantile value estimate by taking actions which maximize an objective function f designed for this purpose. To suggest locations for the quantile values, we aim to find a set of |Q| locations Q in the continuous space whose values at those locations are equal to the quantile ground-truth values.
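The interpolation-based quantile estimator of Section 5.1.1.2 and its kernel-density standard error can be sketched in a few lines. This is an illustrative implementation, not the code used in the dissertation; the function names are ours, and we assume SciPy's `gaussian_kde` as the density estimator.

```python
import numpy as np
from scipy.stats import gaussian_kde

def quantile_value(measurements, q):
    """Quantile estimate via linear interpolation of order statistics:
    v~ = x_floor(h) + (h - floor(h)) * (x_ceil(h) - x_floor(h)),
    with h = (n - 1) q + 1 (Equation 7 in Hyndman & Fan's taxonomy)."""
    x = np.sort(np.asarray(measurements, dtype=float))
    n = len(x)
    h = (n - 1) * q + 1                  # 1-based fractional rank
    lo, hi = int(np.floor(h)), int(np.ceil(h))
    return x[lo - 1] + (h - lo) * (x[hi - 1] - x[lo - 1])

def quantile_standard_error(measurements, q):
    """Standard error sqrt(q (1 - q)) / (sqrt(n) * p(v~)), with the unknown
    density p estimated by a Gaussian kernel density estimator."""
    x = np.asarray(measurements, dtype=float)
    n = len(x)
    v = quantile_value(x, q)
    p = gaussian_kde(x)(v)[0]            # density estimate at the quantile value
    return np.sqrt(q * (1 - q)) / (np.sqrt(n) * p)
```

This interpolation rule coincides with NumPy's default `np.quantile` behavior, which provides a quick sanity check.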
A set of locations is defined as Q ∈ Q#, where Q# ⊂ R^(d×|Q|) and Q# is continuous over the space of X#. In practice, the robot only has access to Ṽ (not V) during the selection process, so the problem of finding the estimated quantile spatial locations Q̃ can be stated as shown in eq. (5.1), with some selection loss function l_s (see eq. (5.5)):

Q̃ = argmin_{Q′ ⊂ Q#} l_s(Ṽ, Q′)    (5.1)

†⌊x⌋ = max{a ∈ Z | a ≤ x} and ⌈x⌉ = min{b ∈ Z | b ≥ x}

5.1.3 Approach

Figure 5.1 illustrates our method. We separate our approach into two steps: the survey, and the suggestion of locations. When performing the survey using IPP (Section 5.1.3.1), the robot takes measurements of the environment to improve its estimate of the quantiles. After the survey has concluded, location selection (Section 5.1.3.2) produces locations for scientists to visit to perform specimen collection.

5.1.3.1 Informative Path Planning

To plan which locations to measure, the robot uses a POMDP formulation of IPP. To generate a policy, we use the partially observable Monte Carlo planner (POMCP) [125]. POMCP uses Monte Carlo tree search to create a policy tree. To expand the tree and estimate rewards, the tree is traversed until a leaf node is reached. From the leaf node, the reward conditioned on that action is estimated by a random policy rollout which is executed until the discounted reward is smaller than some value ϵ. We modify the rollout reward to be fixed horizon, giving a reward of zero after a certain number of random policy steps. We adopt the t-test heuristic for taking multiple steps from a POMCP plan for IPP, to improve performance of the planner with fewer rollouts [38].
We define GP_{i−1} = GP(X_{0:i−1}, Y_{0:i−1}; θ) as a Gaussian process conditioned on the previous locations (X_{0:i−1}) and measurements (Y_{0:i−1}) before measuring a proposed value, and GP_i = GP(X_{0:i−1} ∪ X_i, Y_{0:i−1} ∪ Y_i; θ) as a Gaussian process conditioned on the previous and proposed locations (X_i) and measurements (Y_i), where θ are the Gaussian process parameters and Y_i = GP_{i−1}(X_i). Because the observations GT(x) for unseen locations are not known during planning, the predicted mean from GP_{i−1} is used [87].

Objective Functions for Quantile Estimation. We develop two novel objective functions to improve the quality of quantile value estimates. Both compare a measure of the quality of the quantiles estimated by the Gaussian process before and after adding a measurement to the Gaussian process, and include an exploration term c_plan σ²(x_i) inspired by the upper confidence bound objective function [87], where c_plan is a chosen constant. Both proposed objective functions take the form of eq. (5.2), where δ is defined by the objective:

f(X_i) = δ(X_i) / |Q| + Σ_{x_j ∈ X_i} c_plan σ²(x_j)    (5.2)

The first objective function, which we call quantile change, is based on the idea of seeking out values which change the estimate of the quantile values, by directly comparing the estimated quantiles before and after adding the measured values to the Gaussian process. The intuition is that a measurement which changes the estimate of the quantiles indicates that the quantiles are over- or under-estimated. This can be seen in eq. (5.3):

δ_qc(X_i) = ∥quantile(µ_{GP_{i−1}}(X#), Q) − quantile(µ_{GP_i}(X#), Q)∥_1    (5.3)

The second objective function we develop, which we call quantile standard error, is based on the change in the estimate of the standard error of the estimated quantiles.
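The quantile change objective (eqs. (5.2)–(5.3)) can be sketched as follows, assuming the two GP posteriors are summarized by their mean vectors over X#. This is a hedged illustration: the function name and argument layout are ours, and we evaluate the exploration variance σ² at the proposed locations without the source pinning down which posterior it comes from.

```python
import numpy as np

def quantile_change_objective(mu_before, mu_after, var_at_Xi, Q, c_plan=1e-6):
    """f(X_i) = delta_qc(X_i) / |Q| + sum_j c_plan * sigma^2(x_j)  (eqs. 5.2-5.3).

    mu_before: GP_{i-1} posterior mean over all measurable locations X#.
    mu_after:  GP_i posterior mean over X#, after adding the proposed
               measurements X_i (with Y_i taken from the GP_{i-1} mean).
    var_at_Xi: posterior variance at each proposed location x_j in X_i.
    """
    q_before = np.quantile(mu_before, Q)          # quantile(mu_{GP_{i-1}}(X#), Q)
    q_after = np.quantile(mu_after, Q)            # quantile(mu_{GP_i}(X#), Q)
    delta_qc = np.abs(q_before - q_after).sum()   # L1 norm of the change
    exploration = c_plan * np.sum(var_at_Xi)      # UCB-style exploration bonus
    return delta_qc / len(Q) + exploration
```

If a candidate measurement leaves the estimated quantiles unchanged and the posterior variance there is zero, the objective is zero; shifting the predicted field shifts every quantile and the objective grows accordingly.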
The quantile standard error objective draws from the same idea: if the uncertainty in the quantile estimate changes after observing a measured value, then that value will change the estimate of the quantile values, as shown in eq. (5.4):

δ_se(X_i) = ∥se(µ_{GP_{i−1}}(X#), Q) − se(µ_{GP_i}(X#), Q)∥_1    (5.4)

Here se is an estimate of the standard error of the quantile estimate for quantiles Q; se uses a Gaussian kernel density estimate (we found it faster and more stable than other standard error estimators).

5.1.3.2 Location Selection

Our final goal is to produce a set of |Q| locations Q at which the concentration values will be equal to V, the values of the quantiles Q. The selection process can be done offline as it does not affect planning. Finding locations that represent Q is difficult because the objective function over arbitrary phenomena in natural environments will likely be non-convex, and in a real-world deployment the robot will only have an estimate Ṽ of the V it searches for. With the location selection problem formulated as in Equation (5.1), we propose the loss function

l_s(Ṽ, Q̃) = ∥Ṽ − µ(Q̃)∥_2 + c_select σ²(Q̃),    (5.5)

where c_select σ²(Q̃) is an added penalty for choosing points that the Gaussian process of collected measurements is not confident about, and l_s can be used in any optimization scheme. During optimization, eq. (5.5) is evaluated using Ṽ and returns the suggested specimen collection locations Q̃. We compare three optimization methods in our experiments to determine which best minimizes our selection loss function (eq. (5.5)). A strength of these optimization methods is that the formulation allows for suggesting points that may be spatially far from the locations the robot was able to measure, if those points have values closer to the quantile values of interest.
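The selection loss of eq. (5.5) is straightforward to evaluate given the GP posterior at a candidate set of locations. A minimal sketch follows; the function name is ours, and we interpret the penalty term c_select σ²(Q̃) as a sum of the posterior variances over the candidate locations, which the notation in the source does not fully pin down.

```python
import numpy as np

def selection_loss(V_tilde, mu_at_Q, var_at_Q, c_select):
    """l_s(V~, Q~) = ||V~ - mu(Q~)||_2 + c_select * sigma^2(Q~)   (eq. 5.5).

    V_tilde:  estimated quantile values from the survey.
    mu_at_Q:  GP posterior mean at the |Q| candidate locations Q~.
    var_at_Q: GP posterior variance at those locations (uncertainty penalty).
    """
    fit = np.linalg.norm(np.asarray(V_tilde) - np.asarray(mu_at_Q))  # L2 term
    penalty = c_select * np.sum(var_at_Q)                            # confidence term
    return fit + penalty
```

Any black-box optimizer (simulated annealing, cross-entropy, Bayesian optimization) can then minimize this scalar over candidate location sets.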
5.1.4 Experiments

To evaluate our approaches for planning and for location selection, we compare against baselines in two different informative path planning tasks in simulation, using four datasets collected in the real world.

‡The parameters c_plan and c_select are distinct.

Figure 5.2: Simulated drone planning experiments with real data. Error between ground truth quantile values and estimated quantile values, for the methods Entropy, Quantile Change, Quantile Standard Error, and Expected Improvement. Datasets A and B were collected using a hyperspectral camera in Clearlake, California. Units are 400nm channel pixel intensity (0–255). Each dataset and objective function pairing is run three times.

In the first task, a simulated drone with a virtual camera gathers data from orthomosaics (a single image produced by combining many smaller images, called orthophotos) collected over a lake using a hyperspectral sensor. The orthomosaics, A and B, are taken in the same location but on different days and times. The drone collects many measurements from one location, where each is a pixel in a downsampled image. As a proxy for chlorophyll concentration, we measure the 400nm channel pixel intensity (0–255). The drone maintains a constant altitude and moves in a 2D plane with a north-fixed yaw, moving in either the x or y direction per step.
At each step, the AUV moves in one x, y, or z direction and takes five evenly spaced measurements when moving between locations.

We evaluate each task on its two respective datasets (A/B, C/D) and three different sets of quantiles: deciles (0.1, 0.2, ..., 0.8, 0.9), quartiles (0.25, 0.5, 0.75), and upper extrema (0.9, 0.95, 0.99). See Figure 5.3 for a summary of the dataset distributions.

5.1.4.1 Informative Path Planning: Objective Functions

To evaluate how well our proposed IPP objective functions estimate the ground truth quantile values in real environments, we compare quantile change (eq. (5.3)) and quantile standard error (eq. (5.4)) against a baseline entropy objective function (eq. (2.3)). For the upper extrema quantiles, we also compare against expected improvement (eq. (2.6)), as it is similar to a sequential Bayesian optimization based IPP task.

Figure 5.3: Experimental dataset distributions (sorted environment grid point values for Drone Environments A and B, and AUV Environments C and D). Drone data is measured in pixel intensity; AUV data in µg/L chlorophyll.

Figure 5.4: Simulated AUV planning experiments with real data. Error between ground truth quantile values and estimated quantile values. Datasets C and D were collected from a reservoir in California using an underwater robot with a chlorophyll fluorescence sensor. Units are µg/L chlorophyll. Each dataset and objective function pairing is run three times.

Setup. In the planner, we use γ = 0.9, and each trial is run over 3 seeds.
The objective parameter c_plan is set to the approximate magnitude of the rewards seen in each environment, which we found experimentally to be an adequate value. The Gaussian process mean was set to 0 and the datasets were normalized, while the lengthscale was set to 12. The exploration constant c_plan was set to 1E−6 for quantile change and 1E−2 for quantile standard error. To compare the performance of the approaches, we use the RMSE between the ground truth quantile values V and the estimated quantile values Ṽ after performing a survey.

Bay Camera. The drone is allowed to take 30 simulated pictures out of a grid with about 300 positions. Each picture is downsampled to 8×5 pixels (40 measurements) with a 37.1° by 27.6° field of view, similar to the drone used in the field trial reported in section 5.1.4.3. While this is downsampled from the true image, it captures the coarse trends of the concentration, which is both still scientifically useful and performant for Gaussian process evaluations. For each trial, the Gaussian process is seeded with 100 evenly spaced measurements across the workspace as prior data. The planner uses 300 rollouts per step, and the maximum planning depth is 7.

Lake Chlorophyll. The AUV is simulated for 200 steps in a 12×14×2 grid. The planner uses 130 rollouts per step and a maximum depth of 10. The Gaussian process is seeded with measurements from 50 locations.

Discussion. Overall, our results are robust across multiple experimental environments as well as robot sensor types. We find that planning with the quantile standard error objective function yields a 10.2% mean reduction in median error across all environments when compared to using entropy. This shows that an objective function tailored to estimating a set of quantiles will outperform a coverage planner, such as entropy, that would typically achieve low overall error in reconstructing the entire environment model.
The focus of our approach is to estimate the quantiles in a targeted way rather than to achieve uniform coverage.

Figure 5.5: Comparison between quantile standard error with the POMDP solver and the Spiral-STC coverage planner [52]. RMSE versus number of images taken on Drone Environment B estimating deciles, each over 3 trials.

Figure 5.6: Quantile location selection results. RMSE between the ground truth values at the selected locations and the corresponding true quantile values, for the selection methods Simulated Annealing, Cross-Entropy, Bayesian Optimization, and Best Visited. Units are µg/L chlorophyll for AUV, 400nm channel pixel intensity (0–255) for Drone.

Figures 5.2 and 5.4 show the results of planning with the proposed objective functions for the drone and AUV experiments, respectively. In the drone tasks (A/B), quantile change and quantile standard error outperform the baseline entropy in estimating the deciles and upper extrema. For quartiles in environment A, both proposed methods perform well, but in environment B entropy outperforms our methods. For the extrema, both methods perform better than expected improvement, and entropy outperforms expected improvement in environment A. We believe this is because expected improvement focuses explicitly on improving the (single) maximal value and does not do a good job of localizing high concentration areas, thus overestimating the quantile values.
For the AUV tasks (C/D), quantile standard error outperforms entropy in environment C when estimating deciles and quartiles, and performs equally well as entropy in all other tasks besides estimating the extrema in environment D. Expected improvement performs poorly in estimating the extrema, due to issues similar to those in the drone tasks. Quantile change performs poorly in most AUV tasks, in contrast to the drone tasks where it performs comparably. This may indicate that quantile change performance decreases with point sensors, or that it is more sensitive to variation in the absolute measurement values. We therefore recommend using quantile standard error unless there are computational constraints, since its performance is more consistent.

We also compare our approach using the proposed quantile standard error objective function to the Spiral-STC coverage planner [52]. Figure 5.5 shows that the proposed adaptive solution reduces the error between the estimated quantiles and the ground truth quantiles more rapidly than the non-adaptive coverage planner.

5.1.4.2 Location Selection

To evaluate how close the values at locations suggested by our proposed loss function are to the estimated quantile values, we use the optimization algorithms simulated annealing (SA), cross-entropy (CE), and Bayesian optimization (BO) on the results from the quantile estimation task (Section 5.1.4.1) obtained with the quantile standard error objective function.

Setup. We compute the error as RMSE(V, Q̃) = ∥V − GT(Q̃)∥_2. We set c_select to 15 for deciles, 200 for quartiles, and 30 for extrema. We compare the results against a Best Visited (BV) baseline, which selects for each quantile value the location where the robot took a measurement that is closest to it, i.e., it solves eq. (5.6):

Q̃* = argmin_{Q̃_bv ⊂ X} ∥Ṽ − µ(Q̃_bv)∥_2    (5.6)

For SA, we use T_max = 5, T_min = 0.001, and cooling rate cr = 0.005, which leads to approximately 1000 optimization steps, and we reset to the best solution every 100 steps.
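The BV baseline of eq. (5.6) reduces to a greedy per-quantile lookup, since each quantile's term in the L2 norm can be minimized independently over the visited locations. A minimal sketch, with names of our own choosing:

```python
import numpy as np

def best_visited(V_tilde, X_visited, mu_at_X):
    """For each estimated quantile value in V_tilde, return the visited
    location whose GP mean is closest (eq. 5.6, solved greedily per quantile).

    X_visited: (n, d) array of measured locations X.
    mu_at_X:   (n,) GP posterior mean at those locations.
    """
    X_visited = np.asarray(X_visited)
    mu_at_X = np.asarray(mu_at_X)
    idx = [int(np.argmin(np.abs(mu_at_X - v))) for v in V_tilde]
    return X_visited[idx]
```

Because BV can only return locations the robot actually visited, it provides a cheap initialization for SA, CE, and BO but cannot propose unvisited locations with better-matching values.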
For SA, we start the optimization from the solution found by the BV baseline. For CE, we use α = 0.9, η = 0.9, 50 samples per iteration, and 100 iterations; α is a weighting factor on new samples, used to prevent premature convergence. For BO, we use the expected improvement acquisition function and initialize the Gaussian process with 50 randomly selected Qs as well as the solution found by the Best Visited baseline. We use 100 iterations and report the best solution found.

Discussion. The error between the values at the estimated quantile locations and the true quantile values for the drone and AUV experiments can be seen in Figure 5.6. Overall, we find that cross-entropy and Bayesian optimization produced the locations with values closest to the true quantile values. Both methods perform a global search in the space of possible locations, indicating that global search optimizes eq. (5.5) more effectively. Simulated annealing had greater variability in performance; we believe this is because it is a local search method and may fail to escape local optima. Best Visited produced a good initialization for the other methods, but was easily outperformed. We find that CE performs best in three out of six possible scenarios. In particular, we see a 15.7% mean reduction in median error using CE with our proposed loss function compared to the BV baseline across all environments, when using our proposed quantile standard error objective function for quantile estimation during exploration.

In general, the methods can find good points when selecting for quartiles or deciles, while the upper extrema are more difficult. Because the robot is limited in the amount of environment it can explore, the upper extrema are less likely to be measured during exploration. This makes these quantiles more challenging to select representative points for.

Figure 5.7 shows results for one seed of the drone task when monitoring deciles.
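A generic cross-entropy search over candidate location sets can be sketched as below. This is our own illustrative implementation, not the dissertation's: we assume a diagonal-Gaussian sampling distribution refit to an elite fraction with smoothing α, clip samples to the workspace bounds, and leave out the η parameter mentioned in the setup, whose role the text does not specify.

```python
import numpy as np

def cross_entropy_select(loss_fn, bounds, n_q, alpha=0.9, n_samples=50,
                         n_iters=100, elite_frac=0.1, seed=0):
    """Minimize a selection loss over sets of n_q locations in a box.

    loss_fn: maps an (n_q, d) array of candidate locations to a scalar loss.
    bounds:  (lower, upper) corners of the d-dimensional workspace.
    """
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds[0], float), np.asarray(bounds[1], float)
    dim = n_q * len(lo)
    mean = np.tile((lo + hi) / 2.0, n_q)       # sampling distribution mean
    std = np.tile((hi - lo) / 2.0, n_q)        # sampling distribution spread
    n_elite = max(1, int(elite_frac * n_samples))
    best, best_loss = mean.copy(), np.inf
    for _ in range(n_iters):
        samples = rng.normal(mean, std, size=(n_samples, dim))
        samples = np.clip(samples, np.tile(lo, n_q), np.tile(hi, n_q))
        losses = np.array([loss_fn(s.reshape(n_q, -1)) for s in samples])
        elites = samples[np.argsort(losses)[:n_elite]]
        # smoothed refit: weight alpha on the new elites, (1 - alpha) on old
        mean = alpha * elites.mean(axis=0) + (1 - alpha) * mean
        std = alpha * elites.std(axis=0) + (1 - alpha) * std
        if losses.min() < best_loss:
            best_loss = float(losses.min())
            best = samples[int(np.argmin(losses))].copy()
    return best.reshape(n_q, -1), best_loss
```

Because it samples freely within the bounds, such a global search can propose locations far from anything the robot visited, unlike the BV baseline.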
In Figure 5.7, the suggested locations, shown as stars, align relatively closely with the true quantile values V, shown by the contours in the left image. This demonstrates the ability of the optimizers to produce good location suggestions to guide environmental analysis.

The right part of Figure 5.7 shows the same locations on top of the orthomosaic of what the drone could measure during exploration. This part of the figure highlights the difficulty of the problem of IPP for quantiles.

Figure 5.7: Physical locations (stars) selected by the cross-entropy optimizer for deciles on the drone experiment. [Left] Black lines: true quantile value contours, overlaid on the absolute error between µ(X#) and GT(X#). Note that the lowest error tends to follow the quantile contour lines. [Right] Red crosses: locations the robot visited, overlaid on the ground truth image.

The robot could only explore 15% of the total environment. With only partial knowledge of the distribution, the robot's model of the phenomenon will vary based on the particular points it visited, which in turn affects the estimates of the quantiles. Similarly, Figure 5.8 shows the suggested locations for the upper extrema for one seed of the AUV task. For both tasks, all the optimization methods, with the exception of Best Visited, suggest points that may be spatially far from locations the robot was able to visit. This allows points with values potentially closer to the true quantile values to be selected for scientific analysis.

5.1.4.3 Field Trial

The goal of our final experiment is to demonstrate our method on real hardware in a crop health monitoring task in an open, grassy field, where the objective is to estimate the deciles of the green present in each pixel of the images (we use green as a proxy for plant health).

Setup. We use a commercial, off-the-shelf drone with a standard camera to take measurements of the field at a constant height of 3m.
Similar to the simulated drone, the drone in this experiment moves in a 2D plane with a north-fixed yaw. The drone has a limited budget of 20 pictures (planning steps) in a 16×16 m square grid, where |G#| = 10×10 locations, and each picture is downsampled to 8×5 px. For planning, we use the quantile change objective function because it performs well for camera sensors and is faster than quantile standard error. We operate the drone with a custom app and use onboard GPS and IMU for localization.

Figure 5.8: Physical locations (stars) selected by the cross-entropy optimizer for the upper extrema quantiles with an AUV using a chlorophyll point sensor. Blue/green points are the measured locations X.

Discussion. Figure 5.9 shows the resulting suggested locations using CE, based on the 20 steps the robot took. We find that, although the robot cannot explore the entire workspace due to its limited budget, and in fact does not visit a large green area, the system is able to suggest varied locations for specimen collection. Some selected locations (e.g., those representing the 0.3, 0.4, 0.5 quantiles) are spatially close to each other, which suggests this area contains a large gradient, or that those quantile values are similar. As the goal is to suggest locations that contain the estimated quantile values, such locations may be near one another.

5.1.5 Conclusion

Scientists traditionally collect physical specimens at locations selected using heuristics. They later analyze these specimens in a laboratory to characterize a phenomenon of interest (e.g., the distribution of algae in the water).

Figure 5.9: Visualization of a field trial modeling a crop health task. Left: Crosses are locations where the drone took images. The drone is limited to visiting only 20% of the possible locations to take images. The quantiles of interest are the deciles, and the locations are chosen by cross-entropy.
The 9 stars show locations suggested for collecting physical specimens. The measurement of interest is the amount of green in each pixel. Right: The drone used in the field trial, in flight.

We propose instead to choose these specimen collection locations by first performing an informative path planning survey with a robot, and then proposing locations which correspond to the quantiles of interest of the distribution. To accomplish this, we propose two novel objective functions, quantile change and quantile standard error, for improving the estimates of quantile values through informative path planning. We test these in three settings: a drone with a camera sensor over lake imagery, an underwater vehicle taking chlorophyll measurements, and a field trial using a drone for a crop health task. Our objective functions perform well and outperform information-theoretic and Bayesian optimization baselines. In our experiments, our proposed quantile standard error objective function yields a 10.2% mean reduction in median error when compared to entropy.

We also show that using our proposed loss function with black-box optimization methods can select environment locations for analysis that are representative of a set of quantiles of interest. We find that a cross-entropy optimizer using our loss function outperforms a baseline of using the best measured points, with a 15.7% mean reduction in median error in values across all environments. Our approach can be used to guide physical specimen collection in real field scenarios, as demonstrated by our field trial. We also plan to incorporate our specimen collection location suggestions into a larger field campaign involving multiple scientists.

Chapter 6: Modeling Challenging Signal and Environment Interactions

Often, active sensing algorithms are developed in controlled environments which contain no obstacles, or for missions in which the effect of the obstacles on the sensed signal can be ignored.
Ignoring obstacles is acceptable in many scenarios to which we would like to apply active sensing, such as underwater robots measuring chlorophyll, or a flying drone taking pictures. In these scenarios, there tend to be very few obstacles, and the algal mass tends simply to move around any that exist. There may be some effect on the algae concentration due to an obstacle, but this generally falls below the noise threshold that the robot can reliably measure. Because there are no obstacle-signal interactions, it is suitable to use only spatial correlations between measurements to infer the belief of the signal strength at unseen locations. This has given rise to the common use of Gaussian processes as the belief model for informative path planning, due to their relative simplicity and convenient mathematical properties.

In the case of, for example, a robot measuring a light source, it is easy to construct examples where, even though a measurement is taken at a closer location, it is less correlated with the unseen location. Suppose the robot has measured two locations, x_0 and x_1, and wants to predict the value of the signal at an unseen location x_*, with ∥x_0 − x_*∥_2 < ∥x_1 − x_*∥_2; that is, x_* is closer to x_0 than to x_1. With which measurement (y_0 or y_1) should y_* be more correlated? The natural answer is that it should be more like y_0 than y_1, due to the closer proximity of the measurement, while still being affected by both. But if there is an opaque wall between x_0 and x_*, the measurement is likely to be more like y_1 than y_0. If the signal reflects or otherwise interacts with the obstacles, there may be more complex dependencies on the spatial positions of x_0, x_1, x_* and the geometry of the environment, leading to a drastically more complicated modeling and planning scenario.
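The failure mode in the example above can be made concrete with a stationary kernel: under a standard RBF (squared-exponential) covariance, correlation depends only on distance, so the nearer point always wins regardless of any wall between it and the query point. A small hypothetical illustration:

```python
import numpy as np

def rbf_correlation(a, b, lengthscale=1.0):
    """Correlation between f(a) and f(b) under an RBF kernel --
    a function of distance alone, blind to any obstacle between a and b."""
    d = np.linalg.norm(np.asarray(a, float) - np.asarray(b, float))
    return np.exp(-0.5 * (d / lengthscale) ** 2)

x0 = np.array([0.0, 0.0])      # measured, near the query point
x1 = np.array([3.0, 0.0])      # measured, farther away
x_star = np.array([1.0, 0.0])  # unseen query location

# ||x0 - x*|| < ||x1 - x*||, so a stationary kernel always predicts y* is
# more correlated with y0 -- even if an opaque wall separates x0 from x*.
assert rbf_correlation(x0, x_star) > rbf_correlation(x1, x_star)
```

No choice of lengthscale changes this ordering; encoding the wall requires a model whose correlation structure depends on the environment geometry, not just on distance.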
This complexity in modeling signals in such environments leads us to use factor graphs for informative path planning (see section 2.2.2 for background). We introduce two works which make use of this modeling technique, each modeling the signal-obstacle interaction differently depending on the nature of the signal, the understanding of the obstacles, and the task at hand.

In section 6.1, we will introduce a highly scalable approach to radio source localization. This approach builds a factor graph centered at the robot's position to perform inference on the signal strength. The signal in this work may pass through obstacles, but at an uncertain rate. Additionally, the robot only has a probabilistic understanding of its environment, due to its reliance on occupancy maps. For these two reasons, we model the obstacles as an increase in uncertainty for factors which pass through them. This allows a flexible representation of the signal which can be used to drive the robot towards the high-signal area where the source emitter is.

In section 6.2, we will somewhat change the active sensing problem, and the modeling as well. We will introduce a problem where the robot not only measures the signal, but also chooses where to place the signal emitters. That work focuses on lights as the electromagnetic signal emitters, for their importance in search and rescue operations. We will combine an analytical model of light propagation with a belief model of the corrections from in-situ measurements. The belief model will not only guide the robot on where to measure the light intensity; it will also inform the robot on when to re-configure the lights, as well as where to place them. In this way, we will both actively sense and create the signal.
This leads to a complex modeling scenario where the belief model must capture both the placed lights with reflection effects and the correction due to previously unknown light sources and un-modeled effects. To this end, we will model obstacles as surfaces which reflect light in the analytical model, and remove link factors which pass through opaque obstacles.

6.1 Fast and Scalable Signal Inference for Active Robotic Source Seeking∗

Prompt and accurate situational awareness is crucial for first responders to emergencies, such as natural disasters or accidents. Visual information is often unavailable or insufficient to detect people needing assistance in disaster-stricken environments. Wireless signals, such as radio, Wi-Fi, or Bluetooth from mobile devices, can be important in search and rescue missions. Autonomous exploration and signal source seeking by a robotic system can significantly improve the situational awareness of the human response team operating in hazardous and extreme environments. In active source seeking, a robot not only attempts to get closer to the predicted source location but also gathers more information to actively reduce the source localization uncertainty. Signal source seeking in large, unknown, obstacle-rich environments is an important but challenging task. Source seeking systems typically consist of two components: a model of the environment that provides a belief over the signal strength at unknown locations, and a planner which generates a policy to improve the model and locate the source. Both the model of the environment and the planner must scale to a large area and a large number of collected measurements. Models for signal inference typically do not scale well to large numbers of measurements, relying on approximation or suffering rapid growth in the time required for inference [74, 91, 108].
In the presence of unknown obstacles, the robot must continually re-plan at a high rate and adapt the signal model to account for newly discovered obstacles. The presence of unknown obstacles has complex effects on signal propagation and requires that the robot be able to re-visit previously observed but unexplored corridors which potentially lead to the signal source.

∗ Originally appeared as [37]

Figure 6.1: Overview of approach. Both the information roadmap (which models the traversability of the environment) and the signal inference system maintain global and local representations of the environment. The signal inference system topology is built from the information roadmap. The environment is a paved outdoor walkway which can be seen in the image as well as the traversability map. The local signal representation maintains both the mean (shown as colored spheres) and the variance (flat squares); both are shown from blue (low) to red (high).

In this work, we formulate the problem of finding the highest signal strength as one of finding the location with the maximum radio signal-to-noise ratio (SNR). To find this maximum, the robot takes actions which increase its understanding of the signal while the planner is creating the next policy for the robot to execute. We formulate this as a Bayesian informative path planning problem in which the robot must balance the exploration-exploitation trade-off [88]. In order to find the signal source, the robot must explore to improve its model of the signal in the environment, while being driven towards the maxima of the signal strength. To improve its understanding of the signal strength, the robot must be able to estimate both the strength at unknown spatial locations and the uncertainty in that estimate.
This probabilistic signal model gives the robot the ability to plan over future measurements and determine which locations to measure given the information about the signal strength. To facilitate large-scale active source seeking, both the planner and the model of the environment must be highly scalable. To this end, we have developed a factor-graph based signal inference model and extended a large-scale multi-robot planner to handle these challenges. Both the planner and the model maintain global and local representations of the environment, which allows for decomposition of the problem. This problem decomposition can be seen in fig. 6.1, which shows that both the information roadmap (which contains traversability information) and the signal inference are split into global and local components.

6.1.1 Related Work

Active source seeking has been used to locate radiation sources by representing the environment as voxels [89]. It has also been used to map plumes of unknown hazardous agents [146]. Gaussian processes have been used as the model in active source seeking of radio signals by developing a control law that allows the robot to move towards the source while updating the model in an online fashion [45]. Active source seeking has been extended to the multi-robot setting by using a particle filter based model [17].

6.1.2 Formulation

In informative path planning for active source seeking, the robot is tasked to localize the source of a radio signal without any prior map of the environment and within limited time constraints. We define the state as s = (q, W_s, W_r), where q is the robot state, W_s is the signal state (the signal location), and W_r is the world traversal risk state. The task is to find a policy that maximizes an objective function S while minimizing its action cost. Typically, the objective S corresponds to the expected reduction in uncertainty in the belief over the signal source location W_s.
We define the reward for taking action a at state s as

R(s, a) = f(S(q′ | W_s), C(q, a | W_r))  s.t.  q′ = T(q, a | W_r)   (6.1)

where C(q, a | W_r) is the cost of taking action a from state s, T is the robot state transition function, and f is a weighted combination of the two terms, defined in PLGRIM [75]. In order to maximize S, the robot must be able to infer the signal strength at unseen locations, given the history of locations X_0:t and measurements Y_0:t, as well as hypothetical future locations and measurements predicted by the model. This multiple-step look-ahead allows the robot to plan a policy which is non-myopic and accounts for the reward of complex, multiple-step policies. We denote the belief at time t as W_s^t = M(X_0:t, Y_0:t), where M is a model of the belief over the signal strength in the environment. Under this model, we wish to find an optimal policy

π∗ = argmax_{π∈Π} Σ_{t′=t}^{T} E[R(b_t′, π(b_t′))]   (6.2)

where t is the current planning time, b_t′ is the belief over states at time t′, and T is the robot's time budget. As the robot progresses in the mission and updates its world belief, we re-solve the problem in a receding-horizon fashion over the remaining time interval [t, T].

Algorithm 3: Add To Global Graph. Add measurement y_i at location x_i, given global graph G^g, current estimate V^g, distance-based signal variance function σ²_dist, measurement variance σ²_meas, minimum new global node distance d_min, and global values g_i, g_{i−1}:
  if ‖x_i − x_{i−1}‖ > d_min then
    f^g_link ← N(g_i − g_{i−1} | 0, σ²_dist(x_i, x_{i−1}))
    f^g_meas ← N(g_i | y_i, σ²_meas)
    G^g ← G^g ∪ {f^g_link, f^g_meas}
    V^g ← SOLVE(G^g, V^g)

6.1.3 Approach

In this section, we introduce novel signal belief components W_s^g and W_s^l and propose a hierarchical planner for solving eq. (6.2). We decompose the signal, traversability, and planning portions into global and local components to allow our system to scale easily through decomposition.
6.1.3.1 World Traversability Belief

Hierarchical planning frameworks address both space complexity (due to the large and increasing environment size) and model uncertainty challenges by breaking down the environment representation into components of different scales and resolutions. In PLGRIM [75], the world belief representation is composed of a meter-level resolution lattice representation of the robot's local surroundings, termed the Local Information Roadmap (Local IRM), and a topological graph representing the explored space, called the Global Information Roadmap (Global IRM). Together, the Local IRM W_r^l and Global IRM W_r^g compose the world traversability belief W_r. This can be seen in fig. 6.1, where the local and global information roadmaps are shown next to a traversability map of the environment.

Figure 6.2: The signal belief is represented by two factor graphs, described in section 6.1.3.2. (a) The global factor graph represents the belief over the signal strength at locations where the robot has received measurements, represented as the values g_1, ..., g_n. (b) The local factor graph is centered at the robot, using the values l_1, ..., l_n to infer the signal locally around the robot.

6.1.3.2 Signal Belief

In order to facilitate large-scale signal inference, the model is split into global and local representations of the signal over the environment. The local representations in the planner and model describe beliefs and plans about the area immediately surrounding the robot. The global representation describes the belief about the signal strength at locations where the robot has received measurements. The posterior distributions over the global signal representation are used as priors in the local signal representation.

Global Factor Graph (G^g, V^g) is seen in fig.
6.2a, where each value g_n ∈ V^g is connected to at least one measurement factor f_meas ∈ G^g. These measurement factors are unary factors which represent the real-world measurements taken by the robot at the location x_n. Each global value is connected to the previous and next global values through a link factor f^g_link ∈ G^g. The link factor connects the previous and current global values based on distance traveled, while the measurement factor connects the global graph to real-world signal measurements. The global graph does not consider obstacles, as the robot only moves a small distance between global node generations. The link factor, used in both the global and local factor graphs to connect values which have a spatial relationship, serves a similar purpose to the kernel distance in Gaussian processes. It has an expected mean of 0, with an uncertainty that grows in proportion to the distance between the locations corresponding to the values: σ²_dist(x_i, x_j) = ε + α_dist ‖x_i − x_j‖, where ε is a small positive number and α_dist is an experimentally determined constant. In order to add new measurements to the global factor graph, algorithm 3 is used, in which a new global value is created only when the robot has moved sufficiently far from the current location.

Algorithm 4: Create Local Graph. Produces a new local graph G^l and local values V^l, given a new local information roadmap IRM^l, global graph G^g, global values V^g, global k-nearest-neighbors function KNN^g, number of global nodes to link to k_g, and robot location x_t:
  V^l, G^l ← ∅, ∅
  for x_i ∈ nodes(IRM^l) do
    V^l ← V^l ∪ {v_i}
    for x_j ∈ neighbor(x_i, IRM^l) do
      σ²_occ ← σ²_occ(x_i) + σ²_occ(x_j)
      f^l_link ← N(l_i − l_j | 0, σ²_dist(x_i, x_j) + σ²_occ)
      G^l ← G^l ∪ {f^l_link}
  for x^g ∈ KNN^g(x_t, k_g) do
    l ← CLOSEST(x^g, V^l)
    µ(x^g), σ²(x^g) ← V^g(x^g)
    f^l_global ← N(l | µ(x^g), σ²(x^g) + σ²_dist(l, x^g))
    G^l ← G^l ∪ {f^l_global}
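A minimal Python sketch of how the global graph grows (Algorithm 3's distance gating) and of the link-factor variance, including the occupancy inflation that the local graph adds (section 6.1.3.2). Names and storage are illustrative, using the constants reported in the experiments; the real system adds factors to a graph and re-solves it:

```python
import math

EPS = 1e-3        # epsilon: small positive variance floor
ALPHA_DIST = 0.1  # distance constant used in the experiments
ALPHA_OCC = 0.5   # occupancy constant used in the experiments
D_MIN = 0.25      # minimum travel before a new global node (meters)

def link_variance(xi, xj, p_occ_i=0.0, p_occ_j=0.0):
    """Link-factor variance: grows linearly with distance, plus an
    occupancy inflation used by the local graph to decouple belief
    across likely obstacles (zero occupancy recovers the global form)."""
    return (EPS + ALPHA_DIST * math.dist(xi, xj)
            + ALPHA_OCC * (p_occ_i + p_occ_j))

def add_measurement(nodes, x, y):
    """Algorithm 3's gating: start a new global node only after moving
    at least D_MIN; otherwise fuse the measurement into the last node."""
    if not nodes or math.dist(x, nodes[-1]["x"]) > D_MIN:
        nodes.append({"x": x, "ys": [y]})
    else:
        nodes[-1]["ys"].append(y)

# Nearby free-space nodes couple tightly; a probable wall loosens the link.
assert link_variance((0, 0), (1, 0)) < link_variance((0, 0), (1, 0), 0.9, 0.9)

nodes = []
add_measurement(nodes, (0.0, 0.0), 10.0)
add_measurement(nodes, (0.1, 0.0), 11.0)  # too close: fused into node 0
add_measurement(nodes, (0.5, 0.0), 14.0)  # far enough: new node
assert len(nodes) == 2 and nodes[0]["ys"] == [10.0, 11.0]
```

The distance gating bounds the number of global values, which is what keeps inference cost independent of the raw measurement count.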
If the robot has not moved far enough from the current location, another measurement factor is added to the previous value.

Local Factor Graph (G^l, V^l) is built from the local information roadmap over the area centered at the robot. The local information roadmap is an n × m grid which describes the area the robot can traverse in the local policy, described in section 6.1.3.1. The local factor graph (fig. 6.2b) models the local IRM topology by adding link factors between values l ∈ V^l in the local information roadmap which are connected. The local link factors f^l_link ∈ G^l are similar to the global link factors, in that uncertainty increases with the distance between the spatial locations, but additionally incorporate occupancy information. We assume that if a location corresponding to a local value is not traversable by the robot, the model estimate of the signal will have higher uncertainty in that area due to potential attenuation or reflection in the case of solid obstacles. To model this, we increase the uncertainty in proportion to the occupancy probability of that location, according to σ²_occ(i) = α_occ p_occ(i), where α_occ is an experimentally chosen constant, set to 0.5 in this work. In order to incorporate the real-world measurements into the local factor graph, we find the closest k_g locations corresponding to values in the global factor graph to the robot location.

Algorithm 5: Inference at Locations. Produces a belief over the signal at location x_q, given hypothetical future locations X_h and measurements Y_h, an initially empty local value cache C, local graph G^l, and local values V^l:
  if X_h ∈ C then return C(X_h, x_q)
  G^l_h ← G^l
  for (x, y) ∈ (X_h, Y_h) do
    l ← CLOSEST(x, V^l)
    µ(l), σ²(l) ← V^l(l)
    G^l_h ← G^l_h ∪ {N(l | y, σ²(l))}
  V^l_h ← SOLVE(G^l_h, V^l)
  C(X_h) ← V^l_h
  return V^l_h(x_q)
For each location that is close to the robot location, we add a factor f^l_global ∈ G^l to the closest local value, which uses the posterior estimate from the global value as well as the distance between the local and global locations. We assume this distance is small and do not consider obstacles between the global and local nodes. The algorithm for this can be seen in algorithm 4, which describes how both the local graph and local values are constructed.

Inference. In order for the planner to do active source seeking, the signal belief must be able to efficiently infer a distribution over the signal at unknown spatial locations. The model must also support conditioning on locations and measurements the robot plans to take but has not taken yet, termed hypothetical locations X_h and hypothetical values Y_h. This allows the planner to make non-myopic plans in which future values take into account measurements that will be taken in the policy [88]. To this end, algorithm 5 describes the procedure to infer the posterior belief at a spatial location x_q. Typically, the graph cannot be re-solved with every query, so the algorithm determines if it has already computed the local values given the current hypothetical locations X_h by checking whether they are in the cache C. The cache C allows the re-use of previously solved values, as the posterior distribution over x_q has already been computed when computing a different query location. C is emptied when a new local IRM is received due to the robot moving, or when a new measurement is added to the global graph. If the algorithm cannot
This cache is emptied each time a new local factor graph is constructed. To produce a belief over locations that are not in the local information roadmap, query locations are added to the global roadmap with distance based link factors and the resulting graph is solved. 6.1.3.3 HierarchicalSolver Our approach is to compute a local planning policy that solves eq. (6.2) onW l r andW l s over a receding hori- zont l , which we refer to asLocalPlanning, while simultaneously solving eq. (6.2) over the full approximate world representationW g r andW g s , which we refer to as Global Planning. LocalPolicy In Receding Horizon Planning (RHP), the objective function in Eq. (6.2) is modified: π ∗ t:t+T (b)=argmax π E " t+t l X t ′ =t γ t ′ − t R(b t ′,π t ′(b t ′)) # (6.3) where t l is a finite planning horizon for a planning episode at time t. The expected value is taken over future measurements and the reward is discounted byγ [9]. For the local policy, the reward is based on the current signal belief and uses the upper confidence bound (UCB) acquisition function, which is commonly used for informative path planning for Bayesian optimization [88]. We use this objective function for local exploration as the robot should investigate areas that the model is uncertain about, but should still prefer high concentration areas. In our experiments we set β = 3 to encourage exploration. GlobalPolicy Boundaries in between explored and unexplored space, termedfrontiers, are encoded in the topological graph of the environment (Global IRM) and represent goal waypoints. By visiting frontiers, 114 the robot takes new signal measurements and uncovers swaths of the unexplored environment. The robot must plan a sequence of frontiersp that it can reach within the time budget that maximizes its expected rewardS g . 
Equation (6.2) becomes

π⋆ = argmax_p Σ_{n∈p} F(t_p(n); W_r^g) · S^g(n; W_s^g)   (6.4)

where t_p(n) is the time required to reach frontier n through p, F is the front-loading function proposed in [103], and p must satisfy the time budget constraint. F(t_p(n); W_r^g) is a greedy incentive that encourages gathering reward earlier in time, in order to trade off long-term finite-horizon planning against immediate information gain. We wish to maximize the expected improvement (EI) for a signal measurement taken at a frontier n_i (corresponding to location x_i), given the current signal belief W_s^g. EI favors actions that offer the best improvement over the current maximal value, with an added exploration term ξ to encourage diverse exploration. Expected improvement is chosen as the global objective function because the robot should prefer frontiers which offer a gain in the expected signal strength, while during local exploration the robot should seek out information-rich locations rather than only seeking the maxima.

Policy Selection. The local planning policy π^l provides immediate signal reward and is computed over the local space immediately surrounding the robot, while the global policy π^g is computed over a sparse representation of the entire environment the robot has explored so far. Therefore, the global policy has the potential to bring the robot to valuable locations it has not yet explored, but may also require more travel time. As in [100], we select the planning policy π according to its utility U(b_t; π), computed from eq. (6.2), and the probability P̂(π) that the plan will be successfully executed:

π⋆_t = argmax_{π ∈ {π^g_{t:t+T}, π^ℓ_{t:t+t_l}}} P̂(π) U(b_t; π).   (6.5)

Figure 6.3: Maps of the Signal-to-Noise Ratio collected in a real parking garage. SNR is colored from blue (low) to yellow (high). Areas in white are not traversable by the robot.

6.1.4 Experiments

We demonstrate our system on two simulation environments as well as a hardware trial.
The simulated environments were collected in a parking garage with a robot with an attached radio and a radio placed in the environment. In the simulation environments, we first run the robot in a lawnmower path to collect signal strength data and traversability data in the environment. We position the radio in two different locations (fig. 6.3). Location A is less accessible, so that stronger readings are only seen in direct line of sight; in location B the radio is placed so that it is within line of sight of more areas of the environment. The robot starts in the same location in both environments, and the traversability is the same in both environments. In the experiments we compare our approach (labeled Factor graph) with a standard Gaussian process using a squared exponential kernel. The kernel hyper-parameters are set to a length scale of 3 and an observation noise of 0.01. This Gaussian process also only adds new measurements when the robot has moved a minimum distance (d_min), which was set to 0.25 m in our experiments. We also compare against a Gaussian process (GP) which is limited to the same number of measurements (k_g) during inference as the factor graph, which we term Gaussian process (limited) (GP (limited)). This limited Gaussian process is added to show that the performance of our system is not merely due to limiting the number of measurements included in the model.

Figure 6.4: Simulation experiment results over two real-world environments. Left: the maximal signal reading up until time t. Right: the sum of the measurements taken until time t.
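For reference, the baseline squared-exponential GP regression (length scale 3, observation noise 0.01) can be sketched as follows. The dense linear solve over all n measurements is the cost that grows with the number of measurements; the function names here are illustrative:

```python
import numpy as np

LENGTH_SCALE, NOISE = 3.0, 0.01  # baseline hyper-parameters

def gp_posterior(X, y, Xq):
    """Plain squared-exponential GP regression: posterior mean and
    variance at query points Xq given measurements (X, y)."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / LENGTH_SCALE**2)
    K = k(X, X) + NOISE * np.eye(len(X))   # n x n -- solve cost grows with n
    Ks = k(Xq, X)
    mu = Ks @ np.linalg.solve(K, y)
    var = 1.0 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
    return mu, var

X = np.array([[0.0, 0.0]])
y = np.array([1.0])
mu, var = gp_posterior(X, y, np.array([[0.0, 0.0], [100.0, 0.0]]))
# Near the measurement the GP reproduces it; far away it reverts to the prior.
assert abs(mu[0] - 1.0) < 0.02 and var[0] < 0.02
assert abs(mu[1]) < 1e-6 and abs(var[1] - 1.0) < 1e-6
```

This is the model the factor graph is compared against; the d_min gating described above simply skips measurements taken too close together before they reach this solve.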
For the factor graph, we set the measurement noise to 0.01, α_occ to 0.5, and α_dist to 0.1.

6.1.4.1 Simulation Experiments

We compare our novel factor graph approach with the two baseline approaches (Gaussian process and Gaussian process (limited)) in both environments (location A and location B). To compare against the baselines, we show two metrics for each environment (fig. 6.4). The first is the maximal SNR measurement recorded up until time t. It captures how closely, and how quickly, the robot approaches the signal source. The second metric is the cumulative sum of the SNR measurements the robot has taken until time t. This metric, inspired by the cumulative reward metric [88], shows how the robot makes decisions to find maximal areas over time. For each environment, we run the robot three times and report each trial. As can be seen in fig. 6.4, the factor graph model allows the robot to find a much higher signal strength in 2 out of 3 trials, and its performance is more consistent. Limiting the Gaussian process to only 200 measurements does not have a large impact on performance compared to the regular Gaussian process, but it still does not perform as well as the factor graph model. In the comparison of the cumulative signal measurement, the factor graph outperforms the other methods in all 3 trials, with the other two methods performing about equally well. This implies the factor graph consistently allows the planner to make better decisions throughout the entire experiment. We also present a study of the scalability of the approaches, shown in fig. 6.5. We find that the Gaussian process exhibits the typical rapid growth in the time required for inference as the number of measurements in the model grows. We also find that even limiting the Gaussian process to a fixed number of measurements per inference is still more expensive than the factor graph on average.
The factor graph has a bimodal distribution in its solve times: when the graph has to be re-solved, it is slower than the Gaussian process methods, but due to its ability to cache the model posterior, most inferences are a hash table lookup. We believe the consistent performance of the factor graph is due to two reasons. The first is that the factor graph is generally faster for inference than the Gaussian process methods, allowing the planner to plan at a higher frequency. The second is the ability of the factor graph to represent uncertainty in the signal due to obstacles. This allows the factor graph model to infer that signal strengths on the opposite side of obstacles and around corners are more interesting, as the signal uncertainty increases through the obstacle.

Figure 6.5: Comparison of the time required to perform inference for the three models over the number of measurements in the model. Left: binned data with the standard deviation. Right: the raw data.

6.1.4.2 Hardware Experiments

We demonstrate our system on a real robot locating a radio in an unknown environment. The robot, a Boston Dynamics Spot [129] equipped with custom sensing and computing systems [1, 99, 10], is initially started at one end of a parking garage. The radio is deployed at the other end of a hallway, after two left-hand turns. We use the factor graph model with k_g equal to 100, because the computing payload on the robot is limited. As can be seen in fig. 6.6, the robot explores an open passage before going directly to the source of the radio. We find that our system can be easily tuned to the computational limits of the robot and efficiently finds the signal source. As with fig. 6.5, we found that the robot does not increase its planning time drastically when more measurements are added to the model.
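The posterior caching of Algorithm 5, which produces the bimodal solve times discussed above, can be sketched as follows. The class and solver interface are hypothetical; the real solver re-solves the local factor graph:

```python
class CachedInference:
    """Sketch of Algorithm 5's caching: solving the local graph is
    expensive, but repeated queries for the same hypothetical locations
    become hash-table lookups -- hence the bimodal timing."""

    def __init__(self, solver):
        self.solver = solver  # expensive: re-solves the local factor graph
        self.cache = {}

    def query(self, hypothetical_locations, x_q):
        key = tuple(hypothetical_locations)
        if key not in self.cache:                 # slow path: full re-solve
            self.cache[key] = self.solver(hypothetical_locations)
        return self.cache[key][x_q]               # fast path: dict lookup

    def invalidate(self):
        """Called when a new local IRM or measurement arrives."""
        self.cache.clear()

calls = []
def fake_solver(locs):
    calls.append(locs)
    return {(0, 0): 0.0, (1, 0): 0.0}

inf = CachedInference(fake_solver)
inf.query([(2, 2)], (0, 0))
inf.query([(2, 2)], (1, 0))  # same hypotheticals: served from the cache
assert len(calls) == 1       # the expensive solve ran only once
```

Invalidating on robot motion or new measurements keeps the cache consistent with the latest local IRM.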
6.1.4.3 Extended Hardware Experiment

We present an extended study of the performance of our system in the field. The setup is similar to section 6.1.4.2, including the robot platform. The robotic platform is limited not only in computational power, but is further constrained by the multiple other systems which must run in parallel on the robot to allow for localization, control, and high-level mission objectives. To allow for rapid re-planning, we sub-sample the local factor graph, discarding every other node. The environment is a parking garage with the target source placed outside of the garage. As can be seen in fig. 6.7, the robot exits the first corridor, attempts to go down the second corridor, but discovers that it cannot directly reach the source in that corridor. It then exits the second corridor, attempting to discover an unseen passage, and finally goes to the third corridor to reach the source directly. This demonstrates the ability of our system to guide the robot to the signal source in the presence of limited computation and difficult real-world environments.

Figure 6.6: Experiment of our system on a real Spot robot. The robot explores a man-made structure to find the radio. The collected measurements, colored from blue (low) to yellow (high), as well as the robot's trajectory, are shown. Black dots show the non-traversable areas such as walls.

Figure 6.7: Experiment of our system on a real Spot robot in extended field conditions. Top: The robot explores a man-made structure to find the radio. The collected measurements, colored from blue (low) to yellow (high), as well as the robot's trajectory, are shown. Black dots show the non-traversable areas such as walls and cars. Bottom: The Spot robot platform used in this experiment and the experiment in section 6.1.4.2.

6.1.5 Conclusion

Large-scale active source seeking using a robot is a complex and challenging task which requires the ability to scale both the planner and the model. In this work we have demonstrated the ability of our factor-graph based model to decompose the problem into local and global tasks, which allows the model to scale easily with a large number of measurements. We have also demonstrated the model's ability to handle the propagation of signals through unknown obstacles by modeling the explicit uncertainty due to these obstacles. Through simulation experiments we have shown that the ability to plan rapidly, due to the inference speed of the model, as well as the ability to represent signal uncertainty due to obstacles, allows our overall system to perform active source seeking better than the baseline methods. We have also shown that our model can be scaled to the computational needs of a real robot and demonstrated our system in a real unknown environment.

6.2 Active Signal Emitter Placement In Complex Environments

6.3 Introduction

Figure 6.8: Overview: a) The desired lighting intensity, chosen by randomly placing lights, b) the lighting intensity in the environment, which is unknown to the robot a priori, as well as the obstacles, c) the final light source configuration, lighting intensity, and robot path produced by the proposed system, d) the final light source configuration, lighting intensity, and robot path produced by the baseline system, e) an example field trial. Lights circled in red are deployed by the robot, while lights circled in green are unknown to the robot.

Electromagnetic wave emitting devices such as ultrawideband (UWB) beacons, Wi-Fi access points, radios, and light sources are commonly used for localization, communication, and illumination. The coverage provided by these devices is highly dependent on their exact locations in complex, obstacle-rich environments. Electromagnetic waves interact with obstacles in the environment by reflection, refraction,
and scattering, depending on surface qualities and the nature of the wave. Often, models of electromagnetic wave propagation fail to exactly predict how the wave will interact with the environment due to unmodeled effects, imprecise parameter specification, and previously unknown emitting sources. In this work, we investigate the active signal emitter placement problem, focusing specifically on the automatic deployment of light sources in obstacle-rich environments. We focus on light source placement because of the visible interaction of light with obstacles, which allows us to qualitatively compare different methods in the real world, and because of the importance of the task itself. Actively placing lights can be used in applications ranging from search and rescue, where some locations have higher priority than others, to cinematic lighting on movie sets, as well as complex exploration scenarios. Autonomously deploying emitters is a challenging task due to the previously mentioned model prediction errors. We propose a formulation where imprecise signal models are improved using in-situ signal strength measurements from sensors mounted on a robot, which are subsequently used for automatic deployment of emitters to match user-specified desired signal strengths at specific locations. To plan to sense the environment, we use an informative path planning setup with a novel factor-graph based environment representation, which allows combining an analytical signal model with measurements taken from the environment to correct for errors in the analytical model. The robot can only measure the combined signal strength of the placed emitters, their unmodeled effects, and the unknown background signal, making disambiguating these signals difficult. As the robot senses the environment, it must decide when the appropriate time to re-configure the emitter placement is. This causes the robot to trade off between re-planning emitter placement and collecting measurements.
Re-planning increases the accuracy of the resulting signal strength in the environment given the robot's current information, but causes the robot to take fewer measurements between emitter replacements. To this end, we propose a novel signal strength condition, i.e., the placement trigger, which estimates the likelihood of observing the desired signal strength given the model's current information. The robot iteratively moves to a new position, senses the signal strength at its current position, and decides whether it should re-plan emitter placement based on the placement trigger. The concrete setup for light source placement that we use for this problem can be seen in fig. 6.8, where the baseline and the proposed system in this work are compared. In fig. 6.8a, the desired lighting is specified by the user.† Figure 6.8b shows the unknown lighting, which is present at a lower intensity than the desired lighting, as well as the obstacles in the environment. Figures 6.8c and 6.8d show example trajectories taken by the robot while sensing, as well as the final lighting placement in the environment for each system. The contributions of our work are as follows:

† In this work, the desired light placement is specified by placing lights in an obstacle-free environment.

• We propose a novel factor-graph based formulation for the signal emitter placement problem, which allows improving the accuracy of the analytical signal model using sensor data.
• We introduce a novel signal uncertainty condition, i.e., the reconfiguration trigger, which decides between data collection and re-planning emitter placement.
• We evaluate our method in simulations using visible-light based experiments and show that it outperforms a baseline method. We show our method's applicability to the real world using physical robot experiments.
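To make the placement-trigger idea concrete, here is a hypothetical per-location sketch, not the exact condition used by the system: re-plan emitter placement when the current Gaussian belief at a location assigns low likelihood to the desired intensity. The threshold `tau` and the function itself are illustrative assumptions:

```python
import math

def placement_trigger(mu, sigma2, desired, tau=0.1):
    """Hypothetical sketch: fire the trigger (re-plan emitter placement)
    when the Gaussian belief N(mu, sigma2) assigns low density to the
    desired intensity. `tau` is an illustrative threshold."""
    density = (math.exp(-0.5 * (desired - mu) ** 2 / sigma2)
               / math.sqrt(2 * math.pi * sigma2))
    return density < tau

# Belief centered on the desired intensity: keep collecting measurements.
assert placement_trigger(mu=1.0, sigma2=0.04, desired=1.0) is False
# Belief far from the desired intensity: re-configure the emitters.
assert placement_trigger(mu=1.0, sigma2=0.04, desired=2.0) is True
```

Tuning `tau` directly controls the trade-off described above: a high threshold re-plans often, while a low threshold favors gathering more measurements between reconfigurations.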
6.4 Background/Related Work

Electromagnetic emitter placement. In this work, we aim to optimally place light sources using in-situ measurements from a robot. Other electromagnetic signals, each with specific propagation properties, have also been considered. The placement of UWB anchors has been extensively studied due to their effectiveness in localizing robots in GPS-denied environments. UWB anchor placement has been optimized with genetic algorithms and a Gaussian process based residual model [101]. It has also been noted that the exact placement of these beacons greatly affects their performance, and that it is difficult to analytically determine their range measurements at a given location [101, 106]. Wi-Fi access point optimization has been considered using line-of-sight and non-line-of-sight models with non-convex optimization [106]. Base station placement for cellular mobile communication systems has been investigated, with global coverage [53] and demand coverage [136] as common objectives. Mesh node placement for wireless mesh networks with the objectives of coverage and connectivity is considered in [48], where mesh node locations are selected sequentially by greedily optimizing a performance metric. Autonomous deployment of robotic mesh network nodes in disaster scenarios using two-tiered connectivity, in which connectivity to survivors is established via short-range Bluetooth Low Energy (BLE) and connectivity between robots is established with long-range very high frequency (VHF) links, is proposed in [44]. Distributed self-deployment of acoustic underwater sensors that can only move vertically has been investigated, guaranteeing connectivity and increasing coverage using locally computable movement rules [2].

6.5 Notation and Formulation

We use a grid-based representation of the environment for robot movements and measurements. Let X# ⊂ R² be the set of grid positions that the robot can visit and measure the light intensity at.
Let x_t ∈ X# be the position the robot visited and measured the light intensity at time step t. Let x_0:t = [x_0, ..., x_t] be the sequence of visited positions from time 0 to t. Let y_t ∈ R be the light intensity measured at time t at position x_t, and y_0:t = [y_0, ..., y_t] be the sequence of measurements.

A configuration s ∈ R^{3N} of N omni-directional light sources contains the position p_i ∈ R² and brightness b_i ∈ [0, 1] of each light source i ∈ {1, ..., N}. Let s_t ∈ R^{3N} be the configuration of light sources at time step t.

Let O be the set of obstacles in the environment, where each obstacle O_j ∈ O is a subset of R². We assume the obstacles are static and known ahead of time. Let W ⊂ R² be the workspace of the operation, which can be set to a rectangular box defining the position bounds, such that X# ⊂ W.

Let L(x, s; O) : X# × R^{3N} → R be the light intensity at position x when the light sources have configuration s, conditioned on the obstacles O in the environment. Let L_d(x) : X# → R be the desired light intensity and L_u(x) : X# → R be the unknown light intensity at each position x ∈ X#.‡ Let L_s(x, s; O) : X# × R^{3N} → R be the light intensity at position x generated by the light sources configured by the robot with configuration s, conditioned on obstacles. The light intensity with light source configuration s is given by L(x, s; O) = L_u(x) + L_s(x, s; O) for each x ∈ X#.

‡ Note that we do not condition the unknown light intensity on obstacles, while in reality this dependence could exist.
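To make this notation concrete, the following sketch (ours, not from the dissertation; the helper names are hypothetical) shows how a flat configuration vector s ∈ R^{3N} can be unpacked into per-light positions and brightnesses, and how a single sensor reading reflects the additive decomposition L = L_u + L_s:

```python
def unpack_configuration(s, n_lights):
    """Split a flat configuration vector s in R^(3N) into per-light
    positions p_i in R^2 and brightnesses b_i clamped to [0, 1]."""
    assert len(s) == 3 * n_lights
    lights = []
    for i in range(n_lights):
        px, py, b = s[3 * i : 3 * i + 3]
        lights.append(((px, py), min(max(b, 0.0), 1.0)))
    return lights

def measured_intensity(x, L_u, L_s):
    """The robot only observes L(x, s; O) = L_u(x) + L_s(x, s; O):
    the sum of unknown background light and light from placed sources."""
    return L_u(x) + L_s(x)

# Two lights; the second brightness lies outside [0, 1] and is clamped.
s = [0.0, 0.0, 0.8, 1.0, 2.0, 1.4]
lights = unpack_configuration(s, 2)
y = measured_intensity((0, 0), lambda x: 10.0, lambda x: 25.0)  # y == 35.0
```

The clamping mirrors the b_i ∈ [0, 1] constraint that appears later in the optimization formulation.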
Figure 6.9: Overview of our System: A description of each module and its connections to other modules is given in section 6.6. The principal modules we propose in this work are the light source reconfiguration trigger, the light source configuration algorithm, and the probabilistic lighting model.

In this work, we aim to find the optimal light source configuration s* that minimizes the cumulative difference between the desired light intensities L_d and the actual light intensities L according to the following formulation:

\begin{aligned}
s^{*} = \operatorname*{arg\,min}_{s \in \mathbb{R}^{3N}} \;& \sum_{x \in \mathcal{X}^{\#}} \bigl| L_d(x) - \left( L_u(x) + L_s(x, s; \mathcal{O}) \right) \bigr| \\
\text{s.t.}\;& p_i \notin O_j \quad \forall O_j \in \mathcal{O},\ \forall i \in \{1, \dots, N\} \\
& p_i \in \mathcal{W} \quad \forall i \in \{1, \dots, N\} \\
& b_i \in [0, 1] \quad \forall i \in \{1, \dots, N\}.
\end{aligned} \tag{6.6}

6.6 Approach

Here, we describe our system design as well as each module of our system. An overview of the system can be seen in fig. 6.9, where we show the inputs and outputs of each module, as well as their connections. The user specifies the desired light intensities L_d(x) for all x ∈ X#. The light source reconfiguration trigger determines when the light source configuration algorithm should re-configure the light sources.§ The analytical lighting model takes this configuration and computes the expected light intensity from the light sources at all positions x ∈ X#.
The probabilistic lighting model integrates measurements from the environment and the expected light intensities from the analytical lighting model to compute the belief over the light intensities for all x ∈ X#, which is used by the POMDP solver to plan robot movement in the environment to decrease the uncertainty. At each time step t, the environment provides a light intensity measurement at the robot's current position x_t, based on the light source configuration s_t, the unknown background lighting L_u, and the obstacles O. We discuss each module in depth in the following sections.

6.6.1 Analytical Lighting Model

We use a ray-based light propagation model in which rays are cast from each light source i ∈ {1, ..., N} in a circular pattern until they hit an obstacle, and are reflected at the same angle of incidence across the surface normal [55, 24]. We use a signed distance function based approach to represent the obstacles in the environment and model the light propagation with a ray-marching algorithm [63]. To determine the intensity at a position x ∈ X#, we use the following equation:

\tilde{L}(x) = \sum_{r \in R_x} \frac{1}{d(p_r, x)^2} \, b_r \, P \prod_{O_{j_r} \in \mathcal{O}_r} R_{j_r} \tag{6.7}

where R_x is the set of all rays which pass through x; p_r and b_r are the position and brightness of the light source that ray r originated from, respectively; P is a constant scaling factor for light brightness, corresponding to the light intensity at maximum brightness 1; O_r is the set of all obstacles that ray r reflected off before arriving at x; R_{j_r} ∈ [0, 1] is the reflectivity of obstacle O_{j_r}; and d(p_r, x) is the distance traveled by ray r from its light source to x.

§ In this work, we do not explicitly model the redeployment process of the light sources. We change positions of light sources immediately in the simulations. A human operator changes light source positions in real-world experiments.
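The per-cell evaluation of eq. (6.7) can be sketched as follows (our own illustration; the `Ray` record is a hypothetical stand-in for the quantities a ray marcher would accumulate for each ray that reaches x):

```python
from dataclasses import dataclass

@dataclass
class Ray:
    distance: float        # d(p_r, x): distance traveled from the source to x
    brightness: float      # b_r of the originating light source
    reflectivities: list   # R_{j_r} for every obstacle the ray bounced off

def intensity_at(rays, P):
    """Eq. (6.7): sum over rays of b_r * P * prod(R_{j_r}) / d(p_r, x)^2."""
    total = 0.0
    for r in rays:
        attenuation = 1.0
        for R in r.reflectivities:
            attenuation *= R  # each bounce scales intensity by the obstacle's reflectivity
        total += r.brightness * P * attenuation / r.distance ** 2
    return total

# A direct ray and a once-reflected ray reaching the same grid cell:
# direct: 1.0 * 100 / 2^2 = 25.0; reflected: 0.5 * 100 * 0.5 / 4^2 = 1.5625
rays = [Ray(2.0, 1.0, []), Ray(4.0, 0.5, [0.5])]
```

The inverse-square falloff dominates, so distant reflected rays contribute little, which matches the intuition behind the single-reflection approximation used later in the experiments.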
While this lighting model is considerably simpler than modern photorealistic lighting models [102], we find that it provides a reasonable approximation of real lighting conditions when we compare predicted light intensities with collected sensor measurements in field trials in section 6.7.3, especially considering the noise present in commodity lighting sensors.

6.6.2 Probabilistic Lighting Model

The model described in section 6.6.1 allows the robot to determine the expected lighting based on where the light sources are placed, but it ignores many components of actual light propagation, such as refraction. Additionally, the model has no way of incorporating previously unknown background lighting. For this reason, the analytical lighting model alone is often inaccurate, as we show in our experimental results. To allow the robot both to integrate sensor measurements into the analytical predictions and to determine the uncertainty of predicted light intensities, we use a probabilistic estimation model. The model is inspired by residual physics models [147], in which a simulated physical model is corrected with a learned model. One could, for example, use a traditional Gaussian process to model the difference between the output of the analytical lighting model and the measured light intensity. However, using Gaussian processes in our case presents two major problems: i) Gaussian processes have difficulty representing obstacles, and ii) Gaussian processes cannot separate the uncertainty due to the analytical lighting model's inability to capture all the effects present in real light propagation from the noise on the sensor measurements; a Gaussian process can only represent these as one combined uncertainty. To tackle these issues, we propose a factor graph based model which incorporates four types of factors and two types of variables, as can be seen in fig. 6.10.
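As a structural sketch (ours; a real implementation would use a factor-graph library such as GTSAM), the obstacle-aware wiring of the distance-based link factors can be written as follows — cells occupied by obstacles receive no variables, so no link factor can cross an obstacle:

```python
def build_link_factors(width, height, obstacles):
    """Create one variable site per free grid cell and a distance-based
    link factor between 4-connected free neighbors. Obstacle cells get
    no variables, so links across obstacles are removed automatically."""
    free = {(x, y) for x in range(width) for y in range(height)
            if (x, y) not in obstacles}
    links = []
    for (x, y) in free:
        for nbr in ((x + 1, y), (x, y + 1)):  # each undirected edge counted once
            if nbr in free:
                links.append(((x, y), nbr))
    return free, links

# A 4x3 grid with one obstacle, echoing the obstacle at (2,1) in fig. 6.10.
free, links = build_link_factors(4, 3, obstacles={(2, 1)})
```

In the full model, each free cell would carry two variables (estimated source intensity and unknown intensity), each linked to its neighbors this way; the sketch only shows the topology.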
Figure 6.10: Probabilistic Lighting Model. This model, described in section 6.6.2, allows the robot to combine the ray marching based analytical lighting model for the configured light sources with real-world measurements of the light intensities using additive measurement factors, f_m. In this example, the robot has taken measurements at locations (0,0) and (3,1). The robot previously took a measurement at (1,1) before the light sources were re-configured, which is captured by the previous configuration factor, f_p. There is an obstacle at (2,1); therefore, there are no variables there, and the distance based link factors, f_l, are removed.

The first type of variable (blue in fig. 6.10) represents the estimated true light intensity generated by the configured light sources at each position x ∈ X#, corresponding to L_s(x, s; O); these variables are connected to unary measurement factors (f_a in fig. 6.10) based on the output of the analytical lighting model. The second type of variable (purple in fig. 6.10) estimates the unknown light intensity at each position x ∈ X#, corresponding to L_u(x). Both types of variables are linked to their neighbors by a distance based link factor (f_l in fig. 6.10) when there is no obstacle in between. This allows the factor graph to model the estimation problem around the topology of the environment. The uncertainty in these factors is based on the spatial distance between the values, similar to the length-scale in a Gaussian process. To integrate sensor measurements into the system, we propose a novel factor called the additive measurement factor (f_m in fig. 6.10). At each time step t, the sensor reads the light intensity y_t at the current position x_t.
The robot does not know how much of this intensity comes from the configured light sources and how much is unknown light. This factor therefore models the measurement as y_t = L_u(x_t) + L_s(x_t, s_t; O). To achieve this, we add a factor with likelihood N(v_s(x_t) + v_u(x_t) | y_t, σ²), where v_s(x_t) and v_u(x_t) are the estimated light intensity from the configured light sources and the estimated unknown light intensity at x_t, respectively.

When the robot re-configures the lights, the additive measurement factor is no longer valid because L_s changes. To allow the robot to continue using its previous measurements, we add a unary factor, called the previous configuration factor (f_p in fig. 6.10), to each unknown lighting intensity variable where an additive measurement factor has previously been. The mean and variance of this factor are the posterior mean and variance of that unknown lighting variable.

Solving the factor graph amounts to running an optimizer to find the maximum a posteriori (MAP) assignment of the values, i.e., the most likely light intensity at each location. Finding the uncertainty of the estimated light intensity at each location requires computing the Gaussian marginals for each value [32]. The output of the probabilistic lighting model is a set of normal distributions with mean μ_t(x) and standard deviation σ_t(x) for each x ∈ X# at each time step t.

6.6.3 Informative Path Planning

We treat the problem of determining the optimal path to move the robot through the environment to collect light intensity measurements as an IPP problem, which we solve with a rollout-based POMDP solver based on the POMCPOW method [130, 116]. In informative path planning, the robot iteratively takes actions and measurements to maximize an information-theoretic objective function [68]. The IPP objective function we use is the entropy of each x ∈ X#, similarly to [58, 81], which relates to how well the model spatially covers the workspace, shown in eq. (2.3).
This allows the robot to iteratively decrease the uncertainty in the lighting intensity by maximally reducing the uncertainty in the model along a trajectory. We adopt the t-test heuristic proposed by Salhotra et al. [115] for taking multiple actions from one plan.

6.6.4 Light Source Reconfiguration Trigger

To determine when the robot should reconfigure the light sources, we propose a likelihood based trigger. This trigger allows the robot to compute a new configuration when necessary, balancing reconfiguring the light sources in accordance with new information against collecting enough measurements under each light configuration to deduce where the light intensities differ from the desired ones. To this end, we propose a trigger based on the likelihood of observing the desired light intensities (L_d) given the current estimated light intensities obtained from the probabilistic lighting model at each time step t:

\mathcal{L}_t(L_d) = \prod_{x \in \mathcal{X}^{\#}} \mathcal{N}\bigl(L_d(x) \mid \mu_t(x), \sigma_t(x)\bigr). \tag{6.8}

We assume that the observation likelihoods across grid positions are independent, while in reality this dependence exists. The independence assumption allows us to compute the joint observation likelihood in a computationally efficient way. To determine whether to reconfigure the light sources, the robot checks whether the likelihood of observing the desired lighting intensity has fallen significantly since the last time step t_c at which the light sources were reconfigured. This occurs when the robot collects measurements that make observing the desired lighting less likely given the current configuration. To this end, the robot re-configures the lights when the following inequality holds:

\log\bigl[\mathcal{L}_t(L_d)\bigr] < \alpha \max_{\tau \in [t_c : t-1]} \log\bigl[\mathcal{L}_\tau(L_d)\bigr] \tag{6.9}

where α ∈ [1, ∞) is an experimentally chosen scaling constant.
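A minimal sketch (ours, not the dissertation's implementation) of the trigger in eqs. (6.8) and (6.9), working in log space. Note that log-likelihoods are typically negative, so multiplying the running maximum by α ≥ 1 lowers the threshold and the trigger fires only after a genuine drop:

```python
import math

def log_likelihood(desired, means, stds):
    """Log form of eq. (6.8): sum over grid cells of
    log N(L_d(x) | mu_t(x), sigma_t(x))."""
    total = 0.0
    for d, mu, sigma in zip(desired, means, stds):
        total += (-0.5 * math.log(2.0 * math.pi * sigma ** 2)
                  - (d - mu) ** 2 / (2.0 * sigma ** 2))
    return total

def should_reconfigure(log_liks, alpha):
    """Eq. (6.9): fire when the current log-likelihood (last entry) falls
    below alpha times the maximum over tau in [t_c, t-1]."""
    return log_liks[-1] < alpha * max(log_liks[:-1])

# After a drop from the best value seen since the last reconfiguration,
# the trigger fires; a small dip does not cross the alpha-scaled threshold.
fires = should_reconfigure([-10.0, -9.0, -12.0], alpha=1.07)  # True
holds = should_reconfigure([-10.0, -9.0, -9.5], alpha=1.07)   # False
```

With α = 1.07 and a best log-likelihood of −9, the threshold is −9.63, so only the −12 history fires.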
Once the log-likelihood has sufficiently decreased from its maximum since the last reconfiguration, the robot re-configures the light sources and calculates a new configuration.

6.6.5 Light Source Configuration Algorithm

We solve the optimization problem described in eq. (6.6) to reconfigure the light sources, which entails computing the position p_i and brightness b_i of each light source i ∈ {1, ..., N}; these are the decision variables of our optimization formulation. Because L_u and L_s cannot be measured directly, we use the estimates of L_u from the probabilistic lighting model and use the analytical lighting model to predict L_s. We minimize the error to find the optimal lighting configuration given the incomplete information in the probabilistic lighting model and the inaccuracies of the analytical model. We use the Nelder-Mead optimizer to minimize this quantity [98].

6.7 Experiments

We demonstrate the effectiveness of the components of our proposed system compared to alternatives, as well as the overall effectiveness of the system compared to a baseline. We also demonstrate our system in a real field test scenario, in which light sources are placed in a warehouse environment.

In all experiments, we provide the user input, i.e., the desired light intensities L_d, by randomly configuring light sources in an obstacle-free environment and computing the analytical light intensities from the configured light sources, which the robot must then match in an environment with obstacles. The robot is given a fixed number of light sources to configure in order to match the desired light intensities generated by the randomly configured light sources. The robot always configures all available light sources, i.e., it does not configure them sequentially. We do not explicitly model the redeployment process of the light sources; new configurations take effect in the environment immediately.
In real-world experiments, a human operator changes the light source positions before the robot continues operation. In simulations, we create unknown light sources at half brightness, randomly placed in the environment. We use 1 reflection when predicting L_s with the analytical lighting model, but 5 reflections when providing measurements to the robot. This increases the error in the analytical lighting model used by the robot, simulating unmodeled effects. In simulations, the robot is able to control the brightness of the lights to match the desired light intensities, while in real-world experiments the brightness of the light sources is fixed and the robot controls only their positions.

The robot operates in a 13 × 13 grid environment with a grid spacing of 0.35 m. The robot can take four actions (up, down, left, right) at each grid cell (unless it would run into an obstacle). Each experiment begins with the same initial light source configuration for the same environment number and seed. The robot is allowed to take 100 environment steps in simulation experiments. Each step consists of a movement action followed by a sensing action once the robot arrives at the desired location, possibly followed by a light configuration depending on the light source reconfiguration trigger. The robot is allowed to take 70 environment steps in the field trial due to time constraints. The planner is given 50 rollouts to evaluate future actions. We differentiate between environments (seeds for the placement of obstacles, ambient lights, and desired lights) and seeds (seeds for the starting location of the robot) to allow for diverse environment generation.

6.7.1 Component Experiments

We present experiments which demonstrate the improvement of our system due to the proposed components. For each component test, the robot places 10 light sources to match the desired light intensities of 7 randomly placed lights.
We randomly place 40 unknown light sources to create the unknown light intensity L_u.

Figure 6.11: Light Belief Model Results. We show the reduction in error when using the model proposed in section 6.6.2 compared to a Gaussian process baseline model.

6.7.1.1 Probabilistic Lighting Model

We show the effectiveness of our factor graph based probabilistic lighting model proposed in section 6.6.2 by comparing it with a residual Gaussian process. The residual Gaussian process models the difference between the analytical lighting model and the real-world measurements. It uses a squared exponential kernel, which is standard in informative path planning [74], and can provide uncertainties and predictions at each position x ∈ X#. We configure the light sources every 10 environment steps to provide a controlled comparison. We compute the average RMSE over each measured location across five environments with five seeds each, totaling 25 trials for each model, shown in fig. 6.11.

Our model provides a large reduction in final RMSE. We believe this is due to the ability of our model to represent i) obstacles and ii) errors in the analytical model better than the baseline model. The factor graph based model also produces qualitatively better trajectories, which take obstacles into account during informative path planning. This can be seen in fig. 6.8, where the proposed trajectories cover the space well.

6.7.1.2 Light Source Reconfiguration Trigger

We compare the trigger proposed in section 6.6.4 against three baselines. The first baseline does not use any of the collected measurements after the initial light source configuration and simply uses the analytical model.
We term this trigger First Step; it represents configuring light sources using a priori information.

Figure 6.12: Light Source Reconfiguration Triggers. Comparison of RMSE when different triggers with different parameters are used. We see that there is a balance between reconfiguring the light sources more and less often in both the Logprob and Every n triggers. Logprob α = 1.07 results in the lowest RMSE.

The second baseline uses the initial configuration of light sources for the entire measurement mission and reconfigures the lights only after the very last step. We term this Last Step; it represents conducting a sampling mission without interleaving light re-configuration actions. The third baseline re-configures the light sources at set intervals of environment steps, which we call the Every n baseline. For example, Every 10 re-configures the light sources every 10 steps. We consider this the strongest trigger baseline, as it provides a parameter (n) which can be tuned to produce strong behavior. We compare these baselines against our proposed trigger, given in eq. (6.9), which we call the Logprob trigger. This trigger monitors the decrease in the log-likelihood of observing the desired light intensity given the current model, and re-configures the light sources once it falls below α times the last highest log-likelihood. We show the results for various configurations of triggers across five environments with three seeds each in fig. 6.12. We find that First Step and Last Step are both outperformed by all of the attempted Every n and Logprob triggers.

Figure 6.13: Cumulative Number of Light Source Reconfigurations.
The cumulative number of times the robot has re-configured the lights over steps in the environment. The Logprob triggers tend to fire more often early in the task, when the uncertainty about unknown light sources is high.

For the Every n triggers, we find that n = 10 is optimal. There is a balance between re-configuring the lights too frequently and too infrequently, and n = 10 appears to strike that balance in our experiments. We see a similar trend with the Logprob triggers, where α = 1.07 is optimal, with a lower median and upper-quartile error than the optimal Every n trigger.

To show how the number of light source re-configurations grows over the task, we plot the cumulative number of light source reconfiguration triggers over environment steps in fig. 6.13. We see that Every n and Logprob α = 1.07 converge to the same number of cumulative triggers but differ in where they are used. Logprob α = 1.07 tends to trigger more reconfigurations at the beginning, when the robot is rapidly decreasing its uncertainty in the unknown light distribution, and fewer at the end. This shows that not only the number of triggers but also their temporal distribution matters for this task.

Level | Obstacles | Unknown Light Sources | Target Light Sources | Configured Light Sources
A     | 8         | 40                    | 7                    | 10
B     | 12        | 60                    | 9                    | 15
C     | 16        | 70                    | 12                   | 20

Table 6.1: Task complexity levels for system experiments. The number of obstacles, the number of unknown light sources, the number of target light sources used to generate the desired light intensities, and the number of configured light sources that the robot can control all increase from task level A to C.
Figure 6.14: Module and System Improvement. Top: improvement in our system due to the individual proposed modules versus baseline modules. Bottom: improvement of our system across environment difficulties; see section 6.7.2 for a description of the task complexities. We report percent error reduction due to differences in error magnitude between complexities and seeds.

6.7.2 System Experiments

We compare our system to a baseline system across various task complexities. The baseline system consists of the residual Gaussian process model and the Every 10 trigger. The proposed system uses our factor graph based probabilistic lighting model proposed in section 6.6.2 and the Logprob α = 1.07 trigger proposed in section 6.6.4.

We compare across the three task complexities described in table 6.1. Task complexity A is the same complexity used in the component experiments; level B is harder than level A, and level C is harder than level B. For each task complexity, we use five different environments (seeds for the placement of obstacles, ambient lights, and desired lights) with five seeds each (seeds for the starting location of the robot). The results are given in fig. 6.14 as improvement percentages of the proposed system over the baseline system in final RMSE. The proposed system provides a median improvement of 7.9%, 7.5%, and 9.8% in RMSE on task complexities A, B, and C, respectively. This demonstrates that our system considerably reduces the error, especially on difficult tasks.

We also see in fig. 6.14 that the factor graph based probabilistic lighting model provides a larger improvement than the light source reconfiguration trigger, implying that modeling the obstacles provides a greater benefit than accurately choosing when to re-configure the lights.
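For reference, here is a sketch (ours; the dissertation does not spell out the exact normalization, so the baseline-relative form below is an assumption) of the final-RMSE and percent-error-reduction metrics reported in fig. 6.14:

```python
import math

def rmse(predicted, measured):
    """Root mean squared error over the measured grid locations."""
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(predicted, measured))
                     / len(measured))

def percent_error_reduction(baseline, proposed):
    """Improvement of the proposed system over the baseline, as a
    percentage of the baseline's final RMSE (positive = better)."""
    return 100.0 * (baseline - proposed) / baseline

# Toy two-cell example: baseline RMSE sqrt(4.5), proposed RMSE sqrt(0.5).
baseline_rmse = rmse([3.0, 1.0], [0.0, 1.0])
proposed_rmse = rmse([1.0, 1.0], [0.0, 1.0])
improvement = percent_error_reduction(baseline_rmse, proposed_rmse)
```

Reporting relative rather than absolute error makes results comparable across task complexities whose error magnitudes differ.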
6.7.3 Field Tests

We test our system on a Turtlebot2 robot with a light sensor in two types of environments: one with no purposely placed unknown light sources, and one with two purposely placed unknown light sources.¶ These tests are conducted in a warehouse with light bleed from windows, causing unknown lighting in both environments. Each experiment has desired light intensities generated from three light sources. There are five obstacles, which are 0.5 m × 0.5 m cardboard boxes. Each environment uses the same desired lighting configuration and obstacle configuration. In the environment with no unknown light sources, the robot configures five light sources; in the environment with two unknown light sources, the robot places four light sources. The configured light sources are 1440-lumen battery-powered lights placed high enough for the sensor on the robot to detect. The unknown light sources are 1800-lumen lights plugged into wall power. The environment can be seen in fig. 6.8e. After the robot has taken 70 actions, it visits each x ∈ X# and collects measurements to determine how closely the final lighting intensities match the desired lighting intensities.

¶ We call these unknown light sources purposely placed because there are other light sources in the environment, e.g., the lights of computers and light reflected from the ceiling of the experiment space, that are not purposely placed.

Figure 6.15: Field Test Results. Panels: (a) point-based RMSE; (b) desired lighting; (c) proposed, no unknown lights; (d) proposed, two unknown lights; (e) baseline, no unknown lights; (f) baseline, two unknown lights. We compare our approach in a scenario with a real robot and light sources.
We find that our approach is better able to find light source configurations which match the desired lighting configuration in the field.

We show the results in fig. 6.15, where the per-point error is computed for each environment. Our approach provides a considerably lower error in the environment with no purposely added unknown light sources, which is the nominal operating regime for this kind of system. When we add two bright unknown light sources, our approach achieves a lower upper-quartile error and lower maximum error, but comparable median error. We believe this is because this environment is very complicated, and the high upper-quartile error of the baseline comes from not incorporating the light from the unknown light sources.

6.8 Conclusion

Autonomously deploying light sources and other electromagnetic signal emitters is an important and challenging task for indoor navigation, localization, and connectivity of humans and robots. We have shown that interleaving light placement and active sensing increases the accuracy of electromagnetic signal emitter placement. We have also shown that integrating an analytical model of the light propagation from the placed light sources into the belief model allows for more accurate modeling of both the light intensities in the environment and the uncertainty associated with them. We have further shown that this model can be used to decide when to reconfigure light sources, proposing a custom trigger based on the likelihood of measuring the desired lighting. We demonstrate the effectiveness of our system in both simulated and real-world experiments, showing that our method outperforms a baseline system. The presented approach can readily be extended to other electromagnetic signals as long as an analytical model is provided and the robot has sensors to measure the signal strength.
Chapter 7

Conclusions

Active sensing, despite its benefits over static and fixed-path sensing, continues to see limited real robotic deployment for scientific study. Despite the pressing need for large-scale study of physical phenomena and the extension of human understanding, active sensing systems are still largely confined to academic labs and studies. Although these systems have shown great promise, they have been plagued by their intricacy, requiring a complex understanding of the inner workings of their behavior to be confident enough to stake important missions on them.

We have seen several important example scenarios where active sensing behavior allows for large-scale, real-world study of complex phenomena. An important example is harmful algal blooms, which continue to threaten human and animal activity alike, requiring large-scale study and understanding of their characteristics and development. Search and rescue missions in indoor environments require complex lighting, large teams of autonomous robots, and difficult active source-seeking missions. Despite the pressing need for large-scale sensing in these scenarios, robots have still largely been deployed with non-adaptive behavior.

One problem which has limited the usage of active sensing robots in the real world is the complexity of the planner which governs their behavior. We have demonstrated that we can make this planner more automatic, that is, requiring considerably fewer parameter choices by human operators to specify the behavior of the robot. We have shown in section 3.1 that heuristics which exploit the specific structure of the active sensing problem can improve the efficiency of the planner, allowing for better deployment of these systems. Furthermore, we have demonstrated in section 3.2 that we can use reinforcement learning to set these parameters directly on a per-step basis.
This allows the user to deploy the system with a similar number of decisions as non-adaptive behavior requires, and, critically, these decisions are based on the physical phenomena rather than intricacies of the robot behavior. The user can thus rely on their expert knowledge of the phenomena, not an intimate understanding of both the phenomena and robotic behavior.

Active sensing robots also often ignore a fact of the real world: they can only sense their location with noise. The robots therefore have only a belief over their position, rather than certain knowledge of it. Active sensing behaviors have typically assumed either perfect sensing of the state of the robot or negligible state uncertainty. This may be acceptable when the state uncertainty is unbiased and low, but in many real-world situations this is not the case. We have shown that this state uncertainty can be propagated through the environment belief model in section 4.1. This allows the integration of heteroskedastic noise on the pose measurement of the robot and allows for multi-robot teams that are agnostic to the type of state estimator used by each robot. We also show that the theory and practice of active sensing can be applied to SLAM loop closure selection in section 4.2. By treating the state of the robot as the sensed quantity and the choice of SLAM loop closures as the sensing location, we enable a scalable multi-robot state uncertainty reduction algorithm. We see that robot state uncertainty, when it cannot be ignored, can be effectively integrated into the active sensing behavior.

Active sensing robots also lack specificity in their mission behavior. Active sensing algorithms have largely been developed for two kinds of missions: efficient coverage of the geometric workspace [73], and finding the signal maxima [85, 87].
While these mission goals cover a wide range of tasks that users would like to accomplish, they leave it to the user to interpret the data collected by the robot. We have seen in section 5.1 that sensing directly for the scientific task the robot is used for allows for more efficient robot behavior. Physical specimen collection, such as water sampling, enables deeper scientific study of biological phenomena. In the past, this task has largely been guided by non-adaptive robot surveys and the expert knowledge of the scientists performing it. The quantile-guided physical sampling survey allows robots to assist scientists in the complex decision-making process that physical specimen collection requires. The final problem which obstructs active sensing deployments is the difficulty of adapting belief models to environmental characteristics that affect the sensed signal. Signals, especially electromagnetic ones, interact with obstacles in the environment in complex ways. Robots which sense these signals need to be aware of these signal-environment interactions and take measurements accordingly. In section 6.1, we have shown that active sensing of electromagnetic signals can be done with the limited computational resources onboard a complex, real-world robot deployment. Additionally, in section 6.2, we show that this modeling technique can expand the paradigm of active sensing to the joint sensing and placement of devices emitting the sensed signal. In this task, the robot places signal emitters and actively senses the emitted signal to improve the placement. This allows the robot to accomplish a complex task and expands the possible uses of active sensing robots.
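The quantile-guided idea above can be illustrated with a minimal sketch: given predicted field values over candidate locations (e.g., a GP posterior mean), estimate the target quantile values of the field and choose, for each, the location whose prediction best matches it. This is a simplified stand-in for the approach of section 5.1; the quantile set and the nearest-prediction selection rule here are illustrative assumptions.

```python
import numpy as np

def quantile_locations(locations, predicted, quantiles=(0.25, 0.5, 0.9)):
    # Estimate the field's quantile values from the belief, then pick the
    # candidate location whose predicted value best matches each quantile.
    values = np.quantile(predicted, quantiles)
    idx = [int(np.argmin(np.abs(predicted - v))) for v in values]
    return locations[idx], values
```

A scientist who wants, say, water samples spanning low, typical, and near-peak algae concentrations can then be sent to the returned locations rather than interpreting the raw survey data themselves.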
With the promise of efficiently deployable active sensing robots that account for real-world state uncertainty, plan directly for the task at hand, and understand the uncertainty associated with real-world signals, we believe active sensing robots are better able to assist humans in real-world measurement tasks. We believe these robots can form teams with human counterparts and begin to assist in scientific challenges that require both large-scale and fine-resolution measurement.

Bibliography

[1] Ali Agha, Kyohei Otsu, Benjamin Morrell, David D. Fan, Rohan Thakker, Angel Santamaria-Navarro, Sung-Kyun Kim, Amanda Bouman, Xianmei Lei, Jeffrey Edlund, Muhammad Fadhil Ginting, Kamak Ebadi, Matthew Anderson, Torkom Pailevanian, Edward Terry, Michael Wolf, Andrea Tagliabue, Tiago Stegun Vaquero, Matteo Palieri, Scott Tepsuporn, Yun Chang, Arash Kalantari, Fernando Chavez, Brett Lopez, Nobuhiro Funabiki, Gregory Miles, Thomas Touma, Alessandro Buscicchio, Jesus Tordesillas, Nikhilesh Alatur, Jeremy Nash, William Walsh, Sunggoo Jung, Hanseob Lee, Christoforos Kanellakis, John Mayo, Scott Harper, Marcel Kaufmann, Anushri Dixit, Gustavo Correa, Carlyn Lee, Jay Gao, Gene Merewether, Jairo Maldonado-Contreras, Gautam Salhotra, Maira Saboia Da Silva, Benjamin Ramtoula, Yuki Kubo, Seyed Fakoorian, Alexander Hatteland, Taeyeon Kim, Tara Bartlett, Alex Stephens, Leon Kim, Chuck Bergh, Eric Heiden, Thomas Lew, Abhishek Cauligi, Tristan Heywood, Andrew Kramer, Henry A. Leopold, Chris Choi, Shreyansh Daftry, Olivier Toupet, Inhwan Wee, Abhishek Thakur, Micah Feras, Giovanni Beltrame, George Nikolakopoulos, David Shim, Luca Carlone, and Joel Burdick. “NeBula: Quest for Robotic Autonomy in Challenging Environments; TEAM CoSTAR at the DARPA Subterranean Challenge”. In: Journal of Field Robotics (2021). [2] Kemal Akkaya and Andrew Newell. “Self-deployment of sensors for maximized coverage in underwater acoustic sensor networks”. In: Computer Communications 32.7 (2009), pp.
1233–1244. issn: 0140-3664.doi: https://doi.org/10.1016/j.comcom.2009.04.002. [3] Donald M. Anderson, Porter Hoagland, Yoshi Kaoru, and Alan W. White. Estimated annual economic impacts from harmful algal blooms (HABs) in the United States. Technical Report. Woods Hole Oceanographic Institution, Sept. 2000.doi: 10.1575/1912/96. (Visited on 11/30/2020). [4] Jean-Yves Audibert and Sébastien Bubeck. “Best Arm Identification in Multi-Armed Bandits”. en. In: June 2010, 13 p.url: https://hal-enpc.archives-ouvertes.fr/hal-00654404 (visited on 10/30/2020). [5] Dimitris Bertsimas and John Tsitsiklis. “Simulated Annealing”. In: Statistical Science (1993). [6] Graeme Best, Oliver M Cliff, Timothy Patten, Ramgopal R Mettu, and Robert Fitch. “Dec-MCTS: Decentralized planning for multi-robot active perception”. In: The International Journal of Robotics Research 38.3 (2019), pp. 316–337.doi: 10.1177/0278364918755924. (Visited on 07/01/2019). 144 [7] Jonathan Binney and Gaurav S. Sukhatme. “Branch and bound for informative path planning”. In: 2012 IEEE International Conference on Robotics and Automation. ISSN: 1050-4729. May 2012, pp. 2147–2154.doi: 10.1109/ICRA.2012.6224902. [8] Antoine Blanchard and Themistoklis Sapsis. “Informative Path Planning for Extreme Anomaly Detection in Environment Exploration and Monitoring”. In: arXiv:2005.10040 (2021). [9] Amanda Bouman, Joshua Ott, Sung-Kyun Kim, Kenny Chen, Mykel John Kochenderfer, Brett Lopez, Ali-akbar Agha-mohammadi, and Joel Burdick. “Adaptive Coverage Path Planning for Efficient Exploration of Unknown Environments”. In: IROS. 2022. [10] A Bouman∗ , MF Ginting∗ , N Alatur∗ , Matteo Palieri, David Fan, Thomas Touma, Torkom Pailevanian, Sung Kim, Kyohei Otsu, Joel Burdick, and Ali Agha-Mohammadi. “Autonomous Spot: Long-Range Autonomous Exploration of Extreme Environments with Legged Locomotion”. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 2020. 
[11] Djallel Bouneffouf, Irina Rish, and Charu Aggarwal. “Survey on Applications of Multi-Armed and Contextual Bandits”. In: 2020 IEEE Congress on Evolutionary Computation (CEC). July 2020, pp. 1–8.doi: 10.1109/CEC48606.2020.9185782. [12] Cesar Cadena, Luca Carlone, Henry Carrillo, Yasir Latif, Davide Scaramuzza, José Neira, Ian Reid, and John J. Leonard. “Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age”. In: IEEE Transactions on Robotics (2016). [13] Luca Carlone, Jingjing Du, Miguel Kaouk Ng, Basilio Bona, and Marina Indri. “Active SLAM and Exploration with Particle Filters Using Kullback-Leibler Divergence”. In: Journal of Intelligent and Robotic Systems (Aug. 2014). [14] Daniele Cattaneo, Matteo Vaghi, and Abhinav Valada. “LCDNet: Deep Loop Closure Detection and Point Cloud Registration for LiDAR SLAM”. In: 2021. [15] Central and Northern California Ocean Observing System. https://www.cencoos.org/. Feb. 2020. [16] Yun Chang, Kamak Ebadi, Christopher E. Denniston, Muhammad Fadhil Ginting, Antoni Rosinol, Andrzej Reinke, Matteo Palieri, Jingnan Shi, Arghya Chatterjee, Benjamin Morrell, Ali-akbar Agha-mohammadi, and Luca Carlone. “LAMP 2.0: A Robust Multi-Robot SLAM System for Operation in Challenging Large-Scale Underground Environments”. In: (2022). [17] Benjamin Charrow, Nathan Michael, and Vijay R. Kumar. “Cooperative multi-robot estimation and control for radio source localization”. In: International Journal of Robotics Research 33 (2012), pp. 569–580. [18] Fanfei Chen, John D. Martin, Yewei Huang, Jinkun Wang, and Brendan Englot. “Autonomous Exploration Under Uncertainty via Deep Reinforcement Learning on Graphs”. In: International Conference on Intelligent Robots and Systems. 2020. [19] Wei Chen, Jian Sun, and Qiang Zheng. “Fast Loop Closure Selection Method with Spatiotemporal Consistency for Multi-Robot Map Fusion”. In: Applied Sciences (2022). 145 [20] Weizhe Chen and Lantao Liu. 
“Pareto Monte Carlo Tree Search for Multi-Objective Informative Planning”. In: Robotics: Science and Systems (2019).doi: 10.15607/RSS.2019.XV.072. (Visited on 11/04/2021). [21] Yu Chen, Lingfei Wu, and Mohammed Zaki. “Iterative Deep Graph Learning for Graph Neural Networks: Better and Robust Node Embeddings”. In: Advances in Neural Information Processing Systems. 2020.url: https://proceedings.neurips.cc/paper/2020/file/e05c7ba4e087beea9410929698dc41a6-Paper.pdf. [22] Yung Sung Cheng, Yue Zhou, Clinton M. Irvin, Richard H. Pierce, Jerome Naar, Lorraine C. Backer, Lora E. Fleming, Barbara Kirkpatrick, and Dan G. Baden. “Characterization of Marine Aerosol for Assessment of Human Exposure to Brevetoxins”. In: Environmental Health Perspectives 113.5 (May 2005), pp. 638–643.issn: 0091-6765.doi: 10.1289/ehp.7496. (Visited on 11/30/2020). [23] Taeyeong Choi and Grzegorz Cielniak. “Adaptive Selection of Informative Path Planning Strategies via Reinforcement Learning”. In: ECMR 2021 ().url: http://arxiv.org/abs/2108.06618 (visited on 09/24/2021). [24] Robert L. Cook, Thomas Porter, and Loren Carpenter. “Distributed Ray Tracing”. In: SIGGRAPH Comput. Graph. 18.3 (Jan. 1984), pp. 137–145.issn: 0097-8930.doi: 10.1145/964965.808590. [25] Konrad P. Cop, Paulo Vinicius Koerich Borges, and Renaud Dubé. “Delight: An Efficient Descriptor for Global Localisation Using LiDAR Intensities”. In: ICRA (2018). [26] Noel Cressie. “The origins of kriging”. en. In: Mathematical Geology 22.3 (Apr. 1990), pp. 239–252. issn: 1573-8868.doi: 10.1007/BF00889887. (Visited on 04/16/2023). [27] DARPA Subterranean (SubT) Challenge.url: https://www.darpa.mil/program/darpa-subterranean-challenge. [28] Jnaneshwar Das, Frédéric Py, Julio B J Harvey, John P Ryan, Alyssa Gellene, Rishi Graham, David A Caron, Kanna Rajan, and Gaurav S Sukhatme. “Data-driven robotic sampling for marine ecosystem monitoring”. In: The International Journal of Robotics Research 34.12 (2015), pp. 
1435–1452.doi: 10.1177/0278364915587723. (Visited on 06/10/2019). [29] Jnaneshwar Das, Frédéric Py, Julio B.J. Harvey, John P. Ryan, Alyssa Gellene, Rishi Graham, David A. Caron, Kanna Rajan, and Gaurav S. Sukhatme. “Data-driven robotic sampling for marine ecosystem monitoring”. In: The International Journal of Robotics Research (2015).doi: 10.1177/0278364915587723. [30] Jnaneshwar Das, Kanna Rajany, Sergey Frolovy, Frederic Py, John Ryan, David A. Caron, and Gaurav S. Sukhatme. “Towards marine bloom trajectory prediction for AUV mission planning”. In: 2010 IEEE International Conference on Robotics and Automation. ISSN: 1050-4729. May 2010, pp. 4784–4790.doi: 10.1109/ROBOT.2010.5509930. [31] Pieter-Tjerk De Boer, Dirk P Kroese, Shie Mannor, and Reuven Y Rubinstein. “A tutorial on the cross-entropy method”. In: Annals of operations research 134 (2005). 146 [32] Frank Dellaert. “Factor Graphs and GTSAM: A Hands-on Introduction”. In: 2012. [33] Frank Dellaert. Factor graphs and GTSAM: A hands-on introduction. Tech. rep. Georgia Institute of Technology, 2012. [34] Chris Denniston, Aravind Kumaraguru, and Gaurav S. Sukhatme. “Comparison of Path Planning Approaches for Harmful Algal Bloom Monitoring”. In: OCEANS MTS/IEEE. 2019.doi: 10.23919/OCEANS40490.2019.8962687. [35] Christopher E. Denniston, Yun Chang, Andrzej Reinke, Kamak Ebadi, Gaurav S. Sukhatme, Luca Carlone, Benjamin Morrell, and Ali-akbar Agha-mohammadi. “Loop Closure Prioritization for Efficient and Scalable Multi-Robot SLAM”. In: IEEE Robotics and Automation Letters 7.4 (2022), pp. 9651–9658.doi: 10.1109/LRA.2022.3191156. [36] Christopher E. Denniston, Aravind Kumaraguru, David A. Caron, and Gaurav S. Sukhatme. “Incorporating Noise into Adaptive Sampling”. In: Experimental Robotics. Ed. by Bruno Siciliano, Cecilia Laschi, and Oussama Khatib. Cham: Springer International Publishing, 2021, pp. 198–208. isbn: 978-3-030-71151-1. [37] Christopher E. 
Denniston, Oriana Peltzer, Joshua Ott, Sangwoo Moon, Sung-Kyun Kim, Gaurav S. Sukhatme, Mykel J. Kochenderfer, Mac Schwager, and Ali-akbar Agha-mohammadi. Fast and Scalable Signal Inference for Active Robotic Source Seeking. 2023. arXiv: 2301.02362 [cs.RO]. [38] Christopher E. Denniston, Gautam Salhotra, David A. Caron, and Gaurav S. Sukhatme. “Adaptive Sampling using POMDPs with Domain-Specific Considerations”. In: ICRA. 2020. [39] Christopher E. Denniston, Gautam Salhotra, Akseli Kangaslahti, David A. Caron, and Gaurav S. Sukhatme. Learned Parameter Selection for Robotic Information Gathering. 2023. arXiv: 2303.05022[cs.RO]. [40] Renaud Dubé, Daniel Dugas, Elena Stumm, Juan I. Nieto, Roland Y. Siegwart, and César Cadena. “SegMatch: Segment based place recognition in 3D point clouds”. In: ICRA (2017). [41] Kamak Ebadi, Lukas Bernreiter, Harel Biggie, Gavin Catt, Yun Chang, Arghya Chatterjee, Christopher E. Denniston, Simon-Pierre Deschênes, Kyle Harlow, Shehryar Khattak, Lucas Nogueira, Matteo Palieri, Pavel Petráček, Matěj Petrlík, Andrzej Reinke, Vít Krátký, Shibo Zhao, Ali-akbar Agha-mohammadi, Kostas Alexis, Christoffer Heckman, Kasra Khosoussi, Navinda Kottege, Benjamin Morrell, Marco Hutter, Fred Pauling, François Pomerleau, Martin Saska, Sebastian Scherer, Roland Siegwart, Jason L. Williams, and Luca Carlone. Present and Future of SLAM in Extreme Underground Environments. 2022. arXiv: 2208.01787[cs.RO]. [42] Kamak Ebadi, Yun Chang, Matteo Palieri, Alex Stephens, Alexander Hatteland, Eric Heiden, Abhishek Thakur, Nobuhiro Funabiki, Benjamin Morrell, Sally Wood, Luca Carlone, and Ali-akbar Agha-mohammadi. “LAMP: Large-Scale Autonomous Mapping and Positioning for Exploration of Perceptually-Degraded Subterranean Environments”. In: ICRA (2020). 147 [43] Kamak Ebadi, Matteo Palieri, Sally Wood, Curtis Padgett, and Ali-akbar Agha-mohammadi. “DARE-SLAM: Degeneracy-Aware and Resilient Loop Closing in Perceptually-Degraded Environments”. 
In: Journal of Intelligent and Robotic Systems (2021). [44] Ludovico Ferranti, Salvatore D’Oro, Leonardo Bonati, Francesca Cuomo, and Tommaso Melodia. “HIRO-NET: Heterogeneous Intelligent Robotic Network for Internet Sharing in Disaster Scenarios”. In: IEEE Transactions on Mobile Computing 21.12 (2022), pp. 4367–4380.doi: 10.1109/TMC.2021.3078050. [45] Jonathan Fink and Vijay Kumar. “Online methods for radio signal mapping with mobile robots”. In: IEEE International Conference on Robotics and Automation (ICRA). 2010, pp. 1940–1945.doi: 10.1109/ROBOT.2010.5509574. [46] Christian Forster, Luca Carlone, Frank Dellaert, and Davide Scaramuzza. “IMU Preintegration on Manifold for Efficient Visual-Inertial Maximum-a-Posteriori Estimation”. In: Robotics: Science and Systems. 2015.doi: 10.15607/RSS.2015.XI.006. [47] Trygve Olav Fossum, Jo Eidsvik, Ingrid Ellingsen, Morten Omholt Alver, Glaucia Moreira Fragoso, Geir Johnsen, Renato Mendes, Martin Ludvigsen, and Kanna Rajan. “Information-driven robotic sampling in the coastal ocean”. In: Journal of Field Robotics (2018). issn: 15564959.doi: 10.1002/rob.21805. (Visited on 06/07/2019). [48] A. Antony Franklin and C. Siva Ram Murthy. “Node Placement Algorithm for Deployment of Two-Tier Wireless Mesh Networks”. In: IEEE GLOBECOM 2007 - IEEE Global Telecommunications Conference. 2007, pp. 4823–4827.doi: 10.1109/GLOCOM.2007.915. [49] Nobuhiro Funabiki, Benjamin Morrell, Jeremy Nash, and Ali-akbar Agha-mohammadi. “Range-aided pose-graph-based SLAM: Applications of deployable ranging beacons for unknown environment exploration”. In: IEEE Robotics and Automation Letters (2020). [50] Javier G. Monroy, Jose-Luis Blanco, and Javier Gonzalez-Jimenez. “Time-variant gas distribution mapping with obstacle information”. en. In: Autonomous Robots 40.1 (Jan. 2016), pp. 1–16.issn: 0929-5593, 1573-7527.doi: 10.1007/s10514-015-9437-0. (Visited on 05/31/2022). [51] Victor Gabillon, Mohammad Ghavamzadeh, and Alessandro Lazaric. 
“Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence”. In: Advances in Neural Information Processing Systems 25. Ed. by F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger. Curran Associates, Inc., 2012, pp. 3212–3220.url: http://papers.nips.cc/paper/4640-best-arm- identification-a-unified-approach-to-fixed-budget-and-fixed-confidence.pdf (visited on 10/02/2020). [52] Y. Gabriely and E. Rimon. “Spiral-STC: an on-line coverage algorithm of grid environments by a mobile robot”. In: ICRA. 2002. [53] A. Gamst, E.-G. Zinn, R. Beck, and R. Simon. “Cellular Radio Network Planning”. In: IEEE Aerospace and Electronic Systems Magazine 1.2 (1986), pp. 8–11.doi: 10.1109/MAES.1986.5005038. 148 [54] Agathe Girard, Carl Edward Rasmussen, Joaquin Quiñonero Candela, and Roderick Murray-Smith. “Gaussian Process Priors with Uncertain Inputs Application to Multiple-Step Ahead Time Series Forecasting”. In: Advances in Neural Information Processing Systems 15. MIT Press, 2003, pp. 545–552. (Visited on 06/29/2020). [55] Andrew S. Glassner. An Introduction to Ray Tracing. en. Google-Books-ID: YPblYyLqBM4C. Morgan Kaufmann, Jan. 1989.isbn: 978-0-12-286160-4. [56] Andres Gongora, Javier Monroy, and Javier Gonzalez-Jimenez. “Joint estimation of gas and wind maps for fast-response applications”. In: Applied Mathematical Modelling 87 (2020), pp. 655–674. issn: 0307-904X.doi: https://doi.org/10.1016/j.apm.2020.06.026. [57] William Sealy Gosset. The Probable Error of a Mean. Originally published under the pseudonym. 1908. [58] Carlos Guestrin, Andreas Krause, and Ajit Paul Singh. Near-Optimal Sensor Placements in Gaussian Processes. Tech. rep. 2005. (Visited on 04/07/2019). [59] Carlos Guestrin, Andreas Krause, and Ajit Paul Singh. “Near-optimal Sensor Placements in Gaussian Processes”. In: Proceedings of the 22Nd International Conference on Machine Learning. ICML ’05. Bonn, Germany: ACM, 2005, pp. 265–272.isbn: 1-59593-180-5.doi: 10.1145/1102351.1102385. 
[60] Carlos Guestrin, Andreas Krause, and Ajit Paul Singh. “Near-optimal sensor placements in Gaussian processes”. en. In: Proceedings of the 22nd international conference on Machine learning - ICML ’05. Bonn, Germany: ACM Press, 2005, pp. 265–272.isbn: 978-1-59593-180-1.doi: 10.1145/1102351.1102385. (Visited on 07/08/2020). [61] John Gurland and Ram C. Tripathi. “A Simple Approximation for Unbiased Estimation of the Standard Deviation”. In: The American Statistician 25.4 (Oct. 1971). Publisher: Taylor & Francis, pp. 30–32.issn: 0003-1305.doi: 10.1080/00031305.1971.10477279. (Visited on 10/28/2020). [62] Jiu Hai-Feng, Chen Yu, Deng Wei, and Pang Shuo. “Underwater Chemical Plume Tracing Based on Partially Observable Markov Decision Process”. In: International Journal of Advanced Robotic Systems 16.2 (Mar. 2019). Publisher: SAGE Publications, p. 1729881419831874.issn: 1729-8814. doi: 10.1177/1729881419831874. (Visited on 07/17/2020). [63] John C. Hart. “Sphere tracing: a geometric method for the antialiased ray tracing of implicit surfaces”. In: The Visual Computer 12.10 (Dec. 1996), pp. 527–545.issn: 1432-2315.doi: 10.1007/s003710050084. [64] High-Resolution Mapping of a Harmful Algal Bloom: UNDERSTANDING FINE-SCALE ENVIRONMENTAL FACTORS IN CLEAR LAKE, CALIFORNIA. Tech. rep. 2020.url: https://video.ysi.com/webinar-expand-your-hab-horizons. [65] Jeff C. Ho, Anna M. Michalak, and Nima Pahlevan. “Widespread global increase in intense lake phytoplankton blooms since the 1980s”. In: Nature (2019). 149 [66] Porter Hoagland, Di Jin, Lara Y. Polansky, Barbara Kirkpatrick, Gary Kirkpatrick, Lora E. Fleming, Andrew Reich, Sharon M. Watkins, Steven G. Ullmann, and Lorraine C. Backer. “The Costs of Respiratory Illnesses Arising from Florida Gulf Coast Karenia brevis Blooms”. In: Environmental Health Perspectives 117.8 (Aug. 2009), pp. 1239–1243.issn: 0091-6765.doi: 10.1289/ehp.0900645. [67] Sepp Hochreiter and Jürgen Schmidhuber. “Long short-term memory”. 
In: Neural computation (1997). [68] Geoff A Hollinger and Gaurav S Sukhatme. “Sampling-based robotic information gathering algorithms”. In: The International Journal of Robotics Research 33.9 (2014), pp. 1271–1287. [69] Aya Hozumi, Ilia Ostrovsky, Assaf Sukenik, and Hezi Gildor. “Turbulence regulation of Microcystis surface scum formation and dispersion during a cyanobacteria bloom event”. In: Inland Waters (2020). [70] Jimin Hwang, Neil Bose, and Shuangshuang Fan. “AUV Adaptive Sampling Methods: A Review”. en. In: Applied Sciences 9.15 (Jan. 2019), p. 3145.doi: 10.3390/app9153145. (Visited on 10/18/2019). [71] Rob J. Hyndman and Yanan Fan. “Sample Quantiles in Statistical Packages”. In: The American Statistician 50 (1996). [72] Donald R. Jones, Matthias Schonlau, and William J. Welch. “Efficient Global Optimization of Expensive Black-Box Functions”. In: Journal of Global Optimization (1998). [73] Stephanie Kemna, Oliver Kroemer, and Gaurav S. Sukhatme. “Pilot Surveys for Adaptive Informative Sampling”. In: 2018 IEEE International Conference on Robotics and Automation (ICRA). ISSN: 2577-087X. May 2018, pp. 6417–6424.doi: 10.1109/ICRA.2018.8460488. [74] Stephanie Kemna, John G. Rogers, Carlos Nieto-Granda, Stuart Young, and Gaurav S. Sukhatme. “Multi-robot coordination through dynamic Voronoi partitioning for informative adaptive sampling in communication-constrained environments”. In: IEEE International Conference on Robotics and Automation (ICRA). 2017, pp. 2124–2130.doi: 10.1109/ICRA.2017.7989245. [75] Sung-Kyun Kim, Amanda Bouman, Gautam Salhotra, David D Fan, Kyohei Otsu, Joel Burdick, and Ali-akbar Agha-mohammadi. “PLGRIM: Hierarchical value learning for large-scale exploration in unknown environments”. In: arXiv preprint arXiv:2102.05633 (2021). [76] Scott Kirkpatrick, C Daniel Gelatt Jr, and Mario P Vecchi. “Optimization by Simulated Annealing”. In: Readings in Computer Vision. Elsevier, 1987. [77] Levente Kocsis and Csaba Szepesvári. 
“Bandit based monte-carlo planning”. In: European conference on machine learning. Springer. 2006, pp. 282–293. [78] Andreas Krause and Carlos Guestrin. “Submodularity and its applications in optimized information gathering”. en. In: ACM Transactions on Intelligent Systems and Technology 2.4 (July 2011), pp. 1–20.issn: 2157-6904, 2157-6912.doi: 10.1145/1989734.1989736. (Visited on 07/08/2020). 150 [79] Daniel Leitold, Agnes Vathy-Fogarassy, and Janos Abonyi. “Network Distance-Based Simulated Annealing and Fuzzy Clustering for Sensor Placement Ensuring Observability and Minimal Relative Degree”. In: Sensors (2018). [80] Pierre F.J. Lermusiaux, Tapovan Lolla, Patrick J. Haley Jr., Konuralp Yigit, Mattheus P. Ueckermann, Thomas Sondergaard, and Wayne G. Leslie. “Science of Autonomy: Time-Optimal Path Planning and Adaptive Sampling for Swarms of Ocean Vehicles”. In: Springer Handbook of Ocean Engineering. Ed. by Manhar R. Dhanak and Nikolaos I. Xiros. Cham: Springer International Publishing, 2016, pp. 481–498.isbn: 978-3-319-16649-0.doi: 10.1007/978-3-319-16649-0_21. [81] Kian Hsiang Low, John M. Dolan, and Pradeep Khosla. “Information-Theoretic Approach to Efficient Adaptive Path Planning for Mobile Robotic Environmental Sensing”. In: International Conference on Automated Planning and Scheduling (2009).url: http://arxiv.org/abs/1305.6129 (visited on 11/17/2020). [82] Travis Manderson, Sandeep Manjanna, and Gregory Dudek. “Heterogeneous Robot Teams for Informative Sampling”. In: Workshop on Informative Path Planning and Adaptive Sampling at Robotics Science and Systems (June 2019). (Visited on 06/19/2019). [83] Joshua G. Mangelson, Derrick Dominic, Ryan M. Eustice, and Ram Vasudevan. “Pairwise Consistent Measurement Set Maximization for Robust Multi-Robot Map Merging”. en. In: 2018 IEEE International Conference on Robotics and Automation (ICRA). Brisbane, QLD: IEEE, May 2018, pp. 2916–2923. [84] Sandeep Manjanna, Alberto Quattrini Li, Ryan N. 
Smith, Ioannis Rekleitis, and Gregory Dudek. “Heterogeneous Multi-Robot System for Exploration and Strategic Water Sampling”. In: ICRA. 2018. [85] Roman Marchant and Fabio Ramos. Bayesian Optimisation for Informative Continuous Path Planning.isbn: 978-1-4799-3685-4. (Visited on 02/27/2019). [86] Roman Marchant and Fabio Ramos. “Bayesian Optimisation for informative continuous path planning”. In: International Conference on Robotics and Automation. 2014.doi: 10.1109/ICRA.2014.6907763. [87] Roman Marchant, Fabio Ramos, and Scott Sanner. “Sequential Bayesian optimisation for spatial-temporal monitoring”. In: Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence . UAI’14. Arlington, Virginia, USA: AUAI Press, July 2014, pp. 553–562.isbn: 978-0-9749039-1-0. (Visited on 10/30/2020). [88] Roman Marchant, Fabio Ramos, and Scott Sanner. “Sequential Bayesian optimisation for spatial-temporal monitoring”. In: Conference on Uncertainty in Artificial Intelligence (UAI) . UAI’14. Arlington, Virginia, USA: AUAI Press, July 2014, pp. 553–562.isbn: 978-0-9749039-1-0. (Visited on 10/30/2020). 151 [89] Frank Mascarich, Taylor Wilson, Christos Papachristos, and Kostas Alexis. “Radiation Source Localization in GPS-Denied Environments Using Aerial Robots”. In: IEEE International Conference on Robotics and Automation (ICRA). Brisbane, Australia: IEEE Press, 2018, pp. 6537–6544.doi: 10.1109/ICRA.2018.8460760. [90] Seth McCammon, Gilberto Marcon dos Santos, Matthew Frantz, T. P. Welch, Graeme Best, R. Kipp Shearman, Jonathan D. Nash, John A. Barth, Julie A. Adams, and Geoffrey A. Hollinger. “Ocean front detection and tracking using a team of heterogeneous marine vehicles”. In: Journal of Field Robotics (2021). [91] Seth McCammon, Gilberto Marcon dos Santos, Matthew Frantz, T. P. Welch, Graeme Best, R. Kipp Shearman, Jonathan D. Nash, John A. Barth, Julie A. Adams, and Geoffrey A. Hollinger. 
“Ocean front detection and tracking using a team of heterogeneous marine vehicles”. In: Journal of Field Robotics 38.6 (2021), pp. 854–881.doi: https://doi.org/10.1002/rob.22014. eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1002/rob.22014. [92] Alvine C. Mehinto, Jayme Smith, Ellie Wenger, Beckye Stanton, Regina Linville, Bryan W. Brooks, Martha A. Sutula, and Meredith D. A. Howard. “Synthesis of ecotoxicological studies on cyanotoxins in freshwater habitats – Evaluating the basis for developing thresholds protective of aquatic life in the United States”. In: Science of The Total Environment (2021). [93] Federico Monti, Davide Boscaini, Jonathan Masci, Emanuele Rodolà, Jan Svoboda, and Michael M. Bronstein. “Geometric deep learning on graphs and manifolds using mixture model CNNs”. In: IEEE Conference on Computer Vision and Pattern Recognition (2017).url: http://arxiv.org/abs/1611.08402. [94] Philipp Moritz, Robert Nishihara, Stephanie Wang, Alexey Tumanov, Richard Liaw, Eric Liang, Melih Elibol, Zongheng Yang, William Paul, Michael I. Jordan, and Ion Stoica. Ray: A Distributed Framework for Emerging AI Applications. 2017.doi: 10.48550/ARXIV.1712.05889. [95] National Research Council. Science at Sea: Meeting Future Oceanographic Goals with a Robust Academic Research Fleet. The National Academies Press, 2009.isbn: 978-0-309-14557-2.doi: 10.17226/12775. [96] NOAA Erddap. https://coastwatch.pfeg.noaa.gov/erddap/index.html. [97] Rafael Oliveira, Lionel Ott, and Fabio Ramos. “Bayesian Optimisation Under Uncertain Inputs”. In: 2nd International Conference on Artificial Intelligence and Statistics (AISTATS 2019)] (Feb. 2019). (Visited on 03/05/2020). [98] Donald M. Olsson and Lloyd S. Nelson. “The Nelder-Mead Simplex Procedure for Function Minimization”. In: Technometrics 17.1 (Feb. 1975), pp. 45–51.issn: 0040-1706.doi: 10.1080/00401706.1975.10489269. (Visited on 04/28/2023). [99] Kyohei Otsu, Scott Tepsuporn, Rohan Thakker, Tiago Stegun Vaquero, Jeffrey A. 
Edlund, William Walsh, Gregory Miles, Tristan Heywood, Michael T. Wolf, and Aliakbar Agha-mohammadi. “Supervised autonomy for communication-degraded subterranean exploration by a robot team”. In: IEEE Aerospace Conference. 2020. 152 [100] Joshua Ott, Sung-Kyun Kim, Amanda Bouman, Oriana Peltzer, Mamoru Sobue, Harrison Delecki, Mykel J. Kochenderfer, Joel Burdick, and Ali-akbar Agha-mohammadi. Risk-aware Meta-level Decision Making for Exploration Under Uncertainty. 2022. arXiv: 2209.05580. [101] Hao Pan, Xiaogang Qi, Meili Liu, and Lifang Liu. “Indoor scenario-based UWB anchor placement optimization method for indoor localization”. In: Expert Systems with Applications 205 (2022), p. 117723.issn: 0957-4174.doi: https://doi.org/10.1016/j.eswa.2022.117723. [102] Steven G. Parker, Heiko Friedrich, David Luebke, Keith Morley, James Bigler, Jared Hoberock, David McAllister, Austin Robison, Andreas Dietrich, Greg Humphreys, Morgan McGuire, and Martin Stich. “GPU ray tracing”. In: Communications of the ACM 56.5 (May 2013), pp. 93–101. issn: 0001-0782.doi: 10.1145/2447976.2447997. (Visited on 04/28/2023). [103] Oriana Peltzer, Amanda Bouman, Sung-Kyun Kim, Ransalu Senanayake, Joshua Ott, Harrison Delecki, Mamoru Sobue, Mykel Kochenderfer, Mac Schwager, Joel Burdick, and Ali-akbar Agha-mohammadi. “FIG-OP: Exploring Large-Scale Unknown Environments on a Fixed Time Budget”. In: IROS. 2022. [104] Jose Pinto, Maria Costa, Keila Lima, Paulo Dias, João Pereira, Manuel Ribeiro, Renato Campos, Zara Mirmalek, Renato Mendes, Francisco Castejón, Javier Gilabert, Maria Paola Tomasino, Catarina Magalhães, J. da Silva, Paulo Relvas, Trent Lukaczyk, Kay Skarpnes, Martin Ludvigsen, Alexander Chekalyuk, and Kanna Rajan. “To boldly dive where no one has gone before: Experiments in coordinated robotic ocean exploration”. In: Experimental Robotics: The 17th International Symposium (2021).url: https://books.google.com/books?id=TxomEAAAQBAJ. [105] Marija Popovic, Teresa A. 
Vidal-Calleja, Jen Jen Chung, Juan Nieto, and Roland Siegwart. “Informative Path Planning for Active Mapping under Localization Uncertainty.” In: Robotics & Automation Letters. 2019. [106] Nila Feby Puspitasari, Hanif Al Fatta, and Ferry Wahyu Wibowo. “Implementation of Greedy and Simulated Annealing Algorithms for Wireless Access Point Placement”. In: 2015 3rd International Conference on Artificial Intelligence, Modelling and Simulation (AIMS) . 2015, pp. 165–170.doi: 10.1109/AIMS.2015.35. [107] Chao Qin, Diego Klabjan, and Daniel Russo. “Improving the Expected Improvement Algorithm”. In: NeurIPS. Long Beach, California, USA, 2017. [108] Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning. ISSN: 0129-0657. MIT press, 2006.isbn: 0-262-18253-X.doi: 10.1142/S0129065704001899. [109] Isabel M. Rayas Fernandez, Christopher E. Denniston, David A. Caron, and Gaurav S. Sukhatme. “Informative Path Planning to Estimate Quantiles for Environmental Analysis”. In: IEEE Robotics and Automation Letters 7.4 (2022), pp. 10280–10287.doi: 10.1109/LRA.2022.3191936. [110] Isabel M. Rayas Fernández, Christopher E. Denniston, David A. Caron, and Gaurav S. Sukhatme. “Informative Path Planning to Estimate Quantiles for Environmental Analysis”. In: IEEE Robotics and Automation Letters 7.4 (2022), pp. 10280–10287.doi: 10.1109/LRA.2022.3191936. 153 [111] Kim R. Reisenbichler, Mark R. Chaffey, Francois Cazenave, Robert S. McEwen, Richard G. Henthorn, Robert E. Sherlock, and Bruce H. Robison. “Automating MBARI’s midwater time-series video surveys: The transition from ROV to AUV”. In: OCEANS MTS/IEEE Monterey. 2016.doi: 10.1109/OCEANS.2016.7761499. [112] Julius Rückin, Liren Jin, and Marija Popović. “Adaptive Informative Path Planning Using Deep Reinforcement Learning for UAV-based Active Sensing”. In: arXiv:2109.13570 [cs] (2021).url: http://arxiv.org/abs/2109.13570 (visited on 01/10/2022). [113] Radu Bogdan Rusu, Nico Blodow, and Michael Beetz. 
“Fast Point Feature Histograms (FPFH) for 3D registration”. In: ICRA. 2009, pp. 3212–3217. [114] Gautam Salhotra, Christopher E. Denniston, David A. Caron, and Gaurav S. Sukhatme. “Adaptive Sampling Using POMDPs with Domain-Specific Considerations”. In: 2021 IEEE International ConferenceonRoboticsandAutomation(ICRA). Xi’an, China: IEEE Press, 2021, pp. 2385–2391.doi: 10.1109/ICRA48506.2021.9561319. [115] Gautam Salhotra, Christopher E. Denniston, David A. Caron, and Gaurav S. Sukhatme. “Adaptive Sampling Using POMDPs with Domain-Specific Considerations”. In: 2021 IEEE International ConferenceonRoboticsandAutomation(ICRA). Xi’an, China: IEEE Press, 2021, pp. 2385–2391.doi: 10.1109/ICRA48506.2021.9561319. [116] Gautam Salhotra, Christopher E. Denniston, David A. Caron, and Gaurav S. Sukhatme. “Adaptive Sampling using POMDPs with Domain-Specific Considerations”. In: 2021 IEEE International Conference on Robotics and Automation (ICRA). 2021.doi: 10.1109/ICRA48506.2021.9561319. [117] Victor Garcia Satorras and Max Welling. “Neural Enhanced Belief Propagation on Factor Graphs”. In: arXiv:2003.01998 [cs, stat] (June 2020). [118] Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. “The Graph Neural Network Model”. In: IEEE Transactions on Neural Networks (2009). [119] Lukas Schaupp, Mathias Bürki, Renaud Dubé, Roland Y. Siegwart, and César Cadena. “OREOS: Oriented Recognition of 3D Point Clouds in Outdoor Scenarios”. In: IROS (2019). [120] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. “Proximal policy optimization algorithms”. In: (2017). [121] Bridget Seegers, James Birch, Roman Marin, Chris Scholin, David Caron, Erica Seubert, Meredith Howard, George Robertson, and Burton Jones. 
“Subsurface seeding of surface harmful algal blooms observed through the integration of autonomous gliders, moored environmental sample processors, and satellite remote sensing in southern California: Harmful algal blooms subsurface seeding”. In: Limnology and Oceanography (2015).
[122] A. Segal, D. Haehnel, and S. Thrun. “Generalized-ICP”. In: RSS (2009).
[123] Samantha L. Sharp, Alexander L. Forrest, Keith Bouma-Gregson, Yufang Jin, Alicia Cortés, and S. Geoffrey Schladow. “Quantifying Scales of Spatial Variability of Cyanobacteria in a Large, Eutrophic Lake Using Multiplatform Remote Sensing Tools”. In: Frontiers in Environmental Science (2021). doi: 10.3389/fenvs.2021.612934.
[124] David Silver and Joel Veness. “Monte-Carlo Planning in Large POMDPs”. In: Advances in Neural Information Processing Systems 23. Ed. by J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta. Curran Associates, Inc., 2010, pp. 2164–2172. url: http://papers.nips.cc/paper/4031-monte-carlo-planning-in-large-pomdps.pdf (visited on 06/24/2020).
[125] David Silver and Joel Veness. “Monte-Carlo Planning in Large POMDPs”. In: NeurIPS. Vol. 23. 2010.
[126] Jasper Snoek, Hugo Larochelle, and Ryan P Adams. “Practical Bayesian Optimization of Machine Learning Algorithms”. In: NeurIPS. Vol. 25. 2012.
[127] Southeast Coastal Ocean Observing Regional Association. https://secoora.org/. Mar. 2021.
[128] J. R. Souza, R. Marchant, L. Ott, D. F. Wolf, and F. Ramos. “Bayesian optimisation for active perception and smooth navigation”. In: ICRA. 2014.
[129] Spot® - The Agile Mobile Robot. https://www.bostondynamics.com/products/spot (visited on 09/15/2022).
[130] Zachary Sunberg and Mykel Kochenderfer. “Online algorithms for POMDPs with continuous state, action, and observation spaces”. In: arXiv:1709.06196 [cs] (Sept. 2018). url: http://arxiv.org/abs/1709.06196 (visited on 12/11/2020).
[131] Martin Sundermeyer, Ralf Schlüter, and Hermann Ney. “LSTM neural networks for language modeling”. In: Thirteenth Annual Conference of the International Speech Communication Association. 2012.
[132] Andrea Tagliabue, Jesus Tordesillas, Xiaoyi Cai, Angel Santamaria-Navarro, Jonathan P. How, Luca Carlone, and Ali-akbar Agha-mohammadi. “LION: Lidar-Inertial Observability-Aware Navigator for Vision-Denied Environments”. In: ISER. 2020.
[133] Rajat Talak, Siyi Hu, Lisa Peng, and Luca Carlone. Neural Trees for Learning on Graphs. 2021. arXiv: 2105.07264 [cs.LG].
[134] K H Tong, N Bakhary, and A BH Kueh. “Optimal Sensor Placement for Structural Health Monitoring Using Improved Simulated Annealing”. In: Australasian Structural Engineering Conference. 2012.
[135] Marc Toussaint. “The Bayesian Search Game”. In: Theory and Principled Methods for the Design of Metaheuristics. Ed. by Yossi Borenstein and Alberto Moraglio. Natural Computing Series. Berlin, Heidelberg: Springer Berlin Heidelberg, 2014, pp. 129–144. isbn: 978-3-642-33205-0, 978-3-642-33206-7. doi: 10.1007/978-3-642-33206-7_7. (Visited on 10/09/2020).
[136] K. Tutschku. “Demand-based radio network planning of cellular mobile communication systems”. In: Proceedings of IEEE INFOCOM ’98, the Conference on Computer Communications. Seventeenth Annual Joint Conference of the IEEE Computer and Communications Societies. Vol. 3. 1998, pp. 1054–1061. doi: 10.1109/INFCOM.1998.662915.
[137] U.S. Integrated Ocean Observing System program. https://ioos.github.io/ioosngdac/index.html.
[138] Yongyong Wei and Rong Zheng. “Informative Path Planning for Mobile Sensing with Reinforcement Learning”. In: arXiv:2002.07890 [cs] (2020). url: http://arxiv.org/abs/2002.07890 (visited on 12/20/2021).
[139] Bernard L. Welch. “The generalisation of student’s problems when several different population variances are involved”. In: Biometrika 34.1-2 (1947), pp.
28–35. issn: 0006-3444. doi: 10.1093/biomet/34.1-2.28.
[140] Rand Wilcox. Introduction to Robust Estimation and Hypothesis Testing. 3rd Edition. Statistical Modeling and Decision Science. Academic Press, 2012, pp. 103–136.
[141] Fei Xia, Chengshu Li, Roberto Martin-Martin, Or Litany, Alexander Toshev, and Silvio Savarese. “ReLMoGen: Leveraging motion generation in reinforcement learning for mobile manipulation”. In: International Conference on Robotics and Automation (2021).
[142] Nuo Xu, Kian Hsiang Low, Jie Chen, Keng Kiat Lim, and Etkin Baris Ozgul. “GP-Localize: Persistent Mobile Robot Localization using Online Sparse Gaussian Process Observation Model”. In: arXiv:1404.5165 [cs, stat] (Apr. 2014). url: http://arxiv.org/abs/1404.5165 (visited on 11/20/2020).
[143] Jun Yamada, Youngwoon Lee, Gautam Salhotra, Karl Pertsch, Max Pflueger, Gaurav S. Sukhatme, Joseph J. Lim, and Peter Englert. “Motion Planner Augmented Reinforcement Learning for Obstructed Environments”. In: Conference on Robot Learning. 2020.
[144] Heng Yang, Pasquale Antonante, Vasileios Tzoumas, and Luca Carlone. “Graduated Non-Convexity for Robust Spatial Perception: From Non-Minimal Solvers to Global Outlier Rejection”. In: IEEE Robotics and Automation Letters (Apr. 2020).
[145] K. Yoon, R. Liao, Y. Xiong, L. Zhang, E. Fetaya, R. Urtasun, R. Zemel, and X. Pitkow. “Inference in Probabilistic Graphical Models by Graph Neural Networks”. In: 2019 53rd Asilomar Conference on Signals, Systems, and Computers. Nov. 2019.
[146] Yoonchang Sung, Deeksha Dixit, and Pratap Tokekar. Online Multi-Robot Exploration of a Translating Plume: Competitive Algorithm and Experiments. Tech. rep. arXiv: 1811.02769v2. (Visited on 06/03/2019).
[147] Andy Zeng, Shuran Song, Johnny Lee, Alberto Rodriguez, and Thomas Funkhouser. “TossingBot: Learning to Throw Arbitrary Objects With Residual Physics”. In: IEEE Transactions on Robotics 36.4 (2020), pp. 1307–1319. doi: 10.1109/TRO.2020.2988642.
[148] Ji Zhang, Michael Kaess, and Sanjiv Singh. “On degeneracy of optimization-based state estimation problems”. In: 2016 IEEE International Conference on Robotics and Automation (ICRA). 2016, pp. 809–816.
[149] Yanwu Zhang, Brian Kieft, Brett W. Hobson, John P. Ryan, Benedetto Barone, Christina M. Preston, Brent Roman, Ben-Yair Raanan, Roman Marin III, Thomas C. O’Reilly, Carlos A. Rueda, Douglas Pargett, Kevan M. Yamahara, Steve Poulos, Anna Romano, Gabe Foreman, Hans Ramm, Samuel T. Wilson, Edward F. DeLong, David M. Karl, James M. Birch, James G. Bellingham, and Christopher A. Scholin. “Autonomous Tracking and Sampling of the Deep Chlorophyll Maximum Layer in an Open-Ocean Eddy by a Long-Range Autonomous Underwater Vehicle”. In: IEEE Journal of Oceanic Engineering (2020). doi: 10.1109/JOE.2019.2920217.
[150] Jie Zhou, Ganqu Cui, Zhengyan Zhang, Cheng Yang, Zhiyuan Liu, Lifeng Wang, Changcheng Li, and Maosong Sun. “Graph Neural Networks: A Review of Methods and Applications”. In: arXiv:1812.08434 [cs, stat] (July 2019).
[151] Lifeng Zhou, Vishnu D. Sharma, Qingbiao Li, Amanda Prorok, Alejandro Ribeiro, and Vijay Kumar. “Graph Neural Networks for Decentralized Multi-Robot Submodular Action Selection”. In: arXiv:2105.08601 (May 2021).